ERROR-RATE TESTING TO IMPROVE YIELD FOR ERROR-TOLERANT APPLICATIONS

by

Shideh M. Shahidi

A Dissertation Presented to the FACULTY OF THE GRADUATE SCHOOL, UNIVERSITY OF SOUTHERN CALIFORNIA, In Partial Fulfillment of the Requirements for the Degree DOCTOR OF PHILOSOPHY (ELECTRICAL ENGINEERING)

December 2008

Copyright 2008 Shideh M. Shahidi

Dedication

Dedicated to Minoo, Saeed, Mehdi

Acknowledgements

I would like to take this opportunity to thank my dissertation advisor, Professor Sandeep Gupta, for his generous time, supervision, and help throughout the years. His outlook on problems taught me independent thinking. I thank him for guiding me through every step of my research. I thank Professor Melvin Breuer for his thorough review of this work and his invaluable feedback. I would also like to thank Professors Nenad Medvidovic, Keith Chugg, Alice Parker, and Antonio Ortega for their guidance throughout the course of this dissertation. I am grateful to other professors and staff at the Electrical Engineering Department of the University of Southern California who helped me during my studies.

There are so many friends at USC who made this journey more pleasant, and I would like to thank all of them for the company. In particular, I thank my friends and colleagues Shahdad Irajpour and Ide Huang for always being available for discussions and coffee breaks. I thank my friends and extended family all around the world for being patient with me when, because I was too busy at times, I did not always reciprocate their expressions of love and friendship. My thanks especially go to my mother- and father-in-law, Akram and Ali Hashemian, for their unconditional love and support.

I thank my brother and sister, Mohammad and Shirin Shahidi, for always accepting me for who I am and for being my first and oldest friends and inspirations. I am proud to be your sister.
Above all, I would like to thank my mother and father, Minoo Vakilian and Saeed Shahidi, for teaching me my first and foremost lessons in life and for always believing in me. I thank them for their love, trust, support, and the sacrifice they made in bearing the physical distance between us so I could grow. My father gave me my love for physics and oversaw my very first engineering project at an early age, and my mother taught me that not everything in life can be expressed in terms of numbers and variables. I am grateful to you for the balance. Everything you taught me will always be with me.

Last but not least, I would like to thank my husband, Mehdi Hashemian, for always supporting me, listening to me, understanding me, and putting a smile on my face despite his own worries and troubles. I thank him for bearing with my mood swings throughout these years and for always being my constant. You teach me planning, perseverance, and determination. You help me keep my life in proper perspective.

Table of Contents

Dedication
Acknowledgements
List of Figures
List of Tables
Abstract

Chapter 1: Introduction
1.1 Overview of error tolerance
1.2 Yield improvement due to error tolerance
1.3 Testing for error tolerance
1.3.1 Coverage requirements of a test
1.4 Different criteria for error tolerance
1.5 Previous work on testing for error tolerance
1.6 Outline of this dissertation

Chapter 2: Error-rate testing
2.1 Introduction
2.2 Random error-rate testing
2.3 Deterministic error-rate testing
2.3.1 The central question and test application scenario
2.3.2 Problem statement
2.3.3 Classification of faults as acceptable and unacceptable
2.3.3.1 Computation of error rate
2.3.3.2 Distribution of error rate
2.4 Classical test generation and error-rate testing
2.5 Summary

Chapter 3: Test generation for error-rate testing
3.1 Introduction
3.2 Error-rate testing for fanout-free circuits with primitive gates
3.2.1 Relationships between faults at a primitive gate
3.2.2 Properties of fanout-free circuits
3.2.3 Properties of primitive gates in fanout-free circuits
3.2.4 ERTG-ff: Error-rate test generator for fanout-free circuits
3.2.4.1 Faults detected by any test vector for a target fault
3.2.4.2 A new justification procedure
3.2.5 Challenges posed by arbitrary circuits
3.2.5.1 Complex gates in fanout-free circuits
3.2.5.2 Non-reconvergent fanouts
3.2.5.3 Circuits with reconvergent fanouts
3.2.5.4 Summary of complications
3.3 ERTG: Error-rate test generator
3.3.1 Overview of ERTG
3.3.2 Key procedures
3.3.2.1 Fault effect propagation
3.3.2.2 Justification procedure
3.3.2.3 Assigning values to partially specified inputs
3.3.2.4 Target fault selection
3.3.3 Example of error-rate test generation
3.4 Experimental results
3.5 Summary

Chapter 4: Multi-vector tests: A path to perfect error-rate testing
4.1 Introduction
4.2 Multi-vector testing
4.3 Multi-vector test generation for single stuck-at faults
4.3.1 Necessary conditions for multi-vector test generation
4.3.2 Upper bound on the size of M-VTS
4.3.3 Multi-vector test generator for single stuck-at faults
4.4 Experimental results
4.5 Summary

Chapter 5: Error-rate testing for delay faults
5.1 Introduction
5.2 Delay faults and error-rate testing
5.2.1 Delay fault models
5.2.2 Error rate of a transition delay fault
5.2.3 Central question and test application scenario
5.3 Coverage of delay faults by stuck-at tests
5.3.1 Experimental results
5.4 Test generation for transition delay faults
5.4.1 Necessary conditions for multi-pair test generation
5.4.2 Upper bound on the size of M-PTS
5.4.3 Multi-pair test generator for delay faults
5.5 Summary

Chapter 6: Future research
6.1 Composite metric of error rate and significance
6.2 Defect-based error-rate testing
6.3 Extension of error-rate testing to sequential circuits

Bibliography

List of Figures

1.1 Yield history for several manufacturers (500nm - 180nm) [39].
1.2 Classification of chips for (a) a classical application, and (b) an error tolerant application.
1.3 Yield with respect to error tolerance, Y_ET, versus process yield, Y.
1.4 Ratio of yield with respect to error tolerance over process yield, Y_ET/Y, versus process yield, Y.
1.5 Classification of chips by (a) an ideal test for error tolerance, and (b) a realistic (non-ideal) test for error tolerance.
2.1 Policy of differentiation between acceptable and unacceptable faults.
2.2 Distribution of error rate for ISCAS85 benchmark circuits.
3.1 Testing a single stuck-at fault associated with a gate in a fanout-free circuit.
3.2 An example circuit and the relationships between the sets of all vectors that detect the single stuck-at faults.
3.3 Generating a test vector for target fault f_t in a generic fanout-free circuit.
3.4 The proposed justification procedure, Justify-ff-allcv.
3.5 The proposed test generation procedure, ERTG-ff.
3.6 The fanout-free subcircuit in the fanin of the fault site.
3.7 The sets of all test vectors for single stuck-at faults associated with (a) a two-input NAND gate, and (b) a two-to-one multiplexer.
3.8 The sets of all test vectors for single faults associated with a two-input XOR gate.
3.9 (a) An example circuit with non-reconvergent fanout, and (b) a generic circuit with non-reconvergent fanout.
3.10 Flowchart of the proposed ERTG.
3.11 (a) A generic circuit with fanouts partitioned into FFRs, and (b) potential propagation path graph for a target fault at a line within FFR0.
3.12 Propagation cost of an example UPP.
3.13 Justification of {1} at the output of a 3-input NAND gate.
3.14 Example gate with partially specified values.
3.15 Specification of values in an example circuit.
3.16 Selection of target faults.
3.17 Error-rate test generation for c17.
4.1 Multi-vector testing.
4.2 An example multiplexer and the set of all test vectors for all stuck-at faults at its inputs and output.
4.3 An example XOR gate and the set of all test vectors for all stuck-at faults at its inputs and output.
4.4 A generic fanout-free circuit.
4.5 Example circuit to derive upper bound on the size of M-VTS.
4.6 Generic circuit with non-reconvergent fanout.
4.7 Example circuit for multi-vector test generation.
4.8 First search tree for example execution of multi-vector test generator.
4.9 Second search tree for example execution of multi-vector test generator.
4.10 Third search tree for example execution of multi-vector test generator.
5.1 An example combinational block with input and output flip-flops.
5.2 Delay test application methodologies.
5.3 Distribution of error rate of stuck-at faults and delay faults for a benchmark circuit, c880.
5.4 An example AND gate with a slow-to-rise fault at the output.
5.5 A generic fanout-free circuit.
5.6 Example circuit for multi-pair test generation.
5.7 The first search tree for the example execution of the multi-vector test generator.
5.8 The second search tree for the example execution of the multi-vector test generator.
6.1 Acceptance curve based on error significance and error rate.
6.2 Separation of faults based on significance and rate of error.
6.3 A general model of a sequential circuit.

List of Tables

1.1 Classification of acceptable and unacceptable chips.
2.1 Acceptance gains of approaches 1, 2, and 3; T_er = 0.1.
3.1 Fault categories and their properties.
3.2 List of weights of acceptable faults in c17, T_er = 0.3.
3.3 Unacceptable coverage, acceptance gain, test-set size, and run-time of tests generated by ERTG; T_er = 0.1.
3.4 Acceptance gain, normalized test-set size, and normalized run-time for different threshold error rates, for c880.
4.1 Local detection of a fault at the input of (a) an AND gate, and (b) an XOR gate. Both gates have inputs a and b, and output c.
4.2 Local propagation of a fault effect through (a) an AND gate, and (b) an XOR gate. Both gates have inputs e and f, and output g.
4.3 Justification of a value at the output of (a) an AND gate, and (b) an XOR gate. Both gates have inputs i and j, and output k.
4.4 Acceptance rate and size of M-VTS for ISCAS85 circuits; T_er = 0.1.
5.1 Conditions that a sequence of vectors [v_i − v_p] must satisfy to detect a TF, expressed in terms of the vectors' ability to detect the corresponding SSAF [35].
5.2 Error rate of a TF in terms of the error rate of the corresponding SSAF.
5.3 Acceptable and unacceptable TFs and the corresponding SSAFs.
5.4 Acceptance gains of Approaches 2 and 4; T_er = 0.1.
5.5 Local detection of a fault at the input of (a) an AND gate, and (b) an XOR gate. Both gates have inputs a and b, and output c.
5.6 Local propagation of a fault effect through (a) an AND gate, and (b) an XOR gate. Both gates have inputs e and f, and output g.
5.7 Justification of a value at the output of (a) an AND gate, and (b) an XOR gate. Both gates have inputs i and j, and output k.

Abstract

VLSI scaling has entered an era where achieving desired yields is becoming increasingly challenging. The concept of error tolerance has been previously proposed with the goal of reversing this trend for classes of systems which do not require completely error-free operation. Such systems include audio, speech, video, graphics, and digital communications. Analysis of such applications has identified error rate as one of the key metrics of error severity. Error rate is defined as the percentage of clock cycles for which the value at the outputs deviates from the corresponding error-free value. An error tolerant application provides a threshold error rate. Chips with error rate less than the threshold are considered acceptable and can be used; other chips must be discarded. In order to maximize the yield gain from error tolerance, a test must discard all unacceptable chips while discarding no acceptable chips.

Our main objective in error-rate testing is to detect all unacceptable faults while not detecting any acceptable faults.
Maintaining test generation and application times comparable to classical testing is our second objective. We prove that in arbitrary circuits the main objective is not always achievable. However, we develop a test generator that minimizes the number of acceptable faults that are detected. Our results show that it is possible to discard all chips with an unacceptable fault while discarding only a small percentage of chips with an acceptable fault.

We introduce the new notion of multi-vector testing, where testing is performed using a set of test sessions, each including multiple vectors. We redefine the conditions under which a chip is accepted or discarded, and prove that using this new notion, our main objective of error-rate testing is achievable for all fault models. We theoretically derive a universal upper bound on the number of required vectors. This large upper bound is an overhead in conflict with our second objective. Therefore, we deploy modeled faults and a structural approach to achieve our main objective via fewer vectors. Our results confirm that the promise of error tolerance, namely higher yields, can be achieved at little to no compromise in costs.

Chapter 1

Introduction

Increasing process variations, defect rates, infant mortality rates, and susceptibility to internal and external noise continue to decrease yield for digital chips. Although yield has always been an issue, it may soon become more critical, as the decrease may become intolerable within a few technology generations [33, 39]. Figure 1.1 [39] shows the yield history for several manufacturers of digital logic. The curves show a decreasing trend in initial and mature yields over a number of technology generations.

Concepts of fault tolerance (FT) and defect tolerance (DT) have been explored to solve the problem of diminishing yields. These concepts are mostly based on adding redundancy to a design.
Fault tolerance (FT) is defined as the ability of a system to continue correct execution of its tasks after the occurrence of hardware or software errors [7, 38]. Fault avoidance techniques use redundant structures to ensure that likely defects do not alter circuit functionality. According to [7], no authoritative definition exists for defect tolerance (DT) in the literature, but the term refers to any circuit implementation that tolerates more defects than a non-defect-tolerant implementation.

Figure 1.1: Yield history for several manufacturers (500nm - 180nm) [39].

Defect tolerant techniques, such as using multiple vias to connect two wire segments on different layers, have also proven to be useful.

Redundant logic and interconnects used by FT and DT techniques cause overheads and are themselves subject to manufacturing defects. Most of these techniques are therefore beneficial for circuits with regular structures, such as memories and buses, but typically much less effective for circuits with irregular structures. To date, other than for regular structures, these techniques have been applied only when the design is not cost sensitive, as in space avionics applications. The authors of [10] show that direct application of existing fault tolerance approaches fails to improve functional yield in the presence of a massive number of defects. In addition, they show that some defect tolerance approaches are more suitable than fault tolerance approaches for improving yields in the presence of high defect rates. Hence, current defect tolerance techniques must be modified to preserve the benefits of scaling, namely lower cost, more functionality, lower power consumption, and higher performance.

Limitations of FT and DT approaches motivate us to look for new venues to reverse the decreasing trend in yield. In FT and DT approaches the focus is on redesigning and reconfiguring the circuit with the objective of achieving error-free outputs.
However, some applications have the ability to tolerate certain errors at the outputs, as long as the error severities are within certain levels.

For these applications, the set of fabricated chips that can be used can be expanded by developing and exploring a new paradigm for testing, namely intelligible testing, proposed in [5, 7, 9]. A chip may perform acceptably for some applications even if it contains a number of defects that only cause errors with certain pre-specified characteristics. This concept is referred to as error tolerance (ET). Error tolerance, either by itself or in conjunction with FT and DT, can provide higher functional yields and lower costs.

Next, the concept of error tolerance is described in more detail. A number of error tolerant applications, different criteria for error tolerance, and the yield benefits of error tolerance are discussed in the following sections.

1.1 Overview of error tolerance

According to [5, 6, 7, 8, 9], a circuit is error tolerant (ET) with respect to an application, or a system that implements the application, if (i) it contains defects that cause internal and may cause external errors, and (ii) the system that incorporates this circuit produces acceptable results. Unlike in DT and FT, a circuit may produce erroneous outputs (results) and still be used in a hardware system implementing an error tolerant application.

Figure 1.2: Classification of chips for (a) a classical application, and (b) an error tolerant application.

Figure 1.2(a) shows that in a classical application, a defective chip that causes an error at the outputs is considered unacceptable, regardless of the degree of error. However, Figure 1.2(b) shows that for an error tolerant application, some of the chips that would be considered unacceptable in case (a) are deemed acceptable. Using chips from both the "error-free" and the "erroneous but acceptable" bins provides higher yield for error tolerant applications.
Some of the classes of applications in which the concept of error tolerance can be applied include audio, speech, graphics, video, games, and digital communications. A number of studies have analyzed such applications [11, 12, 13, 14, 15, 17, 18, 19, 20, 21, 22, 26, 48, 49, 65, 68, 69, 70].

A number of blocks used in different error tolerant applications have been studied and found to perform acceptably in the presence of any one of a large set of defects or faults. This happens for two main reasons. First, some defects or faults in these blocks do not cause errors at the blocks' outputs. Second, and more importantly, many other defects or faults do cause errors at the blocks' outputs, but either (i) cause errors of only small significance, or (ii) cause errors at low rates. (Here the terms significance and rate are used informally; formal definitions of these terms will be presented ahead.) In either case, the defect or fault causes only a slight degradation in the performance of the system.

More precisely, in [13, 15, 22], the effect of single and multiple stuck-at faults in the motion estimation block of an MPEG encoder is studied. Stuck-at faults are inserted into different architectures implementing this block, and the same video stream is encoded by a fault-free and a faulty version of the encoder. Both results are then decoded, and the difference between the average peak signal-to-noise ratios of the two signals is computed as a measure of degradation. The results show that many faults have minimal impact on the performance of the motion estimator; however, the actual degradation depends on the video content, the location of the fault, and its polarity. For example, for one architecture, about 99.2% of the single stuck-at faults in the motion estimation block resulted in acceptable degradation in the output quality of the video [13, 15, 22].

In [17, 18], the functional operation of the discrete cosine transform (DCT) of a JPEG encoder is studied in the presence of single stuck-at faults.
In most architectures, the DCT is followed by quantization. A stuck-at fault is inserted into the DCT circuitry. Then two versions of the encoder, one fault-free and the other faulty, process the same video stream. The outputs of both encoder versions are decoded, the difference between the PSNRs of the two results is calculated, and the level of degradation is computed. The authors of [18] found that more than 50% of the faults resulted in imperceptible quality degradation.

In [48, 49], it was shown that in the transmission of coded data over a noisy channel, many faults in the memory blocks embedded within the decoder cause errors at the output of the block with only slightly higher rates than the block error rate of the fault-free decoder, resulting in minor and acceptable performance loss. It was also shown in [11, 12, 70] that some faults in the memory of an answering machine lead to acceptable recording and playback quality.

Acceptance of the notion of error tolerance by the IC industry requires development of new techniques for system specification, design, testing, reconfiguration, binning, and marketing [10].

1.2 Yield improvement due to error tolerance

In classical testing, process yield, Y, is defined as the fraction of the manufactured chips that is defect-free [35]. Yield with respect to error tolerance, Y_ET, is defined as the fraction of manufactured parts that is acceptable (i.e., defect-free, or defective but acceptable). Clearly, by enabling acceptance of chips that would normally be rejected in classical testing, exploitation of the notion of error tolerance improves yield for error tolerant applications. In this section, a simple formulation for the increase in yield is presented in terms of the area of the chip, the average defect density, and the percentage of defects that are considered acceptable and classified accordingly by an appropriate testing approach.

Defect density, D, is defined as the number of defects per unit chip area.
Defects are assumed to be independent and randomly distributed over the chip area. The number of defects that fall in a chip of area A is a Poisson random variable, denoted by K:

f(K = k) = e^(−DA) (DA)^k / k!

Assuming a defect density, D, with probability density function f(D), the overall yield of all chips is as follows [50]:

Y = ∫_0^∞ e^(−DA) f(D) dD.   (1.1)

In this formulation, only defect-free chips contribute to the yield. For error tolerant testing, however, the contribution of acceptable defects to the yield must be quantified.

Consider n possible distinct single defects. Assume that every single defect is equally likely and independent of the others. The probability of a circuit being defect-free is denoted by Y_0 and can be calculated as

Y_0 = (1 − p)^n,

where p is the probability of occurrence of each defect in the circuit. Under the above assumptions, Y_0 is approximately equal to Y. From this equation and approximation, the probability of occurrence of a single defect can be found as

p = 1 − Y^(1/n),   (1.2)

where Y can be computed in terms of A and D from Equation 1.1.

Assume that means are available to analyze all defects in the circuit and identify those that cause only an acceptable error for a particular application. Assume that n_a out of the n single defects cause acceptable performance. Yield with respect to error tolerance, Y_ET, was previously defined as the fraction of manufactured parts that is either defect-free or has an acceptable defect. Under the above assumptions, Y_ET is approximately equal to the probability that none of the n − n_a unacceptable defects occurs, and can be calculated as follows:

Y_ET = (1 − p)^(n − n_a).

By substituting p from Equation 1.2,

Y_ET = Y^(1 − n_a/n).   (1.3)

The improvement in yield can be measured as follows:

Y_ET / Y = Y^(−n_a/n).   (1.4)

Figure 1.3: Yield with respect to error tolerance, Y_ET, versus process yield, Y.

Graphs in Figures 1.3 and 1.4 show Y_ET and Y_ET/Y versus changes in Y for different n_a/n values.
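Equations 1.2–1.4 can be checked numerically. The sketch below uses the chapter's example values (Y = 0.5 with half of all single defects acceptable); the function and variable names are my own, not from the dissertation.

```python
# Numeric check of Equations 1.2-1.4 (names are illustrative, not from the text).

def yield_with_error_tolerance(Y, n, n_a):
    """Y_ET for process yield Y, with n equally likely independent single
    defects, n_a of which are acceptable (Equation 1.3)."""
    p = 1 - Y ** (1.0 / n)          # Equation 1.2: per-defect probability
    return (1 - p) ** (n - n_a)     # equals Y^(1 - n_a/n), Equation 1.3

Y, n, n_a = 0.5, 1000, 500          # half of all single defects acceptable
Y_ET = yield_with_error_tolerance(Y, n, n_a)
print(round(Y_ET, 3))               # 0.707
print(round(Y_ET / Y, 3))           # 1.414 (Equation 1.4: Y^(-n_a/n))
```

The ratio of roughly 1.41 is the "around 0.7, or 40% higher than the original process yield" improvement quoted in the text.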
The graphs show that the higher the n_a/n, the greater the improvement in yield due to error tolerance. For example, the authors of [18] found that more than 50% of the single stuck-at faults resulted in imperceptible quality degradation. With a process yield of Y = 0.5, the functional yield with respect to error tolerance is then around 0.7, or 40% higher than the original process yield.

Figure 1.4: Ratio of yield with respect to error tolerance over process yield, Y_ET/Y, versus process yield, Y.

The yield analysis in this section uses a simple model which assumes single, equally likely, and independent defects. Future research will consider correlations between defects.

1.3 Testing for error tolerance

Testing for error tolerance is mainly focused on developing test techniques for chips used in error tolerant applications. The application provides acceptance criteria by defining metrics and severities of errors, and threshold values that determine which errors are acceptable. The test technique must distinguish chips that can produce an output error that is not acceptable under these criteria from chips that are error-free or can produce only acceptable errors.

In classical testing, the main objective of a test is to distinguish between chips with erroneous outputs and those with error-free outputs. It is possible for the objective of testing to be to separate the chips into three bins, namely those with unacceptable, acceptable, and no error. In this case, a set of tests generated in a classical manner can be applied to the chips first. Chips that pass this test are considered error-free. A set of tests generated for error tolerance is then applied to the chips that fail the first set of tests, namely the erroneous chips. Chips that pass the second set of tests are chips with acceptable error, and those that fail this test are chips with unacceptable error.
In a second approach, the tests generated for error tolerance are applied to the chips first. Chips that fail this test have unacceptable error. Those that pass go through a second round of testing with tests generated by a classical test generator. Chips that fail this second test have acceptable errors, and those that pass it are error-free. Adopting the first or the second approach results in different test times, depending on the number of chips belonging to each bin and on the test generation time of each of the test generators. In this work, the focus is on separating the chips with unacceptable errors from the rest of the chips.

1.3.1 Coverage requirements of a test

Since our new approaches to testing for error tolerance may be imperfect, a fraction of acceptable chips might be identified as unacceptable and discarded. Also, some unacceptable chips may not be identified as unacceptable and may be sold. Figure 1.5 shows the difference between the classification of chips by (a) an ideal and (b) a realistic, non-ideal test approach. In general, once error tolerance testing is performed, there will be four categories of chips, as shown in Table 1.1.

When considering all the chips, chips in Case-1 and Case-2 collectively represent the actual set of unacceptable chips. Similarly, those in Case-3 and Case-4 collectively represent the actual set of acceptable chips. Chips in Case-1 and Case-4 are correctly identified as unacceptable and acceptable, respectively. Chips in Case-2 and Case-3 are identified incorrectly by the tests for error tolerance.

Figure 1.5: Classification of chips by (a) an ideal test for error tolerance, and (b) a realistic (non-ideal) test for error tolerance.

Case-1 represents unacceptable chips that are correctly identified as unacceptable by the test. The percentage of unacceptable chips that fall into this category signifies the quality of the error-tolerant test.
We refer to the percentage of unacceptable chips that fall into this category as unacceptable coverage.

Case-2 represents chips that are classified as acceptable by the test method but are actually unacceptable. If a non-negligible fraction of all chips fall into this case, then the quality of the test will be in question. We refer to the percentage of unacceptable chips that fall into this case as test escape. We design our test approaches to ensure that test escape is negligible.

The percentage of acceptable chips in Case-4, namely the percentage of erroneous but acceptable chips that are correctly identified as acceptable by the test, is referred to as acceptance gain, and represents the benefit of error tolerance over the classical method of testing, which discards all error-producing chips.

There might also be chips that are classified as unacceptable but are actually acceptable. These chips fall in Case-3 and represent the shortcomings of the test approach compared to a hypothetical perfect approach that can identify all acceptable chips. We refer to the percentage of acceptable chips in Case-3 as acceptable coverage. Note that even when a fraction of erroneous but acceptable chips fall in Case-3, the error tolerance approach can still be beneficial compared to classical methods of testing if it identifies a large fraction of chips as acceptable (see Case-4).

What was previously shown in Figure 1.5 as "unacceptable sold" and "acceptable discarded" is what is defined here as test escape and acceptable coverage, respectively. (These are generalizations, to the context of error tolerance, of the concepts of defect level and overkill, respectively.)

Table 1.1: Classification of acceptable and unacceptable chips.

                               Actual result
                        Unacceptable        Acceptable
Proposed   Unacceptable  Case-1:             Case-3:
approach                 unacceptable        acceptable
                         coverage            coverage
           Acceptable    Case-2:             Case-4:
                         test escape         acceptance gain

It was explained earlier that the main objective of error tolerance is increasing
It is clear that in order to benefit more from error tolerance, the objective of testing for error tolerance would be to maximize the unacceptable coverage, i.e., the percentage of unacceptable chips that are correctly identified as unacceptable, as well as to maximize the acceptance gain, i.e., the percentage of acceptable chips that are correctly identified as acceptable. A secondary objective of testing for error tolerance, as for any other testing approach, is to minimize the complexity and cost of test application.

Due to different acceptability requirements of different applications, it is possible to classify acceptable chips into multiple bins. For different levels of acceptability, the chips in different bins can be priced appropriately, which will alter the financial benefits of error tolerance. Classification of chips in this manner can be a secondary objective of error tolerance.

1.4 Different criteria for error tolerance

In order to support the notion of error tolerance, system specifications need to be augmented to include degrees of acceptability. As mentioned earlier, the degrees of acceptability of performance at the outputs of an error tolerant system translate to acceptance criteria at the outputs of the system's individual modules.

According to [9], three simple measures of error tolerance are: error rate, error significance, and error accumulation. For a given application, the acceptance criteria (i.e., threshold) for each relevant measure must be specified.

Error rate is defined as the percentage of clock cycles during normal circuit operation for which the value at a set of outputs deviates from the corresponding error-free value. For example, in a combinational circuit, if a fault is detected by half of all possible input vectors, and each input vector is applied during normal chip operation with equal likelihood, this circuit has an error rate of 0.5.
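To make this definition concrete, here is a small illustrative sketch (a hypothetical two-input circuit in Python, not taken from the dissertation) that computes the error rate of a faulty combinational circuit by enumerating all input vectors, assuming every vector is equally likely during normal operation:

```python
from itertools import product

def error_rate(good, faulty, n_inputs):
    """Fraction of (equally likely) input vectors for which the faulty
    response deviates from the fault-free response."""
    vectors = list(product([0, 1], repeat=n_inputs))
    errors = sum(good(*v) != faulty(*v) for v in vectors)
    return errors / len(vectors)

# Fault-free circuit: y = a AND b; faulty circuit: output stuck-at-0.
good = lambda a, b: a & b
stuck_at_0 = lambda a, b: 0

# a AND b is 1 for exactly one of the four input vectors, so the
# stuck-at-0 output errs on 1/4 of all vectors.
print(error_rate(good, stuck_at_0, 2))  # 0.25
```

A fault detected by half of all vectors would, by the same computation, yield an error rate of 0.5.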
For a faulty combinational circuit, error significance for a set of circuit outputs is defined as the maximum amount by which the response at the set of outputs can deviate from the corresponding error-free value. For example, an error in the least significant bit of an arithmetic operator may not be too critical for some applications. Error significance is discussed in more detail in [36, 37].

Depending on the feedback structure of a design, the accumulation of error over successive computations may become critical. Error accumulation concerns changes in the significance or rate of error over time. For example, in a feed-forward pipeline architecture, a defect produces a constant error rate; however, in a feedback-oriented finite state machine, the error rate might increase over time.

It is also possible to define composite metrics, such as a composite metric that considers error rate in conjunction with error significance. For a combinational logic block, such a composite metric would be a measure of the percentage of vectors for which a response at a set of outputs deviates by a certain amount from the corresponding error-free value.

1.5 Previous work on testing for error tolerance

In [36, 37], threshold testing is introduced as a test generation approach to distinguish chips based on the significance of error at their outputs. In threshold testing, faults are targeted one-by-one and tests are generated for unacceptable faults. In this approach, accurate fault models need to be considered. In threshold testing there is no possibility of acceptable chips being discarded. The main challenge here is to achieve high coverage of unacceptable faults at reasonable test application costs.

In [53], the authors propose a composite metric of error rate and error significance, namely significance-based error rate (SBER). In this work they propose three different ways to quantify the SBER value(s) of a defective chip using built-in self-test under different scenarios.
They also discuss the statistical characteristics of their estimations. In [52, 59], two BIST approaches are introduced to estimate the error rate of a chip using a large set of random input vectors. In contrast, in this research, we propose a deterministic solution to error-rate testing in order to avoid the use of large sets of tests.

In [28, 29, 30, 40], techniques such as test-pattern selection are introduced to facilitate test generation for error-rate testing. These techniques are discussed in more detail in Chapter 3, and the proposed test generation approach is compared with these techniques.

1.6 Outline of this dissertation

In the next chapter, an overview of error-rate testing is provided. Random error-rate testing is briefly discussed, and deterministic error-rate testing is discussed in more detail. It is shown that using classical test generation approaches, perfect unacceptable coverage and relatively high acceptance gains are achievable for the set of single stuck-at faults for arbitrary circuits.

In Chapter 3, it is proven that perfect error-rate testing in terms of acceptance gain and unacceptable coverage can be achieved for the set of single stuck-at faults in fanout-free circuits with primitive gates only. It is also shown that these results are not guaranteed in the presence of fanout and some complex gates. Properties of gates and error rates of single stuck-at faults are derived. These properties are used to develop a test generator, based on heuristics, specifically designed for error-rate testing. Results show that the proposed test generator achieves perfect unacceptable coverage and high acceptance gain in test generation times and test application times comparable to classical testing.

In Chapter 4, the notion of multi-vector testing is introduced. It is proven that using this notion, perfect error-rate testing can be achieved, independent of the fault model, for arbitrary circuits.
The set of single stuck-at faults is considered to find an upper bound on the number of vectors required in each test to achieve the objectives of error-rate testing. A multi-vector test generator is developed for single stuck-at faults that can achieve perfect acceptance gain in test generation times and test application times comparable to classical testing.

In Chapter 5, the transition delay fault model is studied in the context of error-rate testing. The acceptance gain and unacceptable coverage of the set of transition delay faults is studied for a set of vectors generated for single stuck-at faults. A test generator is developed for delay faults that can achieve the objectives of error-rate testing at reasonable costs.

In Chapter 6, some future research directions of this work are discussed.

Chapter 2

Error-rate testing

2.1 Introduction

As discussed earlier, for a combinational logic block, error rate is defined as the percentage of clock cycles during normal circuit operation for which the value at a set of outputs deviates from the corresponding error-free value. In error tolerant applications where error rate is the appropriate metric, the application provides the testing algorithm with a threshold error rate, T_er. If a chip under test (CUT) has an error rate greater than or equal to the specified threshold, it is considered unacceptable and is discarded. On the other hand, if the error rate of the chip is determined to be less than the threshold, the chip is considered acceptable for use in this particular error tolerant application.

Clearly, the main objective of error-rate testing is to distinguish acceptable chips (including defect-free chips), i.e., chips with error rate less than the threshold, from unacceptable chips, i.e., chips with error rate greater than or equal to the threshold.
Based on the definition of error rate, an intuitive approach for error-rate testing is: (i) apply a large number of random test vectors to the circuit, and (ii) count the number of clock cycles for which erroneous responses are captured. The ratio of the number of erroneous responses to the total number of test vectors applied gives an estimation of error rate. Comparing this estimated error rate value with a threshold error rate will lead to acceptance or rejection of the chip. Such an approach requires a very large set of test vectors, especially in the case of low threshold error rate values [52, 59], and hence is viable only for built-in self-test. Developing approaches for error-rate testing that are more suitable for testing using automatic test equipment (ATE) [60, 61, 62] is proposed in this work. The main challenge tackled here is the generation of deterministic tests for error-rate testing. In the next section, the previously proposed approaches based on random testing are briefly discussed before discussing our main idea of deterministic tests for error-rate testing.

2.2 Random error-rate testing

For a combinational block of logic, the definition of error rate can be rephrased as the ratio of the number of vectors that cause an error at the outputs of a CUT to the total number of vectors applied. By applying all possible vectors to a circuit, the number of errors at the outputs can be counted and the error rate can be computed. However, since the set of all possible vectors is too large for most practical circuits, such an approach is feasible only for a subset of all possible vectors. If a subset of all possible vectors is applied to the circuit, the number of errors at the outputs will be a hypergeometric random variable. Therefore, in a first-order estimation, the error rate can be estimated as the expectation of this random variable, hence, as the fraction of the number of errors observed to the total number of vectors applied.
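The intuitive random approach above can be sketched as follows (an illustrative Python fragment with a hypothetical 3-input CUT; the names and parameters are this sketch's assumptions, not the dissertation's code):

```python
import random

def random_error_rate_test(cut, golden, n_inputs, n_vectors, t_er, seed=0):
    """Estimate the CUT's error rate from random vectors and accept the
    chip only if the estimate is below the threshold t_er."""
    rng = random.Random(seed)
    errors = 0
    for _ in range(n_vectors):
        v = [rng.randint(0, 1) for _ in range(n_inputs)]
        if cut(v) != golden(v):
            errors += 1
    return errors / n_vectors < t_er   # True -> chip accepted

golden = lambda v: v[0] ^ v[1] ^ v[2]
# Acceptable fault: erroneous response on 1 of the 8 vectors (rate 1/8).
acceptable = lambda v: 1 if v == [0, 0, 0] else golden(v)
# Unacceptable fault: output stuck at 0 (erroneous on half the vectors).
unacceptable = lambda v: 0

print(random_error_rate_test(acceptable, golden, 3, 10_000, t_er=0.2))    # True
print(random_error_rate_test(unacceptable, golden, 3, 10_000, t_er=0.2))  # False
```

With a low threshold such as T_er = 0.01, separating error rates just above and below the threshold would require far more vectors, which is why this scheme is practical only as built-in self-test.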
However, in order for this to be a good estimator, the number of vectors that are applied to the circuit needs to be significantly large. We show the formulation to find a lower bound on the number of vectors for a certain confidence coefficient and margin in Section 2.3.3.2.

The number of vectors required for most typical threshold values for error rate is very large and therefore this approach is only viable for built-in self-test. Implementing this approach as a self-test approach raises other complications, namely, compression of the output responses into signatures due to lack of memory space. Compressed output responses make it difficult to directly count the number of errors at the outputs. Therefore, additional effort is required to obtain an estimation of the error rate.

As in traditional self-test approaches, the compressed output result is compared with that expected of the fault-free version of the circuit, also stored on the chip. In classical testing, a match between the two signatures indicates a fault-free chip, considering the probability of aliasing to be small. A mismatch between the two signatures indicates a faulty chip. However, in testing for error rate, a mismatch does not carry any information regarding the number of vectors for which the value at the output of the chip differed from that expected of the fault-free version of the chip, namely the number of error occurrences.

In order to compensate for the loss of information in the process of compaction of results, self-test is repeated a number of times to gather more information regarding the distribution of the number of errors at the output. Two different approaches for gathering multiple signatures are proposed in [59] and [52]. The two signature analysis approaches are based on ones-counting compression and simple LFSR compression. Details of the two works, along with the required size of the set of vectors and number of sessions, are reported in [52, 59].
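As a simplified illustration of why compaction loses the error count, consider ones-counting compression (a minimal sketch; the actual schemes in [52, 59] are more elaborate): two erroneous cycles of opposite polarity cancel in the ones count, so a matching signature does not imply zero error occurrences.

```python
def ones_count_signature(responses):
    """Compress a single-output response stream into its ones count."""
    return sum(responses)

fault_free = [1, 0, 1, 1, 0, 0, 1, 0]
# Faulty stream with two erroneous cycles: one 1->0 flip, one 0->1 flip.
faulty = [0, 0, 1, 1, 1, 0, 1, 0]

errors = sum(a != b for a, b in zip(fault_free, faulty))
print(errors)  # 2 erroneous cycles out of 8

# The two errors cancel in the ones count, so the signatures match even
# though the chip produced errors on 2 of 8 cycles.
print(ones_count_signature(faulty) == ones_count_signature(fault_free))  # True
```

Repeating self-test over several sessions with different vector sets gathers multiple signatures and thereby recovers statistical information about the error count.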
2.3 Deterministic error-rate testing

A defect is a physical phenomenon, defined as the physical difference between the "good" or "correct" circuit and the current circuit. Defects are modeled with faults at higher levels of abstraction. A fault model at a lower level of abstraction is more accurate in terms of corresponding to the actual physical defect. However, the number of faults that must be dealt with at this level may be very large, and it may cause a huge test burden. At the same time, a fault at a higher level maps to a number of faults at a lower level; hence, detection of the former results in detection of the latter. However, because of imperfect modeling, many lower-level faults may remain undetected by this higher-level test set.

In deterministic approaches, physical defects in the circuit are modeled with structural fault models. Test vectors are generated for modeled faults. If the error rate due to a modeled fault is greater than or equal to the given threshold error rate, the fault is referred to as an unacceptable fault; otherwise it is an acceptable fault.

While a classical automatic test pattern generator (ATPG) must ideally distinguish every faulty circuit under test (CUT) from fault-free CUTs, the primary objective of an error-rate test generator is to distinguish any CUT which has an unacceptable fault from any CUT that is either fault-free or has an acceptable fault. In applications where a fault-free CUT commands a higher market price than a faulty CUT which has an acceptable fault, a secondary objective is to distinguish a CUT which has an acceptable fault from a fault-free CUT. We concentrate on the new challenges posed by the primary problem, since once these challenges have been addressed the secondary problem lends itself to incremental solutions.

Next, the central question that needs to be addressed is described. The main problem to be solved in this work is stated in more detail afterwards.
2.3.1 The central question and test application scenario

In classical testing scenarios which use vectors generated by automatic test pattern generators (ATPG), automatic test equipment (ATE) is used (i) to apply each generated test to the inputs of a CUT, (ii) to capture the response at its outputs, and (iii) to compare the captured response with that expected of the fault-free version. In such testing, one vector is applied and the CUT is discarded if the response captured at the CUT outputs differs from the corresponding fault-free response; otherwise, the testing continues with the next vector in the test sequence. Such a testing approach is called stop on first error (SOFE) and is attractive due to its low cost.

In the error-rate testing scenario, the occurrence of an erroneous response for one vector may not be a sufficient condition for discarding a CUT. For example, consider a case where we test a circuit using N random vectors, generated such that the probabilities of occurrence of vectors are identical to those during normal operation of the circuit. In such testing, CUTs with erroneous responses for up to (approximately) T_er·N vectors would be classified as acceptable. The use of a random test sequence is not practical for ATE-based testing since sufficiently high fault coverage of unacceptable faults is achieved only when N = O(1/T_er) vectors are applied. The number of vectors is unacceptably large in cases where the T_er value is small. In this work, an approach for error-rate testing is studied that uses deterministic test vectors to reduce N, the number of vectors applied, and hence the test cost. Since the vectors applied to a CUT are not random, CUTs which have erroneous responses for T_er·N or fewer vectors cannot be classified as acceptable. For this reason, as well as to reduce the test application cost, we adopt the classical SOFE test application scenario, i.e., discard a CUT if it produces an erroneous response for even one vector.
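The SOFE test application flow can be sketched as follows (illustrative Python with a hypothetical two-input CUT; this is not the dissertation's implementation):

```python
def sofe_test(cut, test_vectors, expected_responses):
    """Stop-on-first-error application: discard the CUT as soon as one
    captured response differs from the fault-free response."""
    for vector, expected in zip(test_vectors, expected_responses):
        if cut(vector) != expected:
            return False        # first erroneous response -> discard CUT
    return True                 # every response matched -> CUT passes

golden = lambda v: v[0] | v[1]          # fault-free function
stuck_at_0 = lambda v: 0                # output stuck-at-0 fault

tests = [[0, 0], [0, 1], [1, 1]]
expected = [golden(v) for v in tests]

print(sofe_test(golden, tests, expected))      # True
print(sofe_test(stuck_at_0, tests, expected))  # False (fails on [0, 1])
```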
Later, in Chapter 4, instead of a set of single test vectors, a set of multi-vector test sessions is deployed, and a variation of SOFE is adopted, which is referred to as SOFES, i.e., stop on first erroneous session.

The benefits of error-rate testing are maximized if (i) as many unacceptable faults as possible are detected, and (ii) as few chips with acceptable faults as possible are rejected. While requirement (i) above is encountered in classical testing, our problem is different because we must concurrently satisfy requirement (ii).

The above observation leads to the central question that needs to be addressed: Is it possible to generate a set of deterministic tests that detects every detectable unacceptable fault while detecting none of the acceptable faults?

In the next section, the theory that answers the above question is developed. Recall that every unacceptable fault, f_u, has error rate R(f_u) ≥ T_er and every acceptable fault f_a has error rate R(f_a) < T_er. Also, recall that the error rate for each fault depends on the probabilities with which the 2^m possible vectors are applied to the m-input combinational circuit C during its normal operation. Most of the proposed theoretical results are valid independent of the probabilities with which each vector occurs during normal operation. It is explicitly stated so if otherwise.

A slightly less constrained version of the central question is empirically pursued: What is the minimum number of acceptable faults that must be detected by a deterministic test set that detects every detectable unacceptable fault in the circuit? Note that this question is formulated in a manner that does not compromise test quality. The answer to this question leads to the development of test generators specifically designed for error-rate testing.
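Under SOFE, a test that detects any acceptable fault would discard chips containing that fault, so an ideal test set may use only tests that detect no acceptable fault. The central question can therefore be checked mechanically on a small fault/test detection table, as in this illustrative sketch (hypothetical data; the function and labels are this sketch's assumptions):

```python
def ideal_test_set_exists(detects, unacceptable, acceptable):
    """detects[t] is the set of faults detected by candidate test t.
    Under SOFE, only tests detecting no acceptable fault are usable;
    an ideal test set exists iff those tests together cover every
    detectable unacceptable fault."""
    clean = [t for t, faults in detects.items() if not (faults & acceptable)]
    covered = set()
    for t in clean:
        covered |= detects[t]
    return unacceptable <= covered

# Hypothetical detection table for three candidate tests.
detects = {
    "t1": {"f1"},         # detects only unacceptable f1
    "t2": {"f2", "f3"},   # detects unacceptable f2 but also acceptable f3
    "t3": {"f2"},         # detects only unacceptable f2
}
print(ideal_test_set_exists(detects, {"f1", "f2"}, {"f3"}))  # True: use {t1, t3}
```

Without t3, every test that detects f2 also detects the acceptable fault f3, and no ideal test set exists.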
2.3.2 Problem statement

As explained earlier, the main concern in test generation for error-rate testing is the fraction of unacceptable faults that get detected, as well as the acceptable faults that remain undetected.

The problem of test generation for error rate can therefore be defined as generating a test set with the following properties (mentioned in the order of their importance).

1. High coverage of unacceptable faults (high unacceptable coverage).
2. Low coverage of acceptable faults (high acceptance gain).
3. Low test application time (small test-set size).
4. Low test generation complexity (low run-time).

The main objective is to increase unacceptable coverage while also increasing acceptance gain. Achieving high unacceptable coverage has the highest priority in the test generation formulation. However, as unacceptable faults cause high error rates, these faults are typically easy to detect. Hence, it is typically easy to achieve close to perfect unacceptable coverage. The second objective is to increase the acceptance gain, which contributes to the yield benefits of error tolerance.

Test application time and test generation time, although important, are not of highest priority in the test generation formulation. The fact that only a subset of all possible faults, namely unacceptable faults, is targeted tends to reduce run-time and test-set size, which typically allows test generation and application times comparable to classical test generation.

2.3.3 Classification of faults as acceptable and unacceptable

Deterministic error-rate testing consists of targeting unacceptable faults for test generation while avoiding detection of acceptable faults. Therefore, the initial step in deterministic error-rate testing is to differentiate unacceptable faults from the acceptable ones.

A particular application is analyzed for this purpose. The acceptance (tolerance) criteria of the system are identified in terms of performance degradation, e.g., a PSNR degradation for an MPEG encoder.
The acceptance criteria of the system are then translated to acceptance criteria of the individual modules within the system. In cases where this acceptance criterion is provided in the form of a threshold error rate, the problem of differentiating acceptable faults from unacceptable ones reduces to a mere comparison of the error rate due to each fault at the outputs of the module with the threshold error rate.

2.3.3.1 Computation of error rate

Identification of acceptable and unacceptable faults, as mentioned before, depends on determining the error rate caused by the faults. Error rate is defined as the fraction of clock cycles for which there is an error at the output of the circuit. In other words, error rate is the probability of an erroneous output for a circuit.

Different measures of correctness of the output of a circuit are introduced in the literature and are used for different purposes [55], such as test generation for the purpose of fault detection; evaluation of the "latency" [63] of a fault, i.e., the time between the occurrence of the fault and its manifestation at the circuit output; and the modeling of intermittent failures in a circuit.

The probability that the output of a circuit is correct is first defined as signal reliability in [3]. Clearly, error rate, in the manner we define it, is the complement of the signal reliability.

In [54], given the probability of faults occurring in the circuit and the probability of occurrence of various input vectors, the authors determine the probability of the output attaining a specified value. They assume the set of single stuck-at faults and show the relationship between Boolean operations and algebraic operations on symbolic representations of probabilities. They develop a probabilistic model of the circuit and compute a probability expression at the circuit output. The probability of the output is used in conjunction with some auxiliary circuits to determine the signal reliability of the circuit output.
In [55], the authors expand on this idea to further simplify the algorithms. About the aforementioned algorithm, they state, "... that although it is simple and straightforward, its complexity has an attainable upper bound of order 2^n, where n is the number of circuit inputs. Clearly the boolean analysis is the cause of the complexity". They therefore propose another approach in which the output probability expressions are directly derived from circuit descriptions, meaning the algorithm starts at the inputs and proceeds to the outputs to derive an algebraic probability expression at the output of each gate. It suppresses all exponents in each expression to obtain the correct probability expression for that signal. They state about this algorithm, "... that it is less complex than algorithm one. It can be implemented using the existing software with little or no modification".

In [51], the authors transfer the logic circuit into a corresponding logical model [47] to identify equivalence classes. Inputs that cause an error for each class are identified, and the probability of the output being incorrect can be calculated.

In [34], STAFAN is proposed as an alternative to fault simulation of digital circuits. It makes use of the concepts of controllability and observability [27]. These quantities are redefined in the form of probabilities of controlling and observing the lines. Controllability is estimated by collecting the statistics of activity on a line for a short random sequence of vectors. Observability is computed from estimated controllabilities. The product of the appropriate controllability and observability gives the detection probability of a fault. In [34], this detection probability is computed for the set of all single stuck-at faults and is used to estimate the coverage of a test. The detection probability is analogous to our definition of error rate.
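A minimal sketch of the exponent-suppression idea from [55] (my own illustrative reconstruction, not the authors' code): probability expressions are built gate by gate as sums of products of input-probability variables, and multiplication unions the variable sets of its terms so that x·x collapses to x.

```python
def var(name):
    """Probability expression for an independent primary input."""
    return {frozenset([name]): 1.0}

def add(e1, e2):
    out = dict(e1)
    for term, c in e2.items():
        out[term] = out.get(term, 0.0) + c
    return out

def mul(e1, e2):
    # Union of variable sets suppresses exponents: x * x -> x, which
    # keeps the expression exact even under reconvergent fanout.
    out = {}
    for t1, c1 in e1.items():
        for t2, c2 in e2.items():
            t = t1 | t2
            out[t] = out.get(t, 0.0) + c1 * c2
    return out

def NOT(e):
    return add({frozenset(): 1.0}, {t: -c for t, c in e.items()})

def AND(e1, e2):
    return mul(e1, e2)

def OR(e1, e2):
    return add(add(e1, e2), {t: -c for t, c in mul(e1, e2).items()})

def evaluate(expr, probs):
    total = 0.0
    for term, c in expr.items():
        for v in term:
            c *= probs[v]
        total += c
    return total

# y = a AND (a OR b) reconverges on a; logically y = a, so p(y) = p(a).
y = AND(var("a"), OR(var("a"), var("b")))
print(evaluate(y, {"a": 0.3, "b": 0.5}))  # 0.3
```

Naively treating the AND inputs as independent would instead give p(a)·(p(a)+p(b)−p(a)p(b)) = 0.3·0.65 = 0.195, which is wrong; the suppression step recovers the exact value.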
Another method for computing detection probabilities of faults, the cutting algorithm, is proposed in [4] and [58]. The cutting algorithm allows computation of bounds on signal probabilities and detection probabilities in combinational circuits. The purpose of this computation is to determine the pseudo-random test length needed to test a circuit in a manner that ensures the desired fault coverage. One of the problems with the cutting algorithm is the possibility of computing loose bounds and, hence, an unnecessarily long sequence of test vectors.

In [57], the improved cutting algorithm, a careful combination of the original cutting algorithm [58] and the Parker-McCluskey algorithm [54], is proposed. The tightness of the bound on the detection probability of a fault can be controlled by handling different portions of the circuit with the cutting algorithm versus the Parker-McCluskey algorithm. There is a tradeoff between the accuracy of the results and the computational effort required.

As mentioned earlier, in error-rate testing the absolute value of error rate is not as important to us as its comparative value with respect to the threshold error rate. Let us assume that the cutting algorithm determines the upper bound on the detection probability of a fault to be dp_u, so that dp_u < T_er, where T_er is the threshold error rate specified by the error tolerant application. This upper bound computation classifies the fault as acceptable with less computational complexity than needed to compute the exact detection probability, i.e., error rate. Also, if a lower bound dp_l is found on the detection probability, so that dp_l ≥ T_er, then the fault is classified as unacceptable. If the cutting algorithm determines the lower and upper bounds on detection probability such that dp_l < T_er ≤ dp_u, further consideration is needed to classify the fault as acceptable or unacceptable.
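The bound-based classification just described amounts to a three-way decision, sketched below (illustrative Python; the function name and the "undecided" label are this sketch's assumptions):

```python
def classify_by_bounds(dp_l, dp_u, t_er):
    """Classify a fault from bounds dp_l <= detection probability <= dp_u
    relative to the threshold error rate t_er."""
    if dp_u < t_er:
        return "acceptable"      # even the upper bound is below threshold
    if dp_l >= t_er:
        return "unacceptable"    # even the lower bound reaches threshold
    return "undecided"           # bounds straddle t_er; refine the bounds

print(classify_by_bounds(0.02, 0.05, t_er=0.1))  # acceptable
print(classify_by_bounds(0.15, 0.30, t_er=0.1))  # unacceptable
print(classify_by_bounds(0.05, 0.20, t_er=0.1))  # undecided
```

In the undecided case, the improved cutting algorithm [57] can tighten the bounds at the cost of more computation.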
In this work, in order to determine the error rate of a fault, the fault is injected into the circuit and simulation is performed for large sets of random vectors. The number of times an error occurs at the output of the circuit is counted, and the error rate is calculated as the ratio of the number of errors at the output to the total number of vectors applied to the circuit. The precision of the estimates found by this method is discussed in [59]. Clearly, in order to achieve high confidence in the estimated error rate, a large set of vectors is required for simulation, especially for faults with very small values of error rate.

All of the previously discussed approaches can be used without any modifications, or with slight modifications, to classify faults as acceptable or unacceptable based on their error rates. However, some of these approaches perform unnecessary computation to compute precise values of error rates, while in order to classify chips it suffices to find out whether or not the error rate is less than the threshold error rate.

2.3.3.2 Distribution of error rate

Fault simulation with a large number of pseudo-random vectors is performed on a number of benchmark circuits to count the number of errors at the outputs and to estimate the error rate. The required lower bound on the number of vectors is found to determine with high confidence whether or not the error rate of a fault is greater than or equal to the threshold error rate. In other words, the focus is on finding the minimum number of vectors required for fault simulation to differentiate between acceptable and unacceptable faults with a desired degree of confidence.

Consider the policy described in Figure 2.1 as the policy of differentiation between acceptable and unacceptable faults based on the particular application. Let us denote the allowed margin by ε and the required confidence coefficient by γ. For a CUT with actual error rate r, the differentiation policy can be summarized as follows.
• If r < T_er(1−ε), then p(r̂ < T_er) ≥ γ.
• If r > T_er(1+ε), then p(r̂ ≥ T_er) ≥ γ.

Figure 2.1: Policy of differentiation between acceptable and unacceptable faults.

where r̂ is the error rate estimated from the simulations. The above two conditions can be rewritten as follows.

• If r < T_er(1−ε), then p(N_R < N_V·T_er) ≥ γ.
• If r > T_er(1+ε), then p(N_R ≥ N_V·T_er) ≥ γ.

where N_V is the total number of applied vectors and N_R is the number of vectors that cause an error at the output. The following inequality guarantees the above two conditions:

p(|N_R − N_V·r| ≤ ε·N_V·T_er) ≥ γ. (2.1)

To verify this claim, Inequality 2.1 is rewritten as p(N_V·r − ε·N_V·T_er ≤ N_R ≤ N_V·r + ε·N_V·T_er) ≥ γ. If r < T_er(1−ε), then N_V·r + ε·N_V·T_er < N_V·T_er, so every value of N_R in this interval satisfies N_R < N_V·T_er; hence any value of N_V that satisfies Inequality 2.1 also satisfies the original condition for differentiation. It can be similarly shown that satisfying Inequality 2.1 guarantees the original condition for r > T_er(1+ε).

On the other hand, for the random variable N_R, the Chebyshev inequality states that

p(|N_R − E{N_R}| ≤ ε·N_V·T_er) ≥ 1 − VAR(N_R)/(ε·N_V·T_er)². (2.2)

Hence, Inequality 2.1 is satisfied if 1 − VAR(N_R)/(ε·N_V·T_er)² ≥ γ. VAR(N_R) can be replaced with N_V/2 as an upper bound to compute a lower bound on N_V as

N_V ≥ 1 / (2(1−γ)(ε·T_er)²). (2.3)

Based on the above formulation, for a threshold error rate T_er = 0.1, error margin ε = 0.1, and confidence coefficient γ = 0.9, the lower bound on the number of vectors is 50,000. Extensive fault simulations are performed on a number of benchmark circuits for 50,000 pseudo-random vectors to find the distribution of error rate for all single stuck-at faults in these circuits. Results are presented in Figure 2.2.

Figure 2.2: Distribution of error rate for ISCAS85 benchmark circuits.
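The lower bound can be evaluated directly. The sketch below (Python; the constant factor in the bound is an assumption of this sketch) reproduces the 50,000 figure quoted for T_er = 0.1, ε = 0.1, γ = 0.9:

```python
import math
from fractions import Fraction

def min_vectors(t_er, eps, gamma):
    """Chebyshev-style lower bound N_V >= 1 / (2*(1-gamma)*(eps*t_er)**2)
    on the number of random vectors (constant factor assumed here)."""
    return math.ceil(1 / (2 * (1 - gamma) * (eps * t_er) ** 2))

# Exact rational arithmetic avoids float rounding right at the boundary.
print(min_vectors(Fraction(1, 10), Fraction(1, 10), Fraction(9, 10)))  # 50000
```

Because the required N_V grows as 1/(ε·T_er)², small thresholds quickly make random simulation expensive.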
2.4 Classical test generation and error-rate testing

In this section, a number of approaches for adapting a classical test generator in intuitive ways are described to assess the importance of developing an ATPG specifically for error-rate testing. The goal of this section is to approximately answer the second version of the question asked in Section 1: What is the minimum number of acceptable faults that must be detected by a deterministic test set that detects every detectable unacceptable fault in an arbitrary combinational circuit? A classical ATPG is used to target unacceptable faults one at a time. The percentage of acceptable faults that remain undetected by a test is referred to as the acceptance gain of that particular test. Results provided by these approaches are used as a baseline against which we compare a more systematic approach, presented later in Section 3.3.

Approach 1: A classical test generator is used to generate test vectors for unacceptable faults only. A test pattern generated for an unacceptable fault typically detects a number of acceptable faults as well. Hence, such an approach typically significantly erodes the acceptance gain.

Approach 2: In order to improve the acceptance gain of the test generator proposed in Approach 1, multiple outputs of the circuit are used to distinguish between the effects of acceptable and unacceptable faults detected by a vector. For each test vector, one of the outputs to which the effect of the target unacceptable fault gets propagated is saved along with the test vector. At the time of test application, for each test vector, the chip is rejected only if an error occurs at the corresponding saved output. It is obvious that ignoring the fault effects at other outputs reduces the degree of test compaction and results in higher test sizes. This approach still erodes the acceptance gain, although to a lesser extent than Approach 1.
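A sketch of Approach 2's bookkeeping (illustrative Python with a hypothetical two-output CUT; the data layout is this sketch's assumption): each stored test carries the one output at which the targeted unacceptable fault is observed, and only that output is checked during test application.

```python
def apply_approach2(cut, tests):
    """Each test is (vector, fault_free_response, saved_output): the chip
    is rejected only if the response at the saved output is erroneous."""
    for vector, expected, saved_output in tests:
        if cut(vector)[saved_output] != expected[saved_output]:
            return False    # error at the designated output -> reject
    return True             # errors at the other outputs are ignored

# Fault-free 2-output CUT: out0 = a AND b, out1 = a XOR b.
golden = lambda v: (v[0] & v[1], v[0] ^ v[1])
# Acceptable fault corrupting only out1, and only for vector (1, 0).
faulty = lambda v: (v[0] & v[1], 0 if v == (1, 0) else v[0] ^ v[1])

# Both tests target faults observed at out0, so out0 is saved.
tests = [((1, 0), golden((1, 0)), 0), ((1, 1), golden((1, 1)), 0)]
print(apply_approach2(faulty, tests))  # True: the acceptable chip survives
```

Under plain SOFE, which compares all outputs, the same chip would have been discarded on vector (1, 0).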
Approach 3: Another approach is to generate a number of tests for each unacceptable fault, and from these tests select the test that detects the least number of acceptable faults and add that test to the test set. Note that this approach is searching for an optimum test vector for each unacceptable target fault in the space of all test vectors that can be generated for the fault. Therefore, if the test generator is allowed to search for a sufficiently long time, the optimum vector will eventually be found. However, this approach has impractically high run-time.

A number of other modifications can be made to a classical test generator to target unacceptable faults. However, each of these approaches has associated disadvantages. We have implemented the above three approaches and used them to generate tests for a number of ISCAS85 circuits. (In Approach 3, the test generator searches the space to generate 100 test vectors for each target fault.)

The percentages of unacceptable and acceptable single stuck-at faults for a particular threshold error rate, T_er, for different circuits are shown in columns 2 and 3 of Table 2.1.

Consistent with our expectations, all the above approaches achieve 100% coverage of unacceptable faults. We also observed higher test-set sizes and run-times for Approach 2 and Approach 3, respectively. Test-set size and run-time of the test generator are not our primary focus. Therefore, we do not present these values in the table.

Table 2.1: Acceptance gains of Approaches 1, 2, and 3; T_er = 0.1.

Circuit   Unacceptable   Acceptable    Acceptance gain (%)²
          faults (%)¹    faults (%)¹   Approach 1   Approach 2   Approach 3
C880      59.8           40.2          51.5         71.3         74.1
C1355     34.4           65.6          47.5         52.2         57.8
C1908     55.2           44.8          56.3         48.6         59.4
C2670     51.8           48.2          50.6         52.0         61.5
C3540     36.0           64.0          46.6         65.4         67.2
C5315     27.9           72.1          32.3         59.9         –

¹ Percentage of all single stuck-at faults
² Percentage of acceptable faults
Nevertheless, using Approach 3 to generate a test set for a larger circuit, such as c5315, resulted in such a long run-time that we were forced to terminate the process before it could complete. Acceptance gains of the three approaches are shown in columns 4, 5, and 6 of Table 2.1. No significant improvement in acceptance gain is observed between Approach 2 and Approach 3 for larger circuits. This shows that Approach 3 is not scalable. The relatively low acceptance gains, as well as the high test-set sizes and high run-times, provide the motivation to take a more systematic approach to test generation for error-rate testing.

2.5 Summary

In this chapter, deterministic error-rate testing is discussed and the central question in test generation for error-rate testing is introduced: is it possible to generate a set of deterministic tests that detects every detectable unacceptable fault while detecting none of the acceptable faults?

The problem of test generation for error-rate testing is formulated in terms of the numbers of acceptable and unacceptable faults that get detected, as well as test application time and test generation time.

Identification of acceptable and unacceptable faults is a major issue in error-rate test generation. The error-tolerant application is analyzed to provide us with a threshold error rate. Hence, identification of acceptable and unacceptable faults is reduced to computation of error rate. Different approaches exist to compute the error rate of a fault. In this work, error rate is estimated by exhaustive simulation.

Adaptations of existing test generation approaches provide perfect coverage of unacceptable faults but only modest acceptance gains. These results motivate the development of a new test generation approach, specifically designed for error-rate testing, to maximize acceptance gains and minimize test generation times and test application times.
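As a concrete illustration of estimating error rate by exhaustive simulation, the sketch below compares a fault-free and a faulty copy of a small circuit over all input vectors, assuming all vectors are equally likely; the two-gate example circuit and the fault are assumptions made for illustration:

```python
from itertools import product

def error_rate(circuit, faulty_circuit, n_inputs):
    """Fraction of the 2^n input vectors on which the faulty circuit
    differs from the fault-free circuit (all vectors equally likely)."""
    vectors = list(product([0, 1], repeat=n_inputs))
    errors = sum(circuit(v) != faulty_circuit(v) for v in vectors)
    return errors / len(vectors)

# Illustrative circuit: z = (a AND b) OR c.
def good(v):
    a, b, c = v
    return (a & b) | c

# The same circuit with the output of the AND gate stuck-at-1.
def faulty(v):
    a, b, c = v
    return 1 | c

print(error_rate(good, faulty, 3))  # 0.375
```

The fault produces an error on the 3 of 8 vectors where a AND b = 0 and c = 0, giving an error rate of 3/8; comparing this value against T_er classifies the fault as acceptable or unacceptable.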
Chapter 3

Test generation for error-rate testing

3.1 Introduction

Test generation for error-rate testing is defined as generating a set of test vectors for a circuit with the objective of distinguishing chips with an unacceptable error rate from those that are error-free or have an acceptable error rate. When generating tests for a circuit, faults are targeted one by one and tests are generated and added to the test set. In the previous chapter, the main question that needs to be addressed before we can proceed with test generation for error rate was posed: Is it possible to find a test set for a circuit that detects all unacceptable faults and none of the acceptable faults?

In this chapter this question is answered for the set of single stuck-at faults. It is shown that, using the proposed SOFE approach, such a test set can be found for a special family of circuits, namely fanout-free circuits with primitive gates only. A test generation approach is proposed for such circuits that provides full coverage of unacceptable faults while detecting none of the acceptable faults.

The main complications that arise when generating such a test set for more general classes of circuits are discussed. A number of algorithms and heuristics are introduced to develop a test generation approach for arbitrary combinational circuits. The proposed test generator is used to answer the less constrained version of the question asked in the previous chapter: How many acceptable faults must be detected by a test set to detect every detectable unacceptable fault? Results show that, using the proposed heuristics and algorithms, low acceptable coverage as well as full coverage of unacceptable faults can be achieved with little to no compromise in test generation and test application times.
In Section 3.2, a number of properties of fanout-free circuits and primitive gates are discussed, and a proof is presented that the answer to the central question is positive for circuits with no fanouts and no complex gates. A test generation approach for error-rate testing is proposed for such circuits. Complications that arise when generating tests for more general classes of circuits are then discussed. In Section 3.3, a test generation approach is developed for arbitrary combinational circuits, and in Section 3.4, results of test generation using this test generator for a number of benchmark circuits are presented. Section 3.5 provides a summary of this chapter.

3.2 Error-rate testing for fanout-free circuits with primitive gates

In this section, a number of properties of fanout-free circuits with primitive gates only are explored, as well as some relationships between faults in such circuits, to support the claim that it is possible to generate a perfect test set for error-rate testing for this family of circuits. A test generator is presented, and these properties are later used in other sections to develop a test generator for arbitrary circuits.

3.2.1 Relationships between faults at a primitive gate

The notions of fault equivalence [35, 47] and fault dominance [35, 56] are used extensively in fault simulation and test generation. These notions are used here to identify relationships between the error rates of different faults in a circuit. Consider an arbitrary combinational circuit C. Let C_f denote a version of the circuit with fault f. Equivalence and dominance relationships between two faults f_i and f_j can be summarized in the following properties. Here, C_f(V) denotes the input-output logic behavior of C_f for vector V, and SATV(f) denotes the set of all test vectors for fault f, i.e., a set containing every vector that detects f. R(f) denotes the error rate of fault f.
(It is important to keep in mind that faulty circuit versions with either f_i or f_j are considered, i.e., faulty versions that have both f_i and f_j are not considered.)

Property 3.1. If two faults f_i and f_j in an arbitrary combinational circuit C are equivalent, then (i) C_{f_i}(V) = C_{f_j}(V), ∀V; (ii) SATV(f_i) = SATV(f_j); and (iii) R(f_i) = R(f_j).

Property 3.2. If fault f_i in an arbitrary combinational circuit C dominates f_j, then (i) C_{f_i}(V) = C_{f_j}(V), ∀V ∈ SATV(f_j); (ii) SATV(f_i) ⊇ SATV(f_j); and (iii) R(f_i) ≥ R(f_j).

In Property 3.1 and Property 3.2, we explicitly include the relationships between SATV(f_i) and SATV(f_j) because if these sets satisfy conditions of the type shown, then the corresponding relationships between error rates are satisfied for any distribution of the probabilities of occurrence of the 2^m vectors during normal circuit operation. We include the relationships between C_{f_i} and C_{f_j} because these show that even in cases where the outputs of C are partitioned into multiple buses in arbitrary ways, the corresponding relationships between error rates are satisfied on a bus-by-bus basis.

Equivalence and dominance relationships exist between the single stuck-at faults associated with every primitive gate (i.e., INV, NAND, AND, NOR, and OR). Every primitive gate has a controlling value (e.g., see [35]), cv, which is a logic value whose application to one gate input determines the logic value at the gate's output, independent of the values applied at the other inputs of the gate. The controlled response, cr, of a gate is the logic value implied at its output by the application of its controlling value at one or more of its inputs. The complement of a gate's controlling value is called its non-controlling value, ncv. The concept of controlling value can be extended to many commonly used families of complex gates, notably and-or-inverts (AOI) and or-and-inverts (OAI). However, some complex gates, e.g., XOR and XNOR, do not have controlling values.
For such gates, the controlling value, non-controlling value, and controlled response are undefined.

We can now enumerate the equivalence and dominance relationships between the single stuck-at faults associated with the inputs and the output of a k-input primitive gate. Let the gate inputs be x_1, x_2, ..., x_k and the gate output be z.

• Faults x_1 SAcv, x_2 SAcv, ..., x_k SAcv, and z SAcr are equivalent.
• Fault z SAcr dominates each of the single stuck-at faults x_1 SAncv, x_2 SAncv, ..., x_k SAncv.

3.2.2 Properties of fanout-free circuits

The following two special properties of fanout-free circuits help extend the relationships between faults to our target class of circuits.

Property 3.3. In a fanout-free circuit, there is a unique path from the output of any gate to the primary output. This path is called the unique propagation path (UPP), and the gates and lines along this path are called on-path gates and on-path lines, respectively. For each on-path gate, the input that is on the UPP is called the on-path input and every other input is called a side-input.

Property 3.4. In a fanout-free circuit, the lines where it is necessary to apply particular values for detecting a target fault are driven by sub-circuits that are fanout-free and pair-wise disjoint. It is therefore possible to apply any combination of values at the lines of interest.

The next section elaborates on these properties and the associated terminology.

3.2.3 Properties of primitive gates in fanout-free circuits

Consider a primitive gate in a fanout-free circuit, namely a NAND gate with inputs x and y and output z. The fact that the NAND gate is in a fanout-free circuit does not help identify any stronger relationship between the sets of all test vectors and error rates for the three equivalent faults associated with the gate, namely x SA0, y SA0, and z SA1. That is, Property 3.1 still holds.

Figure 3.1: Testing a single stuck-at fault associated with a gate in a fanout-free circuit.
In contrast, for the dominance relationship, stronger conditions can be identified. Figure 3.1 shows the above NAND gate (marked as GUC) within an arbitrary fanout-free circuit. Thick lines are used to depict the unique propagation path via which the fault effect for any fault associated with the GUC, i.e., our NAND gate, must be propagated. On-path lines are called o_1, o_2, ..., o_k, and side inputs of on-path gates are called s_1, s_2, ..., s_l. To detect any single stuck-at fault associated with the GUC, appropriate values must be applied (i) at the inputs of the GUC to excite the fault (and, in the case of a fault at an input of the GUC, to also propagate its effect to the output of the GUC), and (ii) at the side inputs, s_1, s_2, ..., s_l, of the on-path gates to propagate its effect along the UPP to the primary output. In the example circuit in Figure 3.1, we need to apply appropriate values at lines x, y, s_1, s_2, s_3, ..., s_l.

The consequence of Properties 3.3 and 3.4 is that it is possible to apply any combination of values at the inputs of the GUC that will excite an associated single fault (and, if the fault is at an input of the GUC, propagate its effect to the output of the GUC) and, for each such combination of values, it is possible to propagate the effect of the fault to the primary output of the circuit.

For this example two-input NAND gate, z SA0 dominates x SA1 and z SA0 dominates y SA1. Note that the fault z SA0 can be excited by applying (0 0), (0 1), or (1 0) at the gate inputs, x and y. In contrast, x SA1 can be excited and its effect propagated to z only by applying (0 1) at x and y. Similarly, y SA1 can be excited and its effect propagated to z only by applying (1 0) at x and y.
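The test-vector sets quoted above for the isolated two-input NAND gate can be verified by exhaustive enumeration; the following sketch (the names and fault representation are ours) checks both the equivalence relationship and the strengthened dominance relationship:

```python
from itertools import product

def nand(x, y):
    return 1 - (x & y)

def satv(line, sv):
    """All vectors detecting the stuck-at fault (line, sv) on an isolated NAND."""
    detected = set()
    for x, y in product([0, 1], repeat=2):
        fx = sv if line == 'x' else x
        fy = sv if line == 'y' else y
        fz = sv if line == 'z' else nand(fx, fy)
        if fz != nand(x, y):
            detected.add((x, y))
    return detected

# Equivalence: x SA0, y SA0, and z SA1 share the same set of test vectors.
print(satv('x', 0) == satv('y', 0) == satv('z', 1))  # True
# Strengthened dominance: SATV(z SA0) strictly contains
# SATV(x SA1) ∪ SATV(y SA1), and the extra vector (0, 0) detects z SA0 alone.
union = satv('x', 1) | satv('y', 1)
print(satv('z', 0) > union, (0, 0) in satv('z', 0) - union)  # True True
```

Here the `>` operator on Python sets tests for a strict superset, matching the '⊃' relation used in the text.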
When viewed together with the aforementioned special properties of a fanout-free circuit, the general relationship between the sets of all test vectors can be strengthened for these faults by replacing the '⊇' relation with the '⊃' relation, to obtain SATV(z SA0) ⊃ SATV(x SA1) and SATV(z SA0) ⊃ SATV(y SA1). Furthermore, by applying (0 0) at the GUC inputs, x and y, fault z SA0 can be detected without detecting either of the faults it dominates, namely x SA1 and y SA1. This can be generalized for arbitrary primitive gates in fanout-free circuits to obtain a strengthened version of Property 3.2 in the following form.

Property 3.5. Consider a primitive gate with output z and inputs x_1, x_2, ..., x_k. Let f_0 = z SAcr be the single stuck-at fault at the output z, which dominates each of the single stuck-at faults f_1 = x_1 SAncv, f_2 = x_2 SAncv, ..., f_k = x_k SAncv at its inputs x_1, x_2, ..., x_k. If the primitive gate is used in a fanout-free combinational circuit C, then

(i) C_{f_0}(V) = C_{f_i}(V), ∀V ∈ SATV(f_i), ∀i ∈ {1, 2, ..., k};
(ii) SATV(f_0) ⊃ {SATV(f_1) ∪ SATV(f_2) ∪ ... ∪ SATV(f_k)}; and
(iii) R(f_0) > R(f_1) + R(f_2) + ... + R(f_k).

An important consequence of Property 3.5 is that in a fanout-free circuit one or more vectors exist that can detect the fault f_0 = z SAcr without detecting any of the faults it dominates, namely f_1 = x_1 SAncv, f_2 = x_2 SAncv, ..., f_k = x_k SAncv. Any vector in the (non-empty) set SATV(f_0) − {SATV(f_1) ∪ SATV(f_2) ∪ ... ∪ SATV(f_k)} makes this possible.

For equivalent faults associated with a primitive gate with inputs x_1, x_2, ..., x_k and output z embedded in an arbitrary fanout-free circuit, Property 3.1 is restated in the form of Equations 3.1 and 3.2.

SATV(z SAcr) = SATV(x_1 SAcv) = ... = SATV(x_k SAcv).   (3.1)

R(z SAcr) = R(x_1 SAcv) = ... = R(x_k SAcv).   (3.2)

For the faults with a dominance relationship, Property 3.2 is restated in the form of Equations 3.3 and 3.4.

SATV(z SAcr) ⊃ {SATV(x_1 SAncv) ∪ ... ∪ SATV(x_k SAncv)}.   (3.3)

R(z SAcr) > R(x_1 SAncv) + ... + R(x_k SAncv).
(3.4)

Figure 3.2: An example circuit and the relationships between the sets of all vectors that detect the single stuck-at faults.

Figure 3.2 shows an example fanout-free circuit and illustrates the relationships between the sets of all test vectors for every single stuck-at fault in the circuit. Every box labeled with one or more fault names represents the set of all possible test vectors for the corresponding faults. Note that for every single stuck-at fault in this circuit there exists a vector that detects the fault without detecting any fault that has a lower error rate than the target fault. For example, vector (1 1 1 1) detects f SA1 without detecting either of the two faults it dominates, namely c SA0 and d SA0.

3.2.4 ERTG-ff: Error-rate test generator for fanout-free circuits

The above extended properties are used to construct a new test generator for unacceptable faults in fanout-free circuits with primitive gates only. Vectors generated by the new test generator can detect every unacceptable fault without detecting any acceptable fault in such circuits. Commonly used complex gates (AOI, OAI, XOR, XNOR, and multiplexers), as well as circuits with fanouts, non-reconvergent and reconvergent, and the challenges they pose to our theory, are discussed in the next section (for definitions of terms see [35]).

A test generator for error-rate testing for single stuck-at faults in fanout-free circuits with primitive gates, ERTG-ff, is constructed in this section. Properties of the test vectors generated by ERTG-ff are developed to answer our central question. The central question is rephrased in the following form: Given a fanout-free circuit with primitive gates and a target single stuck-at fault f_t, is it possible to generate a test vector for f_t such that SATV(f) ⊇ SATV(f_t) for every single stuck-at fault f that is detected by the vector? Clearly, such a test generator will target a fault only if it is unacceptable.
A test generated in this manner will only detect unacceptable faults.

Consider the generic fanout-free circuit shown in Figure 3.3, where every gate is a primitive gate. Consider a gate G with inputs x_1, x_2, ..., x_p and output z. Let the target fault be f_t = z SAα. Excitation of the target fault necessitates the application of value(s) at one or more inputs of G, namely x_1, x_2, ..., x_p.

Figure 3.3: Generating a test vector for target fault f_t in a generic fanout-free circuit.

The values applied at the input(s) of G must imply at its output, i.e., at the fault site z, the value ᾱ (the complement of the stuck value α) in a fault-free version of the circuit. Also, as described earlier and illustrated in Figure 3.1, the effect of the target fault must be propagated along the UPP with on-path gates G_1, G_2, ..., G_{k−1} and on-path lines o_1, o_2, ..., o_k. To propagate the fault effect of f_t, it is necessary to apply at every side-input s_i of every on-path gate G_j the gate's non-controlling value. In other words, any test vector for f_t must imply a value v(s_i) = ncv(G_j) for every side input s_i of every on-path gate G_j.

Assignment of specific values at the fault site z and at every side input s_1, s_2, ..., s_l of every on-path gate of the corresponding UPP implies specific values at the on-path lines o_1, o_2, ..., o_k. Let these values be v(o_1), v(o_2), ..., v(o_k). Note that all of the above value assignments and implications are mandatory and must be satisfied by any test vector generated for f_t by any test generator.
Hence, we must develop a new justification procedure that identifies a combination of values at the inputs of sc_i that (i) implies at the output of sc_i the value desired for detection of the target fault f_t, and (ii) satisfies our other original objective, namely detects only a fault f such that SATV(f) ⊇ SATV(f_t). The proposed justification procedure is discussed in the following section, after briefly discussing the faults that get detected by a test vector generated for the target fault, f_t.

3.2.4.1 Faults detected by any test vector for a target fault

First, faults at on-path lines, o_1, o_2, ..., o_k, and at off-path inputs, s_1, s_2, ..., s_l, that are detected by any vector detecting the target fault are dealt with. All results in this section hold for any vector that detects the target fault. In other words, these results hold for any test generation algorithm that can be used to generate a test vector for the target fault.

Lemma 3.1. Any vector that detects a target stuck-at fault in a fanout-free circuit with primitive gates also detects a stuck-at fault at each on-path line along the unique propagation path from the fault site to the output. In particular, any vector that detects the target fault f_t = z SAα in the generic fanout-free circuit illustrated in Figure 3.3 also detects the single stuck-at faults o_1 SAv(o_1), o_2 SAv(o_2), ..., o_k SAv(o_k) at the on-path lines.

Proof. Consider the generic fanout-free circuit with primitive gates shown in Figure 3.3. Consider a vector v_t that detects the stuck-at fault f_t at line z. Vector v_t must excite the fault effect at the fault site, propagate the fault effect through the UPP, and justify the unjustified values at all lines. Consider an on-path line o_i at the input of an on-path gate G_i. Vector v_t assigns v(o_i) at this line. Hence a single stuck-at fault SAv(o_i) is excited at this line.
Vector v_t must propagate the fault effect of f_t via this path; hence it assigns the non-controlling value of each on-path gate to its side-inputs, and hence the effect of SAv(o_i) at o_i gets propagated to the output. Vector v_t excites and propagates the fault effect of o_i SAv(o_i) and justifies all unjustified lines; hence, it detects o_i SAv(o_i).

Lemma 3.2. The single stuck-at fault o_i SAv(o_i) at on-path line o_i detected by any vector that detects the target fault f_t = z SAα in the generic fanout-free circuit with primitive gates, illustrated in Figure 3.3, is either equivalent to or dominates the target fault f_t. In other words, this fault satisfies the following conditions: (i) SATV(o_i SAv(o_i)) ⊇ SATV(f_t); (ii) R(o_i SAv(o_i)) ≥ R(f_t); and (iii) C_{o_i SAv(o_i)}(V) = C_{f_t}(V), ∀V ∈ SATV(f_t).

Proof. According to Lemma 3.1, any vector v_t that detects f_t (i.e., v_t ∈ SATV(f_t)) also detects the single stuck-at fault o_i SAv(o_i) (i.e., v_t ∈ SATV(o_i SAv(o_i))). Hence, (i) SATV(o_i SAv(o_i)) ⊇ SATV(f_t). Independent of the distribution of input vectors, (ii) R(o_i SAv(o_i)) ≥ R(f_t). Since the circuit has a single output, (iii) C_{o_i SAv(o_i)}(V) = C_{f_t}(V), ∀V ∈ SATV(f_t). In other words, o_i SAv(o_i) dominates f_t.

The above results collectively show that, while it is impossible to detect the target fault f_t without also detecting particular single stuck-at faults at on-path lines o_1, o_2, ..., o_k, since the target fault is unacceptable, each of o_1 SAv(o_1), o_2 SAv(o_2), ..., o_k SAv(o_k) is also unacceptable.

Lemma 3.3. Any vector that detects the target fault f_t in the generic fanout-free circuit with primitive gates, illustrated in Figure 3.3, also detects the fault s_j SAv(s_j) at a side input s_j of the on-path gate G_i if and only if the single stuck-at faults at the side and on-path inputs of G_i, i.e., s_j SAv(s_j) and o_i SAv(o_i), are equivalent.

Proof. Consider a vector v_t that detects the target fault f_t.
First, we assume that v_t detects s_j SAv(s_j), and prove that s_j SAv(s_j) and o_i SAv(o_i) are equivalent. In order to detect f_t, v_t must assign ncv(G_i) to s_j. In order to detect s_j SAv(s_j), v_t must assign ncv(G_i) to o_i. Hence the SAcv(G_i) faults at o_i and s_j are excited and detected by v_t. According to the properties of faults at primitive gates, o_i SAcv(G_i) is equivalent to s_j SAcv(G_i). Therefore, s_j SAv(s_j) and o_i SAv(o_i) are equivalent.

Next, s_j SAv(s_j) and o_i SAv(o_i) are assumed to be equivalent, and it is proven that v_t detects s_j SAv(s_j). According to Lemma 3.1, v_t detects o_i SAv(o_i). If s_j SAv(s_j) and o_i SAv(o_i) are equivalent, any vector that detects one detects the other as well; hence, v_t detects s_j SAv(s_j).

The results of Lemma 3.3 and Lemma 3.2, combined, show that a fault s_j SAv(s_j) at a side input s_j of an on-path gate is detected if and only if s_j SAv(s_j) is equivalent to or dominates the target fault f_t.

3.2.4.2 A new justification procedure

The justification procedure is used to justify the values required at the fault site z and at the side-inputs s_1, s_2, ..., s_l of the on-path gates by applying specific combinations of values at the inputs of the corresponding fanout-free sub-circuits sc_0, sc_1, ..., sc_l, respectively. Let the objective of justification be to assign value v(q) at line q, where q is the output of a primitive gate G with inputs q_1, q_2, and so on.

First consider the case where v(q) is the complement of cr(G). In this case, the desired justification can be accomplished only by assigning the value ncv(G) to all the inputs of G, i.e., by assigning v(q_i) = ncv(G) for every q_i that is an input to G. Next, consider the only other case, i.e., when v(q) = cr(G). In this case, the desired value at the output of the gate can be justified by applying the gate's controlling value, i.e., cv(G), to one or more of its inputs.
If the value cv(G) is applied to only one input of G, say q_j, then the single stuck-at fault q_j SAncv(G) will be excited and its effect will be propagated to q, the output of G. In contrast, if the value cv(G) is applied to two or more inputs of G, then the effect of none of the stuck-at faults at any line in the transitive fanin of G will be propagated to the output q of G. In turn, this implies that none of the single stuck-at faults at the lines in the fanin of G will be detected by the test vector generated for the target fault.

Based on the above observations, the procedure Justify-ff-allcv shown in Figure 3.4 is used. To justify a desired value at the output of a fanout-free sub-circuit sc_i, this procedure is initially invoked with the line that is the output of sc_i and the desired value.

Justify-ff-allcv(q, v(q))
// q is the output of gate G with controlling value cv(G)
// and controlled response cr(G)
if v(q) = cr(G)
    then for every input q_i of G
        assign v(q_i) = cv(G)
        if q_i is not a primary input
            then Justify-ff-allcv(q_i, v(q_i))
else
    for every input q_i of G
        assign v(q_i) = ncv(G)
        if q_i is not a primary input
            then Justify-ff-allcv(q_i, v(q_i))

Figure 3.4: The proposed justification procedure, Justify-ff-allcv.

ERTG-ff()
// G_i is a gate on the path from the fault site to the output
// s_k is a side input of the on-path gate G_i
select a target fault f_t = z SAα such that R(f_t) ≥ T_er
Justify-ff-allcv(z, ᾱ)
for every on-path gate G_i
    for every side-input s_k of G_i
        assign v(s_k) = ncv(G_i)
        Justify-ff-allcv(s_k, v(s_k))

Figure 3.5: The proposed test generation procedure, ERTG-ff.

Figure 3.6: The fanout-free subcircuit in the fanin of the fault site.

The above justification procedure can be used in combination with the above concepts to obtain the proposed test generator for a single stuck-at fault in a fanout-free circuit with primitive gates. The proposed procedure is shown in Figure 3.5.
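Assuming a minimal representation in which each internal line maps to its driving gate's type and input lines, Justify-ff-allcv can be transcribed into runnable form as below; the two-gate example circuit and the data structures are illustrative assumptions, not taken from the thesis:

```python
# cv (controlling value) and cr (controlled response) for the primitive gates.
CV = {'AND': 0, 'NAND': 0, 'OR': 1, 'NOR': 1}
CR = {'AND': 0, 'NAND': 1, 'OR': 1, 'NOR': 0}

def justify_ff_allcv(gates, q, vq, assignment):
    """Justify value vq at line q of a fanout-free circuit.

    gates maps an internal line to (gate_type, [input lines]); lines not
    in `gates` are primary inputs. Following the procedure of Figure 3.4,
    the same value (cv or ncv) is assigned to *all* inputs of each gate.
    """
    assignment[q] = vq
    if q not in gates:          # primary input: nothing left to justify
        return
    gtype, inputs = gates[q]
    # If the required output value is the controlled response, apply the
    # controlling value to every input; otherwise apply the non-controlling value.
    vin = CV[gtype] if vq == CR[gtype] else 1 - CV[gtype]
    for qi in inputs:
        justify_ff_allcv(gates, qi, vin, assignment)

# Illustrative circuit: e = AND(a, b), f = OR(e, c); primary inputs a, b, c.
gates = {'e': ('AND', ['a', 'b']), 'f': ('OR', ['e', 'c'])}
asg = {}
justify_ff_allcv(gates, 'f', 0, asg)  # justify a 0 at the output
print(asg)  # {'f': 0, 'e': 0, 'a': 0, 'b': 0, 'c': 0}
```

Note the "allcv" behavior at the AND gate: since the required 0 equals cr(AND), the controlling value 0 is assigned to both of its inputs, so no stuck-at fault inside that sub-circuit's fanin is propagated to e.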
Note that ERTG-ff is used to explicitly generate a test for a fault only if the fault is unacceptable. Based on the description of ERTG-ff and the above discussion of the justification procedure Justify-ff-allcv, the following results are obtained.

Lemma 3.4. Any vector generated by ERTG-ff for an unacceptable target fault f_t in a fanout-free circuit with primitive gates detects a stuck-at fault q SAv(q) - where q is a line within the fanout-free sub-circuit sc_0 that drives the fault site z - if and only if the fault q SAv(q) is equivalent to f_t.

Proof. Consider a vector v_t that is generated by ERTG-ff for f_t. First, assume that v_t detects q SAv(q), and prove that q SAv(q) and f_t are equivalent. Consider line q at an input of a gate G_0i in sc_0. Consider the path from line q to z in sc_0, as
illustrated in Figure 3.6. At each gate on this path, v_t must assign the non-controlling value of the gate to all inputs that are not on the path, in order to detect q SAv(q). According to the justification procedure of ERTG-ff, if the non-controlling value of a gate is assigned to one input, it is assigned to all inputs of the gate. Hence, at each gate, the non-controlling value of the gate is assigned to the input on the path as well. Therefore, v(q) = ncv(G_0i), and q SAv(q) is equivalent to the SAcr(G_0i) fault at the output of G_0i. The SAcr(G_0i) fault at the output of G_0i is in turn equivalent to the SAcr(G_0(i−1)) fault at the output of G_0(i−1). By this chain of equivalences, q SAv(q) is equivalent to f_t, which is the SAcr(G_01) fault at the output of G_01.

Next, assume that q SAv(q) and f_t are equivalent. It then follows from the properties of equivalent faults that v_t detects q SAv(q).

Lemma 3.5. Any vector generated by ERTG-ff for an unacceptable target fault f_t in a fanout-free circuit with primitive gates detects a stuck-at fault q SAv(q) - where q is a line within the fanout-free sub-circuit sc_j that drives a side input s_j of an on-path gate G_i with on-path input o_i - if and only if the fault q SAv(q) is equivalent to the fault o_i SAv(o_i).

Proof. This lemma can be proven similarly to Lemma 3.4.

Table 3.1 summarizes the coverage of faults at different locations in the circuit. Finally, all of the above procedures and lemmas collectively answer the central question.

Theorem 3.1. Any test vector generated by the procedure ERTG-ff for an unacceptable fault does not detect any acceptable fault.

Table 3.1: Fault categories and their properties

Fault location            Option             Property
On-path lines             Must be detected   ∀f: R(f) ≥ R(f_t) ≥ T_er
                          Not detected       No faults in this category
Side inputs               Must be detected   ∀f: R(f) ≥ R(f_t) ≥ T_er
                          Not detected
Fanin^1 of side inputs    Must be detected   ∀f: R(f) ≥ R(f_t) ≥ T_er
                          Not detected
Fanin^1 of site of f_t    Must be detected   ∀f: R(f) ≥ R(f_t) ≥ T_er
                          Not detected

^1 We use the term fanin to describe lines in the transitive fanin as well.

The proposed test generator, ERTG-ff, provides a perfect solution to the problem of test generation for error-rate testing for single stuck-at faults in fanout-free circuits with primitive gates: it achieves full coverage of unacceptable faults while covering no acceptable faults, i.e., it maximizes the acceptance gain, and hence maximizes the benefit of error tolerance. In the next section we discuss a number of challenges faced during test generation for error-rate testing in arbitrary circuits.

3.2.5 Challenges posed by arbitrary circuits

An arbitrary combinational circuit may use complex gates, such as AOI, OAI, MUX, and XOR, and may have non-reconvergent or reconvergent fanouts. In this section, first, the challenges posed by some complex gates are analyzed.
Properties of circuits with non-reconvergent fanouts that use only primitive gates are studied next. Finally, the challenges posed by arbitrary combinational circuits are considered, which combine the above challenges with new ones raised by the presence of reconvergent fanouts.

3.2.5.1 Complex gates in fanout-free circuits

The above concepts, procedures, and results can be easily extended to two commonly used families of complex gates, namely and-or-invert (AOI) and or-and-invert (OAI). However, some commonly used complex gates do not have the properties that we have identified above. This section describes how some complex gates can pose new challenges to error-rate testing even in an otherwise fanout-free circuit.

Figure 3.7: The sets of all test vectors for single stuck-at faults associated with (a) a two-input NAND gate, and (b) a two-to-one multiplexer.

Figure 3.7 shows the sets of all test vectors for single stuck-at faults located at the inputs and output of two different circuit elements. In the case of the two-input NAND, c SA0 dominates a SA1 as well as b SA1. As described above, when this gate is embedded within an arbitrary fanout-free circuit, SATV(c SA0) ⊃ {SATV(a SA1) ∪ SATV(b SA1)}. This has two important consequences. First, the error rate for c SA0 is higher than the error rates for a SA1 and b SA1, independent of the probabilities of occurrence of values (0 0), (0 1), and (1 0) at the gate's inputs, a and b. Second, it is possible to use vectors that imply (0 0) at a and b to detect the dominating fault c SA0, which has the higher error rate, without detecting either of the two faults that it dominates, namely a SA1 and b SA1, which have lower error rates.

Next, consider the faults c SA0, a SA0, and b SA0 in the two-to-one multiplexer. It can be seen that c SA0 dominates each of the other two faults, and SATV(c SA0) ⊃ SATV(a SA0), and SATV(c SA0) ⊃ SATV(b SA0).
The error rate for c SA0 is higher than the error rates for a SA0 and b SA0, independent of the probabilities of occurrence of values (1 0 0), (1 1 0), (1 1 1), and (0 1 1) at a, b, and s. However, SATV(c SA0) = {SATV(a SA0) ∪ SATV(b SA0)}. Hence it is not possible to detect the dominating fault c SA0, which has the higher error rate, without detecting one of the two faults that it dominates, namely a SA0 and b SA0, which have lower error rates.

Faults s SA1, a SA0, and a SA1 have a type of relationship that does not occur for primitive gates. The relative values of the error rates of these three faults vary with the probabilities of occurrence of values (0 0 0), (0 1 0), (1 0 0), and (1 1 0) at the inputs of the multiplexer. Consider a case where the error rate for s SA1 is higher than those for a SA0 and a SA1. (This can occur, for example, if the values (0 1 0) and (1 0 0) occur at the lines a, b, and s with probabilities greater than those for values (0 0 0) and (1 1 0).) It is easy to see that in such cases, it is not possible to detect the fault with the higher error rate, s SA1, without detecting at least one of the two faults with lower error rates.

Figure 3.8: The sets of all test vectors for single faults associated with a two-input XOR gate.

Similar complications arise in the case of an XOR gate. Consider a two-input XOR gate and the sets of all test vectors for single stuck-at faults at its inputs and output in Figure 3.8. Each fault at the inputs and output of the XOR gate has two test vectors. However, depending on the probability of the value assignments at the inputs, the error rates of the faults might differ. Consider a case where c SA0 is unacceptable and all other faults are acceptable. It is easy to see from the figure that it is not possible to detect c SA0 without detecting at least two of b SA0, a SA1, b SA1, and a SA0.
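The multiplexer relationships described above can be confirmed exhaustively. The sketch below assumes the usual two-to-one multiplexer behavior, c = a when s = 0 and c = b when s = 1, with vectors written as (a, b, s); this convention matches the vectors listed in the text:

```python
from itertools import product

def mux(a, b, s):
    return b if s else a

def satv(line, sv):
    """Vectors (a, b, s) detecting a stuck-at fault on the 2-to-1 multiplexer."""
    out = set()
    for a, b, s in product([0, 1], repeat=3):
        fa, fb, fs = a, b, s
        if line == 'a': fa = sv
        if line == 'b': fb = sv
        if line == 's': fs = sv
        fc = sv if line == 'c' else mux(fa, fb, fs)
        if fc != mux(a, b, s):
            out.add((a, b, s))
    return out

# SATV(c SA0) equals the union of SATV(a SA0) and SATV(b SA0): the output
# fault cannot be detected without detecting one of the faults it dominates.
print(satv('c', 0) == satv('a', 0) | satv('b', 0))  # True
print(sorted(satv('c', 0)))  # [(0, 1, 1), (1, 0, 0), (1, 1, 0), (1, 1, 1)]
```

Because the union is exact (rather than a strict subset of SATV(c SA0)), no vector is left over to detect c SA0 alone, in contrast to the primitive-gate case.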
In general, when some complex gates are used in a fanout-free circuit, it may be impossible to detect some unacceptable faults without detecting at least one acceptable fault. However, the following approach can be used to minimize the number of additional acceptable faults that get detected.

Consider the example of the XOR gate. Only this time, assume that b SA0 is unacceptable, and a SA1, b SA1, and a SA0 are acceptable. Fault c SA0 can be detected by assigning either (1 0) or (0 1) at a and b. It is easy to see that by assigning (0 1) only one acceptable fault, a SA1, will get detected; however, by assigning (1 0) two acceptable faults, b SA1 and a SA0, get detected. Later, such analysis is used in our test generation approach to reduce the number of acceptable faults that get detected by a vector.

3.2.5.2 Non-reconvergent fanouts

Consider a circuit with non-reconvergent fanouts. Figure 3.9(a) shows a three-gate circuit with a non-reconvergent fanout. Consider a scenario where every possible vector is applied with equal probability at the inputs of this three-gate circuit. In such a scenario, it is easy to compute the following error rates for the SA0 faults at the fanout stem e and its branches g and h: R(e SA0) = 9/16, R(g SA0) = 6/16, and R(h SA0) = 6/16. It is also easy to see why the stem fault e SA0, which has the higher error rate, cannot be detected without detecting the corresponding fault at one or more of the fanout branches, g SA0 or h SA0, each of which has a lower error rate. Note that this observation becomes important if the error rate threshold specified by the application and used to distinguish between unacceptable and acceptable faults satisfies 6/16 < T_er ≤ 9/16.

In general, even when a circuit with only non-reconvergent fanouts uses only primitive gates, it may be impossible to detect an unacceptable single stuck-at fault at a fanout stem without detecting at least one acceptable single stuck-at fault at one of its branches.
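Such error rates can be reproduced by exhaustive simulation. The sketch below uses a hypothetical three-gate circuit of this shape — a stem e = a OR b fanning out to branches g and h that feed two AND gates — which is not necessarily the exact circuit of Figure 3.9(a), but is consistent with the rates quoted above:

```python
from itertools import product

def circuit(a, b, c, d, fault=None):
    # Hypothetical three-gate circuit with a non-reconvergent fanout:
    # stem e fans out to branches g and h feeding two separate outputs.
    e = a | b
    if fault == "e_SA0":
        e = 0
    g = h = e                      # fanout branches g and h
    if fault == "g_SA0":
        g = 0
    if fault == "h_SA0":
        h = 0
    return (g & c, h & d)          # two primary outputs

def error_rate(fault):
    # Error rate = fraction of equally likely input vectors for which
    # the faulty circuit produces an erroneous output.
    vectors = list(product((0, 1), repeat=4))
    errors = sum(circuit(*v) != circuit(*v, fault=fault) for v in vectors)
    return errors / len(vectors)

# For this circuit: error_rate("e_SA0") = 9/16, and
# error_rate("g_SA0") = error_rate("h_SA0") = 6/16.
```

Enumerating the detecting vectors for this circuit also shows that every vector detecting the stem fault e SA0 detects g SA0 or h SA0 as well.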
Figure 3.9: (a) An example circuit with non-reconvergent fanout, and (b) a generic circuit with non-reconvergent fanout.

The number of acceptable faults detected by a complete set of test vectors for unacceptable faults can be minimized. Consider the generic circuit with a non-reconvergent fanout shown in Figure 3.9(b), where a fault at the fanout stem, say a SA0, as well as some corresponding faults in the fanin of the stem a, are unacceptable. Even if any one of the corresponding faults at the branches of this fanout, i.e., either b SA0 or c SA0, is acceptable, a vector can be generated that detects the unacceptable fault a SA0 and the corresponding unacceptable faults in the fanin of a, by propagating each of their fault effects via the fanout branch whose SA0 fault is also unacceptable.

Now consider a scenario where the SA0 faults at both fanout branches are acceptable. In such a case, a vector can be generated that detects the unacceptable fault a SA0 and the corresponding unacceptable faults in the fanin of a by propagating each of their fault effects via one particular fanout branch. Hence, even if each vector generated for an unacceptable fault detects a number of acceptable faults, the total number of acceptable faults detected by the overall set of test vectors can be small.

3.2.5.3 Circuits with reconvergent fanouts

In addition to the complications described above, in an arbitrary combinational circuit the presence of reconvergent fanouts may make it (i) impossible to apply some combination of values at the inputs of a gate, or (ii) impossible to propagate the fault effect from the fault site to an output when a particular combination of values is applied to the inputs of the gate with the fault site to excite the fault.

3.2.5.4 Summary of complications

A summary of the most significant complications faced while developing a test generator for a circuit with fanout and complex gates is as follows.
(i) Single stuck-at faults associated with some complex gates, such as XOR and MUX, do not have dominance and equivalence relationships. For such a gate, it is not always possible to detect a fault at its output without detecting a fault with a lower error rate at one of its inputs.

(ii) In circuits with fanout, the effect of a target fault can be propagated through any one of many alternative paths. Faults along the propagation path are typically also detected by the vector generated for the target fault, and some of these faults may be acceptable. Finding a propagation path that will result in detection of the smallest number of acceptable faults can decrease the total number of acceptable faults that are detected by the test set generated by ERTG.

(iii) It is also possible for the effect of a target fault to propagate via multiple paths. In such a case, the vector generated for the target fault may detect acceptable faults at lines along the multiple propagation paths.

(iv) In circuits with reconvergent fanout it is not always possible to assign desired values to the inputs of a gate. Also, the faults on the lines in the fanin of the site of the target fault are not necessarily equivalent to or dominated by the target fault. This is also the case for faults at the lines in the fanin of a side-input line.

In Section 3.3, the above properties are used to develop heuristics to tackle the above complications and develop a new test generator for error-rate testing in the presence of fanouts and complex gates.

3.3 ERTG: Error-rate test generator

In this section the challenges described above are addressed by constructing an error-rate test generator (ERTG) for single stuck-at faults in arbitrary circuits with a new objective, namely, detecting every unacceptable fault while detecting as few acceptable faults as possible (maximum acceptance gain, when perfect is not possible). First, an overview of the test generator is given. The proposed approach is then discussed in more detail.
3.3.1 Overview of ERTG

The main objective of ERTG is to detect every unacceptable fault while minimizing the number of acceptable faults that get detected. With this in mind, the test generator illustrated in Figure 3.10 is proposed for error-rate testing of arbitrary circuits.

Figure 3.10: Flowchart of the proposed ERTG.

Assume that an error-tolerant application will provide us with a threshold error rate which can be used to categorize all single stuck-at faults as acceptable or unacceptable. This test generator modifies each step of a classical test generator in a manner such that decisions are made to advance our main objective. Therefore, whenever there exist multiple choices in a classical test generator, our approach handles the decision making. A cost is assigned to every choice, defined as the approximate number of additional acceptable faults that will be detected if that choice is made. The comments in call-out boxes in Figure 3.10 show how some procedures of a classical ATPG are customized to consider our objective.

ERTG targets unacceptable faults one at a time. Once a test is generated for an unacceptable fault, all other faults detected by this vector are identified. If an acceptable fault is detected by one vector, it no longer contributes to the cost functions for subsequent vectors. The order in which unacceptable faults are targeted therefore plays an important role in determining the total acceptance rate of our test generator. Once an unacceptable fault is targeted, the primary objective of the test generator is to make the detection of the target fault possible. The effect of the target fault is therefore propagated to one of the outputs. During the process of fault effect propagation, the secondary objective is to avoid excitation and propagation of acceptable faults (that have not been detected by vectors generated for previously targeted faults).
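The outer flow just described can be sketched as follows. This is a simplified outline, not the dissertation's implementation; `generate_test` and `fault_simulate` stand in for the actual cost-guided ATPG and fault-simulation engines, and faults are classified against the application-supplied threshold T_er:

```python
def classify(error_rate, T_er):
    # A fault is unacceptable iff its error rate reaches the threshold
    # (cf. the 6/16 < T_er <= 9/16 discussion in Section 3.2.5.2).
    unacceptable = [f for f, r in error_rate.items() if r >= T_er]
    acceptable = {f for f, r in error_rate.items() if r < T_er}
    return unacceptable, acceptable

def ertg_outer_loop(unacceptable, acceptable, generate_test, fault_simulate):
    # Target unacceptable faults one at a time; after each generated
    # vector, fault-simulate it and zero the acceptability flag of
    # every newly detected acceptable fault so it no longer contributes
    # to the cost functions used for subsequent vectors.
    tests, detected = [], set()
    A = {f: 1 for f in acceptable}          # acceptability metric A_f
    for f in unacceptable:
        if f in detected:
            continue                        # covered by an earlier vector
        vector = generate_test(f, A)        # vector detecting target f
        tests.append(vector)
        for g in fault_simulate(vector):    # all faults detected by vector
            detected.add(g)
            if g in A:
                A[g] = 0
    return tests, detected, A
```

Here `generate_test(f, A)` would perform the cost-guided propagation, justification, and value-specification steps described in the rest of this section.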
During the justification process, the primary objective of the test generator shifts from detection of the target fault to blockage of acceptable faults. Once the justification of all unjustified lines is completed and detection of the target fault is guaranteed, the values at lines with partially specified values are specified with the sole objective of blocking the effects of as many acceptable faults as possible.

Note that our test generator uses a cost metric to select a single propagation path for the effect of the target fault. Our other algorithms also use heuristics based on the propagation of a fault effect via a single path. However, our methodology is complete, as it does consider the space of vectors for which the effect of the target fault is propagated concurrently via multiple paths. Next, the ERTG procedures are described in detail.

3.3.2 Key procedures

3.3.2.1 Fault effect propagation

Any circuit with fanouts can be divided into fanout-free regions that are connected via fanout stems and fanout branches [2]. Figure 3.11(a) shows how a generic circuit with fanouts is divided into fanout-free regions (FFRs). Consider a target fault f_t at a line within FFR_0. It is obvious that to be propagated to a primary output, the effect of the target fault must first be propagated to line a, since all paths from the target fault site to primary outputs pass through a. From line a, the fault effect must propagate either through path 1, consisting of FFR_1, FFR_2, and FFR_3, or through path 2, consisting of FFR_4 and FFR_3. Figure 3.11(b) shows the possible propagation paths for the target fault as a graph. Each vertex represents the part of the propagation path through an FFR, and each edge represents a circuit line between two FFRs. The propagation path through each FFR is indicated by the name of the on-path input of the FFR. In a fanout-free circuit which contains only primitive gates, the effect of the target fault f_t can be propagated only via a unique propagation path (UPP).
This guarantees the detection of one of the two single stuck-at faults at each line along the UPP. However, in that case it can be shown that the fault detected at any line along the UPP is also unacceptable (see Section 3.2) [60]. In contrast, in a circuit with fanouts, the effect of a target fault f_t may be propagated via many different paths. Also, the single stuck-at fault f_op at a line along the propagation path that is excited by the vector may be acceptable.

Figure 3.11: (a) A generic circuit with fanouts partitioned into FFRs, and (b) potential propagation path graph for a target fault at a line within FFR_0.

Due to the presence of fanouts, the effect of fault f_op may not be propagated to the output. However, empirical analysis shows that faults like f_op are detected with very high probability by a vector generated for f_t. In other words, in a circuit with fanouts, the test for a target fault is likely to detect the stuck-at fault excited at each line along the propagation path.

In order to minimize the number of acceptable faults that get detected in this manner, for a given target fault, the proposed test generator searches all potential propagation paths and selects the path with minimum cost. The propagation cost of a path for a target fault is defined as the number of acceptable faults that potentially get detected due to propagation of the corresponding fault effect through the path.

The cost of propagation of a fault effect through a path is the sum of the propagation costs of all distinct path segments corresponding to on-path FFRs. For example, let us assume that detection of f_t through path 1 in Figure 3.11 (FFR_1, FFR_2, and FFR_3) results in detection of f_a at line a, f_b at b, f_d at d, and f_g at g. Clearly, there are faults at the on-path lines of the unique propagation path segment through the FFRs that get detected as well. These faults contribute to the cost of propagation of a fault effect from the input of an FFR to its output.
Let us denote the cost of propagation of the effect of fault f through a unique path segment from line x to line y as C_f^{x-y}. The propagation cost of path 1 for fault f_t can be calculated as follows:

CP^{1}_{f_t} = C_{f_t}^{x-b} + C_{f_b}^{b-d} + C_{f_d}^{d-g} + C_{f_g}^{g-h}.

In order to calculate the propagation cost of a path segment for a fault on an on-path line, the number of acceptable faults on the on-path lines that get excited during fault effect propagation must be computed. Note that in order to propagate a fault effect at the on-path input of an on-path gate to the output of the gate, all side-inputs of the gate must be assigned the gate's non-controlling value. Therefore, if a single SAcv fault at the on-path input is excited, then the effect of a single SAcv fault at any of the gate's side-input lines also gets propagated to the output of the gate. More precisely, detection of a target fault in a fanout-free circuit will also result in detection of a fault at a side-input of an on-path gate if and only if the faults at the side-inputs and on-path inputs of the gate are equivalent.

At the output line of each on-path gate G, if fault SAcr(G) is excited by propagation of the target fault effect and is acceptable, the number of faults in the transitive fanin of G that are equivalent to this fault must be counted. Weight and acceptability metrics for a fault are therefore defined as follows.

Definition 3.1. The weight of a fault f, denoted by W_f, is the number of faults in the transitive fanin of the fault site that are equivalent to f.

Definition 3.2. The acceptability of a fault, denoted by A_f, is "1" if the fault is acceptable and not yet detected, and is "0" if the fault is unacceptable or if it is acceptable but has already been detected by an earlier vector generated by ERTG.

Figure 3.12: Propagation cost of an example UPP.

Take the unique propagation path segment through FFR_1 as an example and expand it in Figure 3.12.
Assume that in order to propagate the effect of fault f_b through this path, single stuck-at faults f_x, f_y, and f_d at lines x, y, and d, respectively, get excited. The propagation cost of f_b through this path can be calculated as follows:

C_{f_b}^{b-d} = A_{f_b}W_{f_b} + A_{f_x}W_{f_x} + A_{f_y}W_{f_y} + A_{f_d}W_{f_d}.

The propagation cost of each path segment (corresponding to each FFR) can be calculated off-line for the SA0 as well as the SA1 fault at each input of every FFR, before the test generation starts. At the time of propagation, merely a minimum path selection problem is solved. When an acceptable fault f gets detected by a test vector, A_f becomes 0; therefore, for each subsequent target fault, the propagation costs are recalculated for the affected FFRs.

For a circuit with complex gates, two different approaches can be taken to calculate the cost of propagation. First, there exist alternative value assignments at the side-inputs of the gate that will result in propagation of the fault effect. Different assignments at different side-inputs will also result in excitation of different faults at the output of the on-path gate. Therefore, for circuits with complex gates, the cost of a path for a particular fault depends on the assignments at the side-inputs of the complex gates on the propagation path. More precisely, there might be several costs mapping to a path for a particular fault. The complexity of calculating the cost in this manner grows rapidly with the number of complex gates along a path. Therefore, a second approach is used, where the complex gate is replaced with a sub-circuit that uses primitive gates to implement the same logic function.

The above approach is used to calculate the propagation cost of each path. In order to find the weight of a fault at each line, a breadth-first traversal of the circuit starting at its primary inputs is performed.
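Putting these pieces together — equivalence-based weights, acceptability flags, per-segment costs, and minimum-cost path selection over the FFR graph — a simplified sketch follows. The FFR graph and all segment costs below are hypothetical, loosely modeled on Figure 3.11(b):

```python
import heapq

def compute_weights(leaf_lines, gates):
    # Topological (breadth-first) weight computation for primitive
    # gates: the SA-cv(G) fault at each input of G is equivalent to the
    # SA-cr(G) fault at its output, so that output fault's weight is
    # the sum of the input weights; the other output fault has weight 1.
    # leaf_lines: primary inputs and fanout branches (weight 1 each).
    W = {(l, v): 1 for l in leaf_lines for v in (0, 1)}
    for out, cv, cr, ins in gates:              # topologically ordered
        W[(out, cr)] = sum(W[(i, cv)] for i in ins)
        W[(out, 1 - cr)] = 1
    return W

def segment_cost(excited_faults, A, W):
    # Cost of one path segment: sum of A_f * W_f over the faults
    # excited along it (Definitions 3.1 and 3.2).
    return sum(A[f] * W[f] for f in excited_faults)

def min_cost_path(succ, cost, source, sinks):
    # Minimum-cost propagation path over the FFR graph (Dijkstra);
    # vertices are path segments through FFRs, cost[v] is the
    # precomputed propagation cost of segment v.
    dist, prev = {source: cost[source]}, {}
    heap = [(dist[source], source)]
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist[v]:
            continue
        if v in sinks:                          # cheapest sink reached
            path = [v]
            while v in prev:
                v = prev[v]
                path.append(v)
            return d, path[::-1]
        for w in succ.get(v, ()):
            nd = d + cost[w]
            if nd < dist.get(w, float("inf")):
                dist[w], prev[w] = nd, v
                heapq.heappush(heap, (nd, w))
    return None
```

For example, with segments named after Figure 3.11(b) and made-up costs, `min_cost_path({"a": ["FFR1", "FFR4"], "FFR1": ["FFR2"], "FFR2": ["FFR3.d"], "FFR4": ["FFR3.e"]}, {"a": 0, "FFR1": 2, "FFR2": 1, "FFR3.d": 2, "FFR4": 1, "FFR3.e": 1}, "a", {"FFR3.d", "FFR3.e"})` selects the cheaper route through FFR_4.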
At the output line of each on-path gate G, the weight of fault SAcr(G) is equal to the sum of the weights of all SAcv(G) faults at the side-inputs of the gate. The weight of the output fault stuck at the complement of cr(G) is equal to 1.

The propagate procedure of our test generator first checks the cost of each potential propagation path for a target fault. It then identifies the path with the lowest cost and propagates the effect of the target fault through this path. Note that due to reconvergent fanout, effects of a target fault may reconverge with opposite values (i.e., D and D̄) and get canceled, so propagation of a target fault through a potential propagation path may be impossible. On the other hand, due to reconvergent fanout, the fault effect may propagate through multiple paths to one output, and this may cause detection of more acceptable faults than specified by the cost function of the shortest path. Therefore, the cost metric calculated here is a heuristic to help achieve the objective of the test generator.

3.3.2.2 Justification procedure

Once the target fault effect is propagated through a preferred path to a primary output, values, especially at side-inputs of the on-path gates, need to be justified. The main objective of our justification procedure is to block the effects of acceptable faults from propagation to the outputs. In the proposed justification procedure for fanout-free circuits that contain only primitive gates, it was proven earlier that assigning identical values at the inputs of the unjustified gate will result in blocking the effect of any acceptable fault from propagation to the output of the gate. However, in circuits with reconvergent fanout, assignment of such values to the inputs of a gate may sometimes result in a conflict.

Consider a primitive gate G with inputs x_1, x_2, ..., x_k and output z in a circuit with fanout. To justify v(z) equal to the complement of cr(G), the only possible assignment of values at the inputs is x_1 = x_2 = ... = x_k = ncv(G).
In this case the effect of any one of the SAcv(G) faults at the input lines will get propagated to the output of the gate. All SAcv(G) faults at the input lines are equivalent to z SAcr(G). Therefore, if the fault at the output is detected, all such faults at the input will also be detected. In this case, there is no alternative assignment for the gate; therefore, in our test generation algorithm, this assignment falls under the category of implication and is not even a part of the justification procedure.

Figure 3.13: Justification of {1} at the output of a 3-input NAND gate.

If the value at the gate output to be justified is v(z) = cr(G), any assignment of values at the inputs that includes cv(G) at one of the inputs will justify the value at the output. To justify a controlled response at the output of a gate, assignment of controlling values to two inputs of the gate (e.g., see Figure 3.13(b)) is considered the assignment option with the highest priority. In this case the fault effect of any single SAcv(G) or SAncv(G) fault on the input lines of G will be blocked. The input lines at which the controlling value is assigned are selected using the classical controllability metric for the controlling value of the gate. In case of a conflict in value assignment, the next option is assignment of cv(G) to only one of the inputs (e.g., see Figure 3.13(c)), in which case the effect of the SAncv(G) fault at that particular input will be propagated to the output of the gate.

To satisfy the objective of our test generator, the input line at which the controlling value is assigned is selected using the weight and acceptability metrics for the SAncv(G) faults at the input lines. Figure 3.13 shows justification of {1} at the output of a NAND gate as an example. The first alternative is assigning {0} to two of the three input lines, which is as good an option as assigning {0} to all inputs in terms of the faults at the input lines that get propagated to the output.
In the last alternative, however, the SA1 fault at the input that is assigned {0} will get propagated to the output; therefore, {0} is assigned to the line whose SA1 fault has the lowest weight.

In the case of justification of a value at the output of a complex gate, the notions of controlling value and controlled response cannot be deployed. We also explained earlier that in some complex gates it is not possible to detect an unacceptable fault at the output without detecting some acceptable faults at the inputs of the gate. One approach to tackle this is to use the product of the weight and acceptability metrics of the faults at each input line to find the assignment with the lowest cost. Another approach is to implement the logic function of the complex gate by a sub-circuit of primitive gates and to justify those gates using the approach discussed above. In our test generator we adopt the latter approach.

3.3.2.3 Assigning values to partially specified inputs

Once a test vector is generated for a fault, some of the primary inputs as well as internal lines of the circuit may be assigned partially specified values. At the time of test application, however, the value at every input must be fully specified. In a classical test generator, the partially specified values (or don't cares) can be used for test compaction. Since this might result in detection of more acceptable faults, unspecified values are assigned in a completely different way, namely to block the effects of all acceptable faults that may otherwise be detected by the vector. The basis of this approach is the same as dynamic test compaction, in the sense that a test generator is used to generate alternative tests with special properties. However, the objective of our approach is to detect fewer acceptable faults instead of facilitating compaction.

To assess the importance of proper specification of input values, simple experiments were performed.
A test vector was generated for a target fault using the error-rate test generator. Random values were assigned to the partially specified inputs to create multiple vectors. The circuit was simulated for each of these vectors. The number of acceptable faults detected by each of these vectors varied significantly in some cases. It was concluded from the above observation that proper specification of input values plays a significant role in the overall coverage of acceptable faults.

The main strategy for specifying bits of a partially specified test vector is the same as the strategy for justification. The objective is to block the effects of as many acceptable faults as possible at the time of value specification, in a greedy manner. A line with a partially specified value that is closest to one of the outputs is selected. The total cost of value specification for the different alternative values is calculated, and the alternative with the lowest cost is selected. The procedure then proceeds to the next line with a partially specified value. The line that is closest to the output is chosen because the weights of faults at such a line give a better representation of the group of faults in the transitive fanin of the gate whose propagation or blockage depends on the specified value.

Consider the gate G in Figure 3.14 with output z and inputs w, x, and y. Let us assume that the output z has a partially specified value that contains both the controlled response and its complement. Assigning this line the controlled response of the gate may result in detection of the output fault stuck at the complement of cr(G). If the value at two or more inputs of G includes the controlling value of the gate, all faults at the input lines can potentially be blocked. However, if only one of the input lines, e.g., x, can be assigned the controlling value, the SAncv(G) fault at that input line may get detected as well. Therefore, the cost of assigning the controlled response of a gate to its output is as follows.
• A_{z SAncr(G)}, if more than one input can be assigned cv(G);

• A_{z SAncr(G)} + A_{x SAncv(G)}·W_{x SAncv(G)}, if only one input, say x, can be assigned cv(G);

where SAncr(G) denotes the stuck-at fault at the complement of the controlled response cr(G). The cost of assigning the complement of the controlled response of a gate to its output is as follows.

• A_{z SAcr(G)}·W_{z SAcr(G)}.

Now consider a line x with a partially specified value at one of the inputs of the gate G. Since the procedure starts with the lines closest to the output, by the time line x is selected to be specified, it is guaranteed that the value at the output of the gate is fully specified. If the partially specified value contains the controlling value of G, the line should be assigned the value cv(G); otherwise, it is already assigned ncv(G).

Figure 3.14: Example gate with partially specified values.

As an example, consider the circuit in Figure 3.15 with target fault h SA0 and the selected propagation path shown in thick lines. The values after the completion of the propagation and justification procedures are shown at the lines. Let us assume that line e is the closest line to one of the outputs with a partially specified value. The cost of specifying {1} at line e is A_{f SA1}W_{f SA1} + A_{e SA0}W_{e SA0}, while the cost of specifying {D} at line e is 0. Therefore, {D} is specified at line e, which results in c = {1}. The proposed justification procedure will justify line c with a = b = {0}. The primary inputs are hence specified without detection of further acceptable faults.

Obviously, the cost calculation is merely a heuristic to advance the test generation towards our objective. In case of a later conflict in value assignments, the specification procedure backtracks to a suitable state and assigns an alternative value with a likely higher cost.

Figure 3.15: Specification of values in an example circuit.

3.3.2.4 Target fault selection

For every test that is generated, fault simulation is performed and the lists of acceptable and unacceptable faults are updated.
Hence, the order in which the target faults are selected for test generation can determine the acceptance gain that ERTG provides. Target faults can be sorted in ascending order of their error rates. There are more test vectors that detect a fault with a higher error rate; therefore, it is more probable that a fault with a higher error rate will be detected by a vector that was generated for another fault. Hence, ordering the target faults in this manner may result in a smaller test set as well as shorter test generation time.

On the other hand, consider the two unacceptable faults f_1 and f_2 in the example circuit in Figure 3.16. Let us assume that propagation of f_1 through FFR_1 and FFR_3 results in detection of f_2 at the input of FFR_3. The propagation costs of each of the path segments through the FFRs for f_1 are written on the input lines. Let us consider the case where we target f_1 before f_2. In this case, the propagation cost of the path through FFR_1 and FFR_3 is greater than the propagation cost of the path through FFR_2 and FFR_3. Therefore, the fault effect is propagated through the latter, and as a result eight acceptable faults are detected. For f_2, there is only one option to propagate the fault effect, and that results in detection of six more acceptable faults.

Figure 3.16: Selection of target faults.

However, in the alternative case, where f_2 is targeted before f_1, the six acceptable faults that are detected as the result of propagation of f_2 through FFR_3 are marked off, and the propagation cost of the corresponding path segment for f_1 is recalculated as zero. Accordingly, the propagation cost of the path through FFR_1 and FFR_3 is less than the propagation cost of the other alternative. Therefore, the former path will be selected, and as a result only four more acceptable faults get detected.
This example suggests that sorting the target faults in ascending order of the number of FFRs that are located between the fault site and a primary output may result in the overall detection of fewer acceptable faults.

Based on our analysis, the latter approach is more helpful in meeting the objective of our test generator; therefore, it is adopted as the current policy for selecting target faults. Development of a more systematic approach to this problem is a subject of future research.

3.3.3 Example of error-rate test generation

Consider the benchmark circuit c17 in Figure 3.17 and the target fault SA0 at line 11. The circuit is divided into four fanout-free regions as shown in the figure. For a hypothetical error rate threshold of 0.3, a list of all acceptable stuck-at faults in the circuit along with their weights is shown in Table 3.2.

Table 3.2: List of weights of acceptable faults in c17, T_er = 0.3.

W_{1 SA0} = 1    W_{1 SA1} = 1    W_{3 SA0} = 1    W_{3 SA1} = 1
W_{6 SA0} = 1    W_{6 SA1} = 1    W_{7 SA0} = 1    W_{7 SA1} = 1
W_{8 SA0} = 1    W_{8 SA1} = 1    W_{9 SA0} = 1    W_{9 SA1} = 1
W_{11 SA1} = 2   W_{14 SA1} = 1   W_{15 SA0} = 1   W_{15 SA1} = 1
W_{10 SA1} = 2   W_{19 SA1} = 2   W_{21 SA1} = 1

The cost of propagation of each of the faults at the inputs of the FFRs to the output through the unique propagation path is calculated. There are three paths to propagate the target fault effect to either of the outputs: path 1 (11-14-16-20-22), path 2 (11-14-16-21-23), and path 3 (11-15-19-23). The propagation cost of path 1 is equal to

C_{14 SA0}^{14-16} + C_{20 SA1}^{20-22} = A_{14 SA0}W_{14 SA0} + A_{16 SA1}W_{16 SA1} + A_{20 SA1}W_{20 SA1} + A_{22 SA1}W_{22 SA1} = 0.

The propagation costs of path 2 and path 3 are calculated as 1 and 3, respectively. Therefore, path 1 is selected to propagate the fault effect. The line assignments necessary to excite and propagate the fault effect through path 1 are as follows: line 11 = {1}, 2 = {1}, and 10 = {1,D}. Due to these assignments, the following values are implied: line 14 = {D}, 15 = {D}, 16 = {D}, 20 = {D}, 21 = {D}, 22 = {D}, 23 = {D}, and 19 = {1,D}.
Figure 3.17: Error-rate test generation for c17.

Lines 11 and 10 remain unjustified after the propagation procedure. The proposed justification procedure results in the following assignments: line 6 = {0} and 9 = {0}, which implies 3 = {0} and 8 = {0}, which causes the value at line 10 to change, 10 = {1}, and that justifies 1 = {0}. Lines 7 and 19 have partially specified values. The specification procedure selects line 19, since it is the closer of the two to an output. The partially specified value at 19 contains both the controlled response of the gate and its complement. Therefore, the value at line 19 is specified as {D}, which corresponds to the complement of the controlled response of the gate. Backward implication results in 7 = {1}. The test vector is generated as (1, 2, 3, 6, 7) = (0 1 0 0 1), which detects the target fault. It is easy to see that this test vector does not detect any acceptable faults.

3.4 Experimental results

A given threshold value, T_er, is used to identify acceptable and unacceptable faults in the manner described in Section 2. The proposed ERTG procedure is used to
Column 3 shows the acceptance rates achieved for different circuits. Note that the increase in yield is directly proportional to these values. Columns 4 and 5 show the test-set sizes and run-times of a test generated by ERTG, normalized with respect to test-set size and run-time of a classical test. ERTG targets fewer faults compared to classical testing; however, because of the higher degree of selectivity in how each fault is detected, fewer unacceptable faults are detected by each vector. Therefore, the test-set size is slightly larger than the test size for classical testing for most of the cases. The run-time of a classical test generator is highly dependant on the number of faults for which the test generator cannot generate a vector, namely aborted faults, since these fault exhaust the limit for the number of backtracks allowed. For example in case of c1355, the classical test generator aborted for 66 faults, 81 and reported 8 faults as untestable. The list of error-rates associated with these faults reveals that all the aborted faults are acceptable faults. Therefore in ERTG, these faults are not targeted and thus the normalized run-time is less than one. Furthermore, ERTG targets only a fraction of faults, namely unacceptable faults. This contributed to the lower run-times as well. Table 3.4 shows unacceptable coverage, acceptance gain, test-set size, and run- time of tests generated by ERTG for different threshold values for c880. It can be seen that although the percentages of acceptable faults varies with the threshold on the error rate, the acceptance gain does not vary significantly. In all cases, the ERTG provided full coverage of unacceptable faults. A similar trend is observed for other circuits and shows that performance of the proposed ERTG is independent of thethresholdontheerrorrateintermsofacceptancegain. Clearly,theperformance in terms of test-set size and test generation run-time varies. 
Table 3.3: Unacceptable coverage, acceptance gain, test-set size, and run-time of tests generated by ERTG; T_er = 0.1.

Circuit  Unacceptable    Acceptance   Normalized      Normalized
         coverage(%)^1   gain(%)^2    test-set size   run-time
C880     100             83.0         2.5             1.8
C1355    100             74.8         0.7             0.1
C1908    100             74.5         1.2             1.2
C2670    100             83.7         1.7             2.1
C3540    100             75.8         0.7             0.1
C5315    100             71.7         1.3             5.1

^1 Percentage of unacceptable faults
^2 Percentage of acceptable faults

Table 3.4: Acceptance gain, normalized test-set size, and normalized run-time for different threshold error rates, for c880.

T_er   Unacceptable   Acceptable   Acceptance   Normalized      Normalized
       faults(%)^1    faults(%)^1  gain(%)^2    test-set size   run-time
0.01   88.2           11.8         90.9         4.2             4.7
0.05   70.3           29.7         80.1         3.1             2.8
0.1    59.8           40.2         83.0         2.5             1.8
0.2    41.4           58.6         83.0         1.5             0.9
0.3    22.8           77.2         75.7         0.9             0.4
0.5    8.4            91.6         83.9         0.5             0.2

^1 Percentage of all single stuck-at faults
^2 Percentage of acceptable faults

3.5 Summary

Test generation for error rate is defined as generating a set of tests with the objective of distinguishing faults based on the acceptability of their error rates. In error-rate testing, faults that cause unacceptable performance are targeted one by one, and a test generator generates a test for each fault and adds it to the test set. A perfect test for error rate is defined as a test that detects all unacceptable faults without detecting any of the acceptable faults.

It was proven in this chapter that for a special family of circuits, namely fanout-free circuits with primitive gates only, it is possible to detect all unacceptable faults without detecting any of the acceptable faults. The methodology was used to develop a test generation approach for error-rate testing of this family of circuits. Such perfect error-rate testing cannot be guaranteed for a general family of circuits, mainly due to properties of fanouts and some complex gates.
However, the methodology is extended to the general class of combinational circuits using heuristics and algorithms to develop a test generator for error-rate testing. The proposed test generator provides full coverage of unacceptable stuck-at faults while covering only a small percentage of acceptable stuck-at faults. This approach increases the yield benefits of error-rate testing at test application times and test generation times comparable to classical testing.

Chapter 4

Multi-vector tests: A path to perfect error-rate testing

4.1 Introduction

Test generation for error-rate testing was previously defined as generating a set of tests that detects all unacceptable faults and minimizes the number of acceptable faults that get detected.

In Chapter 3, it was proven that perfect error-rate testing in terms of acceptance gain is achievable for fanout-free circuits with primitive gates. However, it was also shown that for a general family of circuits this is not achievable. A test generator based on the SOFE test application scenario was developed to minimize the detection of acceptable faults while detecting all detectable unacceptable faults.

A number of properties that capture relationships between the error rates of faults and their detection were identified. These properties, as well as the limitations of the previous approach, motivated the development of a new test application scenario, namely stop on first erroneous session (SOFES).

SOFES is based on the new notion of multi-vector testing, where testing is performed using a set of test sessions. Each test session includes multiple vectors for a target fault. During test application, a chip is discarded only if every vector in a test session produces an erroneous response. Using this new concept, perfect error-rate testing in terms of acceptance gain can be achieved for all circuits, independent of the fault model.
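The difference between the two application policies can be sketched in a few lines. The response strings and session grouping below are illustrative only ('p' marks an error-free response, 'f' an erroneous one):

```python
# Minimal sketch of the two test-application policies.

def sofe_discard(responses):
    """Stop on first error: discard the chip on any erroneous response."""
    return 'f' in responses

def sofes_discard(sessions):
    """Stop on first erroneous session: discard the chip only if some
    session produces an erroneous response for EVERY one of its vectors."""
    return any(all(r == 'f' for r in session) for session in sessions)

# A two-vector session followed by a three-vector session.
print(sofes_discard([['f', 'f'], ['p', 'f', 'p']]))  # True: first session fails fully
print(sofes_discard([['f', 'p'], ['f', 'p', 'p']]))  # False: each session has a pass
print(sofe_discard(['f', 'p', 'f', 'p', 'p']))       # True: SOFE discards regardless
```

The third call shows why SOFES can keep chips that SOFE would discard: an acceptable fault that causes occasional errors survives as long as each of its sessions contains at least one passing vector.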
The proposed method of testing has some overhead in terms of test generation and application time, due to the increased number of vectors required to achieve perfect acceptance gain. Since the theoretically derived upper bound on the number of vectors required to achieve the objectives is extremely high in some cases, a structural approach is developed, and the set of single stuck-at faults is used to study fairly general classes of circuits and to generate test sets with far fewer vectors. Experiments on a number of benchmark circuits show significant improvement in acceptance gain at low cost. A multi-vector test generator that achieves perfect error-rate testing for arbitrary circuits is also proposed.

4.2 Multi-Vector Testing

In classical testing scenarios for combinational circuits that use deterministic vectors, first a set of test vectors is created by generating one test vector for each target fault. Automatic test equipment (ATE) is then used to apply each generated test vector at the inputs of a circuit under test (CUT) and to capture the response at its outputs. The response is then compared with that expected of a fault-free version. In such testing, one vector is applied and the CUT is discarded if the response captured at the CUT outputs differs from the corresponding fault-free response; otherwise, testing continues with the next vector in the test sequence. Such a testing approach is called stop on first error (SOFE) and is attractive due to its low cost. We showed in Chapter 3 that such an approach is sufficient to achieve perfect acceptance gain for fanout-free circuits, but not for arbitrary circuits.

A new concept, namely multi-vector testing, is hereby introduced. For each fault, multiple test vectors are generated and added to the test set as one session. Then an ATE is used to apply each vector in a test session to the inputs of the CUT, one vector at a time.
The ATE captures the response at the outputs for each vector in the session and compares the captured responses with those expected of the fault-free version of the CUT. In this approach, the CUT is discarded only if the response captured at the CUT outputs differs from the corresponding fault-free response for every one of the vectors in that session. Otherwise, testing continues with the next session in the test sequence. A session of multiple test vectors is referred to as a multi-vector test session, or M-VTS. With some abuse of notation, an M-VTS with k vectors is referred to as a k-VTS.

Figure 4.1: Multi-vector testing.

Figure 4.1 shows a sequence of input vectors applied to a circuit under test, v_0, v_1, v_2, v_3, v_4, ..., and the corresponding responses, r_0, r_1, r_2, r_3, r_4, ..., at the output of the circuit. A set of two vectors, {v_0, v_1}, forms a two-vector test session (2-VTS) for one fault, and the set of three vectors, {v_2, v_3, v_4}, forms a three-vector test session (3-VTS) for another. The output response of the circuit to the two-vector test session is a sequence of two responses, shown as {r_0, r_1}, and the output corresponding to the three-vector test session is a sequence of three responses, {r_2, r_3, r_4}. In classical testing, if a circuit under test fails any single input vector, it is discarded. However, in multi-vector testing, in order for a chip to be discarded, every vector in at least one test session must fail. An erroneous response is denoted by f and an error-free response by p. For the example in Figure 4.1, a sequence of responses r_0, r_1, r_2, r_3, r_4 = f, f, p, f, p will result in discarding the chip, since both vectors in the 2-VTS cause errors. However, the sequence r_0, r_1, r_2, r_3, r_4 = f, p, f, p, p will result in keeping the chip, even though the output responses are erroneous for some vectors.

Definition 4.1. A fault is said to fail an M-VTS if it causes an erroneous response for every vector in the M-VTS.
A fault that fails an M-VTS is considered detected by the M-VTS.

Definition 4.2. A fault is said to pass an M-VTS if it causes an error-free response for at least one of the vectors in the M-VTS.

A fault that passes an M-VTS is considered undetected by the M-VTS.

Figure 4.2: An example multiplexer and the set of all test vectors for all stuck-at faults at its inputs and output.

The main purpose of adopting multi-vector testing is to achieve perfect acceptance gain in error-rate testing. In this approach, for each target fault an M-VTS has to be generated so that the M-VTS fails a circuit with the target fault but passes for as many acceptable faults as possible (ideally for every acceptable fault). Note that within an M-VTS, the order of the test vectors is not important.

Consider the multiplexer and the set of all test vectors for all faults at its inputs and output shown in Figure 4.2. The set of two test vectors, v_1 = (0 1 0) and v_2 = (1 0 0), is a 2-VTS which fails for s SA1 and passes for all other faults. Consider the XOR gate and the set of all test vectors for all faults at its inputs and output shown in Figure 4.3. The set of two test vectors, v_1 = (0 1) and v_2 = (1 0), is a 2-VTS which fails for c SA0 and passes for all other faults.

Figure 4.3: An example XOR gate and the set of all test vectors for all stuck-at faults at its inputs and output.

A series of results is hereby derived to prove that perfect acceptance gain is achievable for any circuit and any fault model using multi-vector testing. To rephrase the definition of the error rate of a fault f, R(f), in terms of the set of all test vectors of f, SATV(f), let the probability of occurrence of a test vector v during normal operation be denoted by p(v). The error rate of fault f can then be defined as follows:

R(f) = Σ_{∀v ∈ SATV(f)} p(v)

For any two faults, f_1 and f_2, if SATV(f_1) ⊆ SATV(f_2), then R(f_1) ≤ R(f_2).

Lemma 4.1.
For every unacceptable fault, f_u, and every acceptable fault, f_a, there exists a vector that detects f_u but does not detect f_a; that is, Δ_ua ≜ SATV(f_u) − SATV(f_a) ≠ ∅. (The symbol ≜ denotes equality by definition.)

Proof. To prove this lemma by contradiction, assume that Δ_ua = ∅. Then SATV(f_u) ⊆ SATV(f_a). This contradicts the fact that the error rate of f_u is greater than the error rate of f_a. Hence, Δ_ua = SATV(f_u) − SATV(f_a) ≠ ∅.

Lemma 4.2. For every unacceptable fault, f_u, there exists an M-VTS that no acceptable fault fails.

Proof. Consider an unacceptable fault, f_u. We proved earlier that for any acceptable fault, f_a, Δ_ua ≠ ∅, and hence there exists a test vector v ∈ Δ_ua. Now consider an M-VTS that includes all vectors for the unacceptable fault, i.e., SATV(f_u). It is easy to see that a circuit with no fault, or with only any one of the acceptable faults, will pass such an M-VTS, while a circuit with fault f_u will fail it.

Lemma 4.3. The size of an M-VTS for an unacceptable fault f_u such that no acceptable fault fails is upper bounded by the minimum of the number of acceptable faults, n_a, and the number of test vectors for fault f_u, i.e., |SATV(f_u)|.

Proof. As per Lemma 4.2, our goals can be achieved by an M-VTS which contains all vectors in SATV(f_u). To ensure passing of every acceptable fault f_a, as per Lemma 4.1, this M-VTS needs to include at least one vector v ∈ Δ_ua. In the worst case, each acceptable fault f_a will require a distinct vector v, leading to a maximum of n_a vectors in the M-VTS. Hence, the upper bound on the size of the M-VTS is min{n_a, |SATV(f_u)|}.

Theorem 4.1. There always exists a set of multi-vector test sessions that will discard any chip with an unacceptable fault without discarding any chip with only an acceptable fault.

Theorem 4.2. The size of such a test set is upper bounded by

Σ_{∀f_u} min{n_a, |SATV(f_u)|}.

Such a test set is a collection of the M-VTS's of all unacceptable faults.
The formulation used here is not intended for implementation and is only a construction for the proof.

Although Theorem 4.1 proves that achieving perfect acceptance gain is possible for all circuits and for any fault model, Theorem 4.2 computes a universal upper bound on the number of vectors which is quite large in some cases. In the following section, a structural approach is presented to study fairly general classes of circuits and compute much tighter bounds on the number of test vectors needed to achieve perfect acceptance gain.

4.3 Multi-vector test generation for single stuck-at faults

In the previous section, it was shown that using multi-vector testing, perfect acceptance gain can be achieved independent of the fault model. In this section, the set of single stuck-at faults is considered. Necessary conditions that need to be satisfied locally at each gate are introduced so that the unacceptable target fault locally fails and all other acceptable faults locally pass. From these local value assignments, (i) an upper bound on the number of vectors required in each multi-vector test session (M-VTS) to achieve perfect acceptance gain is computed, and (ii) a multi-vector test generator for single stuck-at faults that achieves perfect acceptance gain is developed.

Table 4.1: Local detection of a fault at the input of (a) an AND gate, and (b) an XOR gate. Both gates have inputs a and b, and output c.
Gate  Target fault  Local assignment  Locally detected         Locally failing
      at a          at a and b        faults                   M-LVA
AND   a SA0         (1 1)             a SA0, b SA0, c SA0 ^1   a SA0
      a SA1         (0 1)             a SA1, c SA1 ^2          a SA1
XOR   a SA0         (1 1)             a SA0, b SA0, c SA1      a SA0
                    (1 0)             a SA0, b SA1, c SA0
      a SA1         (0 1)             a SA1, b SA0, c SA0      a SA1
                    (0 0)             a SA1, b SA1, c SA1

^1 a SA0, b SA0, and c SA0 are equivalent
^2 c SA1 dominates a SA1

4.3.1 Necessary conditions for multi-vector test generation

In this section, necessary conditions to locally excite a stuck-at fault and propagate its fault effect through a gate, as well as necessary conditions for justification of a value at the output of a gate, are presented. Tables 4.1-4.3 show all possible local value assignments (LVA) that satisfy these objectives. For each local value assignment, all faults that locally get detected are identified. It can be seen from the tables that the assignment of more than one LVA is necessary in some cases to block the effect of potentially acceptable faults. Consider the intersection of the sets of faults detected by different LVAs. A multiple local value assignment (M-LVA) is defined as any subset of the set of all LVAs that fails only for the faults in this intersection. In this section, all faults that can be blocked using appropriate M-LVAs are studied, as well as the faults that fail the M-LVA. In the next section, properties of the faults in the latter category are further studied. It is shown that detection of these faults is inevitable but does not reduce the acceptance gain.

Table 4.2: Local propagation of a fault effect through (a) an AND gate, and (b) an XOR gate. Both gates have inputs e and f, and output g.
Gate  Fault effect  Local assignment  Locally detected         Locally failing
      at e          at f              faults                   M-LVA
AND   D             1                 e SA0, f SA0, g SA0 ^1   e SA0
      D̄             1                 e SA1, g SA1 ^2          e SA1
      D and D̄       (D) 1             e SA0, f SA0, g SA0      –
                    (D̄) 1             e SA1, g SA1
XOR   D             1                 e SA0, f SA0, g SA1      e SA0
                    0                 e SA0, f SA1, g SA0
      D̄             1                 e SA1, f SA0, g SA0      e SA1
                    0                 e SA1, f SA1, g SA1
      D and D̄       (D) 1             e SA0, f SA0, g SA1      – ^***
                    (D) 0             e SA0, f SA1, g SA0
                    (D̄) 1             e SA1, f SA0, g SA0
                    (D̄) 0             e SA1, f SA1, g SA1

^1 e SA0, f SA0, and g SA0 are equivalent
^2 e SA1 dominates g SA1
*** In this case, any three out of these four LVAs constitute an M-LVA, and each such M-LVA does not locally fail for any local fault.

Table 4.1 shows the LVAs that locally detect a fault located at an input of an AND gate and an XOR gate. The table also shows the set of stuck-at faults locally detected by each LVA and the intersection of such sets over all possible LVAs, i.e., the faults that locally fail the M-LVA. In particular, for the AND gate, for each fault the M-LVA contains only one LVA. In contrast, for the XOR gate, for each fault the M-LVA has two LVAs. Table 4.1 also shows that in each case, all faults except the target unacceptable fault, or a fault that dominates it or is equivalent to it and hence is also unacceptable, can be blocked using an M-LVA.

Now, consider a gate on the propagation path from the fault site to the primary outputs. Table 4.2 shows that in the case where the fault effect to be propagated

Table 4.3: Justification of a value at the output of (a) an AND gate, and (b) an XOR gate. Both gates have inputs i and j, and output k.
Gate  Justify value  Local assignment  Locally detected        Locally failing
      at k           at i and j        faults                  M-LVA
AND   1              (1 1)             i SA0, j SA0, k SA0 ^1  k SA0
      0              (0 0)             k SA1                   k SA1 ^*
                     (0 1)             i SA1, k SA1
                     (1 0)             j SA1, k SA1
      1 and 0        (1 1)             i SA0, j SA0, k SA0 ^1  – ^**
                     (0 0)             k SA1
                     (0 1)             i SA1, k SA1
                     (1 0)             j SA1, k SA1
XOR   1              (0 1)             i SA1, j SA0, k SA0     k SA0
                     (1 0)             i SA0, j SA1, k SA0
      0              (0 0)             i SA1, j SA1, k SA1     k SA1
                     (1 1)             i SA0, j SA0, k SA1
      1 and 0        (1 0)             i SA0, j SA1, k SA0     – ^***
                     (0 0)             i SA1, j SA1, k SA1
                     (1 1)             i SA0, j SA0, k SA1
                     (0 1)             i SA1, j SA0, k SA0

^1 i SA0, j SA0, and k SA0 are equivalent
* In this case, the first LVA alone constitutes an M-LVA.
** In this case, the first LVA together with any one of the other three LVAs constitutes an M-LVA, which does not locally fail for any local fault.
*** In this case, any three out of these four LVAs constitute an M-LVA, and each such M-LVA does not locally fail for any local fault.

from the input of the gate is D for every LVA (in the M-LVA), a SA0 (or a fault that is equivalent to it) at the input line locally fails the M-LVA. In the case where the fault effect at the input is D̄ for every LVA (in the M-LVA), a SA1 (or a fault that dominates it) at the input line locally fails. In cases where the fault effect at the input is D for some LVAs (in the M-LVA) and D̄ for some others, no fault locally fails the M-LVA.

Now, consider a gate in the transitive fanin of the fault site or in the transitive fanin of an off-path input (i.e., an input that is not on the propagation path) of an on-path gate (i.e., a gate that is on the propagation path). Table 4.3 shows that in the case where the value to be justified at the output of the gate is 1 for every LVA (in the M-LVA), a SA0 at the output (or a fault that is equivalent to it) fails the M-LVA. In the case where the value to be justified is 0 for every LVA (in the M-LVA), a SA1 at the output fails the M-LVA.
In cases where the value to be justified is 0 for some LVAs (in the M-LVA) and 1 for some others, no fault fails the M-LVA.

Tables 4.1-4.3 capture the properties of a two-input AND gate and a two-input XOR gate, as examples of primitive and complex gates, respectively. These results can easily be extended to other primitive and complex gates. Complex gates can be divided into two categories: those with no internal fanout (i.e., those that can be implemented using primitive gates without any internal fanout), namely and-or-invert (AOI) and or-and-invert (OAI) gates, and the class of more complicated commonly used complex gates with internal reconvergent fanout, namely the multiplexer and the XOR and XNOR gates. The results presented here can easily be extended to the former by replacing each complex gate with an implementation of its logic function using multiple primitive gates. Similar results can be derived for the latter group of complex gates, XNORs and multiplexers, in the same manner as done here for XOR gates.

Figure 4.4: A generic fanout-free circuit.

In the next section, the necessary conditions from Tables 4.1-4.3 are combined with other properties of some families of circuits to find an upper bound on the size of an M-VTS that can achieve perfect acceptance gain.

4.3.2 Upper bound on the size of M-VTS

In this section, an upper bound is computed on the size of an M-VTS that achieves perfect acceptance gain for different classes of circuits. Circuits with no fanout are considered first. Consider a fault at the input of a gate G_1 in the generic fanout-free circuit shown in Figure 4.4. According to Table 4.1, if G_1 is an AND gate, the faults that locally get detected by the only possible local value assignment (i.e., a 1-LVA) are either equivalent to the target fault or dominate it, and hence are unacceptable.
In case G_1 is an XOR gate, however, since there is no information available on the acceptability of faults at the inputs and the output of the gate, the 2-LVA shown in Table 4.1, i.e., {(1 1), (1 0)} for SA0 or {(0 1), (0 0)} for SA1, must be used so that the target fault is the only fault that locally fails the multiple value assignment.

Now consider gate G_2 on the propagation path. Table 4.2 shows that if G_2 is an AND gate, in case the fault effect propagated to the input of G_2 is D or D̄ for every LVA (in the M-LVA), all faults that get locally detected are either equivalent to the fault at the input of G_2 or dominate it.

Lemma 4.4. Consider a fanout-free combinational circuit with primitive and complex gates, including commonly used gates such as arbitrary AOIs, OAIs, XORs, XNORs, and multiplexers. Consider a single stuck-at fault at line x, and a line y on the unique propagation path from x to the output. If the fault effect at line x propagates to line y with the same faulty value for all test vectors, the single stuck-at fault that corresponds to the faulty value at line y dominates the fault at line x.

Proof. Consider SAα at line x and a vector v_t that detects x SAα. Assume that the effect of x SAα propagates to line y as faulty value β̄/β (the faulty value is D if β = 0 and D̄ if β = 1). According to Lemma 3.1, v_t detects SAβ at line y. Since every test vector of x SAα detects y SAβ and the circuit has only one output, the response of the circuit for the two faults is the same; hence, y SAβ dominates x SAα.

Based on Lemma 4.4, the fault at the input of G_2 for this case dominates the target fault. Therefore, all faults that locally fail are unacceptable. In case the propagated fault effect at an input of G_2 is D for some LVAs (in the current M-LVA) and D̄ for some others, there is no fault that locally fails the only available 2-LVA.
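The dominance relation used here, together with the ordering R(f_1) ≤ R(f_2) whenever SATV(f_1) ⊆ SATV(f_2), can be checked exhaustively on a toy circuit. The two-AND example below (out = AND(AND(a, b), c), with internal line n1) is illustrative only and not a circuit from the dissertation:

```python
from itertools import product

# Sketch: check dominance/equivalence by computing SATV(f), the set of all
# test vectors of a fault, via exhaustive simulation of a toy circuit.

def simulate(vec, fault=None):
    """Return the output for inputs (a, b, c), with an optional stuck line."""
    vals = dict(zip('abc', vec))
    if fault and fault[0] in vals:
        vals[fault[0]] = fault[1]
    vals['n1'] = vals['a'] & vals['b']
    if fault and fault[0] == 'n1':
        vals['n1'] = fault[1]
    vals['out'] = vals['n1'] & vals['c']
    if fault and fault[0] == 'out':
        vals['out'] = fault[1]
    return vals['out']

def satv(fault):
    """All input vectors whose faulty response differs from the good one."""
    return {v for v in product((0, 1), repeat=3)
            if simulate(v, fault) != simulate(v)}

# a SA1 is detected only by (0, 1, 1); out SA1 by every vector with out = 0,
# so SATV(a SA1) ⊆ SATV(out SA1): out SA1 dominates a SA1.
print(satv(('a', 1)))                      # {(0, 1, 1)}
print(satv(('a', 1)) <= satv(('out', 1)))  # True
# a SA0, n1 SA0, and out SA0 share the single test (1, 1, 1): equivalence.
print(satv(('a', 0)) == satv(('n1', 0)) == satv(('out', 0)))  # True
```

The subset relation immediately gives R(a SA1) ≤ R(out SA1) for any vector-occurrence probabilities p(v), which is why faults that dominate an unacceptable target are themselves unacceptable.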
Now, if G_2 is an XOR gate and the faulty value at the input is D (or D̄) for every LVA (in the current M-LVA), a 2-LVA is required so that the only fault that fails is the one at the on-path input of G_2, which according to Lemma 4.4 dominates the target fault. In the case where the faulty value at the input is D for some LVAs (in the current M-LVA) and D̄ for some others, it takes a 3-LVA to prevent the fault at the on-path input from getting locally detected. Any three of the four local value assignment options shown in Table 4.2 can be used as a 3-LVA.

Consider gate G_3 in the transitive fanin of the fault site. Assume that G_3 is an AND gate with a value to be justified at its output. According to Table 4.3, there is only one option for justifying a 1 at the output of the gate, and this results in detection of faults that are all equivalent to the SA0 fault at the output of the gate. The table also shows the case where the value to be justified at the output of the AND gate is 0 for every vector in the current M-LVA. In this case a 1-LVA, {(0 0)}, will result in detection of only the SA1 fault at the output of the gate. In the case of an XOR gate, when the value to be justified is 1 for every LVA, the 2-LVA {(0 1), (1 0)} will result in detection of only the SA0 at the output of the gate; when it is 0 for every LVA, the 2-LVA {(0 0), (1 1)} will result in detection of only the SA1 at the output of the gate.

Lemma 4.5. Consider a fanout-free circuit with primitive and complex gates, including commonly used gates such as arbitrary AOIs, OAIs, XORs, XNORs, and multiplexers. Consider a single stuck-at fault at line x, and a line y in the transitive fanin of line x. If excitation of the fault at line x results in justification of the same value at line y for all test vectors, the single stuck-at fault that corresponds to the value at line y is equivalent to the fault at line x.

Proof. Consider SAα at line x and SAβ at line y.
Consider the path from line y to line x in the fanout-free subcircuit in the fanin of line x. If all vectors that detect x SAα imply β̄ at line y, then all gates on this path are primitive gates and all values on the side inputs of these gates are non-controlling values of the gates.

Consider a vector v_t that detects x SAα. This vector assigns ᾱ to line x and implies β̄ at line y. All side inputs on the path from y to x have non-controlling values. Hence, the effect of y SAβ is excited and propagates to line x, and it further propagates to the output under v_t. Therefore, v_t detects y SAβ.

Next, consider a vector that detects y SAβ. All side inputs of the gates on the path from y to x have non-controlling values. Therefore, the fault effect is propagated to line x as ᾱ/α. Hence, fault x SAα is excited and its effect propagates to the output, i.e., the vector detects x SAα.

Every vector that detects x SAα thus detects y SAβ, and vice versa. The circuit has a single output; therefore, the two faults are equivalent.

Whether G_3 is an AND gate or an XOR gate, based on Lemma 4.5 it can be concluded that if the fault at the output of G_3 inevitably fails any M-LVA, then that fault is equivalent to the target fault. In case the value to be justified is 0 for some vectors in the M-LVA and 1 for others, if G_3 is an AND gate, a 2-LVA, {(1 1), (0 0)}, guarantees that no fault will locally fail the M-LVA. If G_3 is an XOR gate, a 3-LVA, {(0 1), (0 0), (1 1)} (or any other three of the four LVAs from Table 4.3), guarantees that no fault locally fails the M-LVA.

Finally, consider gate G_4 in the transitive fanin of a side-input of an on-path gate. The justification of a value at the output of G_4 is exactly the same as in the case of G_3. The only difference is that if a fault at the output of G_4 inevitably fails any M-LVA, then the fault is equivalent to the fault at the side-input of the on-path gate that gets detected.
It was proven earlier, in Lemma 3.3, that a fault at the side-input of an on-path gate gets detected only if it is equivalent to the detected fault at the on-path input of the gate, which, based on Lemma 3.2, in turn is equivalent to or dominates the target fault. Therefore, any fault in the transitive fanin of the side-inputs of on-path gates that locally fails an M-VTS is unacceptable.

The above four gate locations represent all categories of gates that are encountered when generating a test for any single stuck-at fault in a fanout-free circuit. It can be concluded from the above arguments that by using M-LVAs of maximum size 3, a target fault can get excited, its effect can get locally propagated to the output, and all unjustified lines can be justified, without any acceptable fault locally failing the M-LVAs. It is important to note that the required size of an M-VTS that fails for the target fault and passes for all acceptable faults is merely the maximum size of the M-LVAs, and not their product. Since the circuit is fanout-free, based on Property 3.4, there is no conflict between the assignments of derived values. The following simple example describes this more clearly.

Figure 4.5: Example circuit to derive an upper bound on the size of an M-VTS.

Consider the example circuit with three XOR gates, shown in Figure 4.5. Consider a SA1 at the input of gate G_2, line c. Assume that all other single stuck-at faults in the circuit are acceptable. According to Table 4.1, a 2-LVA = {(0 0), (0 1)} is required at lines c and d to excite the fault and propagate its effect through G_2. The LVA at line e is implied as {D̄, D}. According to Table 4.2, since the value implied at line e is D for some LVAs (in the M-LVA) and D̄ for some others, a 3-LVA (any three of the four LVAs from the table) is required to locally propagate the fault effect through G_3. It is easy to observe that two of the LVAs in this 3-LVA can be shared with the LVAs in the 2-LVA used to excite the fault effect.
Therefore, only one vector is added to the M-VTS. Similarly, it can be seen that the 2-LVA used to justify a 0 at the output of G_1, line c, shares two LVAs with the 2-LVA used to excite the fault effect at G_2. Hence the total number of vectors required is 3 (and not 2 × 3 × 2 = 12). The case of other primitive and complex gates can be dealt with in a similar manner.

Theorem 4.3. For an unacceptable single stuck-at fault in a fanout-free circuit with primitive and complex gates, including commonly used gates such as arbitrary AOIs, OAIs, XORs, XNORs, and multiplexers, an M-VTS with three or fewer vectors is guaranteed to fail for an unacceptable fault f_u and pass for all acceptable single stuck-at faults.

Figure 4.6: Generic circuit with non-reconvergent fanout.

Now consider the generic circuit with non-reconvergent fanout in Figure 4.6 and a single stuck-at fault at the stem of fanout F_1. According to Section 3.2.5.2, it may not be possible to detect this fault using a 1-VTS without detecting acceptable faults at some of the lines on the paths from the target fault site to the outputs. However, it is possible to find at least one vector that blocks the effect of an acceptable fault at a branch from propagating to a primary output while propagating the effect of the target fault at the stem through another branch to another primary output. With multiple fanouts on the paths from the fault site to the outputs, the size of such an M-VTS grows rapidly. This motivates using the notion of output-masking to reduce the size of an M-VTS that does not fail for any acceptable fault. In output-masking, propagation of the fault effect through one path is considered and the corresponding output is watched for an error, i.e., any possible fault effect at other outputs is ignored.

In this case, where the target fault is at the stem of a fanout, one path emerging from the fanout F_1 is chosen. LVAs on this path are used to propagate the fault effect to the output.
The primary output at the end of this path is set as the observation point, and all unjustified lines are justified to obtain a test vector for the target fault. In the next phase, a distinct path emerging from the fanout F_1 is chosen, and the process of propagation, justification, and output-masking is repeated to obtain another test vector. Any fault on each of the two distinct paths will only be propagated to one of the observation points. Therefore, no fault at any line on either of the paths, or in the transitive fanin of the side-inputs of a gate on either of these paths, fails a 2-VTS obtained in this manner. In general, along the distinct propagation paths, independent of the circuit elements encountered, no M-LVAs are necessary.

Lemma 4.6. In a circuit with no reconvergent fanout, with primitive and complex gates, including commonly used gates such as arbitrary AOIs, OAIs, XORs, XNORs, and multiplexers, the error rate of a single stuck-at fault at the stem of every fanout is greater than or equal to the error rate of the single stuck-at fault with the same faulty value at its branch.

Proof. At the fanin of the target fault site, values can be justified according to Table 4.3. According to Lemmas 4.4 and 4.5, a fault in the transitive fanin of the target fault site, if detected, is either equivalent to the target fault or dominates it.

Theorem 4.4. Using the notion of output-masking and multi-vector testing, for a circuit with non-reconvergent fanouts and primitive and complex gates, including arbitrary AOIs, OAIs, XORs, XNORs, and multiplexers, an M-VTS with three or fewer vectors is guaranteed to fail for an unacceptable fault, f_u, and pass for all acceptable single stuck-at faults.

Note that the LVAs in Tables 4.1-4.3 are derived assuming that all potentially acceptable faults are actually acceptable.
However, in a real circuit, knowing the list of acceptable and unacceptable faults, the number of required vectors in each M-LVA so that no acceptable fault fails might be less than the upper bound computed in this section.

The above approach to finding the upper bound on the size of an M-VTS is not applicable to circuits with reconvergent fanouts. However, in the next section, a multi-vector test generator for arbitrary circuits is proposed.

4.3.3 Multi-vector test generator for single stuck-at faults

In this section, a multi-vector test generation approach is described that can be used to generate tests that provide perfect error-rate testing of stuck-at faults. The foundations of this test generator, like any other, are three key concepts, namely fault effect excitation, fault effect propagation, and value justification. However, this test generator is different from others since, in a single pass, it generates an M-VTS containing as many vectors as required to perform perfect error-rate testing.

Figure 4.7: Example circuit for multi-vector test generation.

The test generator starts by generating an M-LVA to excite the fault effect at the fault site. Then the fault effect must get propagated to the outputs, and the values must get justified at unjustified lines. During each of these three tasks, the necessary conditions, namely blockage of all acceptable fault effects, are checked. If, according to Tables 4.1-4.3, an additional LVA is necessary, one additional vector is added by the test generation procedure.

The test generation algorithm is based on the D-algorithm (see [35] for details). At each step of the search, a possible test generation subtask (TGST), such as fault effect excitation, fault effect propagation, or line justification, is identified and pursued. The values at the outputs of the CUT, the gates in the D-frontier, and the unjustified lines are analyzed to find out whether the test generation has been successful, a conflict has occurred, or the search has to continue.
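The "check for necessary conditions" step amounts to recomputing, for the local gate, which faults every admissible LVA detects, and intersecting those sets, as in Tables 4.1-4.3. A brute-force version of that computation for two-input gates is sketched below; it is an illustration, not the dissertation's implementation, and the line names 'a', 'b', 'c' follow Table 4.1.

```python
from itertools import product

# Sketch: recompute the intersection behind the M-LVA columns of Table 4.1
# for a two-input gate. A fault is (line, stuck_value) with lines 'a', 'b'
# (inputs) and 'c' (output).

def output_under(gate, va, vb, fault=None):
    """Gate output with an optional single stuck-at fault applied."""
    if fault:
        line, v = fault
        if line == 'a':
            va = v
        if line == 'b':
            vb = v
    out = gate(va, vb)
    if fault and fault[0] == 'c':
        out = fault[1]
    return out

def locally_detected(gate, va, vb):
    """All stuck-at faults whose effect shows at the gate output for (va, vb)."""
    good = output_under(gate, va, vb)
    return {(l, v) for l in 'abc' for v in (0, 1)
            if output_under(gate, va, vb, (l, v)) != good}

def mlva_failing(gate, target):
    """LVAs that detect the target, and the faults failing EVERY such LVA."""
    lvas = [(va, vb) for va, vb in product((0, 1), repeat=2)
            if target in locally_detected(gate, va, vb)]
    failing = set.intersection(*(locally_detected(gate, *lva) for lva in lvas))
    return lvas, failing

AND = lambda a, b: a & b
XOR = lambda a, b: a ^ b
lvas, failing = mlva_failing(AND, ('a', 0))
print(lvas)                                        # [(1, 1)]
print(failing == {('a', 0), ('b', 0), ('c', 0)})   # True: an equivalence class
lvas, failing = mlva_failing(XOR, ('a', 0))
print(lvas, failing)                               # [(1, 0), (1, 1)] {('a', 0)}
```

The AND result reproduces the first row of Table 4.1 (the three failing faults are mutually equivalent), and the XOR result shows why a 2-LVA is needed there: only the intersection of the two detected-fault sets, the target itself, fails both assignments.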
Here, the proposed approach is explained using a simple example circuit with three XOR gates, shown in Figure 4.7. Consider the SA1 at line c. Assume that all other single stuck-at faults in the circuit are acceptable. In Figures 4.8-4.10 the main search tree of the test generator is illustrated, with each TGST as a node and the alternative choices as branches. In this search tree, the set of gates in the D-frontier is referred to as D-front, and the set of all unjustified gates as U-list. The value of a line x is denoted by V(x). Fault effect excitation and fault effect propagation are denoted by FEE and FEP, respectively. VA and NC denote value assignment and check for necessary conditions, respectively.

According to Figures 4.8-4.10, the first task to be completed is excitation of the fault effect at line c and local propagation of the fault effect to the output of G_2. For simplicity, only one branch of the tree is shown at each point where alternatives exist. From Table 4.1, (0 1) and (0 0) are the two LVAs that excite and propagate this fault effect to the output of G_2 without detecting any additional acceptable faults. Consider the first LVA. It is clear that using this LVA, an acceptable SA1 fault at line d as well as an acceptable SA1 at line e get locally detected. Therefore, it is required to add a second vector to the test. The search tree of the second vector is rooted at this node in the first tree, meaning that if the second vector generation returns a conflict, the procedure returns to this node in the first search tree and takes another alternative. The rest of the test generation for the first vector completes, while only identifying the nodes where a second vector is required, without adding a vector at these points. Once the test generation for the first vector is completed, the second vector generation starts. In this example, once gate G_1 is justified for the first vector, the next task is identified as adding a second vector to the test.
The D-frontier and the list of unjustified lines are identified for the second vector, and the next task is identified as fault effect propagation through G_3. Once again, the necessary conditions at G_3 are checked. An acceptable SA1 at line f locally fails the 2-LVA. Therefore, a third vector is required. Note that the subtask tree of the third vector is rooted at this node in the second tree.

Figure 4.8: First search tree for example execution of multi-vector test generator.

Figure 4.9: Second search tree for example execution of multi-vector test generator.

Figure 4.10: Third search tree for example execution of multi-vector test generator.

The rest of the test generation for the second vector continues. In order to justify 0 at the output of G_1 for the second vector, according to Table 4.3, there are two LVAs, (0 0) and (1 1). The necessary conditions are checked, and the SA0 faults at lines a and b are detected by the LVA assigned to a and b by the first vector. Therefore, the second vector chooses (0 0) as its LVA. After justification of G_1, the local necessary conditions are satisfied. The next task is adding the third test vector. According to the third search tree, the next task is identified as justification of G_1, while there are no other necessary conditions to be satisfied; hence any of the LVAs can be assigned to a and b. Once G_1 is justified, the test generation task returns a success.

Consider a case where the third search tree returns a failure. The search algorithm must backtrack to the node in the second search tree which is marked by (**) and choose another alternative for propagation of the fault effect through G_3, namely assigning 1 to line f. Using the above test generator, as many vectors as are required for perfect error-rate testing of a circuit can be generated in one pass.
4.4 Experimental results

It was proven earlier that, using the notion of multi-vector testing, perfect acceptance gain and perfect unacceptable coverage can be achieved for all circuits. The upper bound on the minimum number of vectors needed in each test session for the general case (see Theorem 4.2) is extremely pessimistic. Hence, in Section 4.3, several sub-families of circuits are studied to discover that the upper bound for a class of circuits that uses the most commonly used gates and has no reconvergent fanouts is three vectors. Since the proposed analytical method cannot cope with the constraints that are put on the local value assignments due to reconvergent fanouts, a number of experiments are performed on benchmark circuits to empirically determine the number of vectors required for general circuits.

In these experiments, first the sets of acceptable and unacceptable stuck-at faults are identified by performing exhaustive simulations on the circuit and counting the percentage of vectors for which a fault causes an error at the output. (It is assumed that during normal circuit operation, vectors occur with uniform distribution.) One unacceptable fault is targeted at a time, and the test generator proposed in Chapter 3, ERTG, is used to generate a test vector. ERTG is specially designed to use heuristics at decision points in the test generation procedure to reduce the number of acceptable faults that get detected by a test vector it generates for a target unacceptable fault. Fault simulation is performed to identify the set of acceptable faults that are detected by this vector. Test vectors are generated for the target fault until the intersection of the sets of acceptable faults detected by these test vectors is empty. The entire set of vectors is saved as an M-VTS for the target unacceptable fault, and the entire process is repeated with the next unacceptable fault as the target.
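The per-fault loop just described can be sketched as follows; generate_test and fault_simulate are hypothetical stand-ins for ERTG and the fault simulator, so only the intersection bookkeeping and the stopping rules are meant literally.

```python
# Sketch of the per-fault M-VTS construction described above. The helpers
# generate_test and fault_simulate are illustrative stand-ins for ERTG and
# the fault simulator; only the intersection logic is concrete.

def build_mvts(target, acceptable, generate_test, fault_simulate, max_size=16):
    mvts = []                     # vectors collected for this target fault
    surviving = set(acceptable)   # acceptable faults detected by every vector so far
    while len(mvts) < max_size:
        v = generate_test(target)
        if v is None:             # no further alternatives in the search tree
            break
        mvts.append(v)
        surviving &= fault_simulate(v, acceptable)
        if not surviving:         # empty intersection: every acceptable fault passes
            break
    return mvts, surviving
```

The loop stops either when every acceptable fault passes at least one vector (empty intersection) or when the cap on the session size is reached, mirroring the two stopping conditions discussed above.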
When an M-VTS is found for every unacceptable fault in the circuit, the total number of acceptable faults that pass the test set is reported as the acceptance gain.

Several factors can change the results of this experiment in terms of total acceptance gain. One factor is the maximum size allowed for each M-VTS. Test generation for a target fault continues until the intersection of the sets of acceptable faults detected by each vector is empty. In the simulations, an upper bound is put on the number of test vectors generated for each target fault. Test generation for an unacceptable target fault might reach this upper bound before an empty intersection is found; hence, some acceptable faults might fail the test set and the acceptance gain might fall below 100%. In the ideal case where there is no upper bound on the number of test vectors generated for each unacceptable fault, an acceptance gain of 100% was previously proven to be guaranteed.

Another factor that can reduce the acceptance gain from the expected 100% is the manner in which test vectors are generated. Once a vector is generated for an unacceptable fault, the procedure backtracks to a certain point in the search tree, takes a second alternative, and generates a new vector. If the backtracking is done in smaller steps, a greater percentage of all test vectors of the target fault will be generated. However, the empty intersection of the sets of acceptable faults might be reached at a slower pace, because the smaller the backtracking steps, the more similar the generated test vectors are to one another, and the more similar are the sets of acceptable faults detected by each of them.

In these experiments, two extreme values of the number of test vectors skipped during backtracks are used. In the first experiment, after a test is generated for a target fault, the next test vector is generated by propagating the fault effect via another path. In the second experiment, we simply backtrack one step to generate the next test vector.
Results for these two experiments are shown in Table 4.4. In both experiments, the original acceptance gain refers to the acceptance gain achievable using a set of single test vectors, each generated for a target fault using the specially designed ERTG. The fourth column shows the acceptance gain using multi-vector testing. The fifth and sixth columns show the maximum and average size of an M-VTS needed for an unacceptable fault to achieve the corresponding acceptance gain.

For the first set of experiments, the test generator failed to generate a sufficient number of vectors for a target fault to achieve 100% acceptance gain. However, the acceptance gain is high and the maximum M-VTS size is smaller, which confirms the intuition. Results for the second set of experiments show that if no vectors are skipped while generating test vectors for multi-vector testing, an acceptance gain of 100% is achievable. The maximum and average sizes of the M-VTS are, as expected, larger in this case.

Table 4.4: Acceptance gain and size of M-VTS for ISCAS85 circuits; T_er = 0.1.

              Acceptable      Acceptance gain(%)(2)    M-VTS size
Circuit       faults(%)(1)    Original    M-VTS        Max     Avg
Experiment 1
c880          40.2            83.0        94.7         2       1.24
c1355         65.6            74.8        94.8         3       1.74
c1908         44.8            74.5        89.9         3       1.46
c3540         64.0            75.8        95.5         4       1.83
c5315         72.1            71.7        91.9         3       1.65
Experiment 2
c880          40.2            83.0        100          4       1.30
c1355         65.6            74.8        100          5       1.30
c1908         44.8            74.5        100          12      4.81
c2670         48.2            83.7        100          5       2.9
c3540         64.0            75.8        100          15      5.76
c5315         72.1            71.7        100          15      3.87
(1) percentage of all faults
(2) percentage of acceptable faults

4.5 Summary

It was previously shown that perfect error-rate testing in terms of acceptance gain is not achievable for general circuits using single-vector tests. In this chapter, the notion of multi-vector testing was proposed, under which the notions of failing and passing a test acquire new definitions. It was proven that, using the notion of multi-vector testing, perfect acceptance gain is achievable for all families of circuits, independent of the fault model.
An upper bound was derived on the number of vectors needed in each test session to meet this promise. The upper bound derived for the general class of circuits appears to be extremely loose for most circuits. Therefore, a structural approach was proposed to derive much tighter bounds on the number of vectors for special families of circuits. Results show that, in practice, the promise of perfect acceptance gain can be met with far fewer vectors per test session than the general upper bound provided by the theory. A multi-vector test generator was developed for error-rate testing for the set of single stuck-at faults.

The anticipated objectives of error-rate testing are therefore achieved: (i) full coverage of unacceptable faults; (ii) zero coverage of acceptable faults; (iii) low test generation time; and (iv) low test application time.

Various optimizations can be carried out to combine multiple sessions or compact the length of the test sessions and subsequently reduce the test application time. For instance, as soon as a CUT passes a vector in a test session, we can move on to applying the next session. Another obvious method of compaction is to eliminate a test session whose set of vectors is a superset of another session's, since a fault that causes errors for all vectors of the superset session also does so for the smaller session. Analysis and further development of these methods is a subject of future research.

In this work, the notion of multi-vector testing is used for perfect error-rate testing. However, this notion has many other potential applications: it can be used in any application where generation of a constellation of vectors for a fault is required. One such application is fault diagnosis; another is n-detection of a fault.

Chapter 5

Error-rate testing for delay faults

5.1 Introduction

The notion of error-rate testing was previously introduced, and approaches were developed to generate tests that detect all unacceptable faults while detecting no acceptable fault.
It was theoretically proven that perfect error-rate testing in terms of acceptance gain can be achieved using multi-vector testing, independent of the fault model. However, single-vector and multi-vector test generation algorithms were developed only for single stuck-at faults.

The single stuck-at fault model continues to be the most commonly used model for test generation in classical testing for the following important reasons. Compared to other fault models, the number of faults in this model is small, and sets of tests that detect a large number of single stuck-at faults typically detect a large portion of real fabrication defects [1, 35].

Although the single stuck-at fault model is considered useful for test generation, it does not accurately characterize the behavior of circuits with real fabrication defects [23, 42, 66]. Furthermore, it was discussed in [32, 44, 45, 46] that high coverage of single stuck-at faults does not always translate to good test quality levels. Fault analysis approaches, proposed in [23, 24, 25, 43, 64], take into account the fabrication defect statistics and circuit layout to determine the faults that are likely to occur in a VLSI circuit. According to these studies, at the physical level, faults that are likely to occur in a circuit include bridges, breaks, missing devices, and extra devices. At the circuit level, the likely defects can be modeled with the following fault models: stuck-open fault (SOpF), stuck-on fault (SOnF), single stuck-at fault (SSAF), multiple stuck-at fault (MSAF), two-line bridge fault (including feedback (FBF), non-feedback (NFBF), wired-AND, wired-OR, driver-strength, and Byzantine), multiple-line bridge fault, transition delay fault (TF), gate delay fault (GDF), and path delay fault (PDF).

As mentioned earlier, in classical testing, tests generated for single stuck-at faults are often found to also provide high coverage of more realistic modeled faults.
However, the objectives of error-rate testing require that tests not only provide high coverage of unacceptable faults but also provide low coverage of acceptable ones. Therefore, error-rate testing must be further studied in this context.

Previously, the focus was on failures that alter the logic function realized by the circuit, assuming a very slow clock rate and hence ignoring defects within the circuit that merely cause timing malfunctions. However, in the context of error tolerance it is necessary to target timing malfunctions within a circuit before considering functional malfunctions, since a larger number of timing defects than functional defects contribute to yield loss. Therefore, in this chapter, defects that cause timing malfunctions are studied, and the theory and algorithms of error-rate testing are extended to such defects.

Next, the delay fault model that is considered for error-rate testing is introduced. Detection of delay faults, and the relationship between the error rate of a delay fault and the error rate of its corresponding single stuck-at fault, is discussed in Section 5.3. The effectiveness of a set of tests generated for single stuck-at faults is assessed on delay faults in terms of the objectives of error-rate testing. In Section 5.4, a method of test generation based on the notion of multi-vector testing is proposed to achieve perfect error-rate testing for delay faults.

5.2 Delay faults and error-rate testing

The clock frequency of a digital system is usually maximized by the designer in order to obtain the highest performance from the hardware. Consider a combinational block, surrounded by input and output flip-flops, as shown in Figure 5.1. During normal operation of this circuit, the same clock is applied to the input and output flip-flops, and the system clock is determined by the frequency of the flip-flop clock, which in turn is determined by the maximum propagation delay of the combinational logic block. Two-pattern tests are required for delay testing of combinational circuits.
During testing of this block, two separate clocks are used, clk_1 and clk_2. The two clocks run at a frequency slower than the normal clock. These clocks are skewed by the period of the system clock (T_sys-clk).

Figure 5.1: An example combinational block with input and output flip-flops.

At time t_0 an initializing input vector v_i is applied to initialize the circuit lines to appropriate values. The circuit is allowed sufficient time so that the lines attain the values implied by v_i. Then, at time t_1, a propagating vector v_p is applied to launch a transition at the target site and to robustly propagate it to the outputs. Sufficient time must elapse for the fault effect to be propagated to the outputs. At time t_2 the outputs are sampled and errors at the outputs are identified. For simplicity, in this work, it is assumed that errors are only observed for the propagating vector, v_p. This test application methodology, shown in Figure 5.2(a), is referred to as slow-fast robust testing [31].

In practice, manufacturers widely use at-speed testing to detect delay faults. A sequence of tests is generated using a BIST test pattern generator and is applied at the normal circuit speed (see Figure 5.2(b)). Any two consecutive vectors are considered a pair of initializing and propagating vectors. However, the time that elapses between the two vectors (T_sys-clk) might not be sufficient for proper initialization of values; hence, the test might be invalidated.

Figure 5.2: Delay test application methodologies.

For the rest of this work, slow-fast testing is considered as our test application methodology.

5.2.1 Delay fault models

In order to find the two-pattern tests [v_i - v_p] for delay defects, these defects must be modeled with delay faults. Following is a brief description of the different delay fault models proposed in the literature.
According to [41] there are three classical delay fault models to represent delay defects: the transition delay fault, the gate delay fault, and the path delay fault. Each of these models has its own limitations and advantages.

The transition delay fault (TF) represents a defect that increases the delay of a rising or a falling transition from an input of a gate to its output. If the defect increases the delay of a rising transition, the fault is referred to as a slow-to-rise (STR) fault; if it increases the delay of a falling transition, the fault is referred to as a slow-to-fall (STF) fault. The transition delay fault models defects for which the delay is sufficiently large to cause a logical failure at one or more outputs of the combinational logic block, independent of the propagation path. Transition delay faults are also referred to as gross-delay faults. The main disadvantage of this model is the assumption that the delay defect is large.

The gate delay fault (GDF) represents defects that add a certain amount of additional delay to the propagation time of a transition from an input of a gate to its output. A particular instance of a gate delay fault specifies the amount of additional delay at the fault site, such as a 3-nanosecond delay from an input of a gate to its output.

The path delay fault (PDF) represents a path with a total delay that exceeds the period of the system clock due to the presence of one or multiple defects (or variations) along this path. There are two types of logic path delay faults associated with a single physical path. A rising-path (falling-path) delay fault is associated with the path traversed by a sequence of transitions initiated by a rising (falling) transition at the input of the path.

There are other delay fault models which are essentially derived from the above three models, such as the line delay fault and the segment delay fault. For the purpose of error-rate testing for delay faults, the transition delay fault model is used.
The conditions for detection of a transition delay fault are the conditions for detection of a corresponding single stuck-at fault plus some additional conditions. Therefore, the theory and algorithms of error-rate testing developed for single stuck-at faults can be modified for transition delay faults. It is explained later that the conditions for detection of a gate delay fault can also be expressed in terms of the conditions for detection of transition delay faults plus additional conditions. Extension of the theory and algorithms developed in this section to gate delay faults is a subject of future research.

5.2.2 Error rate of a transition delay fault

A sequence of vectors [v_i - v_p] detects a transition delay fault at a line if the two vectors apply the required transition at the line, and the second vector propagates the transition delay to the primary outputs. Table 5.1 states the conditions under which a sequence of vectors detects a transition delay fault in terms of the vectors' ability to detect the corresponding single stuck-at fault. According to this table, a sequence of vectors [v_i - v_p] will detect a transition delay fault if the first vector initializes the fault site to an appropriate value and the second vector detects the corresponding single stuck-at fault at the line. For example, if a defect causes a slow-to-rise transition at a line b, v_i must imply 0 at line b and v_p must detect a SA0 at line b. We refer to a SA0 (SA1) at a line as the stuck-at fault corresponding to a STR (STF) delay fault at the same line, and vice versa.

From the conditions in Table 5.1, we can derive the error rate of a transition delay fault in terms of the error rate of its corresponding single stuck-at fault. These results are shown in Table 5.2. This formulation is derived based on the assumption that detection of a single stuck-at fault at a line is independent of the value assigned to the same line by a previous vector. This assumption is true provided that the vectors in the sequence are independent.
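As a sanity check of this formulation, the sketch below verifies the STF relation R(f) = P(b=1) R(b SA1) exhaustively for the output of a two-input OR gate; the gate is an illustrative choice, not an example from the text, and uniformly distributed, independent vectors are assumed.

```python
# Exhaustive check of the STF relation of Table 5.2 on the output c of a
# 2-input OR gate (illustrative circuit), assuming independent, uniformly
# distributed input vectors.
from itertools import product
from fractions import Fraction

VECS = list(product([0, 1], repeat=2))

# P(c=1): probability that the fault-free output is 1
p_c1 = Fraction(sum((a | b) == 1 for a, b in VECS), len(VECS))
# R(c SA1): fraction of vectors for which forcing c to 1 causes an output error
r_sa1 = Fraction(sum((a | b) != 1 for a, b in VECS), len(VECS))

# Direct two-pattern count: a STF at c produces an error for [v_i - v_p]
# iff v_i sets c to 1 and v_p detects c SA1 (fault-free output 0 under v_p).
errs = sum((vi[0] | vi[1]) == 1 and (vp[0] | vp[1]) == 0
           for vi, vp in product(VECS, VECS))
r_stf = Fraction(errs, len(VECS) ** 2)
```

The direct two-pattern count and the product P(c=1) R(c SA1) agree, as the independence assumption predicts.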
Table 5.1: Conditions that a sequence of vectors [v_i - v_p] must satisfy to detect a TF, expressed in terms of the vectors' ability to detect the corresponding SSAF [35].

Fault model      Detection conditions
TF at b   STR    v_i implies 0 at b and v_p detects b SA0
          STF    v_i implies 1 at b and v_p detects b SA1

Table 5.2: Error rate of a TF in terms of the error rate of the corresponding SSAF.

Fault model      Error rate in terms of corresponding SSAF
TF at b   STR    R(f) = P(b=0) R(b SA0)
          STF    R(f) = P(b=1) R(b SA1)

As mentioned earlier, the error rate of each delay fault is less than or equal to the error rate of its corresponding single stuck-at fault. Therefore, for a circuit, the average error rate over the population of delay faults is less than or equal to the average error rate over the population of single stuck-at faults. The distribution of error rates of stuck-at faults for a number of benchmark circuits was previously shown in Figure 2.2. The distribution for the delay faults of c880 is shown in Figure 5.3. The curves show that delay faults are concentrated around smaller error-rate values; hence the lower average error rate over the population of delay faults. A similar trend is observed for other ISCAS85 benchmark circuits.

Now consider a threshold error rate T_er. A delay fault is unacceptable if the error rate it causes is greater than or equal to T_er; otherwise, it is acceptable. According to the distributions of error rate for delay and stuck-at faults, for the same threshold value there exist more acceptable delay faults than acceptable stuck-at faults.

Figure 5.3: Distribution of error rate of stuck-at faults and delay faults for a benchmark circuit, c880.

5.2.3 Central question and test application scenario

As previously mentioned, the first question that needs to be answered in test generation for error-rate testing is: Is it possible to generate a set of deterministic single-vector tests that detects every detectable unacceptable fault while detecting none of the acceptable faults?
It was previously shown that this objective can be achieved using multi-vector tests, independent of the fault model (see Section 4.2). Here, single-pair tests are considered for delay faults to find out whether properties similar to those for stuck-at faults (see Section 3.2) exist. In this section, the central question is answered for transition delay faults using the following simple example.

Consider a STR fault at the output of the AND gate shown in Figure 5.4. In order to detect this fault, the initialization vector, v_i, must imply 0 at line c, while the propagation vector, v_p, must detect a SA0 at c.

Figure 5.4: An example AND gate with a slow-to-rise fault at the output.

Pairs [v_i - v_p] that detect this fault are: [(0 0)-(1 1)], [(0 1)-(1 1)], and [(1 0)-(1 1)]. Considering the occurrence of 0 and 1 to be equally likely at inputs a and b, and independence between vectors, the error rate of the STR fault at c can be computed as 3/16. Pairs of vectors that detect a STR are [(0 0)-(1 1)] and [(0 1)-(1 1)]. Pairs of vectors that detect b STR are [(0 0)-(1 1)] and [(1 0)-(1 1)]. The error rates of faults a STR and b STR can both be computed as 2/16.

For a threshold error rate 2/16 < T_er <= 3/16, c STR is unacceptable while a STR and b STR are acceptable faults. However, from the list of vector pairs for each fault it is obvious that it is not possible to detect c STR, an unacceptable fault, without detecting at least one of the acceptable faults, a STR or b STR. Note that c STR dominates a STR and b STR. In general, a slow transition to the non-controlled value at the output of a primitive gate dominates a slow transition to the non-controlling value of the gate at its inputs.

It was previously proven in Chapter 4 that, using the concept of multi-vector testing, independent of the fault model, a multi-vector test session can always be found for each unacceptable fault so that no acceptable fault is detected.
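These error rates can be confirmed by brute force. The sketch below enumerates all 16 independent pairs [v_i - v_p] for the AND gate of Figure 5.4; the helper functions are illustrative stand-ins for a two-pattern fault simulator, not tools from this work.

```python
# Exhaustive check of the STR error rates for the 2-input AND gate of
# Figure 5.4, over all 16 independent pairs [v_i - v_p].
from itertools import product
from fractions import Fraction

VECTORS = list(product([0, 1], repeat=2))

def good_output(v):
    a, b = v
    return a & b

def faulty_output(v, line):
    # Output of the gate when the given line is stuck at 0.
    a, b = v
    if line == "a":
        a = 0
    if line == "b":
        b = 0
    out = a & b
    if line == "c":
        out = 0
    return out

def line_value(v, line):
    return {"a": v[0], "b": v[1], "c": good_output(v)}[line]

def str_error_rate(line):
    # A STR fault at `line` causes an output error for [v_i - v_p] iff
    # v_i sets the line to 0 and v_p detects the SA0 fault at the line.
    errs = sum(line_value(vi, line) == 0 and
               faulty_output(vp, line) != good_output(vp)
               for vi, vp in product(VECTORS, VECTORS))
    return Fraction(errs, len(VECTORS) ** 2)
```

str_error_rate yields 3/16 for the output c and 2/16 for each input, matching the enumeration above; the Table 5.2 product P(c=0) R(c SA0) = (3/4)(1/4) reproduces the same 3/16.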
In the case of delay faults, each test consists of a pair of vectors, [v_i - v_p]; hence a test session consists of multiple pairs of vectors. A multi-pair test session (M-PTS) for a delay fault is analogous to an M-VTS for a stuck-at fault.

Consider a multi-pair test session (M-PTS), {[v_i1 - v_p1], [v_i2 - v_p2], ..., [v_ik - v_pk]}. At the time of test application, first v_i1 is applied to initialize the circuit; when the circuit stabilizes, v_p1 is applied to the circuit. The outputs are sampled and possible errors are noted. Then each of the remaining pairs of v_i and v_p is applied in the same manner. Errors are only noted at the outputs after application of each v_p. The circuit is discarded only if a fault causes an error at the output of the circuit for all k propagation vectors, i.e., for v_p1, v_p2, ..., v_pk. Such a fault is said to fail the M-PTS, and a circuit with such a fault will be discarded. However, if a fault causes an error for fewer than all of the propagation vectors, it is said to pass the M-PTS, and a circuit with such a fault will not be discarded.

In the case of the example AND gate, c STR can be detected using a two-pair test session (2-PTS): {[(0 1)-(1 1)], [(1 0)-(1 1)]}. c STR will fail this 2-PTS, but a STR and b STR will both pass it.

Next, tests that are generated for single stuck-at faults are combined with initializing vectors to form single-pair and multi-pair test sessions for delay faults, to assess the rate of success in achieving the objectives of error-rate testing.

5.3 Coverage of delay faults by stuck-at tests

In this section, the rate of success in achieving the objectives of error-rate testing for delay faults, using a set of test vectors generated for single stuck-at faults, is investigated.
It was discussed earlier that, unlike in classical testing, in error-rate testing the objective is to maximize the coverage of unacceptable faults (i.e., increase unacceptable coverage) as well as to minimize the coverage of acceptable faults (i.e., increase acceptance gain).

According to Table 5.2, the error rate of a transition delay fault is less than or equal to the error rate of its corresponding single stuck-at fault. (The error rate of a STR (STF) fault is less than or equal to the error rate of a SA0 (SA1) fault at the same line.) Based on this observation, delay faults can be categorized as shown in Table 5.3.

Table 5.3: Acceptable and unacceptable TFs and the corresponding SSAFs.

                       Corresponding SSAF
TF              Unacceptable     Acceptable
Unacceptable    Case-1           Case-2 (empty)
Acceptable      Case-3           Case-4

Consider a threshold error rate value T_er and an unacceptable single stuck-at fault, i.e., one with an error rate greater than or equal to T_er. The transition delay fault corresponding to this fault can cause an error rate greater than or equal to T_er, or less than T_er, depending on the probability of a one or a zero at the line; hence, faults in Case-1 and Case-3. If a single stuck-at fault has an error rate less than T_er, the error rate of its corresponding transition delay fault cannot exceed T_er; such a fault belongs to Case-4. There are no faults in Case-2.

Consider a set of test vectors that detects all unacceptable single stuck-at faults and does not detect any of the acceptable single stuck-at faults. Consider pairing each vector of this set with a random vector to form a two-pattern delay test. No fault from Case-4 will be detected by this set; however, faults from Case-1 and Case-3 might be detected. Detection of faults in Case-1 contributes to higher unacceptable delay coverage by the test, and is desirable. However, detection of faults in Case-3 translates to lower acceptance gain, which erodes the benefits of error tolerance.
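Since the error rate of a transition fault never exceeds that of its corresponding stuck-at fault, the case of Table 5.3 follows directly from the two error rates and the threshold; a minimal sketch:

```python
# Classification of a transition fault per Table 5.3, given its error rate,
# the error rate of its corresponding SSAF, and the threshold T_er.
# Because R(TF) <= R(SSAF), Case-2 can never occur.

def classify(tf_rate, ssaf_rate, t_er):
    assert tf_rate <= ssaf_rate
    ssaf_unacceptable = ssaf_rate >= t_er
    tf_unacceptable = tf_rate >= t_er
    if ssaf_unacceptable:
        return "Case-1" if tf_unacceptable else "Case-3"
    return "Case-4"
```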
The maximum degradation in acceptance gain is determined by the total number of unacceptable single stuck-at faults that are associated with acceptable transition delay faults, i.e., the number of faults in Case-3. A number of experiments are designed to estimate the fraction of faults that fall in Case-3 and hence get detected by a test generated for single stuck-at faults.

Consider a transition delay fault the detection of which requires initialization of value x at line b by vector v_i and detection of the SAx fault at line b by vector v_p. For each unacceptable delay fault, a delay test is generated under the following constraints.

Approach 1: Vector v_i is generated such that it initializes line b with x. Vector v_p is generated such that it detects SAx at line b. A set of tests generated in this manner will detect faults from Case-1, Case-3, and Case-4.

Approach 2: Vector v_i is generated such that it initializes line b with x. Vector v_p is generated such that it detects SAx at line b and minimizes the number of acceptable stuck-at faults that get detected. A test set generated in this manner will detect faults from Case-1, Case-3, and Case-4; however, detection of faults in Case-4 is reduced compared to Approach 1.

Approach 3: Vector v_i is generated such that it initializes line b with x. A multi-vector test session (M-VTS) is generated so that no acceptable stuck-at fault is detected. Each vector from the M-VTS is used as the propagation vector, v_p, along with v_i, to form a multi-pair test session (M-PTS) to detect the target transition delay fault. Using a test set generated in this manner will result in detection of faults from Case-1 and Case-3. Faults in Case-4 will no longer be detected.

Approach 4: In order to suppress the acceptable delay faults in Case-3 that get detected in Approach 3, that approach can be modified to block detection of such faults using v_i.
In Approach 4, a multi-vector test session of v_p's is generated so that no acceptable stuck-at fault is detected. For each v_p, a v_i is generated so that it blocks detection of as many acceptable delay faults as possible, by initializing the corresponding lines with the complement of the initialization value required to detect each delay fault. Consider an acceptable STR fault at line b, and consider the case where the SA0 at line b is unacceptable and is detected by the stuck-at multi-vector test session consisting of the v_p's. In order to block detection of the STR at line b, v_i has to assign a one to line b for at least one of the pairs in which v_p detects the SA0 at this line. Using this approach, the number of faults detected in Case-3 will be reduced compared to Approach 3.

Some of the above approaches are implemented to compute the number of acceptable delay faults that get detected and, hence, the degradation in acceptance gain. These results are discussed next.

5.3.1 Experimental results

Approaches 2 and 4 are implemented as explained above, and tests are performed on a number of ISCAS85 benchmark circuits. Results are shown in Table 5.4. The first and second columns show the percentages of delay faults that are unacceptable and acceptable, respectively. The third column shows the percentage of acceptable delay faults that correspond to acceptable stuck-at faults, i.e., faults in Case-4. This represents the minimum acceptance gain achieved by an approach that uses a set of stuck-at tests that detects all unacceptable stuck-at faults and no acceptable stuck-at faults. The last two columns show the acceptance gains of Approaches 2 and 4, respectively. Note that the acceptance gain of Approach 4 is necessarily greater than the percentage of faults in Case-4 (i.e., the third column); however, the acceptance gain of Approach 2 can be less than the third column due to detection of faults in Case-4.
For both approaches, perfect coverage of unacceptable faults is achieved, since the unacceptable faults are easy to detect.

Table 5.4: Acceptance gains of Approaches 2 and 4; T_er = 0.1.

Circuit   Unacceptable    Acceptable      Case-4          Acceptance gain (%)^2
          faults (%)^1    faults (%)^1    faults (%)^2    Approach 2    Approach 4
C880      32.4            67.6            59.4            78.3          92.0
C1355     11.8            88.2            74.4            94.7          95.9
C1908     8.1             91.9            96.0            96.0          98.8
C2670     24.3            75.7            63.9            77.4          90.2
C3540     19.6            80.4            79.5            81.4          91.0
^1 Percentage of all delay faults
^2 Percentage of acceptable delay faults

Preliminary results show high acceptance gains for delay faults using tests that were originally generated for stuck-at faults. Next, theory to generate tests for delay faults that achieve perfect acceptance gain is presented.

5.4 Test generation for transition delay faults

In the previous section, it was mentioned that tests generated for error-rate testing of single stuck-at faults can be used to achieve high acceptance gain for transition delay faults. In this section we explain our approach to generate tests specifically designed to achieve perfect acceptance gain for transition delay faults. It was previously shown that achieving perfect acceptance gain for delay faults is not possible using single-vector testing. It was also shown that, using multi-vector testing, perfect acceptance gain can be achieved independent of the fault model. In this section, necessary conditions that must locally be satisfied at each gate to block the effect of all acceptable faults are introduced. From these local value assignments, (i) an upper bound is found on the number of vectors required in each test session to achieve perfect acceptance gain, and (ii) a multi-pair test generator is developed for delay faults that achieves perfect acceptance gain.
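As a concrete reference point, the two figures of merit used in these experiments, unacceptable coverage and acceptance gain, can be computed directly from per-fault detection results. The following sketch uses hypothetical fault names and detection data for illustration only; it is not taken from the Table 5.4 experiments.

```python
# Sketch: the two error-rate-testing figures of merit, computed from
# per-fault detection results. Fault names and data are illustrative.

def coverage_metrics(faults, detected):
    """faults: dict mapping fault name -> 'acceptable' | 'unacceptable'.
       detected: set of faults detected by the test set."""
    unacc = [f for f, c in faults.items() if c == 'unacceptable']
    acc = [f for f, c in faults.items() if c == 'acceptable']
    # Unacceptable coverage: fraction of unacceptable faults detected.
    unacc_cov = sum(f in detected for f in unacc) / len(unacc)
    # Acceptance gain: fraction of acceptable faults NOT detected,
    # i.e., chips containing only such faults are salvaged.
    acc_gain = sum(f not in detected for f in acc) / len(acc)
    return unacc_cov, acc_gain

faults = {'f1': 'unacceptable', 'f2': 'unacceptable',
          'f3': 'acceptable', 'f4': 'acceptable', 'f5': 'acceptable'}
detected = {'f1', 'f2', 'f4'}   # f4 is an acceptable fault that still gets detected
cov, gain = coverage_metrics(faults, detected)
print(cov, gain)                # 1.0 and 2/3: perfect coverage, imperfect gain
```

A test set with perfect unacceptable coverage and perfect acceptance gain would yield (1.0, 1.0) under this accounting.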
5.4.1 Necessary conditions for multi-pair test generation

In this section, we present the necessary conditions to locally excite a delay fault and propagate its fault effect through a gate, as well as the necessary conditions for justification of a value at the output of a gate, in the same manner as done in Section 4.3.1. As mentioned earlier, in order to detect a delay fault, a pair of vectors is required. Therefore, the necessary conditions for detection of delay faults are expressed in terms of local pair assignments (LPA). Each LPA consists of two LVAs, namely the initializing LVA, LVA_i, and the propagating LVA, LVA_p. Tables 5.5-5.7 show all possible LPAs that satisfy the above-mentioned necessary conditions. For each LPA, all delay faults that locally get detected are identified. It can be seen from the tables that assignment of more than one LPA is necessary in some cases to block the effect of potentially acceptable delay faults. Consider the intersection of the set of delay and stuck-at faults detected by each LPA. A multiple-local pair assignment (M-LPA) is defined as any subset of the set of all LPAs that only fails for the faults in this intersection. Note that if a single stuck-at fault fails the M-LVA consisting of LVA_p's, it fails the M-LPA as well. Hence, when generating the M-LPA, special attention must be paid so that no acceptable stuck-at fault fails the M-LVA consisting of LVA_p's. In this section, all faults that can be blocked using appropriate M-LPAs are studied. In the next section, properties of faults that cannot be blocked are further studied. It is shown that detection of these faults is inevitable but does not reduce the acceptance gain. Table 5.5 shows the LPAs that locally detect a fault located at an input of an AND gate and an XOR gate.
The table also shows the set of delay faults locally detected by each LPA and the intersection of such sets for all possible LPAs, i.e., faults that locally fail the M-LPA. It also shows the set of stuck-at faults locally detected by each LVA_p and the intersection of such sets for all possible LPAs, i.e., faults that locally fail the M-LPA.

Table 5.5: Local detection of a fault at the input of (a) an AND gate, and (b) an XOR gate. Both gates have inputs a and b, and output c.

Gate  Target   Local         Locally detected       Locally   Locally detected       Locally
      fault    assignment    delay faults           failing   stuck-at faults        failing
      at a     v_i - v_p                            M-LVA                            M-LVA
AND   a STR    (0 0)-(1 1)   a STR, b STR, c STR^1  a STR     a SA0, b SA0, c SA0^2  a SA0
               (0 1)-(1 1)   a STR, c STR                     a SA0, b SA0, c SA0^2
      a STF    (1 1)-(0 1)   a STF, c STF           a STF     a SA1, c SA1^3         a SA1
               (1 0)-(0 1)   a STF                            a SA1, c SA1^3
XOR   a STR    (0 0)-(1 1)   a STR, b STR           a STR     a SA0, b SA0, c SA1    a SA0
               (0 1)-(1 1)   a STR, c STF                     a SA0, b SA0, c SA1
               (0 1)-(1 0)   a STR, b STF                     a SA0, b SA1, c SA0
               (0 0)-(1 0)   a STR, c STR                     a SA0, b SA1, c SA0
      a STF    (1 0)-(0 1)   a STF, b STR           a STF     a SA1, b SA0, c SA0    a SA1
               (1 1)-(0 1)   a STF, c STR                     a SA1, b SA0, c SA0
               (1 1)-(0 0)   a STF, b STF                     a SA1, b SA1, c SA1
               (1 0)-(0 0)   a STF, c STF                     a SA1, b SA1, c SA1
^1 c STR dominates a STR   ^2 a SA0, b SA0, and c SA0 are equivalent   ^3 c SA1 dominates a SA1

In particular, for the AND gate, for each fault the M-LPA contains only one LPA (the second LPA). In contrast, for the XOR gate, for each fault, the M-LPA has two LPAs (some two of the four LPAs). The table shows that in each case, all delay faults except the target unacceptable delay fault, or a delay fault that dominates it and hence is also unacceptable, can be blocked using an M-LPA. All stuck-at faults except the stuck-at fault that corresponds to the target delay fault, or a fault that is equivalent to it or dominates it and hence is unacceptable, can be blocked as well. Now, consider a gate on the propagation path from the fault site to the primary outputs.
Table 5.6 shows that in the cases where the pair of initializing and propagating values at the on-path input of the gate (i.e., the input which is on the propagation path) is [0-D] for every LPA (in the M-LPA), a STR (or a fault that dominates it) at the input line locally fails the M-LPA. In case the pair of values is [1-D̄] for every LPA (in the M-LPA), a STF at the input line locally fails the M-LPA. In cases where the pair of initializing and propagating values at the on-path input is [0-D] for some LPAs (in the M-LPA) and [1-D̄] for some others, no delay fault locally fails the M-LVA.

Table 5.6: Local propagation of a fault effect through (a) an AND gate, and (b) an XOR gate. Both gates have inputs e and f, and output g.

Gate  Fault    Local         Locally detected       Locally   Locally detected       Locally
      effect   assignment    delay faults           failing   stuck-at faults        failing
      at e     at f                                 M-LVA                            M-LVA
AND   0-D      0-1           e STR, f STR, g STR^1  e STR     e SA0, f SA0, g SA0^2  e SA0
               1-1           e STR, g STR                     e SA0, f SA0, g SA0^2
      1-D      0-1           f STR, g STR           –         e SA0, f SA0, g SA0^2  e SA0
               1-1           –                                e SA0, f SA0, g SA0^2
      0-D̄      0-1           –                      –         e SA1, g SA1^3         e SA1
               1-1           –                                e SA1, g SA1^3
      1-D̄      0-1           e STF                  e STF     e SA1, g SA1^3         e SA1
               1-1           e STF                            e SA1, g SA1^3
XOR   0-D      0-1           e STR, f STR           e STR     e SA0, f SA0, g SA1    e SA0
               1-1           e STR, g STF                     e SA0, f SA0, g SA1
               1-0           e STR, f STF                     e SA0, f SA1, g SA0
               0-0           e STR, g STR                     e SA0, f SA1, g SA0
      1-D      0-1           f STR, g STF           –         e SA0, f SA0, g SA1    e SA0
               1-1           –                                e SA0, f SA0, g SA1
               1-0           f STF, g STR                     e SA0, f SA1, g SA0
               0-0           –                                e SA0, f SA1, g SA0
      0-D̄      0-1           f STR, g STR           –         e SA1, f SA0, g SA0    e SA1
               1-1           –                                e SA1, f SA0, g SA0
               1-0           f STF, g STF                     e SA1, f SA1, g SA1
               0-0           –                                e SA1, f SA1, g SA1
      1-D̄      0-1           e STF, f STF           e STF     e SA1, f SA0, g SA0    e SA1
               1-1           e STF, g STR                     e SA1, f SA0, g SA0
               1-0           e STF, f STF                     e SA1, f SA1, g SA1
               0-0           e STF, g STF                     e SA1, f SA1, g SA1
^1 g STR dominates e STR   ^2 e SA0, f SA0, and g SA0 are equivalent   ^3 g SA1 dominates e SA1

Table 5.7: Justification of a value at the output of (a) an AND gate, and (b) an XOR gate. Both gates have inputs i and j, and output k.

Gate  Justify  Local         Locally detected       Locally   Locally detected       Locally
      value    assignment    delay faults           failing   stuck-at faults        failing
      at k     v_i - v_p                            M-LVA                            M-LVA
AND   0-1      (0 0)-(1 1)   i STR, j STR, k STR    k STR     i SA0, j SA0, k SA0^1  k SA0
               (0 1)-(1 1)   i STR, k STR                     i SA0, j SA0, k SA0^1
               (1 0)-(1 1)   j STR, k STR                     i SA0, j SA0, k SA0^1
      1-1      (1 1)-(1 1)   –                      –         i SA0, j SA0, k SA0^1  k SA0
      1-0      (1 1)-(0 0)   k STF                  k STF     k SA1                  k SA1
               (1 1)-(0 1)   i STF, k STF                     i SA1, k SA1
               (1 1)-(1 0)   j STF, k STF                     j SA1, k SA1
      0-0      (0 0)-(0 0)   –                      –         k SA1                  k SA1
               (0 0)-(0 1)   j STF, k STF                     i SA1, k SA1
               (0 0)-(1 0)   j STF, k STF                     j SA1, k SA1
               (0 1)-(0 0)   j STF, k STF                     k SA1
               (1 0)-(0 0)   j STF, k STF                     k SA1
XOR   0-1      (0 0)-(0 1)   j STR, k STR           k STR     i SA1, j SA0, k SA0    k SA0
               (1 1)-(0 1)   i STF, k STR                     i SA1, j SA0, k SA0
               (0 0)-(1 0)   i STR, k STR                     i SA0, j SA1, k SA0
               (1 1)-(1 0)   j STF, k STR                     i SA0, j SA1, k SA0
      1-1      (0 1)-(0 1)   –                      –         i SA1, j SA0, k SA0    k SA0
               (1 0)-(0 1)   i STF, j STR                     i SA1, j SA0, k SA0
               (0 1)-(1 0)   i STR, j STF                     i SA0, j SA1, k SA0
               (1 0)-(1 0)   –                                i SA0, j SA1, k SA0
      1-0      (0 1)-(0 0)   j STF, k STF           k STF     i SA1, j SA1, k SA1    k SA1
               (1 0)-(0 0)   i STF, k STF                     i SA1, j SA1, k SA1
               (0 1)-(1 1)   i STR, k STF                     i SA0, j SA0, k SA1
               (1 0)-(1 1)   i STR, k STF                     i SA0, j SA0, k SA1
      0-0      (0 0)-(0 0)   –                      –         i SA1, j SA1, k SA1    k SA1
               (1 1)-(0 0)   i STF, j STF                     i SA1, j SA1, k SA1
               (0 0)-(1 1)   i STR, j STR                     i SA0, j SA0, k SA1
               (1 1)-(1 1)   –                                i SA0, j SA0, k SA1
^1 i SA0, j SA0, and k SA0 are equivalent
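The "locally detected delay faults" entries of these tables can be checked mechanically: a transition fault at a line is locally detected by a pair (v_i, v_p) if the line has the corresponding fault-free transition and forcing the line to hold its initial value during v_p changes the gate output. The following sketch, illustrative and restricted to the two-input AND gate of Table 5.5, is not the dissertation's tool, but it reproduces the AND-gate rows of that table:

```python
# Sketch: recomputing the "locally detected delay faults" column of
# Table 5.5 for a two-input AND gate (inputs a, b, output c).

def and_gate(a, b):
    return a & b

def detected_delay_faults(vi, vp):
    """vi, vp: dicts {'a': 0/1, 'b': 0/1}. Returns the locally detected
    transition faults as strings such as 'a STR'."""
    good = dict(vp, c=and_gate(vp['a'], vp['b']))   # fault-free values under vp
    init = dict(vi, c=and_gate(vi['a'], vi['b']))   # values established by vi
    found = []
    for line in ('a', 'b', 'c'):
        if init[line] == good[line]:
            continue  # no transition at this line, so no delay fault excited
        kind = 'STR' if good[line] == 1 else 'STF'
        # A slow line holds its initial value when vp is sampled.
        if line == 'c':
            out = init['c']
        else:
            faulty = dict(vp, **{line: init[line]})
            out = and_gate(faulty['a'], faulty['b'])
        if out != good['c']:
            found.append(f'{line} {kind}')
    return found

# The four AND-gate LPAs of Table 5.5:
print(detected_delay_faults({'a': 0, 'b': 0}, {'a': 1, 'b': 1}))  # a, b, c STR
print(detected_delay_faults({'a': 0, 'b': 1}, {'a': 1, 'b': 1}))  # a, c STR
print(detected_delay_faults({'a': 1, 'b': 1}, {'a': 0, 'b': 1}))  # a, c STF
print(detected_delay_faults({'a': 1, 'b': 0}, {'a': 0, 'b': 1}))  # a STF
```

The same enumeration, with the gate function swapped, extends to XOR or any other primitive or complex gate.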
It can also be seen from the table that in cases where the propagating value (the second value in the pair) at the on-path input is D for every LPA (in the M-LPA), a SA0 (or a fault that is equivalent to it) at the input locally fails the M-LPA. In cases where the value is D̄ for every LPA (in the M-LPA), a SA1 (or a fault that dominates it) at the input locally fails the M-LPA. In cases where the propagating value is D for some LPAs (in the M-LPA) and D̄ for some others, no stuck-at fault locally fails the M-LPA.

Consider a gate in the transitive fanin of the fault site or in the transitive fanin of an off-path input of an on-path gate. Table 5.7 shows that in the case where the pair of initializing and propagating values is [0-1] or [1-0] for every LPA (in the M-LPA), a STR or a STF at the input line locally fails the M-LPA, respectively. For other cases, no delay fault fails the M-LPA.

It can also be seen from the table that in the case where the propagating value at the line is 1 or 0 for every LPA (in the M-LPA), a SA0 (or a fault that is equivalent to it) or a SA1 locally fails the M-LPA, respectively.

As explained in Section 4.3.1, Tables 5.5-5.7 capture the properties of a two-input AND gate and a two-input XOR gate, as examples of primitive and complex gates, respectively. These results can be easily extended to other primitive and complex gates. In the next section, the necessary conditions from Tables 5.5-5.7 are used, in the same manner as in Section 4.3.2, to find an upper bound on the size of an M-PTS that can achieve perfect acceptance gain.

Figure 5.5: A generic fanout-free circuit.

5.4.2 Upper bound on the size of M-PTS

In this section, an upper bound is computed on the size of an M-PTS that achieves perfect acceptance gain for different classes of circuits. Consider the fanout-free circuit in Figure 5.5, and an unacceptable delay fault at the input of the gate G_1.
From Table 5.5, if G_1 is an AND gate, faults that locally get detected by each of the second LPAs are either equivalent to the target fault or dominate it, and hence are unacceptable. In case G_1 is an XOR gate, however, a 2-LPA is required so that the target fault and its corresponding stuck-at fault are the only faults that locally fail the M-LPA.

Consider gate G_2 on the propagation path. It can be shown from the tables that if G_2 is an AND gate, in the case where the pair of values at the input of G_2 is [0-D] or [1-D̄] for every LPA (in the M-LPA), all delay faults that get locally detected are either equivalent to the delay fault at the input of G_2 or dominate it.

Lemma 5.1. Consider a fanout-free combinational circuit with primitive and complex gates, including commonly used gates such as arbitrary AOIs, OAIs, XORs, XNORs, and multiplexers. Consider a delay fault at line x and a line y on the unique propagation path from x to the output. If the initializing and propagating vectors imply the same value and the same fault effect at line y for all pairs of vectors, the delay fault that corresponds to the initializing and propagating faulty value at line y dominates the delay fault at line x.

Proof. This lemma can be proven in a manner similar to Lemma 4.4.

Based on this lemma, the delay fault at the input of G_2 for this case dominates the target fault, and hence is unacceptable. It was previously shown in Section 4.3.2 that the stuck-at faults that locally fail each M-LPA are equivalent to or dominate the stuck-at fault that corresponds to the target delay fault, and hence are unacceptable. If G_2 is an XOR gate, similar to the AND gate, in the worst case a 2-LPA is required to block the effect of all but unacceptable faults.

Consider gate G_3 in the transitive fanin of the fault site or G_4 in the transitive fanin of a side-input of an on-path gate. Similar to Section 4.3.2, the following result can be derived.

Lemma 5.2.
Consider a fanout-free combinational circuit with primitive and complex gates, including commonly used gates such as arbitrary AOIs, OAIs, XORs, XNORs, and multiplexers. Consider a delay fault at line x, and a line y in the transitive fanin. If the initializing and propagating vectors imply the same pair of values at line y for all pairs of vectors, the delay fault that corresponds to the pair of initializing and propagating values at line y dominates the delay fault at line x.

Proof. This lemma can be proven in a manner similar to Lemma 4.5.

Using this lemma and Table 5.7, similar to Section 4.3.2, it is shown that if the gate is an AND gate, in the worst case a 2-LPA is required to block the effect of all but unacceptable faults. If the gate is an XOR gate, in the worst case a 2-LPA is likewise required.

The above four gate locations represent all categories of gates that are encountered when generating a test for any delay fault in a fanout-free circuit. Concluding in the same manner as in Section 4.3.2, the following theorem is derived.

Theorem 5.1. For an unacceptable delay fault in a fanout-free circuit with primitive and complex gates, including commonly used gates such as arbitrary AOIs, OAIs, XORs, XNORs, and multiplexers, an M-PTS with two or fewer pairs of vectors is guaranteed to fail for an unacceptable delay fault f_u, and pass for all acceptable delay and stuck-at faults.

Considering a circuit with non-reconvergent fanout, the above tables and results can be used along with the concept of output-masking, similar to Section 4.3.2, to conclude the following theorem.

Theorem 5.2.
Using the notion of output-masking and multi-vector testing, for a circuit with non-reconvergent fanouts and primitive and complex gates, including commonly used gates such as arbitrary AOIs, OAIs, XORs, XNORs, and multiplexers, an M-PTS with two or fewer pairs of vectors is guaranteed to fail for an unacceptable delay fault f_u, and pass for all acceptable delay and stuck-at faults.

As explained earlier, the above approach is not applicable to circuits with reconvergent fanout, due to limitations in value assignment. However, a multi-pair test generator for delay faults that generates an M-PTS for arbitrary circuits is presented next.

5.4.3 Multi-pair test generator for delay faults

It was explained in Section 4.3.3 how the necessary conditions for generation of an M-VTS for a stuck-at fault can be used to develop a multi-vector test generator. The same approach can be used along with the new tables for delay faults to generate an M-PTS for each unacceptable delay fault. The multi-pair test generation algorithm is very similar to the multi-vector test generation algorithm explained in Section 4.3.3. Test generation starts with generating an M-LPA to excite the fault effect at the fault site. Then the fault effect must get propagated to the outputs, and the pairs of values must be justified at unjustified lines. During each of these tasks, the necessary conditions, namely blockage of all acceptable delay and stuck-at fault effects, are checked. If, according to Tables 5.5-5.7, an additional LPA is necessary, one additional vector is added by the test generation procedure.

Figure 5.6: Example circuit for multi-pair test generation.

Here, in the same manner as in Section 4.3.3, the proposed approach is explained using a simple example circuit, shown in Figure 5.6. Consider the STF fault at line c and assume that all other delay faults in the circuit are acceptable. Figures 5.7 and 5.8 represent the main search tree of the test generator. In these figures, a pair of values at line x is referred to as P(x).
PA denotes assignment of a pair of values. The rest of the notation is the same as in Section 4.3.3. For simplicity, only one branch of the tree is drawn at each point with alternatives. Similar to Section 4.3.3, the search tree of the second vector is rooted at the node where the addition of the vector becomes a requirement due to the necessary conditions. If test generation of the first vector completes successfully, the search continues on the second tree, starting at the root. If test generation of the first vector returns a conflict, the procedure returns to the node where the second tree is rooted and takes another alternative.

Figure 5.7: The first search tree for the example execution of the multi-vector test generator.

Figure 5.8: The second search tree for the example execution of the multi-vector test generator.

5.5 Summary

In this section, the importance of deploying more realistic fault models in test generation for error-rate testing is discussed. It is emphasized that in test generation for error rate for a particular fault model, minimum coverage of acceptable defects is desired as well as maximum coverage of unacceptable defects. The transition delay fault is used to model timing defects, and relationships between faults are studied. It was shown that slightly modified multi-vector stuck-at tests provide perfect unacceptable coverage of transition delay faults as well as high acceptance gain. The acceptance gains achieved in this manner are sufficiently high for most applications; however, for applications that require higher acceptance gains, a multi-pair test generator specifically designed for delay faults is developed. Pairs of vectors generated in this manner provide perfect acceptance gain for delay faults at reasonable test costs.

Chapter 6

Future research

In this chapter a number of extensions of this work are briefly discussed that can be subjects of future research.
6.1 Composite metric of error rate - significance

Error rate and error significance are both considered as key metrics of error tolerance in [5, 7, 9, 10]. Error rate, as discussed throughout this work, is defined as the percentage of clock cycles for which an error occurs at the output of a circuit during normal operation. Error significance for a set of circuit outputs, as previously discussed, is defined as the maximum amount by which the response at the set of outputs can deviate from the corresponding error-free value. This work focuses on error rate as a single metric. Previous work by Jiang and Gupta [36, 37] focuses on error significance. A test generator generates a set of tests that distinguishes between chips that cause an error significance greater than a threshold value at the outputs and the rest of the chips.

In error-significance testing, as in error-rate testing, the error-tolerant application provides the test algorithm with a threshold value. If the absolute difference between the output of a circuit and the output expected of the fault-free version of the circuit is greater than or equal to the threshold significance, the chip is said to be unacceptable and is discarded. The approach in [36, 37] searches for test vectors for each fault until an absolute difference greater than the threshold error significance can be achieved or proven impossible. Any vector that can achieve this objective is added to the test set. If it can be proven that such a result can never be achieved for a fault (i.e., no vector can cause an absolute difference greater than or equal to the threshold significance), the fault is considered acceptable. If the test generator passes a backtrack limit and cannot find a vector for a fault, then that fault is accepted despite the fact that it was not proven acceptable.
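Under the definitions above, both metrics can be computed directly from fault-free and faulty output traces. A minimal sketch follows; the traces and threshold values are illustrative assumptions, with outputs interpreted as unsigned integers on a data bus.

```python
# Sketch: error rate and error significance computed from output traces.
# Traces and thresholds are illustrative, not from any benchmark.

def error_rate(good, faulty):
    """Percentage of clock cycles for which an output error occurs."""
    errors = sum(g != f for g, f in zip(good, faulty))
    return 100.0 * errors / len(good)

def error_significance(good, faulty):
    """Maximum absolute deviation of the response from the
    corresponding error-free value, over all cycles."""
    return max(abs(g - f) for g, f in zip(good, faulty))

good   = [12, 7, 30, 5, 9, 22, 14, 3, 18, 25]
faulty = [12, 7, 31, 5, 9, 22, 14, 3, 16, 25]   # errors in cycles 2 and 8

T_er, T_es = 10.0, 4       # application-supplied thresholds (assumed values)
er = error_rate(good, faulty)
es = error_significance(good, faulty)
print(er, es)              # 20.0 and 2: fails the rate test, passes significance
```

For a control bus, which is usually not tolerant to errors, the significance threshold would be one, so any deviation is unacceptable.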
At the time of test application, the test set generated by the error-significance test generator is applied vector by vector to the circuit, and the chip is discarded if an error is observed at the outputs. If no error is observed for any of the test vectors, the chip is accepted. Outputs of a circuit can be divided into separate buses with separate threshold significance values. Based on their functionality, output lines can be divided into data or control buses. The control buses are usually not tolerant to errors, hence a threshold value of one.

In [17] a discrete cosine transform (DCT) unit of an MPEG or a JPEG coder is considered. It is shown that a combination of error rate and error significance serves as a suitable metric to rank the degradation of quality of an image due to a hardware fault. The authors of [17] define acceptability (for a particular application) in terms of the percentage of blocks (in an image or a video sequence) for which visible differences between images can be tolerated. They observe that errors with high significance that occur less frequently can still result in acceptable performance. This also applies to errors that occur more frequently but cause a small error significance.

Figure 6.1: Acceptance curve based on error significance and error rate.

A certain fault may cause errors with different significances and rates. For any fault, there is a profile of error significance and error rate. An acceptance curve similar to Figure 6.1 acts as the threshold for different fault profiles. If the profile of the fault falls below this curve, the fault is considered acceptable; otherwise it is considered unacceptable. Although each fault has to be represented with a full profile of different errors with different significance and rate values, here each fault is associated with only one error: the error with maximum significance is considered.
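An acceptance curve of the kind in Figure 6.1 can be represented as a monotone step function: for each significance bin, a maximum tolerable error rate. The following sketch classifies fault profiles against such a curve; the (significance, rate) threshold pairs are assumed for illustration, since a real application would supply its own curve.

```python
# Sketch: classifying fault profiles against an acceptance curve like
# Figure 6.1. Threshold pairs are assumed values for illustration.

# Curve as a step function: for significance up to s_max, rates below
# r_max are acceptable. Higher significance => lower tolerable rate.
CURVE = [(1, 0.50), (4, 0.10), (16, 0.01)]   # (max significance, max rate)

def acceptable(significance, rate):
    """True if the fault's (significance, rate) profile falls below
    the acceptance curve."""
    for s_max, r_max in CURVE:
        if significance <= s_max:
            return rate < r_max
    return False   # beyond the last significance bin: never acceptable

# Each fault is represented by its maximum-significance error and its rate.
print(acceptable(3, 0.05))    # small, infrequent error: acceptable
print(acceptable(3, 0.30))    # same significance, too frequent: not acceptable
print(acceptable(32, 0.001))  # significance beyond the curve: not acceptable
```

The series of alternating error-rate and error-significance tests proposed below effectively evaluates this step function one threshold pair at a time on the chip itself.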
Different experiments were undertaken in [16, 67] to find the fault profiles for the motion detection block of an MPEG encoder, the DCT block of a JPEG decoder, and the broadcaster SISO and accumulator SISO blocks of a repeat and accumulate code decoder. The rate of the error with maximum significance caused by each fault is found. The data from the above blocks follows the trend shown in Figure 6.2, namely, acceptable and unacceptable faults are separated by a curve resembling the curve in Figure 6.1.

Figure 6.2: Separation of faults based on significance and rate of error.

In order to identify acceptable and unacceptable chips containing faults shown in Figure 6.2, we propose to perform a series of error-rate and error-significance tests. The rate and significance data of the faults can be processed to find the threshold values that separate the faults as shown in Figure 6.2. An error-rate test is performed with threshold error rate T_er0. If a chip passes this test, it is considered acceptable. However, if a chip fails this test, an error-significance test is performed with threshold significance T_es0. If a chip fails this test, it is discarded; if it passes the test, error-rate testing is performed with threshold error rate T_er1, and so on. This process is repeated until all threshold values are covered.

If the faults are not perfectly clustered, there might be some acceptable faults that fall in the zone on the graph indicated as zone 2. Using the approach explained above, chips with acceptable faults that fall within zone 2 are rejected. Therefore, the number of acceptable faults that fall within zone 1 determines the acceptance gain of this approach.

The costs associated with this approach are the costs of test generation for different thresholds as well as the costs of application of the corresponding tests. For the purpose of test generation for error-rate testing, the number of target faults reduces for each threshold error rate, since no test is required for the faults that pass the error-rate test with the smaller threshold. For example, when generating test vectors for T_er1, only faults with error rate greater than or equal to T_er1 and less than T_er2 are considered as target faults. This results in test generation and application time for error-rate testing in the order of the total number of target faults. For error-significance testing, however, the cost of test generation is in the order of the product of the number of target faults and the number of significance thresholds, while the cost of test application is merely in the order of the number of target faults.

6.2 Defect-based error-rate testing

The importance of not relying heavily on a particular fault model in test generation for classical testing was previously established. Consideration of defect-based testing is even more important in error-rate testing. It was explained in Section 5.1 that, unlike classical testing, tests that rely heavily on the stuck-at fault model in error-rate testing might not provide good measures of error-rate testing for real defects. We propose to revisit this problem using a defect-based approach.

For example, consider the case of single stuck-at faults and transition delay faults presented in Chapter 5. A stuck-at fault under-specifies the conditions of detection of a corresponding transition delay fault. Consider a set of tests that detects all unacceptable stuck-at faults and no acceptable stuck-at fault. There might be unacceptable delay faults that correspond to unacceptable stuck-at faults that are not detected (i.e., imperfect unacceptable coverage), due to over-specified conditions of detection. At the same time, there are acceptable delay faults that correspond to unacceptable stuck-at faults that get detected (i.e., imperfect acceptance gain). Of these two, the problem of imperfect unacceptable coverage is more serious. Therefore, in Chapter 5, only unacceptable faults that correspond to unacceptable defects are targeted.
The test generation approach is modified to consider the detection conditions of the delay faults. Detection of all unacceptable delay faults is guaranteed in this manner. The problem now reduces to detection of acceptable delay faults that correspond to unacceptable stuck-at faults. In the worst-case scenario where this problem cannot be solved, it only translates to less beneficial error-rate testing and does not cause any fundamental problems. However, as explained in Section 5.3, simple modifications can be made to achieve high degrees of acceptance gain.

In general, if a fault model is comprised of faults that over-specify the conditions of detecting corresponding realistic defects, then detecting a fault results in detection of the defect that corresponds to it. However, avoiding detection of a fault does not guarantee that a circuit with the corresponding defect will pass this test. At the same time, due to over-specification of detection conditions, the error rate of the modeled fault is less than the error rate of the corresponding defect.

Consider a test generated for such a fault model that provides perfect unacceptable coverage and acceptance gain of the faults. Such a test will detect all unacceptable faults, and their corresponding defects. The error rate of the defects is greater than the error rate of the faults; therefore, all detected defects are unacceptable.

The test will also avoid detection of acceptable faults. There might be acceptable faults that correspond to unacceptable defects. Test vectors generated using the fault model will not detect such unacceptable defects. At the same time, because of over-specification of the detection conditions of the faults, some acceptable defects may get detected, while detection of acceptable faults is avoided.

On the other hand, if a fault model is comprised of faults that under-specify the conditions of detecting corresponding defects, then detecting a fault does not guarantee detecting the defect that corresponds to it.
However, avoiding detection of a fault guarantees that a circuit with the corresponding defect will pass. At the same time, due to under-specification of the detection conditions, the error rate of the fault is greater than the error rate of the corresponding defect.

Consider a test generated for such a fault model that provides perfect unacceptable coverage and acceptance gain of the faults. Such a test will detect all unacceptable faults. There might be acceptable defects that correspond to unacceptable faults; such acceptable defects will get detected. At the same time, because of under-specification of the detection conditions of the faults, some unacceptable defects that correspond to unacceptable faults may not get detected.

The test will also avoid detection of acceptable faults. All defects corresponding to acceptable faults are also acceptable. Due to under-specification of conditions, if detection of a fault is avoided, the corresponding defect does not get detected either.

In neither of the two cases is perfect acceptance gain and unacceptable coverage of the defects achieved. However, with some modifications these approaches might provide favorable measures of error-rate testing. In general, theory and practical approaches must be developed to consider two cases: (a) where a fault model includes faults whose conditions of detection under-specify the conditions of detection of the corresponding defects (a generalization of the above example), and (b) where a fault model includes faults whose conditions of detection over-specify the conditions of detection of the corresponding defects. Such different aspects of defect-based testing can be studied in the context of error-rate testing.

Figure 6.3: A general model of a sequential circuit.

6.3 Extension of error-rate testing to sequential circuits

Consider the sequential circuit in Figure 6.3, comprised of a combinational logic block (CLB) and a set of flip-flops.
The primary inputs and outputs of the sequential circuit are x_1, x_2, ..., x_n, and z_1, z_2, ..., z_m, respectively. The present state variables y_1, y_2, ..., y_k constitute the state inputs of the combinational circuit. The next state variables Y_1, Y_2, ..., Y_k constitute the state outputs of the combinational circuit. Y_l and y_l are, respectively, the input and output of flip-flop FF_l, 1 <= l <= k. In general, test development for sequential circuits is difficult mainly due to lack of control over state inputs and outputs. In error-rate testing for sequential circuits, additional challenges arise, which are briefly discussed in the following.

Consider a combinational logic block with a stuck-at fault f, with error rate R_C(f). Now consider the sequential circuit in Figure 6.3 with this particular combinational block as a core. Let us assume that applying vector V_1 to inputs x_1, x_2, ..., x_n of the sequential circuit causes an error only at line Y_l. In the next cycle, when applying a second vector, V_2, to x_1, x_2, ..., x_n, the fault effect stored at FF_l is applied to input y_l of the circuit. This fault effect may get propagated to one of the primary outputs and contribute to the number of erroneous cycles for the sequential circuit; it may therefore increase the error rate of the sequential circuit, R_S(f), compared to the error rate of the core, R_C(f). On the other hand, this fault effect may cancel an error caused by vector V_2 at a primary output. Hence, it may decrease the error rate of the sequential circuit, R_S(f), compared to the error rate of the core, R_C(f). Expansion of the current error-rate testing approaches to sequential circuits can be a valuable addition to this work.

Bibliography

[1] M. Abramovici, M. A. Breuer, and A. D. Friedman. Digital Systems Testing and Testable Design. John Wiley and Sons, 1995.

[2] M. Abramovici, P. R. Menon, and D. T. Miller. Critical path tracing: an alternative to fault simulation.
IEEE Design & Test of Computers, 1(1):83–93, 1984.

[3] S. Amarel and J. A. Brzozowski. Theoretical considerations on reliability properties of recursive triangular switch networks. Redundancy Techniques for Computing Systems, 1962.

[4] P. H. Bardell, W. H. McAnney, and J. Savir. Built-In Test for VLSI: Pseudorandom Techniques. John Wiley and Sons, Inc., 1987.

[5] M. A. Breuer. Intelligible testing. In Proc. 4th Multimedia Technology and Applications Symposium, April 1999.

[6] M. A. Breuer. Determining error rate in error tolerant VLSI chips. In Proc. 2nd International Workshop on Electronic Design, Test and Applications, 2004.

[7] M. A. Breuer. Intelligible test techniques to support error-tolerance. In Proc. 13th Asian Test Symposium, 2004.

[8] M. A. Breuer. Let's think analog. In IEEE Computer Society Annual Symposium on VLSI, May 2005.

[9] M. A. Breuer and S. K. Gupta. Intelligible testing. In Proc. 2nd IEEE Int'l Workshop on Microprocessor Test and Verification, September 1999.

[10] M. A. Breuer, S. K. Gupta, and T. M. Mak. Defect and error-tolerance in the presence of massive numbers of defects. IEEE Design & Test Magazine, 21:216–227, May 2004.

[11] M. A. Breuer and H. Zhu. Error-tolerance and multi-media. In Proc. International Conference on Intelligent Information Hiding and Multimedia Signal Processing, December 2006.

[12] M. A. Breuer and H. Zhu. An illustrated methodology for analysis of error tolerance. IEEE Trans. on Design and Test of Computers, 25(2), 2008.

[13] H-Y. Cheong, I. S. Chong, and A. Ortega. Computation error tolerance in motion estimation algorithms. In Proc. International Conference on Image Processing, October 2006.

[14] H-Y. Cheong and A. Ortega. Distance quantization method for fast nearest neighbor search computations with applications to motion estimation. In Proc. Asilomar Conference on Signals, Systems, and Computers, November 2007.

[15] H-Y. Cheong and A. Ortega. Motion estimation performance models with application to hardware error tolerance.
In Proc. Visual Communications and Image Processing, 2007. [16] I. S. Chong. Error Tolerant Multimedia Compression System. PhD thesis, University of Southern California, 2009. [17] I. S. Chong, H-Y. Cheong, and A. Ortega. New quality metrics for multimedia compression using faulty hardware. In Proc. International Workshop on Video Processing and Quality Metrics for Consumer Electronics, January 2006. [18] I.S.ChongandA.Ortega. Hardwaretestingforerrortolerantmultimediacom- pression based on linear transforms. In Proc. 14th Defect and Fault Tolerance Conference, October 2005. [19] I. S. Chong and A. Ortega. Dynamic voltage scaling algorithm for power constrainedmotionestimation.InProc.InternationalConferenceonAcoustics, Speech, and Signal Processing, April 2007. [20] I. S. Chong and A. Ortega. Power efficient motion estimation using multiple imprecise metric computation. In Proc. International Conference on Multime- dia and EXPO, July 2007. [21] H. Chung and A. Ortega. System level fault tolerance for motion estimation. Technical Report, USC-SIPI 354, University of Southern California, July 2002. [22] H. Chung and A. Ortega. Analysis and testing for error tolerant motion esti- mation. In Proc. 14th Defect and Fault Tolerance Conference, October 2005. 156 [23] F. J. Ferguson and J. P. Shen. Extraction and simulation of realistic CMOS faults using inductive fault analysis. In Proc. International Test Conference, 1988. [24] F. J. Ferguson and J. P. Shen. A CMOS fault extractor for inductive fault analysis. IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, 7:1181–1194, November 1988. [25] F. Joel Ferguson. Inductive Fault Analysis of VLSI Circuits. PhD thesis, Carnegie Melon University, 1987. [26] N. K. Gokli. Software interface, hardware architecture, and test methodology fortheNvidiaNV34graphicsprocessor. Master’sthesis,UniversityofSouthern California, 2006. [27] L. H. Goldstein. Controllability/observability analysis of digital circuits. 
IEEE Trans. on Circuits and Systems, CAS-26:685–693, September 1979. [28] T. Hsieh, K. J. Lee, and M. A. Breuer. An error oriented test methodology to improve yield with error tolerance. In Proc. VLSI Test Symposium, May 2006. [29] T. Hsieh, K. J. Lee, and M. A. Breuer. Reduction of detected acceptable faults for yield improvement via error-tolerance. In Proc. Design Automation and Test in Europe, 2007. [30] T. Hsieh, K. J. Lee, and M. A. Breuer. An error rate based test methodology to support error tolerance. IEEE Trans. on Reliability, 57(1), March 2008. [31] Y-C. Hsu and S. K. Gupta. A simulator for at-speed robust testing of path delay faults in combinational circuits. IEEE Trans. on Computers, 45:1312– 1318, November 1996. [32] L. Huisman. Fault coverage and yield prediction: Do we need more than 100% coverage? In Proc. European Test Conference, 1993. [33] IRCI. International technology roadmap for semiconductors (ITRS), 2001. [34] S. K. Jain and V. D. Agrawal. STAFAN: An alternative to fault simulation. In Proc. Design Automation Conference, June 1984. [35] N. Jha and S. K. Gupta. Testing of Digital Systems. Cambridge University Press, 2003. [36] Z. Jiang and S. K. Gupta. An ATPG for threshold testing: Obtaining accept- able yield in future processes. In Proc. International Test Conference, July 2002. 157 [37] Z. Jiang and S. K. Gupta. Threshold testing: Covering bridging and other realistic faults. In Proc. Asian Test Symposium, November 2005. [38] B. W. Johnson. The Electrical Engineering Handbook. CRC Press, 1993. [39] R. C. Leachman and C. N. Berglund. Systematic mechanisms limited yield (SMLY) study. International SEMATECH, DOC 03034383A-ENG, March 2003. [40] K. J. Lee, T. Hsieh, and M. A. Breuer. A novel test methodology based on error-rate to support error-tolerance. In Proc. International Test Conference, July 2005. [41] A. K. Majhi and V. D. Agrawal. Delay fault models and coverage. In Proc. 
1998 Eleventh International Conference on VLSI design, 1998. [42] Y. K. Malaiya and S. Y. H. Su. A new fault model and testing technique for CMOS devices. In Proc. Internation Test Conference, 1982. [43] W. Maly, F. J. Ferguson, and J. P. Shen. Systematic characterization of pgys- ical defects for fault analysis of MOS IC cells. In Proc. Internation Test Con- ference, 1984. [44] Peter C. Maxwell. Reduction in quality caused by uneven fault coverage of different areas of an integrated circuit. 14(5):603–606, May 1995. [45] Peter C. Maxwell and Robert C. Aitken. Test sets and reject rates: all fault coverages are not created equal. IEEE Design & Test of Computers, 10(1):42– 51, 1993. [46] Peter C. Maxwell, Robert C. Aitken, Vic Johansen, and Inshen Chiang. The effectofdifferenttestsetsonqualitylevelprediction: Whenis80%betterthan 90%? In Proc. International Test Conference, pages 358–364, 1991. [47] E. J. McCluskey and F. W. Clegg. Fault equivalence in combinational circuits. IEEE Trans. on Computers, 20:1286–1293, 1971. [48] J.Melzer. Low Complexity Turbo-Like Codes. PhDthesis, UniversityofSouth- ern California, 2006. [49] J.MelzerandK.Chugg. Irregulardeisgnsfortwo-statesystematicwithserially concatenated parity codes. In Proc. Military Communications Conference, October 2006. 158 [50] B. T. Murphy. Cost-size optima of monolithic integrated circuits. In Proc. IEEE, volume 52, pages 1537–1545, December 1964. [51] R. C. Ogus. The probability of correct output from a combinational circuit. IEEE Trans. on Computers, C-24:534–544, May 1975. [52] Z. Pan and M. A. Breuer. Estimating error rate in defective logic using signa- ture analysis. IEEE Trans. on Computers, 56(5):650–661, 2007. [53] Z. Pan and M. A. Breuer. Basing acceptable error-tolerant performance on significance-based error-rate (SBER). In Proc. VLSI test symposium, April 2008. [54] K. P. Parker and E. J. McCluskey. Analysis of logic circuits with faults using input signal probabilities. IEEE Trans. 
on Computers, C-24:668–670, May 1975. [55] K. P. Parker and E. J. McCluskey. Probabilistic treatment of general combi- national networks. IEEE Trans. on Computers, C-24:573–578, June 1975. [56] J. F. Poage and E. J. McCluskey. Derivation of optimal test sequences for se- quentialmachines. InProc. Symposium on Switching Theory and Logic Design, 1964. [57] J. Savir. Improved cutting algorithm. IBM J. Res. Develop., 34:381–388, March/May 1990. [58] J. Savir, G. S. Ditlow, and P. H. Bardell. Random pattern testability, 1984. [59] S. Shahidi and S. K. Gupta. Estimating error-rate during self-test via one’s counting. In Proc. International Test Conference, October 2006. [60] S. Shahidi and S. K. Gupta. A theory of error-rate testing. In Proc. Interna- tional Conference on Computer Design, October 2006. [61] S. Shahidi and S. K. Gupta. ERTG: A test generator for error-rate testing. In Proc. International Test Conference, October 2007. [62] S. Shahidi and S. K. Gupta. Multi-vector tests: A path to perfect error-rate testing. In Proc. Design Automation and Test in Europe, March 2008. [63] J. Shedletsky. The error latency of a fault in combinational digital circuits. Tech. Note 55, Dig. Syst. Labs, Stanford University, November 1974. [64] J. P. Shen, W. Maly, and F. J. Ferguson. Inductive fault analysis of MOS integrated circuits. IEEE Design & Test of Computers, 2, December 1985. 159 [65] D. Shin and S. K. Gupta. A technique for re-designing data-path units for im- proving manufacturing yield by exploiting given error tolerance. In Workshop on Design for Manufacturability and Yield, 2007. [66] R. L. Wadsack. Fault modeling and logic simulation of CMOS and MOS inte- grated circuits. Bell System Tech. J., 57:1449–1474, 1978. [67] O. Yeung. Implementation Architectures for Robust Iterative Receivers. PhD thesis, University of Southern California, 2008. [68] O. Yeung and K. Chugg. 
An iterative algorithm and low-complexity hardware architecture for fast aquisition of long PN codes in UWB systems. In Proc. Allerton Conference on Communications, Control, Computation, Oct 2005. [69] O. Yeung and K. Chugg. An iterative algorithm and low-complexity hardware architecture for fast aquisition of long PN codes in UWB systems. Journal of VLSI Signal Processing, 43, April 2006. [70] H. Zhu. Error-tolerance in digital speech recording systems. Master’s thesis, University of Southern California, 2006. 160
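The divergence between R_C(f) and R_S(f) can be made concrete with a toy simulation. The two-line core below (hypothetical, not from the dissertation: primary output z = x XOR y, next state Y = x AND y, with an assumed stuck-at-1 fault on line Y) has combinational error rate R_C(f) = 0, since the fault never reaches z directly; once the fault effect is latched into the flip-flop, however, it corrupts nearly every subsequent cycle, so R_S(f) approaches 1:

```python
import random

def core(x, y, fault=False):
    """Toy combinational core: primary output z and next-state line Y.
    The hypothetical fault is a stuck-at-1 on Y only; z is unaffected
    combinationally."""
    z = x ^ y
    Y = x & y
    if fault:
        Y = 1  # stuck-at-1 on next-state line Y
    return z, Y

# R_C(f): fraction of input combinations (x, y) producing an erroneous z
core_errors = sum(core(x, y, True)[0] != core(x, y, False)[0]
                  for x in (0, 1) for y in (0, 1))
R_C = core_errors / 4  # 0.0 -- the fault never reaches z in one cycle

# R_S(f): fraction of clock cycles with an erroneous primary output once
# the core runs inside the sequential circuit (fault effect held in FF)
random.seed(0)
inputs = [random.randint(0, 1) for _ in range(10000)]
y_good = y_bad = 0          # both copies reset to state 0
err_cycles = 0
for x in inputs:
    z_good, y_good = core(x, y_good, False)
    z_bad, y_bad = core(x, y_bad, True)
    err_cycles += z_good != z_bad
R_S = err_cycles / len(inputs)
print(R_C, R_S)  # -> 0.0 0.9999
```

Here the stored fault effect flips y in every cycle after the first, so every later output word is erroneous; a different core could just as easily cancel errors and yield R_S(f) < R_C(f), which is exactly the asymmetry discussed above.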
Abstract
VLSI scaling has entered an era where achieving desired yields is becoming increasingly challenging. The concept of error tolerance has been previously proposed with the goal of reversing this trend for classes of systems which do not require completely error-free operation. Such systems include audio, speech, video, graphics, and digital communications. Analysis of such applications has identified error rate as one of the key metrics of error severity. Error rate is defined as the percentage of clock cycles for which the value at the outputs deviates from the corresponding error-free value. An error-tolerant application provides a threshold error rate. Chips with an error rate less than the threshold are considered acceptable and can be used
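The error-rate metric and threshold-based acceptance described in the abstract amount to a short computation; the function names and the example output traces below are illustrative, not taken from the dissertation:

```python
def error_rate(golden, observed):
    """Error rate: fraction of clock cycles whose output word deviates
    from the error-free (golden) value. Expressed as a fraction here;
    multiply by 100 for the percentage form used in the text."""
    assert len(golden) == len(observed)
    return sum(g != o for g, o in zip(golden, observed)) / len(golden)

def acceptable(golden, observed, threshold):
    """A chip is acceptable if its error rate does not exceed the
    application-supplied threshold error rate."""
    return error_rate(golden, observed) <= threshold

golden   = [3, 1, 4, 1, 5, 9, 2, 6]
observed = [3, 1, 4, 0, 5, 9, 2, 7]       # two erroneous cycles out of eight
print(error_rate(golden, observed))        # -> 0.25
print(acceptable(golden, observed, 0.30))  # -> True
```

A defective chip is thus binned by comparing its measured error rate against the application's threshold rather than being discarded on the first observed mismatch.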