ERROR-RATE AND SIGNIFICANCE BASED ERROR-RATE (SBER) ESTIMATION VIA BUILT-IN SELF-TEST IN SUPPORT OF ERROR-TOLERANCE

by Zhaoliang Pan

A Dissertation Presented to the FACULTY OF THE GRADUATE SCHOOL, UNIVERSITY OF SOUTHERN CALIFORNIA, in Partial Fulfillment of the Requirements for the Degree DOCTOR OF PHILOSOPHY (ELECTRICAL ENGINEERING)

December 2008

Copyright 2008 Zhaoliang Pan

Dedication

To My Family

Acknowledgements

Without the help of many other people, the completion of my thesis would not have been possible. It is a great pleasure and honor to take this opportunity to convey my gratitude to all of them.

My greatest gratitude goes to Dr. Melvin Breuer, my research advisor. Throughout my years of graduate study at the University of Southern California, every step of my research has been accompanied and inspired by Dr. Breuer's support, guidance and scientific insight. Moreover, his unwavering passion for research and unflinching persistence in technological discovery impressed me so deeply that they have profoundly changed my attitude towards work and life.

I am deeply indebted to Dr. Sandeep Gupta, who not only guided and advised my research in the area of VLSI test but was also the one who first taught me this material in class. Much of my early knowledge of VLSI test came from his teaching. He has always provided valuable suggestions and critical comments to help me improve my research work.

I gratefully acknowledge Dr. Antonio Ortega and Dr. Keith Chugg for their support and guidance of my research. I will never forget how they used to sit through my many presentations, attentively listening and critiquing for hours. Their thoughtful comments have greatly helped shape my research ideas and subsequent paper writing.

I would also like to express my appreciation to Shideh Shahidi, Doochul Shin and the other student members of the error-tolerance research group at USC for their friendship and assistance.
Last but not least, I would like to thank my family for their unconditional support throughout these years. I cannot imagine where I would be today without the years of nurturing and dedication from my parents. I am also extremely lucky to have been blessed with my parents-in-law, who have loved me as one of their own. Words fail to express the joy and pride I have felt since the day my dear daughter Sophia came into my world. My special thanks go to my wife, Wei Cui, for her constant encouragement, unselfish love and sunny optimism, and for standing by me and believing in me through the good and bad times.

This thesis is based upon work supported in part by the National Science Foundation under Grant No. 0428940.

Table of Contents

Dedication
Acknowledgements
List of Tables
List of Figures
Abstract
Chapter 1: Introduction
  1.1 Background
  1.2 Facing the Challenges
  1.3 Examples of Systems for Error-tolerance
    1.3.1 Digital Answering Machine
    1.3.2 CMOS Image Sensor
    1.3.3 Discrete Cosine Transformation (DCT)
  1.4 Measures of Acceptability
    1.4.1 Error-rate and Error-significance
    1.4.2 Significance-based Error-rate (SBER)
  1.5 Built-in Self-test (BIST) Architecture
  1.6 Contributions and Outline
Chapter 2: Error-rate Estimation Using Signature Analysis
  2.1 Introduction
  2.2 Theoretical Framework
  2.3 Procedure for Estimating Error-rate
  2.4 Classification of Chips by Error-rate
  2.5 Experimental Results
  2.6 Conclusions
Chapter 3: Error-rate Estimation Using Ones Counting
  3.1 Introduction
  3.2 Review of Previous Work
  3.3 Statistical Analysis
  3.4 Case Studies
    3.4.1 Case 1: L = 1
    3.4.2 Case 2: 1/L ≈ 0
    3.4.3 Case 3: p_1 = p_2 > 0
    3.4.4 General Case
  3.5 Simulation
  3.6 Classification of Chips by Error-rate
  3.7 Experimental Results
  3.8 Error-rate Estimation of Multiple Output Circuits
    3.8.1 Method of Error-rate Estimation for Multiple Output Circuits
    3.8.2 Experimental Results
  3.9 Conclusions
Chapter 4: Test for SBER Estimation with BIST
  4.1 Introduction
  4.2 Multiple Copies with At Least One Defect-free Copy
  4.3 Multiple Copies Without a Known Defect-free Copy
  4.4 Single Copy
    4.4.1 Estimation Method
    4.4.2 Experimental Results
    4.4.3 Special Case
  4.5 Conclusions
Chapter 5: Summary of Contributions
Bibliography
Appendices
  Appendix A: Expectation and Variance of the Estimator Using Ones Counting
  Appendix B: Q Function: Q(x)
  Appendix C: Justification of (3.25) with r < 1/5 being maximized when α_0 = 0 (Section 3.4.4)

List of Tables

Table 1.1: SBER of 864 single stuck-at faults of C432 for a threshold of 100
Table 2.1: P_F as a function of r and F for S = 8 and L = 256
Table 2.2: r_MAX as a function of F and L for S = 9
Table 2.3: (L_Smin, S_min) and (L_e, S_e) as a function of r for μ = 0.05 and γ = 0.9545
Table 2.4: Simulation results for estimating error-rate via Procedure 1
Table 2.5: Outcomes of hypothesis test
Table 2.6: The probability of making erroneous classifications
Table 2.7: Classification results from simulation
Table 3.1: The maximal likelihood of making an erroneous classification
Table 3.2: Average number of test sessions of faulty C432_7 circuits for different error-rate ranges
Table 3.3: Misclassification in the experiment for C432_7
Table 3.4: Experimental results for faulty C880_26 circuits
Table 3.5: Output of a lower mod-4 circuit according to different combinations of output and error patterns of defective CUTs
Table 3.6: Output of an upper mod-4 circuit according to different combinations of output and error patterns of defective CUTs
Table 3.7: Mean and variance of the estimated values of P_any,2n+1, P_any,4n+2 + P_2m,4n+3 + P_2m+1,4n+1 and P_any,4n+2 + P_2m,4n+1 + P_2m+1,4n+3
Table 3.8: Distribution of absolute difference between error-rate and the probability of error pattern E_any,4n for 1760 faulty copies of C880
Table 3.9: Distribution of relative difference between error-rate and the probability of error pattern E_any,4n for 1760 faulty copies of C880
Table 4.1: Experimental results for defective copies of C432
Table 4.2: Experimental results for T = 0.03, T_n = 0.027 and μ = 0.0013. Each set includes three defective copies of C432
Table 4.3: Comparison of SBER estimation for C1, C2 and C3 using two methods. One method uses only C1, C2 and C3 together for the estimation. The other method uses C1, C2, C3 and C4 together for the estimation. C1, C2, C3 and C4 are all faulty copies of C432
Table 4.4: Probability of a test session having X errors. Numerical examples are provided too. For each value of X, the conditional probability that a test session fails is also listed
Table 4.5: The number of sessions (S) for different values of p, given e = 0.01, γ = 0.9545, ε = 0.05, L_1 = 100. Comparison between our SBER estimation method and the Bernoulli-process-based estimation method in terms of the number of correct signatures
Table 4.6: The different outcomes of the outputs for the special case
Table 4.7: The number of sessions (S) for different values of p, given e = 0.01, γ = 0.9545, ε = 0.05. Comparison between our SBER estimation method and the Bernoulli-process-based estimation method in terms of the number of correct signatures

List of Figures

Figure 1.1: Architecture of a digital answering machine
Figure 1.2: Average test score over 50 fault distributions for different fault densities
Figure 1.3: JPEG image compression process
Figure 1.4: Distribution of error-significance of a defective copy of benchmark circuit C432 for the range 1-255. For error-significance 0, the probability is about 92%
Figure 1.5: Simplified BIST architecture
Figure 2.1: Plot of S vs. L according to (2.4), where r = 0.001, γ = 0.9973 and μ = 0.01
Figure 2.2: Partition of error-rate (chips) into three ranges (Types)
Figure 2.3: PDF curve of estimated error-rate r̂. P_te is the area under this curve to the left of r_th: (a) corresponds to a chip with true error-rate of r_th; (b) corresponds to a chip with true error-rate greater than r_th
Figure 2.4: PDF curve of estimated error-rate r̂ for the case of true error-rate r = r_th
Figure 2.5: Plot of S vs. r_thn according to (2.8) when r_th = 0.1, L = 1/r_thn, and α = 0.05
Figure 2.6: Plot of S*_ms vs. r, when r_th = 0.1, r_thn = 0.09 and α = 0.05
Figure 2.7: Normalized histogram of the number of test sessions applied. The smaller graph is the enlarged histogram for the number of test sessions between 100 and 1800
Figure 2.8: Normalized histogram of error-rate of 1760 faulty circuits. The smaller graph is the enlarged part for error-rates between 0.05 and 0.12
Figure 3.1: S vs. L based on Eq. (3.22) for different error-rates and ε = 0.05. The points marked by an 'X' correspond to the selection of S and L that minimizes S×L
Figure 3.2: (a) Distribution of estimated error-rate data from simulation. (b) Output from the MATLAB tool "normplot" to test whether the data are normally distributed. For this figure, p_1 = 0.006, p_2 = 0.004, L = 1 and S = 1.0E+5
Figure 3.3: (a) Distribution of estimated error-rate data from simulation. (b) Output from the MATLAB tool "normplot" to test whether the data are normally distributed. For this figure, p_1 = 0.006, p_2 = 0.004, L = 1000 and S = 2000
Figure 3.4: (a) Distribution of estimated error-rate data from simulation. (b) Output from the MATLAB tool "normplot" to test whether the data are normally distributed. For this figure, p_1 = 0.005, p_2 = 0.005, L = 50 and S = 4000
Figure 3.5: (a) PDF of estimated error-rate of a chip whose true error-rate is r_th. (b) PDF of estimated error-rate of a chip whose true error-rate is greater than r_th
Figure 3.6: The minimal number of test sessions for different true error-rates
Figure 3.7: Normalized histogram of error-rates of 864 faulty circuits associated with C432_7. (a) The error-rate range is from 0 to 0.04. (b) The error-rate range is from 0.04 to 0.55. Note: the scales for (a) and (b) are different
Figure 3.8: Normalized histogram of the number of test sessions applied when classifying the 864 faulty circuits of C432_7. (a) The number of test sessions is in the range 1 to 100. (b) The number of test sessions is in the range 100 to 1100. Note: the scales for (a) and (b) are different
Figure 3.9: Two different faults in a half adder
Figure 3.10: Defective CUT feeds a parity checker
Figure 3.11: Defective CUT feeds a lower mod-4 circuit
Figure 3.12: Defective CUT feeds an upper mod-4 circuit
Figure 3.13: Output of "normplot" for the estimated values of (a) P_any,2n+1, (b) P_any,4n+2 + P_2m,4n+3 + P_2m+1,4n+1 and (c) P_any,4n+2 + P_2m,4n+1 + P_2m+1,4n+3
Figure 4.1: Test structure for the scenario where one defect-free copy exists
Figure 4.2: M_min vs. p, where T = 0.05 and T_n = 0.045
Figure 4.3: (a) Distribution of 500 estimated SBER values of C432 with fault stuck-at-0 at signal 39. (b) Output from the MATLAB tool "normplot" to test whether the data are normally distributed

Abstract

As CMOS scaling continues, feature size approaches molecular dimensions and the number of devices per chip reaches astronomical values. It becomes more and more difficult to reach desired yield levels, which motivates new models of design and test. One such model is called error-tolerance, which permits defective chips with acceptable performance to be used in systems. A key issue of error-tolerance is how to justify the acceptability of a defective chip. This thesis considers two of the many metrics of acceptability of defective chips.
One is error-rate, which indicates how often, on average, an error is seen at the output when random input patterns are applied. The other is significance-based error-rate (SBER). SBER also concerns the frequency of occurrence of errors, but it counts only those errors whose error-significance is greater than a threshold. The main work of this thesis is developing techniques based on a built-in self-test (BIST) architecture to estimate error-rate and SBER, and to classify chips by error-rate and SBER.

This thesis consists of three main parts. In the first part, we present a technique for error-rate estimation based on signature analysis. For this technique, we develop the estimator and investigate its statistical characteristics. We further develop procedures for estimating error-rate and classifying chips by error-rate, and discuss the selection of test parameters for both procedures.

In the second part, we revisit the technique of error-rate estimation based on ones counting. A new estimator is developed. As with the estimator based on signature analysis, its statistical characteristics are investigated, and the selection of test parameters is discussed for the procedures of estimating error-rate and classifying chips. We also extend this technique to multiple-output circuits. By using a parity checker and mod-4 circuits, we are able to estimate the error-rate of multiple-output circuits, excluding errors in which the number of ones in the observed output minus the number of ones in the correct output is a multiple of 4.

In the last part, we identify three scenarios for SBER estimation, namely when (1) there are multiple copies of the target circuit and at least one defect-free copy exists and is known, (2) multiple copies of the target circuit exist and all may be defective, and (3) a single copy of the target circuit exists. We develop different SBER estimation methods for these scenarios.
Experimental results show that all our estimation methods use much less storage than the method based on a Bernoulli process, assuming the latter requires the storage of correct signatures.

Chapter 1: Introduction

1.1 Background

As CMOS scaling continues, feature sizes have dropped below 65nm and approach molecular dimensions. Feature sizes smaller than 20nm are expected to emerge by 2015 [23]. Smaller feature size means higher device density. It is projected that ASICs (application-specific ICs) and MPUs (microprocessor units) will reach 10^10 devices/cm² in 2015 [23]. The fabrication of such small devices into highly integrated chips introduces many challenges to the semiconductor industry. One such challenge is reaching desired yields in light of high defect rates, process variation and quantum effects.

Yield is defined as the fraction of good chips among all fabricated chips. Good chips are believed to have no harmful error-producing problems associated with defects, process variations and quantum effects, and hence to operate functionally as expected. With smaller feature size, chips are more sensitive to the manufacturing environment and process; thus, more defects tend to occur as feature size scales down. Even though continual improvements in manufacturing equipment and processes and the application of higher-quality control measures can reduce the possibility of defects occurring, many chips will fail testing because small feature size increases the sensitivity of CMOS transistors and metal wires to defects. Process variations refer to the variation of electrical parameters of transistors and wires from die to die, wafer to wafer and lot to lot. With smaller feature size, the electrical parameters of transistors have a smaller dynamic range. In other words, chips with smaller feature size are more likely to fail due to process variation than chips with larger feature size.
As CMOS scales down to 40nm and beyond, CMOS devices approach quantum physical boundaries. Quantum effects cause high leakage currents and large variability in device characteristics. Thus it is difficult and costly to maintain yesterday's yields as CMOS scales down.

Beyond current CMOS technology, new technologies have emerged. These technologies bring the size of devices down to the molecular level, i.e., several nanometers. With devices this small, even higher device densities of 10^10-10^12 devices/cm² will be reached. These technologies include devices such as carbon nanotubes (CNT), nanowires, molecular devices, resonant tunneling devices (RTD), and single electron transistors (SET) [3] [18] [19] [23]. Among them, CNTs and nanowires are considered the most promising technologies to replace CMOS. The basic type of CNT is a one-atom-thick sheet of carbon rolled up into a cylinder whose diameter is on the nanometer scale. A nanowire is a very thin wire, several nanometers wide, which can be built with silicon or other materials [15]. CNTs and nanowires can be used to build CNT FETs and nanowire FETs to replace CMOS FETs, and both have been successfully used to build simple logic gates [3] [19]. Using logic gates built with CNT FETs and nanowire FETs, circuit architecture will be similar to that based on CMOS FETs. Moreover, due to the special characteristics of CNTs and nanowires, crossbars made with CNTs and nanowires can be used to build logic too. Cross points in the crossbar can be either closed or opened by changing the voltage between the top bar and the bottom bar. Thus, cross points act as switches, which are fundamental for digital logic circuits. Research on these new technologies is progressing, and more complex circuits can now be built. Fabrication with these new technologies differs from traditional lithography-based fabrication.
For example, crossbar CNT and nanowire circuits depend on a technique called Chemically Assembled Electronic Nanotechnology (CAEN) [4] [27]. CAEN is a self-assembly technique: CNTs and nanowires are first self-assembled into regular structures such as arrays; then, by programming cross points, larger circuits are built. However, the difficulty of aligning wires and accurately placing devices is the biggest drawback of self-assembly. Thus, the fabricated devices will usually have a high rate of defects [4].

As CMOS scaling continues and new device technologies emerge, high defect rates, process variation and quantum effects will make it difficult to reach a desired level of yield. Facing this challenge, developing new design and test methods becomes a must.

1.2 Facing the Challenges

Classically, VLSI tests attempt to classify chips into two categories, namely good and bad. Good chips pass the tests and are considered to be fault-free; their operation always generates correct results. Bad chips fail the tests and are discarded, i.e., are assumed to be of no value. In general, a chip fails the test because it has defects and/or unexpected artifacts of process variations. If a bad chip is used under normal voltage and clocking conditions, errors are assumed to occur at its outputs.

Thus, in the future, when tens of billions of transistors are integrated into a chip, it will probably be necessary to use fault-tolerance and defect-tolerance techniques in an attempt to reach desired levels of yield [4] [12]. Fault-tolerance is the ability of a system to continue to output correct results even after the occurrence of some hardware or software faults [20]. An example of a fault-tolerant design is triple modular redundancy (TMR). With TMR, three copies of a module and a voter are built into the system. The outputs from the three copies are fed into the voter, which selects the majority value as its output.
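The majority vote at the heart of TMR can be sketched in a few lines. This is an illustrative bitwise voter, not circuitry from this thesis:

```python
def tmr_vote(a: int, b: int, c: int) -> int:
    """Majority vote over three module outputs, bit by bit.

    A bit of the result is 1 iff at least two of the three inputs
    have that bit set, so a single faulty copy is always outvoted.
    """
    return (a & b) | (b & c) | (a & c)

# One faulty copy (here the third) is outvoted by the two good ones.
good = 0b1011
assert tmr_vote(good, good, 0b0001) == good
```

The bitwise form works because majority is computed independently on each output line, exactly as a per-bit hardware voter would.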
With no more than one copy of the module failing, a TMR system works correctly.

Defect-tolerance refers to the ability of a chip with defects to produce correct results. Employing defect-tolerance may increase design and test complexity. One type of defect-tolerant design uses a simple redundancy technique: each module in a chip is built with several copies; via testing, a fault-free copy of each module is identified; then, through reconfiguration, these fault-free copies are connected to build a functionally good chip, while the unused copies are excluded from operation.

Different from fault-tolerance and defect-tolerance, another technique, called "error-tolerance", was proposed for the purpose of increasing effective yield, which is defined as the fraction of marketable chips among all fabricated chips. Error-tolerance emerged from the fact that not all applications require perfect or error-free chips. The goal of fault-tolerance and defect-tolerance is to produce error-free chips, whose operation will not generate unexpected errors at the output. For error-tolerance, however, chips with erroneous outputs may also be acceptable and marketable. Such chips are considered bad and would be thrown away according to classical VLSI tests; under the concept of error-tolerance, these chips with degraded performance can be sold at discounted prices. Accordingly, the number of marketable chips increases, and the effective yield increases.

The concept of error-tolerance and its related testing methodology have been studied since their inception in the late 1990s [9]. Breuer et al. formally defined error-tolerance as: "A circuit is error tolerant with respect to an application if it contains defects that cause internal errors and might cause external errors, and the system that incorporates this circuit produces acceptable results."
[7] Two aspects of this definition make error-tolerance distinct from other techniques. One is that error-tolerance allows some defective, performance-degraded chips to be sold at discounted prices; these chips would otherwise be discarded according to classical VLSI tests. The operation of these chips with defects causes internal and possibly external errors, whereas chips with fault tolerance and defect tolerance are free of external errors. Even though such chips may be only a small fraction of all defective chips, the profit from selling them will be large because of the large volume of production. The other distinction is that error-tolerance is application-oriented: whether a defective chip is marketable depends on whether the performance of the application using the chip is acceptable to the end user.

Because error-tolerance is not based on classical VLSI tests, new test techniques must be developed. The research in this thesis concerns test techniques to support error-tolerance.

1.3 Examples of Systems for Error-tolerance

It has been found that error-tolerance is applicable to many systems, most of which reside in the domain of multimedia and man-machine interfaces such as audio, video, artificial intelligence and pattern recognition. Some examples are described below.

1.3.1 Digital Answering Machine

A digital answering machine is a system that converts an analog voice signal into digital data, compresses the digital data, and stores the compressed data in a RAM. When the voice message is retrieved, the compressed data is extracted from the RAM, decompressed, and converted back into an analog voice signal for playback. The architecture of a digital answering machine is shown in Figure 1.1.
Figure 1.1: Architecture of a digital answering machine (input voice → ADC → encoder → RAM → decoder → DAC → output voice)

Years ago, it was found that RAM chips with a few defective cells can be used in digital answering machines. Two patents relate to using defective RAM in a digital answering machine. One is a 1995 US patent entitled "Audio recording apparatus using an imperfect memory circuit" [26]. It claims that:

It is herein recognized that in an audio signal processing system, a DRAM chip for storing digitized audio signals and selected to include at least one inoperative memory location, is acceptable for use as a storage medium in that no noticeable error is produced on playback of the recorded signal due to the sampling rate of the audio signal and due to the relatively low rate of defects allowed. Furthermore, the use of a less-than-perfect DRAM chip for storing audio information is acceptable due to the substantially non-critical nature of audio signals, as opposed to the extremely critical nature of computer data. [26]

In a 1997 US patent entitled "Digital secretary" [2], it is stated that audio RAMs are allowed to have a small number of defective bits in order to allow the use of lower-cost integrated circuit memory chips.

Figure 1.2: Average test score over 50 fault distributions for different fault densities [8]

Further, Zhu and Breuer investigated using defective RAMs in digital answering machines from the point of view of error-tolerance [8]. In their work, many defective RAMs with different fault densities were simulated. For each defective RAM, a pre-encoded bit stream of speech was stored; the bit stream was then retrieved from the RAM, decoded, and compared with the original speech. The comparator gives a test score called MOS (mean opinion score) in the range of 1 to 5; a test score above 3 means the defective RAM is acceptable. In their simulation, two different codec algorithms, FS1016 and G.723.1, were used.
For each algorithm, 50 different fault densities were selected, and for each density, 50 different fault distributions were considered. Figure 1.2 shows the average test score over the 50 fault distributions for different fault densities [8]. It can be seen that when the fault density is below 0.2%, the defective RAM is acceptable with the G.723.1 codec, and when the fault density is below 0.1%, the defective RAM is acceptable with FS1016.

1.3.2 CMOS Image Sensor

CMOS image sensors are widely used in imaging systems such as digital cameras. A European patent discusses using a defective CMOS image sensor in an imaging system [14]. In this design, both the CMOS sensor array and a digital control circuit are fabricated together on the same chip. One function of the digital control circuit is to filter out abnormal pixel data: when a defective CMOS cell generates a pixel value outside a range determined by its immediately adjacent pixels, the filter can detect the pixel and reset its value according to its adjacent pixels.

With the increasing computing capacity found in digital imaging devices, the function of the filter can also be realized by a digital image processor that is independent of the CMOS sensor. The image processor can carry out many operations, such as filtering out bad pixels, reducing noise and enhancing an image. Thus, a standalone CMOS sensor with a few defective cells usually produces acceptable results. On the other hand, the effect of defective CMOS cells on the sensor is similar to that of dust on the surface of a CMOS sensor. One "hard" method to remove the dust from the sensor surface is with a blower; a "soft" method is to determine the location of the dust and send this location information to a dust-deletion program, which can remove the effect of the dust during post-processing. Obviously it is not necessary to replace a CMOS sensor because of some dust, and it is normal to continue using the CMOS sensor with dust.
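The range-based bad-pixel filtering described above can be sketched as follows. This is a minimal one-dimensional-neighbourhood illustration; the neighbourhood, the tolerance parameter `tol`, and the replacement rule are assumptions for the sketch, not the patent's exact algorithm:

```python
def correct_bad_pixels(img, tol=50):
    """Replace any pixel lying outside the range spanned by its
    horizontal neighbours (widened by `tol`) with their average.

    `img` is a list of rows of integer pixel values.
    """
    out = [row[:] for row in img]
    for r, row in enumerate(img):
        for c in range(1, len(row) - 1):
            left, right = row[c - 1], row[c + 1]
            lo, hi = min(left, right) - tol, max(left, right) + tol
            if not lo <= row[c] <= hi:           # abnormal value: likely a defective cell
                out[r][c] = (left + right) // 2  # reset from the adjacent pixels
    return out

# A stuck-bright pixel (255) surrounded by normal values is corrected.
print(correct_bad_pixels([[100, 104, 255, 102, 99]]))  # → [[100, 104, 103, 102, 99]]
```

A real implementation would use a 2-D neighbourhood and possibly a median rather than an average, but the detect-then-replace structure is the same.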
Similarly, it is normal to use a CMOS sensor with a few defective cells rather than throw the defective sensor away.

1.3.3 Discrete Cosine Transformation (DCT)

The DCT is a linear transformation carried out in many video and image compression systems. The DCT transforms a set of data in the spatial domain into a set of data in the frequency domain. Because most of the image information lies in the low-frequency components, an image can be compressed in the frequency domain with proper coding techniques such as Huffman coding. In most video and image compression methods, the inputs and outputs of the DCT are 8×8 arrays of real numbers.

The DCT is followed by quantization. For example, if the value of a frequency component is 41 and the quantization factor for this component is 15, the value of the component becomes 45 after quantization. Each component in the frequency domain has its own quantization factor; thus, the quantization factors form an 8×8 array. Because quantization changes the image information in the frequency domain, the compressed image differs from the original image. This is called information loss.

Chong and Ortega investigated the effect of defects in the DCT circuit on JPEG image compression [17]. A typical architecture of JPEG image compression is shown in Figure 1.3. An image is split into 8×8 blocks. Each block is transformed by the DCT and quantized independently of the other blocks; entropy coding then generates a bit stream as output. When the DCT circuit is defective, errors occur in the 8×8 blocks of the frequency domain. These errors can be treated as added noise, and some of this noise can be eliminated during quantization. For example, assume the correct value of one frequency component is 55 and the quantization factor is 20, and the defects in the DCT cause the value of this component to be 62; then, after quantization, the value of this component is 60, equal to the quantized correct value.
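The quantization arithmetic in these examples can be checked directly. The sketch below assumes a simple round-to-nearest quantizer, which reproduces the numbers used in the text:

```python
def quantize(value: int, q: int) -> int:
    """Quantize to the nearest multiple of q, as in JPEG's quantization step."""
    return round(value / q) * q

# Examples from the text:
assert quantize(41, 15) == 45   # component 41, factor 15 → 45
assert quantize(55, 20) == 60   # correct DCT coefficient
assert quantize(62, 20) == 60   # defect-induced 62: the error is masked
```

Whether a defect-induced error survives thus depends on whether the erroneous and correct values fall into the same quantization bin.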
The noise generated by defects in the DCT cannot always be eliminated: in the above example, if the erroneous value is 40, the error remains after quantization.

Figure 1.3: JPEG image compression process (8×8 spatial-domain block → DCT → 8×8 frequency-domain block → quantization → entropy coding → compressed data stream)

To determine which faults in DCT circuits can be accepted, Chong and Ortega studied a typical DCT architecture and developed methods to establish thresholds for the acceptance of a DCT circuit. With their test methods, they found that "over 50% of single stuck-at interconnection faults in one of the 1D DCT modules lead to imperceptible quality degradation in the decoded images" [17]. This implies that these faults are acceptable.

1.4 Measures of Acceptability

Error-tolerance is application-oriented: the measures of acceptability for error-tolerant chips are based on the measures of acceptability of the applications using these chips. Over the past years, several measures of acceptability for error-tolerant chips have been introduced. One family of measures is called RAS, which stands for error-rate, error-accumulation and error-significance [9] [10]. Error-rate indicates how often, on average, an error is seen at the output when random input patterns are applied. If the circuit is combinational and contains a defect, and the patterns are uniformly selected over the input space, then the error-rate is equivalent to the fraction of the input space that detects the defect, i.e., produces an erroneous output. Error-accumulation deals with the retention of errors in a system. For combinational circuits and feed-forward pipeline circuits, errors do not accumulate. However, for circuits with feedback, errors at the output can remain in the circuit because of the feedback; thus, errors may stay in the system for a long period of clock cycles.
Some circuits have 12 a property of short memory, which means they can flush errors and come back to correct operation during a short period of clock cycles. For some applications, the output represents a numerical value. Error-significance deals with how far the erroneous value of the output is from the correct value. A large error, absolute or relative, usually has more serious impact than a small error. For some applications, error-rate or error-significance alone is not sufficient, while significance-based error-rate (SBER) that combines error-rate and error-significance is an appropriate measure. SBER deals with the faction of input patterns whose error- significance is greater than a given threshold. A test of chip can be performed on system level (for example whole chip), module level (for example motion estimation module in MPEG) or block level (for example a combinational circuit). RAS are called generic measures, i.e., they can be used at different levels. There are other measures that are only used for the system level test. The definitions of these measures depend on the specifications of systems using error-tolerant chips. Signal-to-noise ratio (SNR) uses dB as its unit. SNR is an important parameter for many communication systems. Higher SNR means more reliable performance. dB loss refers to the amount of reduction of SNR. With error-tolerant chips used in communication system, dB loss can be used as a measure of acceptability for the system level test. For image and video processing systems, mean square error (MSE) is used to represent the distortion or noise impact in image and video data. The effect of using 13 error-tolerant chips can be treated as extra distortion or noise. In this case MSE can be used as a measure of acceptability. 1.4.1 Error-rate and Error-significance Error-rate and error-significance are two important measures of acceptability. 
Error-rate and error-significance, and related test issues, are discussed here from the point of view of VLSI test. For error-rate and error-significance, only combinational circuits are considered. For these circuits, only steady-state Boolean errors at a circuit's outputs generated by static defects are considered. Thus, error-rate is formally defined as follows:

    Given a combinational circuit C having n inputs and a static defect d, the error-rate r of C due to d is the fraction of all 2^n possible input patterns that produce an erroneous output. The error-rate is thus that fraction of the input space consisting of patterns that detect d. If we assume that the input patterns applied to C are randomly selected and uniformly distributed over the space of all input patterns, then, as the number of patterns applied to C increases, the fraction of output patterns that are erroneous converges to r. [28]

As an example of error-rate, the benchmark circuit C880 has been studied for all of its single stuck-at faults. The study shows that 10.7% (13.8%) of the possible faults produced an error-rate that is less than 0.01 (0.015) [28]. In this example, only single stuck-at faults are considered. However, error-tolerance does not deal with faults or a fault model; it focuses on the errors induced by defects.

Defective chips can be classified into different categories based on their error-rate, each category being associated with a certain error-rate range. For example, two error-rate thresholds, say r_th1 and r_th2, where 0 < r_th1 < r_th2 < 1, can be used to divide the range of error-rate into three parts, i.e., (0, r_th1], (r_th1, r_th2] and (r_th2, 1]. Defective chips can now be partitioned into three categories, namely category I with error-rate in the range (0, r_th1], category II in (r_th1, r_th2], and category III in (r_th2, 1].
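To make the definition above concrete, the following Python sketch computes the exact error-rate of a small hypothetical defective circuit by exhausting its input space. The 4-bit adder and the injected stuck-at-0 fault are illustrative examples, not circuits studied in this thesis:

```python
from itertools import product

def good_adder(a, b):
    # Fault-free 4-bit adder: sum truncated to 4 bits.
    return (a + b) & 0xF

def faulty_adder(a, b):
    # Same adder with its least-significant output bit stuck at 0.
    return (a + b) & 0xE

def exact_error_rate(good, faulty, bits=4):
    # Error-rate = fraction of the whole input space on which the
    # defective circuit produces an erroneous output.
    pairs = list(product(range(1 << bits), repeat=2))
    errors = sum(good(a, b) != faulty(a, b) for a, b in pairs)
    return errors / len(pairs)

r = exact_error_rate(good_adder, faulty_adder)  # 0.5: an error occurs iff a+b is odd
```

For realistic circuits the input space is far too large to exhaust, which is precisely why the estimation techniques developed in this thesis are needed.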
Chips in category III (high error-rate) are discarded; chips in category II are sold at a large discount in price; and those in category I are sold at a moderate discount.

Error-significance, as used in this thesis, is defined for circuits whose outputs represent an integer value. For such a circuit with defects, the response to each input pattern may differ from the correct response. For an input pattern i, the absolute value of the difference between its observed response value, O_i*, and its correct response, O_i, is defined as the "significance of the error" for input pattern i, i.e., |O_i* − O_i|. Based on this definition of error-significance, the acceptability of a defective chip can be determined. Given an error-significance threshold value for a chip in a given application and a specific defective chip, if the error-significance for all possible input patterns is less than or equal to the threshold, the chip is classified as acceptable; otherwise, it is unacceptable. For example, C880 is an 8-bit ALU whose outputs carry integer values in the range 0~255. Assume the threshold is 25. It has been found that 8% of its single stuck-at faults are acceptable [24].

To accommodate different applications, the definition of error-significance can be extended. For some applications, the relative error-significance is of concern. In this case, the error-significance for an input pattern i is defined as |O_i* − O_i| / O_i, the ratio of the error to the correct response.

For some applications, the output is not a single integer but a vector of several integers. For example, consider a circuit with 64 output pins, where pins 0 to 15 represent one number, pins 16 to 31 a second, pins 32 to 47 a third, and pins 48 to 63 a fourth. If the four numbers are not related, the above definitions of error-significance can be used for each of the four numbers, and a threshold can be assigned to each number.
In some cases, the four numbers may combine to represent something meaningful. For this situation, a vector distance can be used to define error-significance. Consider two vectors A and B with n components, where A_1, A_2, …, A_n are the components of A and B_1, B_2, …, B_n are the components of B. One definition of the vector distance between A and B is

    sum over i = 1..n of (A_i − B_i)^2.

Then, the error-significance with respect to an input pattern i and defect d is the vector distance between the observed output vector and the correct output vector.

1.4.2 Significance-based Error-rate (SBER)

Error-rate and error-significance represent two different and independent measures of acceptability. The measure of error-rate requires that the frequency of the occurrence of errors be small enough for acceptance. The measure of error-significance requires that the significance of errors be small enough for acceptance. When these two measures are used independently, we can find how often an error occurs from an error-rate test and how bad an error can be from an error-significance test. For some applications, using both measures together might be useful. In this case, an error-rate threshold and an error-significance threshold are given, and both an error-rate test and an error-significance test are applied. When a circuit passes both tests of acceptance, it is accepted.

However, this method of simply combining the error-rate test and the error-significance test does not cover all situations. For some chips, acceptability does not depend on whether there exists an input pattern whose error-significance is greater than the threshold, but rather on the fraction of input patterns whose error-significance is greater than a threshold. Thus, significance-based error-rate (SBER) is introduced. SBER is defined as the fraction of the input pattern space whose associated error-significance values are greater than a given error-significance value.
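The error-significance variants above (absolute, relative, and vector-distance) and the per-chip acceptability check can be sketched as small Python helpers. The function names are illustrative, and the vector distance follows the sum-of-squared-differences definition given in the text:

```python
def error_significance(observed, correct):
    # Absolute error-significance |O* - O| for one input pattern.
    return abs(observed - correct)

def relative_error_significance(observed, correct):
    # Relative error-significance |O* - O| / O (correct value assumed nonzero).
    return abs(observed - correct) / correct

def vector_error_significance(observed_vec, correct_vec):
    # Vector-distance error-significance: the sum of squared
    # component differences, as defined in the text.
    return sum((a - b) ** 2 for a, b in zip(observed_vec, correct_vec))

def chip_acceptable(response_pairs, threshold):
    # A chip is acceptable if the error-significance of every
    # (observed, correct) response pair is at most the threshold.
    return all(error_significance(o, c) <= threshold for o, c in response_pairs)
```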
This concept can be generalized to the situation where there are several error-significance values, each of which is associated with a different error-rate. As a simple example, there may be applications where very small numerical errors are allowed to occur at a high rate, say at most 1 out of 1,000 computations, while very large numerical errors are allowed only at a rate of at most 1 in 1,000,000.

For a given error-significance threshold T_ES and a circuit with defect d, the measure is defined as the fraction of input patterns that not only detect defect d but also are associated with an error-significance greater than T_ES. This measure is different from error-rate and error-significance but involves both, and is referred to as significance-based error-rate (SBER). The related test is called a SBER test.

In [16] one can find a study of the acceptability of a defective discrete cosine transform (DCT) module in a JPEG encoder that is based on the concept of SBER. The DCT is used in this encoder to transform data from the spatial domain to the frequency domain. The input to the DCT consists of 64 binary integers, each representing one element in an 8x8 block. Its output also consists of 64 binary integers, each referring to a frequency component. For image compression systems, the mean square error (MSE) is often used as a metric for compression quality. It represents the average value of a function of the error over many input patterns, so it can be treated as a variant of error-significance. However, [16] shows that MSE is not an appropriate metric for the acceptability of a faulty DCT, because large errors in a few input patterns that lead to unacceptable results can still yield a small MSE value. Thus, [16] proposes to use a SBER measure at each frequency component (one of the 64 output numbers), i.e., at each frequency component a SBER threshold value is specified.
If for every frequency component its SBER value is smaller than the associated threshold, the defective DCT is considered acceptable.

As another example of this SBER measure, consider an audio processing system whose output is a digital audio signal in the time domain that will eventually produce sound consumed by human beings. Assume the sampling frequency of this system is 11.5 kHz, which means the system outputs 11,500 numbers every second. Assume the system has defects that cause errors in the output audio signal. The effect of these errors is equivalent to added noise. A small amount of noise can be easily tolerated, even though it might occur almost all of the time. When the noise is very large but occurs only very infrequently, it might also be tolerated. Only large noise with a high frequency of occurrence would not be tolerated. For this system, then, SBER can be used as the measure of acceptability.

[Figure 1.4: Distribution of error-significance of a defective copy of benchmark circuit C432 over the range 1-255, with the error-significance threshold of 100 marked. For error-significance 0, the probability is about 92%.]

As an illustrative example, consider the benchmark circuit C432. The output of C432 is 8 bits wide and can be interpreted as a binary integer, so the range of data is 0 to 255. Error-significance is the absolute value of the difference between the observed output and the correct output; thus, the range of error-significance is from 0 to 255. To demonstrate a distribution of error-significance, a defective copy of this circuit containing a single stuck-at fault is studied. A large number of input patterns are applied to this circuit, and their associated error-significance values recorded. About 92% of the input patterns have zero error-significance. Figure 1.4 shows the distribution over the range 1-255.
If the error-significance threshold is 100, the SBER of this defective copy of C432 is about 0.0133, which means 1.33% of the input patterns produce an error-significance value greater than 100.

Again consider C432 and an error-significance threshold of 100. This time, all 864 single stuck-at faults of C432 are individually considered. For each single stuck-at fault, its SBER is obtained via simulation. Table 1.1 shows the result and lists the percentage of these faults for different SBER ranges. From the table, we can see that more than 92% of the faults have a SBER value less than 0.03. Different defective chips can have different SBER values; thus, chips can be classified according to SBER.

Table 1.1: SBER of 864 single stuck-at faults of C432 for an error-significance threshold of 100

    SBER range    Percentage of faults
    0~0.01        78.24%
    0.01~0.02     11.92%
    0.02~0.03      2.08%
    >0.03          7.76%

Similar to error-rate, the function of a SBER test is to classify chips according to their SBER value. As the basis of classification, a method for estimating SBER values for different threshold values is needed.

1.5 Built-in Self-test Architecture (BIST)

To estimate error-rate and SBER, a scan-based test method can be used. With such a method, test patterns are scanned in and test responses are scanned out through a scan chain. The responses are compared with pre-stored correct responses to determine whether they differ and/or whether the error-significance is greater than a given threshold. The number of test patterns whose response is erroneous and/or whose error-significance is greater than a given threshold is recorded, and the recorded number is divided by the total number of applied test patterns; the result is the error-rate or SBER. This method is based on the Bernoulli process. It requires that a large number of correct responses be stored, and it takes a great amount of time to scan patterns in and out.
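The scan-based (Bernoulli-process) computation described above reduces to simple counting once the per-pattern responses are available. A minimal Python sketch, assuming the observed and correct responses are given as integer lists (the names are illustrative):

```python
def bernoulli_error_rate(observed, correct):
    # Fraction of applied patterns whose response is erroneous at all.
    wrong = sum(o != c for o, c in zip(observed, correct))
    return wrong / len(observed)

def bernoulli_sber(observed, correct, threshold):
    # Fraction of applied patterns whose error-significance
    # |observed - correct| exceeds the given threshold.
    exceed = sum(abs(o - c) > threshold for o, c in zip(observed, correct))
    return exceed / len(observed)
```

The storage cost is evident here: every correct response must be available for comparison, which is what the BIST-based techniques in this thesis set out to avoid.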
[Figure 1.5: Simplified BIST architecture -- register R1 drives the combinational logic block C, whose outputs feed register R2.]

To reduce storage and test time, estimation techniques based on BIST are developed in our work. Currently, built-in self-test (BIST) is a common technique used to support the testing of a chip [1]. One classical BIST architecture, referred to as BILBO [25], consists of partitioning a circuit into combinational blocks of logic surrounded by flip-flops. A simplified version of this architecture is shown in Figure 1.5. R1, which drives the inputs to a block of logic C, is configured to operate as a pseudo-random pattern generator (PRPG) during the test mode, and R2, driven by the outputs of C, is configured to operate as a multi-input shift register (MISR), also known as a signature analyzer (SA). These registers are usually configured from linear feedback shift register (LFSR) circuitry. During the test mode, the PRPG applies L test patterns to C and the response is collected in the MISR. The final state of the MISR is called the signature. If a defect exists in C, it is possible that one or more of the L response patterns from C that enter the MISR are in error, resulting in an incorrect signature. By using a large enough value of L and a properly designed PRPG and MISR, this test architecture can fairly accurately differentiate between blocks of logic that contain defects that result in errors and blocks that do not. It is important to note that this test scheme depends on the fact that if one or more output patterns from C are in error, there is a very high probability that the signature is also in error. No counting occurs that keeps track of how many patterns are in error, how many bits in an erroneous pattern are in error, or even the bit position of any erroneous bit. There are many variations of this test architecture that are nearly isomorphic in function to the BILBO technique and that are widely used in commercial systems (see Chapter 11 in [1]).
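A toy Python model of this PRPG/MISR scheme is sketched below. The 8-bit register width and the LFSR tap mask are illustrative choices, not taken from this thesis; the key property exploited by signature testing is that an erroneous response stream is very likely to produce a different final signature:

```python
MASK = 0xFF  # 8-bit registers (width chosen only for illustration)

def lfsr_step(state, taps=0xB8):
    # One shift of a Fibonacci LFSR: the feedback bit is the parity
    # of the tapped state bits (the tap mask is illustrative).
    feedback = bin(state & taps).count("1") & 1
    return ((state << 1) | feedback) & MASK

def prpg_patterns(seed, count):
    # The PRPG is the LFSR run in autonomous mode; successive
    # states serve as pseudo-random test patterns.
    patterns, state = [], seed
    for _ in range(count):
        patterns.append(state)
        state = lfsr_step(state)
    return patterns

def misr_signature(responses, seed=0):
    # The MISR shifts like the LFSR but XORs each incoming response
    # word into its state; the final state is the signature.
    state = seed
    for word in responses:
        state = lfsr_step(state) ^ (word & MASK)
    return state
```

Note that two response streams differing only in their final word provably cannot alias, since the last XOR into an identical pre-final state must differ; aliasing is possible for earlier differences, but only with low probability for a well-chosen MISR.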
In our work, we enhance the classical BIST test technique so as to produce test data that can be used to estimate error-rate and SBER. In particular, to accommodate different error-rate and SBER estimation scenarios, R2 may be implemented in different ways, such as by an LFSR, a counter or an accumulator.

1.6 Contributions and Outline

This research focuses on techniques for error-rate and SBER estimation based on a BIST architecture. For error-rate estimation, two techniques are developed. One technique is based on signature analysis, which implements the response compressor using an LFSR. The other technique is based on ones counting, which implements the response compressor using a binary counter. For both techniques, estimators are developed and analyzed using statistical methods. Factors affecting test parameters are also analyzed, and methods of determining test parameters are developed. For SBER estimation, three scenarios are identified. For each scenario, a SBER estimation method is developed and an estimator is provided, along with statistical analysis.

The development of our estimation methods focuses on the reduction of storage. Compared with the estimation method based on the Bernoulli process, our methods use much less storage.

This thesis is organized as follows. Chapter 2 discusses error-rate estimation based on signature analysis. A theoretical framework for this estimation method is presented. Based on this framework, a procedure for efficiently and accurately estimating the error-rate of a circuit is proposed. To support the analysis, simulation results are shown in this chapter. Finally, an effective method of chip classification based on error-rate, as well as several implementation issues, are discussed with experimental results. Chapter 3 presents error-rate estimation based on ones counting. As for signature analysis, we show the theoretical analysis of this method, and the method of classifying chips using it.
Distinct from signature analysis, different ones-counting based methods are developed for single-output circuits and multiple-output circuits, respectively. Chapter 4 presents SBER estimation. For SBER estimation, we identify three scenarios, for each of which a SBER estimation method is developed. This chapter also presents statistical analysis, simulation and experimental results for each method. Chapter 5 completes this thesis with conclusions.

Chapter 2
Error-rate Estimation Using Signature Analysis

2.1 Introduction

The test methodology for error-rate estimation is based on the BIST architecture discussed in Chapter 1, where the response compressor is implemented by a MISR. During the test, the PRPG generates L test patterns that are fed into a Circuit Under Test (CUT). The MISR compresses the L responses into one pattern called the signature. Assume that we repeat this testing process S times, each time reinitializing the PRPG and MISR to appropriate new states. While the actual error-rate is not known, for appropriate values of L and S it should be possible to accurately estimate the value of r. For example, for L=1 and S equal to or greater than 100/r, the estimate of r is quite good. Unfortunately, for this situation, 100/r correct signatures need to be stored. But for any one of the S test sessions, if an incorrect signature is obtained, it is known that the applied test pattern produced an error. For L>1, if a test session fails, i.e., produces an incorrect signature, it is not known whether 1, 2, …, or L of the responses are in error. Hence there is information loss when carrying out this form of compression.

Some of the issues regarding error-rate that are addressed in this chapter are listed next:
1) Derivation of a theoretical framework for accurately estimating the error-rate of a circuit.
2) Development of an efficient process for estimating error-rate.
3) Establishment of a process for accurately and efficiently partitioning error-producing circuits into bins based on their error-rate.

One can attempt to reformulate our error-rate estimation problem in terms of a more classical defect-rate problem found in manufacturing. Assume one is manufacturing large quantities of some device, D. Given a large number of such devices, let the fraction of these devices that are defective be d, the defect-rate. It is important for the manufacturer to sample some of the newly made devices to determine if the current defect-rate, d', is above some specified bound. In addition, a buyer of large quantities of these devices might receive a large shipment and want to determine whether or not to accept it. Part of this decision might be based on sampling some of these devices and determining how many are defective. These and related problems fall under the general category of sampling and acceptance, and have been extensively studied for over 50 years [22], [21], [5], [6]. We have been unable to identify any results in the literature that directly apply to our problem of estimating error-rate, nor any results that are similar to those presented here. We believe the reason our approach is unique is that when a circuit is tested in the BIST environment, a test session either passes or fails based on whether one or more erroneous outputs occurred: if one or more erroneous outputs occur, the test session fails; otherwise, it passes. In solutions to classical sampling and acceptance problems, the focus is on the number of defective devices in each sample, and on the number of decisions that must be made.

As mentioned in Chapter 1, the estimation method based on the Bernoulli process is one solution for estimating error-rate. However, this method needs a large amount of storage for correct responses. Thus, how our estimation methods reduce the amount of storage is emphasized in our discussion.
This chapter is organized as follows. In Section 2.2, we present the theoretical framework for error-rate estimation as well as some related prior work. In Section 2.3 we present a procedure, based on this theory, for efficiently and accurately estimating the error-rate of a circuit. The simulation results for this procedure are presented in Section 2.3 as well, including those that demonstrate its fast convergence. An effective method for classifying chips according to their error-rate, as well as several implementation issues, are discussed in Section 2.4. Section 2.5 presents the experimental results of classifying chips. Section 2.6 includes our conclusions.

2.2 Theoretical Framework

The scenario for using signature testing for error-rate estimation is to apply a number of test sessions, say S, to the CUT, and count how many test sessions fail, say F. The length of the test sequence for each test session is the same, say L. The application of S test sessions having a fixed value of L is called a test. The values of L and S can be different for different test experiments. F is the only quantity that is measured directly. For each faulty CUT, its error-rate, denoted by r, is fixed and is known only by an oracle. The estimated error-rate is denoted by r̂. Moreover, let R = 1 − r and R̂ = 1 − r̂.

Because the test patterns are randomly generated, the measured quantity F is a random variable taking a value in the set {0, 1, 2, …, S}. The probability that a test session passes, i.e., that every response is correct, is R^L. Thus, the probability that it fails is 1 − R^L, and the probability that F test sessions fail is

    P_F = C(S, F) * (1 − R^L)^F * (R^L)^(S−F).

This model was first introduced in [11], and briefly discussed in [7] and [10]. In Table 2.1 we indicate some numerical results relating the value of P_F to r and F for fixed values of S and L.
Table 2.1: P_F as a function of r and F for S=8 and L=256

    rSL   r        F=0     F=1     F=2     F=3     F=4     F=5     F=6     F=7     F=8
    0.2   0.0001   0.818*  0.166   0.015   0.001
    2     0.001    0.135   0.307*  0.306   0.174   0.062   0.014   0.002   0.000
    4     0.002    0.018   0.095   0.216   0.280*  0.227   0.118   0.038   0.007   0.001
    8     0.004    0.000   0.005   0.028   0.095   0.204   0.281*  0.243   0.119   0.026
    16    0.008                    0.000   0.002   0.013   0.066   0.213   0.392*  0.316
    32    0.016                                            0.000   0.008   0.125   0.867*
    64    0.031                                                            0.003   0.998*
    128   0.063                                                                    1.000*

In column 1, rSL represents the expected number of errors that will occur when a total of S*L unique random test patterns are applied. Entries that are very small are left blank. The entries marked with '*' represent the largest entry in each row; in most cases they are also the largest entry in each column. If the error-rate is small compared to 1/L = 0.003906, one would expect F to also be small. For example, when r = 0.0001, the most probable value of F is 0 (P_F = 0.818), with other values being very improbable. For r much larger than 1/L, one would expect F to be large. Note that for r = 0.00391, which is close to 1/L, the expected total number of errors is 8, but this does not mean that exactly one error occurs in each test session: while the most likely value of F is 5, F equal to 4 and 6 are almost as likely. From this table it is seen that, given F, it would be difficult to estimate the value of r with very much precision and confidence.
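The entries of Table 2.1 follow directly from the binomial expression for P_F; a small Python sketch (the argument names are illustrative):

```python
from math import comb

def p_fail(r, L):
    # Probability that one test session of length L fails: 1 - R^L.
    return 1.0 - (1.0 - r) ** L

def P_F(F, S, L, r):
    # P_F = C(S, F) * (1 - R^L)^F * (R^L)^(S - F), with R = 1 - r.
    p = p_fail(r, L)
    return comb(S, F) * p ** F * (1.0 - p) ** (S - F)

# One row's worth of P_F values for S = 8, L = 256.
row = [P_F(F, 8, 256, 0.001) for F in range(9)]
```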
Table 2.2: r_MAX as a function of F and L for S=8

    L       F=0  F=1       F=2       F=3       F=4       F=5       F=6       F=7       F=8
    1       0    1.25E-01  2.50E-01  3.75E-01  5.00E-01  6.25E-01  7.50E-01  8.75E-01  1
    2       0    6.46E-02  1.34E-01  2.09E-01  2.93E-01  3.88E-01  5.00E-01  6.46E-01  1
    4       0    3.28E-02  6.94E-02  1.11E-01  1.59E-01  2.17E-01  2.93E-01  4.05E-01  1
    8       0    1.66E-02  3.53E-02  5.71E-02  8.30E-02  1.15E-01  1.59E-01  2.29E-01  1
    16      0    8.31E-03  1.78E-02  2.89E-02  4.24E-02  5.95E-02  8.30E-02  1.22E-01  1
    32      0    4.16E-03  8.95E-03  1.46E-02  2.14E-02  3.02E-02  4.24E-02  6.29E-02  1
    64      0    2.08E-03  4.48E-03  7.32E-03  1.08E-02  1.52E-02  2.14E-02  3.20E-02  1
    128     0    1.04E-03  2.24E-03  3.67E-03  5.40E-03  7.63E-03  1.08E-02  1.61E-02  1
    256     0    5.21E-04  1.12E-03  1.83E-03  2.70E-03  3.82E-03  5.40E-03  8.09E-03  1
    512     0    2.61E-04  5.62E-04  9.18E-04  1.35E-03  1.91E-03  2.70E-03  4.05E-03  1
    1024    0    1.30E-04  2.81E-04  4.59E-04  6.77E-04  9.57E-04  1.35E-03  2.03E-03  1
    4096    0    3.26E-05  7.02E-05  1.15E-04  1.69E-04  2.39E-04  3.38E-04  5.08E-04  1
    16384   0    8.15E-06  1.76E-05  2.87E-05  4.23E-05  5.99E-05  8.46E-05  1.27E-04  1
    65536   0    2.04E-06  4.39E-06  7.17E-06  1.06E-05  1.50E-05  2.12E-05  3.17E-05  1

Also in [11], it was shown that one can take the derivative of the expression for P_F with respect to R, set it to 0, and solve for the value of r that maximizes P_F for fixed values of S, L and F. The result is

    r_MAX = 1 − [(S−F)/S]^(1/L).

Table 2.2 indicates a few results relating r_MAX to L and F, where S = 8. Consider the case L=64. Here, if F=0, the expected value of the error-rate is 0. As F increases, so does the expected value of the error-rate, until F reaches 8, at which point the expected value reaches 1. Note that for L=1 and F=S/2=4, the value of r_MAX is 0.5. As we hold F constant and increase L, the corresponding value of r_MAX monotonically decreases. Again, more analysis is required in terms of the selection of L and S before tables such as these can be used to accurately and reliably predict the value of r.
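The closed form for r_MAX is straightforward to evaluate and can be checked numerically against the entries of Table 2.2:

```python
def r_max(S, L, F):
    # Most likely error-rate given F failing sessions out of S,
    # each of length L: r_MAX = 1 - ((S - F) / S)^(1/L).
    return 1.0 - ((S - F) / S) ** (1.0 / L)
```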
In the following, we provide the missing framework so that we can accurately and reliably use this self-test approach to estimate the actual error-rate.

The random variable F has a binomial distribution with the discrete probability density function (PDF) P_F = C(S, F)*(1 − R^L)^F*(R^L)^(S−F). From the PDF of F, its expectation, F_exp, can be expressed as F_exp = S(1 − R^L), and hence r = 1 − (1 − F_exp/S)^(1/L). Thus, if the expectation of F can be obtained, the exact error-rate r can be determined. Unfortunately, the measured value of F is unlikely to equal its expectation. However, there is a formula to determine the estimated error-rate r̂ from F, namely

    r̂ = 1 − (1 − F/S)^(1/L),    (2.1)

where F/S and r̂ are in the range [0, 1]. S and L are fixed; both F/S and r̂ are random variables. The expected value of r̂ is r, and that of F/S is 1 − R^L. In general, we desire r̂ to be in a range, say [r(1−μ), r(1+μ)], with a confidence level of γ. Here μ is usually a small non-negative number less than 1, and γ is a probability. From (2.1) we obtain

    F/S = 1 − (1 − r̂)^L.    (2.2)

Thus, when r̂ = r, F/S = (F/S)_exp = 1 − R^L; when r̂ = r(1−μ), F/S = 1 − (1 − r(1−μ))^L ≈ 1 − R^L − LrμR^(L−1); and when r̂ = r(1+μ), F/S = 1 − (1 − r(1+μ))^L ≈ 1 − R^L + LrμR^(L−1).

From the PDF of F, it is easy to obtain the PDF, expectation and variance of F/S. Let (F/S)_exp denote the expectation of F/S, and σ²_(F/S) its variance. Then (F/S)_exp = 1 − R^L and σ²_(F/S) = (1 − R^L)R^L/S. Because F has a binomial distribution, its PDF converges to a normal distribution as S goes to infinity, and so does that of F/S. Even though S, in practice, is not infinite, the PDF of F/S can be approximated by the normal distribution N_(F/S)(1 − R^L, (1 − R^L)R^L/S).
Explicitly, we have

    P(F/S) ≈ (1 / sqrt(2π(1 − R^L)R^L/S)) * exp(−(F/S − (1 − R^L))² / (2(1 − R^L)R^L/S)).    (2.3)

This approximation simplifies the calculation of γ. Namely, γ can be expressed as the integral of P_r̂(r̂) over [r(1−μ), r(1+μ)], where P_r̂(r̂) is the PDF of r̂. Since r̂ is a function of F/S, P_r̂(r̂) = P(F/S)*d(F/S)/dr̂. Thus

    γ = ∫ from r̂=r(1−μ) to r̂=r(1+μ) of P_r̂(r̂) dr̂
      = ∫ from (F/S)|r̂=r(1−μ) to (F/S)|r̂=r(1+μ) of P(F/S) d(F/S)
      ≈ ∫ from 1−R^L−LrμR^(L−1) to 1−R^L+LrμR^(L−1) of N_(F/S)(1−R^L, (1−R^L)R^L/S) d(F/S).

With the assistance of the Q function (see Appendix B), Q(x) = ∫ from x to ∞ of e^(−t²/2)/sqrt(2π) dt, and its inverse function Q^(−1)(x), we obtain

    Q^(−1)((1−γ)/2) = LμrR^(L−1) / sqrt((1 − R^L)R^L/S),

which is equivalent to

    S = (Q^(−1)((1−γ)/2))² * (1/μ²) * ((1−r)²/(L²r²)) * (1/(1−r)^L − 1).    (2.4)

Equation (2.4) relates the five quantities S, L, r, μ and γ to one another. If any four of these quantities are known, the fifth one can be computed. If any three of them are known, the two unknown quantities can be resolved under certain constraints. Usually μ and γ are specified. Only the oracle knows the value of r; however, we can guess an initial value for r. Usually we desire to determine appropriate values for L and S to optimize some objective function, such as: a) determining r while minimizing the total number of test patterns, L*S, applied to C -- this is a good estimate of total test time; b) determining r while minimizing S, the number of signatures that need to be stored; c) determining r while minimizing some function of L and L*S.
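Equation (2.4) can be evaluated numerically once Q^(−1) is available; the sketch below obtains Q^(−1) by bisection, using the standard identity Q(x) = erfc(x/sqrt(2))/2 (function names are illustrative):

```python
import math

def Q(x):
    # Upper-tail probability of the standard normal distribution.
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def Q_inv(p):
    # Inverse of Q for 0 < p <= 0.5, by bisection (Q is decreasing on [0, inf)).
    lo, hi = 0.0, 40.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if Q(mid) > p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def sessions_from_eq_2_4(r, L, mu, gamma):
    # Eq. (2.4): S = (Q^-1((1-gamma)/2))^2 * (1/mu^2)
    #                * ((1-r)^2 / (L^2 r^2)) * (1/(1-r)^L - 1).
    k = Q_inv((1.0 - gamma) / 2.0)
    R = 1.0 - r
    return (k * k / mu ** 2) * (R * R / (L * L * r * r)) * (1.0 / R ** L - 1.0)
```

For example, with r = 1e-3, μ = 0.05, γ = 0.98 and L = 1 this gives about 2.16e6 sessions, matching the S|L=1 column of Table 2.3.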
We can associate a cost function c1*S + c2*L*S with the test process, where c1 is the storage cost coefficient and c2 the time cost coefficient; both c1 and c2 are non-negative. Assume μ and γ are given, r is known to an oracle, and S and L are unknown. Finding S and L is equivalent to answering the question: for what values of S and L is the estimated error-rate r̂ in the range [r(1−μ), r(1+μ)] with probability γ, subject to minimizing some cost function?

There are two special cases for this cost function: one when c1=0 and c2≠0, the other when c1≠0 and c2=0. For the first case, only time is of concern. From (2.4), S*L can be expressed as K(1/(1−r)^L − 1)/L, where K is a positive constant. It is easy to see that S*L is minimized when L=1. Intuitively this follows because there is no ambiguity in the number of errors produced when a test session fails. For the second case, only storage cost is of concern, so L can be selected such that S is minimized. Let L_Smin and S_min denote the values of L and S, respectively, that minimize S.

In Figure 2.1, S vs. L is plotted according to (2.4), where r = 0.001, γ = 99.73% and μ = 0.01. This figure shows that S has a minimal value when L is neither small nor large, which is consistent with our intuition: when L is small, most sessions are likely to pass; when L is large, most sessions are likely to fail. In both cases, a session provides little information, so more test sessions are needed to gain enough information to estimate the error-rate. Thus both low and high values of L lead to a high value of S, and there is a minimal value of S when L is neither small nor large. For other values of r, γ and μ, the plots of S vs. L have a similar shape. To find L_Smin and S_min, we set the derivative of S with respect to L to zero. Thus,

    dS/dL = (Q^(−1)((1−γ)/2))² * ((1−r)²/(μ²r²)) * (−Log(1−r)/(L²(1−r)^L) − 2(1/(1−r)^L − 1)/L³) = 0.
[Figure 2.1: Plot of S vs. L according to (2.4), where r = 0.001, μ = 0.01 and γ = 0.9973. S, on the order of 10^5 to 10^6, is plotted for L up to 5000, with the points (L_Smin, S_min) and (L_e, S_e) marked on the curve.]

Setting the last factor in this expression to zero, we get (1−r)^L = 1 + L*Log(1−r)/2. The root of this equation gives the value of L_Smin. That is, L_Smin satisfies the relation

    (1−r)^(L_Smin) = 1 + L_Smin*Log(1−r)/2.    (2.5)

After L_Smin is computed from (2.5), S_min can be computed from (2.4) by setting L = L_Smin. There is no closed-form expression for L_Smin because (2.5) is a transcendental equation; however, L_Smin can be computed numerically.

Good values of L and S can also be determined as follows. Rewriting (2.2) by replacing 1 − r̂ with R̂, we obtain F/S = 1 − R̂^L. Thus

    F/S − (F/S)_exp = R^L − R̂^L ≈ −LR^(L−1)(r − r̂).

After carrying out S test sessions, F/S − (F/S)_exp is fixed. To minimize |r̂ − r|, LR^(L−1) should be maximized. It is easy to show that LR^(L−1) is maximal when L = −1/Log(R) = −1/Log(1−r). Let

    L_e = −1/Log(1−r)    (2.6)

and let S_e denote the value of S when L = L_e. Therefore

    S_e = (Q^(−1)((1−γ)/2))² * ((1−r)²/μ²) * (e−1) * Log²(1−r)/r².    (2.7)

(L_Smin, S_min) and (L_e, S_e) are shown in Figure 2.1. The computation of L_e and S_e is simpler than that of L_Smin and S_min. Though S_e is a little larger than S_min, the value of L_e*S_e is about 30% less than L_Smin*S_min. In Table 2.3 we indicate some values of (L_Smin, S_min) and (L_e, S_e) for different values of r, where μ = 0.05 and γ = 0.98. Note that L_e = −1/Log(1−r) can be expanded in series form as

    L_e = 1/r − 0.5 − 0.08333r − 0.04167r² − 0.02639r³ + O(r⁴).

For r < 0.01, L_e can be approximated by 1/r. Thus, S_e can be approximated by (Q^(−1)((1−γ)/2))²(e−1)(1−r)²/μ².
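L_e has a closed form, while L_Smin must be found numerically from (2.5); a bisection sketch in Python (the search bracket is an illustrative choice):

```python
import math

def L_e(r):
    # Eq. (2.6): the test length that maximizes estimator sensitivity.
    return -1.0 / math.log(1.0 - r)

def L_smin(r):
    # Numerical root of Eq. (2.5): (1-r)^L = 1 + L*log(1-r)/2,
    # i.e. the zero of f(L) = (1-r)^L - 1 - L*log(1-r)/2.
    # f is negative for small L and positive for large L, so the
    # nontrivial root is bracketed by [1, 1e7] for the rates of interest.
    a = math.log(1.0 - r)
    f = lambda L: (1.0 - r) ** L - 1.0 - 0.5 * L * a
    lo, hi = 1.0, 1e7
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For r = 1e-3 this gives L_e ≈ 999.5 and L_Smin ≈ 1593, reproducing the corresponding row of Table 2.3 and the ratio L_Smin/L_e ≈ 1.59 derived in the text.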
This result helps explain the observations made concerning the data in Table 2.1. Note that there are three outcomes to a test session, namely passing, failing due to a single error, and failing due to two or more errors. Let the probabilities of these three outcomes be p_1, p_2 and p_3, respectively. The entropy (from Shannon's theorem [30]) associated with a test session is

E = −p_1·Log₂ p_1 − p_2·Log₂ p_2 − p_3·Log₂ p_3,

where p_1 + p_2 + p_3 = 1. The entropy corresponds to the information gained by carrying out a test session. It is easy to show that p_1 = p_2 = p_3 = 1/3 maximizes E. That is, (F/S)_exp = p_2 + p_3 = 2/3. So 1 − (1−r)^L = 2/3, and L = −Log 3/Log(1−r) = −1.09/Log(1−r) ≈ 1/r. Table 2.3 shows the values of S and S·L for three cases, namely L = 1, L = L_e and L = L_Smin. When L = 1 the value of S·L is smallest, which is consistent with minimizing test time, but the value of S is very large compared to the other selections of S and L. As r decreases, the value of S monotonically increases. When L = 1 this test scheme reduces to a Bernoulli process, which necessitates the use of a large amount of storage for correct responses. When L = L_e or L = L_Smin, the value of S slowly and monotonically decreases as r increases. This is because the portion of (2.4) that is a function of r, namely (1−r)²(1/(1−r)^L − 1)/(r²L²), is not too sensitive to the value of r in the range of interest. Thus S is primarily affected by γ and μ. As μ becomes smaller and γ becomes larger, and hence the estimate more precise and reliable, S increases. From L_e = −1/Log(1−r), it can be shown that 1−r = e^{−1/L_e}. Replacing (1−r) by e^{−1/L_e} in (2.5), we obtain e^{−L_Smin/L_e} = 1 − L_Smin/(2L_e). Thus, L_Smin/L_e ≈ 1.59. Similarly, it can be shown that S_min·L_Smin/(S_e·L_e) ≈ 1.42. Assume L_e is chosen for our estimation method.
Comparing the storage cost of our method with that of the Bernoulli-process-based method, we have S_{L=1}/S_e = (1−r)/(r(e−1)). Thus, as the error-rate becomes smaller, our method saves more storage. Numerical data is also shown in Table 2.3. While it appears that the value of S is very large and thus might not be useful for binning components by their error-rate, we will soon show that, in general, this is not a problem.

Table 2.3: (L_Smin, S_min) and (L_e, S_e) as a function of r for μ = 0.05 and γ = 0.98

r      | S|L=1   | L_e  | S_e  | S_e·L_e | S_{L=1}/S_e | L_Smin | S_min | S_min·L_Smin
1e-5   | 2.16e+8 | 1e+5 | 3718 | 3.72e+8 | 5.82e+4     | 159362 | 3342  | 5.33e+8
5e-5   | 4.33e+7 | 2e+4 | 3718 | 7.44e+7 | 1.16e+4     | 31871  | 3341  | 1.06e+8
1e-4   | 2.16e+7 | 1e+4 | 3717 | 3.72e+7 | 5.82e+3     | 15935  | 3341  | 5.33e+7
1.5e-4 | 1.45e+7 | 6667 | 3717 | 2.48e+7 | 3.89e+3     | 10623  | 3341  | 3.54e+7
2e-4   | 1.08e+7 | 5000 | 3717 | 1.85e+7 | 2.91e+3     | 7967   | 3341  | 2.66e+7
3e-4   | 7.21e+6 | 3333 | 3717 | 1.24e+7 | 1.94e+3     | 5311   | 3341  | 1.77e+7
4e-4   | 5.41e+6 | 2500 | 3717 | 9.29e+6 | 1.46e+3     | 3983   | 3341  | 1.33e+7
5e-4   | 4.33e+6 | 2000 | 3717 | 7.44e+6 | 1.16e+3     | 3186   | 3339  | 1.06e+7
6e-4   | 3.61e+6 | 1667 | 3716 | 6.19e+6 | 9.72e+2     | 2655   | 3339  | 8.87e+6
7e-4   | 3.08e+6 | 1428 | 3716 | 5.30e+6 | 8.30e+2     | 2276   | 3339  | 7.60e+6
8e-4   | 2.71e+6 | 1249 | 3716 | 4.64e+6 | 7.28e+2     | 1991   | 3339  | 6.65e+6
1e-3   | 2.16e+6 | 1000 | 3714 | 3.72e+6 | 5.83e+2     | 1593   | 3338  | 5.32e+6
2e-3   | 1.08e+6 | 500  | 3710 | 1.85e+6 | 2.91e+2     | 796    | 3335  | 2.65e+6
3e-3   | 7.20e+5 | 333  | 3706 | 1.23e+6 | 1.94e+2     | 530    | 3331  | 1.77e+6
4e-3   | 5.38e+5 | 250  | 3701 | 9.25e+5 | 1.45e+2     | 398    | 3329  | 1.32e+6
5e-3   | 4.30e+5 | 200  | 3697 | 7.40e+5 | 1.16e+2     | 318    | 3325  | 1.06e+6

2.3 Procedure for Estimating Error-rate

In Section 2.2, we developed a theoretical framework to investigate error-rate with respect to a given test methodology. In this section, we use this theory to develop a procedure for estimating the (unknown) error-rate of a CUT. A formal definition of our problem follows. Consider a combinational circuit containing a defect.
Given values for μ and γ, we desire an effective and efficient procedure to determine the estimated error-rate r̂, where the probability that |r̂ − r|/r is smaller than μ is at least γ. One such procedure is given next.

Procedure 1: Estimating Error-Rate
Step 0: Set r̂_old to an initial estimated error-rate supplied by the user.
Step 1: Based on r̂_old, μ and γ, compute values of L and S using (2.4).
Step 2: Carry out one test experiment, i.e., S test sessions, and determine F.
Step 3: Use (2.1) to compute the estimated error-rate, and assign this value to r̂_new.
Step 4: Compute e = |r̂_new − r̂_old|/r̂_old. If e > μ, assign r̂_new to r̂_old and go to Step 1; otherwise stop.

Note that if the initial error-rate is much larger than the true error-rate, then L will be smaller than desired and most or all test sessions will pass in Step 2 (F = 0). For this case a smaller initial value of error-rate must be selected. If the initial error-rate is much smaller than the true error-rate, it is very likely that every test session will fail in Step 2 (F = S). For this case a larger initial value of error-rate must be selected. An informal argument for the convergence of this procedure can be made from the viewpoint of probability. From Section 2.2, we know that the estimated error-rate can be approximated by a normal distribution N(r, Var(r̂)), where r is the true error-rate. In other words, the estimated error-rate is centered at the actual error-rate r. Assume at some iteration we have r̂_old, with r̂_old > r. From r̂_old, we compute values of L and S, and obtain a new estimate r̂_new. Because Prob(r̂_new > r̂_old) < Prob(r̂_new < r̂_old), r̂_new is more likely to be smaller than r̂_old, and closer to r. Thus, our estimation procedure will eventually converge to the actual error-rate.
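As an illustration (ours, not from the thesis), Procedure 1 can be simulated by drawing failing sessions from the session-pass probability (1−r)^L. The use of (2.6) and (2.7) for L and S follows the text; the random session model, seed, and iteration cap are assumptions of this sketch:

```python
import math
import random
from statistics import NormalDist

def estimate_error_rate(true_r, r0, mu=0.05, gamma=0.98, seed=1):
    """Procedure 1 sketch: iterate until successive estimates agree within mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1.0 - (1.0 - gamma) / 2.0)    # Q^-1((1-gamma)/2)
    r_old = r0
    for _ in range(20):                                    # safety cap (an assumption)
        L = max(1, round(-1.0 / math.log(1.0 - r_old)))    # L_e, Eq. (2.6)
        S = math.ceil(z * z * (math.e - 1.0) * (1.0 - r_old) ** 2
                      * math.log(1.0 - r_old) ** 2 / (mu * r_old) ** 2)  # S_e, Eq. (2.7)
        p_pass = (1.0 - true_r) ** L            # a session passes iff all L responses match
        F = sum(rng.random() >= p_pass for _ in range(S))  # number of failing sessions
        r_new = 1.0 - (1.0 - F / S) ** (1.0 / L)           # estimator, Eq. (2.1)
        if abs(r_new - r_old) / r_old <= mu:               # stop condition of Step 4
            return r_new
        r_old = r_new
    return r_old
```

For example, estimate_error_rate(0.001, 0.01) typically converges within a few iterations to an estimate close to 0.001, in line with the behavior reported in Table 2.4.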
If the estimated error-rate is very close to the actual error-rate, it may oscillate around the actual error-rate as the iteration continues. The stop condition in Step 4 will then terminate the procedure. With r̂_old, μ, γ, L and S satisfying (2.4) in Step 1, it is guaranteed that the estimated error-rate is in the range [r(1−μ), r(1+μ)] with probability γ when the procedure stops. Any method of choosing L and S discussed in Section 2.2 can be used. Table 2.4 presents simulation results based on this procedure for estimating error-rate. In Step 1 of the procedure, we use (2.6) and (2.7), the formulas for L_e and S_e, to compute L and S. The simulation results show that the error-rate estimation procedure is fast and usually terminates after several iterations. The value of S is usually large, which means it could be costly to store the fault-free signatures on-chip. This is not a problem when binning chips, because the problem in binning is not one of identifying the exact error-rate of a chip. In the procedure, the value of L is updated every iteration and is usually different for different defective circuits. Thus, even storing fault-free signatures off-chip might be impractical. One solution to this situation is to test the CUT off-line, and use a "golden circuit" to generate the signatures for a fault-free circuit. Another solution is to use only a set of fixed or quantized values for L. For example, if the error-rate of interest varies from r_min to r_max, then this range can be divided into disjoint but contiguous intervals. For each interval, say [r_b, r_t], its value of L is 2/(r_t + r_b), and its value of μ is (r_t − r_b)/(r_t + r_b).
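A small sketch (illustrative only) of the quantization rule just described, mapping an interval [r_b, r_t] to its fixed session length L and the accuracy μ it implies:

```python
def quantized_config(r_b, r_t):
    # L = 2/(r_t + r_b): the 1/r rule evaluated at the interval midpoint;
    # mu = (r_t - r_b)/(r_t + r_b): the relative half-width of the interval.
    L = 2.0 / (r_t + r_b)
    mu = (r_t - r_b) / (r_t + r_b)
    return L, mu
```

For instance, the interval [0.001, 0.003] yields L = 500 and μ = 0.5.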
Table 2.4: Simulation results for estimating error-rate via Procedure 1 (in the simulation, the value of γ is 0.98)

True error-rate r | Initial error-rate in Step 0 | Accuracy μ | Number of iterations | Value of S for last iteration | Final estimated error-rate
0.0001 | 0.001 | 0.05 | 3 | 3718  | 0.0001001
0.0001 | 0.01  | 0.05 | 3 | 3718  | 0.000098
0.0001 | 0.1   | 0.05 | 3 | 3718  | 0.0000994
0.0001 | 0.1   | 0.03 | 3 | 10327 | 0.00009999
0.0001 | 0.1   | 0.01 | 4 | 92949 | 0.00009991
0.005  | 0.1   | 0.05 | 3 | 3706  | 0.00494
0.005  | 0.1   | 0.03 | 3 | 10295 | 0.00503
0.005  | 0.1   | 0.01 | 3 | 92612 | 0.00499
0.004  | 0.1   | 0.05 | 2 | 3709  | 0.00395
0.004  | 0.1   | 0.03 | 3 | 10300 | 0.00407
0.004  | 0.1   | 0.01 | 3 | 92591 | 0.00401
0.002  | 0.1   | 0.05 | 3 | 3712  | 0.00201
0.002  | 0.1   | 0.03 | 4 | 10300 | 0.00203
0.002  | 0.1   | 0.01 | 3 | 92786 | 0.00201
0.001  | 0.1   | 0.05 | 3 | 3715  | 0.00103
0.001  | 0.1   | 0.03 | 3 | 10319 | 0.000996
0.001  | 0.1   | 0.01 | 3 | 92872 | 0.001015

To demonstrate that the value of S changes very little when γ and μ are held constant, note from Table 2.4 that the final values of S when μ = 0.03 are 10327, 10295, 10300, 10300 and 10319. In addition, when μ decreases from 0.05 to 0.03 to 0.01, S increases from about 3700 to 10300 to 93000, respectively. Note that there are many ways to potentially reduce the time required to execute Procedure 1, but these will not be dealt with in this thesis. The important point is that the final value of S is dictated by γ and μ, and hence cannot be reduced. We next show that partitioning chips into bins seldom requires the use of large values of S.

2.4 Classification of Chips by Error-rate

The purpose of error-tolerance is to increase effective yield. In today's marketplace, most chip vendors do not intentionally sell chips that produce errors. In the near future this business model might change, and vendors will market defective chips for some applications as long as the error-rate of these chips is acceptable. In practice, vendors can sell error-free chips for a given price, and chips with small error-rates for a lower price. Unacceptable defective chips are discarded.
Note that once a chip has been tested as part of a wafer, packaged, retested as part of a stress process, and then fails final package test, a company has already invested a significant fraction of the price of this chip. At this point, rather than discard the chip, it may be worth increasing this investment to see if the chip is marketable. It may not be practical or necessary to determine the error-rate of each defective chip. Error-rate, which takes values from 0 to 1, can be partitioned into several ranges, where different ranges correspond to different levels of acceptability, hence price. For example, in Figure 2.2 we show the error-rate of an IC product partitioned into three ranges. Here, r_th1 and r_th2 are two threshold points, where r_th2 > r_th1. If the error-rate of a chip is equal to or greater than r_th2, it is of Type III and will be discarded. If the chip has failed the classical go/no-go test and its error-rate is smaller than r_th1, it is of Type I. Otherwise it is of Type II. Type I chips warrant a higher price than those of Type II. In this example, to classify all chips into three types, they can first be classified into Type III and non-Type III. Then the non-Type III chips can be further classified into Type I and Type II. These two steps of classification are algorithmically identical.

Figure 2.2: Partition of error-rate (chips) into three ranges (Types), with thresholds r_th1 and r_th2

To explain the method of classification, assume there is only one threshold point r_th. If the error-rate of a faulty chip is smaller than r_th, it belongs to Type A; otherwise it belongs to Type B. All chips are partitioned into Type A and Type B. The task of classification is simple. First, values for L and S are selected, as discussed later. Once selected, L and S are fixed and used for all chips. Then, for each CUT, one test experiment is applied and the estimated error-rate is obtained.
The chip is classified based on this estimated error-rate. Namely, if r̂ < r_th then the chip is classified as Type A; otherwise it is Type B. Because the estimated error-rate is a random variable, it is possible that chips are classified incorrectly. To investigate this possibility, two hypotheses are defined. H0: the chip is Type B, i.e., its true error-rate r ≥ r_th. H1: the chip is Type A, i.e., its true error-rate r < r_th. The error-rate estimation of a chip has four possible outcomes in terms of classification, namely 1) the chip is Type B and declared to be Type B; 2) the chip is Type B but declared to be Type A; 3) the chip is Type A and declared Type A; and 4) the chip is Type A but declared Type B. Clearly, outcomes 1) and 3) are correct, and 2) and 4) are wrong. For example, consider a chip whose actual error-rate is r_th, and which is thus of Type B. There is a 50% probability that its estimated error-rate is less than r_th, and thus a 50% probability of making a mistake. Outcome 4) means higher-price chips are sold at a lower price, whereas outcome 2) means lower-price chips are sold at a higher price and the customer may be disappointed. Outcome 4) is likely to be acceptable to a vendor and of course to the customer, while 2) is not. These results are also shown in Table 2.5. In the following, P_te denotes the probability that outcome 2) occurs.

Table 2.5: Outcomes of hypothesis test

                | H0 true                       | H1 true
Declared Type A | Outcome 2) causes test escape | Outcome 3)
Declared Type B | Outcome 1)                    | Outcome 4) causes yield loss

Note: test escape and yield loss are discussed in Section 2.6

The requirement for classification is limiting the value of P_te, i.e., P_te ≤ α, where α is specified by the vendor. Consider chips of Type B. Because all chips are tested using the same values of L and S, chips with different error-rates have different values of P_te.
It is obvious that the higher the true error-rate, the lower the probability that outcome 2) occurs. This is shown in Figure 2.3. Figure 2.3(a) indicates the PDF of the estimated error-rate for a chip whose true error-rate is r_th, and Figure 2.3(b) corresponds to the case where the true error-rate is greater than r_th. For each case, P_te is the area under the PDF curve to the left of r_th. So a chip whose true error-rate is r_th represents the worst case.

Figure 2.3: PDF curve of estimated error-rate r̂. P_te is the area under this curve and to the left of r_th: (a) corresponds to a chip with true error-rate of r_th; (b) corresponds to a chip with true error-rate greater than r_th

Consider only chips with error-rate r = r_th. If the threshold point is kept at r_th, P_te is always 50%. To reduce this value for this case, one could artificially define a new threshold that is smaller than r_th. With this new threshold, a chip is classified as Type B if the estimated error-rate is greater than or equal to this new threshold. For this new threshold, P_te < α. Such a threshold is not unique. As shown in Figure 2.4, the shaded part of the curve has an area of α, and any point to the left of r_thn can be the new threshold. However, the maximal value (r_thn) is the best one because it minimizes the probability that outcome 4) occurs. Here r_thn is formally defined as:

r_thn = max{ r_th^artificial | Prob(r̂ < r_th^artificial) ≤ α when r = r_th }.

The new classification criterion is: if the estimated error-rate is greater than or equal to r_thn, the chip is declared to be Type B, otherwise Type A. Due to this new criterion, chips with error-rate in the range r_thn to r_th are actually Type A, but some may be classified as Type B.

Figure 2.4: PDF curve of estimated error-rate r̂ for the case of true error-rate r = r_th; the shaded area to the left of r_thn equals α
Because Prob(r̂ < r_th^artificial) increases as r_th^artificial increases, r_thn satisfies the equation Prob(r̂ < r_thn) = α. To compute Prob(r̂ < r_thn), the PDF of r̂ is needed. However, because r̂ is derived from F/S via (2.1) and the PDF of F/S is a normal distribution, Prob(r̂ < r_thn) can be computed using the PDF of F/S, i.e., from (2.3). From (2.2), when r̂ = r_thn, F/S = 1 − (1−r_thn)^L. So

α = Prob(r̂ < r_thn) = Prob(F/S < 1 − (1−r_thn)^L) = Q( [(1−r_thn)^L − (1−r_th)^L] / √[(1−r_th)^L (1 − (1−r_th)^L)/S] ).

Thus,

S = [Q^{-1}(α)]² (1−r_th)^L (1 − (1−r_th)^L) / [(1−r_thn)^L − (1−r_th)^L]².     (2.8)

In (2.8), α and r_th are usually given as part of the test specification. So (2.8) describes the relationship among the test configuration parameters L, S and r_thn. Similar to the situation discussed in Section 2.2, we have to choose appropriate values for L and S. The difference is that there is a new unknown quantity r_thn. Because all chips are tested using the same values of L and S, the fault-free signature can be stored on-chip. So it is critical to make S small. From the previous discussion, L can be selected as 1/r_thn, and hence S is uniquely determined by r_thn. Because S decreases as r_thn decreases, it is straightforward to make S smaller by choosing a smaller value for r_thn. For example, assume α = 0.05, r_th = 0.1, r_thn varies from 0.010 to 0.099, and L = 1/r_thn. Figure 2.5 shows the relationship between S and r_thn. Here we see that for r_thn = 0.09, only about 350 signatures need to be stored. However, there is an important cost tradeoff between S and r_thn. Even though small values of S save storage cost, small values of r_thn increase the probability of outcome 4), and hence a loss in revenue. This tradeoff should be considered as part of the overall test methodology. Based on selected values of L, S and r_thn, the probability of making erroneous classifications can be computed.
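Equation (2.8) is straightforward to evaluate directly. The sketch below (ours, not thesis code) does so for a chip whose true error-rate equals r_th, reproducing the S ≈ 354 quoted later in the text for r_th = 0.1, r_thn = 0.09, α = 0.05 and L = 11:

```python
import math
from statistics import NormalDist

def sessions_from_2_8(r_th, r_thn, alpha, L):
    """Number of sessions S from Eq. (2.8)."""
    z = NormalDist().inv_cdf(1.0 - alpha)             # Q^-1(alpha)
    p = 1.0 - (1.0 - r_th) ** L                       # expected failing fraction at r = r_th
    gap = (1.0 - r_thn) ** L - (1.0 - r_th) ** L      # margin between threshold and mean
    return z * z * p * (1.0 - p) / (gap * gap)
```

Here math is imported only for symmetry with the other sketches; the computation needs only NormalDist and the power operator.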
Table 2.6 shows the formulas to compute these probabilities for three different cases.

Table 2.6: The probability of making erroneous classifications

Case (true error-rate r) | Probability
r < r_thn but r̂ ≥ r_thn (Type A but classified as Type B) | Q( [(1−r)^L − (1−r_thn)^L] / √[(1−r)^L (1−(1−r)^L)/S] )
r_thn ≤ r < r_th but r̂ ≥ r_thn (Type A but classified as Type B) | 1 − Q( [(1−r_thn)^L − (1−r)^L] / √[(1−r)^L (1−(1−r)^L)/S] )
r ≥ r_th but r̂ < r_thn (Type B but classified as Type A) | Q( [(1−r_thn)^L − (1−r)^L] / √[(1−r)^L (1−(1−r)^L)/S] )

Figure 2.5: Plot of S vs. r_thn according to (2.8) when r_th = 0.1, L = 1/r_thn, and α = 0.05

In practice, it is not always necessary to complete all S test sessions before making a classification. In fact, when this occurs the classification is almost always correct. For example, if the true error-rate of a chip is much smaller than r_thn, almost all sessions will pass when using L = 1/r_thn. Thus applying all S test sessions is really not necessary. Similarly, if the true error-rate is much larger than r_thn, then almost all test sessions will fail, and again it may not be necessary to apply all test sessions. Next we discuss how to identify the minimal number of test sessions, denoted by S_ms, required to make a final classification of a chip with error-rate r. To determine S_ms, we apply the following conditions:

C1: If the error-rate r is greater than or equal to r_thn, S_ms is the minimal value of S that satisfies Q( [(1−r_thn)^L − (1−r)^L] / √[(1−r)^L (1−(1−r)^L)/S] ) ≤ α.
C2: If r < r_thn, then S_ms is the minimal value of S that satisfies Q( [(1−r)^L − (1−r_thn)^L] / √[(1−r)^L (1−(1−r)^L)/S] ) ≤ β, i.e., the probability of classifying a chip having an error-rate smaller than r_thn as Type B is less than β. If β is chosen to be equal to α, this condition becomes Q( [(1−r)^L − (1−r_thn)^L] / √[(1−r)^L (1−(1−r)^L)/S] ) ≤ α.

Both of these Q expressions decrease as S increases. So S_ms satisfies Q( [(1−r)^L − (1−r_thn)^L] / √[(1−r)^L (1−(1−r)^L)/S_ms] ) = α if r < r_thn, or Q( [(1−r_thn)^L − (1−r)^L] / √[(1−r)^L (1−(1−r)^L)/S_ms] ) = α if r ≥ r_thn. On the other hand, the number of applied test sessions cannot exceed the value of S obtained from (2.8). So, in practice, the necessary number of test sessions to classify a chip with error-rate r is equal to min{S obtained from (2.8), S_ms}, and is denoted by S*_ms. Consider a test specification r_th = 0.1, r_thn = 0.09, α = 0.05. Then L is chosen as 1/r_thn ≈ 11. From (2.8), S = 354. Figure 2.6 shows the dependence of S*_ms on r. It can be seen that S*_ms drops quickly as r moves away from the range r_thn to r_th. For example, for r = 0.06, only about 25 test sessions are needed before a classification can be made. This can dramatically save test time. Though the curve in Figure 2.6 suggests that for certain values of r, S can be very small, in practice this is not the case. These results are based on certain statistical characteristics of distributions, and the underlying assumptions may not hold true if S is too small.

Figure 2.6: Plot of S*_ms vs. r when r_th = 0.1, r_thn = 0.09 and α = 0.05

To implement this scheme, a test experiment is now partitioned into test phases, where a test phase consists of one or more test sessions.
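A sketch of this phased scheme (our illustration; the random session model, seed, phase sizes, and the pessimistic fallback when the budget S_max is exhausted are assumptions) applies the two Q-tests after every phase:

```python
import math
import random
from statistics import NormalDist

def classify(true_r, r_thn=0.09, alpha=0.05, L=11, S_max=354,
             first_phase=20, seed=7):
    """Phased classification: stop as soon as one of the two Q-tests passes."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1.0 - alpha)            # Q(x) <= alpha  <=>  x >= z
    p_fail = 1.0 - (1.0 - true_r) ** L               # probability a session fails
    q_thn = (1.0 - r_thn) ** L
    F = S = 0
    while S < S_max:
        batch = first_phase if S == 0 else 1         # later phases: one session each
        F += sum(rng.random() < p_fail for _ in range(batch))
        S += batch
        r_hat = 1.0 - (1.0 - F / S) ** (1.0 / L)     # Eq. (2.1)
        q_hat = (1.0 - r_hat) ** L
        sd = math.sqrt(max(q_hat * (1.0 - q_hat), 1e-12) / S)
        if r_hat < r_thn and (q_hat - q_thn) / sd >= z:
            return "A"
        if r_hat >= r_thn and (q_thn - q_hat) / sd >= z:
            return "B"
    return "B"                                       # pessimistic default
```

A chip with error-rate well below r_thn is typically declared Type A within the first phase, while one well above r_th is declared Type B just as quickly.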
There is no restriction on how many test sessions each test phase can have. However, the first test phase must have enough test sessions to make the statistics meaningful. After each phase is executed, r̂ is computed based on the number of test sessions, S', applied so far, and the number, F, that failed. If r̂ < r_thn, then Q( [(1−r̂)^L − (1−r_thn)^L] / √[(1−r̂)^L (1−(1−r̂)^L)/S'] ) is computed to see if it is less than α. If so, this chip is classified as Type A; otherwise another test phase is executed. If r̂ ≥ r_thn, then Q( [(1−r_thn)^L − (1−r̂)^L] / √[(1−r̂)^L (1−(1−r̂)^L)/S'] ) is computed to see if it is less than α. If so, this chip is classified as Type B; otherwise another test phase is executed.

2.5 Experimental Results

To better illustrate our classification techniques, we applied them to the benchmark circuit C880. Since no data relating defects to erroneous output values exists, we used the concept of single stuck-at faults to model defects. There are 1760 single stuck-at faults in C880, so we let each fault define a different faulty circuit. We set r_th = 0.02, r_thn = 0.019, α = 0.05, and L = 1/r_thn ≈ 53. From (2.8), the maximal number of test sessions is 1685. To classify a faulty circuit, 20 test sessions are applied during the first test phase. All subsequent phases, if needed, consist of a single test session. After each phase, the estimated error-rate is computed, and the classification conditions are checked. After processing each faulty circuit, we determine the total number of test sessions applied to classify the circuit and its type. Figure 2.7 shows the normalized histogram of the number of test sessions applied. It can be seen that most of the faulty circuits can be classified after applying very few test sessions. The average number of test sessions applied to a faulty circuit is 36.9, and only 7 faulty circuits needed the maximal number of test sessions of 1685.
These results imply that most of the faulty circuits have estimated error-rates that are far away from the range 0.019 to 0.02. This observation is confirmed by the normalized histogram of the error-rates of these faulty circuits (see Figure 2.8). From Figure 2.8, it is seen that there are very few faulty circuits with an error-rate close to r_th. But these are the circuits that require large values of S for classification. Assume that it is only feasible to store a small number of signatures, say 100, rather than the maximum possible number of 1685 that may be required. For this case, if after 100 test sessions have been executed a classification still cannot be made, then the chip can be classified pessimistically, i.e., as either Type B or discarded.

Figure 2.7: Normalized histogram of the number of test sessions applied. The smaller graph is the enlarged histogram for the number of test sessions between 100 and 1800.

Table 2.7: Classification results from simulation

Error-rate range | Actual type | Actual number of faulty circuits | Number of faulty circuits classified erroneously
0~0.019    | Type A | 290  | 3 (classified as Type B)
0.019~0.02 | Type A | 12   | 7 (classified as Type B)
0.02~1     | Type B | 1458 | 10 (classified as Type A)

Figure 2.8: Normalized histogram of error-rate of the 1760 faulty circuits. The smaller graph is the enlarged part for error-rates between 0.05 and 0.12.

Classification results are also obtained from the simulations. Table 2.7 indicates the actual number of faulty circuits and the number of wrong classifications in each of three disjoint ranges of actual error-rate. As seen from the first two numerical rows, there are a total of 302 faulty circuits that actually belong to Type A. Of these, 10 are misclassified as Type B. We define the percentage of chips of Type A (think good) classified as Type B (think bad) as the yield loss.
Essentially, these are relatively high-price chips sold at a low price or discarded. The last row indicates that there are 1458 faulty circuits that actually belong to Type B, and 10 of them are misclassified as Type A. The percentage of misclassified circuits of this type is called the test escape, and it quantifies the number of chips that should be either sold at a low price or discarded, but were actually sold at the high price. In this study, the yield loss is 3.31% and the test escape is 0.7%. Of course, the values of yield loss and test escape can easily be changed by changing the values of r_thn and α. Classically, test escapes refer to defective (bad) chips classified as good ones, and are the major contributor to the concept of defects per million (DPM). The IC industry often attempts to achieve a DPM of about 100. Our case is somewhat different in that all chips are defective, and our misclassification deals with those cases where the error-rate is very close to a threshold value.

2.6 Conclusions

In this chapter, we have addressed several testing problems dealing with error-rate estimation. The test methodology is based on signature analysis, and the hardware needed to support this technique can be either fully or partly built into the circuit-under-test. We first addressed the challenge of determining a mathematical framework for estimating the error-rate of a defective circuit block to within a given accuracy, μ, and level of confidence, γ. Our test strategy consists of executing S test sessions, each having L test patterns, where each test session either passes or fails based upon whether or not the correct signature is produced. We denote the number of sessions that fail by F. We have derived analytic equations that capture the relationship among all of these quantities, in addition to the value of the estimated error-rate and the actual unknown error-rate.
As a special case, by setting L = 1, we can determine an upper bound on the minimal number of test patterns, namely S in this case, needed to accurately estimate the error-rate of a circuit. We have also derived a formulation for numerically determining the value of L such that S (L_Smin and S_min in the text) is minimal; this represents a minimal storage cost solution. A small modification to this formulation results in new values for L and S (L_e and S_e in the text), where S_e is a little larger than S_min, L_e·S_e is about 30% less than L_Smin·S_min, and L_e ≈ 1/r. Assuming L_e is chosen as the value of L, the storage required for our method is about r times the storage needed for the Bernoulli process. Thus, as r becomes smaller, our method saves more storage. We next addressed the problem of determining a procedure (see Procedure 1 in the text) to actually estimate the error-rate of a given circuit. This procedure involves carrying out the test process described above in an iterative fashion. In each iteration, a better estimate of the actual error-rate is determined, new values for S and L are computed, and another test experiment is executed. Based upon experiments, the procedure appears to converge within a very few (3 to 4) iterations. This error-rate estimation approach can also be applied to each output line individually to determine the error-rate of each output line. We finally addressed the main application area for this work, namely assigning defective chips to bins based on their error-rate. An elegant methodology is presented that explains the types of misclassifications that can occur, and how the probability of these mistakes occurring can be reduced. We introduced concepts analogous to yield loss and test escape, and showed how the values of these test attributes can be controlled.
As noted in Section 2.4, most misclassifications occur when the actual error-rate of a defective chip is near the value of the threshold, say r_th, which separates one classification from another. Given a new product, one could take a large sample of faulty chips and, using the techniques discussed above, partition these chips into two categories: those in Category 1 that appear to have an error-rate in the range r_th(1 ± μ), and those that do not (Category 2). If the fraction of faulty chips in Category 1 is large, then a large value of S will be required, and the storage requirements on all chips will be relatively large. On the other hand, if this fraction is small, then a much smaller value of S can be used. Chips that cannot be classified after S test sessions can be assigned to the lower-quality bin. Thus classification can often be done quite efficiently.

Chapter 3 Error-rate Estimation Using Ones Counting

3.1 Introduction

Error-rate estimation using ones counting is similar to that using signature analysis. The test frame is again based on a BIST architecture. As distinct from signature analysis, the response compressor is implemented via a counter, which counts the number of ones in its input sequence. Because the counter has a single input line, the ones counting technique is initially developed to estimate the error-rate of a single output line. This approach using ones counting was first proposed by Shahidi and Gupta [29]. Unfortunately, their analysis implied that error-rate estimation via ones counting uses significantly more resources than that required via signature analysis. As will be explained later, their results are counter-intuitive. Subsequently we developed a new analysis for this ones counting technique and a new estimator of error-rate. The results from our analysis show that ones counting and signature analysis techniques use comparable resources for obtaining the same degree of accuracy in error-rate estimation.
Following the success of our work on error-rate estimation using ones counting for a single output line, we extended this technique to multiple-output circuits. The extended work successfully estimates the error-rate of a multiple-output circuit when the difference between the number of ones in the observed output and that in the correct output is not a multiple of 4. Error-rate estimation for the case where the difference is a multiple of 4 is still unsolved. This chapter is organized as follows. Section 3.2 reviews the ones counting error-rate estimation work presented in [29]. In Section 3.3, we present our new analysis of this problem. A new estimator is proposed and its statistical characteristics are presented. Several special cases of using the ones counting technique are described in Section 3.4. In Section 3.5, simulation results are presented. The problem of chip classification based on ones counting error-rate estimation is discussed in Section 3.6. Section 3.7 presents the experimental results of chip classification. In Section 3.8, the extended work on error-rate estimation of multiple-output circuits using ones counting is presented. Section 3.9 concludes this chapter.

3.2 Review of Previous Work

The error-rate estimation techniques proposed in [7], [10], [28] and [29] are illustrated using a BIST architecture, though a scan architecture is equally applicable. The register driving the CUT is a PRPG, which is usually implemented via a linear feedback shift register (LFSR). In [28], the output of the CUT is compressed using another LFSR, while in [29] a nonlinear finite state machine is used to obtain the ones count of the output sequence. In both schemes, when the circuit operates in its test mode, a sequence of L test patterns is generated and applied to the CUT. For the ones counting methodology, the ones count of the output sequence is recorded and then compared with the pre-computed ones count of the fault-free circuit for the same L input patterns.
The difference, D_i, is stored. For the signature analysis technique, the final signature is compared to the pre-computed correct signature. If they are the same the test is said to pass; otherwise it fails.

This process of applying L patterns and determining the value of D_i, or pass/fail, is referred to as a test session. The process for estimating error-rate consists of carrying out S test sessions, each using a different subset of test patterns from the space of all test patterns (see previous chapter). Carrying out these S test sessions results in S differences, denoted by D_1, D_2, ..., D_S. It is assumed that all test patterns are randomly generated and that the total number of patterns, namely S×L, is a small fraction of the input pattern space. Thus these S numbers are independent and have the same distribution. The sample mean M_D and sample variance V_D are given by the equations

$M_D = \frac{1}{S}\sum_{i=1}^{S} D_i$  and  $V_D = \frac{1}{S-1}\sum_{i=1}^{S}(D_i - M_D)^2$.

Based on the sample mean and variance, the error-rate estimator, $\hat{r}$, given in [29] is

$\hat{r} = 1 - \sqrt{1 - 2\left(\frac{V_D}{L} - \frac{M_D^2}{L^2}\cdot\frac{N-L}{N-1}\right)}$   (3.1)

where N = 2^n, and n is the number of input pins of the CUT. Again, N >> L. The selection of L and S depends on what objective is to be satisfied. In [29], the authors analyzed the problem of using this ones counting technique to classify chips according to their estimated error-rate. Let r_th represent an error-rate threshold value, usually specified by an application, ε a margin of error allowed in the value of an estimated error-rate, and γ the confidence associated with this estimation. If the estimated error-rate, $\hat{r}$, of a CUT is smaller than r_th, the CUT is accepted; otherwise it is rejected. The following requirements are set for the estimation of error-rate. For a CUT with actual error-rate r,

a. If r < r_th(1−ε), the probability of accepting the CUT is greater than or equal to γ. That is, Prob($\hat{r}$ < r_th) ≥ γ if r < r_th(1−ε).

b.
If r > r_th(1+ε), the probability of rejecting the CUT is greater than or equal to γ. That is, Prob($\hat{r}$ > r_th) ≥ γ if r > r_th(1+ε).

c. If r_th(1−ε) ≤ r ≤ r_th(1+ε), the CUT may be accepted or rejected.

In [29] the authors discussed lower bounds for L and S such that the above requirements are satisfied. Their analysis showed that the lower bound for the number of test patterns per test session, L, is

$\frac{1}{2(1-\gamma)(r_{th}\,\varepsilon)^2}$,

and the lower bound for the number of test sessions, S, is

$\frac{2 + 2 r_{th}\varepsilon + r_{th}^2\varepsilon^2}{4(1-\gamma)(r_{th}\,\varepsilon)^2}$.

Assume r_th = 0.01, ε = 0.1 and γ = 0.9772. The lower bound of L is about 2.19E+7 and the lower bound of S is about 2.23E+7. Because the ones counting technique requires knowledge of the ones count of the fault-free circuit in order to compute D_i, S values must be stored, which results in an exorbitant amount of overhead. Employing the signature analysis technique for the same requirements as above, only 111 test patterns are needed for each of 584 test sessions.

These results are counter-intuitive from the following point of view. In signature analysis one only gets a binary decision from each test session, namely pass or fail. Ignoring the issue of aliasing, this decision is always correct, but the fail can be caused by 1, 2, ..., up to L erroneous responses. Hence a great deal of information is lost per failing test session. On the other hand, for ones counting, one determines an actual observed ones count, O_i, of 0, 1, 2, ..., L. Then the recorded value D_i is $O_i - O_i^*$, which can take on the values −L, −(L−1), ..., −1, 0, 1, 2, 3, ..., L, where $O_i^*$ is the ones count for the fault-free circuit. Hence there appears to be a much larger amount of information in D_i than in pass/fail. In fact, if D_i ≠ 0, then for sure the test has failed. But if D_i = 0, the test may have passed or failed. Thus aliasing plays a big role in ones counting.
However, it seems that the value of getting a non-binary result should more than compensate for the problem of aliasing. In addition, if L = 1 then no aliasing is possible, so if D_i = 0 the response is correct, and if D_i ≠ 0 it is incorrect. Hence, for a large enough value of S, the error-rate can be accurately estimated, and thus the lower bound on L is 1. Thus it is useful to reexamine the analysis of error-rate estimation via ones counting.

To more fully characterize the statistical properties of the estimator $\hat{r}$, its mean and variance should be computed. The mean of the estimator will be close to the true error-rate when the estimator is biased, and equal to the true error-rate when the estimator is unbiased. The variance of the estimator is not directly calculated in [29]. Instead, it is implied to be equal to the variance of another random variable (namely S_e in [29]), which is not always true. In our work, we developed an analysis of the ones counting technique that is different from the analysis provided in [29]. Our analysis more accurately describes the characteristics of the ones counting technique for the purpose of error-rate estimation. We also compared this ones counting technique with the previously published work using signature analysis. The comparison shows that the ones counting technique is able to estimate error-rate as effectively as the signature analysis technique for a single output circuit, and the hardware overheads for both techniques are comparable.

3.3 Statistical Analysis

Consider a single output combinational circuit, C, and a faulty version of this circuit, C_f. In response to each input pattern, the output of C_f can be classified into one of four types, namely, 1/1, 0/1, 0/0 and 1/0, where 1/1 means that the output of both C and C_f is 1; 0/0 means that the output of both C and C_f is 0; 0/1 means that the output of C is 0 and that of C_f is 1; and 1/0 means that the output of C is 1 and that of C_f is 0.
If all possible N input patterns are applied to C_f, a sequence of N outputs consisting of the four types of outputs is generated. The error-rate r of C_f is the ratio of the number of outputs of type 0/1 plus type 1/0 to N. During a single test session, the observed ones count is equal to the number of type 1/1 and 0/1 outputs (type x/1). From this result the correct ones count of C, which is equal to the number of type 1/1 and 1/0 outputs of C_f, is subtracted. This difference, denoted by D, represents the difference between the ones count generated by C_f and the ones count generated by C. It also equals the number of type 0/1 outputs minus the number of type 1/0 outputs of C_f. A complete test consists of S test sessions. Thus, S numbers are recorded, each of which is the value of D for a test session. The error-rate is estimated from these S numbers.

Imagine that the N possible outputs define a collection containing four types of symbols. In each test session, we choose L outputs without replacement from the collection. From the selected outputs, we do not ascertain the number of type 0/1 or 1/0 outputs, but only the difference between the number of 0/1 and 1/0 outputs. After a test session is finished, we put these outputs back into the collection. S test sessions are carried out, resulting in S numbers. From these S numbers, we estimate the fraction of outputs in the collection that are either of type 0/1 or 1/0. Let p_1 be the fraction of 0/1 outputs in the collection, and p_2 the fraction of 1/0 outputs in the collection. Thus, r = p_1 + p_2. p_1, p_2 and r are all positive and less than 1. An oracle knows p_1 and p_2. We wish to estimate p_1 and p_2 since the estimated error-rate equals the sum of the estimated values of p_1 and p_2.

Assume the L outputs are drawn one at a time, and we have a counter, initialized to 0.
If a type 0/1 output is drawn, the value of the counter is increased by 1; if a type 1/0 output is drawn, the value of the counter is decreased by 1; if a type 1/1 or 0/0 output is drawn, the state of the counter is not changed. The final state of the counter after L outputs are chosen, i.e., after a test session, is D. Because outputs are drawn randomly, D is a random variable.

In the above process, it is implied that we are sampling without replacement. Because N is assumed to be large with respect to L, which is usually the case in practice, the change in the fraction of each type of output in the remaining collection after each of the L outputs is selected is very small and can be ignored. So we will treat this process as sampling with replacement. Thus, for each drawing, the probability that the counter increases by 1 is p_1, the fraction of 0/1 outputs; the probability that it decreases by 1 is p_2, the fraction of 1/0 outputs; and the probability that it does not change is 1 − p_1 − p_2. Let X be a random variable such that X = 1 with probability p_1, X = −1 with probability p_2, and X = 0 with probability 1 − p_1 − p_2. Thus D = X_1 + X_2 + ... + X_L, where X_1, X_2, ..., X_L are independently and identically distributed (i.i.d.) random variables with the same distribution as X. From the probability density function (PDF) of X, we see that

Expectation: $E\{X\} = p_1 - p_2$;
Variance: $Var\{X\} = p_1 + p_2 - (p_1 - p_2)^2$.

Thus the expectation and variance of D are

Expectation: $E\{D\} = E\{X_1 + X_2 + \dots + X_L\} = L \cdot E\{X\} = L(p_1 - p_2)$   (3.2)

Variance: $Var\{D\} = Var\{X_1 + X_2 + \dots + X_L\} = L \cdot Var\{X\} = L\left(p_1 + p_2 - (p_1 - p_2)^2\right)$   (3.3)

From the test process we obtain S samples of the random variable D, namely D_1, D_2, ..., D_S. The sample mean M_D and the sample variance V_D are defined by the equations

$M_D = \frac{1}{S}\sum_{i=1}^{S} D_i$  and  $V_D = \frac{1}{S-1}\sum_{i=1}^{S}(D_i - M_D)^2$.
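The sampling-with-replacement model above is easy to exercise numerically. The sketch below (helper names are mine, not from the thesis) draws S sessions of L outputs each and checks the sample mean and variance of D against Eqs. (3.2) and (3.3).

```python
import random

def draw_session(p1, p2, L, rng):
    # One test session: each drawn output contributes +1 (type 0/1),
    # -1 (type 1/0) or 0 (type 1/1 or 0/0) to the counter D.
    d = 0
    for _ in range(L):
        u = rng.random()
        if u < p1:
            d += 1
        elif u < p1 + p2:
            d -= 1
    return d

rng = random.Random(7)
p1, p2, L, S = 0.006, 0.004, 100, 5000
Ds = [draw_session(p1, p2, L, rng) for _ in range(S)]

M_D = sum(Ds) / S
V_D = sum((d - M_D) ** 2 for d in Ds) / (S - 1)

# Eq. (3.2): E{D}   = L(p1 - p2)               = 0.2
# Eq. (3.3): Var{D} = L(p1 + p2 - (p1 - p2)^2) ~= 1.0
print(M_D, V_D)
```

With S = 5000 sessions the sample statistics land well within a few standard errors of the theoretical values.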
We intend to estimate the two parameters, p_1 and p_2, in the distribution of D. A generic method for building an estimator is based on approximation of the moments of the random variable [13]. The first order moment of a random variable is its expectation, which is approximated by the sample mean. The second order moment is its variance, which is approximated by the sample variance. Thus, we have

$E\{D\} = L(p_1 - p_2) \approx M_D$   (3.4)

and

$Var\{D\} = L\left(p_1 + p_2 - (p_1 - p_2)^2\right) \approx V_D$.   (3.5)

From (3.4) and (3.5) we solve for p_1 and p_2, and obtain

$p_1 \approx \frac{1}{2}\left(\frac{M_D}{L} + \frac{V_D}{L} + \frac{M_D^2}{L^2}\right)$  and  $p_2 \approx \frac{1}{2}\left(-\frac{M_D}{L} + \frac{V_D}{L} + \frac{M_D^2}{L^2}\right)$,

and hence

$p_1 + p_2 \approx \frac{V_D}{L} + \frac{M_D^2}{L^2}$.

Because the error-rate is equal to the sum of p_1 and p_2, the estimated error-rate, $\hat{r}$, is

$\hat{r} = \frac{V_D}{L} + \frac{M_D^2}{L^2}$,   (3.6)

which is also called the estimator of error-rate. $\hat{r}$ is a function of the S samples of D, so the estimator itself is a random variable.

To evaluate the performance of an estimator, its expectation and variance need to be computed. If the expectation is equal to the true value of the estimated quantity, the estimator is unbiased; otherwise it is biased. If the estimator is biased, the difference between the expectation of the estimator and the true value of the estimated quantity is of interest; smaller differences imply better estimators. The variance of the estimator represents how close an estimate is likely to be to the expectation of the estimator. A large variance means the PDF is somewhat flat and the estimation result is likely to be poor. A small variance implies that the PDF is narrow around its expectation, and the estimation result is likely to be close to the expectation of the estimator. With the expectation and variance of the estimator, we are able to approximate the PDF of the estimator. If the type of distribution of the estimator is known, the PDF of the estimator can be expressed explicitly.
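Equation (3.6) is straightforward to implement. The sketch below (function names are mine, not from the thesis) applies the estimator to synthetic session data generated with true error-rate p_1 + p_2 = 0.01.

```python
import random

def estimate_error_rate(Ds, L):
    # Method-of-moments estimator of Eq. (3.6): r_hat = V_D/L + (M_D/L)^2.
    S = len(Ds)
    M_D = sum(Ds) / S
    V_D = sum((d - M_D) ** 2 for d in Ds) / (S - 1)
    return V_D / L + (M_D / L) ** 2

def draw_session(p1, p2, L, rng):
    # Counter difference D for one session of L randomly drawn outputs.
    d = 0
    for _ in range(L):
        u = rng.random()
        if u < p1:
            d += 1
        elif u < p1 + p2:
            d -= 1
    return d

rng = random.Random(11)
p1, p2, L, S = 0.006, 0.004, 100, 5000
Ds = [draw_session(p1, p2, L, rng) for _ in range(S)]
r_hat = estimate_error_rate(Ds, L)
print(r_hat)  # close to the true error-rate p1 + p2 = 0.01
```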
If the type of distribution is unknown, as it is for the error-rate estimator, the PDF of the estimator is usually approximated by a normal distribution, which is a function of the expectation and variance of the estimator.

The procedure for deriving the expectation and the variance of the estimator $\hat{r}$ is given in Appendix A. The expectation of the estimator is

$E\{\hat{r}\} = E\left\{\frac{V_D}{L} + \frac{M_D^2}{L^2}\right\} = (p_1 + p_2) + \frac{(p_1 + p_2) - (p_1 - p_2)^2}{SL}$   (3.7)

and its variance is

$Var\{\hat{r}\} = Var\left\{\frac{V_D}{L} + \frac{M_D^2}{L^2}\right\}$
$= \frac{2\alpha_2^2}{S} + \frac{3\alpha_2^2}{S^2} + \frac{\alpha_4 - 3\alpha_2^2 + 4\alpha_0^2\alpha_2 + 2\alpha_0\alpha_3}{LS} + \frac{2\alpha_0\alpha_3}{LS^2} + \frac{\alpha_4 + 4\alpha_0\alpha_3 - \alpha_2^2}{L^2 S^2} - \frac{3\alpha_2^2}{LS^3} + \frac{\alpha_4 - 3\alpha_2^2}{L^2 S^3} + \frac{\alpha_4 - 3\alpha_2^2}{L^3 S^3}$   (3.8)

where $\alpha_0 = p_1 - p_2$, $\alpha_2 = p_1 + p_2 - \alpha_0^2$, $\alpha_3 = \alpha_0 - 3(p_1 + p_2)\alpha_0 + 2\alpha_0^3$ and $\alpha_4 = (p_1 + p_2) + (6(p_1 + p_2) - 4)\alpha_0^2 - 3\alpha_0^4$.

The true value of the error-rate r to be estimated is p_1 + p_2. Equation (3.7) shows that the estimator is biased. However, when SL is large, the term divided by SL can be ignored and the estimator becomes essentially unbiased. Later we show that for the problems addressed in this thesis SL is large, and for these cases $E\{\hat{r}\} \approx p_1 + p_2 = r$.

We use the normal distribution $N\left(E\{\hat{r}\}, Var\{\hat{r}\}\right)$ to approximate the distribution of the estimator. Thus the PDF of the estimator can be expressed as

$P(\hat{r}) = \frac{1}{\sqrt{2\pi\, Var\{\hat{r}\}}}\; e^{-\frac{(\hat{r}-r)^2}{2\, Var\{\hat{r}\}}}$.   (3.9)

We are interested in having the estimated error-rate be within a certain range of accuracy, say [r(1−ε), r(1+ε)], with confidence not less than γ, where 0 < ε << 1 and γ is between 0 and 1.
Thus, we require

$\gamma \le \int_{r(1-\varepsilon)}^{r(1+\varepsilon)} P(\hat{r})\, d\hat{r} = \int_{r(1-\varepsilon)}^{r(1+\varepsilon)} \frac{1}{\sqrt{2\pi\, Var\{\hat{r}\}}}\, e^{-\frac{(\hat{r}-r)^2}{2\, Var\{\hat{r}\}}}\, d\hat{r} = \int_{-r\varepsilon/\sqrt{Var\{\hat{r}\}}}^{\,r\varepsilon/\sqrt{Var\{\hat{r}\}}} \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2}\, dt = 1 - 2\,Q\!\left(r\varepsilon\big/\sqrt{Var\{\hat{r}\}}\right)$   (3.10)

where the function Q is defined as $Q(x) = \int_x^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2}\, dt$ (see Appendix B). Changing the form of (3.10), we have

$Q^{-1}\!\left((1-\gamma)/2\right) \le r\varepsilon\big/\sqrt{Var\{\hat{r}\}}$.   (3.11)

$Var\{\hat{r}\}$ is a function of r (= p_1 + p_2), p_1 − p_2, S and L. Thus, (3.11) involves six quantities, namely ε, γ, r, p_1 − p_2, S and L, where S and L are unknown and ε and γ are given as part of the test specifications. To determine values for S and L, which are the test parameters for carrying out error-rate estimation, some additional constraints and/or objective functions are needed. Referring back to our VLSI test problem, some quantities of interest are listed next.

a) Minimize S×L, which primarily determines the total test time;
b) Minimize S, which primarily determines the storage cost for the correct ones counts;
c) Minimize c_1·S×L + c_2·S, the weighted cost of both test time and storage, where c_1 and c_2 are non-negative cost coefficients.

Referring back to (3.11), r is the true error-rate known only to an oracle, but we can guess a value for r and refine our guess once we have a value for the estimator. The quantity (p_1 − p_2) is also unknown, but again we can attempt to approximate it. Because the approximations differ for various situations, we deal with these issues in the next section on case studies.

3.4 Case Studies

In [29], the number of test patterns per session, L, has a lower bound. However, our analysis shows that any positive integer is feasible for L. First we consider two cases, namely L = 1 and 1/L ≈ 0 (i.e., L is very large). From (3.11) and (3.8) we see that S and L are inversely related to each other.
So L = 1 results in an upper bound for S, and 1/L ≈ 0 results in a lower bound on S. Then we consider the symmetric case of p_1 = p_2 > 0. Lastly, we consider the general case. In the following (except in Section 3.4.4), we assume γ = 0.9972 and hence $Q^{-1}((1-\gamma)/2) = 3$. Then (3.11) reduces to

$Var\{\hat{r}\} \le r^2\varepsilon^2/9$.   (3.12)

3.4.1 Case 1: L = 1

From (3.8), with L = 1 we have

$Var\{\hat{r}\} = \frac{2\alpha_4 - 9\alpha_2^2}{S^3} + \frac{2\alpha_2^2 + 6\alpha_0\alpha_3 + \alpha_4}{S^2} + \frac{\alpha_4 - \alpha_2^2 + 4\alpha_0^2\alpha_2 + 2\alpha_0\alpha_3}{S}$.

As α_0 = p_1 − p_2, we have 0 ≤ α_0^4 = (p_1 − p_2)^4, 0 ≤ α_0^2 = (p_1 − p_2)^2, and |α_0| = |p_1 − p_2| ≤ p_1 + p_2 = r ≤ 1. From the definitions of α_2, α_3 and α_4, we know that all of these terms are of the order of r. So $\alpha_4 - \alpha_2^2 + 4\alpha_0^2\alpha_2 + 2\alpha_0\alpha_3$, $2\alpha_2^2 + 6\alpha_0\alpha_3 + \alpha_4$ and $2\alpha_4 - 9\alpha_2^2$ are all of the order of r. Keeping the term with S in the denominator and ignoring terms with higher orders of S, we have

$Var\{\hat{r}\} = \frac{\alpha_4 - \alpha_2^2 + 4\alpha_0^2\alpha_2 + 2\alpha_0\alpha_3}{S}$,   (3.13)

and (3.12) can be rewritten as

$\frac{\alpha_4 - \alpha_2^2 + 4\alpha_0^2\alpha_2 + 2\alpha_0\alpha_3}{S} \le \frac{r^2\varepsilon^2}{9}$.   (3.14)

Replacing α_2, α_3 and α_4 with their functions of r in (3.14) leads to

$S \ge \frac{9\left(r - r^2 + (6r - 2)\alpha_0^2 - 4\alpha_0^4\right)}{r^2\varepsilon^2}$.   (3.15)

Since α_0 = p_1 − p_2 and r are unknown, we cannot simply set S equal to $9\left(r - r^2 + (6r-2)\alpha_0^2 - 4\alpha_0^4\right)/(r^2\varepsilon^2)$. However, we can choose S to be the maximum of this expression over α_0. Then (3.15) is satisfied and it is guaranteed that the estimated error-rate is in the range [r(1−ε), r(1+ε)] with confidence γ. For r ≤ 1/3, which is typically the case, α_0^2 = 0 maximizes $r - r^2 + (6r-2)\alpha_0^2 - 4\alpha_0^4$. So we choose S to be the value of the right hand side of (3.15) at α_0 = 0, i.e.,

$S_{C1} = \frac{9(1 - r)}{r\,\varepsilon^2}$.   (3.16)

For example, for r = 0.01, ε = 0.05 and L = 1, we choose S to be 3.6E+5.
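As a quick numeric check of Eq. (3.16) on the example above (the function name is mine):

```python
def sessions_case1(r, eps):
    # Eq. (3.16): S_C1 = 9(1 - r) / (r * eps^2), the number of test
    # sessions needed when L = 1 and r <= 1/3.
    return 9 * (1 - r) / (r * eps ** 2)

S = sessions_case1(r=0.01, eps=0.05)
print(round(S))  # about 3.6E+5, as in the text's example
```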
When 1/3 < r < 1, α_0^2 = (3r − 1)/4 maximizes $r - r^2 + (6r-2)\alpha_0^2 - 4\alpha_0^4$. So S can be chosen as the value of the right hand side of (3.15) at α_0^2 = (3r − 1)/4, i.e.,

$S = \frac{9\left(5r^2 - 2r + 1\right)}{4 r^2 \varepsilon^2}$.

For r = 0.4, ε = 0.05 and L = 1, we choose S to be 5625.

3.4.2 Case 2: 1/L ≈ 0

For the case of 1/L ≈ 0, not only do we assume that L is very large, but also that L >> S, in which case we ignore all terms in (3.8) with an L, and obtain

$Var\{\hat{r}\} = \frac{2\alpha_2^2}{S} + \frac{3\alpha_2^2}{S^2}$.   (3.17)

Ignoring the term with S^2 and replacing α_2 with its function of r and α_0, (3.17) reduces to $Var\{\hat{r}\} = 2(r - \alpha_0^2)^2/S$. From (3.12) and (3.17), we have

$S \ge \frac{18\left(r - \alpha_0^2\right)^2}{r^2 \varepsilon^2}$.   (3.18)

Similar to Case 1, we can choose the value of S to be the maximum of $18(r - \alpha_0^2)^2/(r^2\varepsilon^2)$. Since this maximum value is $18/\varepsilon^2$, we choose S as

$S_{C2} = \frac{18}{\varepsilon^2}$.   (3.19)

For example, for ε = 0.05, we get S_C2 = 7200 and thus L >> 7200. From (3.19) we see that when L is very large, the number of test sessions required in error-rate estimation is independent of the error-rate, and depends only on the accuracy and confidence of the estimation. This follows since, if the error-rate is extremely small, then the allowable error in our estimation as specified by ε allows our test methodology to work, and for larger values of error-rate, there is enough information gathered by using these values of L and S_C2 to again estimate the error-rate. If we choose L = 7200 and solve for S using (3.19), we get L = S. Then the condition required to approximate (3.8) with (3.17) is not satisfied, and in this case we cannot use (3.19) to select S. Case 1 and Case 2 represent two extreme values of L, resulting in upper and lower bounds for S.

3.4.3 Case 3: p_1 = p_2 > 0

When p_1 = p_2, the probabilities of observing a type 0/1 or a type 1/0 output are the same. Thus, the expected value of D for each session is zero, i.e., the sample mean of D, M_D, is close to zero.
In this case, it appears that the ones counting test methodology becomes ineffective. However, when M_D = 0 the estimator (3.6) reduces to V_D/L. This means that the error-rate can be derived solely from the variance of D. With p_1 = p_2, we have $\alpha_0 = \alpha_3 = 0$ and $\alpha_2 = \alpha_4 = r$. From (3.8), we have

$Var\{\hat{r}\} = \frac{2r^2}{S} + \frac{3r^2}{S^2} + \frac{r - 3r^2}{LS} + \frac{r - r^2}{L^2 S^2} - \frac{3r^2}{LS^3} + \frac{r - 3r^2}{L^2 S^3} + \frac{r - 3r^2}{L^3 S^3}$.   (3.20)

Assume S is large. When compared with $2r^2/S$, the term $3r^2/S^2$ can be ignored. When compared with $(r - 3r^2)/(LS)$, we can ignore $3r^2/(LS^3)$, $(r - r^2)/(L^2S^2)$, $(r - 3r^2)/(L^2S^3)$ and $(r - 3r^2)/(L^3S^3)$. Thus, (3.20) reduces to

$Var\{\hat{r}\} = \left(2r^2 + \frac{r - 3r^2}{L}\right)\Big/ S$.   (3.21)

From (3.12) and (3.21), we have $\left(2r^2 + (r - 3r^2)/L\right)/S \le r^2\varepsilon^2/9$, i.e.,

$S_{C3} \ge \frac{9}{\varepsilon^2}\left(2 + \frac{1}{L}\left(\frac{1}{r} - 3\right)\right)$.   (3.22)

Any positive integer is legitimate for L. When L = 1, $S = 9(1-r)/(r\varepsilon^2)$. When L is very large, $S = 18/\varepsilon^2$. These results for S are consistent with those derived for Cases 1 and 2. Figure 3.1 shows the relationship between S and L for different error-rates and ε = 0.05. Unlike signature analysis, here S decreases as L increases. As discussed before, the recorded number D is an approximation of L(p_1 − p_2). Dividing D by L yields an approximation of p_1 − p_2. As L becomes bigger, D/L approximates p_1 − p_2 more accurately, so each session provides a more accurate approximation of p_1 − p_2, on which the error-rate estimator is based. Thus, fewer test sessions are needed.

[Figure: Log10(Number of Test Sessions) vs. Number of Test Vectors per Session. From top to bottom, the curves correspond to the cases of r = 0.001, r = 0.005, r = 0.01, r = 0.05 and r = 0.1.]
Figure 3.1: S vs. L based on Eq. (3.22) for different error-rates and ε = 0.05. The points marked by an 'X' correspond to the selection of S and L that minimize S×L.
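The trade-off plotted in Figure 3.1 can be reproduced directly from Eq. (3.22); the helper below is my own sketch, not code from the thesis.

```python
def sessions_case3(r, eps, L):
    # Eq. (3.22): S >= (9/eps^2) * (2 + (1/r - 3)/L) for the p1 = p2 case.
    return (9 / eps ** 2) * (2 + (1 / r - 3) / L)

r, eps = 0.01, 0.05
for L in (1, 10, 100, 1000):
    print(L, round(sessions_case3(r, eps, L)))

# Limiting values agree with Cases 1 and 2:
#   L = 1        gives 9(1 - r)/(r eps^2) = 356400
#   L -> infinity gives 18/eps^2          = 7200
```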
Consider the case of a fault in a single-output XOR circuit that causes half of all outputs to be wrong. Among those erroneous outputs, half are of type 0/1 and the other half of type 1/0. So p_1 = p_2 = 1/4 and r = 1/2. This provides an example of Case 3. If the output line is stuck-at 0, then again r = 1/2 but now all errors are of type 1/0.

To see how the variance is instrumental in determining the estimated value of error-rate, again consider a circuit such that a randomly selected half of all possible input patterns map into 1, and the other half map into 0. Now consider a faulty version of this circuit, where again a randomly selected half of all possible input patterns map into 1, and the other half map into 0. So the true error-rate of the faulty circuit is 1/2. Now, for both the good and faulty circuits, the average value of the ones count is L/2. Thus, the expected value of D (the difference in ones counts between the good circuit and the faulty circuit) is zero. In addition, for the faulty circuit, the probability of observing a 0/0, 0/1, 1/0 or 1/1 type response is 1/4, i.e., p_1 = p_2 = 1/4.

Assume L = 4. For each session, the possible values of D are −4, −3, −2, −1, 0, 1, 2, 3 and 4. If D = −4 for a session, then all four outputs are of type 1/0. Thus, the probability of D = −4 is (1/4)^4 = 1/256. If D = −3 for a session, then three outputs are of type 1/0 and one output is of either type 0/0 or 1/1. Thus, the probability of D = −3 is 4×(1/4)^3×(1/2) = 1/32. Similarly, we can compute the probability of D being −2, −1, 0, 1, 2, 3 or 4. As a result, the probabilities of D being −4, −3, −2, −1, 0, 1, 2, 3 and 4 are 1/256, 1/32, 7/64, 7/32, 35/128, 7/32, 7/64, 1/32 and 1/256, respectively. With the distribution function of D, we find the expectation of D to be zero and the variance to be 2. From the expectation and variance of D, we know that the sample mean of D, M_D, is about zero and the sample variance of D, V_D, is about 2.
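The distribution of D quoted above, and its variance of 2, can be verified by exact enumeration. A minimal sketch using exact rational arithmetic (names are mine):

```python
from fractions import Fraction

def pmf_of_D(p1, p2, L):
    # Exact PMF of D = X1 + ... + XL, where each X is +1 w.p. p1,
    # -1 w.p. p2 and 0 otherwise, computed by repeated convolution.
    step = {1: p1, -1: p2, 0: 1 - p1 - p2}
    pmf = {0: Fraction(1)}
    for _ in range(L):
        nxt = {}
        for d, pd in pmf.items():
            for x, px in step.items():
                nxt[d + x] = nxt.get(d + x, Fraction(0)) + pd * px
        pmf = nxt
    return pmf

q = Fraction(1, 4)
pmf = pmf_of_D(q, q, 4)  # the example: p1 = p2 = 1/4, L = 4

print(pmf[-4], pmf[-3], pmf[-2], pmf[-1], pmf[0])
# 1/256, 1/32, 7/64, 7/32, 35/128 -- matching the text

var_D = sum(p * d * d for d, p in pmf.items())
print(var_D)  # Var{D} = 2 = L(p1 + p2 - (p1 - p2)^2)
```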
From the estimator $V_D/L + M_D^2/L^2$, we obtain an estimated error-rate of 1/2, which matches the true error-rate.

3.4.4 General Case

For the general case, we make no assumptions about the values of p_1, p_2 and γ. However, we assume S is large. To make the discussion clear, we repeat (3.8) below:

$Var\{\hat{r}\} = Var\left\{\frac{V_D}{L} + \frac{M_D^2}{L^2}\right\} = \frac{2\alpha_2^2}{S} + \frac{3\alpha_2^2}{S^2} + \frac{\alpha_4 - 3\alpha_2^2 + 4\alpha_0^2\alpha_2 + 2\alpha_0\alpha_3}{LS} + \frac{2\alpha_0\alpha_3}{LS^2} + \frac{\alpha_4 + 4\alpha_0\alpha_3 - \alpha_2^2}{L^2 S^2} - \frac{3\alpha_2^2}{LS^3} + \frac{\alpha_4 - 3\alpha_2^2}{L^2 S^3} + \frac{\alpha_4 - 3\alpha_2^2}{L^3 S^3}$.

Because S is large, the term $3\alpha_2^2/S^2$ can be ignored when compared to $2\alpha_2^2/S$. For the same reason, the terms $2\alpha_0\alpha_3/(LS^2)$, $(\alpha_4 + 4\alpha_0\alpha_3 - \alpha_2^2)/(L^2S^2)$, $3\alpha_2^2/(LS^3)$, $(\alpha_4 - 3\alpha_2^2)/(L^2S^3)$ and $(\alpha_4 - 3\alpha_2^2)/(L^3S^3)$ can be ignored when compared to $(\alpha_4 - 3\alpha_2^2 + 4\alpha_0^2\alpha_2 + 2\alpha_0\alpha_3)/(LS)$. Thus, the variance of the estimator becomes

$Var\{\hat{r}\} = \frac{2\alpha_2^2}{S} + \frac{\alpha_4 - 3\alpha_2^2 + 4\alpha_0^2\alpha_2 + 2\alpha_0\alpha_3}{LS}$.   (3.23)

Replacing α_2, α_3 and α_4 with their functions of r and α_0, we have

$Var\{\hat{r}\} = \frac{2\left(r - \alpha_0^2\right)^2}{S} + \frac{r - 3r^2 + (10r - 2)\alpha_0^2 - 6\alpha_0^4}{LS}$.   (3.24)

Then, (3.11) becomes

$\left(Q^{-1}\!\left((1-\gamma)/2\right)\right)^2\left(2\left(r - \alpha_0^2\right)^2 + \frac{r - 3r^2 + (10r - 2)\alpha_0^2 - 6\alpha_0^4}{L}\right) \le r^2\varepsilon^2 S$

and hence

$S \ge \left(\frac{Q^{-1}\!\left((1-\gamma)/2\right)}{r\,\varepsilon}\right)^2\left(2\left(r - \alpha_0^2\right)^2 + \frac{r - 3r^2 + (10r - 2)\alpha_0^2 - 6\alpha_0^4}{L}\right)$.   (3.25)

Without knowing the value of α_0, we choose S to be the maximum achievable value of the right hand side of (3.25). It can be shown (see Appendix C) that when r is less than 1/5, which is generally the case of interest, α_0 = 0 maximizes the right hand side of (3.25). Thus we choose S as

$S = \left(\frac{Q^{-1}\!\left((1-\gamma)/2\right)}{\varepsilon}\right)^2\left(2 + \frac{1}{L}\left(\frac{1}{r} - 3\right)\right)$.   (3.26)

For γ = 0.9972, (3.26) reduces to (3.22). This is expected because Case 3, from which (3.22) is derived, is a special case of the general case. For L = 1, (3.26) reduces to (3.16). For 1/L ≈ 0, (3.26) reduces to (3.19).
These results imply that the analysis of the general case is consistent with the analyses of its special cases.

Letting L → ∞ allows us to find a lower bound on S. For γ = 0.9972, this lower bound is given by (3.19). For ε = 0.1, the lower bound is 1800. Usually ε is much smaller than 0.1 and, as ε decreases, the lower bound on S increases. Thus in (3.8), where S and LS terms exist, it is appropriate to ignore terms containing S^2, LS^2, L^2S^2, LS^3, L^2S^3 and L^3S^3. This justifies the approximations used to obtain (3.24) from (3.8).

Now consider minimizing test time, which is proportional to S×L. Assume γ = 0.9972. From (3.26), we have

$SL = \frac{9}{\varepsilon^2}\left(2L + \frac{1}{r} - 3\right)$.

When 2L = 1/r, then $S = 9(4 - 6r)/\varepsilon^2$ and S×L has a minimal value of $9\left(\frac{2}{r} - 3\right)\big/\varepsilon^2$. When r is small, $S \approx 36/\varepsilon^2$ and $(S \times L)_{min} \approx 18/(r\varepsilon^2)$. In Figure 3.1, the points marked by 'X' correspond to the selections of S and L that minimize S×L. It can be seen that for different values of error-rate, the values of S are almost the same, namely $36/\varepsilon^2$. For error-rate estimation using signature analysis [28], it is recommended to set L = 1/r, which leads to $S = 15/\varepsilon^2$ and $SL = 15/(r\varepsilon^2)$. Thus we see that the ones counting technique for error-rate estimation is comparable to signature analysis in terms of total test time, and a little higher in terms of overhead cost.

3.5 Simulation

In the above analysis, we approximated the probability density function of the estimator by the normal distribution $N\left(E\{\hat{r}\}, Var\{\hat{r}\}\right)$. We then developed a way to select S and L to satisfy the accuracy and confidence requirements of error-rate estimation based on this approximation. In this section, we describe our results of estimating the error-rate via simulation. By repeating the simulation process many times, we can collect a large number of estimated error-rates and compare their distribution with $N\left(E\{\hat{r}\}, Var\{\hat{r}\}\right)$. The simulation is implemented as follows.
1) A random number generator generates three numbers, namely a 1 with probability p_1, a −1 with probability p_2 and a 0 with probability 1 − p_1 − p_2.

2) The number of 1s and the number of −1s in a sequence of L generated numbers are counted separately. Their difference is recorded. This represents a test session.

3) S test sessions are carried out, and S numbers are recorded. With the estimator presented in Section 3.3, the error-rate p_1 + p_2 is estimated.

4) The above procedure is repeated 3000 times, resulting in 3000 error-rate estimates. The distribution of the estimated error-rates is generated.

We use the MATLAB tool "normplot" to verify whether the data are consistent with a normal distribution. Normplot displays the cumulative distribution of the data. In the plot, a superimposed line is drawn to fit the sample data. If the data are normally distributed, the plot appears linear.

First, we set p_1 = 0.006, p_2 = 0.004, L = 1 and S = 1.0E+5. Figure 3.2a shows the distribution of the data, and it appears to be normal. The output of "normplot" is shown in Figure 3.2b, and is fairly linear, confirming that the data have a normal distribution. Now, a normal distribution can be defined by its mean and standard deviation. From the data, we estimate the mean to be 0.01 and the variance to be 1.21E-6. So the set of error-rate data has normal distribution N(0.01, 1.21E-6). In Section 3.3, we used a normal distribution to approximate the distribution of the estimator. The mean of the estimator is given by (3.7), and results in a value of 0.01. The variance of the estimator is given by (3.8), and results in a value of 1.23E-6. So in our analysis, we would use N(0.01, 1.23E-6) as the distribution of the estimator. Thus our analytical results closely match the simulation results.

Next, we choose different values of L and S while keeping p_1 = 0.006 and p_2 = 0.004.
Figure 3.3 shows the distribution of the estimated error-rate data from simulation and the normal distribution test from "normplot" for the case where L = 1000 and S = 2000. L = 1000 is about 10 times 1/r. For ε = 0.1, the lower bound on S is 1800. S = 2000 is close to this lower bound, and thus the conditions outlined in Section 3.4.2 are satisfied. The figure shows that the simulation data are normally distributed. From the data, we estimate its normal distribution to be N(0.01, 1.05E-7). From the analysis in Section 3.3, we obtain the distribution of the estimator to be N(0.01, 1.048E-7), which again is an excellent match.

Finally, we consider the case where p_1 = p_2 = 0.005, L = 50 and S = 4000. Since L is small, one would expect that for most test sessions few if any errors would occur. In this case, the estimated error-rate is derived mainly from the variance of the sample data, as mentioned in Section 3.4.3. The distribution of the error-rate data from simulation and the output of "normplot" are displayed in Figure 3.4, which shows that the simulation data have a normal distribution. The distribution function of the data is estimated to be N(0.01, 9.85E-8), which matches the distribution of the estimator from analysis, namely N(0.01, 9.85E-8).

Figure 3.2: (a) Distribution of estimated error-rate data from simulation. (b) The output from the MATLAB tool "normplot" used to test whether the data are normally distributed. For this figure, p_1 = 0.006, p_2 = 0.004, L = 1 and S = 1.0E+5.
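The simulation procedure of Section 3.5 can be sketched compactly in Python instead of MATLAB. The sketch below uses smaller parameters than the thesis experiments, for speed; helper names are mine.

```python
import random
import statistics

def draw_session(p1, p2, L, rng):
    # One test session: counter difference D over L random outputs.
    d = 0
    for _ in range(L):
        u = rng.random()
        if u < p1:
            d += 1
        elif u < p1 + p2:
            d -= 1
    return d

def estimate(p1, p2, L, S, rng):
    # Steps 1-3: run S sessions and apply the estimator of Eq. (3.6).
    Ds = [draw_session(p1, p2, L, rng) for _ in range(S)]
    M_D = statistics.fmean(Ds)
    V_D = statistics.variance(Ds)  # sample variance, denominator S - 1
    return V_D / L + (M_D / L) ** 2

# Step 4: repeat the estimation many times and inspect the spread.
rng = random.Random(42)
p1, p2, L, S = 0.006, 0.004, 50, 400
estimates = [estimate(p1, p2, L, S, rng) for _ in range(200)]
mean_est = statistics.fmean(estimates)
print(mean_est)  # clusters around the true error-rate 0.01
```

A histogram or normal-probability plot of `estimates` would play the role of the thesis's "normplot" check.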
Figure 3.3: (a) Distribution of estimated error-rate data from simulation. (b) Output from the MATLAB tool "normplot" used to test whether the data are normally distributed. For this figure, p_1 = 0.006, p_2 = 0.004, L = 1000 and S = 2000.

Figure 3.4: (a) Distribution of estimated error-rate data from simulation. (b) Output from the MATLAB tool "normplot" used to test whether the data are normally distributed. For this figure, p_1 = 0.005, p_2 = 0.005, L = 50 and S = 4000.

3.6 Classification of Chips by Error-rate

One application of error-rate estimation is to assign chips to bins that correspond to error-rate ranges defined by threshold error-rate values. Recall that a threshold separates a range into two adjacent sub-ranges. Threshold r_th divides the domain of error-rates into two ranges, (0, r_th) and (r_th, 1), and consequently failing chips are partitioned into two types, A and B. The error-rate of a Type A chip is equal to or less than r_th, while that of a Type B chip is greater than r_th. Testing classifies a chip as Type A or Type B according to the estimated error-rate $\hat{r}$. Namely, if $\hat{r}$ < r_th, the chip is classified as Type A; otherwise, it is classified as Type B. Unfortunately, the random variable $\hat{r}$ can be greater than r_th even though the true error-rate, r, is smaller than r_th, and vice versa. The chance of this occurring increases rapidly as r approaches the value r_th. So the test can classify chips incorrectly.
In statistics such a test is called a hypothesis test. In our case, the two hypotheses are:

H0: The chip is Type B, i.e., r > r_th;
H1: The chip is Type A, i.e., r ≤ r_th.

This hypothesis test is the same as the one discussed in Chapter 2. This test generates four possible outcomes: 1) the chip is Type B, and is classified as Type B; 2) the chip is Type B, and is classified as Type A; 3) the chip is Type A, and is classified as Type A; and finally 4) the chip is Type A, and is classified as Type B. Outcome 2) results in a lower-price chip erroneously sold at a higher price, while outcome 4) results in a higher-price chip sold at a lower price. Outcome 4) is likely acceptable to customers, but outcome 2) is not. So the test should limit the probability of the occurrence of outcome 2).

Assume it is required that the probability of any Type B chip being classified as Type A be smaller than β, where β << 1. According to the analysis of error-rate estimation, the estimated error-rate has a normal distribution whose expectation is the true error-rate. Figure 3.5 shows the distribution of the estimated error-rate of a chip whose true error-rate is r_th and of a chip whose true error-rate is greater than r_th. It can be seen that for a chip with true error-rate r greater than r_th, the further r is from r_th, the lower the probability of outcome 2) occurring. This probability is represented by the dashed area under the curve. However, when the true error-rate is equal to r_th, the probability of outcome 2) is always 50%. Thus, the requirement is never satisfied. To solve this issue, we define another threshold, r_thn, that is smaller than r_th, and postulate the following classification criterion: if the estimated error-rate of a chip is smaller than r_thn, it is classified as Type A; if the estimated error-rate of a chip is equal to or greater than r_thn, it is classified as Type B.
Figure 3.5: (a) PDF of the estimated error-rate of a chip whose true error-rate is r_th. (b) PDF of the estimated error-rate of a chip whose true error-rate is greater than r_th.

Assume that the probability of outcome 2) occurring is still limited to β. This constraint must hold when r = r_th; then the probability of outcome 2) is smaller than β whenever r > r_th. So we require

β ≥ Prob(r̂ < r_thn | r = r_th).

When r = r_th, the estimated error-rate has the normal distribution N(r_th, Var{r̂}). Thus,

Prob(r̂ < r_thn | r = r_th) = Q((r_th − r_thn)/√(Var{r̂})) ≤ β,

which is equivalent to

(r_th − r_thn)^2 ≥ (Q^{-1}(β))^2 Var{r̂}.   (3.27)

Substituting the expression for Var{r̂} given by (3.24) into (3.27) and solving for S yields (3.28), which describes the requirement on S and L such that the probability of outcome 2) occurring is limited to β. Without knowing the value of α_0, we choose S to be greater than the maximum value of the right-hand side of (3.28) over all feasible α_0. With r_th smaller than 0.2, this maximum is attained when α_0 = 0, so we choose S according to the expression

S = (Q^{-1}(β)/(r_th − r_thn))^2 · 3r_th(1 − r_th)/L.   (3.29)

In (3.29), β and r_th are given in the test specification. We need to determine the values of S, L and r_thn. From our previous analysis, the selection of S and L depends on what cost function is minimized. Assume L has been determined. Then S is determined by r_thn: as r_thn decreases, S decreases.
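The selection of S is easy to automate. The sketch below is ours, not the dissertation's: it assumes the session count takes the form S = (Q^{-1}(β)/(r_th − r_thn))^2 · 3r_th(1 − r_th)/L, the form consistent with the session counts reported later in this chapter, and uses Python's statistics.NormalDist for the inverse Q function (the function names q_inv and required_sessions are our own):

```python
import math
from statistics import NormalDist

def q_inv(beta):
    # Q(x) = P(Z > x) for a standard normal Z, so Q^{-1}(beta) = Phi^{-1}(1 - beta)
    return NormalDist().inv_cdf(1.0 - beta)

def required_sessions(beta, r_th, r_thn, L):
    # Assumed form of (3.29): S = (Q^{-1}(beta)/(r_th - r_thn))^2 * 3*r_th*(1 - r_th)/L
    s = (q_inv(beta) / (r_th - r_thn)) ** 2 * 3.0 * r_th * (1.0 - r_th) / L
    return math.ceil(s)

# Parameters of the C432_7 classification experiment in Section 3.7
print(required_sessions(beta=0.05, r_th=0.02, r_thn=0.019, L=50))  # 3182 test sessions
```

With β = 0.05, r_th = 0.02, r_thn = 0.019 and L = 50, this reproduces the maximal session count of 3182 used in the experiments of Section 3.7.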
Because smaller values of S result in less storage cost, it is important to keep S small. However, a smaller r_thn causes more Type A chips to be classified as Type B, meaning a loss in profit. So there is a tradeoff between storage cost and profit.

For a Type A chip, the true error-rate, r, may be smaller or greater than r_thn. Given L and S, the probability of outcome 4) can be computed. Consider the case of r < r_thn. The estimated error-rate has the normal distribution N(r, Var{r̂}), so the probability of outcome 4), i.e., of the estimated error-rate being greater than r_thn, is

Q((r_thn − r)/√(Var{r̂})).   (3.30)

When r < 0.2 and α_0 = 0, (3.30) takes on its maximal value,

Q((r_thn − r)/√(3r(1 − r)/(LS))),

which is the upper bound on the probability of outcome 4) for the case of r < r_thn. When r_thn < r < r_th, the probability of outcome 4) is

1 − Q((r − r_thn)/√(Var{r̂})).   (3.31)

Since α_0 = p_1 − p_2 ≤ r, we have α_0^2 ≤ r^2, and it can be shown that if r < 1/3 then (3.31) is maximal when α_0^2 = r^2, where it takes the value

1 − Q((r − r_thn)/√((5r^2 − 10r^3 + 6r^4)/(LS))).

Similarly, the probability of outcome 2) for the case of r > r_th can be computed. Table 3.1 summarizes the upper bounds on the probability of erroneous classification for these three cases.

Similar to the error-rate estimation technique based on signature analysis [28], it is not necessary to apply all test sessions before making a decision, because the further the true error-rate is from the threshold, the smaller the probability of making an erroneous classification. The number of test sessions previously derived in (3.29) is based on the worst case, i.e., when r = r_th.
However, a defective chip usually does not represent the worst case, and sometimes never represents such a case. The classification requirement is that the probability of an estimated error-rate less than r_thn be smaller than β if the true error-rate is greater than r_th. So it is possible to make a decision without executing all S test sessions, as long as the probability of making a wrong decision is smaller than β. Let S_ms(r) be the minimal number of test sessions required for a chip with error-rate r that satisfies the constraint imposed by β. For r > r_th, the upper bound on the probability that the estimated error-rate is smaller than r_thn is Q((r − r_thn)/√(3r(1 − r)/(LS))). For r < r_thn, the upper bound on the probability that the estimated error-rate is greater than r_thn is Q((r_thn − r)/√(3r(1 − r)/(LS))). Both formulas are listed in Table 3.1. Then S_ms(r) should satisfy

Q((r − r_thn)/√(3r(1 − r)/(L·S_ms))) = β   if r > r_th,   (3.32)

and

Q((r_thn − r)/√(3r(1 − r)/(L·S_ms))) = β   if r < r_thn.   (3.33)

Table 3.1: The maximal likelihood of making an erroneous classification

True error-rate                                                  | Upper bound on the probability of erroneous classification
r < r_thn and r̂ > r_thn (Type A classified as Type B)           | Q((r_thn − r)/√(3r(1 − r)/(LS)))
r_thn < r < r_th and r̂ > r_thn (Type A classified as Type B)    | 1 − Q((r − r_thn)/√((5r^2 − 10r^3 + 6r^4)/(LS)))
r > r_th and r̂ < r_thn (Type B classified as Type A)            | Q((r − r_thn)/√(3r(1 − r)/(LS)))

When r_thn < r < r_th, S_ms(r) is equal to S as specified by (3.29). As an example, let r_thn = 0.009, r_th = 0.01 and β = 0.05. L is chosen to be 100 (= 1/r_th). Figure 3.6 shows the minimal number of test sessions S_ms(r) for different true error-rates r. As r moves away from the range [r_thn, r_th], S_ms(r) quickly decreases.
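Equations (3.32) and (3.33) can be solved for S_ms(r) in closed form. The sketch below is ours and assumes the variance bound 3r(1 − r)/(LS) used in this reconstruction, which gives S_ms = (Q^{-1}(β)/(r − r_thn))^2 · 3r(1 − r)/L (the helper name minimal_sessions is our own):

```python
import math
from statistics import NormalDist

def minimal_sessions(r, r_thn, beta, L):
    # Solve Q(|r - r_thn| / sqrt(3r(1-r)/(L*S_ms))) = beta for S_ms
    q_inv = NormalDist().inv_cdf(1.0 - beta)
    return math.ceil((q_inv / (r - r_thn)) ** 2 * 3.0 * r * (1.0 - r) / L)

# Parameters of the Figure 3.6 example: r_thn = 0.009, r_th = 0.01, beta = 0.05, L = 100
for r in (0.012, 0.02, 0.03, 0.05):
    print(r, minimal_sessions(r, r_thn=0.009, beta=0.05, L=100))
```

As in Figure 3.6, the required number of sessions falls off quickly as r moves away from the range [r_thn, r_th].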
So for error-rates far from [r_thn, r_th], only a small number of test sessions are needed. This result is the same as the corresponding result from signature analysis.

Figure 3.6: The minimal number of test sessions for different true error-rates.

To be able to make an early decision before applying all test sessions, the test procedure must be modified. An original test is divided into multiple phases, each of which consists of a disjoint subset of the S test sessions. After each test phase is completed, the error-rate is estimated based on the results from the test sessions applied so far. If the estimated error-rate is smaller than r_thn, the probability of making an erroneous classification is calculated using (3.33). If the estimated error-rate is greater than r_th, the probability of making an erroneous classification is calculated using (3.32). If the computed probability is smaller than β, the test stops; otherwise it continues. If the estimated error-rate is in the range [r_thn, r_th], the test also continues unless all test sessions have been run, in which case testing is finished.

3.7 Experimental Results

To validate this chip classification technique, we applied it to the ISCAS'85 benchmark circuit C432, which has 7 primary outputs. Because our technique as described here assumes a single-output circuit, we created seven single-output circuits from C432, labeled C432_1, C432_2, C432_3, C432_4, C432_5, C432_6 and C432_7. The seven circuits have the same netlist as C432, except for minor modifications discussed next. For each of these netlists, only one primary output of C432 is treated as the output of the new circuit and the other outputs are treated as internal wires. For example, output pin 1 of C432 is the output of C432_1, and output pins 2, 3, 4, 5, 6 and 7 of C432 are treated as internal wires that are unobservable.
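The multi-phase scheme used in these experiments can be sketched as a loop that adds sessions until the stopping condition is met. The sketch below is ours: the per-session measurement is a simple mismatch count standing in for the ones-counting estimator of Section 3.3, and the stopping probability uses the bound Q(|r̂ − r_thn|/√(3r̂(1 − r̂)/(L·S))) with the unknown true error-rate replaced by its estimate; all function names are our own.

```python
import math, random
from statistics import NormalDist

def q(x):
    return 1.0 - NormalDist().cdf(x)

def classify(run_session, L, r_thn, r_th, beta, s_first=20, s_max=3182):
    errors = sessions = 0
    while sessions < s_max:
        errors += run_session()
        sessions += 1
        if sessions < s_first:
            continue                       # the first phase runs s_first sessions
        r_hat = errors / (sessions * L)    # estimate from all sessions so far
        if r_thn <= r_hat <= r_th:
            continue                       # too close to the thresholds: keep testing
        # probability of an erroneous classification if we stop now, per (3.32)/(3.33)
        sigma = math.sqrt(3.0 * r_hat * (1.0 - r_hat) / (L * sessions)) if 0 < r_hat < 1 else 0.0
        if sigma == 0.0 or q(abs(r_hat - r_thn) / sigma) < beta:
            break
    r_hat = errors / (sessions * L)
    return ("Type A" if r_hat < r_thn else "Type B"), sessions

# Stand-in for a BIST session on a chip with true error-rate 0.05 and L = 50 patterns
random.seed(7)
session = lambda: sum(random.random() < 0.05 for _ in range(50))
label, used = classify(session, L=50, r_thn=0.019, r_th=0.02, beta=0.05)
print(label, used)
```

For a chip whose error-rate is far from the threshold band, as here, the loop stops after only a few phases, mirroring Table 3.2 below.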
We use the single stuck-at fault model to represent defects. Each of these circuits has 864 single stuck-at faults. Thus, corresponding to each fault-free circuit, there are 864 faulty circuits. Consider the 864 faulty circuits of C432_7. Since we know the actual faults in these circuits, we can obtain their actual error-rates (see Figure 3.7). To classify these circuits, we set r_th = 0.02, r_thn = 0.019, β = 0.05 and L = 50. The error-rate of a Type A circuit is in the range (0, 0.02), and that of a Type B circuit is in [0.02, 1]. From (3.29), the maximal number of test sessions, S, is 3182. We use the multi-phase test scheme described in the previous section for classification. In the first phase, S* = 20 test sessions are executed; the collected data would not be statistically meaningful if S* were too small. In each of the following phases, only one test session is applied. Even though more test sessions could be applied per phase, we choose one because we want to find the exact number of test sessions needed. After each phase, the estimated error-rate is computed and the condition for stopping is checked. The total number of test sessions for each circuit is recorded.

Figure 3.8 shows the histogram of test sessions for all the circuits. From Figure 3.8, it is seen that for many circuits, only a small number of test sessions are needed. This means that the error-rate for each of these circuits is far from the range (0.019, 0.02). This result is consistent with Figure 3.7, which shows that only a small fraction of the circuits have an error-rate within or near the range (0.019, 0.02).

Figure 3.7: Normalized histogram of error-rates of 864 faulty circuits associated with C432_7. (a) The error-rate range is from 0 to 0.04. (b) The error-rate range is from 0.04 to 0.55. Note: the scales for (a) and (b) are different.
Figure 3.8: Normalized histogram of the number of test sessions applied when classifying the 864 faulty circuits of C432_7. (a) The number of test sessions is in the range 1 to 100. (b) The number of test sessions is in the range 100 to 1100. Note: the scales for (a) and (b) are different.

To further demonstrate the correlation between the number of test sessions and error-rate, in Table 3.2 we list the average number of test sessions for different sub-ranges of error-rate. For error-rates far from 0.019~0.02, the average number of test sessions is small. As the error-rate gets close to 0.019~0.02, the average number of test sessions increases. This is consistent with the analysis in Section 3.6. For error-rates in the range 0.019~0.02, the average number of test sessions is 995.44, which is well below the maximal number of test sessions, 3182. This is because for some circuits the estimated error-rate falls either below 0.019 or above 0.02, and the stopping condition is satisfied before 3182 test sessions have been run.

Table 3.2: Average number of test sessions of faulty C432_7 circuits for different error-rate ranges

Error-rate Range | Average Number of Test Sessions | Number of Circuits in the Range
0~0.004      | 20     | 177
0.004~0.008  | 21.33  | 98
0.008~0.012  | 28.32  | 74
0.012~0.015  | 54.23  | 66
0.015~0.017  | 192.33 | 21
0.017~0.019  | 296.53 | 19
0.019~0.02   | 995.44 | 16
0.02~0.022   | 341.05 | 19
0.022~0.024  | 252.90 | 28
0.024~0.027  | 95.82  | 34
0.027~0.034  | 56.13  | 40
0.034~0.05   | 31.20  | 99
0.05~1       | 20.15  | 172

Similar to what we did in Chapter 2, we investigated the test escape and yield loss associated with this classification technique. Recall that the percentage of circuits misclassified as Type A is called the test escape, and the percentage misclassified as Type B is called the yield loss.
Intuitively, if the error-rate of a circuit is far from the threshold, it is less likely to be misclassified. Because most of the 864 circuits in our experiment have an error-rate far from the threshold, the test escape and yield loss should be low. Table 3.3 lists the number of misclassifications for three different error-rate ranges. Thus, for this experiment the test escape is 13/393 = 3.31% and the yield loss is 10/471 = 2.12%.

Table 3.3: Misclassifications in the experiment for C432_7

Error-rate Range | Actual Type | Actual Number of Faulty Circuits | Number of Misclassified Circuits
0~0.019    | Type A | 455 | 1
0.019~0.02 | Type A | 16  | 9
0.02~1     | Type B | 393 | 13

The same classification experiment was applied to benchmark circuit C880. A single-output circuit, namely C880_26, was derived from C880 such that one primary output of C880 is the only output of C880_26, and the other primary outputs of C880 are treated as internal wires of C880_26. Because C880_26 has 1960 single stuck-at faults, we obtained 1960 faulty copies of C880_26, each corresponding to a single stuck-at fault. In the experiment, r_th = 0.02, r_thn = 0.019, β = 0.05 and L = 50. Table 3.4 lists the average number of test sessions for different sub-ranges of error-rate and the number of faulty circuits in each range. Similar to the results for C432_7, the number of test sessions increases when the error-rate range is close to 0.019~0.02. Table 3.4 also quantifies the number of misclassifications made. Of the 1554 Type A circuits, 7 are misclassified. Of the 206 Type B circuits, 6 are misclassified. The resulting yield loss is 0.45% and the test escape is 2.91%.

The results of these experiments on C432_7 and C880_26 are consistent with our analytical results. When the error-rate of a circuit is far from the threshold, a correct classification decision can be made after a small number of test sessions. As the error-rate of a circuit approaches the threshold, more test sessions are needed. In our experiments, we chose β to be 0.05, which limits the probability of misclassification, and used β for the worst-case situation to determine the other test parameters. When the error-rate of a circuit is
In our experiments, we choose β to be 0.05, which limits the probability of misclassification, and use β for the worse situation to determine other test parameters. When the error-rate of a circuit is 91 far from the threshold, the probability of making a wrong decision is actually smaller than β. From Table 3.4 it is seen that the probability of making a wrong decision increases as the actual error-rate approaches the threshold. Table 3.4: Experimental results for faulty C880_26 circuits Error-rate Range Actual Type Average Number of Test Sessions Number of Actual Circuits in the Range Number of Misclassified Circuits 0~0.004 A 20 1397 0 0.004~0.008 A 21.57 74 0 0.008~0.012 A 24.16 37 0 0.012~0.015 A 56.88 16 0 0.015~0.017 A 213.71 7 0 0.017~0.019 A 818 17 2 0.019~0.02 A 1657.83 6 5 0.02~0.022 B 873.76 17 4 0.022~0.024 B 244 7 0 0.024~0.027 B 80.38 8 1 0.027~0.034 B 68.14 7 1 0.034~0.05 B 34.97 38 0 0.05~1 B 20.19 129 0 In the experiment for C880_26, 1547 Type A circuits are correctly classified as Type A, and 6 Type B circuits are misclassified as Type A. Thus, 1553 circuits are classified as Type A with 6 of them actually being Type B. If we sell all 1553 circuits, in principle 6 might justifiable be returned. The fraction of chips that are beyond-tolerance (FBT), namely 6/1553, should not be confused with the classical notion of defects per million (DPM) used as a measure for product quality. We require that the maximal probability of misclassification for a circuit be β. Theoretically, for the worst scenario in classification, the number of actual Type B circuits that are misclassified as Type A is (β × the number of actual Type B circuits); and the number of actually Type A circuits that are 92 misclassified as Type B is (β × the number of actual Type A circuits). Thus the number of circuits that are classified as Type A is (the number of actual Type A circuits − β × the number of actual Type A circuits + β × the number of actual Type B circuits). 
Then, the FBT is equal to

FBT = (β × the number of actual Type B circuits) / (the number of circuits that are classified as Type A).

The actual value of FBT is usually smaller than its theoretical value, because chips with error-rate far from the threshold have a probability of misclassification lower than β. In our experiment on the 1960 faulty circuits of C880_26, the experimental value of FBT is 6/1553 (= 0.38%), while the theoretical value of FBT is 0.05×206/(1554 − 0.05×1554 + 0.05×206), or about 0.69%.

3.8 Error-rate Estimation of Multiple Output Circuits

In the previous sections of this chapter, we discussed error-rate estimation using ones counting as applied to single-output combinational circuits. For multiple output circuits, however, the error-rate of the circuit, treating all output lines together as a pattern, cannot be derived from the error-rates of the individual output pins without knowing the correlation among these lines. Consider a half adder with two inputs a and b, and two outputs C (carry) and S (sum) (see Figure 3.9). Fault 1 is a single stuck-at-1 at signal line a. If all possible input patterns are applied, we find that the error-rate of C is 1/4, that of S is 1/2, and that of this faulty half adder is 1/2. Through analysis of correlation, we see that for this fault, if C is erroneous, then S is erroneous. Next consider the multiple stuck-at fault labeled fault 2. In this case, the error-rate of C is 1/4, that of S is 1/2, and that of the half adder is 3/4. Even though the error-rate of the half adder is the sum of the error-rates of C and S, analysis shows that errors at C and S are not independent. The correlations among output lines depend not only on the circuit, but also on the faults. It is impractical to find the correlation relationships in advance and use them to derive the error-rate of the circuit from the individual error-rates of the output lines.
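The two half-adder cases can be checked exhaustively. The sketch below is ours: it simulates both faults over all four input patterns, taking fault 2, per the stuck-at-0 labels in Figure 3.9, to force both outputs to 0; the helper names are our own.

```python
def half_adder(a, b):
    return a & b, a ^ b          # (carry C, sum S)

def rates(faulty):
    # Compare faulty outputs against the fault-free half adder on all 4 input patterns
    ec = es = ep = 0
    for a in (0, 1):
        for b in (0, 1):
            c0, s0 = half_adder(a, b)
            c1, s1 = faulty(a, b)
            ec += c1 != c0                # carry line erroneous
            es += s1 != s0                # sum line erroneous
            ep += (c1, s1) != (c0, s0)    # output pattern erroneous
    return ec / 4, es / 4, ep / 4

fault1 = lambda a, b: half_adder(1, b)    # input a stuck-at-1
fault2 = lambda a, b: (0, 0)              # assumed: both outputs stuck-at-0
print(rates(fault1))  # (0.25, 0.5, 0.5): pattern error-rate is not the sum of line rates
print(rates(fault2))  # (0.25, 0.5, 0.75): here it is the sum, yet the line errors are still dependent
```

The two faults give identical per-line error-rates but different pattern error-rates, which is exactly why the individual line rates alone cannot determine the error-rate of the circuit.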
Figure 3.9: Two different faults in a half adder (fault 1 is a stuck-at-1 on input a; fault 2 is a multiple fault consisting of two stuck-at-0 lines).

Instead of invoking the correlation relationships among output lines, we propose a method that uses the ones counting technique described previously to find the error-rate of multiple output circuits. From the discussion in Section 3.3, we see that for a single output circuit, if its outputs can be modeled as types 0/1, 1/0, 0/0 and 1/1, we can successfully estimate the probability of occurrence of type 0/1 and 1/0 outputs. This result forms the basis of error-rate estimation using ones counting for multiple output circuits.

3.8.1 Method of Error-rate Estimation for Multiple Output Circuits

For multiple output circuits, the number of ones in an output pattern can be either even or odd. To estimate the error-rate, the output pattern is fed into a circuit called a parity checker (see Figure 3.10). A parity checker is a multi-input, single-output circuit that outputs 1 when the number of ones in its input is odd, and 0 otherwise.

Figure 3.10: Defective CUT feeds a parity checker. (Input of type 2m -> output is 0; input of type 2m+1 -> output is 1.)

Assume that the number of ones in an output pattern of the defect-free CUT is of type 2m, and that the number of ones in the observed output pattern of the defective CUT is of type 2m', where both m and m' are integers. For example, an observed output pattern “110011” has 4 ones, and the correct output pattern “111100” also has 4 ones; both the observed and the correct parity checker outputs are 0. This “0” is referred to as a type 0/0 output, and this case corresponds to no errors or an even number of erroneous output pins. The number of ones in the observed output pattern could instead be of type 2m'+1, in which case the observed parity checker output is 1 and the correct parity checker output is 0.
This “1” is referred to as a type 0/1 output, and this case corresponds to an odd number of erroneous output pins. In a similar fashion, type 1/1 and 1/0 outputs can be defined. To represent different output situations and error patterns, we use the notation E_{a,b}, where “a” is the number of ones in the correct output and “b” is the difference between the number of ones in the observed output and the number of ones in the correct output. Thus, “a+b” is the number of ones in the observed output.

Thus, a type 0/1 output of the parity checker corresponds to E_{2m,2n+1}; 1/0 to E_{2m+1,2n+1}; 0/0 to E_{2m,2n}; and 1/1 to E_{2m+1,2n}, where n is an integer. Using the ones counting technique, we are able to estimate the probability of occurrence of type 0/1 or 1/0 outputs of the parity check circuit, which is also the probability of occurrence of E_{2m,2n+1} or E_{2m+1,2n+1} at the CUT. It follows that P_{any,2n+1} = P_{2m,2n+1} + P_{2m+1,2n+1}. In other words, with the assistance of a parity checker circuit we can estimate P_{any,2n+1}, namely the probability that an odd number of output lines of the CUT are erroneous.

To find the probability of occurrence of an even number of erroneous lines at the output of the CUT, we use what we refer to as mod-4 circuits. There are two types of mod-4 circuits. One type, called the lower mod-4, is shown in Figure 3.11. It outputs a 0 when the number of ones in its input pattern is of type 4m or 4m+1, and conversely outputs a 1 when the number of ones is of type 4m+2 or 4m+3. In other words, the number of ones in the input pattern of a lower mod-4 circuit is divided by 2 and the result is replaced by its floor: if the floor is even, the circuit outputs a 0; if the floor is odd, it outputs a 1. The outputs of a lower mod-4 circuit for the different combinations of output and error patterns of the defective CUT are listed in Table 3.5.

Figure 3.11: Defective CUT feeds a lower mod-4 circuit. (Input of type 4m or 4m+1 -> output is 0; input of type 4m+2 or 4m+3 -> output is 1.)
Using a lower mod-4 circuit, its output results, which are based on different combinations of the output and error patterns of the defective CUT, are listed in Table 3.5. Defective CUT Lower Mod-4 Circuit Input is type 4m or 4m+1 -> output is 0 Input is type 4m+2 or 4m+3 -> output is 1 Figure 3.11: Defective CUT feeds a lower mod-4 circuit 96 Table 3.5: Output of a lower mod-4 circuit according to different combinations of output and error patterns of defective CUTs Error pattern of defective CUTs Correct output of lower mod-4 Observed output of lower mod-4 Output type of lower mod-4 E 4m,4n 0 0 0/0 E 4m,4n+1 0 0 0/0 E 4m,4n+2 0 1 0/1 E 4m,4n+3 0 1 0/1 E 4m+1,4n 0 0 0/0 E 4m+1,4n+1 0 1 0/1 E 4m+1,4n+2 0 1 0/1 E 4m+1,4n+3 0 0 0/0 E 4m+2,4n 1 1 1/1 E 4m+2,4n+1 1 1 1/1 E 4m+2,4n+2 1 0 1/0 E 4m+2,4n+3 1 0 1/0 E 4m+3,4n 1 1 1/1 E 4m+3,4n+1 1 0 1/0 E 4m+3,4n+2 1 0 1/0 E 4m+3,4n+3 1 1 1/1 From Table 3.5, we see that the output of the lower mod-4 circuit can be modeled as types 0/0, 1/1, 0/1 and 1/0. The outputs of types 0/1 and 1/0 correspond to the error patterns E 4m,4n+2 , E 4m,4n+3 , E 4m+1,4n+1 , E 4m+1,4n+2, E 4m+2,4n+2 , E 4m+2,4n+3 , E 4m+3,4n+1 and E 4m+3,4n+2 , which can be combined as E any,4n+2 , E 2m,4n+3 and E 2m+1,4n+1 . Then, using the method discuss in Section 3.3, we can estimate the value of P any,4n+2 +P 2m,4n+3 +P 2m+1,4n+1 . Defective CUT Upper Mod-4 Circuit Input is type 4m or 4m+3 -> output is 0 Input is type 4m+1 or 4m+2 -> output is 1 Figure 3.12: Defective CUT feeds an upper mod-4 circuit Now consider the upper mod-4 circuit shown in Figure 3.12. It outputs a 0 when the number of ones in its input pattern is of type 4m or 4m+3, and conversely, it outputs a 1 97 when the number of ones in its input pattern is of type 4m+1 or 4m+2. In other words, the number of ones in its input pattern is divided by 2 and the result is replaced by its ceiling. If the ceiling is even, the upper mod-4 circuit outputs a 0; otherwise, it outputs a 1. 
Similar to the lower mod-4 circuit, the outputs of this circuit can be modeled as types 0/0, 1/1, 0/1 and 1/0 according to the different combinations of output and error patterns of the defective CUT. The details are listed in Table 3.6. From Table 3.6, we see that type 0/1 and 1/0 outputs correspond to E_{4m,4n+1}, E_{4m,4n+2}, E_{4m+1,4n+2}, E_{4m+1,4n+3}, E_{4m+2,4n+1}, E_{4m+2,4n+2}, E_{4m+3,4n+2} and E_{4m+3,4n+3}, which can be combined as E_{any,4n+2}, E_{2m,4n+1} and E_{2m+1,4n+3}. So, using the method in Section 3.3, we can estimate the value of P_{any,4n+2} + P_{2m,4n+1} + P_{2m+1,4n+3}.

Table 3.6: Output of an upper mod-4 circuit according to different combinations of output and error patterns of defective CUTs

Error pattern | Correct output of upper mod-4 | Observed output of upper mod-4 | Output type
E_{4m,4n}     | 0 | 0 | 0/0
E_{4m,4n+1}   | 0 | 1 | 0/1
E_{4m,4n+2}   | 0 | 1 | 0/1
E_{4m,4n+3}   | 0 | 0 | 0/0
E_{4m+1,4n}   | 1 | 1 | 1/1
E_{4m+1,4n+1} | 1 | 1 | 1/1
E_{4m+1,4n+2} | 1 | 0 | 1/0
E_{4m+1,4n+3} | 1 | 0 | 1/0
E_{4m+2,4n}   | 1 | 1 | 1/1
E_{4m+2,4n+1} | 1 | 0 | 1/0
E_{4m+2,4n+2} | 1 | 0 | 1/0
E_{4m+2,4n+3} | 1 | 1 | 1/1
E_{4m+3,4n}   | 0 | 0 | 0/0
E_{4m+3,4n+1} | 0 | 0 | 0/0
E_{4m+3,4n+2} | 0 | 1 | 0/1
E_{4m+3,4n+3} | 0 | 1 | 0/1

So from the above three tests, using the parity checker, the lower mod-4 circuit and the upper mod-4 circuit, we can estimate the values of P_{any,2n+1} (= P_{2m,4n+1} + P_{2m+1,4n+1} + P_{2m,4n+3} + P_{2m+1,4n+3}), P_{any,4n+2} + P_{2m,4n+3} + P_{2m+1,4n+1}, and P_{any,4n+2} + P_{2m,4n+1} + P_{2m+1,4n+3}. Summing these three values gives

(P_{2m,4n+1} + P_{2m+1,4n+1} + P_{2m,4n+3} + P_{2m+1,4n+3}) + (P_{any,4n+2} + P_{2m,4n+3} + P_{2m+1,4n+1}) + (P_{any,4n+2} + P_{2m,4n+1} + P_{2m+1,4n+3})
= 2(P_{any,4n+2} + P_{2m,4n+1} + P_{2m+1,4n+1} + P_{2m,4n+3} + P_{2m+1,4n+3})
= 2(P_{any,4n+1} + P_{any,4n+2} + P_{any,4n+3}).

Dividing this sum by 2 yields the probability of occurrence of output patterns that are in error, where the error patterns include the multiplicity types E_{any,4n+1}, E_{any,4n+2} and E_{any,4n+3}. At present we have not yet been able to solve the case of error pattern E_{any,4n}.
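The fact that the three measurable quantities together count each non-multiple-of-4 error class exactly twice can be checked numerically. The sketch below is ours: it assigns an arbitrary probability to each class E_{a,b} (indexed by a mod 4 and b mod 4, with exact fractions to avoid rounding) and verifies that half the sum of the three compressor-level flip probabilities equals the total probability of error patterns whose ones-count difference b is not a multiple of 4.

```python
from fractions import Fraction
from itertools import product

# Arbitrary probabilities p[(a mod 4, b mod 4)] for the error classes E_{a,b}
p = {(a, b): Fraction(1 + 3 * a + 5 * b, 200) for a, b in product(range(4), repeat=2)}

parity = lambda k: k % 2
lower4 = lambda k: (k // 2) % 2
upper4 = lambda k: ((k + 1) // 2) % 2

def flip_prob(circuit):
    # Probability that the compressor output differs between correct count a and observed a + b
    return sum(q for (a, b), q in p.items() if circuit(a + b) != circuit(a))

measured = (flip_prob(parity) + flip_prob(lower4) + flip_prob(upper4)) / 2
wanted = sum(q for (a, b), q in p.items() if b != 0)   # all errors except E_{any,4n}
assert measured == wanted
print("half-sum identity verified:", measured)
```

Classes with b a multiple of 4 never flip any of the three compressors, which is exactly why E_{any,4n} remains invisible to this scheme.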
3.8.2 Experimental Results

To verify the error-rate estimation of multiple output circuits, we applied our technique to benchmark circuit C880, which has 26 primary outputs. A defective copy of C880, whose signal 55 has a stuck-at-1 fault, was chosen arbitrarily. The actual error-rate of this faulty circuit is 0.04472. The actual value of its P_{any,2n+1} is 0.03131, of P_{any,4n+2} + P_{2m,4n+3} + P_{2m+1,4n+1} is 0.02646, and of P_{any,4n+2} + P_{2m,4n+1} + P_{2m+1,4n+3} is 0.02619. To estimate the error-rate of this multiple output circuit, we inserted a parity checker circuit, a lower mod-4 circuit and an upper mod-4 circuit, each separately, to transform the circuit into a single-output circuit, and then used our error-rate estimation method for single output circuits. The procedure for determining test parameters such as L and S is the same as before. In this experiment, we chose L = 50 and S = 10000. We repeated the estimation 500 times, thus obtaining 500 estimates of each of the three quantities. Table 3.7 shows the sample mean and sample variance of each estimated quantity. We can see that the estimates closely match their corresponding true values.

Table 3.7: Mean and variance of the estimated values of P_{any,2n+1}, P_{any,4n+2} + P_{2m,4n+3} + P_{2m+1,4n+1} and P_{any,4n+2} + P_{2m,4n+1} + P_{2m+1,4n+3}

Quantity | True value | Mean of estimated value | Variance of estimated value
P_{any,2n+1}                                      | 0.03131 | 0.03110 | 2.4221E-7
P_{any,4n+2} + P_{2m,4n+3} + P_{2m+1,4n+1}        | 0.02646 | 0.02626 | 1.8440E-7
P_{any,4n+2} + P_{2m,4n+1} + P_{2m+1,4n+3}        | 0.02619 | 0.02624 | 1.7777E-7
Probability of errors excluding pattern E_{any,4n} | 0.04198 | 0.04180 | 1.5110E-7

It is expected that the estimated values of each quantity are normally distributed. To verify this, we used the MATLAB tool “normplot”: if the data have a normal distribution, the output of “normplot” will be close to a line.
In Figure 3.13, the outputs of “normplot” for P_{any,2n+1}, P_{any,4n+2} + P_{2m,4n+3} + P_{2m+1,4n+1} and P_{any,4n+2} + P_{2m,4n+1} + P_{2m+1,4n+3} are shown, respectively. These figures confirm that the estimated values have a normal distribution, which is consistent with our expectation.

Figure 3.13: Output of “normplot” for the estimated values of (a) P_{any,2n+1}, (b) P_{any,4n+2} + P_{2m,4n+3} + P_{2m+1,4n+1} and (c) P_{any,4n+2} + P_{2m,4n+1} + P_{2m+1,4n+3}.

Experimental results show that our estimation method accurately estimates the error-rate excluding error pattern E_{any,4n}. However, because error pattern E_{any,4n} is ignored, it is necessary to investigate the effect of this omission on estimating the true error-rate. In the following we again focus on C880. For each faulty copy of C880, we record and compare the actual error-rate and the error-rate excluding pattern E_{any,4n}. Table 3.8 shows the distribution of the absolute difference of these two values for all 1760 faulty copies, and Table 3.9 shows the distribution of their relative difference. From both Table 3.8 and Table 3.9, we see that for some faulty copies, error pattern E_{any,4n} is the main contributor to the error-rate. In particular, from Table 3.9 we see that for 32.67% of all faulty copies, the relative difference is greater than or equal to 0.05. Thus, we cannot simply ignore error pattern E_{any,4n}. We have investigated the use of other types of circuits for converting a multiple output circuit to a single output circuit, such as
We have investigated the use of other types of circuits for converting a multiple output circuit to a single output circuit, such as 102 mod-6 and mod-8. On the one hand, using such circuits reduces the probability of excluded certain error patterns. On the other hand, the test complexity is increased. So far, we have not found a way to estimate the error-rate that includes all error patterns using ones counting on a multiple output circuit. This unsolved problem may be a research topic in the future. Table 3.8: Distribution of absolute difference between error-rate and the probability of error pattern E any,4n for 1760 faulty copies of C880 Range of absolute difference Distribution [0 ~ 0.001) 0.722727 [0.001 ~ 0.002) 0.022727 [0.002 ~ 0.004) 0.038068 [0.004 ~ 0.006) 0.015909 [0.006 ~ 0.01) 0.055682 [0.01 ~ 0.02) 0.047159 [0.02 ~ 0.04) 0.053409 [0.04 ~ 0.06) 0.018182 [0.06 ~ 0.1) 0.019886 >= 0.1 0.006818 Table 3.9: Distribution of relative difference between error-rate and the probability of error pattern E any,4n for 1760 faulty copies of C880 Range of absolute difference Distribution < 0.05 0.673295 [0.05 ~ 0.1) 0.175568 [0.1 ~ 0.2) 0.135795 [0.2 ~ 0.3) 0.008523 [0.3 ~ 0.4) 0.001136 [0.4 ~ 0.5) 0 [0.5 ~ 0.6) 0.001136 [0.6 ~ 0.7) 0 [0.7 ~ 0.8) 0 >= 0.05 [0.9 ~ 1] 0.004545 0.3267 103 3.9 Conclusions We considered the problem addressed in [29], and developed a mathematically simpler estimator for single output circuit. In addition, we analyzed the statistical characteristics of the estimator, such as its expectation, variance and probability density function. We determined the conditions when this estimator is biased vs. unbiased. Based on the statistical analysis, we described a method for selecting the values for two key test parameters, namely the number of test patterns per session, L, and the number of test sessions, S. These parameters impact the test resources used for error-rate estimation. 
We showed how these parameters are a function of both test time and storage requirements. In addition, we provided useful and practical bounds for these two parameters. The results of our analysis show that the test resources required for error-rate estimation based on ones counting are quite comparable to those needed when signature analysis is employed. Thus, the major difference between these two approaches is that signature-based error-rate analysis can operate on a multiple output circuit, while ones counting processes one output at a time unless the test hardware is replicated for each output line.

We also addressed the problem of classifying chips based on their error-rate. The proposed classification procedure is partitioned into multiple phases. This process can significantly reduce classification time without any loss in the quality of classification.

In our discussion, several assumptions were made, such as assuming a normal distribution for some random variables, and dropping terms that lead to second or third order effects. We addressed these issues in our experiments, and validated that these assumptions are appropriate. We also considered various boundary or extreme conditions, such as when the test length L of a test session is 1 or very large.

For multiple output circuits, the output lines may not all be of the same importance. Thus, when these output lines are considered independently, each line will be associated with a different error-rate threshold. For example, the most significant bit line might have a much smaller error-rate threshold than the least significant bit line. In this situation, the technique for error-rate estimation of a single output line can be used, and its results used to decide whether the error-rate is smaller than the threshold.

We extended the work for single output circuits to multiple output circuits using ones counting and developed new estimation methods.
Generally, the output lines of a multiple output circuit are not independent, and their relationship is not explicitly known. By using three special circuits, namely a parity checker, a lower mod-4 circuit and an upper mod-4 circuit, we developed a method to estimate the error-rate. However, the probability of error patterns in which the difference between the number of ones in the observed output and the number of ones in the correct output is a multiple of 4 cannot be estimated. When estimating the error-rate of a multiple output circuit, we cannot simply ignore this error pattern, because it is a significant contributor to the error-rate of some faulty circuits.

Chapter 4
Test for SBER Estimation with BIST

4.1 Introduction

Generally, SBER estimation is much harder than error-rate estimation, and is a function of the definition of error-significance. As discussed in Chapter 1, error-significance can be defined in different ways, such as by the numerical difference between the observed output and the correct output, by the absolute value of this difference, or by their relative difference. Some definitions of error-significance involve complicated calculations, which make SBER estimation difficult. In addition, the output compressor in the BIST architecture for SBER estimation is not as simple as an LFSR or a ones counter. For example, it seems impossible to use an output compressor if the error-significance is defined as the relative difference between the observed and correct outputs. In this thesis, we focus on one definition of error-significance, namely the absolute value of the difference.

Consider a defective multiple output combinational circuit, where the data on the output lines can be interpreted to represent a real number.
For an input pattern i, the absolute value of the difference between its observed response, O_i^*, and its correct response, O_i, is defined as the “error-significance” for input pattern i, i.e., |O_i^* − O_i|. We now redefine SBER according to this definition of error-significance. Consider a combinational circuit C with n input pins that contains a static defect d. For a given error-significance threshold T_ES, the SBER of C(d) is defined as the fraction of all 2^n input patterns whose error-significance is greater than T_ES. From this definition, we see that (1) an error-significance threshold must be provided, (2) the SBER value of the circuit C(d) depends on the threshold T_ES, and (3) each defective instance of circuit C can be associated with a SBER value. Using the SBER value of a defective circuit, we can judge the acceptability of this circuit. Assuming we have a SBER threshold T_SBER, we can set the criterion of acceptability of a chip as follows: if the SBER value is greater than T_SBER, the chip is not acceptable; otherwise it is.

As mentioned earlier, SBER estimation requires a large amount of data to be stored and/or computed. The test overhead for SBER estimation using BIST is generally greater than classical BIST overhead, making it important to keep this overhead below some specified bound.

In this chapter, we consider three different situations where SBER estimation is applied. For the first two situations, we assume that multiple copies of a target circuit exist on a chip. The existence of multiple copies of a circuit is normal in systems such as multi-core processors and chips with redundancy. Here we are not interested in how a target circuit is used after we judge its acceptability; we only care about how to judge the acceptability of the target circuit. In our test process, we apply the same test pattern to two copies of a target circuit.
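The exhaustive SBER definition above can be sketched in a few lines. This is an illustrative sketch only: the 4-input toy circuit, the stuck-at fault, and the threshold values below are assumptions, not circuits from the thesis.

```python
from itertools import product

def sber(correct_fn, faulty_fn, n_inputs, t_es):
    """Exhaustive SBER: the fraction of all 2^n input patterns whose
    error-significance |O* - O| exceeds the threshold t_es."""
    failing = 0
    for bits in product((0, 1), repeat=n_inputs):
        if abs(faulty_fn(bits) - correct_fn(bits)) > t_es:
            failing += 1
    return failing / 2 ** n_inputs

# Hypothetical 4-input circuit: the output is the integer value of the
# inputs; the "defective" copy has its most significant output bit stuck at 1.
correct = lambda b: b[0] * 8 + b[1] * 4 + b[2] * 2 + b[3]
faulty = lambda b: 8 + b[1] * 4 + b[2] * 2 + b[3]

# The fault matters only on the 8 patterns with b[0] = 0, where it errs by 8.
sber_at_4 = sber(correct, faulty, 4, 4)
```

With T_ES = 4, exactly half of the 16 patterns have error-significance greater than the threshold, so the SBER is 0.5; raising T_ES to 8 makes the same defect fully acceptable, illustrating point (2) above.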
By comparing the outputs from these two circuits we can compute a value of pseudo-error-significance, labeled PES, in the same way one would compute the actual error-significance. If one of these target circuits has previously been found to be defect-free, which corresponds to the first case to be studied, the PES is actually the error-significance. If both circuits are faulty, which represents the second case of interest, then what is computed is just a PES. For the third situation, only one copy of a target circuit exists. What makes the estimation of SBER non-trivial in this case is that we do not store the correct responses to the input test patterns.

This chapter is organized as follows. Section 4.2 discusses SBER estimation when multiple copies of a circuit exist, at least one of which is defect-free and whose identity is known. The scenario where multiple copies exist without any known defect-free copy is discussed in Section 4.3. In Section 4.4, we present a SBER estimation method for the scenario where only one copy of the circuit exists. Section 4.5 concludes this chapter.

4.2 Multiple Copies with At Least One Defect-free Copy

If there is one known copy of the target circuit that is defect-free, it can be used as a reference circuit when testing the other circuits. A PRPG generates a long sequence of test patterns that are applied to both circuits. For each input pattern, the output responses from the two circuits are fed into the error-significance calculator. The error-significance obtained from the calculator is compared with a given error-significance threshold value. A counter counts the number of input patterns whose error-significance values are greater than the error-significance threshold. After all patterns have been applied, the SBER value of the defective circuit is approximated by dividing the final value in the counter by the number of applied input patterns. This test structure is shown in Figure 4.1.
[Figure 4.1: Test structure for the scenario where one defect-free copy exists. A PRPG drives the circuit under test C1 and a defect-free copy C2; an error-significance calculator compares their outputs, a comparator checks the result against the error-significance threshold T, and a counter records the failures.]

Assume the true SBER value is p. Consider the error-significance values of the input patterns as samples. Define a Bernoulli random variable that is 1 when a sample is greater than the error-significance threshold, and 0 when it is less than or equal to the threshold. Then p is the probability that this random variable equals 1. Let M represent the number of input patterns applied to the circuits, and K the final value in the counter. Then p̂ = K/M is the maximum likelihood estimator (MLE) of p. This estimator has been extensively studied [13]. It is known that its distribution is close to a normal distribution N(p, p(1−p)/M) when M is large.

When SBER is a measure of the acceptability of a defective circuit, a SBER threshold T is provided. If the estimated SBER value is less than T, the circuit is acceptable; otherwise it is not. This leads to a hypothesis test regarding p, which has two types of errors: 1) p > T and p̂ ≤ T; and 2) p ≤ T and p̂ > T. If the probability of these errors occurring is to be bounded by μ, the following requirements apply to the hypothesis test: given μ (0 < μ << 1), Pr(p̂ ≤ T) ≤ μ if p > T; and Pr(p̂ > T) ≤ μ if p ≤ T.

However, these requirements cannot be satisfied if p is very close to T. Consider the extreme situation where p = T: the probability of making an incorrect conclusion is always 0.5. To deal with this issue, we select another threshold T_n, where T_n is slightly smaller than T, and modify the test requirements as follows:

1) Pr(p̂ ≤ T_n) ≤ μ, if p > T;
2) Pr(p̂ > T) ≤ μ, if p ≤ T_n; and
3) any decision is fine, if T_n < p < T.

Prior to carrying out a test, the number of test patterns, M, needs to be predetermined.
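Under the normal approximation N(p, p(1−p)/M) stated above, a worst-case value of M can be computed numerically. The sketch below is illustrative: it assumes (as the following analysis makes precise) that the worst case over the unknown true SBER p occurs at the thresholds, and it inverts the Gaussian tail function Q by bisection rather than with any particular statistics library.

```python
import math

def q_func(x):
    """Gaussian tail probability Q(x) = P(Z > x) for standard normal Z."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def q_inv(mu, lo=0.0, hi=10.0, iters=100):
    """Invert Q on [lo, hi] by bisection (Q is strictly decreasing there)."""
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if q_func(mid) > mu:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def m_star(t, t_n, mu):
    """Worst-case number of patterns M*: the unknown true SBER p is
    replaced by whichever threshold (T or T_n) maximizes p(1-p)."""
    qi = q_inv(mu)
    return max(t * (1 - t), t_n * (1 - t_n)) * (qi / (t - t_n)) ** 2

# Example from the text: T = 0.05, T_n = 0.045, mu = 0.0013 gives M* near 1.71E+4.
M = m_star(0.05, 0.045, 0.0013)
```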
For p > T, we have

  Pr(p̂ ≤ T_n) = Q((p − T_n) / √(p(1−p)/M)) ≤ μ,

i.e., M ≥ p(1−p)[Q^{−1}(μ)/(p − T_n)]², where Q(x) = ∫_x^∞ (1/√(2π)) e^{−t²/2} dt. For p ≤ T_n, we have

  Pr(p̂ > T) = Q((T − p) / √(p(1−p)/M)) ≤ μ,

i.e., M ≥ p(1−p)[Q^{−1}(μ)/(T − p)]². Let M_min denote the minimum number of test patterns required to obtain the required degree of accuracy. Then M_min = p(1−p)[Q^{−1}(μ)/(p − T_n)]² when p > T, and M_min = p(1−p)[Q^{−1}(μ)/(T − p)]² when p ≤ T_n. As p moves away from the threshold T, M_min decreases, i.e., fewer test patterns are needed.

Assume T = 0.05, T_n = 0.045 and μ = 0.0013. Figure 4.2 shows M_min for different true SBER values. Clearly, the number of required test patterns drops dramatically as the difference between the threshold and the true SBER increases.

[Figure 4.2: M_min vs. p, where T = 0.05 and T_n = 0.045]

Before testing commences, the value of M must be determined. Because we do not know the true SBER value p, we cannot evaluate either p(1−p)[Q^{−1}(μ)/(p − T_n)]² or p(1−p)[Q^{−1}(μ)/(T − p)]² directly. However, we can use an upper bound of M_min as the value of M, namely

  M* = max{T(1−T)[Q^{−1}(μ)/(T − T_n)]², T_n(1−T_n)[Q^{−1}(μ)/(T − T_n)]²}.

For T = 0.05, T_n = 0.045 and μ = 0.0013, M* = 1.71E+4.

While testing, it is not necessary to apply all M test patterns before making a decision on acceptability. The procedure used for the classification of chips based on error-rate in Chapters 2 and 3 can be used here as well. In this procedure, the test is divided into multiple phases, and during each phase a certain number of test patterns are applied. At the end of each phase, the SBER value is estimated and the probability of making a wrong decision is computed.
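One plausible form of this phase-based early-stopping rule is sketched below. The exact stopping criterion of the thesis is not reproduced here; this sketch assumes a normal approximation of the running estimate and illustrative parameter values, and it simulates the per-pattern pass/fail outcomes rather than a real circuit.

```python
import math
import random

def q_func(x):
    """Gaussian tail probability Q(x) = P(Z > x) for standard normal Z."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def phased_test(p_true, t, t_n, mu, phase_len, max_patterns, rng):
    """Multi-phase acceptance test: after each phase of phase_len patterns,
    stop as soon as the (normal-approximation) probability of a wrong
    decision falls below mu; otherwise keep testing up to max_patterns."""
    k = m = 0
    while m < max_patterns:
        k += sum(1 for _ in range(phase_len) if rng.random() < p_true)
        m += phase_len
        p_hat = k / m
        sigma = math.sqrt(max(p_hat * (1.0 - p_hat), 1e-12) / m)
        # Accept early: estimate below T_n and the chance that p > T is tiny.
        if p_hat <= t_n and q_func((t - p_hat) / sigma) <= mu:
            return "accept", m
        # Reject early: estimate above T and the chance that p <= T_n is tiny.
        if p_hat > t and q_func((p_hat - t_n) / sigma) <= mu:
            return "reject", m
    return ("accept" if k / m <= t else "reject"), m

rng = random.Random(7)
decision, patterns_used = phased_test(0.005, 0.05, 0.045, 0.0013,
                                      1000, 30000, rng)
```

For a circuit whose true SBER (0.005 here) is far below T_n, the test typically terminates after the first phase, far short of the worst-case M.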
If the probability of making a wrong decision is smaller than μ, testing can be terminated; otherwise, testing continues with the next phase.

We confirmed our results experimentally using benchmark circuit C432. Assume there are 864 defective copies of C432, each containing a different single stuck-at fault. C432 has 7 output lines; we consider its output data to be an integer in the range [0, 127]. The error-significance threshold is set to 50. For each defective copy, we compared its SBER value with a SBER threshold of T = 0.03, setting T_n = 0.027 and μ = 0.0013. According to the above discussion, we needed a maximum of 3.0E+4 test patterns. We paired each defective copy of C432 with a fault-free copy, applied up to 3.0E+4 pseudo-random test patterns, and made a decision on acceptability. Table 4.1 shows the experimental results; for most defective circuits the procedure made the correct decision.

Table 4.1: Experimental results for defective copies of C432

  SBER range      Actual number of defective circuits   Number of wrong decisions
  (0, 0.027]      792                                   0 (computed SBER >= 0.03)
  (0.027, 0.03]   3                                     2 (computed SBER >= 0.03)
  (0.03, 1]       69                                    2 (computed SBER < 0.03)

4.3 Multiple Copies Without Known Defect-free Copy

When there is no known fault-free copy of the target circuit on the chip, the same test architecture as discussed previously can be applied, but now it is not clear whether either of the pair of outputs being compared is correct.

Assume there are three copies of the target circuit, labeled C1, C2 and C3. Let p_1, p_2 and p_3 represent their corresponding actual SBER values, and e_1, e_2 and e_3 their error-rates. Let q_1 = e_1 − p_1, q_2 = e_2 − p_2 and q_3 = e_3 − p_3. Assume e_1, e_2 and e_3 are known and small; they can be determined using the error-rate estimation method presented in Chapter 2.

Consider C1 and C2. For each input pattern, its PES is computed.
A counter records the number of input patterns whose PES is greater than the error-significance threshold. A test yields the ratio of the final value in the counter to the number of applied input patterns, i.e., K/M. In response to a specific input pattern, if the computed error-significance is greater than the error-significance threshold, then one of the following three situations has occurred: 1) the error-significance at C1 is greater than the threshold and C2 has no error; 2) the error-significance at C2 is greater than the threshold and C1 has no error; or 3) both C1 and C2 have errors in their outputs and the difference between their outputs is greater than the threshold. For case 3), it is possible that the true error-significance of both C1 and C2 is smaller than the threshold, that both are larger than the threshold, or that one is larger and the other smaller. For example, assume the correct output is 0 and the threshold is 10. In each of the following cases, the error-significance calculator outputs a number greater than the threshold: C1 outputs 6 and C2 outputs −6; C1 outputs 11 and C2 outputs −10; C1 outputs 13 and C2 outputs 1.

Assume the errors from the two copies of the target circuit are independent, i.e., we are dealing with random rather than systematic defects. The probability of case 1) occurring is p_1(1 − e_2), and that of case 2) is p_2(1 − e_1). The probability of case 3) occurring is unknown but less than e_1·e_2. However, e_1·e_2 is very small relative to p_1(1 − e_2) and p_2(1 − e_1) when e_1 and e_2 are much smaller than 1, p_1 and e_1 are within an order of magnitude of each other, and the same is true for p_2 and e_2. Hence the probability that case 3) occurs can be ignored.

The test result gives us an approximation of p_1 + p_2, say t_12. If C2 and C3 are paired together, the test result is an approximation of p_2 + p_3, say t_23. Similarly, if C1 and C3 are paired together, the test result is an approximation of p_1 + p_3, say t_13. From p_1 + p_2 = t_12, p_2 + p_3 = t_23 and p_1 + p_3 = t_13, we obtain the estimators

  p̂_1 = (t_12 + t_13 − t_23)/2,  p̂_2 = (t_12 + t_23 − t_13)/2,  p̂_3 = (t_13 + t_23 − t_12)/2.

t_12, t_23 and t_13, and hence p̂_1, p̂_2 and p̂_3, have approximately normal distributions. The expectations of t_12, t_23 and t_13 are p_1 + p_2, p_2 + p_3 and p_1 + p_3, respectively. Assume we apply M test patterns to each of the three pairings of circuits. Then the variances of t_12, t_23 and t_13 are ((p_1 + p_2) − (p_1 + p_2)²)/M, ((p_2 + p_3) − (p_2 + p_3)²)/M and ((p_1 + p_3) − (p_1 + p_3)²)/M, respectively. From these we obtain the variances of p̂_1, p̂_2 and p̂_3; they are all equal, namely (Var(t_12) + Var(t_23) + Var(t_13))/4, which upon substitution becomes

  [p_1(1 − p_1) + p_2(1 − p_2) + p_3(1 − p_3) − p_1p_2 − p_1p_3 − p_2p_3] / (2M).

Using SBER as a measure of acceptability, we want to determine whether the SBER value of a defective circuit is greater or smaller than a given threshold T. As in Section 4.2, we wish to limit the probability of making a wrong decision to no more than μ (0 < μ << 1). Again, define T_n as another threshold slightly smaller than T. The test should satisfy:

1) Pr(p̂ ≤ T_n) ≤ μ, if p > T;
2) Pr(p̂ > T) ≤ μ, if p ≤ T_n; and
3) any decision is fine, if T_n < p ≤ T.

Consider C1 and assume p_1 > T. From Pr(p̂_1 ≤ T_n) = Q((p_1 − T_n)/√Var{p̂_1}) ≤ μ, we have

  M ≥ 0.5 [p_1(1 − p_1) + p_2(1 − p_2) + p_3(1 − p_3) − p_1p_2 − p_1p_3 − p_2p_3] [Q^{−1}(μ)/(p_1 − T_n)]².

Similarly, when p_1 ≤ T_n, from Pr(p̂_1 > T) = Q((T − p_1)/√Var{p̂_1}) ≤ μ, we have

  M ≥ 0.5 [p_1(1 − p_1) + p_2(1 − p_2) + p_3(1 − p_3) − p_1p_2 − p_1p_3 − p_2p_3] [Q^{−1}(μ)/(T − p_1)]².
M_min is again the minimum number of test patterns that must be applied. For p_1 > T,

  M_min = 0.5 [p_1(1 − p_1) + p_2(1 − p_2) + p_3(1 − p_3) − p_1p_2 − p_1p_3 − p_2p_3] [Q^{−1}(μ)/(p_1 − T_n)]².

For p_1 ≤ T_n,

  M_min = 0.5 [p_1(1 − p_1) + p_2(1 − p_2) + p_3(1 − p_3) − p_1p_2 − p_1p_3 − p_2p_3] [Q^{−1}(μ)/(T − p_1)]².

Analysis shows that when 2p_1 + p_2 + p_3 < 1, M_min decreases as p_1 moves away from the threshold T. Thus, with respect to p_1, the upper bound of M_min is the maximum of

  0.5 [T(1 − T) + p_2(1 − p_2) + p_3(1 − p_3) − T·p_2 − T·p_3 − p_2p_3] [Q^{−1}(μ)/(T − T_n)]²

and

  0.5 [T_n(1 − T_n) + p_2(1 − p_2) + p_3(1 − p_3) − T_n·p_2 − T_n·p_3 − p_2p_3] [Q^{−1}(μ)/(T − T_n)]².

In these expressions for M, p_2 and p_3 are unknown. Thus, to determine the value of M, we need an upper bound with respect to p_2 and p_3, which can be obtained by setting p_2 = p_3 = (1 − p_1)/2 (i.e., (1 − T)/2 or (1 − T_n)/2, respectively) and ignoring the p_2p_3 term. Thus, M is the maximum of 0.25(1 − T²)[Q^{−1}(μ)/(T − T_n)]² and 0.25(1 − T_n²)[Q^{−1}(μ)/(T − T_n)]², which is 0.25(1 − T_n²)[Q^{−1}(μ)/(T − T_n)]² because T_n < T.

Above, we focused on C1; we reach the same result if we focus on C2 or C3. Let μ = 0.0013, T = 0.05 and T_n = 0.045. To satisfy the test requirements on accuracy, about 90000 test patterns are needed. Also, if T − T_n is fixed, M gets smaller as T increases.

As in Section 4.2, by dividing the test into multiple phases we do not need to apply all M test patterns before making a decision. After each phase, the SBER values of all target circuits are estimated and the probabilities of making wrong decisions are computed. If, for each circuit, the probability of making a wrong decision is below μ, the test process stops; otherwise, it continues.

To verify this methodology, we again turned to C432. Let T = 0.03, μ = 0.0013 and T_n = 0.027.
According to the above discussion, 2.5E+5 test patterns are needed. In this experiment, we chose four sets of circuits, each set including three defective instantiations of C432. For each set of circuits, we used the above methodology to determine whether the SBER values are greater or smaller than 0.03. For each set, the test was repeated 10 times, and the number of fully correct outcomes, i.e., tests in which every circuit in the set received the correct decision, was recorded. Table 4.2 lists the results of the experiment. For sets 1, 2 and 3, the experimental results are always correct. However, for set 4, there are no correct results. This is because the erroneous outputs from two different target circuits are not independent, while the test method is based on the assumption that the errors from any two circuits are independent. In the case of dependencies, the method in this section may not always work, and the method presented in Section 4.4 should be used.

Because the success of this method depends on the assumption that the errors from all copies are independent, it is important to verify this independence before using the method. A partial solution exists when there are five or more copies. For example, let A, B, C, D and E be five copies of the target circuit. We can use A, B and C to estimate the SBER value of A, and then use A, D and E to estimate it again. If the two estimates are close, A is likely to be independent with respect to B, C, D and E. This partial solution requires extra test overhead and does not always guarantee correct results. If it is impossible to verify the independence of errors from different copies, or we know they are dependent, this method cannot be used. For such cases, and in fact for all situations, the method for a single copy described in the next section can be used.

Table 4.2: Experimental results for T = 0.03, T_n = 0.027 and μ = 0.0013.
Each set includes three defective copies of C432

  Set                                      True SBER      Tests with correct decisions   Percentage correct
  Circuit set 1:
    C1: signal 232 s-a-1                   C1: 0.011
    C2: signal 175 s-a-0                   C2: 0.011      10                             100%
    C3: signal 335 s-a-1                   C3: 0.012
  Circuit set 2:
    C1: signal 64 s-a-1                    C1: 0.076
    C2: signal 22 s-a-1                    C2: 0.072      10                             100%
    C3: signal 182 s-a-1                   C3: 0.036
  Circuit set 3:
    C1: signal 218 s-a-1                   C1: 0.0054
    C2: signal 119 s-a-0                   C2: 0.0015     10                             100%
    C3: signal 64 s-a-1                    C3: 0.076
  Circuit set 4:
    C1: signal 403 s-a-1                   C1: 0.080
    C2: signal 343 s-a-0                   C2: 0.082      0                              0%
    C3: signal 74 s-a-1                    C3: 0.077

We showed above the SBER estimation for the case with three copies of the circuit. What if only two copies exist, say C1 and C2? In this case, we only obtain t_12, which cannot be used to solve for both p_1 and p_2. So if there are only two copies, the above discussion does not apply.

We now consider the case when more than three copies exist. Assume there are four copies of the target circuit, C1, C2, C3 and C4. As above, we have t_12 = p_1 + p_2 by using C1 and C2, t_13 = p_1 + p_3 for C1 and C3, t_14 = p_1 + p_4 for C1 and C4, t_23 = p_2 + p_3 for C2 and C3, t_24 = p_2 + p_4 for C2 and C4, and t_34 = p_3 + p_4 for C3 and C4. For C1, the estimator of p_1 is

  p̂_1 = (t_12 + t_13 + t_14)/3 − (t_23 + t_24 + t_34)/6.

The expectation of p̂_1 is p_1. The variance of p̂_1 is (Var(t_12) + Var(t_13) + Var(t_14))/9 + (Var(t_23) + Var(t_24) + Var(t_34))/36. In the case of three copies, the variance of p̂_1 is (Var(t_12) + Var(t_23) + Var(t_13))/4. Their difference is

  (Var(t_12) + Var(t_13) + Var(t_14))/9 + (Var(t_23) + Var(t_24) + Var(t_34))/36 − (Var(t_12) + Var(t_23) + Var(t_13))/4
  = (−5Var(t_12) − 5Var(t_13) − 8Var(t_23) + 4Var(t_14) + Var(t_24) + Var(t_34))/36
  ≈ (−p_1 − 2p_2 − 2p_3 + p_4)/(6M)  (when p_1, p_2, p_3 and p_4 are small).

This implies that if p_4 is smaller than p_1 + 2p_2 + 2p_3, the variance of p̂_1 when using four copies is smaller than when using three copies.
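A quick check of the four-copy estimator's unbiasedness, again on fabricated noise-free pairwise sums; the SBER values for C1 through C4 below are assumed for illustration only.

```python
def p1_hat_four(t12, t13, t14, t23, t24, t34):
    """Four-copy estimator of p1: the three pairs containing C1 enter
    with weight 1/3, the three remaining pairs with weight -1/6."""
    return (t12 + t13 + t14) / 3.0 - (t23 + t24 + t34) / 6.0

# Assumed true SBER values for C1..C4 (illustrative only).
p = [0.02, 0.01, 0.015, 0.005]
t = {(i, j): p[i] + p[j] for i in range(4) for j in range(i + 1, 4)}

est = p1_hat_four(t[(0, 1)], t[(0, 2)], t[(0, 3)],
                  t[(1, 2)], t[(1, 3)], t[(2, 3)])
```

Since (t_12 + t_13 + t_14)/3 = p_1 + (p_2 + p_3 + p_4)/3 and (t_23 + t_24 + t_34)/6 = (p_2 + p_3 + p_4)/3, the difference is exactly p_1.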
To show this, four circuits, say C1, C2, C3 and C4, are chosen. First, all four circuits are used together to estimate the SBER of C1, with M test patterns applied to each pair of circuits. The estimation is repeated 500 times, each time producing one sample of the estimated SBER value, and the sample variance of these estimates is calculated. Second, only C1, C2 and C3 are used to estimate the SBER of C1, again with M test patterns per pair of circuits and 500 repetitions, and the sample variance of these estimates is calculated as well. The two sample variances are then compared.

We used C432 for this experiment. C1, C2, C3 and C4 are four different faulty copies of C432, each corresponding to a single stuck-at fault. Because of the requirement of independence, we intentionally selected four faults that satisfy it. M is set to 100000, and the error-significance threshold is 25. The actual SBER values of C1, C2 and C3 are 0.007629, 0.014738 and 0.014157, respectively. For the estimated SBER of C1 to have a larger variance with four copies than with three, the actual SBER value of C4 would have to be greater than 0.065419; for C2, greater than 0.05831; and for C3, greater than 0.058891. From each method we obtained 500 estimated SBER values of C1, C2 and C3, whose variances were computed and compared. Because the actual SBER of C4 is smaller than 0.065419, we expect the estimated SBER value of C1 using four copies to be more accurate than that using three copies. The comparison is shown in Table 4.3. The results show that the variance of the estimated SBER when four circuits are used together is smaller than when three circuits are used. This is consistent with our expectation, because the actual SBER value of C4 is 0.01128.

Table 4.3: Comparison of SBER estimation for C1, C2 and C3 using two methods.
One method uses only C1, C2 and C3 together for the estimation; the other uses C1, C2, C3 and C4 together. C1, C2, C3 and C4 are all faulty copies of C432

  Circuit (true SBER)                   C1, C2 and C3 used         C1, C2, C3 and C4 used
                                        Mean       Variance        Mean       Variance
  C1: signal 232 s-a-1 (0.007629)       0.007344   1.8007E-7       0.007201   8.5617E-8
  C2: signal 175 s-a-1 (0.014738)       0.014385   1.8393E-7       0.014736   9.3487E-8
  C3: signal 335 s-a-1 (0.014157)       0.014285   1.6258E-7       0.014279   1.0044E-7
  C4: signal 218 s-a-0 (0.011280)       N/A        N/A             0.011475   8.9956E-8

4.4 Single Copy

4.4.1 Estimation Method

We now consider the situation where no duplicate copies exist, and hence there is no reference circuit. To reduce storage costs, we choose not to store the correct responses to the test patterns on-chip. Instead, we use a form of signature analysis that compresses a set of responses into one signature, and employ a BIST test architecture similar to that used for signature-based error-rate estimation.

The signature analyzer employed here for a SBER test is different from the signature analyzer used for error-rate estimation in Chapter 2 and from that used in classical fault detection, namely a MISR. For error-rate estimation, we only care whether or not there is an error; for a SBER test, we also care about the significance of an error. Again, we only consider the case where the error-significance is given by the quantity |O* − O|. For this case, we implement the signature analyzer using an accumulator that simply adds together all responses at the output of the circuit under test. The test procedure is as follows: apply S test sessions, where each test session consists of applying L test patterns. After each test session, the “signature,” i.e., the state of the accumulator, is compared with the correct signature by computing their difference.
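A minimal sketch of this accumulator compaction and signature comparison follows; the threshold-based pass/fail rule is the one used throughout this chapter, and the session responses below are made-up integers for illustration.

```python
def session_signature(responses):
    """Accumulator compaction: the session signature is simply the
    sum of all output responses in the session."""
    return sum(responses)

def session_fails(observed_responses, correct_signature, t_es):
    """A session is declared failing when the absolute difference between
    the observed and stored correct signatures exceeds the threshold."""
    return abs(session_signature(observed_responses) - correct_signature) > t_es

correct = [5, 7, 3, 9]            # correct responses for one session (L = 4)
sig = session_signature(correct)  # only this single value is stored on-chip
```

Note that additive compaction can alias: an observed session [13, 7, 3, 1], where errors of +8 and −8 cancel, yields the correct signature even though individual responses are wrong. This is why failing sessions only bound, rather than directly count, significant errors.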
If the absolute value of the difference is greater than the error-significance threshold, the test session is said to have failed; otherwise it passes. The total number of failing sessions, denoted by F, is recorded.

Let e denote the error-rate of a faulty CUT, and p its SBER value. We assume e is known prior to SBER estimation. If a test session has no error, it is sure to pass. Given that a test session has exactly one error, the probability that it fails the test is p/e. It is easy to see that the probability of having exactly one error in a session is C_{L,1}·e·(1−e)^{L−1}, where C_{m,n} = m!/(n!(m−n)!). So the probability that a test session fails with one error, denoted Prob(a test session fails and has one error), is C_{L,1}·e·(1−e)^{L−1}·(p/e).

Unfortunately, we do not know the probability that a test session fails given two errors, three errors, and so on. Let q_n denote the probability that a test session fails given that there are n errors in the session; thus q_1 = p/e. On the other hand, we do know the probability that a test session has n errors (n ≤ L), which is C_{L,n}·e^n·(1−e)^{L−n}.

Table 4.4: Probability of a test session having X errors, with numerical examples for e = 0.01 and L = 100. For each value of X, the conditional probability that a test session fails is also listed

  X: Number of errors   Probability that a session   Numerical value for   Probability the session
     in a session       has X errors                 e = 0.01, L = 100     fails given X errors
  0                     (1−e)^L                      0.366                 0
  1                     C_{L,1}·e·(1−e)^{L−1}        0.3697                p/e (= q_1)
  2                     C_{L,2}·e²·(1−e)^{L−2}       0.1849                q_2
  3                     C_{L,3}·e³·(1−e)^{L−3}       0.061                 q_3
  4                     C_{L,4}·e⁴·(1−e)^{L−4}       0.0149                Ignored
  5                     C_{L,5}·e⁵·(1−e)^{L−5}       0.0029                Ignored
  >= 6                  Ignored

Now consider a case where e is small, say 0.01, and set L = 1/e, i.e., 100 in this case. Table 4.4 shows numerical values of the probabilities that a test session has a given number of errors.
From these data, we see that the probability that a test session has more than 3 errors is very small, so it is reasonable to ignore these cases when the error-rate is small. Thus we obtain an expression for the probability that a test session fails:

  Prob(a test session fails) = Σ_{n=1}^{L} Prob(a test session fails and has n errors)
    ≈ Σ_{n=1}^{3} Prob(a test session fails and has n errors)
    = (p/e)·C_{L,1}·e·(1−e)^{L−1} + q_2·C_{L,2}·e²·(1−e)^{L−2} + q_3·C_{L,3}·e³·(1−e)^{L−3}.

In our test scheme, there are S test sessions, F of which fail, so we use F/S to estimate the probability that a test session fails. Thus, we have

  F/S = (p/e)·C_{L,1}·e·(1−e)^{L−1} + q_2·C_{L,2}·e²·(1−e)^{L−2} + q_3·C_{L,3}·e³·(1−e)^{L−3}.   (4.1)

However, we cannot solve for p from (4.1) because q_2 and q_3 are unknown. We may repeat the test three times, obtaining F_1, F_2 and F_3. If these three tests use the same value of L, they are equivalent to one test having 3S test sessions, which does not help solve the problem. Instead, we need different values of L for the three tests, denoted L_1, L_2 and L_3 for the first, second and third test, respectively. Then we have

  F_1/S ≈ a_1·(p/e) + b_1·q_2 + c_1·q_3
  F_2/S ≈ a_2·(p/e) + b_2·q_2 + c_2·q_3     (4.2)
  F_3/S ≈ a_3·(p/e) + b_3·q_2 + c_3·q_3,

where a_i = L_i·e·(1−e)^{L_i−1}, b_i = C_{L_i,2}·e²·(1−e)^{L_i−2} and c_i = C_{L_i,3}·e³·(1−e)^{L_i−3} for i = 1, 2, 3.
By solving (4.2) for p, we obtain the estimator of SBER as

  p̂ = e · |F_1/S b_1 c_1; F_2/S b_2 c_2; F_3/S b_3 c_3| / |a_1 b_1 c_1; a_2 b_2 c_2; a_3 b_3 c_3|,   (4.3)

where |·| denotes a 3×3 determinant with rows separated by semicolons.

If we ignore the cases where there are more than three errors in one test session, F_1, F_2 and F_3 are linear functions of p, which implies that the expectations of F_1, F_2 and F_3 are also linear functions of p. Because the estimator p̂ is a linear function of F_1, F_2 and F_3, the expectation of p̂ is a linear function of the expectations of F_1, F_2 and F_3. By replacing the expectations of F_1, F_2 and F_3 with their expressions in terms of p, we find the expectation of p̂ to be p, i.e.,

  Exp{p̂} = p.   (4.4)

For the variance of p̂, we can use (4.3) to derive a function of the variances of F_1, F_2 and F_3, which in turn are functions of p, q_2 and q_3. Finally, we have

  Var{p̂} = (e² / (S·W²)) · [t_1(1−t_1)(b_2c_3 − b_3c_2)² + t_2(1−t_2)(b_3c_1 − b_1c_3)² + t_3(1−t_3)(b_1c_2 − b_2c_1)²],   (4.5)

where t_i = a_i·(p/e) + b_i·q_2 + c_i·q_3 for i = 1, 2, 3, and W = |a_1 b_1 c_1; a_2 b_2 c_2; a_3 b_3 c_3|.

Similar to other estimators in previous chapters, we use N(p, Var{p̂}) to approximate the distribution of the estimated SBER value p̂. With knowledge of the statistical properties of the SBER estimator, consider the following question: given error-rate e and SBER value p, what values should be chosen for L_1, L_2, L_3 and S so that the estimated SBER value falls within [p(1−ε), p(1+ε)] with probability γ (also referred to as the confidence), where ε is a small value less than 1 and 0 < γ < 1? We have addressed similar questions in Chapters 2 and 3.
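The estimation pipeline above, from the binomial session-error coefficients to the Cramer's-rule solve of (4.2), can be sketched end to end. The numeric inputs below (e, p, q_2, q_3 and the three session lengths) are assumed values used only to fabricate exact failing fractions and verify that the solve recovers p.

```python
from math import comb

def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
          - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
          + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def coeffs(e, L):
    """Coefficients a, b, c of one test: the probabilities of exactly
    1, 2 and 3 errors in a session of L patterns with error-rate e."""
    a = L * e * (1 - e) ** (L - 1)
    b = comb(L, 2) * e ** 2 * (1 - e) ** (L - 2)
    c = comb(L, 3) * e ** 3 * (1 - e) ** (L - 3)
    return a, b, c

def sber_single_copy(e, Ls, fail_fracs):
    """Solve the linear system (4.2) for p by Cramer's rule, given the
    observed failing-session fractions F_i/S of the three tests."""
    abc = [coeffs(e, L) for L in Ls]
    W = det3(abc)
    num = det3([[fail_fracs[i], abc[i][1], abc[i][2]] for i in range(3)])
    return e * num / W

# Noise-free check: fabricate exact failing fractions from assumed
# p = 0.004, q2 = 0.7, q3 = 0.9, then recover p.
e, p, q2, q3 = 0.01, 0.004, 0.7, 0.9
Ls = (100, 60, 30)
fracs = [a * p / e + b * q2 + c * q3 for (a, b, c) in (coeffs(e, L) for L in Ls)]
p_hat = sber_single_copy(e, Ls, fracs)
```

With exact fractions the system inverts perfectly; with measured F_i/S, the estimate scatters around p with the variance given by (4.5).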
Because the optimization is over three variables, an analytical solution is difficult and not practical. Instead, we can use a numerical method. First, we require L_1 to be 1/e, and L_2 and L_3 to be less than L_1. Thus, the assumption that allows us to ignore the case of having four or more errors per session holds true. Next, we choose different values for L_2 and L_3, then use the method discussed below to determine S, until we find the minimal value of S. To make the estimated value lie in [p(1-ε), p(1+ε)] with confidence γ, we require

Var{p_hat} ≤ [ε p / Q^{-1}((1-γ)/2)]^2.

In the expression for Var{p_hat}, there are two unknown variables q_1 and q_2, both of which are in [0, 1]. To solve this issue, we select values for q_1 and q_2 so that Var{p_hat} is equal to its least upper bound in terms of q_1 and q_2. This is achieved when q_1 = q_2 = 1. Thus, we have

Var{p_hat}|_{q_1 = q_2 = 1} ≤ [ε p / Q^{-1}((1-γ)/2)]^2. (4.6)

From (4.6), we are able to determine the value for S. Assume e = 0.01, γ = 0.9545 and ε = 0.05. We set L_1 = 100. For different values of p, the selection of L_2 and L_3 is determined by numerical analysis with respect to minimizing S. Table 4.5 shows the values of S obtained from (4.6) for different values of p. For each test, we need S test sessions, and for each test session one correct signature needs to be stored. Because there are three tests, we need to store 3S correct signatures. However, we can do the test with fewer correct signatures using a technique called resampling. The concept of resampling is to use the same random data repeatedly. For example, in our case, if L_2 = 2 L_3, each session of length L_2 can be made up of two sessions of length L_3. Because the correct signature is the accumulation of responses, each correct signature of the second test is the sum of two correct signatures of the third test.
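The session count implied by (4.6) can be computed directly from (4.5) with q_1 = q_2 = 1. A sketch, assuming Q^{-1}((1-0.9545)/2) ≈ 2 (the two-sided 2-sigma point) and reusing the table's choice L_1/L_2/L_3 = 100/70/10 rather than re-running the full grid search over L_2 and L_3:

```python
from math import comb, ceil

def coeffs(e, L):
    # Probabilities of exactly 1, 2, 3 errors in a session of length L
    return (L * e * (1 - e) ** (L - 1),
            comb(L, 2) * e**2 * (1 - e) ** (L - 2),
            comb(L, 3) * e**3 * (1 - e) ** (L - 3))

def det3(m):
    return (m[0][0]*(m[1][1]*m[2][2] - m[1][2]*m[2][1])
          - m[0][1]*(m[1][0]*m[2][2] - m[1][2]*m[2][0])
          + m[0][2]*(m[1][0]*m[2][1] - m[1][1]*m[2][0]))

def sessions_needed(e, p, Ls, eps=0.05, q_inv=2.0):
    # Smallest S with Var{p_hat} at q1 = q2 = 1 no greater than (eps*p/q_inv)^2,
    # per (4.5) and (4.6)
    (a1, b1, c1), (a2, b2, c2), (a3, b3, c3) = [coeffs(e, L) for L in Ls]
    W = det3([[a1, b1, c1], [a2, b2, c2], [a3, b3, c3]])
    ts = [a * p / e + b + c for a, b, c in ((a1, b1, c1), (a2, b2, c2), (a3, b3, c3))]
    cof = [(b2*c3 - b3*c2) ** 2, (b3*c1 - b1*c3) ** 2, (b1*c2 - b2*c1) ** 2]
    var_times_S = e**2 * sum(t * (1 - t) * d for t, d in zip(ts, cof)) / W**2
    return ceil(var_times_S / (eps * p / q_inv) ** 2)

S = sessions_needed(0.01, 0.005, (100, 70, 10))
```

For e = 0.01 and p = 0.005 this reproduces the order of the S value listed in Table 4.5 for the same session lengths.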
If resampling is used, the number of patterns per session in the first and the second tests must be a multiple of the number of patterns per session in the third test. Thus, the total number of correct signatures to be stored is equal to the number of test sessions. Table 4.5 also lists the number of correct signatures to be stored both without and with resampling. To evaluate our estimation method, we compared it with the Bernoulli process based estimation method in terms of the number of correct signatures (responses) to be stored (see Chapter 2.2). In the Bernoulli process based estimation method, we compare each observed response with the corresponding stored correct response, and determine whether the error-significance is greater than a threshold. The number of responses whose error-significance is greater than the threshold is recorded. The estimated SBER value is obtained by dividing this number by the total number of applied test patterns. From Table 4.5, we see that our method with resampling uses much less storage than the Bernoulli process based estimation method. With the estimator of SBER, we are able to use SBER to classify chips. That is, given an SBER threshold T_SBER, we can determine whether the SBER value of a defective circuit is greater than T_SBER. This is exactly the same question that has been discussed several times in previous chapters. Note that for chip classification, the computation of the number of test sessions (S) will use T_SBER instead of the true SBER value.

Table 4.5: The number of sessions (S) for different values of p, given e = 0.01, γ = 0.9545, ε = 0.05, L_1 = 100.
Comparison between our SBER estimation method and the Bernoulli process based estimation method in terms of the number of correct signatures (responses) to be stored.

p       S/L_2/L_3 according to (4.6)   Our method without resampling   Our method with resampling   Bernoulli process based method
0.001   4.66E5/70/10                   1.40E6                          4.66E5                       1.60E6
0.002   1.86E5/70/10                   5.58E5                          1.86E5                       7.98E5
0.003   1.13E5/70/10                   3.39E5                          1.13E5                       5.32E5
0.004   8.01E4/70/10                   2.40E5                          8.01E4                       3.98E5
0.005   6.16E4/70/10                   1.85E5                          6.16E4                       3.18E5
0.006   4.99E4/70/10                   1.50E5                          4.99E4                       2.65E5
0.007   4.16E4/70/10                   1.25E5                          4.16E4                       2.27E5
0.008   3.56E4/80/10                   1.07E5                          3.56E4                       1.98E5
0.009   3.09E4/80/10                   9.27E4                          3.09E4                       1.76E5

4.4.2 Experimental Results

In the experiment, we used benchmark circuit C432 with a stuck-at-0 fault at signal 39. The true error-rate of this defective circuit is 0.067596. The error-significance threshold is 50. Its true SBER value is 0.032132. We choose L_1 to be 15, L_2 to be 10 and L_3 to be 5, and set S = 24000. Resampling is used in the experiment. A session with L_1 vectors is composed of three sessions with L_3 vectors, and the signature of the session with L_1 vectors is the sum of the signatures of the three sessions with L_3 vectors. Similarly, a session with L_2 vectors is composed of two sessions with L_3 vectors, and its signature is the sum of the signatures of those two sessions. The SBER estimation is repeated 500 times, so we obtained 500 samples of the estimated SBER. The MATLAB tool "normplot" is used to evaluate whether these data have a normal distribution. Figure 4.3a shows the distribution of the data, and it appears to be normal. The output of "normplot" is shown in Figure 4.3b, and is fairly linear, confirming that the data have a normal distribution. From our data, we computed the sample mean to be 0.032994 and the sample variance to be 3.4775E-7. So the estimated SBER has a normal distribution of N(0.032994, 3.4775E-7).
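The shape of this experiment can be reproduced without the C432 netlist by simulating the abstract error model directly. In the sketch below, all parameters are illustrative (not the thesis's): a pattern is erroneous with probability e, each error adds a uniform signed deviation to the accumulator, and a session fails when the accumulated deviation exceeds the significance threshold in magnitude; the three-test estimator (4.3) then recovers p = e · Prob(|deviation| > T_ES).

```python
import random
from math import comb

def det3(m):
    return (m[0][0]*(m[1][1]*m[2][2] - m[1][2]*m[2][1])
          - m[0][1]*(m[1][0]*m[2][2] - m[1][2]*m[2][0])
          + m[0][2]*(m[1][0]*m[2][1] - m[1][1]*m[2][0]))

random.seed(7)
e, T_ES, S = 0.03, 50, 30000            # illustrative error-rate, threshold, sessions
Ls = (15, 10, 5)
# Each error's accumulator deviation is uniform on +/-1..100, so
# Prob(|deviation| > 50) = 0.5 and the true SBER is p = 0.5 * e = 0.015.

def session_fails(L):
    acc = 0
    for _ in range(L):
        if random.random() < e:
            acc += random.choice((-1, 1)) * random.randint(1, 100)
    return abs(acc) > T_ES

fracs = [sum(session_fails(L) for _ in range(S)) / S for L in Ls]
rows = [(L*e*(1-e)**(L-1), comb(L, 2)*e**2*(1-e)**(L-2), comb(L, 3)*e**3*(1-e)**(L-3))
        for L in Ls]
W = det3(rows)
p_hat = e * det3([[f, r[1], r[2]] for f, r in zip(fracs, rows)]) / W
```

With these (made-up) parameters the estimate lands within a few percent of the true p = 0.015, mirroring the agreement the thesis reports for C432.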
We described the statistical characteristics of the SBER estimator in the previous section, according to which the estimated data should have a normal distribution of N(0.032132, 4.2519E-6). The sample mean from the experiment is close to the expected SBER value, and the sample variance is better than expected. This shows that our mathematical analysis matches our experiment. We did another experiment with a copy of C432 that has a stuck-at-0 fault at signal 120. Its true error-rate is 0.005891, and its true SBER is 0.004333. We set L_1 = 168, L_2 = 112, L_3 = 56 and S = 24000. Resampling is not used this time. From the above discussion, we expect the estimated SBER to have a normal distribution of N(0.004333, 6.5592E-8). Again, the estimation process is repeated 500 times, and 500 estimated SBER values are obtained. The mean of these data is 0.004362, and the variance is 6.0280E-8, which implies a normal distribution of N(0.004362, 6.0280E-8). These results again illustrate the match between the experiments and the analysis.

4.4.3 Special Case

In the previous sections, we used |O* - O| to define error-significance, where O* and O are the observed response and the correct response, respectively. In this section, a special case is considered. The special case assumes O* is always equal to or greater than O. This means that |O* - O| equals O* - O and is non-negative. This special case is not common in practical problems. However, for this special case, the SBER estimation becomes simpler than in the general case. Because the special case shares the same fundamental concept of SBER estimation with the general case, we include it in this chapter to complete the discussion of our SBER estimation. The special case uses the same test scheme as the general case. That is, a test includes S test sessions, each session has L test patterns, and an accumulator is used as the signature analyzer. We will show that for this special case only two tests are needed, not three.
Again, let T_ES denote the error-significance threshold. We define two types of output errors. If the error-significance of an erroneous response is greater than T_ES, it is a Type A error; otherwise, it is a Type B error. Let e denote the error-rate, and p the SBER value. A test session passes if there are no errors. If there is at least one Type A error in a test session, it fails regardless of the number of Type B errors. If there is no Type A error and only one Type B error in a test session, it passes. If there is no Type A error and there are two Type B errors in a test session, it may pass or fail. For this case, we use q to represent the probability that it fails. If there is no Type A error and there are more than two Type B errors in a test session, it may also pass or fail. However, because the probability of having three or more errors in a test session is small for the values of L being considered in this work, we can either ignore such events or assume this outcome always leads to a failing test session. For the purpose of making the computation simpler, we choose the second alternative. In Table 4.6, all these outcomes are listed with their probabilities.
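These outcome probabilities are easy to check numerically. The sketch below recomputes the numerical-example column of Table 4.6 for e = 0.01, L = 50, p = 0.003:

```python
from math import comb

def outcome_probs(e, p, L):
    # Outcome probabilities from Table 4.6 (special case, Type A / Type B errors)
    no_error = (1 - e) ** L
    some_type_a = 1 - (1 - p) ** L
    one_type_b = L * (e - p) * (1 - e) ** (L - 1)
    two_type_b = comb(L, 2) * (e - p) ** 2 * (1 - e) ** (L - 2)
    # Remaining probability mass: no Type A error and more than two Type B errors
    many_type_b = (1 - p) ** L - no_error - one_type_b - two_type_b
    return no_error, some_type_a, one_type_b, two_type_b, many_type_b

probs = outcome_probs(0.01, 0.003, 50)
```

The five probabilities sum to one by construction, and they match the table's numerical example to the precision shown there.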
Table 4.6: The different outcomes of the outputs for the special case

Outcome of a test session                          Probability of the outcome                                           Numerical example (e=0.01, L=50, p=0.003)   Given the outcome, the probability that a test session fails
no error                                           (1-e)^L                                                              0.605                                       0
at least one Type A error                          1 - (1-p)^L                                                          0.1395                                      1
no Type A error and only one Type B error          L(e-p)(1-e)^{L-1}                                                    0.2139                                      0
no Type A error and exactly two Type B errors      C_{L,2}(e-p)^2(1-e)^{L-2}                                            0.03705                                     q
no Type A error and more than two Type B errors    (1-p)^L - (1-e)^L - L(e-p)(1-e)^{L-1} - C_{L,2}(e-p)^2(1-e)^{L-2}    0.00457                                     1 (assumed for the purpose of computation)

Figure 4.3: (a) Distribution of 500 estimated SBER values of C432 with fault stuck-at-0 at signal 39. (b) The output from MATLAB tool "normplot" to test whether the data are normally distributed.

From Table 4.6, we have the probability that a test session fails as

Prob(a test session fails) = 1 - (1-e)^L - L(e-p)(1-e)^{L-1} - (1-q) C_{L,2}(e-p)^2(1-e)^{L-2}.

In our test scheme, there are S test sessions, F of which fail. We use F/S to estimate the probability that a test session fails. Then we have

F/S = 1 - (1-e)^L - L(e-p)(1-e)^{L-1} - (1-q) C_{L,2}(e-p)^2(1-e)^{L-2}. (4.7)

However, there are two unknown quantities in (4.7). So we repeat the test twice and obtain F_1 and F_2, where the value of L for the first test is L_1, for the second L_2, and L_1 ≠ L_2. Thus, we have
F_1/S ≈ 1 - (1-e)^{L_1} - a_1(1 - p/e) - (1-q) b_1 (1 - p/e)^2
F_2/S ≈ 1 - (1-e)^{L_2} - a_2(1 - p/e) - (1-q) b_2 (1 - p/e)^2, (4.8)

where a_1 = L_1 e (1-e)^{L_1-1}, b_1 = C_{L_1,2} e^2 (1-e)^{L_1-2}, a_2 = L_2 e (1-e)^{L_2-1} and b_2 = C_{L_2,2} e^2 (1-e)^{L_2-2}. By solving for p from (4.8), we obtain the estimator of SBER as

p_hat = e - e · det[ 1 - F_1/S - (1-e)^{L_1}, b_1 ; 1 - F_2/S - (1-e)^{L_2}, b_2 ] / det[ a_1, b_1 ; a_2, b_2 ]. (4.9)

Similar to Section 4.4.1, if we ignore the approximation due to our assumption that three or more errors lead to failure, the expectation of the estimator will be p. We further derive the variance of the estimator from (4.9) as

Var{p_hat} = e^2 [ b_2^2 t_1(1-t_1) + b_1^2 t_2(1-t_2) ] / [ S (a_1 b_2 - a_2 b_1)^2 ], (4.10)

where t_1 = (1-e)^{L_1} + a_1(1 - p/e) + (1-q) b_1 (1 - p/e)^2 and t_2 = (1-e)^{L_2} + a_2(1 - p/e) + (1-q) b_2 (1 - p/e)^2, i.e., the probabilities that a test session passes in the first and the second tests, respectively.

Thus, we have the distribution of the estimated SBER p_hat as N(p, Var{p_hat}). Now consider the question of having the estimated SBER value be in the range [p(1-ε), p(1+ε)] with probability γ, where ε is a small value less than 1 and 0<γ<1. To satisfy the requirement of estimation, we have

Var{p_hat} ≤ [ε p / Q^{-1}((1-γ)/2)]^2.

In the expression for Var{p_hat}, there is an unknown variable q. We choose q = 0 to make Var{p_hat} equal to its least upper bound in terms of q. Thus, we have

Var{p_hat}|_{q=0} ≤ [ε p / Q^{-1}((1-γ)/2)]^2.

We computed appropriate values for L_1 and L_2 via numerical analysis so that S is minimized under the constraint that both L_1 and L_2 are not greater than 1/e. In practice, L_1 can be set to 1/e, and L_2 is then chosen via numerical analysis so that S is minimized. Assume e = 0.01, γ = 0.9545 and ε = 0.05. Thus, we set L_1 = 1/e = 100. Table 4.7 shows the minimal value of S for different values of p and the appropriate value of L_2.
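As in the general case, (4.9) is a Cramer's-rule solve, here 2x2. A quick synthetic sketch (illustrative parameter values) builds exact failure fractions from chosen p and q via (4.8) and recovers p:

```python
from math import comb

def ab(e, L):
    # a and b for a session of length L (exactly one / exactly two errors)
    return L * e * (1 - e) ** (L - 1), comb(L, 2) * e**2 * (1 - e) ** (L - 2)

def fail_frac(e, p, q, L):
    # Right-hand side of (4.7)/(4.8)
    a, b = ab(e, L)
    return 1 - (1 - e) ** L - a * (1 - p / e) - (1 - q) * b * (1 - p / e) ** 2

def estimate_sber_special(e, L1, L2, f1, f2):
    # Estimator (4.9): p_hat = e - e * det[[g1, b1], [g2, b2]] / det[[a1, b1], [a2, b2]]
    a1, b1 = ab(e, L1)
    a2, b2 = ab(e, L2)
    g1 = 1 - f1 - (1 - e) ** L1
    g2 = 1 - f2 - (1 - e) ** L2
    return e - e * (g1 * b2 - g2 * b1) / (a1 * b2 - a2 * b1)

e, p, q, L1, L2 = 0.01, 0.004, 0.3, 100, 20      # illustrative values
p_hat = estimate_sber_special(e, L1, L2,
                              fail_frac(e, p, q, L1), fail_frac(e, p, q, L2))
```

The determinant ratio cancels the unknown q exactly, which is why two tests with different session lengths suffice in the special case.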
Without resampling, the number of stored signatures is 2S, of which S signatures are for sessions of length L_1 and S signatures are for sessions of length L_2. With resampling, we only need S signatures. In Table 4.7, we see that for some values of p, L_1 is a multiple of L_2. In this case, S signatures for L_2 are stored, and the signatures for L_1 are derived from the signatures for L_2. However, for some values of p, L_1 is not a multiple of L_2 (for example, L_1 = 100 and L_2 = 30 when p = 0.008). In this case, we store S signatures for sessions of 10 patterns (length 10), and the signatures for L_1 and L_2 are all derived from the signatures for sessions of length 10. We also compare our estimation method with the Bernoulli process based estimation method, and show the results in Table 4.7. Clearly, our estimation method uses about 1/10 the storage needed for the Bernoulli process based method. Comparing Table 4.7 and Table 4.5, we also observe that the special case uses less storage than the general case.

4.5 Conclusions

In this chapter, we considered the estimation problem of SBER, a measure to justify the acceptability of defective chips. We identified three scenarios for testing a target to estimate its SBER. The first one assumes that multiple copies of the target circuit exist and at least one is defect-free and identifiable. In the second scenario, there are again multiple copies of the target circuit, all of which might be defective. In the third situation, there is only a single copy of the target circuit. For each scenario, we presented a test method based on a BIST architecture. Also, for each estimator, we analyzed its statistical characteristics, such as mean and variance. Using these results, we are able to determine the test parameters for our BIST scheme.

Table 4.7: The number of sessions (S) for different values of p, given e = 0.01, γ = 0.9545, ε = 0.05.
Comparison between our SBER estimation method and the Bernoulli process based estimation method in terms of the number of correct signatures (responses) to be stored.

p       S/L_1/L_2       Our method without resampling   Our method with resampling   Bernoulli process based method
0.001   1.89E5/100/20   3.78E5                          1.89E5                       1.60E6
0.002   8.85E4/100/20   1.77E5                          8.85E4                       7.98E5
0.003   5.66E4/100/20   1.13E5                          5.66E4                       5.32E5
0.004   4.09E4/100/20   8.18E4                          4.09E4                       3.98E5
0.005   3.16E4/100/20   6.32E4                          3.16E4                       3.18E5
0.006   2.55E4/100/20   5.10E4                          2.55E4                       2.65E5
0.007   2.10E4/100/30   4.20E4                          2.10E4                       2.27E5
0.008   1.75E4/100/30   3.50E4                          1.75E4                       1.98E5
0.009   1.48E4/100/30   2.96E4                          1.48E4                       1.76E5

In the first two scenarios, the test methods are applicable using any reasonable definition of error-significance. In the third scenario, we considered the case where error-significance is defined as |O* - O|, and used a BIST test scheme that is similar to the one for error-rate estimation, where a test is divided into S test sessions and each session has L test patterns. However, the signature for SBER estimation is calculated using an accumulator, and the estimation includes three tests, each of which uses a different value for L. We also introduced the concept of resampling. With resampling, the storage for saving correct responses is significantly reduced. To evaluate our method, we chose the amount of storage as a parameter to minimize, and compared our estimation method with the Bernoulli process based estimation method. The comparison shows that our method needs about one fifth of the storage used in the Bernoulli process based estimation method. For the third scenario, we also considered the special case where O* is always equal to or greater than O. A different estimator is developed, for which only two tests with different values of L are needed, and fewer test sessions (hence less storage) are required when compared with the general case.
Chapter 5 Summary of Contributions

In this thesis, we presented methods of error-rate and SBER estimation based on a BIST architecture in support of error-tolerance. Error-tolerance is a concept to increase effective yield by using defective chips that result in acceptable performance in some systems. Several metrics exist to quantify the acceptability of defective chips, two of which are error-rate and SBER. Given uniformly distributed random patterns in the input space, error-rate is the probability of the occurrence of erroneous responses, and SBER is the probability of occurrence of erroneous responses whose error-significance is greater than a given threshold. Prior to our work, Bernoulli process based estimation methods had to be used. With certain requirements on accuracy and confidence, the number of test patterns that are necessary for Bernoulli process based estimation methods increases as the error-rate decreases (see Table 2.3). This implies a large amount of storage of correct responses when the error-rate is small. We developed an error-rate estimation method using signature analysis. A test is divided into multiple sessions (S), each of which includes multiple test patterns (L). Using our estimation method, the number of correct signatures is determined by S, and S is primarily affected by the accuracy and confidence of the estimation. For example, if we want the estimated error-rate to be in the range [r(1-0.05), r(1+0.05)] with probability 0.9545, the Bernoulli process based estimation method needs to store 1.6 million correct responses if r = 0.00001, while our method only needs to store less than 3000 correct signatures. This gives our estimation method an advantage when storage is of concern. Generally, the storage of our estimation method is about r times the storage of the Bernoulli process based method. As r becomes smaller, our method saves more storage compared to the Bernoulli process based method.
In the procedure of error-rate estimation using signature analysis, we determine whether a test session fails or passes by checking whether the observed signature is the same as the correct signature. Ignoring aliasing, a test session fails when at least one response is in error. However, our test method does not keep track of how many errors occur in a test session. It seems that we might be ignoring some useful information. Different from signature analysis, ones counting can tell us the difference in the number of ones between a binary sequence from the CUT and a binary sequence from the golden circuit. So ones counting may provide us more information to reduce the resources of error-rate estimation. This leads to the second part of our work. In the second part of our work, we analyzed an error-rate estimation method using ones counting for single output circuits. It turns out that the resources used for error-rate estimation using ones counting are quite comparable to those using signature analysis. The hardware complexities of both methods are about the same. Using the ones counting method, we obtain the difference of the ones counts of two binary sequences, but still do not know how many errors there are in the observed binary sequence. Sometimes, the difference of ones counts is misleading. For example, consider a single output circuit built with XOR gates. Assume there is one stuck-at fault. After applying N test patterns (N is large), the difference of the ones counts in the observed output sequence and the correct output sequence is close to zero. However, about N/2 errors exist in the observed output sequence. So ones counting provides more information than signature analysis for some cases, and less information for other cases. This explains why the ones counting method does not use significantly less resources than the signature analysis method, which is contrary to what was initially believed. We further extended the ones counting method to multiple output circuits.
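The single-output XOR example above is easy to simulate. In this sketch (a made-up 8-input parity circuit with its first input stuck at 0; all sizes are illustrative), roughly half the patterns produce erroneous outputs, yet the ones-count difference stays near zero because the errors flip 0-to-1 and 1-to-0 about equally often:

```python
import random

random.seed(1)
N, K = 10000, 8                                   # patterns, inputs (illustrative)

def parity(bits):
    # Single-output circuit built entirely with XOR gates
    out = 0
    for b in bits:
        out ^= b
    return out

errors = ones_good = ones_bad = 0
for _ in range(N):
    pat = [random.randint(0, 1) for _ in range(K)]
    good = parity(pat)
    bad = parity([0] + pat[1:])                   # hypothetical input-0 stuck-at-0
    ones_good += good
    ones_bad += bad
    errors += good != bad

ones_diff = ones_bad - ones_good
```

An error occurs exactly when the stuck input would have been 1, so about N/2 outputs are wrong while the ones counts of the two sequences nearly coincide.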
Even though we can obtain the error-rate of each output line, it is impractical to estimate the error-rate of the multiple output circuit using these values, because the error-rate is a function of both the logic structure of the circuit and where the faults are located. To solve this problem, we classify the output patterns of a faulty circuit into four types according to the difference between the number of ones in the observed output pattern and the number of ones in the correct pattern, namely 1) the difference is of type 4n, 2) the difference is of type 4n+1, 3) the difference is of type 4n+2, and 4) the difference is of type 4n+3. Types 2), 3) and 4) imply there are errors in the output patterns. Type 1) implies that there may be no errors, 2 errors, 4 errors and so on. We use a parity checker, a lower mod-4 circuit and an upper mod-4 circuit to convert a multiple output circuit to a single output circuit. Thus, we are able to estimate the probability of errors that belong to types 2), 3) or 4). However, finding the probability of errors of type 1) is still an open question. It is difficult to distinguish patterns having a multiple of two errors from those having no (zero) errors, since zero is also a multiple of 2. To estimate the error-rate of a multiple output circuit, we cannot simply ignore the probability of type 1) error patterns, because for some faulty circuits it is a significant contributor to the error-rate. This unsolved problem may be a research topic in the future. In the last part of our work, we proposed SBER as a metric for the acceptability of defective chips and developed techniques for estimating SBER. The techniques of SBER estimation are based on a BIST architecture, similar to that used for error-rate estimation. Because error-significance has various definitions, the estimation of SBER may be different for various definitions of error-significance. We identified three scenarios for SBER estimation.
The first one assumes that multiple copies of the target circuit exist and at least one is defect-free. In this scenario, the estimation method is similar to the Bernoulli process based estimation method. Because a defect-free copy acts as a golden circuit, there is no need to store correct responses. In the second scenario, there are again multiple copies of the target circuit, but it is not known which, if any, is a defect-free copy. Our method works when there are three or more copies of the target circuit and they are independent with respect to erroneous output patterns. The problem with our method for this scenario is that we do not know in advance whether they are independent or not. We provided a partial solution when there are five or more copies. For example, let A, B, C, D and E be five copies of the target circuit. We may use A, B and C to estimate the SBER value of A, and use A, D and E to estimate the SBER value of A again. If the two estimates are close, A is likely to be independent with respect to B, C, D and E. This partial solution for justifying independence needs extra test overhead and may not guarantee obtaining a correct result. If we cannot justify independence in advance, the SBER estimation method for the third scenario can be used, where there is only a single copy of the target circuit. We use an accumulator to implement the output compressor and the same test scheme as for error-rate estimation. When there is only one error in a test session, we can determine whether its error-significance is greater than the given threshold by whether or not the test session fails. However, if there are two or more errors in a test session, we cannot tell the relationship between error-significance and the failure of a test session. In our method, we treat as estimated parameters the probabilities that a test session fails when it has two or more errors. Thus, these parameters and the SBER are estimated together.
The total number of tests is equal to the number of estimated quantities. These tests have the same value for S but different values for L. The total number of stored correct signatures is S times the number of subtests. However, using resampling, we reduce the number of stored correct signatures to S. Our experiments show that with resampling, the amount of storage is about one fifth of that used in the Bernoulli process based estimation method.

Bibliography

[1] M. Abramovici, M. A. Breuer and A. D. Friedman, Digital System Testing and Testable Design, W.H. Freeman and Company, New York, NY, 1990.
[2] P. V. Argade, Digital Secretary, US Patent 5651055, July 1997, http://www.freepatentsonline.com/5651055.html.
[3] A. Bachtold, P. Harley, T. Nakanishi, and C. Dekker, "Logic Circuits with Carbon Nanotube Transistors", Science, Vol. 294, pp. 1317-1320, 2001.
[4] B. I. Bahar, "Trends and Future Directions in Nano Structure Based Computing and Fabrication", Proc. 24th Int'l. Conf. on Computer Design, pp. 522-527, Oct. 1-4, 2006.
[5] G. A. Barnard, "Sequential Tests in Industrial Statistics", Journal of the Royal Statistical Society, Series B, Vol. 8, No. 1, pp. 1-26, 1946.
[6] W. Bartky, "Multiple Sampling with Constant Probability", The Annals of Mathematical Statistics, Vol. 14, pp. 363-377, 1943.
[7] M. A. Breuer, S. K. Gupta and T. M. Mak, "Defect and Error Tolerance in the Presence of Massive Numbers of Defects", IEEE Design and Test of Computers, pp. 216-227, May-June 2004.
[8] M. A. Breuer and H. Zhu, "Error-Tolerance and Multi-Media", Proc. Int'l. Conf. on Intelligent Info. Hiding and Multimedia, pp. 521-524, 2006.
[9] M. A. Breuer, "Intelligible Testing", Proc. 4th Multimedia Technology and Applications Symp., pp. 11-19, April 1999.
[10] M. A. Breuer, "Intelligible Test Techniques to Support Error-Tolerance", Proc. 13th Asian Test Symp., pp. 386-393, 2004.
[11] M. A. Breuer, "Estimating Error Rate in Error Tolerant VLSI Chips", IEEE Int'l.
Workshop on Electronic Design, Test and Applications (DELTA 2004), pp. 321-326, January 2004.
[12] M. Butts, A. Dehon and S. C. Goldstein, "Molecular Electronics: Devices, Systems and Tools for Gigagate, Gigabit Chips", Proc. Int'l. Conf. on Computer Aided Design, pp. 433-440, Nov. 10-14, 2002.
[13] G. Casella and R. L. Berger, Statistical Inference, 2nd ed., Duxbury Press, 2001.
[14] Z. J. Chen, E. G. Dierschke, S. D. Clynes and A. Liu, Filtering of Defective Picture Elements in Digital Images, European Patent EP 1045578, 2000, http://v3.espacenet.com/textdoc?DB=EPODOC&IDX=EP1045578&F=0.
[15] Y. Chen, et al., "Nanoscale Molecular-switch Crossbar Circuits", Nanotechnology, Vol. 14, Issue 4, pp. 462-468, 2003.
[16] I. Chong, H. Y. Cheong, and A. Ortega, "New Quality Metric for Multimedia Compression Using Faulty Hardware", Proc. Int'l Workshop on Video Processing and Quality Metrics for Consumer Electronics (VPQM'06), Arizona, January 2006.
[17] I. S. Chong and A. Ortega, "Hardware Testing for Error Tolerant Multimedia Compression Based on Linear Transforms", Defect and Fault Tolerance in VLSI Systems Symp., pp. 523-531, October 2005.
[18] C. P. Collier, E. W. Wong, M. Belohradsky, F. M. Raymo, J. F. Stoddart, P. J. Kuekes, R. S. Williams and J. R. Heath, "Electronically Configurable Molecular-Based Logic Gates", Science, Vol. 285, pp. 391-394, 1999.
[19] Y. Cui and C. M. Lieber, "Functional Nanoscale Electronics Devices Assembled Using Silicon Nanowire Building Blocks", Science, Vol. 291, pp. 851-853, 2001.
[20] R. C. Dorf, editor-in-chief, The Electrical Engineering Handbook, "Fault Tolerance", chapter 87, by B. W. Johnson, CRC Press, 1993.
[21] A. J. Duncan, Quality Control and Industrial Statistics, 3rd ed., Richard D. Irwin, Homewood, IL, 1965.
[22] E. L. Grant and R. S. Leavenworth, Statistical Quality Control, McGraw Hill, 1988.
[23] Int'l Technology Roadmap for Semiconductors: Yield Enhancement, 2006 Update, http://www.itrs.net/Links/2006Update/2006UpdateFinal.htm.
[24] Z. Jiang and S. K. Gupta, "An ATPG for Threshold Testing: Obtaining Acceptable Yield in Future Processes", Proc. Int'l. Test Conf., pp. 824-833, 2002.
[25] B. Konemann, J. Mucha and G. Zwiehoff, "Built-in Test for Complex Digital Integrated Circuits", IEEE J. Solid State Circuits, Vol. SC-15, No. 3, pp. 315-318, June 1980.
[26] H. H. Kuok, Audio Recording Apparatus Using an Imperfect Memory Circuit, US Patent 5414758, May 1995, http://www.freepatentsonline.com/5414758.html.
[27] M. Mishra and S. C. Goldstein, "Defect Tolerance after the Roadmap", Proc. 10th Int'l. Test Synthesis Workshop (ITSW), 2003.
[28] Z. Pan and M. A. Breuer, "Estimating Error Rate in Defective Logic Using Signature Analysis", IEEE Trans. on Computers, Vol. 56, No. 5, pp. 650-661, May 2007.
[29] S. Shahidi and S. K. Gupta, "Estimating Error Rate During Self-Test via One's Counting", Proc. Int'l. Test Conf., paper 15.3, 2006.
[30] C. E. Shannon and W. Weaver, The Mathematical Theory of Communication, University of Illinois Press, 1963.

Appendices

Appendix A Expectation and Variance of the Estimator Using Ones Counting

For error-rate estimation based on ones counting compression, we propose using the estimator

r_hat = V_D/L + M_D^2/L^2,

where M_D and V_D are, respectively, the sample mean and sample variance of the sampled random variable D. In this appendix, we derive expressions for the expectation and variance of the estimator.

1) Preliminaries

In Section 3.3, we defined D = \sum_{i=1}^{L} X_i, where X_1, X_2, …, X_L are i.i.d. and have the same distribution as the random variable X. The PDF of X is Prob(X=1) = p_1, Prob(X=-1) = p_2, and Prob(X=0) = 1 - p_1 - p_2, where 0 ≤ p_1 ≤ 1, 0 ≤ p_2 ≤ 1 and 0 ≤ p_1 + p_2 ≤ 1.
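The central moments of X used in this appendix have simple closed forms in p_1 and p_2. A sketch, assuming illustrative values p_1 = 0.07 and p_2 = 0.03, checks each closed form against a direct sum over the three-point PMF:

```python
def direct_moment(p1, p2, k):
    # E{(X - alpha_0)^k} computed directly from the PMF of X
    a0 = p1 - p2
    pmf = {1: p1, -1: p2, 0: 1 - p1 - p2}
    return sum(prob * (x - a0) ** k for x, prob in pmf.items())

def closed_forms(p1, p2):
    # alpha_0 = E{X} and the 2nd/3rd/4th central moments in closed form
    a0 = p1 - p2
    s = p1 + p2
    a2 = s - a0 ** 2
    a3 = a0 - 3 * a0 * s + 2 * a0 ** 3
    a4 = s + (6 * s - 4) * a0 ** 2 - 3 * a0 ** 4
    return a0, a2, a3, a4

a0, a2, a3, a4 = closed_forms(0.07, 0.03)
```

The closed forms agree with the direct computation to machine precision, which is a quick sanity check on the algebra below.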
To simplify future mathematical expressions, we define α_0, α_2, α_3 and α_4 as follows:

α_0 = E{X} = p_1 - p_2,

α_2 = E{(X - α_0)^2} = (p_1 + p_2) - (p_1 - p_2)^2 = (p_1 + p_2) - α_0^2,

α_3 = E{(X - α_0)^3} = E{X^3} - 3α_0 E{X^2} + 3α_0^2 E{X} - α_0^3 = α_0 - 3α_0(p_1 + p_2) + 2α_0^3,

α_4 = E{(X - α_0)^4} = E{X^4} - 4α_0 E{X^3} + 6α_0^2 E{X^2} - 4α_0^3 E{X} + α_0^4 = (p_1 + p_2) + (6(p_1 + p_2) - 4)α_0^2 - 3α_0^4.

α_0 is the expectation of X, and α_2, α_3 and α_4 are the 2nd, 3rd and 4th central moments of X. Additional symbols are defined below:

μ_0 = E{D}, Y = D - μ_0 and Y_i = D_i - μ_0 (i = 1, 2, …, S),
μ_1 = E{Y} = E{D} - μ_0 = 0,
μ_2 = E{Y^2} = E{(D - μ_0)^2},
μ_3 = E{Y^3} = E{(D - μ_0)^3},
μ_4 = E{Y^4} = E{(D - μ_0)^4}.

μ_0 denotes the expectation of D. Y_1, Y_2, …, Y_S are i.i.d. and have the same distribution as Y. μ_1, μ_2, μ_3 and μ_4 are the 1st, 2nd, 3rd and 4th order moments of Y. Since D = \sum_{i=1}^{L} X_i, we are able to express μ_0, μ_2, μ_3 and μ_4 in terms of α_0, α_2, α_3 and α_4:

μ_0 = E{D} = L E{X} = L α_0;
μ_2 = E{(D - μ_0)^2} = Var{D} = L(p_1 + p_2 - α_0^2) = L α_2;
μ_3 = E{(D - μ_0)^3} = E{(\sum_{i=1}^{L} (X_i - α_0))^3} = L E{(X - α_0)^3} = L α_3;
μ_4 = E{(D - μ_0)^4} = E{(\sum_{i=1}^{L} (X_i - α_0))^4} = L E{(X - α_0)^4} + 3L(L-1)(E{(X - α_0)^2})^2 = L α_4 + 3L(L-1) α_2^2.

2) Derivation of the Expectation and Variance of the Estimator

The expectation of the sample variance of a random variable is equal to the variance of the random variable [13]. So we have

E{V_D} = Var{D}. (A1)

The expectation of M_D^2 is computed as follows:

E{M_D^2} = E{((D_1 + D_2 + … + D_S)/S)^2}
= (1/S^2) E{ \sum_{i=1}^{S} D_i^2 + \sum_{i=1}^{S} \sum_{j≠i} D_i D_j }
= (1/S^2) ( S E{D^2} + S(S-1)(E{D})^2 )
= ( E{D^2} - (E{D})^2 )/S + (E{D})^2
= Var{D}/S + (E{D})^2. (A2)

Thus, the expectation of the estimator is

E{V_D/L + M_D^2/L^2} = (1/L) E{V_D} + (1/L^2) E{M_D^2}
= (1/L) Var{D} + (1/L^2)( Var{D}/S + (E{D})^2 )
= μ_2/L + μ_2/(S L^2) + μ_0^2/L^2
= α_2 + α_0^2 + α_2/(S L)
= (p_1 + p_2) + [ (p_1 + p_2) - (p_1 - p_2)^2 ]/(S L). (A3)

Next we compute the variance of the estimator:

Var{V_D/L + M_D^2/L^2} = (1/L^2) Var{V_D} + (1/L^4) Var{M_D^2} + (2/L^3) Cov(V_D, M_D^2). (A4)

We next compute each component in (A4). Now

Var{M_D^2} = E{M_D^4} - ( E{M_D^2} )^2, (A5)

where E{M_D^2} = Var{D}/S + (E{D})^2 = μ_2/S + μ_0^2 and

E{M_D^4} = E{((D_1 + … + D_S)/S)^4} = E{( μ_0 + (Y_1 + … + Y_S)/S )^4}
= E{ (1/S^4)(\sum_{i=1}^{S} Y_i)^4 + (4μ_0/S^3)(\sum_{i=1}^{S} Y_i)^3 + (6μ_0^2/S^2)(\sum_{i=1}^{S} Y_i)^2 + (4μ_0^3/S)(\sum_{i=1}^{S} Y_i) + μ_0^4 }
= μ_4/S^3 + 3(S-1) μ_2^2/S^3 + 4 μ_0 μ_3/S^2 + 6 μ_0^2 μ_2/S + μ_0^4. (A6)

Thus,

Var{M_D^2} = μ_4/S^3 + 3(S-1) μ_2^2/S^3 + 4 μ_0 μ_3/S^2 + 6 μ_0^2 μ_2/S + μ_0^4 - ( μ_2/S + μ_0^2 )^2
= ( μ_4 - 3 μ_2^2 )/S^3 + 2 μ_2^2/S^2 + 4 μ_0 μ_3/S^2 + 4 μ_0^2 μ_2/S. (A7)

We next consider the term Var{V_D}. Let M_Y = (Y_1 + … + Y_S)/S = M_D - μ_0. Then

V_D = (1/(S-1)) \sum_{i=1}^{S} (D_i - M_D)^2 = (1/(S-1)) \sum_{i=1}^{S} ((D_i - μ_0) - (M_D - μ_0))^2 = (1/(S-1)) \sum_{i=1}^{S} (Y_i - M_Y)^2.

Recall that

Var{V_D} = E{V_D^2} - ( E{V_D} )^2 = E{V_D^2} - ( Var{D} )^2 = E{V_D^2} - μ_2^2. (A8)

Hence,

E{V_D^2} = (1/(S-1))^2 E{ ( \sum_{i=1}^{S} (Y_i - M_Y)^2 )^2 } = (1/(S-1))^2 E{ ( \sum_{i=1}^{S} Y_i^2 - S M_Y^2 )^2 }
= (1/(S-1))^2 [ E{ ( \sum_{i=1}^{S} Y_i^2 )^2 } - 2S E{ M_Y^2 \sum_{i=1}^{S} Y_i^2 } + S^2 E{ M_Y^4 } ].
In (A9),
\[
E\left\{\left(\sum_{i=1}^{S} Y_i^2\right)^2\right\}
= S\,E\{Y^4\} + S(S-1)\left(E\{Y^2\}\right)^2
= S\mu_4 + S(S-1)\mu_2^2 \tag{A10}
\]
and
\[
E\left\{M_Y^2 \sum_{i=1}^{S} Y_i^2\right\}
= \frac{1}{S^2} E\left\{\left(Y_1 + \cdots + Y_S\right)^2 \sum_{i=1}^{S} Y_i^2\right\}
= \frac{S\mu_4 + S(S-1)\mu_2^2}{S^2}
= \frac{\mu_4 + (S-1)\mu_2^2}{S} \tag{A11}
\]
and
\[
E\{M_Y^4\} = \frac{E\{(Y_1 + \cdots + Y_S)^4\}}{S^4}
= \frac{S\mu_4 + 3S(S-1)\mu_2^2}{S^4}
= \frac{\mu_4 + 3(S-1)\mu_2^2}{S^3} . \tag{A12}
\]
Therefore,
\[
E\{V_D^2\} = \frac{1}{(S-1)^2}\left[S\mu_4 + S(S-1)\mu_2^2
- 2\left(\mu_4 + (S-1)\mu_2^2\right) + \frac{\mu_4 + 3(S-1)\mu_2^2}{S}\right]
= \frac{\mu_4}{S} + \frac{(S^2 - 2S + 3)\,\mu_2^2}{S(S-1)} . \tag{A13}
\]
So finally for (A8) we have
\[
\mathrm{Var}\{V_D\} = E\{V_D^2\} - \mu_2^2
= \frac{\mu_4}{S} - \frac{(S-3)\,\mu_2^2}{S(S-1)} . \tag{A14}
\]
We now consider the term $\mathrm{Cov}(V_D, M_D^2) = E\{V_D M_D^2\} - E\{V_D\}E\{M_D^2\}$ in (A4). Writing $M_D^2 = (M_Y + \mu_0)^2 = M_Y^2 + 2\mu_0 M_Y + \mu_0^2$, we have
\[
E\{V_D M_D^2\} = E\{V_D M_Y^2\} + 2\mu_0 E\{V_D M_Y\} + \mu_0^2 E\{V_D\} ,
\]
where
\[
E\{V_D M_Y\} = \frac{1}{S-1}\left[E\left\{M_Y \sum_{i=1}^{S} Y_i^2\right\} - S\,E\{M_Y^3\}\right]
= \frac{1}{S-1}\left(\mu_3 - \frac{\mu_3}{S}\right) = \frac{\mu_3}{S}
\]
and
\[
E\{V_D M_Y^2\} = \frac{1}{S-1}\left[E\left\{M_Y^2 \sum_{i=1}^{S} Y_i^2\right\} - S\,E\{M_Y^4\}\right]
= \frac{1}{S-1}\left[\frac{\mu_4 + (S-1)\mu_2^2}{S} - \frac{\mu_4 + 3(S-1)\mu_2^2}{S^2}\right]
= \frac{\mu_4 + (S-3)\mu_2^2}{S^2} .
\]
Hence
\[
E\{V_D M_D^2\} = \frac{\mu_4 + (S-3)\mu_2^2}{S^2} + \frac{2\mu_3\mu_0}{S} + \mu_2\mu_0^2 \tag{A15}
\]
and
\[
E\{V_D\}E\{M_D^2\} = \mathrm{Var}\{D\}\left(\frac{\mu_2}{S} + \mu_0^2\right)
= \frac{\mu_2^2}{S} + \mu_2\mu_0^2 . \tag{A16}
\]
Therefore,
\[
\mathrm{Cov}\left(V_D, M_D^2\right) = \frac{2\mu_3\mu_0}{S} + \frac{\mu_4 - 3\mu_2^2}{S^2} . \tag{A17}
\]
Substituting $\mathrm{Var}\{M_D^2\}$, $\mathrm{Var}\{V_D\}$ and $\mathrm{Cov}(V_D, M_D^2)$ into (A4) using (A7), (A14) and (A17), the variance of the estimator is obtained as
\[
\mathrm{Var}\left\{\frac{V_D}{L} + \frac{M_D^2}{L^2}\right\}
= \frac{1}{L^2}\left[\frac{\mu_4}{S} - \frac{(S-3)\mu_2^2}{S(S-1)}\right]
+ \frac{1}{L^4}\left[\frac{\mu_4 - 3\mu_2^2}{S^3} + \frac{2\mu_2^2}{S^2}
+ \frac{4\mu_3\mu_0}{S^2} + \frac{4\mu_2\mu_0^2}{S}\right]
+ \frac{2}{L^3}\left[\frac{2\mu_3\mu_0}{S} + \frac{\mu_4 - 3\mu_2^2}{S^2}\right] . \tag{A18}
\]
Because $S$ is large compared to 1, we assume that $S-1 \approx S$. Now we have
\[
\mathrm{Var}\left\{\frac{V_D}{L} + \frac{M_D^2}{L^2}\right\}
\approx \frac{1}{L^2}\left[\frac{\mu_4 - \mu_2^2}{S} + \frac{3\mu_2^2}{S^2}\right]
+ \frac{1}{L^4}\left[\frac{\mu_4 - 3\mu_2^2}{S^3} + \frac{2\mu_2^2}{S^2}
+ \frac{4\mu_3\mu_0}{S^2} + \frac{4\mu_2\mu_0^2}{S}\right]
+ \frac{2}{L^3}\left[\frac{2\mu_3\mu_0}{S} + \frac{\mu_4 - 3\mu_2^2}{S^2}\right] .
\]
Substituting $\mu_0 = L\alpha_0$, $\mu_2 = L\alpha_2$, $\mu_3 = L\alpha_3$ and $\mu_4 = L\alpha_4 + 3L(L-1)\alpha_2^2$ yields
\[
\mathrm{Var}\left\{\frac{V_D}{L} + \frac{M_D^2}{L^2}\right\}
\approx \frac{2\alpha_2^2}{S} + \frac{3\alpha_2^2}{S^2}
+ \frac{\alpha_4 - 3\alpha_2^2 + 4\alpha_0^2\alpha_2 + 4\alpha_0\alpha_3}{LS}
+ \frac{2\alpha_4 - 4\alpha_2^2 + 4\alpha_0\alpha_3}{L^2 S^2}
+ \frac{\alpha_4 - 3\alpha_2^2}{L^3 S^3} . \tag{A19}
\]
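The algebra leading from (A18) (with $S-1 \approx S$) to (A19) can be checked numerically by substituting $\mu_0 = L\alpha_0$, $\mu_2 = L\alpha_2$, $\mu_3 = L\alpha_3$ and $\mu_4 = L\alpha_4 + 3L(L-1)\alpha_2^2$ for arbitrary values of the $\alpha$'s, $L$ and $S$. A sketch, with the test values chosen arbitrarily:

```python
# Check that substituting the mu's (in terms of the alpha's) into the
# S-1 ~ S approximation of (A18) reproduces the expression in (A19).
L, S = 150.0, 80.0
a0, a2, a3, a4 = 0.02, 0.05, -0.01, 0.07   # arbitrary test values

mu0, mu2, mu3 = L * a0, L * a2, L * a3
mu4 = L * a4 + 3 * L * (L - 1) * a2 ** 2

# (A18) with S-1 approximated by S
var_A18 = ((mu4 - mu2 ** 2) / S + 3 * mu2 ** 2 / S ** 2) / L ** 2 \
    + ((mu4 - 3 * mu2 ** 2) / S ** 3 + 2 * mu2 ** 2 / S ** 2
       + 4 * mu3 * mu0 / S ** 2 + 4 * mu2 * mu0 ** 2 / S) / L ** 4 \
    + 2 * (2 * mu3 * mu0 / S + (mu4 - 3 * mu2 ** 2) / S ** 2) / L ** 3

# (A19), written directly in the alpha's
var_A19 = 2 * a2 ** 2 / S + 3 * a2 ** 2 / S ** 2 \
    + (a4 - 3 * a2 ** 2 + 4 * a0 ** 2 * a2 + 4 * a0 * a3) / (L * S) \
    + (2 * a4 - 4 * a2 ** 2 + 4 * a0 * a3) / (L ** 2 * S ** 2) \
    + (a4 - 3 * a2 ** 2) / (L ** 3 * S ** 3)

assert abs(var_A18 - var_A19) < 1e-12
print("(A19) verified:", var_A19)
```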
Appendix B

Q Function: Q(x)

The Q function $Q(x)$ gives the right-tail probability of the standard normal distribution $N(0, 1)$. It is defined as
\[
Q(x) = \frac{1}{\sqrt{2\pi}} \int_x^{+\infty} e^{-t^2/2}\, dt .
\]
Many probabilistic expressions related to a random variable having normal distribution $N(\mu, \sigma^2)$ can be expressed in a simple way using the Q function. Here are some examples. Let $T$ denote such a random variable.

The probability of $T$ being in $(x, +\infty)$ is
\[
\frac{1}{\sqrt{2\pi\sigma^2}} \int_x^{+\infty} e^{-(t-\mu)^2/(2\sigma^2)}\, dt
= Q\!\left(\frac{x-\mu}{\sigma}\right) .
\]
The probability of $T$ being in $(-\infty, x)$ is $1 - Q((x-\mu)/\sigma)$.

The probability of $T$ being in $(\mu-\alpha, \mu+\alpha)$, where $\alpha > 0$, is $1 - 2Q(\alpha/\sigma)$.

Appendix C

Justification of (3.25) with r < 1/5 being Maximized when $\alpha_0 = 0$ (Section 3.4.4)

Equation (3.25) in Section 3.4.4 states that
\[
S = \left(\frac{Q^{-1}\left((1-\gamma)/2\right)}{\varepsilon}\right)^2
\left[2\left(r - \alpha_0^2\right)^2
+ \frac{(3-r)r + (10r-2)\alpha_0^2 - 6\alpha_0^4}{L}\right] .
\]
Because $\alpha_0 = p_1 - p_2$ and $r = p_1 + p_2$, we have $0 \le \alpha_0^2 < r$ and $0 < r - \alpha_0^2 \le r$. Thus $(r - \alpha_0^2)$ is maximized when $\alpha_0 = 0$. So is $(r - \alpha_0^2)^2$.

The term $T = (3-r)r + (10r-2)\alpha_0^2 - 6\alpha_0^4$ can be rewritten, by completing the square in $\alpha_0^2$, as
\[
T = -6\left(\alpha_0^2 - \frac{5r-1}{6}\right)^2 + \frac{19r^2 + 8r + 1}{6} . \tag{C1}
\]
When $\alpha_0^2 > (5r-1)/6$, $T$ decreases as $\alpha_0^2$ increases. For $r < 1/5$, $(5r-1)/6 < 0$. On the other hand, $\alpha_0^2$ is always non-negative. Hence, for $r < 1/5$, $T$ is maximized when $\alpha_0^2 = 0$.

Thus, for $r < 1/5$, both $(r - \alpha_0^2)^2$ and $T$ are maximized when $\alpha_0^2 = 0$. Hence, for $r < 1/5$,
\[
S = \left(\frac{Q^{-1}\left((1-\gamma)/2\right)}{\varepsilon}\right)^2
\left[2\left(r - \alpha_0^2\right)^2
+ \frac{(3-r)r + (10r-2)\alpha_0^2 - 6\alpha_0^4}{L}\right]
\]
is maximized when $\alpha_0^2 = 0$, and its maximum is
\[
S = \left(\frac{Q^{-1}\left((1-\gamma)/2\right)}{\varepsilon}\right)^2 r^2
\left[2 + \frac{1}{L}\left(\frac{3}{r} - 1\right)\right] . \tag{C2}
\]
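The Q function is available indirectly in Python's standard library via the complementary error function, $Q(x) = \mathrm{erfc}(x/\sqrt{2})/2$, and its inverse via `statistics.NormalDist().inv_cdf`. The sketch below illustrates the Appendix B identities and evaluates the maximum sample size (C2); the values of $\mu$, $\sigma$, $\gamma$, $\varepsilon$, $r$ and $L$ are illustrative, not taken from the text.

```python
import math
from statistics import NormalDist

def Q(x):
    # Right-tail probability of N(0,1): Q(x) = 0.5 * erfc(x / sqrt(2))
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def Q_inv(q):
    # Inverse of Q, via the standard normal inverse CDF: Q(x) = q <=> CDF(x) = 1 - q
    return NormalDist().inv_cdf(1.0 - q)

# Identities from Appendix B, for T ~ N(mu, sigma^2)
mu, sigma = 2.0, 3.0                   # arbitrary example parameters
x, a = 4.5, 1.5
p_upper = Q((x - mu) / sigma)          # P(T > x)
p_lower = 1.0 - p_upper                # P(T < x)
p_center = 1.0 - 2.0 * Q(a / sigma)    # P(mu - a < T < mu + a)

# Maximum sample size from (C2), for illustrative gamma, epsilon, r < 1/5, L
gamma, eps, r, L = 0.95, 0.01, 0.1, 100
S_max = (Q_inv((1.0 - gamma) / 2.0) / eps) ** 2 * r ** 2 * (2.0 + (3.0 / r - 1.0) / L)

assert abs(Q(0.0) - 0.5) < 1e-12       # half the mass lies above the mean
assert abs(Q_inv(Q(1.3)) - 1.3) < 1e-9
assert abs(p_upper + p_lower - 1.0) < 1e-12
assert S_max > 0
print(round(p_upper, 4), round(p_center, 4), round(S_max, 1))
```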
Abstract
As CMOS scaling continues, feature sizes approach molecular dimensions and the number of devices per chip reaches astronomical values, making it increasingly difficult to achieve desired yield levels. This motivates new models of design and test. One such model, called error-tolerance, permits defective chips with acceptable performance to be used in systems. A key issue in error-tolerance is how to justify the acceptability of a defective chip.