A VARIATION AWARE RESILIENT FRAMEWORK FOR POST-SILICON DELAY VALIDATION OF HIGH PERFORMANCE CIRCUITS

by

Prasanjeet Das

A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(ELECTRICAL ENGINEERING)

May 2013

Copyright 2013 Prasanjeet Das

DEDICATION

To my loving family, whose sacrifices have dwarfed my achievements!

ACKNOWLEDGEMENTS

It is a pleasure to thank all those who made this dissertation possible through their support, guidance, encouragement, and inspiration.

First and foremost, I would like to express my deepest gratitude to my advisor and dissertation committee chairman, Professor Sandeep K. Gupta, for his enthusiasm, mentorship, time, encouragement, and guidance throughout my graduate studies and academic research. His undiminished passion for scientific discovery and his dedication to making intellectual contributions to the VLSI design and test community have been invaluable in shaping my academic and professional career.

Special thanks go to my dissertation and qualification committees, including Professor Massoud Pedram, Professor Leana Golubchik, Professor Peter A. Beerel, and Professor Monte Ung, for their time, guidance, and feedback.

I would like to acknowledge the financial support from Intel Corporation, and I am sincerely grateful to Dr. Suriyaprakash Natarajan for his mentorship and valuable input at various stages of my internship at Intel. I would also like to thank Dr. Eli Chiprout and Dr. Noel Menezes from Strategic CAD Labs, Intel Corporation, for providing me with the valuable opportunity to learn and work for a reputed company like Intel. I would also like to thank Liang-Chi Chen and I-de Huang, whose earlier work at the University of Southern California served as the basis of the implementation framework for my dissertation.
Also, my sincere thanks to Professor Gandhi Puvvada for providing me with valuable teaching and learning experience in digital logic and system design under his timely and valuable tutelage. I would like to thank all my teachers, friends, and students at USC and in India who have in many ways contributed to the success of my professional, personal, and academic endeavors. I must acknowledge the Electrical Engineering staff at the University of Southern California, particularly Annie Yu, Estela Lopez, Diane Demetras, and Tim Boston; my colleagues in the VLSI test research group, Dr. Doochul Shin, Dr. Mohammad M. Tabar, Da Cheng, Hsunwei Hsiung, Yue Gao, Byeongju Cha, and Jianwei Zhang; my various roommates, including Parag Salve, Aditya Deshpande, Ketan Sharma, Praveen Narendranath, Manish Mahajan, Omkar Thorat, Khushnood Irani, Jitendra Patil, Abhishek Swaroop, Sachin Chachada, Santosh Mundhe, Chetan Diwan, and Tejas Bansod; my doctoral friends Siddharth Bhargav, Prasad Joshi, and Suvil Singh Deora; and my dearest friends Swati Sharma, Anshu Agrawal, Arpita Kar, Abhishek Gharpure, and Sreerama Kiran. Special thanks to Swati Sharma for her editing assistance on this dissertation. Love to my little sister Faiqa Ahmed for being my support at all times, including the frustrating ones, in this long journey. Special thanks to the Salve family, Parag Salve and Gauri Salve, for providing me with a home and family away from home. I would like to thank my family, Dr. Debiprasanna Das, Mrs. Bharati Das, and Pranab Das, for their unconditional love and support; their sacrifices have indeed dwarfed my achievements.

TABLE OF CONTENTS

DEDICATION
ACKNOWLEDGEMENTS
LIST OF TABLES
LIST OF FIGURES
LIST OF ABBREVIATIONS
ABSTRACT

CHAPTER 1
1.1 Motivation
1.2 The scope
1.2.1 Type of high performance chips
1.2.2 Type of delay deviants
1.3 Other related work on validation
1.4 Dissertation contributions
1.5 Dissertation outline

CHAPTER 2
2.1 Delay fault models and delay test generation
2.2 Limitations of current path delay testing approaches
2.3 The foundations of our proposed approach
2.3.1 Unique challenges
2.3.2 Key ideas
2.3.3 Technical objective
2.4 The preliminaries
2.4.1 Terminology and definitions
2.5 The proposed approach
2.5.1 The problem statement
2.5.2 The overall approach

CHAPTER 3
3.1 Post-silicon tasks require timing
3.2 Existing delay models: A review
3.3 Developing the resilient delay model
3.3.1 The delay modeling approach
3.3.2 Characterization setup
3.3.3 Basic delay model
3.3.4 Timing functions
3.3.5 Incorporating variability and bounding approximations
3.3.6 Complexity trade-offs
3.3.7 Extended model
3.3.8 Application of delay model
3.3.8.1 Enhanced timing analysis
3.3.8.2 Path selection
3.3.8.3 Vector generation approach
3.4 Experimental results to validate the resilient delay model
3.4.1 Experiments on the accuracy of timing analysis
3.4.2 Experiments on path selection and vector generation
3.5 Summary

CHAPTER 4
4.1 Emergence of low power CMOS circuits
4.2 Effect of voltage scaling on delay
4.3 Delay sensitivity to variability at low voltages
4.4 Empirical learning from circuit simulations
4.4.1 Effect of variability on gate delay
4.4.2 Effect of MIS on gate delay
4.4.3 Combined effect of variability and MIS
4.5 Existing near- and sub-threshold delay models
4.6 Experimental results on simple circuits
4.7 Experimental results on ISCAS benchmarks
4.7.1 Experimental results on timing analysis: without variability
4.7.2 Experimental results on timing analysis: with variability
4.7.3 Experimental results on path selection and vector generation
4.8 Summary

CHAPTER 5
5.1 Key observations from silicon studies
5.2 Existing path selection approaches
5.3 The value system
5.4 Innovative use of the resilient delay model
5.4.1 Delay defining parameters
5.4.2 The taxonomy of timing cases
5.5 Maximum delay sensitization conditions
5.6 Selective enumeration
5.7 The vector generation framework
5.8 Experimental results on c17 to illustrate the framework
5.8.1 Cone exhaustive approach – the first baseline
5.8.2 EFS conditions [80] – the second baseline
5.8.3 New maximum delay sensitization conditions
5.8.4 Side input refinement
5.8.5 Timing based pruning
5.9 Experimental results on larger benchmarks
5.10 Comparison with n-detect tests
5.11 Summary

CHAPTER 6
6.1 Timing uncertainty and delay variations
6.2 The taxonomy of process variations
6.3 Review of existing work on variability
6.4 Effect of process variations on validation vector set
6.5 Segmentation of variation envelopes – divide and conquer
6.6 Usage of sub-envelopes during validation vector generation
6.7 Experimental results on ISCAS benchmarks
6.7.1 Experimental results on super-threshold circuits
6.7.2 Experimental results on near- and sub-threshold circuits
6.8 Summary

CHAPTER 7
7.1 Crosstalk-induced delay
7.2 Testing for crosstalk-induced delay defects
7.3 Surrogates for a crosstalk target
7.4 Proposed approach and key ideas
7.4.1 Surrogate list pruning based on timing information
7.4.2 Identification of FS testable surrogates
7.4.3 Surrogate list pruning due to timing dependent logic conditions
7.4.4 Surrogates with non-overlapping timing ranges

BIBLIOGRAPHY
APPENDIX
A.1. Validation budget for resilient characterization ε_char
A.2. Validation budget for resilient timing ε_conv
A.3. Validation budget for resilient path selection ε_path
A.4. Validation budget for resilient vector-space ε_vector
A.5. Validation budget for resilient framework ε

LIST OF TABLES

Table 1.1: Shortcomings of existing approaches for validation and testing
Table 2.1: Ideas to make existing PDT approaches suitable for delay validation
Table 3.1: Tasks that require timing
Table 3.2: Review of existing delay models
Table 3.3: Tightness of bounding approximations
Table 3.4: Complexity trade-offs: VRM vs CSM for a 2-input NAND gate
Table 3.5: Complexity trade-offs: Our approach vs pin-to-pin bounding
Table 3.6: ISCAS89 benchmark circuits
Table 3.7: Accuracy of simulation based approach for smaller ISCAS benchmarks
Table 3.8: Accuracy of analysis based approach for smaller ISCAS benchmarks
Table 3.9: Accuracy of ETA
Table 3.10: Analysis of medium size ISCAS benchmarks
Table 3.11: ETA vs random simulations vs MDS simulations [54] for medium ISCAS benchmarks
Table 3.12: ETA vs random simulations vs MDS simulations [54] for big ISCAS benchmarks
Table 3.13: Analysis on s298 with partially specified vectors
Table 3.14: Runtime analysis: Our approach vs CSM based approaches [72][171]
Table 3.15: Analysis of validation metrics: Our approach vs P2P based approaches
Table 4.1: Analysis and effect of variability and MIS on max delay of a 2-input NAND gate in 65nm
Table 4.2: Review of existing near- and sub-threshold delay models
Table 4.3: Comparison of delay models for simple circuits
Table 4.4: Comparison of delay models for ISCAS benchmarks: Timing simulation
Table 4.5: Comparison of delay models for ISCAS benchmarks: Timing analysis
Table 4.6: Experiments with variability: ISCAS benchmarks
Table 4.7: Analysis of validation metrics: Our approach vs P2P based approaches
Table 5.1: Existing delay testing approaches
Table 5.2: Analysis of value systems
Table 5.3: The complete eight value system [80]
Table 5.4: Classification of timing cases for to-controlling transition at on-path input of a 2-input NAND gate
Table 5.5: Classification of timing cases for to-non-controlling transition at on-path input of a 2-input NAND gate
Table 5.6: Maximum delay sensitization conditions for to-controlling transition at on-path input of a NAND gate
Table 5.7: Maximum delay sensitization conditions for to-non-controlling transition at on-path input of a NAND gate
Table 5.8: Taxonomy of timing cases for c17
Table 5.9: Values at primary inputs of c17 after TBP
Table 5.10: Results for c17
Table 5.11: Path analysis on ISCAS benchmark circuits: Zero process variations in gate delay model, ∆ values capture all model variations
Table 5.12: Vector analysis on ISCAS benchmark circuits: Zero process variations in gate delay model, ∆ values capture all model variations
Table 5.13: Runtime analysis on ISCAS benchmark circuits: Zero process variations in gate delay model, ∆ values capture all model variations
Table 5.14: Path analysis on ISCAS benchmarks: Full global process variations in gate delay model, ∆ values capture all other model variations
Table 5.15: Vector analysis on ISCAS benchmarks: Full global process variations in gate delay model, ∆ values capture all other model variations
Table 5.16: Runtime analysis on ISCAS benchmarks: Full global process variations in gate delay model, ∆ values capture all other model variations
Table 5.17: Robust vs resilient: Comparison of delays invoked in circuit level simulations
Table 5.18: Analysis for larger ISCAS benchmarks
Table 5.19: Comparison with n-detect tests (n = 50)
Table 6.1: Effect of process variations on the number of paths selected and the number of validation vectors
Table 6.2: Validation vector and path-sets - s1196 (full global plus local variability)
Table 6.3: Validation vector-sets for one-parameter segmentation
Table 6.4: Validation vector-sets for two-parameter segmentation
Table 6.5: Validation vector and path-sets for near- and sub-threshold circuits
Table 7.1: Timing dependent FS conditions under unit delay model
Table 7.2: Timing dependent FS conditions under resilient delay model
Table A.1: Percentage of chips affected by limited path selection

LIST OF FIGURES

Figure 1.1: A typical design flow. (Solid arrows show flow of design information/artifacts, while dashed arrows indicate re-design/go-ahead signals.)
Figure 1.2: Increasing complexity drives increased time-to-market and cost [96]
Figure 1.3: Post-silicon validation as a percentage of total design resources [134]
Figure 1.4: Silicon debug vs time-to-market ([1][193])
Figure 1.5: Normalized delay variability of 16-bit adders ([10][145])
Figure 1.6: Many ASICs require multiple silicon spins ([95][179])
Figure 2.1: Arrival and transition times for a falling transition at line X
Figure 2.2: A logical path in ISCAS85 benchmark c17
Figure 2.3: The proposed flow
Figure 2.4: The proposed approach
Figure 3.1: Delay vs skew curves for MIS [28][29]
Figure 3.2: Delay model categorization [48][5]
Figure 3.3: Characterization setup without considering the effect of interconnect
Figure 3.4: Characterization setup considering the effect of interconnects
Figure 3.5: Delay vs skew curve for near-simultaneous transitions (basic delay model – simulation results)
Figure 3.6: Various phenomena associated with MIS-TNC [29]
Figure 3.7: (a) Rise delay function, (b) fall delay function
Figure 3.8: Piecewise linear approximation for near-simultaneous transitions (basic delay model)
Figure 3.9: Delay vs skew curve for near-simultaneous transitions (resilient delay model – simulation results)
Figure 3.10: Resilient delay model – 3,4 point linear bounding
Figure 3.11: Resilient delay model – pin-to-pin bounding
Figure 3.12: Possible input combinations for (a) output rising transition, (b) output falling transition
Figure 3.13: Calculation of output arrival times in ETA using resilient delay model
Figure 3.14: Shrinking of timing ranges – timing implications
Figure 3.15: The benchmark c17
Figure 3.16: ETA vs random simulations
Figure 4.1: Energy delay trade-off in combinational logic [120]
Figure 4.2: Delay in different supply voltage operation regions [57]
Figure 4.3: Comparison of key sub-threshold, near-threshold and super-threshold sensitivities for 65nm [74]
Figure 5.1: Delay defining parameters for the resilient delay model
Figure 5.2: Timing cases for MIS-TC for a 2-input NAND gate
Figure 5.3: Timing cases for MIS-TNC for a 2-input NAND gate
Figure 5.4: Partially ordered graph for the side input values for various timing cases for to-controlling transition at on-path input of a 2-input NAND gate
Figure 5.5: Partially ordered graph for the side input values for various timing cases for to-non-controlling transition at on-path input of a 2-input NAND gate
Figure 5.6: Elimination of inferior vector-spaces at leaf node
Figure 5.7: Algorithm to eliminate inferior vector-spaces at leaf node
Figure 5.8: Elimination of inferior vector-spaces at the intermediate node
Figure 5.9: Algorithm to eliminate inferior vector-spaces at intermediate node
Figure 5.10: An overview of our six phase approach
Figure 5.11: A target path on c17 benchmark
Figure 5.12: Cone exhaustive approach on c17
Figure 5.13: EFS approach on c17
Figure 5.14: MDS approach on c17
Figure 5.15: The partially ordered graphs for the three lines in c17
Figure 5.16: Search tree for SIR on c17
Figure 5.17: SIR approach on c17
Figure 5.18: TBP approach on c17
Figure 6.1: Steps in design process and their effect on timing uncertainties [12]
Figure 6.2: Parameter variations causing delay variations [12]
Figure 6.3: General taxonomy of variation
Figure 6.4: Categorization of device variation [10]
Figure 6.5: Validation vector generation for c17 - (a) local-only variability, (b) full global plus local variabilities
Figure 6.6: Segmentation of full local plus global variability envelope for one parameter - (a) full global plus local, (b) uniform, (c) non-uniform
Figure 6.7: Variability envelopes for two parameters - (a) full global plus local, (b) full global-only and full local-only, (c) uniform segmentation, (d) non-uniform segmentation
Figure 6.8: Application of global plus local variability sub-envelope for two parameters - (a) full global plus local envelope, (b) uniform segmentation
Figure 6.9: Validation vector and path set for s1196 - (a) no segmentation, ...
Figure 7.1: Crosstalk-induced slow-down
Figure 7.2: An example crosstalk site
Figure 7.3: The target-exhaustive approach
Figure 7.4: Timing ranges for a surrogate candidate for elimination with a very early transition at A
Figure 7.5: Timing dependent functional sensitization conditions for to-controlling transition at on-path input of a 2-input NAND gate
Figure 7.6: Timing ranges for a surrogate candidate with non-overlapping timing ranges for application of MDS conditions
Figure A.1: Calculation of max delay using three point approximations
Figure A.2: Estimating probability of failure at corner points of the three-point approximation for MIS-TC
Figure A.3: Bounding approximations on the intertwined curves (hypothetical)
Figure A.4: Max arrival time calculation for MIS-TC [28]
Figure A.5: Convolution to arrive at path PDF using gate PDFs [160]

LIST OF ABBREVIATIONS

ALU      Arithmetic Logic Unit
ANU      Adaptive Non-Uniform
ASIC     Application Specific Integrated Circuit
ATE      Automatic Test Equipment
ATPG     Automatic Test Pattern Generator
AU       Adaptive Uniform
CCS      Composite Current Source
CMOS     Complementary Metal-Oxide Semiconductor
CPU      Central Processing Unit
CSM      Current Source Modeling
DSM      Deep Sub-Micron
ECSM     Effective Current Source Model
EFS      Enhanced Functional Sensitization
ETA      Enhanced Timing Analysis
ETS      Enhanced Timing Simulation
FS       Functional Sensitization
FV       Full Variability
IEDM     International Electron Devices Meeting
ISCAS    International Symposium on Circuits and Systems
ITRS     International Technology Roadmap for Semiconductors
LER      Line Edge Roughness
MC       Monte Carlo
MDP      Minimum Delay Point
MDS      Maximum Delay Sensitization
MEP      Minimum Energy Point
MIS      Multiple Input Switching
MIS-TC   Multiple Input Switching – To-Controlling
MIS-TNC  Multiple Input Switching – To-Non-Controlling
MV       Multiple Vectors
NA       Non-Adaptive
NBTI     Negative Bias Temperature Instability
NTVC     Near-Threshold Voltage Circuits
NV       No Variability
OTV      Oxide Thickness Variation
PDF      Probability Distribution Function
PDT      Path Delay Test
PSRO     Performance Sensitive Ring Oscillator
PVT      Process Voltage Temperature
RC       Resistor Capacitor
RDF      Random Dopant Fluctuations
RLC      Resistor Inductor Capacitor
SIR      Side Input Refinement
SIS      Single Input Switching
SPDM     Scalable Polynomial Delay Model
SPICE    Simulation Program with Integrated Circuit Emphasis
SSTA     Statistical Static Timing Analysis
STA      Static Timing Analysis
STVC     Sub-Threshold Voltage Circuits
TBP      Timing Based Pruning
TDDB     Time Dependent Dielectric Breakdown
VLSI     Very Large Scale Integrated Circuit
VRM      Voltage Reference Modeling
VS       Vector Space

ABSTRACT

Despite advances in design and
verification, it is becoming increasingly common for high-performance designs to misbehave on silicon. This is due to performance issues, such as timing bugs, which cause a significant percentage of fabricated chips to have delay failures that are first discovered only during post-silicon validation. Delay marginality is one such important variation-induced timing bug that is often missed by existing validation approaches, as it changes delay to produce errors in a significant fraction of fabricated chips, even in the absence of defects and even when the variations in the fabrication process are within the normally expected levels, i.e., even when there is no abnormal process drift. Thus, it becomes imperative to improve the quality of validation with respect to delay marginalities. One obvious approach to achieve this would have been to adapt and apply the existing delay testing approaches (where the generated vectors are applied to a fabricated copy of a chip to measure delay) to generate vectors for delay validation, i.e., vectors that are guaranteed to expose all the delay marginalities. However, several recent case studies using fabricated chips show that existing path delay testing approaches generate vectors that fail to expose the delay marginalities on silicon. The primary reason for this anomaly can be attributed to the inability of existing approaches to account for the effect on delay of advanced delay phenomena such as Multiple Input Switching (MIS) and process variations. This necessitates the development of a variation aware framework that can provide an accurate estimate of the post-silicon delay.

The goal of this dissertation is to develop new models and methodologies to accurately estimate the delay of any given fabricated copy of a design. We identify key strengths of validation and formulate our problem of discovering delay marginalities.
We also identify the major phenomenon affecting gate delay, namely Multiple Input Switching (MIS), and successfully develop new models (based on a new notion of bounding approximations) and methods that account for inaccuracies and variability at practical run-time complexities and are suitable for both pre- and post-silicon timing tasks. We show that compared to existing delay models, our delay models are much more accurate and provide order-of-magnitude reductions in runtime. Subsequently, we adapt and significantly extend existing delay testing approaches to generate vectors for delay validation, i.e., vectors that are guaranteed to detect the worst-case delay for a significant fraction (within a user-specified validation budget) of fabricated chips. We show that our vectors, being MIS and variation aware, invoke much higher delay than vectors generated by existing vector generation approaches. Finally, we develop a variability aware divide-and-conquer based method for efficient post-silicon validation that gives a substantial reduction in test-application time and hence the test cost. Our experimental results demonstrate that we have developed the first variation aware resilient framework for post-silicon delay validation.

CHAPTER 1
INTRODUCTION

The development of a new digital chip starts with its specifications, which describe the desired functionality and key parameters, such as performance and power (see Figure 1.1). In most design-flows, a multi-step design process produces a detailed design in the form of a gate/transistor-level netlist and a layout. At various stages during this phase, pre-silicon verification is undertaken to check that the detailed design conforms to the specifications. Any serious disagreement between the specifications and the detailed design is resolved, often via redesign. Eventually, a detailed design that passes verification is fabricated, first to obtain a small batch of chips, commonly called first-silicon.
This batch of chips undergoes a process often called post-silicon validation or validation of first-silicon [73]. Any misbehavior detected during validation that is deemed likely to cause a significant fraction of fabricated chips to fail (and hence threaten the chip's economic viability) is addressed via redesign. Redesign at this stage is expensive and time-consuming, since it requires diagnosis to identify the root cause of the observed failure, redesign, verification, creation of a new set of masks, re-fabrication, and validation. Each such redesign is commonly referred to as a new silicon spin. (Note that the term first-silicon also refers to the first small batch of chips fabricated for the second spin, third spin, and so on.) When validation is eventually successful, the corresponding set of masks is used to manufacture chips in high volume. High-volume-manufacturing testing is then carried out for each fabricated copy of the chip. In summary, verification and validation ensure that a detailed design conforms to the specifications and is economically viable, in terms of total production cost per chip sold. Testing guarantees the correctness and performance of individual chips sold to customers.

Figure 1.1: A typical design flow. (Solid arrows show flow of design information/artifacts, while dashed arrows indicate re-design/go-ahead signals.)

We propose to develop the first variation aware framework for post-fabrication delay validation. The primary emphasis of our framework is on the development of a high quality delay validation methodology that will detect the prominent delay anomalies that threaten a chip's performance and hence its economic viability.
As current technology trends clearly show the growing importance of timing misbehaviors [45][185][26], by guaranteeing to expose these, our approach promises to play an increasingly critical role in achieving high-performance designs in a cost-effective manner in all future technologies – end-of-roadmap CMOS and beyond.

1.1 Motivation

As VLSI fabrication technology scaling reaches the nano-scale, the challenges of post-silicon validation are continuously increasing, driven by higher levels of integration, increased circuit complexity and platform performance requirements [96]. Design complexities, along with new complex architectural features, have contributed heavily to the increasing number of bugs and the increasing validation efforts. Figure 1.2 shows that with increasing complexity, measured using a metric not provided in [96], the cost of validation, quantified by the number of person-years needed to validate a complex product, increases significantly.

Figure 1.2: Increasing complexity drives increased time-to-market and cost [96]

As the industry continues to move to even smaller geometries, the contribution of post-silicon validation to the overall design cost also scales manifold. Harry Foster of Mentor Graphics said - "At 65nm post silicon validation consumes more than 50% of the overall design effort" [126]. Also, Intel reported an increase in the head-count ratio between post-silicon validation and design from about 1:5 (approximated from Figure 1.3) in 2002 to about 1:3 (approximated from Figure 1.3) in 2005, a clear indication that post-silicon validation is becoming increasingly effort-intensive [134].
Figure 1.3: Post-silicon validation as a percentage of total design resources [134]

This increasing importance of post-silicon validation is attributed to the underlying fundamental objective of validation, which is to ensure that the design does not suffer from any systematic problem that threatens the chip's economic viability. In particular, the focus is on exposing systematic problems that affect a significant percentage of chips and hence must be addressed via redesign. Each redesign, i.e., each new silicon spin, is expensive and time-consuming, since it requires diagnosis to identify the root cause of the observed failure, redesign, verification, creation of a new set of masks, re-fabrication, and validation. Validation primarily focuses on circuit bugs (also known as electrical bugs) [123], as logic-level design errors (categorized as functional bugs) should ideally be detected in the pre-silicon verification phase (though we continue to fall short of this). In recent years, increasingly significant causes of low yield are first discovered during validation. Consequently, first-silicon debug, which comprises a major portion of any post-silicon validation framework, is becoming the most time-consuming part of the chip development cycle [1] (see Figure 1.4).

Figure 1.4: Silicon debug vs. time-to-market ([1][193])

Scaling VLSI circuits beyond the ability to control specific performance-dependent and power-dependent parameters gives rise to increasing levels of variability in parameters such as delay and power consumption of CMOS devices, circuits, and chips [10]. The profound influence of variability on high-performance circuits can easily be deduced from the fact that it is becoming increasingly common for chip producers to miss deadlines or have re-spins due to the inability to estimate the effect of variability accurately [71].
Variability (also referred to as process variations) is a prime contributor to the increase in timing bugs, which comprise a major portion of the circuit bugs.

Figure 1.5: Normalized delay variability of 16-bit adders ([10][145])

Process variations can be categorized (without any loss of generality) as normal process variations, i.e., those within the normally expected levels that may occur even when there is no abnormal process drift, and extreme process variations caused by process drift. Any parametric error that is caused under normal process variations is sometimes referred to as a marginality, and validation must expose any such marginality that is likely to affect a large fraction of fabricated chips so it may be addressed via redesign before the design moves into high-volume fabrication. Marginalities caused by low-level effects such as delay variations, crosstalk, and ground-bounce, and aggravated by variability, are becoming increasingly important (see the Proceedings of IEDM for any recent year). Figure 1.5 shows the delay variability of various implementations of 16-bit adders in a 90 nm technology node from [10]. Later, we will show that the delay variabilities for similar circuits are much worse when implemented in a 65 nm technology node.

Our research is motivated by shortcomings of existing approaches during post-silicon validation. In particular, despite advances in design and verification, it is becoming increasingly common for many chip designs to undergo multiple silicon spins. This is the case not only for high-performance custom and semi-custom chips but also for application-specific integrated circuits (ASICs), which are typically much less complex and much less aggressive in terms of area efficiency and performance. For example, as described in [95][179], Collett International Research reports that 37% of ASICs require a second spin while 24% of ASICs require more than two spins (see Figure 1.6).
Similar data from Numetrics Management Systems, Inc. has been presented in [173]. The fact that multiple silicon spins are required shows that it is increasingly common for validation to miss many causes of serious circuit misbehavior, in particular the marginalities caused by low-level effects, such as delay variations, ground bounce, and crosstalk. Note that a redesign is only warranted when marginalities threaten the economic viability of the design by causing a significant fraction of fabricated chips to have erroneous behavior, even in the absence of defects and even when the variations in the fabrication process (process variations) are within the normally expected levels, i.e., even when there is no abnormal process drift. The fact that 24% of the chips required more than two spins shows that existing validation approaches are inadequate. As the fabrication process is pushed to its limits, marginalities will continue to grow in importance for the foreseeable future, rendering existing validation approaches even more inadequate.

Figure 1.6: Many ASICs require multiple silicon spins. ([95][179])

Any timing error that is caused under normal process variations is sometimes referred to as a delay marginality, an important variation-induced timing bug that comprises a significant fraction of the circuit bugs detected by validation. Thus it becomes imperative to improve the quality of validation with respect to delay marginalities. One approach might be to adapt today's best delay testing approaches for delay validation. Unfortunately, several recent case studies using actual chips show that existing path delay testing approaches generate vectors that fail to invoke the worst-case delays in silicon [45][185][26]. While some of the causes of misbehavior discovered in first-silicon are design errors missed by verification, marginalities constitute an increasing proportion of these causes.
This increase in importance of marginalities – caused by low-level effects and aggravated by process variations – is due to the following factors. (i) Increase in percentage variations in values of key parameters and parasitics (e.g., see [10]), as feature sizes and separations shrink deep into the nano-scale and push fabrication processes to their limits and beyond. (ii) Increase in the importance of low-level effects – delay variations, inadequate noise margins, charge sharing, crosstalk, and so on. (iii) Need for aggressive design, and hence lower guard-banding, as we experience a slowdown in some benefits of scaling, notably its performance benefits. The fact that this increase in importance of marginalities will continue unabated for the last years of CMOS scaling is clearly evident from the amount of research effort devoted to related concerns (e.g., see the Proceedings of IEDM for any recent year). The importance of our research will increase even more dramatically when new technologies start replacing or supplementing CMOS.

1.2 The scope

The importance of this problem of delay validation is due to the fact that correct operation of a digital circuit requires the signal propagation delay of every sensitizable path in the circuit to be smaller than a specified amount, often the clock period. This problem is of particular importance for high-performance processors, as they determine the operational frequency by setting the clock period of the whole chip.

1.2.1 Type of high performance chips

Manufactured chips are primarily categorized into ASICs and high-end processors. The two differ in their respective methodologies in many ways. ASICs generally have higher margins in design, timing, and validation due to reduced constraints and shorter time-to-market, whereas processors, being the delay determinants of the chip or system, must undergo aggressive design with lower margins and guard bands.
Due to the absence of at-speed mechanisms for testing, ASIC sign-offs are generally based on worst-case measurements on PSROs (Performance Sensitive Ring Oscillators) [174], whereas the task of speed-binning, which acts as a determinant for pricing of high-performance processors, necessitates the presence of at-speed testing mechanisms. Thus our dissertation primarily focuses on the high-performance processors that dictate the speed of operation of the chip or system, and we intend to significantly extend the available at-speed testing mechanisms for our cause of delay validation. Also, any processor design comprises logic blocks and embedded memory arrays. We will be focusing on the logic blocks, as the proposed method for delay validation can be easily extended to take into account the embedded memory arrays (if necessary) using the existing methods for memory path delay testing [185]. This dissertation focuses primarily on the gate-dominated paths in logic blocks that consist of more than 15-20 logic gates [45] and frequently occur in large microprocessor blocks [129], which can be either huge data-path blocks (e.g., an ALU [178]) or control blocks (such as an instruction decoder [159]). Wire-dominated paths [26] (such as global wires and bus lines) are not considered here explicitly.

1.2.2 Type of delay deviants

The delays measured on fabricated chips tend to deviate from the estimated nominal delay value for many reasons, such as functional errors, delay defects, process variations, crosstalk, etc. Variations in the manufacturing process (process variations) may degrade circuit performance [80], with or without altering its logic functionality [10]. This is also the case for certain types and instances of manufacturing defects (delay defects) [30][62].
It is useful to consider two types of process variations: normal process variations, i.e., those within the normally expected levels that may occur even when there is no abnormal process drift, and extreme process variations caused by process drift. In addition to exposing logic-level design errors, which are covered by other approaches and hence are not the subject of our research, validation must expose systematic timing errors, which must be addressed via redesign. Hence, validation focuses on normal process variations, since extreme process variations are addressed by improving process control and delay defects are addressed by improving the testing process. Any timing error that is caused under normal process variations is sometimes referred to as a delay marginality, and validation must expose any delay marginality that is likely to affect a large fraction of fabricated chips so it may be addressed via redesign before the design moves into high-volume fabrication. Hence, post-silicon validation is typically performed on a small set of chips, sampled from the first batch of chips fabricated for a version of the design. In keeping with the above objective, these chips are selected to capture normal process variations, and chips that are identified as having abnormal levels of process variations and defects are not included. Along similar lines, our research will deal with the delay marginalities (caused by low-level effects, such as multiple input switching, ground bounce and crosstalk) that threaten the economic viability of the design by causing a significant fraction of fabricated chips to have erroneous behavior, even in the absence of defects and even when the variations in the fabrication process (process variations) are within the normally expected levels, i.e., even when there is no abnormal process drift.
1.3 Other related work on validation

As described above, the key way in which validation can complement the preceding pre-fabrication verification is its ability to capture marginalities due to low-level effects, parasitics, delays, and variations. The use of actual chips means that validation is performed exclusively by applying vectors and observing responses. Hence, the quality of validation is directly dependent on the quality of the vectors it uses. Existing validation approaches fall short in several critical ways.

Objectives: For historical reasons, validation does not explicitly focus on marginalities. E.g., as shown in Table 1.1, while many of the references focus on design errors [173][7][13][97][98][121][122][457][149][151][152][164], few focus on marginalities [73][66]. Even these largely pose the problem and identify challenges.

Paradigm: Current approaches use a paradigm which can be called ad-hoc/functional/random, whereas we propose to use the paradigm of certification via elimination of likely deviants, as used for testing.

Approaches for generating vectors for validation: Functional vectors and pseudo-random vectors are commonly used. In some methodologies, random vectors are generated with non-uniform probabilities, or with bias, with the objective of increasing the probability of exercising some "corner" cases or parts of the design that are deemed to require more attention. This approach is sometimes called input biasing (e.g., see [151][152][184]). Whenever available, vectors generated by the validation team for a previous generation of the design, including vectors generated to further explore, expose, and diagnose anomalous behavior with respect to PVT (Process, Voltage and Temperature) and process variations (typically captured in the form of Schmoo plots, e.g., see [4]), constitute yet another type of vectors used. In addition, vectors generated for verification using approaches such as [152] and [184] can also be used.
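The input-biasing idea described above can be sketched in a few lines. In the sketch below, the vector width, the choice of "mode" bits, their bias probabilities, and the corner-case predicate are all hypothetical, purely for illustration:

```python
import random

def biased_vector(bit_probs, rng):
    """One input vector; bit i is 1 with probability bit_probs[i]."""
    return [1 if rng.random() < p else 0 for p in bit_probs]

rng = random.Random(7)

# Uniform random vectors: every input bit equally likely to be 0 or 1.
uniform = [biased_vector([0.5] * 8, rng) for _ in range(1000)]

# Biased vectors: push two hypothetical "mode" bits toward 1 so that a
# rarely exercised corner configuration is hit far more often.
biased = [biased_vector([0.95, 0.95] + [0.5] * 6, rng) for _ in range(1000)]

def hits_corner(v):
    # Hypothetical corner case: both mode bits high simultaneously.
    return v[0] == 1 and v[1] == 1

print("uniform vectors hitting the corner:", sum(map(hits_corner, uniform)))
print("biased  vectors hitting the corner:", sum(map(hits_corner, biased)))
```

With uniform vectors the corner is reached roughly a quarter of the time; the bias raises that to roughly 90%, which is the point of weighting inputs toward under-exercised cases.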
Approaches for evaluating vectors: In the absence of any model for the design deviations being targeted during validation, most validation approaches do not even quantify the quality of vectors. The few approaches that do use logic- and higher-level metrics. Most of these metrics are adopted from software testing (e.g., see [119]) and high-level verification (e.g., see [151] and [152]) and measure coverage in terms of the fraction of the HDL code describing the circuit design that is exercised by the vectors. The most common metric used for evaluation of vectors is HDL statement/block coverage, where vectors are evaluated in terms of the statements/blocks in the HDL code for the circuit's design that they invoke. Another metric used is branch coverage, which evaluates the vectors in terms of the fraction of all possible branches in the decision blocks in the HDL code that they invoke, and so on. Even the basic notion of observability has been incorporated into these metrics fairly recently (e.g., see [56]).

Table 1.1: Shortcomings of existing approaches for validation and testing

Reference #                     | Validation | Testing | Target                | Variation aware | Worst-case effect targeted
[7][13][97][98][121][122][149]  | Y          |         | Design errors         | N               | N
[151][152][164][189]            | Y          |         | Design errors         | N               | N
[184][4][119][56]               | Y          |         | Design errors         | N               | N
[23][24][25]                    |            | Y       | Ground bounce         | N               | Y
[35]                            |            | Y       | Crosstalk             | N               | Y
[85][86]                        |            | Y       | Instantaneous current | N               | Y
[101][105][124][125][147]       |            | Y       | Crosstalk             | N               | N

The fundamental weakness of the current approaches for generating vectors for validation is that none of the existing approaches generate vectors with the explicit objective of invoking worst-case severities for the electrical effects that are behind marginalities. This is also the weakness of the metrics used for evaluation of vectors. For example, none of the above metrics requires the vectors that exercise a part of the circuit to invoke any low-level electrical effects.
Interestingly, most of these coverage metrics do not even require that the vectors that exercise a part of the circuit also sensitize that part of the circuit to an observable output! In fact, [151] states that most currently used metrics are inadequate even for pre-fabrication verification, where the focus is at the logic level. In light of the above observations, the vectors used for validation in current practice can, at best, be viewed as detecting marginalities serendipitously. In contrast, the variation aware framework that we propose will explicitly generate vectors and evaluate their quality with respect to the key objective of validation, namely to detect marginalities.

Existing approaches for generating vectors for testing: We also review approaches for testing, especially those for electrical effects, as these are candidates for adaptation for generating vectors for validation. Several approaches have been developed for generating vectors for lower-order effects such as capacitive crosstalk and ground bounce [23][24][25][35][85][86][101][105][124][125][147]. However, all these approaches use the nominal delays for gates and wires. This makes the vectors non-resilient, i.e., unreliable for any fabricated copy of the chip, as the values of parasitics and delays vary from one chip to another. We propose to develop resilient models and methods for delay validation which will not be invalidated by inaccuracies and uncertainties in delay estimation for gates and wires, and thus are reliable for any fabricated copy of the chip.

1.4 Dissertation contributions

In this dissertation we present an ATPG framework that generates two-vector tests which, when applied to chips sampled from the first-silicon batch, are guaranteed to expose the worst-case delay for each chip.
The three-fold contributions of this dissertation are:

• Resilient delay model for super-threshold, near-threshold and sub-threshold circuits: A delay model that captures the delay phenomena (including multiple input switching for both to-controlling (MIS-TC) and to-non-controlling (MIS-TNC) transitions), captures variations, produces proven upper and lower bounds to be used for worst-case analysis, and tightens these bounds when logic values at any subset of circuit lines are specified.

• Timing aware systematic method for vector generation: A framework that generates non-inferior test vector-spaces, i.e., when necessary, for a single target-path, a set with multiple vectors in the form of a small number of cubes, using new timing dependent sensitization conditions. These generated vectors collectively guarantee invocation of the worst-case delay in the presence of normal process variations.

• Variability aware efficient validation vector generator: A new divide-and-conquer approach that segments the large full (global plus local) variability envelopes into smaller sub-envelopes with full local-only and partial global-only variabilities and uses these to drastically reduce the expected size of the validation test set without weakening the guarantee.

1.5 Dissertation outline

The dissertation consists of seven chapters. In Chapter 2, we present the key ideas to overcome the challenges faced in adapting the existing delay testing approaches for delay validation. We also present the outline of our proposed approach, including the key elements. Chapter 3 deals with the proposed novel idea of bounding approximations to capture variability in advanced delay models for post-silicon delay validation, and in Chapter 4 we extend our work to include near-threshold and sub-threshold circuits. In Chapter 5 we present a method to generate resilient vectors, using a novel timing-based ATPG, that guarantees invocation of the maximum post-silicon delay.
In Chapter 6 we devise a method to incorporate knowledge of local and global variability in validation vector generation. Finally, in Chapter 7 we conclude this dissertation with proposed future work on methods to extend the approaches presented in earlier chapters to delay deviations due to crosstalk.

CHAPTER 2
THE APPROACH: KEY IDEAS AND NEW CONCEPTS

In this chapter, we analyze the shortcomings of the existing delay-testing approaches that are commonly used for delay marginality validation, identify the unique challenges, and present our key ideas to deal with these challenges. Furthermore, we briefly introduce our proposed approach along with its key components.

2.1 Delay fault models and delay test generation

A delay test for a combinational logic block consists of a sequence of two vectors <V1, V2>. These vectors must create a desired transition that propagates through the fault location and to a primary output [81]. An automatic test pattern generator (ATPG) is a program capable of generating test patterns (vectors). Delay test generation [15][37][65][137][166] assumes a delay fault model and attempts to generate tests for every delay fault being modeled in a circuit. Various delay fault models have been used to capture the effects of defects and process variations that cause delay deviations. Spot defects, which typically cause large local changes in the geometries and are random in size and location, can be captured using the transition fault model [108][175], which assumes that the extra delay caused at a fault site is large enough to prevent a transition (in a particular direction) at the site from reaching any primary output by the time of observation. Unlike the transition fault model, the gate delay fault model does not assume that the extra delay is necessarily large enough to cause erroneous circuit operation independent of the path via which a transition is propagated from the fault site to a primary output.
The size of the extra delay as well as the location of the fault needs to be specified to determine the ability of a test to detect a gate delay fault. Methods for computing the smallest delay fault size (detection threshold) guaranteed to be detected by a test have been developed [138][139]. Both of the above delay fault models assume a single fault and hence do not capture the fact that physical defects and process variations may affect more than one gate in a circuit. For example, distributed delay failures along a path may cause an error but may not be captured by the test pattern generated under the single fault assumption, if no individual gate delay fault along the path is large enough. The path delay fault model [110][155] considers delay faults on paths from primary inputs to primary outputs in a combinational circuit and captures distributed delay failures. While the change in the delay value of a particular gate or wire may be small, changes in the delay values of multiple gates and wires along some circuit path may cause that path's delay to exceed the desired clock period. Hence, the path delay testing approach in [110][155] is appropriate, since it inherently tackles multiple delay variations and defects where each variation or defect may increase the delay of a gate/wire by a small amount.

2.2 Limitations of current path delay testing approaches

Existing delay testing approaches do not consider delay models for nano-scale CMOS, particularly the impact on delay of multiple input switching (MIS) and process variations [185][26]. Even the high-quality transition delay testing approaches can only target delay defects that increase the delay of a single node, whereas variations affect multiple nodes. Path delay test (PDT) has traditionally been believed to be superior for identifying slow paths and considered to be useful for speed binning and performance characterization.
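The distributed-failure argument behind the path delay fault model is easy to see numerically. The Monte Carlo sketch below uses entirely hypothetical numbers (gate count, nominal delays, clock period, single-gate detection threshold, and variation spread): each gate on a 20-gate path gains a small random extra delay, no single gate exceeds a plausible single-fault detection threshold, yet the accumulated path delay exceeds the clock period on a noticeable fraction of simulated chips, which is exactly the case that single-fault models miss.

```python
import random

rng = random.Random(1)

N_GATES = 20        # gates on the target path (hypothetical)
NOMINAL = 45.0      # nominal delay per gate, ps (hypothetical)
CLOCK = 1000.0      # clock period, ps (nominal path delay 900 ps, 10% slack)
THRESHOLD = 30.0    # smallest single-gate extra delay assumed detectable
                    # by a gate-delay-fault test (hypothetical)

failing = 0                 # chips whose path delay exceeds the clock period
caught_by_single_fault = 0  # failing chips with any one gate slow enough
for _ in range(10_000):     # 10,000 simulated fabricated chips
    extras = [abs(rng.gauss(0.0, 5.0)) for _ in range(N_GATES)]
    if N_GATES * NOMINAL + sum(extras) > CLOCK:
        failing += 1
        if max(extras) > THRESHOLD:
            caught_by_single_fault += 1

print("chips failing timing:", failing)
print("of those, visible to a single-gate delay-fault test:",
      caught_by_single_fault)
```

Several hundred of the 10,000 simulated chips miss timing, yet essentially none contains a single gate slow enough to cross the per-gate detection threshold, so only a path-level test would expose them.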
Unfortunately, recent silicon studies by nVidia [45], Freescale [185], and Sun (now a part of Oracle) [26] have shown that existing path delay testing approaches generate vectors that fail to invoke the worst-case delays in first-silicon. Despite the fact that robust path delay testing can tackle arbitrary numbers of delay defects and variations along a target path, robust tests often fail to invoke the worst-case delay of a target path in any technology where MIS can increase a gate's delay. (More details ahead.) Furthermore, existing approaches for generating vectors use nominal values for parameters and parasitics. This makes the generated vectors non-resilient, i.e., unreliable for any fabricated copy of the chip, and hence unsuitable for all post-silicon delay related activities. Though the specific details vary somewhat, the general methodology followed for PDT by industry, as well as in these studies [45][185][26], is:

• Use static timing analysis (STA) to obtain a ranked list of timing-critical paths.
• Categorize paths based on path cell types – wire-bound and gate-bound.
• Select top-ranked paths in each category.
• Generate statically-sensitized robust tests for these paths.

One common observation from [45][185][26] is that the PDTs invoke less delay than the functional tests, since the critical paths in silicon are different from those identified by STA and targeted during vector generation. Reasons evident from [45][185][26] for this anomaly vis-a-vis everyone's expectations for PDT are:

• Inaccurate delay models [185][26] – Existing delay models (used by these approaches) do not take into account all relevant lower-order effects.
• Non-resilient delay models [185][26] – Existing delay models (used by these approaches) do not take into account the effects of variations on delay.
• Wrong path selection approach [185][26] – Selecting top-ranked paths based on ranking will eliminate certain paths with similar delays from consideration for test generation.
• Wrong sensitization conditions [185][26] – Existing sensitization conditions are based on erroneous assumptions regarding gate delay; hence they do not excite worst-case delay and can falsely declare certain testable paths untestable.
• Statically sensitized robust PDT [45][26] – PDTs used in these studies apply static values at side inputs to facilitate delay diagnosis. Such tests do not guarantee invocation of the worst-case delay for a target path [30][62][135].

Additional reasons, such as test application differences and pre-silicon/post-silicon netlist mismatches, further reduce the quality of delay testing. The analysis of the results in these silicon studies, the fact that path delay is influenced by various electrical phenomena, such as MIS and the capacitances of nodes internal to the on-path CMOS gates and their initial states [30][135], and the fact that process variations must be considered, together necessitate the development of a new framework to generate resilient vectors for post-silicon delay characterization.

2.3 The foundations of our proposed approach

In this section, we translate our abovementioned overall objectives into more precise technical objectives. We start by defining some of the characteristics of the problem, especially constraints and objectives posed by users of validation. We then address these challenges by presenting our key ideas and formulating the specific technical problems, which we address in the next section.

2.3.1 Unique challenges

We begin by examining the proposed framework from the perspective of its potential users to identify the practical considerations that we must use to define the objective functions and constraints for our research.
• Only require realistically available models and information regarding parameters, parasitics, and variations: The uncertainty in the values of parameters, variations, and so on continues to grow with each scaling generation of the fabrication process.
Hence, any such framework must require only a practically characterizable subset of all such information. This is a prerequisite to ensuring completeness and accuracy, since use of unreliable models and values (including those obtained via extrapolation from the past technology generation) may cause validation to miss debilitating marginalities, especially those emerging for the first time in a new fabrication process.
• Completeness: We must impose overall quality constraints on validation, as these are necessary to provide the dramatic reductions in costs and time to market. In particular, our framework must detect a high percentage (preferably, 100%) of all possible instances of marginalities to virtually eliminate the need for a third silicon spin.
• Economics of chip fabrication: More precisely, the economics of chip fabrication provides us with some flexibility, as we can ignore some extreme delays, provided they affect only a small and bounded fraction of chips. This allows us to use delay models that consider, as lower and upper bounds, delay values obtained by capturing ±k standard deviations (σ) around the mean values (μ), provided that k is sufficiently large to capture at least (100 - ε)% of fabricated chips, where ε is determined by economic considerations. All guarantees provided by our approach are under such considerations.
• Meet practical constraints on run-time complexity: While a wide range of relevant models and simulators exist, they cannot be used because the run-time complexity for the types and lengths of vector sequences used for validation is orders of magnitude greater than what can be simulated in acceptable run time. This is a fundamental barrier that arises due to the need (a) to use high-complexity models for low-level electrical effects, (b) to consider extremely large numbers of targets, where each target is a specific instance of a low-level effect, and (c) to simulate or generate extremely large numbers of vectors.
• No pessimism: It is imperative that the proposed framework for validation of marginalities does not raise many false alarms. Each false alarm increases time-to-market for the chip. Time-consuming and expensive diagnosis must be performed for each false alarm. Furthermore, if a false alarm is mitigated via redesign, it can reduce the chip's performance or increase its cost.

In addition to the above, precise models for low-level electrical effects may be unavailable or unusable due to their complexities or reliance on unavailable parameter values.

2.3.2 Key ideas

In the previous section we identified the unique challenges based on our comprehensive understanding of the silicon studies [45][185][26]. We will deal with these challenges by using the following key ideas (also see Table 2.1):

1. Use of bounding approximations: We can use bounding approximations, i.e., approximations that preserve the low run-time complexities of existing delay models and yet provide upper and lower bounds on actual values (of parameters and parasitics) or behavior (of low-level electrical effects). Bounding approximations will be used (i) to reduce run-time complexities, and (ii) to tackle the unavailability of precise models and values.

2. Focus on significant marginalities only: The use of actual chips allows each applied vector to invoke every relevant low-level effect while inherently considering the precise values of all circuit parameters and parasitics (including those that are unknown during our modeling, simulation, and vector generation). The purpose of validation is to detect all marginalities that may cause a significant fraction of fabricated chips to fail. Hence it is sufficient for the methodology to detect such a marginality for a sufficient number of chips to trigger further investigation. (This is in contrast with testing, where every chip in which we fail to detect an error-causing marginality is a failing chip we sell to customers.)
As mentioned earlier, this enables us to select bounds on delays that capture sufficient proportions of chips under normal variations, i.e., (100 – ε)%, where ε is specified by the user. This also allows us to simplify our framework by sampling for validation a sufficiently large number of chips from the first-silicon batch.

3. All-inclusive path sensitization based on timing-dependent logic conditions: We will define new timing-based logic conditions that satisfy our objective that, for a given path, the vectors generated using our conditions are guaranteed to (collectively) invoke the worst-case delay of the target path. To achieve this, our approach will start with the set of all possible values and, based on timing conditions, eliminate only those logic values that can be proven as being unable to invoke the worst-case delay under any circumstance.

4. All-inclusive vector spaces based on timing-dependent logic conditions: As vectors can be applied to chips at rates near 10^9 vectors/second, large numbers of vectors can be used for validation, especially those applied using the circuit's functional modes. Hence we will develop approaches that use more vectors to achieve specific objectives, including (a) the notion of surrogates to deal with uncertainties in values and models, (b) identification of resilient sets of vectors to detect each target or surrogate, i.e., identification of a set containing vectors that is guaranteed to include the vector that invokes the worst-case effect for the target or surrogate despite all model, parameter, and variation uncertainties, and (c) generation of vector spaces for groups of targets (instead of vectors for targets) to generate vectors at dramatically lower complexities.

5. Vectors applied to silicon: During validation, these approximations will be used to generate vectors. A loose bounding approximation might lead to generation of a larger set of vectors than necessary.
In order to manage storage complexity, we propose to collectively store the generated multiple vectors on structural testers as vector-spaces (without enumerating the don't care (X) values). Also, as vectors can be applied to chips at rates near 10^9 vectors/second, large numbers of vectors can be practically used for validation, and when these generated vectors are applied to chips, the behavior invoked is determined by the actual chips and is unaffected by any approximation used during vector generation. Hence, the use of bounding approximations in our framework does not make validation more conservative.

6. Segmentation of process variation envelopes: In order to tackle the profound challenges that increasing process variations pose for the sizes of path sets and vector-spaces, we will develop novel probability-of-occurrence based divide-and-conquer approaches that segment the full global-plus-local variation envelopes into global-only and local-only variation sub-envelopes and can dramatically reduce the expected size of the validation path and test sets.

Table 2.1: Ideas to make existing PDT approaches suitable for delay validation

Idea                                 | Deals with the challenge
Bounding approximations              | Imprecise and inaccurate models etc.
All inclusive vector spaces          | Completeness
Timing dependent logic conditions    | Practical runtime complexity
Vector-spaces on ATE                 | Practical storage and application complexities
Vectors applied to silicon           | No pessimism
Segmentation of variation envelopes  | Increasing influence of variability

2.3.3 Technical objective

The overall objective of the dissertation is to develop the first variability-aware framework for post-silicon delay validation. The proposed framework will target delay marginalities, especially those that might need fixing via redesign to obtain reasonable yield during high volume manufacturing, and will employ the certification-via-elimination-of-likely-deviations paradigm used for testing.
Our framework will be developed to improve the quality of validation by detecting the delay-debilitating marginalities. To achieve these objectives, our framework will comprise basic models and algorithms for generating vectors for validation.

2.4 The preliminaries

In this work, combinational circuits comprised of primitive gates are considered. The worst-case delay refers to the time at which the last possible transition may occur at an output of the circuit, after the application of transitions at its inputs. The last transition may be associated with a falling transition (a clean falling transition, a hazardous transition-to-0, or a hazardous 0) or a rising transition (a clean rising transition, a hazardous transition-to-1, or a hazardous 1).

2.4.1 Terminology and definitions

Maximum falling arrival time (A_X^FL): The latest (largest) time at which a falling transition at line X may reach 50% of the power supply voltage, V_DD (see Figure 2.1). Maximum rising arrival time (A_X^RL), minimum falling arrival time (A_X^FS), and minimum (smallest) rising arrival time (A_X^RS) are similarly defined.

Maximum falling transition time (T_X^FL): The largest duration in which a falling waveform at line X may transit from 90% to 10% of the power supply voltage, V_DD (see Figure 2.1). Maximum rising transition time (T_X^RL), minimum falling transition time (T_X^FS), and minimum rising transition time (T_X^RS) are each defined in a similar manner.

Figure 2.1: Arrival and transition times for a falling transition at line X

Logic value system: Throughout this dissertation we deal with sequences of two vectors, even though for simplicity we often refer to them as vectors. Hence we denote logic values at a line by using a subset of {CF, CR, S0, S1, TF, TR, H0, H1}, where CF stands for clean falling (no hazard), S0 stands for static 0 (no hazard), TF stands for transition to value 0 (dynamic hazards possible), and H0 stands for hazardous 0 (static hazards possible).
CR, S1, TR, and H1 are defined similarly.

Logical path (P): A logical path (P) is a sequence of lines along a circuit path L_1 (a primary input), L_2, …, and L_n (a primary output) and a set of signal transitions Tr_1, Tr_2, …, and Tr_n, where Tr_i ∈ {R, F} and Tr_i represents the signal transition at L_i. The lines L_1, L_2, …, and L_n are called on-path lines. Gates along P are called on-path gates. If a line directly drives an on-path gate but is not an on-path line, it is called a side-input of path P. For example, Figure 2.2 shows one of the ISCAS85 benchmarks, c17. A logical path {3_R, 7_R, 9_F, 10_F, 12_R, 13_R, 16_F} is highlighted. For on-path line 12, line 10 is an on-path input and line 2 is the side-input.

Figure 2.2: A logical path in ISCAS85 benchmark c17

A logical path P = {L_1, L_2, …, L_n} with a corresponding set of transitions {Tr_1, Tr_2, …, Tr_n} is sensitizable if there exists a sequence of two vectors V = <V_1, V_2> that activates the transition Tr_1 at L_1 and propagates that transition along the path to cause Tr_n at L_n. P is said to be a false path if it is not sensitizable. Also, throughout this research, when we say that the delay of a path is excited, it means that a transition at any on-path line, say L_i, is caused temporally (and logically) by the corresponding transition at its on-path fanin, L_{i-1}.

Single Input Switching (SIS): The delay caused due to a transition at one input of a primitive gate is assumed to be independent of the presence or absence of transitions in the same direction at other inputs of the gate (in cases where the effects of the other transitions do not take a dominant role). This is also known as the pin-to-pin delay [201].

Multiple Input Switching (MIS): The delay caused by a transition at one input of a primitive gate is affected by the presence or absence of transitions in the same direction at its other inputs.
These are further classified, based on the type of transitions, as multiple input switching for to-controlling (MIS-TC) transitions and multiple input switching for to-non-controlling (MIS-TNC) transitions. MIS will be covered in more detail in Chapter 3.

Robust path delay test: A path delay test that is guaranteed to detect the delay fault on the targeted path independent of all other delays in the circuit, by disallowing side-input transitions that may cause the final value to occur early at the gate output [30].

Static Timing Analysis (STA): A timing analysis that provides a min-max range for each line in the circuit for both rising and falling transitions [108]. The timing ranges calculated are independent of input vectors and represent bounds on minimum and maximum delays over all pairs of vectors.

2.5 The proposed approach

The key observation that governed the development of our framework is that it must provide the guarantee that we do not miss the worst-case delay for at least (100 - ε)% of fabricated chips, where ε is user-specified and determined by economic considerations, despite all uncertainties in the models, parameter values, and variations.

2.5.1 The problem statement

Given a gate-level netlist of a combinational logic block, we derive new delay models of the components used in this circuit and subsequently develop the first framework which generates two-vector tests, multiple for each target path if needed, such that when applied collectively to chips sampled from the first-silicon batch fabricated for the circuit design, our sets of tests are guaranteed to expose the worst-case delay for each chip that lies in the (100 – ε)% of all chips fabricated, where ε is specified by the user.

2.5.2 The overall approach

Figure 2.3 shows our proposed flow, where the vectors are generated using our ATPG framework on the software front and the delay is measured on ATE (using the generated vectors) on the hardware front.
Thus the delay reported is the actual delay and not an estimated one.

Figure 2.3: The proposed flow

We now propose an approach to deal with the gate-dominated paths in the netlist. The proposed approach has four major components (shown in Figure 2.4):

Figure 2.4: The proposed approach

Resilient delay model: While it is difficult to precisely capture all delay effects for modern CMOS gates, it is possible to derive a resilient delay model that captures all of these effects using practical upper and lower bounds on gates' delays. A resilient delay model will not be invalidated by inaccuracies and variations, as it captures these using bounding approximations. In all existing approaches, the use of a non-resilient delay model contributes to the shortcomings of every major task, namely path selection, vector generation, and timing characterization. Hence, in Chapter 3 we present a method that is guaranteed to capture process variations into advanced delay models [28][29] (by deriving bounds on delay values, obtained by capturing ±k standard deviations (σ) around the mean values (μ), provided that k is sufficiently large to capture at least (100 - ε)% of fabricated chips) to obtain a resilient delay model.

Path selection: A path selection approach that uses the resilient delay model and identifies a set of paths that is guaranteed to include every path (in the selected (100 - ε)% of fabricated chips) that may potentially cause a timing error, under the condition that the accumulated value of additional delays along circuit paths is upper bounded by a specified limit (∆). Note that ∆ will be computed in accordance with the requirement imposed by ε. We obtain a path selection approach which provides such guarantees by adapting an existing approach [80] to use the abovementioned resilient delay model.
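The selection rule above can be illustrated with a small sketch. This is only an illustration under assumed data, not the actual algorithm adapted from [80]: each path carries the [lower, upper] delay bounds produced by a resilient delay model, and a path must be kept whenever its upper-bound delay plus the accumulated extra-delay budget ∆ can reach the clock period.

```python
# Illustrative sketch only: the path delay bounds (in ns) are hypothetical.
def select_paths(path_bounds, t_clk, delta):
    """Keep every path whose upper-bound delay, inflated by the extra-delay
    budget delta, can reach the clock period t_clk."""
    return sorted(name for name, (lo, hi) in path_bounds.items()
                  if hi + delta >= t_clk)

paths = {"P1": (0.80, 0.95), "P2": (0.70, 0.88), "P3": (0.40, 0.55)}
print(select_paths(paths, t_clk=1.0, delta=0.15))  # P1 and P2 kept, P3 pruned
```

Note that ranking-based selection would keep only the single top path, whereas the bound-plus-budget rule keeps every path that could plausibly violate timing on some fabricated chip.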
Timing and logic conditions for guaranteed detection: With a resilient delay model, it is no longer guaranteed that a vector satisfying the conditions of a robust test will invoke the worst-case delay. Hence, we derive new conditions – which we call MDS (Maximum Delay Sensitization) – such that vectors that satisfy these conditions are guaranteed to invoke the worst-case delay of the selected path in the selected (100 - ε)% of fabricated chips.

Vector generation via selective enumeration: The above conditions and the bounding delay models typically can only partially order two-pattern tests in terms of the delay they might invoke in silicon. Using this partial ordering (based on the worst-case delay invoked) among the logic conditions, we develop an innovative search algorithm to arrive at a set of multiple vectors that are collectively guaranteed to resiliently invoke the maximum delay of the target path in the selected (100 - ε)% of fabricated chips. In Chapter 5 we present a method for vector generation via selective enumeration that guarantees to expose the worst-case delay of a fabricated chip.

Thus, Chapter 3 on the delay model and Chapter 5 on vector generation present the crux of our proposed approach. In Chapter 6 we present a divide-and-conquer based method to efficiently generate the vectors for validation. The generated vectors, when applied collectively, are guaranteed to expose the worst-case delay of a fabricated chip at drastically reduced test generation and test application time.

CHAPTER 3
CAPTURING VARIABILITY IN GATE DELAY MODELS

In nano-meter technologies, many phenomena affect gate delays.
Our analysis, and others' recent experiments on actual chips (see Section 2.2), have demonstrated that these effects have become so significant that the pin-to-pin delay models, used by almost every gate-level timing analysis tool, are inadequate for current and future technologies.

3.1 Post-silicon tasks require timing

Timing simulation and analysis have become central to any design flow – from design specification to high volume manufacturing (see Table 3.1). On the pre-silicon front, circuit timing simulation (with fully specified vectors) and static timing analysis (with fully unspecified vectors) have become imperative for design tasks such as retiming and synthesis. On the post-silicon front, the vectors (generated based on pre-silicon timing analysis, which is typically invoked a large number of times during vector generation with fully specified, partially specified, and unspecified vectors) are applied for tasks such as delay validation, diagnosis, testing, and speed binning. It is usually desired that the post-silicon delay estimates closely match the pre-silicon ones, but unfortunately this seldom happens in industrial practice [45][185][26].

Table 3.1: Tasks that require timing

Task                                                               | Type         | Vector-based (fully / partially / unspecified)
Circuit simulation                                                 | Pre-silicon  | Y / – / –
Static Timing Analysis                                             | Pre-silicon  | – / – / Y
Retiming and synthesis                                             | Pre-silicon  | Y / – / Y
Vector generation (validation, diagnosis, testing, speed binning)  | Pre-silicon  | Y / Y / Y
Vector application (validation, diagnosis, testing, speed binning) | Post-silicon | Y / – / –

Delay testing is the basic technique used for identifying slow paths and slow ICs. Slightly different variants of delay testing are used during validation, diagnosis, and characterization of the first-silicon for a new design. It is also used after the design moves into high volume manufacturing, during delay testing and speed binning [26].
One important common characteristic of all the above post-silicon tasks is that they are vector based and require us (a) to generate vectors that will provoke worst-case delays, (b) to evaluate given vectors in terms of their ability to excite high delays, and (c) to analyze vectors that fail tests at high speeds to identify the root causes – namely slow sub-paths or gates, and so on.

Pre-silicon delay models are the foundation of all variants of delay testing for the above post-silicon tasks, since these tasks use pre-silicon models to select/prioritize paths based on their delays, to generate suitable vectors, and to analyze given vectors. As the fabrication process moves into the nano-scale, the importance of many delay phenomena [28][29] and the levels of process variations [10][128][194][93] are growing. These two facts are making delay models from the recent past increasingly inaccurate (i.e., unable to capture emerging delay phenomena) and non-resilient (i.e., invalidated by process variations). Delay model inaccuracy and non-resilience are the two main reasons behind the limitations of the silicon studies on characterization of timing behavior using fabricated chips from nVidia, Freescale, and Sun (now Oracle) [45][185][26] (see Section 2.2). Moreover, an inaccurate and non-resilient delay model can diminish validation quality; it can also increase the number of vectors generated and hence increase validation costs. Similarly, such a delay model can decrease the resolution of delay diagnosis and hence increase the costs of redesign and reduce the confidence in speed binning.

All the above post-silicon tasks require delay models that are accurate and resilient. Any vector generation approach – for validation, delay testing, and speed binning – starts with a completely unspecified vector and specifies additional bits of the vector. Since only fully-specified vectors can be applied to actual silicon, all post-silicon tasks are carried out using fully-specified vectors.
Hence, post-silicon tasks require a delay model that can work with fully- and partially-specified as well as fully-unspecified vectors. Finally, since the delay engine is called frequently during vector generation and simulation in a timing-oriented framework, all post-silicon tasks require delay models that have low computational complexities.

3.2 Existing delay models: A review

Delay models of gates and wires are used by static timing analysis (STA) to estimate the performance of a chip before releasing its design for fabrication (tapeout). The correlation of the STA results with measurements on actual silicon is primarily determined by the accuracy of the delay calculations [93]. It is imperative for a gate's delay model to accurately represent the logic as well as the timing behavior of the gate. A basic delay model considers the basic delay determinants of the gate, such as input slew and output load. These models are extended to derive advanced delay models that capture additional phenomena associated with timing behavior. The timing behaviors are often classified into three categories – single input switching (SIS), multiple input switching for to-controlling (MIS-TC) transitions, and multiple input switching for to-non-controlling (MIS-TNC) transitions. It is now increasingly common for delay models to also account for additional effects, such as crosstalk, ground bounce, and variability [93].

MIS-TC transitions at the inputs of a primitive gate decrease the gate's delay due to the activation of multiple charge/discharge paths [28]. On the other hand, MIS-TNC transitions at the inputs increase the gate's delay due to the Miller effect [29]; in this case, the gate's delay also depends on the initial state of the capacitances of internal nodes between series transistors, the body effect, and impedance matching (history or stack effects) [29][5].
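The opposing MIS-TC/MIS-TNC trends can be caricatured in a few lines of code. The numbers below are invented for illustration and are not the characterized data of [28][29]; the point is that the effect is strongest at zero skew and fades once the two input transitions no longer overlap.

```python
def mis_delay(d_sis, skew, window=0.1, to_controlling=True):
    """Toy MIS delay model: d_sis is the single-input-switching delay (ns),
    skew is the separation between the two input transitions (ns). MIS-TC
    speeds the gate up, MIS-TNC slows it down; both effects vanish once
    |skew| >= window (all parameters here are hypothetical)."""
    overlap = max(0.0, 1.0 - abs(skew) / window)  # 1 at zero skew, 0 outside
    factor = -0.3 if to_controlling else 0.5      # -30% (TC) / +50% (TNC)
    return d_sis * (1.0 + factor * overlap)

print(mis_delay(0.10, 0.0, to_controlling=True))   # fastest: simultaneous TC
print(mis_delay(0.10, 0.0, to_controlling=False))  # slowest: simultaneous TNC
print(mis_delay(0.10, 0.5, to_controlling=False))  # large skew: back to SIS
```

The ±30%/±50% factors were chosen to mirror the 65nm observations reported later in Section 3.3.3, where SIS overestimates MIS-TC by about 30% and underestimates MIS-TNC by about 50%.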
Figure 3.1 shows the delay vs. skew for near-simultaneous transitions at the inputs of a 2-input NAND gate in 65nm CMOS technology, both to-controlling (MIS-TC) and to-non-controlling (MIS-TNC).

Figure 3.1: Delay vs. skew curves for MIS [28][29]

In current practice, delay models are categorized into two types – voltage reference models (VRM) and current source models (CSM) (see Figure 3.2). VRMs define the characteristics of the voltage response at the gate output as a function of input slew and output load using look-up tables and interpolation [48], whereas CSMs use a non-linear voltage-controlled current source to determine the output delay and slew via circuit simulation [5].

Figure 3.2: Delay model categorization [48][5]

MIS-TC is a well-researched phenomenon and has been considered by many delay models and timing analysis approaches ([162][167] for VRMs, and [93][6] for CSMs). MIS-TNC is particularly important since the associated effects (Miller effect, body effect, and stack effects) can increase gate delays. The models in [28][29] (for VRM) and [5] (for CSM) are the first ones to combine MIS-TC and MIS-TNC (see Table 3.2).

Table 3.2: Review of existing delay models

Reference         | VRM | CSM | SIS | MIS-TC | MIS-TNC | Variation-aware
[28][29]          | Y   |     |     | Y      | Y       | N
[47][70][176]     |     | Y   | Y   |        |         | N
[162][167]        | Y   |     |     | Y      | N       | N
[5]               |     | Y   |     | Y      | Y       | N
[6]               |     | Y   |     | Y      | N       | N
[39]              |     | Y   | Y   |        |         | N
[60][69][171]     |     | Y   | Y   |        |         | Y
[2][67][158][183] | Y   |     |     | Y      | N       | Y

Although industry has migrated from VRMs (e.g., SPDM, the Scalable Polynomial Delay Model (Synopsys)) to CSMs such as ECSM (Cadence) and CCS (Synopsys) for pre-silicon design-oriented tasks, namely retiming and synthesis, because of the latter's ability to handle complex waveform shapes [70][93], VRMs are preferred over CSMs for post-silicon tasks due to the associated lower characterization efforts and lower computational complexities (see Section 3.3.5), and advanced delay models, such as those presented in [28][29], are imperative for accurate timing tools [27].
But with the increase in process variations [10][128][194], even these delay models have become inadequate for post-silicon tasks for high performance circuits [45][185][26].

CSM-based models are good for timing analysis only when the input vectors are completely specified. Hence, these models cannot be used for test generation and other tasks which deal with partially specified vectors [27]. To the best of our knowledge, existing CSM-based approaches for timing analysis [171][72] are essentially SPICE simulation replacements, which cannot work efficiently for generic static timing analysis [195] to calculate arrival and transition time ranges when the vectors are completely unspecified.

The growing effect of variability on delay has fueled the development of statistical delay models [12]. Statistical delay models for CSMs [60][69][171] do not capture MIS effects; those for VRMs consider MIS-TC ([2][67][158][183]) but ignore MIS-TNC and the associated phenomena. Statistical delay models are unsuitable for post-silicon tasks because of their inability to handle correlation efficiently and due to the enormous complexity associated with using realistic (non-Gaussian) distributions in SSTA [12]. Even if correlations are ignored and simplistic assumptions of independent normal Gaussian distributions are made, these delay models cannot capture all the associated delay phenomena as well as variability at practical complexity [12]. Moreover, these delay models report the delay as a distribution rather than in terms of bounds – the format easily understood by existing timing analysis tools [195]. Bounds can be considered a special (truncated) case of distributions, but statistical max and sum operations on these truncated distributions will not only increase the computational complexity, but will also make the bounds progressively more and more inaccurate [12].
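A short sketch shows why bound-based (min-max) timing composes so cheaply compared with distribution-based SSTA: interval "sum" along a path and interval "max" at a fan-in are exact, constant-time operations, whereas the max of non-Gaussian distributions has no comparably simple closed form. The delay bounds below are hypothetical, given in picoseconds.

```python
def iv_sum(a, b):
    """Arrival-time bound a propagated through a gate with delay bound b."""
    return (a[0] + b[0], a[1] + b[1])

def iv_max(a, b):
    """Latest-arrival bound at a gate with two input arrival bounds."""
    return (max(a[0], b[0]), max(a[1], b[1]))

arr_x, arr_y = (0, 0), (100, 300)   # input arrival bounds (ps), hypothetical
gate = (50, 90)                     # resilient delay bounds of the gate (ps)
out = iv_sum(iv_max(arr_x, arr_y), gate)
print(out)  # (150, 390): output arrival bounds
```

Both operations preserve the guarantee that the true arrival time of every chip inside the variation envelope stays within the propagated interval.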
Also, SSTA based on statistical delay models is inefficient for post-silicon tasks due to its inability to take advantage of the additional timing-related information associated with partially- or fully-specified vectors. Moreover, all existing variation-aware delay models are analytical in nature and tend to become more complex and less accurate as device dimensions shrink below 65nm [70]. All this necessitates a resilient delay model that can be suitably used for post-silicon delay characterization.

Any vector generation approach for post-silicon tasks such as testing or validation starts with a preprocessing step where the given netlist is converted to a format in which complex gates are decomposed into primitive gates. Also, in Chapter 5 we have motivated why path delay test (PDT) is a suitable candidate for generating vectors for post-silicon validation of gate-dominated paths. Such paths generally consist of more than 15-20 logic gates [45] and frequently occur in large microprocessor blocks [129], which can be either huge data-path blocks (e.g., an ALU [178]) or control blocks (such as an instruction decoder [159]). (Wire-dominated paths [26], such as global wires, bus lines, etc., are not considered here.) Hence, post-silicon delay model characterization for PDT needs to be performed for only a few primitive gates (AND/OR/NAND/NOR, etc.), and the associated characterization effort is much less than a comprehensive full library characterization (generally done for pre-silicon delay-related tasks) [6][77].

3.3 Developing the resilient delay model

Our complete delay validation framework (including delay models, timing analysis, and target selection) is designed to guarantee that we do not miss the worst-case in the selected (100 – ε)% of all fabricated chips, where ε is specified by the user (from Section 2.3.2), despite all model, parameter, and variation uncertainties.
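The interplay between k and ε used throughout this framework can be checked numerically. The sketch below assumes a Gaussian delay distribution purely for illustration (as noted above, real distributions need not be Gaussian): the fraction of a Gaussian population within μ ± kσ is erf(k/√2).

```python
import math

def coverage_pct(k):
    """Percentage of a Gaussian population falling within mu +/- k*sigma."""
    return 100.0 * math.erf(k / math.sqrt(2.0))

def smallest_k(eps, step=0.1):
    """Smallest k (on a coarse grid) capturing at least (100 - eps)% of chips."""
    k = 0.0
    while coverage_pct(k) < 100.0 - eps:
        k += step
    return round(k, 1)

print(round(coverage_pct(3.0), 2))  # 99.73: +/- 3 sigma covers ~99.73%
print(smallest_k(1.0))              # 2.6: suffices for eps = 1%
```

In other words, a modest k already captures the (100 - ε)% of chips that the economics of fabrication requires, which is what makes ±kσ delay bounds practical.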
Also, we would like to restate that the timing ranges calculated by the resilient delay model are tight except for the inherent sources of looseness – partially specified vectors, variability, and the looseness associated with the approximations used to reduce complexity.

3.3.1 The delay modeling approach

An inaccurate and non-resilient delay model can diminish validation quality, and at the same time it can increase the number of vectors generated and hence increase validation costs. Hence, we need an accurate and resilient delay model for post-silicon tasks that has the following characteristics:
• Captures known and emerging gate delay phenomena [28][29] as well as variability.
• Only uses bounding approximations to tackle unknowns and any simplifications necessary to make complexity manageable.
• Enables computation of tight timing ranges in an efficient manner (manageable complexity) by allowing a timing analyzer to use all available information about logic values [27] (e.g., a logic value at a line internal to a combinational logic block that can be proven to be a necessary condition for the task at hand, a given partially-specified vector applied at the inputs of the block, and so on).

3.3.2 Characterization setup

Currently, the prevalent commercial STA tools employ pin-to-pin delay models [195]. Able to consider input slew and output load, pin-to-pin delay models specify the propagation delay from a given input pin to a given output pin. They implicitly consider the state of internal capacitances, since only one such state is possible for each single input transition. But they do not accurately capture the delay for multiple input switching (both MIS-TC and MIS-TNC) and thus are unsuitable for post-silicon tasks.
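A pin-to-pin model of the kind just described is, at its core, a slew × load lookup table with interpolation. The sketch below uses invented table values (not from any real library) to show the mechanics: a grid point reproduces its table entry, and off-grid points are bilinearly interpolated.

```python
slews = [0.05, 0.10, 0.20]          # input slew grid (ns), hypothetical
loads = [1.0, 2.0, 4.0]             # output load grid (FO units), hypothetical
table = [[0.040, 0.055, 0.080],     # delay (ns): rows = slew, cols = load
         [0.050, 0.065, 0.095],
         [0.070, 0.090, 0.125]]

def _bracket(axis, x):
    """Return index i and fraction t with axis[i] <= x <= axis[i+1]."""
    for i in range(len(axis) - 1):
        if axis[i] <= x <= axis[i + 1]:
            return i, (x - axis[i]) / (axis[i + 1] - axis[i])
    raise ValueError("outside characterized range")

def pin_to_pin_delay(slew, load):
    """Bilinear interpolation into the characterized delay table."""
    i, t = _bracket(slews, slew)
    j, u = _bracket(loads, load)
    return ((1 - t) * (1 - u) * table[i][j] + (1 - t) * u * table[i][j + 1]
            + t * (1 - u) * table[i + 1][j] + t * u * table[i + 1][j + 1])

print(pin_to_pin_delay(0.10, 2.0))   # a grid point returns its table entry
print(pin_to_pin_delay(0.075, 1.5))  # an off-grid point is interpolated
```

Note what the table cannot express: there is no axis for the skew of a second switching input or for the internal-node state, which is precisely why such models miss MIS-TC, MIS-TNC, and stack effects.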
The multiple-input-switching-aware delay models in [28][29] only use qualitative information, such as causality and provable properties of the underlying physics, particularly the effect of near-simultaneous transitions (MIS-TC and MIS-TNC), and hence are used as the starting points. The modeling approach in [28][29] consists of:
• Perform simulations by varying all input parameters (input slews, input skews, and the initial state of internal capacitances) over their typical ranges.
• Quantify the significance of associated delay phenomena for simultaneous transitions from the simulation data.
• Identify appropriate input waveforms that activate each phenomenon.
• Develop an empirical model that identifies input waveforms that excite these phenomena.

The characterization setup shown in Figure 3.3 can be used to feed the gate with different realistic waveforms with different attributes. The standard delay characterization methodology requires that the gate driving a capacitive load itself be driven by a load-less cell [161]. Thus, the parameters (C_1, S_in, and inverter chain length) of the driver circuit can be changed to generate waveforms with different transition times and skews (between pairs of waveforms). The voltage at node N_01 is copied to node N_11 to ensure the load-less driver requirement of delay characterization [161]. Note that S_in1, responsible for the voltage at N_01, is an ideal voltage source, but S_m1, producing N_11, is a controlled voltage source with voltage V(N_11) = V(N_01). Switch SW_1 can be used to obtain the mirrored (SW_1 = 1) or inverted (SW_1 = 0) waveform. We also varied the initial states of the internal capacitances, C_int, as either precharged or pre-discharged, in our characterization step to account for the stack effect [5][29] (not shown in Figure 3.3). We varied C_L to account for different capacitive loads on the gate's output.

Figure 3.3: Characterization setup without considering the effect of interconnect.
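The sweep of driver and load parameters in this setup can be sketched as a simple grid enumeration. The specific parameter values below are illustrative; in the actual flow each point would be handed to a circuit simulator such as Spectre, which is external to this sketch:

```python
from itertools import product

def characterization_grid(slews_ns=(0.10, 0.15, 0.20),
                          skews_ns=tuple(round(-0.5 + 0.1 * i, 1) for i in range(11)),
                          loads=("FO1", "FO2", "FO3", "FO4"),
                          cint_states=("precharged", "predischarged")):
    """Enumerate (T_R^X, T_R^Y, skew, load, internal-cap state) simulation points.

    Each dict describes one transient run of the Figure 3.3 setup; the
    simulator invocation itself is omitted here."""
    return [
        {"TxR": tx, "TyR": ty, "skew": dk, "CL": cl, "Cint": st}
        for tx, ty, dk, cl, st in product(slews_ns, slews_ns, skews_ns, loads, cint_states)
    ]

grid = characterization_grid()
# 3 slews x 3 slews x 11 skews x 4 loads x 2 Cint states = 792 points
print(len(grid))
```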
This characterization setup can be extended to account for interconnect delay using the approach in [77] (see Figure 3.4). The basic difference between Figure 3.3 and Figure 3.4 is the use of a π model of the load instead of a simple capacitive load at the driver.

Figure 3.4: Characterization setup considering the effect of interconnects

3.3.3 Basic delay model

In order to quantify the significance of the associated delay phenomena and to capture the effects of MIS on delay, for a NAND gate in a 65nm CMOS technology we performed circuit-level simulations by varying the input transition times (T_R) from 0.1ns to 0.2ns, the skew (δ) from -0.5ns to +0.5ns, and the output load (C_L) from FO1 to FO4. (We used the Cadence Spectre simulator for all our simulations.) We also set C_int to 0V and V_dd in different simulations to account for the stack effect [5]. We arrived at the delay vs. skew relationship for near-simultaneous transitions (MIS) at the inputs of a 2-input NAND gate. Figure 3.5 shows the delay vs. skew relationship (for T_R^X = 100ps, T_R^Y = 100ps, C_L = FO4) for MIS. Note that our characterization produces such sets of curves for other combinations of T_R^X and T_R^Y values too. It is clear from Figure 3.5 that multiple input switching can significantly affect delay – SIS overestimates MIS-TC by about 30% and underestimates MIS-TNC by about 50% for the cases shown here for 65nm CMOS technology. Similarly, output transition time functions are derived for both MIS-TC and MIS-TNC.

Figure 3.5: Delay vs. skew curve for near-simultaneous transitions (basic delay model – simulation results)

The curves, such as in Figure 3.5, capture all the known basic delay phenomena associated with MIS. MIS-TC at the inputs of a primitive gate decreases gate delay due to activation of multiple charge/discharge paths [28].
On the contrary, MIS-TNC at the inputs of a primitive gate increases the gate delay [29] due to a combination of various effects such as:
• Short-circuit current – gate outputs start to switch when the current that charges the output is smaller than the current that pulls it down.
• Initial state of internal capacitances (precharged and pre-discharged) – delay is a function of the charge on internal capacitances, also known as the stack/history effect [5].
• Miller effect – charge is transferred from gate inputs to gate outputs and slows down the output transitions.
• Body effect – affects the threshold voltage of a transistor, which in turn affects delay.
• Stopping early discharge – two skew-delay curves may have the same pin-to-pin delay but different simultaneous (MIS related) delay.
• Impedance matching – a better match between pull-down transistors can reduce the gate delay.

Figure 3.6 shows the various delay phenomena associated with MIS-TNC for a two-input NAND gate in 180nm technology [29]. Though not reported here, we observed all these phenomena (with similar trends) in our experiments with 65nm CMOS technology too.

Figure 3.6: Various phenomena associated with MIS-TNC [29]

3.3.4 Timing functions

Given the arrival times and transition times at a gate's inputs, and the initial state of internal capacitances, we compute timing functions for gate delays and output transition times [28][29]. The output arrival time and transition time are computed from the above data in our advanced timing analyzer, ETA (more in Section 3.3.8).

Figure 3.7: (a) Rise delay function, (b) fall delay function

Figure 3.7 shows the delay timing functions for a 2-input NAND gate where all inputs of the gate have either steady non-controlling values or transitions in one direction. Similarly, output transition time (rise and fall) functions can be obtained. Here, N_c represents the number of internal capacitances which are precharged to V_dd − V_th, and δ = A_Y − A_X is the skew.
Gate delays are represented by the following timing functions:
• MIS-TC: Rise delay function d_R^Z(T_F^X, T_F^Y, δ_Y,X).
• MIS-TNC: Fall delay function d_F^Z(T_R^X, T_R^Y, δ_Y,X, N_c).

The timing functions are then approximated by a piecewise linear model using empirical equations obtained via curve fitting, as shown in Figure 3.8, where the accuracy of the model can be traded off against the complexity of its development and of its usage in the subsequent timing analysis. For example, the output fall delay function d_F^Z(T_R^X, T_R^Y, δ_Y,X, N_c) is approximated as a piecewise linear function through three coordinates (D0, S0), (D1, S1), and (D2, S2). Here Di and Si represent the values of delay and skew corresponding to the three coordinates (i = 0, 1, and 2) and are in turn functions of the parameters T_R^X, T_R^Y, δ_Y,X, and N_c themselves, e.g., D0 = A_0*(T_R^X)^2 + B_0*T_R^X*T_R^Y + C_0*(T_R^Y)^2 (for N_c = 1), where A_0, B_0, and C_0 are empirically determined constants. Note that any approximation we use to reduce complexity ensures that our model bounds the actual delay in silicon.

Figure 3.8: Piecewise linear approximation for near-simultaneous transitions (basic delay model)

3.3.5 Incorporating variability and bounding approximations

A single curve combining the two cases (shown in Figure 3.5) cannot capture all inaccuracies and variations. Hence, we capture the inaccuracies and variations in these delay parameters using bounding approximations. We consider process variations in terms of the parameters of the devices in the gates, such as V_th, L_eff, etc. Using the values of variations (from a foundry that fabricates chips in 65nm technology) for most of the circuit parameters (including the major ones of V_th, L_eff, etc.), we perform Monte Carlo simulations to obtain a version of each delay curve for each circuit instance generated via Monte Carlo.
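The derivation of bounding curves from the Monte Carlo constellation amounts to a pointwise min/max over all per-instance curves sampled at the same skew points. A minimal sketch with toy data in place of simulated delay-vs-skew curves:

```python
def bounding_envelopes(curves):
    """Pointwise min/max over n Monte Carlo delay-vs-skew curves.

    `curves` is a list of lists, one delay value per skew sample; the two
    returned envelopes bound every curve in the constellation by construction."""
    lower = [min(vals) for vals in zip(*curves)]
    upper = [max(vals) for vals in zip(*curves)]
    return lower, upper

# Three toy Monte Carlo instances sampled at the same three skew points (ps).
mc = [[50, 80, 52],
      [55, 90, 58],
      [48, 85, 50]]
lo, hi = bounding_envelopes(mc)
print(lo, hi)   # [48, 80, 50] [55, 90, 58]
```

By construction no simulated instance ever falls outside the pair (lo, hi), which is exactly the resilience property the model needs.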
Note that the two sets of curves in Figure 3.9 are two of the n sets of min and max delay curves obtained over all Monte Carlo iterations (say n); the remaining n − 2 sets of curves are omitted for clarity.

Figure 3.9: Delay vs. skew curve for near-simultaneous transitions (resilient delay model – simulation results)

Subsequently, we derive the two envelopes that bound the constellation of curves representing the gate delay under variability, as depicted by the two outer envelopes in Figure 3.10. The relationship of the min and max curves in Figure 3.9 can be easily represented as a pair of three/four-point piecewise linear approximations (curves with big squares at each data point in Figure 3.10). The two envelopes obtained (curves with big squares) represent the bounding approximations for the resilient delay model.

Figure 3.10: Resilient delay model – 3/4 point linear bounding

The tightness of the bound determines the cost-benefit of the subsequent steps of timing analysis, path selection, and vector generation. The relationships between output transition times and skew for both MIS-TC and MIS-TNC are derived in a similar manner. Also, for each timing function in [28][29] we now have two sets of functions corresponding to the upper and lower bounds.

Figure 3.11: Resilient delay model – pin-to-pin bounding

We would also like to note that, though characterization of the delay model is originally done for 11 skew points from -500ps to +500ps in steps of 100ps, we only store the 3/4 skew points corresponding to the three/four-point approximation and hence do not increase the characterization effort significantly (see Section 3.3.6). The basic pin-to-pin delay model looks like a step function with a step at zero skew, where the size of the step captures the difference between the pin-to-pin delays for the two inputs. Another way to bound the resilient delay model is to use the simple pin-to-pin delay model to derive the bounds (Figure 3.11).
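At analysis time, the stored 3/4-point envelopes can be evaluated by piecewise-linear interpolation between the retained skew breakpoints. The breakpoint values below are illustrative, not characterized data:

```python
def pwl(skew, points):
    """Piecewise-linear interpolation through sorted (skew, delay) breakpoints;
    saturates at the end breakpoints outside the characterized skew range
    (the single-input-switching regime)."""
    points = sorted(points)
    if skew <= points[0][0]:
        return points[0][1]
    if skew >= points[-1][0]:
        return points[-1][1]
    for (s0, d0), (s1, d1) in zip(points, points[1:]):
        if skew <= s1:
            return d0 + (skew - s0) * (d1 - d0) / (s1 - s0)

def bounded_delay(skew, lo_pts, hi_pts):
    """Return the (min, max) delay pair from the two stored envelopes."""
    return pwl(skew, lo_pts), pwl(skew, hi_pts)

# Illustrative 3-point envelopes for an MIS-TNC-style fall delay (ps):
# delay peaks near zero skew and relaxes toward the pin-to-pin value.
lo = [(-200.0, 45.0), (0.0, 70.0), (200.0, 45.0)]
hi = [(-200.0, 60.0), (0.0, 95.0), (200.0, 60.0)]
print(bounded_delay(100.0, lo, hi))   # (57.5, 77.5)
```

Only the few breakpoints per envelope are stored; everything between them is recovered by interpolation, which is what keeps the storage and timing analysis costs low.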
It should be noted that in cases such as those for MIS-TC and MIS-TNC, the step function of the pin-to-pin delay model reduces to a flat line, just like the traditional corner-based worst-case delay model. Table 3.3 shows the superiority and tightness of our bounding approximations with respect to simple pin-to-pin bounding (over all vectors). Here, overestimation of delay by an approach indicates that the delay reported by the approach is greater than the actual maximum delay or less than the actual minimum delay.

Table 3.3: Tightness of bounding approximations (overestimate max error, %)

  Type of bounds | MIS-TC max delay | MIS-TC min delay | MIS-TNC max delay | MIS-TNC min delay
  Pin-to-pin     | 94.2%            | 64%              | 45.4%             | 50.4%
  3/4 point      | 4.4%             | 5.3%             | 5.1%              | 5.5%

Our proposed 3/4 point bounds are automatically tighter (maximum error of about 5%, compared to 90% for bounds based on the industry-prevalent pin-to-pin models) because of our objective of emulating reality as closely as possible at low complexity. Since the resilient delay model will be used for post-silicon tasks, it is imperative that it capture the worst case with minimum looseness so that the subsequent post-silicon tasks do not explode in complexity (more in Section 3.4).

3.3.6 Complexity trade-offs

In Table 3.4 we compare our VRM-based approach with the CSM-based approach of [5] (which captures most of the known phenomena associated with gate delays) in terms of characterization and usage complexities. The values in Table 3.4 are for the delay of a single two-input NAND gate characterized for 5 values of input slew and 5 values of output load. Suppose that for our approach we characterized 7 values of skew for MIS (both MIS-TC and MIS-TNC) and 2 values of the state of internal capacitance for MIS-TNC. The total number of transient simulations for our approach is then 5*4*7*(2+1) = 420. The corresponding value for the CSM-based approach [5], with the input waveform sampled at 7 points, is 4*7^4 = 9,604 transient and 9,604 DC simulations.
We consider about 10,000 Monte Carlo simulations [10] for characterization with full variability, so the corresponding characterization effort for our approach with variability is 4.2 x 10^6. The statistical approach of CSM [60] requires the sensitivity of each parameter of variation to be calculated separately. Given that for the library under consideration we have about 50 such parameters, we arrive at 9.6 x 10^9 simulations (DC + transient). We store the curves for MIS-TC and MIS-TNC as three- and four-point approximations respectively. Hence, in table form we need to store 140 points, which can be further optimized, in an equation format obtained by curve-fitting techniques, to only 7 points. For variability, we store the two bounding curves for each combination of input slew and output load; hence the corresponding numbers of points for the table and equation forms are 280 and 14 respectively. On the other hand, the CSM model needs to store the output current, Miller capacitances, and output and input capacitances as functions of input and output voltages, which requires about 48,000 points to be stored in table form. This can be reduced by about 90%, to equation form, at the cost of some accuracy (about 8%), using the compression techniques proposed in [76]. The statistical model [60] of CSM requires individual sensitivity tables, and thus the storage complexity increases further.
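The simulation counts quoted above follow from simple products of the sweep factors as they appear in the text's formulas (note the formula for our approach uses 4 load values):

```python
# Simulation-count arithmetic for per-gate characterization, per the text.
SLEWS, LOADS, SKEWS = 5, 4, 7
CAP_CASES = 2 + 1                      # 2 Cint states for MIS-TNC + 1 MIS-TC case
vrm = SLEWS * LOADS * SKEWS * CAP_CASES          # 420 transient simulations (VRM)
csm = 4 * 7 ** 4                                 # 9,604 transient (+ 9,604 DC) for CSM

MC_RUNS, VAR_PARAMS = 10_000, 50
vrm_var = vrm * MC_RUNS                          # 4.2e6 with full variability
csm_var_total = 2 * csm * MC_RUNS * VAR_PARAMS   # ~9.6e9 (DC + transient)
print(vrm, csm, vrm_var, csm_var_total)
```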
Table 3.4: Complexity trade-offs: VRM vs. CSM for a 2-input NAND gate

                                                         Nominal           Full variability
                                                         VRM   | CSM     | VRM        | CSM
  Characterization complexity (number of simulations performed)
    Transient simulations                              | 420   | 9,604   | 4.2 x 10^6 | 4.8 x 10^9
    DC simulations                                     | 0     | 9,604   | 0          | 4.8 x 10^9
  Characterization complexity (storage size)
    Table form                                         | 140   | 48,020  | 280        | 2.4 x 10^6
    Equation form                                      | 7     | 4,800   | 14         | 2.4 x 10^5
  Usage in timing analysis complexity (number of operations performed)
    Delay calculation                                  | 15    | 350     | 15         | 17,000

Our current approach to characterization requires about 7 simulations over various skews for each two-input gate with a single load. We can let go of some accuracy to reduce simulation effort by considering a bound on the near-simultaneous range as a function of the maximum pin-to-pin delay. For example, consider the near-simultaneous region bounded by twice the pin-to-pin delay on either side of zero skew between near-simultaneous transitions. In such a scenario, the characterization effort for MIS just requires storage at one additional point – the zero skew – and the simulation runs reduce from 420 to 180. Also, we have observed the two bounding envelopes to be highly correlated, and this property can be used to reduce storage complexity: only one envelope is stored and the other is derived from it.

Table 3.5: Complexity trade-offs: Our approach vs. pin-to-pin bounding

                                                         Nominal                  Full variability
                                                         3/4 point | Pin-to-pin | 3/4 point  | Pin-to-pin
  Characterization complexity (number of simulations performed)
    Transient simulations                              | 420       | 420        | 4.2 x 10^6 | 4.2 x 10^6
  Characterization complexity (storage size)
    Equation form                                      | 7         | 1          | 14         | 2
  Usage in timing analysis complexity (number of operations performed)
    Delay calculation                                  | 15        | 1          | 15         | 1

Characterization is a non-recurring cost, but timing analysis runtime is a huge recurring cost for test generation. Hence, we evaluated the complexity of delay calculation for a fully specified vector by counting the number of floating-point operations performed.
We assign weights of 1, 2, and 5 to addition, multiplication, and division respectively. The bottom row of Table 3.4 clearly indicates that our approach, though of diminished accuracy, is much more runtime-efficient than CSM methods [5][60]. Later, in Section 3.4, we will show that timing analysis for unspecified and partially specified vectors using CSM is highly impractical.

Table 3.5 shows the complexity trade-off of our 3/4 point approximation vs. pin-to-pin bounding (based on the pin-to-pin delay model prevalent in industry-standard STA tools). As can be seen, the characterization effort in terms of simulations performed is identical to that of our approach (as all the known effects must be characterized for completeness). Though the storage and timing analysis complexity reduces drastically, the inherent looseness in the bounds (50% to 90% for a single gate) calculated by this approach (see Table 3.3) and the enormous increase in the complexity of post-silicon tasks (see Section 3.4.2 and Chapter 5) render pin-to-pin bounding (also known as the worst-case delay model) impractical for post-silicon delay related tasks.

3.3.7 Extended model

The proposed resilient delay model can handle different numbers of inputs and input positions, and can be extended to handle more than two simultaneous transitions using the approaches from [29]. The extended delay model, though more accurate, needs more cases to be enumerated, and the corresponding characterization effort increases. Also, the timing analysis framework needs to consider more timing cases, and thus the runtime complexity explodes. A workaround is to decompose the gates in the circuit into 2-input gates for delay calculation for post-silicon delay related tasks.

3.3.8 Application of delay model

The aforesaid delay model can be used for the following three pre-silicon tasks targeting post-silicon delay related activities.
3.3.8.1 Enhanced timing analysis

Given the arrival and transition times at a gate's inputs, we calculate the corresponding quantities at the gate's outputs. Figure 3.12 shows the possible input combinations for a rising/falling transition at the output.

Figure 3.12: Possible input combinations for (a) output rising transition, (b) output falling transition.

We enhanced the approach in [27] for both MIS-TC and MIS-TNC: by using the bitonic relationship between delay and input transition time and exploring the possible transition times within the min-max range in the input transition time vs. delay curves, one arrives at the equations for output arrival times (see Figure 3.13) and output transition times.

Figure 3.13: Calculation of output arrival times in ETA using the resilient delay model.

Hence, we can use these delay models to compute the output arrival time and output transition time ranges in a static, vector-unaware manner [80] or a dynamic, vector-aware manner [27]. The bounded 3/4 point model tightens these bounds when logic values at any subset of circuit lines are specified (see Figure 3.14).

Figure 3.14: Shrinking of timing ranges – timing implications.

3.3.8.2 Path selection

We enhanced the approach in [80], which identifies a set of paths that is guaranteed to include all paths that may potentially cause a timing error, provided the accumulated value of additional delays along circuit paths is upper bounded by a desired limit (Δ). The enhanced version works with the upper and lower bounds given by our resilient delay model and also checks for both functional sensitization and maximum delay excitation (more in Chapter 5).

3.3.8.3 Vector generation approach

In [180], the impact of multiple input switching on vector generation under process variation is considered, but the generated tests are not guaranteed to invoke the worst-case delay. In Chapter 5 a new approach to generate vectors for post-silicon delay characterization using the proposed resilient delay model is presented.
The method generates vectors that are guaranteed to excite the worst-case delays of fabricated chips, without introducing any pessimism, by intelligently dividing the delay model into various timing ranges and exploiting the effect of MIS-TC and MIS-TNC on the gate delay in these timing ranges.

3.4 Experimental results to validate the resilient delay model

We characterized the basic gates (NAND/NOR/AND/OR) for variability and MIS at the 65nm technology node with an industry-provided model for process variations, using an industry-standard library, circuit simulator (Spectre), and the characterization setup and approach explained in Section 3.3. We then applied our approach to the combinational parts of ISCAS89 benchmark circuits (see Table 3.6) using an Intel Core 2 Duo 2.2 GHz machine. All gates in the benchmark circuits are assumed to use minimum-size transistors. Our experiments used our new resilient delay model for both MIS-TC and MIS-TNC. It is widely assumed that accurate industry-standard circuit simulators (such as Spectre) provide the best possible estimate of post-silicon delay. In all the following experiments we compare the results of our gate-level timing simulator (ETA, written in C) with Spectre simulations (where we capture process variations by creating n versions of the circuit using Monte Carlo) to evaluate our approach for post-silicon timing related tasks.

Table 3.6: ISCAS89 Benchmark Circuits

  Benchmark | PIs | POs | # of logical paths
  s298      | 17  | 20  | 462
  s444      | 24  | 27  | 1,070
  s953      | 45  | 52  | 2,312
  s713      | 54  | 42  | 43,624
  s1196     | 32  | 32  | 6,196
  s5378     | 214 | 228 | 27,084
  s9234     | 247 | 250 | 489,708

3.4.1 Experiments on the accuracy of timing analysis

We did a preliminary experiment on the ISCAS85 benchmark c17 (Figure 3.15). First, for each primary output (lines 16 and 17) we identify the input cone.
We perform cone-exhaustive simulations on cone A (line 16) and cone B (line 17) via detailed circuit simulations using Cadence Spectre to identify the worst-case-delay-invoking vector, and compare it with the timing estimates obtained by our approach.

Table 3.7: Accuracy of simulation-based approach for smaller ISCAS benchmarks

  Cone   | PO  | # lines | Case             | Max delay (Spectre) | Max delay (our approach, simulations) | Tightness of bounds (max)
  Analysis for c17
  Cone A | 16  | 12      | Nominal          | 0.444ns             | 0.445ns                               | 0.02%
  Cone A | 16  | 12      | Full variability | 0.626ns             | 0.662ns                               | 5.75%
  Cone B | 17  | 12      | Nominal          | 0.443ns             | 0.445ns                               | 0.45%
  Cone B | 17  | 12      | Full variability | 0.607ns             | 0.632ns                               | 4.11%
  Analysis for s298
  Cone 1 | 250 | 65      | Nominal          | 0.626ns             | 0.634ns                               | 1.2%
  Cone 1 | 250 | 65      | Full variability | 0.807ns             | 0.856ns                               | 6.07%
  Cone 1 | 250 | 65      | Nominal          | 0.625ns             | 0.630ns                               | 0.80%
  Cone 1 | 250 | 65      | Full variability | 0.807ns             | 0.856ns                               | 6.07%
  Cone 1 | 250 | 65      | Nominal          | 0.624ns             | 0.631ns                               | 0.93%
  Cone 1 | 250 | 65      | Full variability | 0.806ns             | 0.856ns                               | 6.20%

We repeated the experiment for the cones corresponding to the top three critical paths (cone 1 for all three paths) of s298 as well (see Table 3.7 and Table 3.8).

Figure 3.15: The benchmark c17

In Table 3.7, column 3 shows the number of circuit lines in the fan-in cone, column 5 shows the max delay calculated at the primary output under consideration using circuit simulations, column 6 (timing simulation) shows the same for our method, and column 7 indicates the tightness of the bounds obtained by our method with respect to the circuit simulation results. In Table 3.8, column 3 shows the number of circuit lines in the fan-in cone, column 5 shows the max delay calculated at the primary output under consideration using circuit simulations, and column 6 (timing analysis) shows the same for our method; columns 7 and 8 indicate whether our approach successfully bounds the delay values obtained from circuit simulations and for how many lines it does so. Column 9 indicates the tightness of the bounds obtained by our method with respect to the circuit simulation results.
The results from these two tables indicate that our proposed analysis approach does bound all the lines, in the nominal case with an error of about 1% for c17 and 3% for s298, whereas the corresponding figures for our proposed simulation-based approach reduce to 0.5% (c17) and 1% (s298). The error can be attributed to the looseness of the timing ranges calculated by static timing analysis and the looseness of the bounding approximations.

Table 3.8: Accuracy of analysis-based approach for smaller ISCAS benchmarks

  Cone   | PO  | # lines | Case             | Max delay (Spectre) | Max delay (our approach, analysis) | True bound | # lines bounded | Tightness of bounds (max)
  Analysis for c17
  Cone A | 16  | 12      | Nominal          | 0.444ns             | 0.448ns                            | Y          | 12              | 0.09%
  Cone A | 16  | 12      | Full variability | 0.626ns             | 0.676ns                            | Y          | 12              | 7.98%
  Cone B | 17  | 12      | Nominal          | 0.443ns             | 0.448ns                            | Y          | 12              | 1.12%
  Cone B | 17  | 12      | Full variability | 0.607ns             | 0.660ns                            | Y          | 12              | 8.73%
  Analysis for s298
  Cone 1 | 250 | 65      | Nominal          | 0.626ns             | 0.639ns                            | Y          | 65              | 2.08%
  Cone 1 | 250 | 65      | Full variability | 0.807ns             | 0.894ns                            | Y          | 65              | 10.78%
  Cone 1 | 250 | 65      | Nominal          | 0.615ns             | 0.639ns                            | Y          | 65              | 2.23%
  Cone 1 | 250 | 65      | Full variability | 0.807ns             | 0.894ns                            | Y          | 65              | 10.78%
  Cone 1 | 250 | 65      | Nominal          | 0.614ns             | 0.639ns                            | Y          | 65              | 2.40%
  Cone 1 | 250 | 65      | Full variability | 0.806ns             | 0.894ns                            | Y          | 65              | 10.91%

In another set of experiments we performed a sufficiently large number of Monte Carlo simulations (with full global variability) on benchmarks c17 and s298 using the vector identified in the previous step. Note that the max delay we report here is actually the maximum arrival time at the output, which serves as a tight upper bound on the worst-case delay. Also, for full global variability we do have 100% bounding, but the bounds are looser than in the nominal case. Experiments with variability result in an error of about 9% (c17) and 11% (s298), whereas the corresponding figures for our proposed simulation-based approach reduce to 6% (c17) and 7% (s298).
One explanation for this behavior is that we performed the same number of Monte Carlo runs (say k, where k is sufficiently large) for both cases (basic gates and complete circuit), so the k runs for the circuit do not cover the same process space that was covered during the k runs for the basic gates in the characterization step. We also compared the accuracy of our approach for c17 and s298 with respect to Spectre using our resilient delay model with 3/4 point bounding and with pin-to-pin delay bounding. Table 3.9 shows that our resilient delay model with 3/4 point bounding gives far better results than a delay model based on pin-to-pin bounding (as used by existing STA tools [195]).

Table 3.9: Accuracy of ETA

              ETA accuracy – tightness of bounds (max) (%)
              Nominal                       Full variability
  Benchmark | 3/4 bounding | p2p bounding | 3/4 bounding | p2p bounding
  c17       | 1%           | 2.5%         | 9%           | 11%
  s298      | 3%           | 11%          | 11%          | 28%

Cone-exhaustive simulation even for a medium-size benchmark is a tedious task and can take up to several days of simulation time. Hence, for such circuits we simulated 10,000 randomly generated vectors using Spectre and ETS (our timing simulator) and compared them to the result calculated by our ETA (Table 3.10). As can be seen from Table 3.10, the max delay for the nominal case for s444, s953, and s1196 calculated by our ETA (at negligible computation cost) is offset by about 10.14%, 12.51%, and 16.68% respectively with respect to 10,000 random Spectre simulations (which require a large amount of simulation time). Also, the results reported by our ETS (at considerable computation cost) are much more accurate, with errors of only 1.4%, 0.5%, and 5.2% for s444, s953, and s1196 respectively.
Table 3.10: Analysis of medium-size ISCAS benchmarks

              Max delay reported (ns)                                              Accuracy (%)
  Benchmark | Random Spectre simulations | Random ETS simulations | ETA (Analysis) | ETA    | ETS
  s444      | 1.42                       | 1.44                   | 1.564          | 10.14% | 1.4%
  s953      | 1.015                      | 1.02                   | 1.142          | 12.51% | 0.5%
  s1196     | 2.11                       | 2.22                   | 2.462          | 16.68% | 5.2%

Figure 3.16 shows the results for our random simulations. The trend of the curves indicates that the random simulations and timing analysis can potentially converge for a sufficiently large number of simulations. It is important to note that in these experiments the looseness is probably due to the fact that the vector that can invoke the worst-case delay might not be applied during the random simulations.

Figure 3.16: ETA vs. random simulations

When we use our approach for generating vectors for validation to identify vectors [54] and analyze them with our approach on a vector-by-vector basis, the nominal worst-case delay observed for s953 becomes 1.132ns. Compared to this, the looseness of our static approach reduces from 12.5% (from random simulations) to 8.83% (see Table 3.11), demonstrating that our static approach is indeed tight and that the looseness in the above figures can be further attributed to the inability of randomly generated vectors to invoke worst-case delays. We would like to restate that this looseness of around 10% for medium benchmark circuits such as s953 incorporates the uncertainty in the (unspecified) vector space, which none of the other timing analysis approaches [6][72][171] have accounted for while reporting their results.

Table 3.11: ETA vs. random simulations vs. MDS simulations [54] for medium ISCAS benchmarks

              ETA accuracy – tightness of bounds (max) (%)
              Nominal                                                    Full global variability
  Benchmark | w.r.t. Random ETS sims (10,000) | w.r.t. MDS sims [54] | w.r.t. Random ETS sims (10,000) | w.r.t. MDS sims [54]
  s444      | 8.61%                           | 4.61%                | 10.5%                           | 5.2%
  s953      | 11.96%                          | 8.83%                | 15.42%                          | 9.4%
  s1196     | 10.9%                           | 4.23%                | 18.61%                          | 11.2%

Monte Carlo simulations using circuit simulators for these medium-size circuits are very time consuming, and given the number of process parameters varied for the 65nm industrial library provided to us, such simulations would take days. Hence, for the experiments with variability on these benchmarks we report the max delay obtained by random simulations and compare it to the result calculated by our ETA (Table 3.11). The inaccuracy of ETA increases for the experiments with variability (from 10.9% to 18.61% for s1196 w.r.t. random ETS simulations). This inaccuracy can be attributed to the Monte Carlo runs on the full circuit not covering the complete process space that is covered during characterization of the gates. We performed similar experiments with the bigger benchmarks s5378 and s9234 and report the results in Table 3.12. Even for s9234 the looseness of ETA is around 18% (compared to MDS simulations), which is reasonable accuracy considering that ETA ran only once, in a vector-unaware manner, to achieve this.

Table 3.12: ETA vs. random simulations vs. MDS simulations [54] for big ISCAS benchmarks

              ETA accuracy – tightness of bounds (max) (%)
              Nominal                                                    Full global variability
  Benchmark | w.r.t. Random ETS sims (10,000) | w.r.t. MDS sims [54] | w.r.t. Random ETS sims (10,000) | w.r.t. MDS sims [54]
  s5378     | 19%                             | 12%                  | 25%                             | 14%
  s9234     | 33%                             | 18%                  | 42%                             | 21%

Since the proposed delay model will be used primarily for post-silicon delay related tasks and not for design analysis, the delay will be measured on the fabricated chip and not estimated by the (looser) ETA [82]. The bounding approximation will certainly contribute to the looseness of the estimation, but the delay measured by applying the vectors generated by our approach [54] will be the actual delay, with zero pessimism.
Our framework can work with fully unspecified, partially specified, and fully specified vectors, and hence is suitable for post-silicon delay related tasks such as timing characterization and delay validation. We demonstrate this on s298 by specifying certain bits in a primary-input sequence (test vector) and report the results in Table 3.13. The first row reports the result of ETA with fully unspecified vectors (the equivalent of static timing analysis), whereas the last row does the same for a fully specified vector used for maximum delay invocation [54]. The intermediate rows report the ETA results with partially specified vectors (the average over ten random choices evaluated for each case of bit specification).

Table 3.13: Analysis on s298 with partially specified vectors

                        Nominal                   Full global variability
  # of bits specified | A_min (ns) | A_max (ns) | A_min (ns) | A_max (ns)
  0                   | 0.290      | 0.639      | 0.207      | 0.894
  2                   | 0.289      | 0.638      | 0.206      | 0.882
  4                   | 0.288      | 0.638      | 0.206      | 0.878
  8                   | 0.276      | 0.635      | 0.174      | 0.873
  All                 | 0.276      | 0.634      | 0.0084     | 0.856

It is important to note that a vector-unaware approach (using fully unspecified vectors) can overestimate actual maximum arrival times (by around 2.08% and 10.78% for the nominal and full variability cases respectively). The corresponding figures for a vector-aware simulation-based approach are 1.2% and 6.07% (see Table 3.8). Also, as expected, the range for a fully unspecified vector subsumes that for a partially specified vector, which in turn subsumes that for a fully specified vector. Table 3.14 provides an estimate of the run-time complexity associated with performing timing analysis for partially specified vectors using CSM-based approaches [72][171]. As is evident from the numbers in the fourth column, CSM-based approaches, despite being superior for pre-silicon tasks, are rendered almost impractical for post-silicon tasks.
Table 3.14: Runtime analysis: Our approach vs. CSM-based approaches [72][171]

                                          Number of simulation runs required
  Benchmark | # of bits specified | Our approach | CSM-based approaches [72][171]
  s298      | 0                   | 1            | 1.71 x 10^10
            | 10                  | 1            | 1.67 x 10^7
            | All                 | 1            | 1
  s953      | 0                   | 1            | 1.23 x 10^27
            | 10                  | 1            | 1.2 x 10^24
            | All                 | 1            | 1
  s9234     | 0                   | 1            | 5.11 x 10^148
            | 10                  | 1            | 4.99 x 10^145
            | All                 | 1            | 1

3.4.2 Experiments on path selection and vector generation

In our second set of experiments, we characterized the delay for each gate using full global variability and generated vectors for validation using our delay validation framework [54] (explained in more detail in Chapter 5). In these experiments, since the variations are incorporated in the delay models, the timing threshold Δ = 0.02 [80] captures only modeling errors. Table 3.15 compares our delay model with the worst-case delay model on various delay marginality validation metrics, such as the selected path set, the generated vector set, and the test generation runtime (given in CPU clocks provided by the clock() function in C, an approximation of processor time). The results clearly indicate the superiority of our proposed delay model over the bounded pin-to-pin delay model. For example, for s9234 our resilient model decreases the vector space by 10^5X and decreases test generation time by about 5X. This can be attributed to the fact that a simpler delay model might gain in the delay calculation phase, but the inherent looseness in its bounds increases the search space, leading to an increase in the size of the selected path set and the vector space that is searched. In [54] we have shown that the vectors generated using our resilient delay model can invoke more delay (up to 15% in some cases) compared to robust test vectors.
Table 3.15: Analysis of validation metrics: Our approach vs P2P based approaches
Benchmark | Resilient delay model: Paths | Vectors | CPU clocks | Bounded P2P delay model: Paths | Vectors | CPU clocks
s298  | 2     | 64          | 4,179       | 11     | 74           | 6,364
s953  | 2     | 2           | 83,467      | 6      | 12           | 97,264
s1196 | 18    | 936         | 213,476     | 25     | 7,552        | 354,468
s9234 | 5,346 | 4.35 x 10^5 | 2.67 x 10^8 | 18,764 | 8.15 x 10^10 | 1.17 x 10^9

Also, the validation vector set, though large, is practical because post-silicon validation is performed for a small sample of chips selected from the first-silicon batch (unlike delay testing, which must be performed on every fabricated chip copy). Note that the test vector spaces generated by our approach can be further refined using test compaction methods [177]. Also, in Chapter 6 we present an approach to further reduce the validation vector set by intelligently incorporating the knowledge of process variations in a probability-of-occurrence based divide and conquer approach.

3.5 Summary
All post-silicon tasks – validation, diagnosis, delay testing and speed-binning – must be carried out by applying vectors to actual chips, and capturing and analyzing responses. Yet, the vectors used must be generated and analyzed using pre-silicon models of the circuit. Three comprehensive industrial studies demonstrate that existing approaches for generating and analyzing such vectors are inadequate, and one major weakness is that existing delay models either do not capture process variations or do not capture advanced delay phenomena that significantly affect delays, especially multiple input switching (also known as near-simultaneous transitions) at the inputs of a gate, the charge on internal capacitances, and the Miller effect. Hence, we developed a general approach for extending any delay model (pin-to-pin and beyond) to ensure that all minimum and maximum delay values computed are guaranteed to bound the corresponding delay values in silicon.
Experimental results demonstrate that our new resilient delay model based on bounding approximations compensates for the lack of exact knowledge and inaccuracies in delay parameters, captures the effect of variability, and yet generates tight bounds. It can also tighten these bounds using available logic values at any circuit lines and thus is suitable for post-silicon tasks. It is also a dramatic improvement over the traditional worst-case delay model in terms of paths selected and vectors generated for validation. This shows that it is important to carefully select a delay model to which to apply bounding approximations, to ensure high accuracy at low overall complexity.

CHAPTER 4
GATE DELAY MODEL FOR ULTRALOW POWER CMOS CIRCUITS
In this chapter, we present extensive experimental results to demonstrate that MIS has a significant impact (around 30-40%) on the nominal delays of near- and sub-threshold gates. Subsequently, we extend our model presented in Chapter 3 to near- and sub-threshold circuits. In particular, via extensive experimentation we show that our model never underestimates the delay and tightly bounds the actual delays. In contrast, in many of these experiments, existing delay models underestimate delays and always provide much looser bounds.

4.1 Emergence of low power CMOS circuits
Energy efficiency has become a ubiquitous design requirement for digital circuits, and aggressive voltage scaling has emerged as the most effective way to reduce energy use [74]. Traditionally, design optimization in the logic circuit community has always targeted the minimum-delay operational point (MDP), but as shown in Figure 4.1, for an increasing fraction of chips, energy constraints have shifted the focus from the traditional minimum delay operational region to the ultralow-energy region around the minimum energy operational point (MEP) [120].
This paradigm shift resulted in the emergence of the family of low Vdd circuits known as near-threshold voltage circuits (NTVC) and sub-threshold voltage circuits (STVC).
Figure 4.1: Energy delay trade-off in combinational logic [120]

4.2 Effect of voltage scaling on delay
Given the wide feasible range of voltage scaling [57], it is important to analyze its effect on delay (see Figure 4.2). In the super-threshold regime (Vdd > Vth), circuit delay increases mostly quadratically with decreasing voltage. In the near-threshold regime (Vdd ~ Vth), there is approximately a 10X performance degradation compared to the super-threshold region, and in the sub-threshold regime (Vdd < Vth) delay increases exponentially with decreasing Vdd. This three-regime sensitivity of delay to voltage scaling necessitates a delay model that can accurately predict the delay of circuits in all three regions.
Figure 4.2: Delay in different supply voltage operation regions [57]

4.3 Delay sensitivity to variability at low voltages
One of the major barriers preventing NTVC and STVC from going mainstream is their higher delay variations [57][17]. Figure 4.3 shows the sensitivities of the major delay defining parameters for the three operating regions [74]. As evident from Figure 4.3, the on- and off-currents, I_on and I_off, respectively, play important roles in determining delay variability. Since the I_on/I_off ratios for near-threshold and super-threshold circuits are of the same order of magnitude, for such circuits I_off plays a relatively minor role in delay calculation, and so does the sensitivity of I_off to variability. But for sub-threshold circuits I_off is significant (and so is the sensitivity of I_off) and hence delay variation is much higher (exponential). This increases the sensitivity of delay to voltage scaling and hence necessitates a variability aware delay model for NTVC [75][120] and STVC [18][40][102][172][186].
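The three regimes can be illustrated with a simple EKV-style interpolation of the on-current (an illustrative sketch with placeholder parameter values, not the foundry model): delay is taken as proportional to C*Vdd/I_on, and the smooth EKV expression reduces to exponential behavior below Vth and polynomial behavior above it.

```python
import math

# Illustrative EKV-style interpolation of the on-current; i0, n, vth and
# vt (thermal voltage) are assumed placeholder values, not library data.
def on_current(vdd, vth=0.45, n=1.5, vt=0.026, i0=1e-6):
    x = (vdd - vth) / (2 * n * vt)
    return i0 * math.log(1 + math.exp(x)) ** 2

# First-order gate delay: load charge over drive current.
def gate_delay(vdd, c_load=1e-15):
    return c_load * vdd / on_current(vdd)

# Delay grows mildly when scaling within super-threshold, by roughly an
# order of magnitude near threshold, and explodes in sub-threshold.
d12, d05, d02 = gate_delay(1.2), gate_delay(0.5), gate_delay(0.2)
assert d02 > d05 > d12
assert d02 / d05 > d05 / d12   # blow-up below Vth dominates
```

With these (assumed) parameters the 0.5V/1.2V delay ratio is tens of X and the 0.2V/0.5V ratio is hundreds of X, qualitatively matching the trend in Figure 4.2.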
Figure 4.3: Comparison of key sub-threshold, near-threshold and super-threshold sensitivities for 65nm [74]

4.4 Empirical learning from circuit simulations
We studied the effect of variability and MIS for all three families of circuits at the 65nm technology node using an industry standard library and circuit simulator (Spectre) and report the results for the worst case delay of a 2-input NAND gate comprising minimum-size transistors in Table 4.1. We selected Vdd as 0.5V and 0.2V for NTVC and STVC, respectively [74]. We also performed Monte Carlo simulations to evaluate the effect of variability on gate delay. NV and FV stand for results with no variability and full variability, respectively. Full variability includes both global and local variabilities [10].

4.4.1 Effect of variability on gate delay
As evident from Table 4.1, in the near-threshold and sub-threshold regions the effect of variability on gate delay is much more severe than in the super-threshold region. Results show that sub-threshold circuits can exhibit up to 1100% delay variation, compared to the 50% delay variations observed for super-threshold circuits. This fortifies the notion that variability plays a major role in determining the gate delay for ultra-low power CMOS circuits [154][18][102][172].
Table 4.1: Analysis and effect of variability and MIS on the max delay of a 2-input NAND gate in 65nm
Vdd | Max_NV (SIS) | Max_NV (MIS) | Max_FV (SIS) | Max_FV (MIS) | Deviation from SIS nominal: Max_NV (MIS) | Max_FV (SIS) | Max_FV (MIS)
1.2V | 113.4 ps | 162.65 ps | 173.3 ps | 245.3 ps | 43.69% | 52.82%   | 116.36%
0.5V | 9.397 ns | 13.017 ns | 21.39 ns | 27.87 ns | 38.52% | 127.62%  | 196.53%
0.2V | 5.54 μs  | 7.42 μs   | 65.2 μs  | 67.9 μs  | 33.93% | 1076.90% | 1125.63%

4.4.2 Effect of MIS on gate delay
Table 4.1 shows that for a two-input NAND gate comprising minimum-size transistors, the effect of MIS in near- and sub-threshold circuits (though somewhat diminished compared to super-threshold circuits) is quite significant and can cause about a 34% increase in delay (in the case of NTVC) and hence cannot be ignored. The slight reduction in the contribution of MIS to delay can be attributed to the mitigation of the Miller effect for MIS-TNC at reduced voltage levels.

4.4.3 Combined effect of variability and MIS
It can be seen from Table 4.1 that MIS with variability further worsens the delay variation. For super-threshold circuits with full variability, MIS can increase the percentage delay variation from 52.82% for SIS (Max_FV (SIS)) to 116.36% for MIS (Max_FV (MIS)). The corresponding figures for near-threshold circuits and sub-threshold circuits are from 127% to 196% and from 1076% to 1125%, respectively. Thus, with or without variability, MIS plays a significant role in low-Vdd circuits and must be captured by their delay models.

4.5 Existing near- and sub-threshold delay models
Existing delay models for NTVC are all empirical in nature, where the on-current for a transistor is empirically determined and approximated by some form of the EKV model [59] and a fitting function [120][75][154][61] (see Table 4.2). Transistor delay is then approximated as a function of output load, on-current and input voltage, where input slew is accommodated in the fitting function.
Subsequently, gate delay is represented as a function of transistor delay based on single input switching. Finally, path delay for a combinational logic block is calculated as a simple weighted sum of the delays of the gates along the path. The delay models of [120][75][154] are highly approximate and do not account for MIS and variability. The delay model of [154] is approximate and SIS based but considers delay variability as a function of on-current variability and represents gate delay as a normal distribution. Path delay distribution is then calculated using statistical analysis on normal distributions. Existing delay models for STVC are either empirical (similar to the NTVC delay models) [120][75][18][102][172][40][186][112][111][133][116] or analytical (gate delay arrived at by solving integrals of on-currents over time based on the region of operation) [143][165][168] (see Table 4.2). Though more accurate, the analytical models are too complex for static/statistical timing analysis, even when ignoring MIS. Statistical delay models for STVC [18][102][172][40][186][112][111][133][116] represent the gate delay distribution (based on the current distribution) as a log-normal variable and compute path delay distributions using statistical operations on log-normal distributions [40]. Table 4.2: Review of existing near- and sub-threshold delay models Existing delay models for near-threshold circuits Reference # Analytical Empirical SIS MIS Variation-aware [120][75] Y Y N N [154] Y Y N Y [61] Y Y N N Existing delay models for subthreshold circuits Reference # Analytical Empirical SIS MIS Variation-aware [74][133] Y Y N N [18][102][172][40] [186][112][111][116] Y Y N Y [143][165][168] Y Y N N [63] Y Y N Y All existing delay models for NTVC and STVC are SIS based (i.e., they ignore all effects of MIS, such as near-simultaneous transitions), are vector unaware (unable to work with partially- or fully-specified vectors), and deal with variations using distributions rather than bounds.
All timing related tasks (pre-silicon and post-silicon) require a delay model that is accurate and resilient, i.e., one that guarantees that the minimum and maximum delay values it reports are lower and upper bounds on the actual delay of the selected (100 - ε)% of fabricated chips. Hence, we decided to evaluate our resilient delay model presented in Chapter 3 for NTVC and STVC circuits as well.

4.6 Experimental results on simple circuits
In Table 4.3 we compare the existing low Vdd delay models with our proposed delay model for simple circuits (variability not considered). As evident, even for a chain of 3 NAND gates, the SIS based NTVC delay model of [75][120] reports a delay with an error of about -27%, whereas our model reports a much more accurate delay, within 0.2%. The inaccuracy of [75][120] can be attributed to factors such as ignoring MIS and approximations such as treating path delay as the sum of gate delays. For STVC, our delay model again gives much better results, namely 1.1% error compared to -25.63% for the delay models of [18][40]. Moreover, we would like to draw attention to the fact that existing delay models for NTVC/STVC grossly underestimate the actual delay (as captured by the negative sign), and are unable to bound the worst case gate delay in any meaningful way.
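The error columns in Table 4.3 are simple signed percentage errors against the Spectre reference; a small helper (the function name is ours) approximately reproduces them:

```python
def pct_error(model_delay: float, reference_delay: float) -> float:
    """Signed percentage error of a model's delay estimate against the
    circuit-simulator reference; a negative value means the model
    underestimates the actual delay."""
    return 100.0 * (model_delay - reference_delay) / reference_delay

# NTVC NAND gate from Table 4.3: our model (13.032 ns vs the 13.017 ns
# Spectre reference) is within ~0.12%, while the SIS model of [75][120]
# (9.397 ns) underestimates by roughly 28%.
```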
83 Table 4.3: Comparison of delay models for simple circuits NTVC Analysis Basic circuit Max delay (ns) Max delay error (%) Spectre [75][120] Ours [75][120] Ours INV 1.283 1.260 1.285 -1.8% 0.15% NAND 13.017 9.397 13.032 -27.9% 0.115% INV Chain (3) 3.684 3.629 3.69 -1.5% 0.16% NAND Chain (3) 27.707 20.267 27.752 -26.85% 0.162% STVC Analysis Basic circuit Max delay (μs) Max delay error (%) Spectre [18][40] Ours [18][40] Ours INV 0.45 0.45 0.45 0% 0% NAND 7.42 5.55 7.43 -25.2% 0.13% INV Chain (3) 1.45 1.45 1.455 0% 0.34% NAND Chain (3) 15.33 11.4 15.5 -25.63% 1.13% 4.7 Experimental results on ISCAS benchmarks For ISCAS benchmark circuits, in order to evaluate the effect of ignoring MIS for low V dd circuits (variability again not considered), we simulated 10,000 randomly generated vectors using the circuit simulator and compared the results (see Table 4.4 and Table 4.5) with two approaches that use our gate delay models for delay calculation. • ETA – Enhanced Timing Analysis that computes output arrival time and output transition time ranges in a static manner that is vector-unaware [80]. • ETS – Enhanced Timing Simulation that computes much tighter output arrival time and output transition time ranges in a dynamic vector-aware approach, i.e., for cases where a partially- or fully-specified vector is given [81]. 84 4.7.1 Experimental results on timing analysis: without variability For NTVC and STVC, we repeat ETS and ETA using our models and using existing models; i.e., [75][120] for NTVC and [18][40] for STVC, and report the results in Table 4.4 and Table 4.5. Table 4.4 shows that our resilient delay model based ETS gives much better results than the existing delay models for NTVC [75][120] and STVC [18][40]. 
Table 4.4: Comparison of delay models for ISCAS benchmarks: Timing simulation NTVC Analysis Benchmark Simulations of 10,000 randomly generated vectors Max delay reported (ns) Accuracy (%) Spectre ETS (ours) ETS [75][120] ETS (ours) ETS [75][120] s298 42.83 43.15 33.25 0.73% -22.37% s444 59.71 61.04 45.86 2.22% -23.20% s953 58.72 61.57 46.09 4.84% -21.51% s1196 122.06 125.4 92.23 2.74% -24.44% s9234 284.82 306.14 196.26 7.48% -31.09% STVC Analysis Benchmark Simulations of 10,000 randomly generated vectors Max delay reported (μs) Accuracy (%) Spectre ETS (ours) ETS [18][40] ETS (ours) ETS [18][40] s298 22.02 22.72 16.5 0.83% -25.09% s444 36.76 38.2 24.3 3.92% -33.89% s953 33.78 34.52 26.8 2.18% -20.67% s1196 61.72 63.05 48.4 2.14% -21.58% s9234 175.44 186.5 119.74 6.30% -31.74% 85 Table 4.5: Comparison of delay models for ISCAS benchmarks: Timing analysis NTVC Analysis Benchmark Analysis Max delay reported (ns) Accuracy (%) ETA (ours) ETA [75][120] ETA (ours) ETA [75][120] s298 44.48 37.12 3.7% -13.3% s444 63.66 51.67 6.2% -13.47% s953 63.97 52.49 8.2% -10.61% s1196 137.92 101.4 11.5% -16.93% s9234 348.9 223.52 22.5% -21.52% STVC Analysis Benchmark Analysis Max delay reported (μs) Accuracy (%) ETA (ours) ETA [18][40] ETA (ours) ETA [18][40] s298 23.21 18.1 5.1% -17.82% s444 40.04 28.4 8.2% -22.74% s953 37.66 29.2 10.3% -13.55% s1196 71.36 50.5 13.5% -18.18% s9234 221.4 132.77 26.2% -24.32% For s1196 the error in delay estimate by our simulation based approaches for NTVC and STVC are 2.74% and 2.14%, respectively. Corresponding figures for the approach of [75][120] (for NTVC) and [18][40] (for STVC) are much higher, about -24% and -21%, respectively, due to approximations and ignoring MIS. This trend also holds for bigger circuits, as evident from the results for the benchmark s9234 (6.3% error for our approach compared to -31.74% error for SIS based approaches from [18][40]). More importantly, our results are never optimistic for analysis or simulation. 
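The difference between ETA and ETS is only in how much vector information constrains the propagated ranges; the core operation in both is conservative interval propagation of arrival times through each gate. A minimal sketch of one such step (integer picosecond times; the bounding rule shown is a simplification of our actual per-gate conditions):

```python
# One step of interval-based arrival-time propagation: given the
# [min, max] arrival interval at each gate input and the gate's
# [d_min, d_max] delay bounds, return a conservative output interval
# that is guaranteed to contain the true output arrival time.
def propagate_arrival(input_arrivals, delay_bounds):
    d_min, d_max = delay_bounds
    a_min = min(lo for lo, _ in input_arrivals) + d_min
    a_max = max(hi for _, hi in input_arrivals) + d_max
    return a_min, a_max

# Two inputs arriving in [100, 200] ps and [150, 300] ps through a gate
# with delay bounds [50, 100] ps give an output interval of [150, 400] ps.
```

ETS tightens these intervals wherever specified vector bits fix logic values (e.g., a controlling value that masks a late side input), which is why its bounds in Table 4.4 are much closer to Spectre than the static ETA bounds in Table 4.5.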
86 Table 4.5 shows that even the static approach (ETA) based on existing delay models for NTVC ([75][120]) and STVC ([18][40]) (that do not capture MIS) underestimates the actual delay by about -25% for s9234 operated in sub-threshold. Note that the analysis is pessimistic for both ours and existing delay models. For ours this makes the results more pessimistic. Coincidentally, for existing approaches this reduces optimism. 4.7.2 Experimental results on timing analysis: with variability It is widely assumed that accurate industry-standard circuit simulators (Spectre) provide the best possible estimate of delay post-silicon. Exhaustive Monte Carlo simulations using circuit simulators for medium and large size circuits are very time consuming, and given the number of process parameters varied for the foundry-provided 65nm industrial library, such simulations will take days even for small circuits. Hence for experiments with variability on ISCAS benchmarks we report the max delay obtained by performing sufficiently large number of Monte-Carlo based Spectre simulations on a few selected maximum delay invoking vectors (see Chapter 5) and compare them to the result calculated by our ETS (Table 4.6). Existing approaches for NTVC [75][120] do not account for variability and those for STVC [18][40] deal with variability statistically, i.e., they report distributions instead of bounds. Hence, in the absence of a common platform of comparison between ours and existing approaches for low V dd circuits, we compared our approach with Monte-Carlo based circuit simulations for vectors that have been shown to invoke high delays. 
87 Table 4.6: Experiments with variability: ISCAS benchmarks Benchmark ETS Accuracy – Tightness of bounds (max) (%) No variability Full variability NTVC STVC NTVC STVC s298 0.5% 0.6% 1.4% 2.3% s444 2% 3.2% 2.7% 3.8% s953 2% 2% 3.1% 4.2% s1196 2% 2.4% 3.9% 5.3% s9234 5.8% 6.1% 8% 12.2% As shown in Table 4.6, the inaccuracy of ETS increases for the experiments with variability (from 2% to 3.9% for NTVC and 2.4% to 5.3% for STVC in s1196). These results clearly demonstrate that our variability aware simulation based approach is indeed tight and the inaccuracy is largely due to the fact that Monte Carlo runs on the full circuit do not cover the complete process space that is covered during characterization of gates. The results for STVC are slightly looser because the effect of variability on STVC is much more severe than NTVC. Note that for circuits such as s1196 our ETS based simulation approach gives reasonably tight accuracy at a much lower complexity (ETS finishes in 3 seconds whereas the Monte Carlo based Spectre simulations takes about 90 minutes). This reduction in complexity is much larger (about 10 5 X) at reasonable accuracy (about 10-15%) for much larger full chip based circuits such as s9234. Also, our results with variability tend to become somewhat more loose as the size of circuit increases which can be attributed to looseness in bounding approximations and smaller coverage of process variations during full circuit Monte Carlo simulations (as the 88 number of circuit instances we can simulate is limited by high run-time complexity of Spectre simulations) compared to gate characterization. 
4.7.3 Experimental results on path selection and vector generation
In this set of experiments, we repeat the experiments presented in Section 3.4.2 (characterizing the delay for each gate using full global variability and generating vectors for validation using our delay validation framework [54] (explained in more detail in Chapter 5) with timing threshold ∆ = 0.02 [80]) for NTVC and STVC. We applied our notion of bounding approximations to the SIS based delay models for NTVC [75][120] and STVC [18][40] during characterization and delay modeling (see Chapter 4).

Table 4.7: Analysis of validation metrics: Our approach vs P2P based approaches
NTVC Analysis
Benchmark | Our approach: Paths | Vectors | CPU clocks | Approaches based on [75][120]: Paths | Vectors | CPU clocks
s298  | 6     | 134         | 6,273       | 21     | 274          | 8,561
s953  | 3     | 3           | 84,227      | 8      | 16           | 93,224
s1196 | 38    | 2,236       | 513,476     | 125    | 47,342       | 854,233
s9234 | 7,326 | 6.25 x 10^5 | 5.27 x 10^8 | 23,764 | 7.35 x 10^11 | 1.47 x 10^10
STVC Analysis
Benchmark | Our approach: Paths | Vectors | CPU clocks | Approaches based on [18][40]: Paths | Vectors | CPU clocks
s298  | 11     | 186        | 7,179       | 41     | 543          | 12,364
s953  | 5      | 5          | 103,467     | 12     | 52           | 127,264
s1196 | 52     | 4,936      | 622,458     | 167    | 137,345      | 926,468
s9234 | 15,442 | 7.3 x 10^7 | 8.67 x 10^8 | 33,114 | 8.15 x 10^13 | 3.17 x 10^10

Table 4.7 shows the comparison of our delay model with the existing worst case delay models on various delay marginality validation metrics such as the selected path set, the generated vector set and the test generation runtime (given in CPU clocks provided by the clock() function in C, an approximation of processor time). Results clearly indicate the superiority of our proposed resilient model based approach over the bounded pin-to-pin delay models based on existing models for NTVC [75][120] and STVC [18][40]. It can be seen that for s9234 our resilient model decreases the vector-space (by 10^6X for both NTVC and STVC) and decreases test generation time (by about 30X for NTVC and 35X for STVC, respectively).
We also empirically verified that the vectors generated using our delay model can invoke more delay (up to 12% (14%) for s9234 implemented as NTVC (STVC)) compared to robust test vectors.

4.8 Summary
Power is increasingly the primary design constraint for chip designers, and one of the main techniques for addressing this concern is aggressive voltage scaling. Device variability increases with voltage scaling and significantly affects gate delays at low voltages. Although existing delay models for near- and sub-threshold circuits capture the effects of variability on gate delays, they do not capture advanced delay phenomena such as multiple input switching (MIS; also known as near-simultaneous transitions) at the inputs of a gate. As a result, most existing gate delay models often grossly underestimate worst case delays. We extend our model for super-threshold circuits (presented in Chapter 3), which guarantees that the minimum and maximum delay values it computes bound the corresponding delay values in silicon, to near- and sub-threshold circuits. We also show that our model has practical run-time complexity and works equally well for super-, near-, and sub-threshold circuits. Extensive results demonstrate that our new resilient delay model for low Vdd circuits, which captures the effect of MIS and variability, is much more accurate than the existing low Vdd delay models and generates tight bounds at low complexity. Importantly, as opposed to existing models, our approach never underestimates delay. Moreover, compared to the existing NTVC and STVC delay models, our model is much more efficient for post-silicon validation in terms of paths selected and vectors generated for validation.
CHAPTER 5
GENERATING VECTORS FOR POST-SILICON DELAY VALIDATION
In this chapter, we present a new method to generate vectors for post-silicon delay characterization, especially for exposing delay marginalities during post-silicon validation and for speed binning during testing. Our method generates vectors that are guaranteed to excite the worst-case delays of fabricated chips without introducing any pessimism. It embodies several innovations based on the resilient gate delay model that captures multiple input switching effects and process variations, such as new conditions that vectors must satisfy to invoke the maximum delay of a target path, and a new approach to generate multiple vectors (vector spaces) that are collectively guaranteed to invoke the worst-case delay of the target path.

5.1 Key observations from silicon studies
Recent silicon studies by nVidia [45], Freescale [185], and Sun (now a part of Oracle) [26] have shown that existing path delay testing approaches generate vectors that fail to invoke the worst-case delays in first-silicon. Based on these results, we focused on the following key observations:
• Robust tests, which guarantee the highest delay invocation for the pin-to-pin delay model, fail to do so for the proposed resilient delay model.
• Robust conditions are timing independent and hence unable to quantify the effect of near-simultaneous transitions, which can affect delay significantly.
• Vectors based on functional sensitization and enhanced functional sensitization conditions [82] do not guarantee maximum delay sensitization for a resilient delay model.
These observations stem from the fact that existing delay testing approaches do not consider delay models for nano-scale CMOS, particularly the impact on delays of multiple input switching (MIS) and process variations (see Table 5.1), thus necessitating a timing oriented resilient path delay test based approach that can be adapted for delay marginality validation.
Table 5.1: Existing delay testing approaches
Reference | Type of PDT | Timing oriented | Variation aware | Invokes maximum delay
[15][30][37][62][110][137][155][166] | Robust | N | N | N
[135] | Non-robust | N | N | N
[65] | Robust and Non-robust | N | N | N
[80][81][82] | Enhanced Functional Sensitization | Y | N | Y

5.2 Existing path selection approaches
In order to reduce the complexity of path delay test generation and application, it is essential to carefully select the paths to target for delay test generation. Many approaches have been proposed to select paths for delay testing. Variability unaware approaches such as [36][40][88][89][103][156] use timing independent sensitization conditions (either robust or non-robust) to prune the path set by identifying unsensitizable paths or path segments. These approaches fail to select the top critical paths under variability and thus are unsuitable for validation. On the other hand, statistical approaches such as [113][117][181][182][187] use statistical information obtained by performing SSTA [2] to prune the provided initial path set. Though these approaches can account for correlation [187] (spatial as well as process), the final results depend on the quality of the initial path set. Hence, if the initial path set is incomplete (i.e., it does not contain the path with maximum delay), the resultant pruned path list will also be incomplete. Thus, for our purpose of delay validation, we enhanced the approach in [80], which identifies a set of paths guaranteed to include all paths that may potentially cause a timing error as long as the accumulated additional delay along any circuit path is upper bounded by a desired limit (Δ). Our enhancement works with the upper and lower bounds given by our resilient delay model and also checks for both functional sensitization and maximum delay excitation.
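The Δ-based selection criterion above can be sketched as follows (the dictionary layout, names, and the exact keep-condition are illustrative assumptions; the real procedure additionally applies sensitization checks):

```python
def select_candidate_paths(path_bounds, t_clk, delta):
    """Conservatively keep every path whose upper-bound delay, inflated
    by the additional-delay budget delta, could reach the clock period;
    every potentially failing path is thus guaranteed to be retained.
    path_bounds maps a path id to its (d_min, d_max) bounds from the
    resilient delay model."""
    return [pid for pid, (_, d_max) in path_bounds.items()
            if d_max + delta >= t_clk]

# With t_clk = 1.0 ns and delta = 0.15 ns, a path bounded above by
# 0.9 ns is kept (0.9 + 0.15 >= 1.0) while one bounded by 0.4 ns is
# safely pruned.
```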
5.3 The value system Delay validation and testing requires a logic value system where the state of a signal has the ability to represent any possible situation that can occur during two consecutive vectors [16]. A resilient delay validation approach requires a value system that is complete. By complete we mean to say that the value system must be able to represent transitions and 94 corresponding delay effects. Thus a complete value system must be able to represent steady values as well as hazardous (static as well as dynamic) transitions. Also for basic gates the impact on delay is different for static or dynamic hazard at the side input. Table 5.2 shows the analysis of existing prevalent logic value systems used for testing. The basic 8 value system of [80] can be selected as the basis for deriving a complete value system as it distinguishes successfully between static and dynamic hazards (using the concept of transition as well as edges) . Table 5.2: Analysis of value systems Reference Type of value system Basic values Total values Able to represent hazards Able to distinguish between static and dynamic hazards [55][62] [135] Two (0, 1) 4 N N [30] Four (S0, S1, R, F) 4 Y N [27][28] [99] Three (0,1,X) 9 Y N [155] Five (0, 1, s, p, -) 6 Y N [110] Five (s0, s1, u0, u1, XX) 5 Y N [16] Six (p0, p1, s0, s1, -0, -1) 23 Y N [80][81] [82] Eight (S0, S1, T0, T1, CR, CF, H0, H1) 53 Y Y 95 The basic 8 values {S1}, {S0}, {CR}, {CF}, {T1}, {T0}, {H1}, and {H0} can then be extended to the 53 value system using the following six step procedure of [16][83] that generates a complete value system from a basic value system using sensitization conditions and subsequent forward and backward implications on basic gates: 1. Define the basic eight values {S0}, {S1}, {T0}, {T1}, {H0}, {H1}, {CR} and {CF} - each of which contains only one of the basic logic values. 2. Define "X" as composite value consisting of all basic logic values. 3. 
Considering currently identified composite logic values construct (or reconstruct) tables to be used by (NOT/NAND/NOR) gates forward implication procedure. 4. By applying combinations of currently identified composite logic values to backward implication procedure, identify new composite values that need to be added. 5. By applying combination of currently identified composite logic values to the forward implication procedure, identify other new composite values to be added. 6. Repeat steps 3 to 5 until no new value is added. The resultant 53 value system (see Table 5.3) thus provides closure for backward and forward implication procedures. This value system ensures that no information is lost during implication and hence is complete in every sense. Note that our delay validation approach can work with any other complete value system. 96 Table 5.3: The complete eight value system [80] # Value # Value # Value 1 {} 1 19 {T1,H1} 5 37 {S0,S1,H0,H1} 5 2 {CF} 3 20 {CF,S1} 5 38 {CR,S1,T1,H1} 5 3 {CR} 3 21 {CR,S0} 5 39 {CR,T0,T1,H1,H0} 5 4 {S1} 3 22 {CF,T0,H0} 5 40 {CF,T0,T1,H1,H0} 5 5 {S0} 3 23 {CR,T1,H1} 5 41 {CF,S1,S0,T0,H1} 5 6 {T0} 3 24 {CR,T1,H0} 5 42 {CR,S1,S0,T0,H1} 5 7 {T1} 3 25 {S1,T1,H0} 5 43 {CF,CR,T0,S1,T1,H1} 4 8 {H0} 3 26 {CF,T0,H1} 5 44 {CF,CR,T0,S0,T1,H1} 4 9 {H1} 3 27 {S0,T0,H1} 5 45 {CF,CR,S1,T0,T1,H0} 5 10 {S1,H1} 4 28 {S0,S1,H0} 5 46 {CF,S0,S1,T0,H1,H0} 5 11 {S0,H0} 4 29 {S0,S1,H1} 5 47 {CR,S0,S1,T0,H1,H0} 5 12 {CF,T0} 5 30 {CF,S0,S1} 5 48 {CF,CR,S0,T0,T1,H1} 5 13 {CR,T1} 5 31 {CR,S0,S1} 5 49 {CF,CR,S1,H0,H1,T1,T0} 4 14 {CR,S1} 5 32 {S0,H0,CF,T0} 4 50 {CF,CR,S0,H0,H1,T1,T0} 4 15 {S0,CF} 5 33 {S1,H1,CR,T1} 4 51 {CF,CR,S0,S1,H1,T1,T0} 5 16 {T0,H1} 5 34 {S0,H0,CR,T1} 5 52 {CF,CR,S0,S1,H0,T1,T0} 5 17 {T0,H0} 5 35 {S1,H1,CF,T0} 5 53 {CF,CR,S0,S1,H0,H1,T1,T0} 2 18 {T1,H0} 5 36 {CF,S0,T0,H1} 5 1-Empty value, 2-Fully composite value, 3-Basic value, 4-Values based on sensitization conditions, 5-Values derived from implications (forward and backward) 5.4 
Innovative use of the resilient delay model The resilient delay model (Chapter 3) captures the effect of process variations and embodies a notion of bounding approximations that ensures that all minimum and maximum delay values computed are guaranteed to bound (to the level of (100-ε)%) the corresponding actual delay values and thus can then be used for high quality post-silicon delay characterization of high performance circuit designs. The key idea is to 97 acknowledge the fact that the resilient delay model can be easily divided into timing ranges and essential conditions can be derived for maximum delay sensitization associated with each timing range. 5.4.1 Delay defining parameters The delay defining parameters based on the resilient delay model only use the limited qualitative information to define delays: Figure 5.1: Delay defining parameters for the resilient delay model 98 • Maximum delay (α) is the maximum delay associated with every input to every output for a gate. For a gate with input X, output Z and rising transition at output, the maximum delay is denoted by α XZ R (see Figure 5.1). • Near simultaneous range (δ) is the maximum time-separation between the transitions at inputs that can possibly cause the effects of one to interfere with the effects of the other. It enables us to derive timing dependent maximum delay sensitization conditions that guarantee invocation of worst case delay at each gate along a target path. For a 2-input gate with falling transitions at inputs X and Y when X precedes Y, this is denoted by δ XY F (see Figure 5.1). 5.4.2 The taxonomy of timing cases We intend to capture the conditions for excitation of the target path using limited available information such as the timing dependent conditions that accommodate near simultaneous transitions. For each timing range, multiple conditions can be easily derived with varying test quality (see the next section). Consider a 2 input NAND gate with inputs X and Y and output Z. 
Let X be the on-path input and Y be the side input. Let the arrival times of X and Y be [A_X^FS, A_X^FL] and [A_Y^FS, A_Y^FL], respectively. Using the delay parameters α and δ, the following timing cases corresponding to a falling transition at the on-path input X can be derived.

• Region 1: Y not near-simultaneous with respect to X: Transitions at X and Y are separated by at least δ_YX^F (for Y before X) or by δ_XY^F (for X before Y). Hence X and Y will not interact with each other. This is Region 1 of the delay curve in Figure 5.2.

• Region 2: Y near-simultaneous but does not overlap with X: Transitions at X and Y are separated by at most δ_YX^F (for Y before X) or by δ_XY^F (for X before Y). The transition at Y can affect the transition at X due to first order effects such as multiple charge paths [28]. This is Region 2 of the delay curve in Figure 5.2.

• Region 3: Y overlaps X: The timing ranges of X and Y are not mutually exclusive. Hence transitions at X and Y can overlap (as shown in Figure 5.2).

Figure 5.2: Timing cases for MIS-TC for a 2-input NAND gate

Figure 5.2 clearly shows that in Region 1 the delay of the NAND gate is equal to the pin-to-pin delay, whereas in Region 2 the delay must be derived using simulations and curve fitting techniques [28]. Figure 5.3 shows the same for the to-non-controlling case.

Figure 5.3: Timing cases for MIS-TNC for a 2-input NAND gate

5.5 Maximum delay sensitization conditions

Consider a to-controlling transition at the on-path input X (Figure 5.2). The functional sensitization conditions [82] provide the necessary conditions at the side input Y to be able to sensitize the target path from X to Z. The robust conditions as used in [45][26] allow only {S1} on the off-path inputs for all the timing cases; they will therefore miss certain suitable candidates and hence may miss the vector that invokes the worst case delay. Hence, robust conditions are not suitable for generating vectors for delay marginality validation.
We define new conditions that satisfy our objective that the vectors generated using our conditions are (collectively) guaranteed to invoke the worst-case delay of the target path. (Since we are interested in maximum delay, and simultaneous to-non-controlling transitions can increase delay [29], we focus on the side-input conditions for this case in the remainder of this chapter.) To achieve this, in contrast to robust conditions, our approach starts with the set of all possible values and eliminates only those cases that can be proven unable to invoke the worst-case delay under any circumstance (described ahead).

Table 5.4: Classification of timing cases for a to-controlling transition at the on-path input of a 2-input NAND gate

#  Timing case                                       Explanation
1  Y before X and not near simultaneous              Y arrives at least δ_YX^F before the transition at X. Hence Y and X cannot interfere.
                                                     A_Y^FL <= A_X^FS - δ_YX^F
2  Y before X and near simultaneous but no overlap   Y arrives at most δ_YX^F before the transition at X. Effects of Y and X may interfere with each other due to the Miller effect, multiple charge paths, state of internal capacitances, etc.
                                                     A_X^FS > A_Y^FL > A_X^FS - δ_YX^F
                                                     A_X^FS > A_Y^RL > A_X^FS - δ_YX^F
3  Y overlaps X                                      The timing ranges of X and Y overlap. X and Y may or may not interfere with each other.
4  Y after X and near simultaneous but no overlap    Y arrives at most δ_XY^F after the transition at X. Effects of Y and X may interfere with each other due to multiple charge paths, etc.
                                                     A_X^FL < A_Y^FS < A_X^FL + δ_XY^F
5  Y after X and not near simultaneous               Y arrives at least δ_XY^F after the transition at X. Hence Y and X cannot interfere.
                                                     A_Y^FS >= A_X^FL + δ_XY^F

Table 5.4 shows the detailed classification of timing cases, along with the necessary equations, for a to-controlling transition at the on-path input of a 2-input NAND gate. Table 5.5 shows the same for the to-non-controlling case (see Figure 5.3).
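The five-way classification in Table 5.4 can be sketched as a small predicate over arrival-time windows. This is an illustrative sketch only: the function name and the (earliest, latest) window representation are ours, and the handling of boundary equalities approximates the strict/non-strict inequalities of Table 5.4.

```python
def timing_case(a_x, a_y, d_yx, d_xy):
    """Classify side input Y's transition window relative to on-path input X.

    a_x, a_y: (earliest, latest) arrival windows for X and Y.
    d_yx:     near-simultaneous range when Y precedes X.
    d_xy:     near-simultaneous range when X precedes Y.
    Returns the timing-case number 1..5 of Table 5.4.
    """
    x_es, x_ls = a_x
    y_es, y_ls = a_y
    if y_ls <= x_es - d_yx:
        return 1   # Y before X, not near simultaneous: cannot interfere
    if y_ls < x_es:
        return 2   # Y before X, near simultaneous, windows do not overlap
    if y_es > x_ls + d_xy:
        return 5   # Y after X, not near simultaneous: cannot interfere
    if y_es > x_ls:
        return 4   # Y after X, near simultaneous, windows do not overlap
    return 3       # windows overlap: interference possible

# With X arriving in [10, 12] and both δ parameters equal to 3:
print(timing_case((10, 12), (5, 6), 3, 3))    # 1
print(timing_case((10, 12), (13, 14), 3, 3))  # 4
```

The classifier only needs the bounds on arrival times, which is exactly the limited timing information ETA provides.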
Table 5.5: Classification of timing cases for a to-non-controlling transition at the on-path input of a 2-input NAND gate

#  Timing case                                       Explanation
1  Y before X and not near simultaneous              Y arrives at least δ_YX^R before the transition at X. Hence Y and X cannot interfere.
                                                     A_Y^RL <= A_X^RS - δ_YX^R
2  Y before X and near simultaneous but no overlap   Y arrives at most δ_YX^R before the transition at X. Effects of Y and X may interfere with each other due to the Miller effect, multiple charge/discharge paths, etc.
                                                     A_X^RS > A_Y^RL > A_X^RS - δ_YX^R
3  Y overlaps X                                      The timing ranges of X and Y overlap. X and Y may or may not interfere with each other.
4  Y after X and near simultaneous but no overlap    Y arrives at most δ_XY^R after the transition at X. Effects of Y and X may interfere with each other due to the Miller effect, multiple charge/discharge paths, etc.
                                                     A_X^RL < A_Y^RS < A_X^RL + δ_XY^R
                                                     A_X^RL < A_Y^FS < A_X^RL + δ_XY^R
5  Y after X and not near simultaneous               Y arrives at least δ_XY^R after the transition at X. Hence Y and X cannot interfere.
                                                     A_Y^RS >= A_X^RL + δ_XY^R

Using the limited timing information available and the timing cases (see Table 5.4 and Table 5.5), we arrive at our maximum delay sensitization (MDS) conditions for each side input, shown in Table 5.6 and Table 5.7 (also see Figure 5.4 and Figure 5.5). As mentioned earlier, in contrast to robust conditions, our approach starts with the set of all possible values (see Table 5.6 and Table 5.7) and eliminates only those cases that can be proven unable to invoke the worst-case delay under any circumstance.

Table 5.6: Maximum delay sensitization conditions for a to-controlling transition at the on-path input of a NAND gate

Timing case                                          Logic conditions
1. Y before X and not near simultaneous              {CR, TR, S1, H1}
2. Y before X and near simultaneous but no overlap   {CR, TR, S1, H1, CF, TF}
3. Y overlaps X                                      {CR, TR, S1, H1, TF, CF, H0}
4. Y after X and near simultaneous but no overlap    {CF, TF, S1, H1}
5. Y after X and not near simultaneous               {CF, TF, S1, H1}

To reduce the number of vectors while still providing the abovementioned guarantee, we identify provable relationships between the logic values in the above conditions and capture these in the form of partially-ordered sets of values. Consider timing case 1 (Y before X and not near simultaneous): in order to propagate the falling transition at the on-path input X, a final value 1 at Y is imperative, and hence we arrive at the set of logic conditions {CR, TR, S1, H1}. Since the transitions at Y and X do not interact with each other, each of the four members of this set is equally good, i.e., invokes identical delay for the target path, and thus all are equivalent (Figure 5.4). This graph depicts that any of these four values, CR, TR, S1 and H1, guarantees invocation of the same delay for the target path. Along similar lines, for timing case 5 (Y after X and not near simultaneous), we arrive at the equivalent set of logic conditions {CF, TF, S1, H1}.

Figure 5.4: Partially-ordered graph of the side input values for the various timing cases for a to-controlling transition at the on-path input of a 2-input NAND gate

Consider timing case 4 (Y after X and near simultaneous with no overlap): in order to propagate the falling transition at the on-path input X, an initial value 1 at Y is imperative, and hence the all-inclusive logic conditions {CF, TF, S1, H1}. However, because a falling transition at the side input may activate multiple charge paths [28], the values {CF, TF, H1} become inferior to {S1}, as they will invoke a delay that is guaranteed to be less than or equal to the delay of a classical robust test. We cannot, however, establish any provable relationship between the abilities of these three values to invoke greater delay for the target path. Hence, {H1}, {CF} and {TF} are considered non-inferior with respect to each other (Figure 5.4).
Similarly, the partially-ordered graph for timing case 2 (Y before X and near simultaneous with no overlap) can be derived, and is shown in Figure 5.4.

Consider timing case 3 (Y overlaps with X): in order to attain a to-controlling response at Z, Y can have any value in, or any subset of, {S1, CR, TR, H1, CF, TF, H0}. Since the transitions at Y and X can interfere with each other, nothing can be said affirmatively about the effect of one value on the on-path delay relative to another. Hence {CR}, {TR}, {S1}, {H1}, {CF}, {TF} and {H0} are all non-inferior with respect to each other.

Table 5.7: Maximum delay sensitization conditions for a to-non-controlling transition at the on-path input of a NAND gate

Timing case                                          Logic conditions
1. Y before X and not near simultaneous              {CR, TR, S1, H1}
2. Y before X and near simultaneous but no overlap   {CR, TR, S1, H1}
3. Y overlaps X                                      {CR, TR, S1, H1}
4. Y after X and near simultaneous but no overlap    {CR, TR, S1, H1}
5. Y after X and not near simultaneous               {S1, H1}

Table 5.7 shows the maximum delay sensitization (MDS) conditions for the to-non-controlling case. Figure 5.5 shows the corresponding partially-ordered graphs. It can be seen that the logic values for timing cases 3 and 4 (the overlap and near-simultaneous transition cases) are completely unordered. This is because multiple to-non-controlling transitions can increase as well as decrease the gate delay [29].

Consider timing case 1 (Y before X and not near simultaneous). In order to propagate the rising transition at the on-path input X, a final value 1 at Y is necessary, and hence we arrive at the set of logic conditions {CR, TR, S1, H1} shown in Table 5.7. Since the transitions at Y and X do not interact with each other for this timing case, each of the four members of this set is equally good, i.e., each invokes identical delay for the target path, and thus all are equivalent, as shown in Figure 5.5.
This graph depicts that any of these four values, CR, TR, S1 and H1, guarantees invocation of the same delay for the target path.

Figure 5.5: Partially-ordered graph of the side input values for the various timing cases for a to-non-controlling transition at the on-path input of a 2-input NAND gate

Consider timing case 2 (Y before X and near simultaneous with no overlap). In order to propagate the rising transition at the on-path input X, a final value 1 at Y is necessary, and hence we derive the all-inclusive set of logic conditions {CR, TR, S1, H1}. However, because of the Miller effect [29] due to a rising transition at the side input, any of the values in the set {CR, TR, H1} can be shown to be always superior to {S1} in invoking the worst case delay. Since we cannot derive any provable relationship between the abilities of any of the other three values, namely {CR}, {TR} and {H1}, to invoke greater delay for the target path compared to the remaining two, we consider {CR}, {TR} and {H1} non-inferior with respect to each other, as shown in Figure 5.5.

This figure illustrates how we represent the set of all possible values at a side input using a partially-ordered graph that captures provable timing relationships. In general, each node in such a partially-ordered graph is a set of values. If a directed path exists from a set of values SV1 to another set of values SV2, then application at the corresponding side input of any value in SV1 is guaranteed to invoke greater delay for the particular on-path gate than any value in SV2. In contrast, the absence of any such directed path between two sets of values means that we cannot derive any such universal relationship in terms of invoking greater delay.
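Extracting the non-inferior sets from such a partially-ordered graph amounts to keeping every node that no other node dominates. A minimal sketch (the graph representation and function name are ours, not from the thesis):

```python
def non_inferior(nodes, superior_edges):
    """Return the nodes of a partial-order graph that no other node dominates.

    superior_edges: iterable of (a, b) pairs meaning "a provably invokes
    greater delay than b", i.e. a directed edge from a to b. In a DAG,
    every dominated node has at least one incoming edge, so the
    non-inferior nodes are exactly those with in-degree zero.
    """
    dominated = {b for _, b in superior_edges}
    return [n for n in nodes if n not in dominated]

# Timing case 2 (to-non-controlling): CR, TR and H1 are each superior to S1
# and mutually unordered, so all three survive while S1 is eliminated.
values = ["CR", "TR", "H1", "S1"]
edges = [("CR", "S1"), ("TR", "S1"), ("H1", "S1")]
print(non_inferior(values, edges))  # ['CR', 'TR', 'H1']
```

For a completely unordered case (timing cases 3 and 4 of Table 5.7), the edge list is empty and every value survives, which is exactly why those cases require full enumeration.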
Thus, for a path in a circuit where every on-path gate is a NAND gate and each satisfies timing case 2, a test vector with any of {CR}, {TR} and {H1} at the side input of each on-path NAND gate is guaranteed to invoke higher delay than a vector with {S1} at all the side inputs. More importantly, due to the absence of any ordering between {CR}, {TR} and {H1}, we enumerate multiple vectors that satisfy all provably non-inferior combinations of side-input values.

Consider timing case 3. To attain a to-non-controlling response at Z, Y can have any value in, or any subset of, {S1, CR, TR, H1}. Since the transitions at Y and X can interfere with each other, nothing universal can be said about the effect of one of these values on the target path's delay compared to any other value. Hence each of {CR}, {TR}, {S1} and {H1} is non-inferior with respect to the others, as shown in Figure 5.5. In such cases, complete enumeration is necessary to guarantee that the generated vectors collectively invoke the worst-case delay of the target path.

The test vector space generated using our partially-ordered graphs of side-input values and, when necessary, enumeration of their combinations, guarantees invocation of the worst case delay of a target path, since we arrive at these conditions only by eliminating provably inferior candidates. In this manner, our MDS (maximum delay sensitization) conditions use the available limited timing information efficiently to reduce the test generation complexity and the number of vectors. For example, a gate with timing case 3 requires more values as well as more enumeration than a gate with timing case 1.

5.6 Selective enumeration

The previous section showed that the maximum delay sensitization conditions are timing dependent. Furthermore, the alternative values for the timing cases are, in general, represented using partially-ordered graphs (see Figure 5.5).
For timing case 1, every side input value under maximum delay sensitization is guaranteed to be equally good. Hence, this case requires no enumeration. In contrast, for timing cases 2 and 3 it may be necessary to enumerate alternative side input values, since there exists partial ordering among the alternative logic values for maximum delay sensitization. Using the partially-ordered graphs, we refine the logic values at side inputs to arrive at a set of multiple vectors, termed a test vector-space, guaranteed to resiliently detect the target path. A test vector-space is a partially-specified vector where designated partially-specified values are expanded into all possible combinations, every one of which is applied during validation. (For example, if the first two partially-specified values in XX-10 are designated as 'X' for expansion while the third is designated as '-' for don't care, then our validation vector set comprises the following four partially-specified vectors: 00-10, 01-10, 10-10 and 11-10, where '-' is a don't care that can be replaced by either 0 or 1.)

Figure 5.6: Elimination of inferior vector-spaces at a leaf node

Figure 5.7: Algorithm to eliminate inferior vector-spaces at a leaf node

We eliminate only provably inferior (low delay invoking) vectors in our approach, and hence our test vector-spaces comprise sets of non-inferior vectors that can resiliently detect a target. Our all-inclusive approach selects all non-inferior vector sub-spaces as suitable candidates for validation. The search starts by enumerating the values at each side input and reducing them to either a single value or a single equivalent set. Within an equivalent set, each value is guaranteed to invoke equal delay. At the leaf node of our search tree (see Figure 5.6), we eliminate inferior vector sub-spaces and store the non-inferior vector-spaces as per the algorithm shown in Figure 5.7.

Figure 5.8: Elimination of inferior vector-spaces at an intermediate node
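The expansion of designated 'X' positions in a test vector-space described above can be sketched as follows (the function name is ours; don't-care positions are left untouched):

```python
from itertools import product

def expand(vector_space, expand_positions):
    """Expand the designated 'X' positions of a partially-specified vector
    into all 0/1 combinations; other positions (including '-') are kept."""
    pos = sorted(expand_positions)
    out = []
    for bits in product("01", repeat=len(pos)):
        v = list(vector_space)
        for p, b in zip(pos, bits):
            v[p] = b
        out.append("".join(v))
    return out

# The XX-10 example from the text: expanding positions 0 and 1 yields four
# partially-specified vectors, each still carrying the don't care '-'.
print(expand("XX-10", {0, 1}))  # ['00-10', '01-10', '10-10', '11-10']
```

Each expanded vector remains partially specified; the don't cares are filled arbitrarily only when a fully-specified tester pattern is needed.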
Figure 5.9: Algorithm to eliminate inferior vector-spaces at an intermediate node

Let ONIS denote the old non-inferior set and VS(i) the i-th vector-space in it. Then, in order to speed up the search process, the elimination of inferior vector-spaces (see Figure 5.8) can be performed at intermediate nodes as per the algorithm in Figure 5.9.

5.7 The vector generation framework

Any classical path oriented ATPG requires the following:

• A method to select target paths using available timing information.
The next two phases develop the baseline for comparison with our approach, as they embody the two approaches for vector generation for a target path that are possible using existing methods.

Phase 1: The cone exhaustive approach, where we mark the fan-in cone of each target path and enumerate all possible values at the corresponding primary inputs.

Phase 2: Enumerative vector generation using enhanced functional sensitization (EFS) [82] conditions (these capture only the necessary conditions for excitation of the target path) at the side inputs of each gate along the target path.

The remaining three phases encapsulate the crux of our new proposed approach, where we intelligently use our new side input conditions, along with the available timing information, to arrive at a resilient vector space that is guaranteed to invoke the worst case delay for the target path under consideration. ETA [80] enables us to prune the search space by providing bounds on arrival and transition times for partially specified vectors at every stage of our approach.

Phase 3: Apply maximum delay sensitization (MDS) conditions at each side input. Then perform implications, forward and backward, to arrive at a mother vector-space.

Phase 4: Use the side input refinement (SIR) algorithm, based on the partially-ordered graphs for the MDS conditions, to refine side input values, where each non-inferior refinement gives a vector sub-space.

Phase 5: Use the timing based pruning (TBP) approach for each vector sub-space to identify the subset of primary inputs where all alternate values must be enumerated.

5.8 Experimental results on c17 to illustrate the framework

Consider benchmark circuit c17 (Figure 5.11). Using the approach in [80], in the pre-processing phase we perform ETA and identify a set of target paths whose worst case delay is > T_c - ∆, where T_c (0.448 ns) is the minimum clock period obtained from ETA.
(Recall that this set of target paths is guaranteed to include all paths that may potentially cause a timing error if the accumulated value of additional delays along circuit paths is upper bounded by ∆.)

Figure 5.11: A target path on the c17 benchmark

In Figure 5.11 we show one selected logical path, namely {3_R, 7_R, 9_F, 10_F, 12_R, 13_R, 16_F} (where 7_R and 9_F denote rising and falling transitions at lines 7 and 9, respectively). We also need timing information to use the cases from Table 5.4 and Table 5.5 in our approach (the maximum arrival time for each line in c17 is shown inside "[ ]" in Figure 5.11).

5.8.1 Cone exhaustive approach – the first baseline

Figure 5.12: Cone exhaustive approach on c17

In this step, for each target path we identify the primary input cone and mark these inputs as XX and the rest of the inputs as --. At a primary input, XX stands for {CR, TR, S1, S0}, which essentially means: enumerate all possible combinations for these primary inputs. In contrast, -- stands for don't care, meaning that any value can be applied to obtain a fully-specified vector. The fan-in cone of the logical path shown (Figure 5.12) has 4 primary inputs (circuit lines 1, 2, 3 and 4). Hence we apply XX at these four inputs and -- at the remaining input. At this phase the number of vectors needed for validation is 4^4 = 256.

5.8.2 EFS conditions [80] – the second baseline

In [80] the authors have shown that the use of enhanced functional sensitization conditions helps to reduce the number of paths that must subsequently be considered for delay test generation. We use the EFS conditions as a baseline to demonstrate the benefits of our approach. We applied the EFS conditions and performed implications to reduce the targets to be considered further and to arrive at a set of MV. The values at lines 1, 2, 3 and 4 show that the number of fully specified vectors needed for validation is 2*3*1*2 = 12, as shown in Figure 5.13.
Figure 5.13: EFS approach on c17

5.8.3 New maximum delay sensitization conditions

In Section 5.5 we proposed new maximum delay sensitization (MDS) conditions that can be used to generate vectors that are guaranteed to (collectively) invoke the worst-case delay of the target path.

Figure 5.14: MDS approach on c17

Now we apply our MDS conditions (Table 5.6 and Table 5.7) and perform implications to eliminate target paths for which the necessary conditions cannot be satisfied after implications, and thus we arrive at the mother vector space – the set of MV which is guaranteed to resiliently invoke the maximum delay for the target. The number of vectors needed for validation is 2*2*1*2 = 8, as shown in Figure 5.14.

Table 5.8: Taxonomy of timing cases for c17

Line #   Timing case   After phase 2   After phase 3    After phase 4
2        2             {CR,CF,S1}      {CR},{S1}        {S1}
4        5             {S1,CR}         {S1}             {S1}
8        2             {S1,H1,CR}      {S1},{H1},{CR}   {S1}/{H1}*

Table 5.8 shows the logic conditions at the side inputs after the various phases. Column 4 shows the state of the circuit after phase 3, where the logic conditions are partitioned into sets using the partial ordering for the corresponding timing case (column 2) shown in Figure 5.15.

Figure 5.15: The partially-ordered graphs for the three lines in c17

5.8.4 Side input refinement

In Section 5.5 we showed how the newly proposed timing dependent maximum delay sensitization (MDS) conditions can be represented using graphs of partially-ordered sets of values. For each timing case, we can partition the set of all possible side input values into multiple sets, where each set represents an equivalence class in the corresponding partially-ordered graph. Within an equivalence class, each value is guaranteed to invoke equal delay. Each set of values for each timing case can be used for enumeration at the side inputs to arrive at the high delay invoking vectors in the all-inclusive approach described below.
Using the partially-ordered graphs, we refine the mother vector-space to arrive at vector sub-spaces. This is accomplished by a systematic approach that enumerates all combinations of the individual sets for the corresponding case from Table 5.8 at each side input (Figure 5.14). For our c17 example, we must enumerate the two values, namely the logic value {S1} and the logic value {H1}, at line #8 to derive two non-inferior vector spaces.

Figure 5.16: Search tree for SIR on c17

In c17 we have three circuit lines (#4, #2 and #8) which can be identified as side inputs. In Figure 5.14, #2 and #8 belong to the timing case where the side input is early and near simultaneous but does not overlap with the on-path input. The corresponding partially-ordered graph gives two equivalence classes ({CR}, {S1}) at line #2 and three equivalence classes ({S1}, {H1} and {CR}) at line #8. In this phase, every side input assignment is refined to either one value or a single set of equivalent values. The corresponding search tree for c17 is shown in Figure 5.16. After SIR finishes, we are left with only two leaf nodes, giving rise to the two vector sub-spaces [{CF}, {S1}, {CR}, {CR}] and [{S0}, {S1}, {CR}, {CR}], each requiring just one vector to detect the target (also see Figure 5.17). The reason is that, because of implication conflicts, two of the branches at intermediate node #2 are terminated, whereas at intermediate node #8 each of the branches leads to a leaf node. Though the illustration shows a BFS approach, we have implemented a more advanced DFS approach in our framework. Hence, at the end of this stage the number of vectors needed is 1*1*1*1 + 1*1*1*1 = 2.

Figure 5.17: SIR approach on c17

In general, we eliminate only provably delay-inferior (i.e., provably low delay invoking) vectors in our approach, and hence our test vector-spaces comprise sets of all non-delay-inferior vectors that can resiliently detect a target.
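The SIR enumeration for c17 can be sketched as a product over the equivalence classes at the side inputs, with implication conflicts pruning branches. The conflict set below is our illustrative stand-in (chosen to be consistent with the "after phase 4" column of Table 5.8), not the actual implication engine:

```python
from itertools import product

def sir_leaves(equiv_classes, conflicts=frozenset()):
    """Enumerate SIR leaves (one equivalence class chosen per side input),
    dropping any assignment that contains an implication conflict."""
    lines = sorted(equiv_classes)
    leaves = []
    for choice in product(*(equiv_classes[l] for l in lines)):
        assignment = dict(zip(lines, choice))
        if any((l, c) in conflicts for l, c in assignment.items()):
            continue  # terminated branch: implication conflict
        leaves.append(assignment)
    return leaves

# c17 side inputs per Table 5.8 (state after phase 3); the two conflicting
# choices below are hypothetical, standing in for the conflicts of Figure 5.16.
equiv = {"#2": ["{CR}", "{S1}"], "#4": ["{S1}"], "#8": ["{S1}", "{H1}", "{CR}"]}
bad = {("#2", "{CR}"), ("#8", "{CR}")}
print(len(sir_leaves(equiv)))       # 6 leaves before pruning
print(len(sir_leaves(equiv, bad)))  # 2 non-inferior vector sub-spaces
```

The surviving two leaves correspond to {S1} at line #2 and {S1} or {H1} at line #8, matching the two vector sub-spaces reported for c17.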
5.8.5 Timing based pruning

Figure 5.18: TBP approach on c17

After SIR there is no more enumeration based on logic values, as the values within each singleton set are all equivalent. Hence the objective now narrows down to identifying the side inputs where values must be enumerated because of timing conditions and, by back tracing from these, to identifying the inputs with partially specified (X) values where enumeration based on timing is required. We start with each side input where the timing case is either overlap or near-simultaneous transitions (cases 2, 3 and 4 in Tables 5.6 and 5.7) and recursively mark the fan-ins and transitive fan-ins with a timing-matters flag, provided the line does not carry a subset of the static values {S0, S1}. At the end of this step, only the primary inputs that are marked with the timing-matters flag need to be enumerated. For another logical path, {3_F, 7_F, 9_R, 10_R, 12_F, 13_F, 16_R}, shown in Figure 5.18, the MV for validation is reduced from 4*1*1*1 = 4 to 1*1*1*1 = 1 after TBP (see Table 5.9).

Table 5.9: Values at the primary inputs of c17 after TBP

Line no. (PI)   Logic values        Timing matters?
1               {S0, S1, CR, CF}    No
2               {S1}                No
3               {CF}                No
4               {S1}                No

Finally, Table 5.10 shows the number of vectors required for delay validation of c17 during the various phases of our approach. Note that phase 0 just selects the path, and thus the corresponding number of vectors is given by exhaustive simulation of the whole circuit.

Table 5.10: Results for c17

Phase #          0      1      2      3      4      5
No. of vectors   1024   256    12     6      2      1

5.9 Experimental results on larger benchmarks

In our first set of experiments, we characterize the delays of individual gates using only the nominal delay values (zero variations). We then compute T_c as the maximum circuit delay using enhanced timing analysis [80].
Then, using the variability values for the given 65nm technology, we estimate that the overall variability (primarily in threshold voltage, effective channel length, channel width, and so on) in each gate's delay is between 30% and 50%. Since in this set of experiments the individual gate delay models do not capture variations, we capture variations by using ∆ values between 30% and 50%.

In phase 0 we use the above gate delay models (zero variability) and ∆ values (30% to 50%) for target path selection. In phase 1, for each target path identified in phase 0, we mark its fan-in cone. In phase 2 we apply the EFS conditions and perform implications. A path is not functionally sensitizable if implication fails; hence, functionally unsensitizable paths are identified without performing any search. This leads to a considerable reduction in the number of target paths. In phases 3 and 4 we respectively apply our new maximum delay sensitization (MDS) conditions and the side input refinement (SIR) approach to obtain the sets of MV (vector sub-spaces) for each target path. Phase 5 uses timing based pruning (TBP) to reduce the number of fully specified patterns in each MV.

Table 5.11: Path analysis on ISCAS benchmark circuits: zero process variations in the gate delay model, ∆ values capture all model variations

                 Total number of paths selected
          Δ      Phase 1    Phase 2    Phase 3    Phase 4,5    After justification
s298     0.3          26         95         36         30            9
         0.4         272        220        106         92           33
         0.5         297        251        126        117           43
s953     0.3         486        310        267        182            3
         0.4       1,040        941        709        534            7
         0.5       1,398      1,099        977        853           47
s1196    0.3         973        587        419        258           12
         0.4       1,895      1,123        738        468           46
         0.5       2,739      1,809      1,407        864           80
s713     0.3      32,638        254        138         65            7
         0.4      37,429        713        462        271           51
         0.5      38,840        910        760        481           95

Table 5.11 shows the number of target paths for three values of ∆ for four benchmarks. The data for the various values of ∆ show that the number of target paths rapidly increases as Δ is increased.
The table demonstrates that the proposed approach identifies far fewer paths to target for every value of ∆. For example, for s1196 and ∆ = 0.50, only 864 out of 6,196 logical paths, whose worst case delay is greater than T_c - ∆, can cause timing errors at the outputs if the cumulative value of additional delays along every path is upper bounded by ∆.

Table 5.12: Vector analysis on ISCAS benchmark circuits: zero process variations in the gate delay model, ∆ values capture all model variations

                 Total number of vectors                                                            Vector
          Δ      Phase 1        Phase 2        Phase 3        Phase 4,5      After justification    spaces
s298     0.3     6.49 x 10^6    2.04 x 10^5    51,260              87              9                   9
         0.4     1.04 x 10^7    3.57 x 10^5    1.21 x 10^5      2,816            104                  33
         0.5     1.12 x 10^7    4.48 x 10^5    1.58 x 10^5      3,722            177                  43
s953     0.3     8.52 x 10^12   3.10 x 10^10   1.11 x 10^10    11,114             65                   3
         0.4     1.70 x 10^13   9.46 x 10^10   3.71 x 10^10    1.40 x 10^6      133                    7
         0.5     2.20 x 10^13   1.21 x 10^11   4.74 x 10^10    1.60 x 10^6   29,914                   47
s1196    0.3     2.48 x 10^16   5.87 x 10^13   3.83 x 10^13    1.41 x 10^7       44                   12
         0.4     2.91 x 10^16   1.07 x 10^14   4.63 x 10^13    8.54 x 10^7  103,908                   46
         0.5     3.48 x 10^16   1.91 x 10^14   8.25 x 10^13    6.07 x 10^10 291,719                   80
s713     0.3     1.07 x 10^20   2.96 x 10^14   1.01 x 10^14    8.87 x 10^10  65,535                    7
         0.4     1.22 x 10^20   1.34 x 10^15   3.84 x 10^14    1.21 x 10^12 7.95 x 10^6               51
         0.5     1.33 x 10^20   2.91 x 10^15   7.09 x 10^14    1.59 x 10^12 1.80 x 10^7               95

Table 5.12 shows the total number of fully specified vectors (each is actually a two-vector sequence) generated for each of the four circuits to guarantee invocation of the worst-case delay. (Note that the total number of vectors reported here is obtained by simple addition over all the target paths. In practice, multiple target paths will have vectors in common, and hence what we report here is an upper bound on the number of vectors.) Also, out of the 864 timing critical paths, we could generate maximum delay tests for only 80 of them. (Note that the ATPG has a backtrack limit of 20.)
Table 5.13: Runtime analysis on ISCAS benchmark circuits: zero process variations in the gate delay model, ∆ values capture all model variations

                 Total CPU clocks taken
          Δ      Phase 1    Phase 2    Phase 3    Phase 4,5      After justification
s298     0.3          42         85        379       1,508         4,526
         0.4          60        155        866       3,724         8,537
         0.5          61        186      1,194       4,435         9,223
s953     0.3         206      4,054      6,255     1.06 x 10^5   1.72 x 10^5
         0.4         279      9,366     14,862     1.62 x 10^5   7.81 x 10^5
         0.5         362     12,352     19,128     2.05 x 10^5   1.67 x 10^6
s1196    0.3         354      8,839     13,785     2.83 x 10^5   1.49 x 10^6
         0.4         561      9,514     28,003     5.63 x 10^5   3.89 x 10^6
         0.5         751     26,975     43,681     9.61 x 10^5   7.03 x 10^6
s713     0.3       6,840     46,202     42,842     2.83 x 10^5   3.91 x 10^5
         0.4       7,073     55,289     55,678     2.22 x 10^5   1.11 x 10^6
         0.5       7,469     55,347     55,986     3.19 x 10^5   8.43 x 10^6

The results show that for a circuit like s1196 with ∆ = 0.50, with our systematic approach the number of fully specified patterns needed for validation concentrating on delay marginalities can be reduced by a factor of 10^11 compared to the cone-exhaustive baseline. We report the test generation time in CPU clocks for each case in Table 5.13 and the final vector-spaces (to be used for efficient tester memory management) in Table 5.12.

Table 5.14: Path analysis on ISCAS benchmarks: full global process variations in the gate delay model, ∆ values capture all other model variations

                 Total number of paths selected
          Δ      Phase 1    Phase 2    Phase 3    Phase 4,5    After justification
s298     0.01        142        100         44         40           2
         0.02        142        100         44         40           2
s953     0.01        366        366        193        150           2
         0.02        400        400        216        168           2
s1196    0.01        837        608        233        169          13
         0.02        947        696        317        232          18
s713     0.01     29,856      2,258        419        303          11
         0.02     29,856      2,258        419        303          11

In our second set of experiments, we characterize the delay of each gate using the full global level of variability for the 65nm process (using a sufficiently large number of Monte Carlo simulations). In these experiments, since the variations are incorporated in the delay models, we can use much smaller values of ∆, as ∆ must now capture only modeling errors.
The results for the four benchmarks are shown in Tables 5.14, 5.15 and 5.16. For each circuit, the trends with respect to increasing ∆ values are as expected. More importantly, for all these circuits the number of vectors required to guarantee the invocation of the worst-case delay for post-silicon validation of a design fabricated in a process with a very high level of variability is very practical indeed. This is true because post-silicon validation is performed on a small sample of chips selected from the first-silicon batch (unlike delay testing, which must be performed on every fabricated chip copy). Note that the test vector spaces generated by our approach can be further refined using test compaction methods.

Table 5.15: Vector analysis on ISCAS benchmarks (full global process variations in the gate delay model; ∆ values capture all other model variations). The Phase columns give the total number of vectors after each phase; the last column gives the final vector spaces.

Circuit  ∆     Phase 1       Phase 2       Phase 3       Phase 4, 5   After justification  Vector spaces
s298     0.01  7.39 x 10^6   182,392       57,656        4,967        64                   2
         0.02  7.39 x 10^6   182,392       57,656        4,967        64                   2
s953     0.01  7.15 x 10^12  4.90 x 10^10  1.04 x 10^10  604,219      2                    2
         0.02  7.38 x 10^12  5.67 x 10^10  1.25 x 10^10  620,235      2                    2
s1196    0.01  1.71 x 10^16  2.83 x 10^13  3.84 x 10^12  1.03 x 10^6  71                   13
         0.02  2.34 x 10^16  7.91 x 10^13  1.39 x 10^13  5.78 x 10^6  936                  18
s713     0.01  1.04 x 10^20  2.27 x 10^14  7.17 x 10^13  5.7 x 10^10  132,256              11
         0.02  1.04 x 10^20  2.27 x 10^14  7.17 x 10^13  5.7 x 10^10  132,256              11

Table 5.16: Runtime analysis on ISCAS benchmarks.
Full global process variations in the gate delay model; ∆ values capture all other model variations. Entries give the CPU clocks taken by each phase.

Circuit  ∆     Phase 1  Phase 2  Phase 3  Phase 4, 5  After justification
s298     0.01  55       85       383      2,987       4,179
         0.02  55       85       383      2,987       4,179
s953     0.01  163      669      4,417    60,819      79,376
         0.02  165      729      4,763    61,115      83,467
s1196    0.01  305      1,543    9,035    1.8 x 10^5  2.13 x 10^5
         0.02  355      1,726    11,233   1.9 x 10^5  3.07 x 10^5
s713     0.01  5,668    22,668   35,433   2.7 x 10^5  3.37 x 10^5
         0.02  5,668    22,668   35,433   2.7 x 10^5  3.37 x 10^5

We have validated the resilient delay model, along with the associated timing analysis framework, against a circuit-level simulator (Spectre) using extensive simulations, including Monte Carlo simulations. We used the same framework to calculate the delay of the most timing-critical paths (∆ = 30%) for each benchmark, using both the robust test set and our test set (resilient vector-spaces). Table 5.17 shows clearly that our resilient vector-spaces invoke much higher delay than the classical robust tests. By cone-exhaustive simulation for a small benchmark such as s298 (as cone-exhaustive simulation for larger circuits is impractical), we validated that our resilient vector-space does contain the maximum-delay sensitization vector (which invokes the delay of 0.626 ns).

Table 5.17: Robust vs. resilient: comparison of delays invoked in circuit-level simulations

Benchmark  Robust test vectors  Our resilient test vectors  Improvement (%)
s298       0.546 ns             0.626 ns                    14.65%
s953       1.033 ns             1.132 ns                    9.58%
s1196      2.19 ns              2.362 ns                    7.85%
s713       4.197 ns             4.462 ns                    6.31%

We performed similar experiments with the larger benchmarks s5378 and s9234 and report the results in Table 5.18. Even for the full-chip based s9234, our approach reports a much smaller selected path set (about 13X reduction, considering full variability) and a much smaller vector set (about 10^32 X, considering full variability) compared to the baseline cone-exhaustive approach.
Table 5.18: Analysis for larger ISCAS benchmarks

Benchmark                                 Reduction in paths selected  Reduction in vectors generated  Improvement in max delay (%)
s5378  No variability (∆=0.3)             16.1X                        10^18 X                         5.63%
       Full global variability (∆=0.05)   18.9X                        10^21 X                         6.91%
s9234  No variability (∆=0.3)             14.9X                        10^24 X                         7.67%
       Full global variability (∆=0.05)   17.2X                        10^32 X                         9.25%

Table 5.18 also shows clearly that even for s9234, our resilient vector-spaces invoke much higher delay (about 10%, considering full variability) than the classical robust tests. Thus, our approach, with its resilient delay model and all-inclusive vector spaces, can serve as a potential candidate for efficient post-silicon delay marginality validation.

5.10 Comparison with n-detect tests

In n-detect tests, faults are targeted multiple times by different test patterns to enable fortuitous discovery of un-modeled faults [136]. In other words, by increasing the number of detections of target faults, n-detect tests increase the likelihood of detecting untargeted faults and defects. Though they improve defect coverage and can be easily incorporated into single-detect ATPG, they increase test volume and hamper test compaction. Since n-detect tests are conceptually aligned with our philosophy of multiple vectors per target path, in Table 5.19 we compare n-detect robust tests (n = 50) with our resilient test-vector spaces for the ISCAS benchmarks.

Table 5.19: Comparison with n-detect tests (n = 50)

Benchmark  Robust tests (n = 1)  n-detect tests (n = 50)  Our vector-spaces
s298       0.546 ns              0.554 ns                 0.626 ns
s953       1.033 ns              1.081 ns                 1.132 ns
s1196      2.19 ns               2.212 ns                 2.362 ns
s713       4.197 ns              4.242 ns                 4.462 ns
s5378      2.09 ns               2.113 ns                 2.215 ns
s9234      4.11 ns               4.183 ns                 4.452 ns

As evident from the results in Table 5.19, though n-detect tests perform better than 1-detect robust tests (an improvement of 1.78% for s9234), they are inferior to our resilient vector spaces (an improvement of 7.67%).
Our resilient vector-spaces cover the complete search space (in accordance with our guarantee of all-inclusiveness) and outperform n-detect tests (which cover only a limited search space [136]).

5.11 Summary

Our method guarantees the excitation of the worst-case delay of the chips in the first-silicon batch without introducing any pessimism. It embodies several innovations, including a resilient gate delay model to capture process variations, new conditions that vectors must satisfy to invoke the maximum delay of a target path, and a new approach to generate multiple vectors (vector-spaces) guaranteed to invoke the worst-case delay of the target path. Experimental results show that our proposed framework, along with the timing-dependent conditions, can be used efficiently to guarantee detection of delay marginalities. The non-inferior test vector-spaces generated by our algorithm and their improvement over robust test vectors further fortify our proposal of using multiple vectors (MV) to compensate for the lack of exact knowledge and the inaccuracies in delay parameters. We also show that our non-inferior vector-spaces, due to their collective ability to invoke the worst-case delays, are better than the existing n-detect vectors.

CHAPTER 6
SEGMENTATION OF THE PROCESS VARIATION ENVELOPE

In this chapter, we present an efficient method for incorporating the knowledge of global-only (worst case across die, across wafers, and across wafer lots) and local-only (worst case on-die) process variations to generate vectors for identifying delay marginalities under full global plus local variations.
With the goal of significantly reducing the number of vectors required for validation, we propose an approach for segmenting the full global plus local process variation envelope into sub-envelopes, where each sub-envelope is guaranteed to capture worst-case full local-only and partial global-only variations, and where all sub-envelopes collectively capture the worst-case full global plus local variations. We then use our approach presented in Chapter 5 in a segment-by-segment manner for generating multiple vectors (captured as vector-spaces) that guarantee the invocation of the worst-case delay of the chips in the first-silicon batch.

6.1 Timing uncertainty and delay variations

The uncertainty in the timing estimate of a design can be classified into three categories [12] (see Figure 6.1):
1. Modeling and analysis errors – inaccuracy of device models, of the extraction and reduction of interconnect parasitics, and of timing analysis algorithms.
2. Manufacturing variations – uncertainty in the parameters of fabricated devices and interconnects, from die to die and within a particular die.
3. Operating context variations – uncertainty in the operating environment of a particular device during its lifetime, such as temperature, operating voltage, mode of operation and lifetime wearout.

Figure 6.1: Steps in the design process and their effect on timing uncertainties [12]

In this research we focus primarily on the timing uncertainty due to process variations, specifically delay marginality, which is an important variation-induced timing bug that contributes to a significant fraction of the circuit bugs detected by validation. As shown in Figure 6.2, parameter variations lead to electrical variations, which in turn lead to delay variations (delay marginality).

Figure 6.2: Parameter variations causing delay variations [12]

6.2 The taxonomy of process variations

The main sources of process variations (variability) in current general-purpose CMOS processes are [49]:
1.
Random dopant fluctuation (RDF) – The fluctuation of the number of dopant atoms leads to variation in the observed threshold voltage (Vth) of the transistor. RDF is regarded as the major source of device variation at DSM technology nodes.
2. Line edge roughness (LER) – The local variation of the edge of the polysilicon gate along its width. The causes of increased LER include variation in the incoming photon count during exposure and the contrast of the aerial image, as well as the absorption rate, chemical reactivity, and molecular composition of the resist.
3. Oxide thickness variation (OTV) – At DSM technology nodes, the oxide layer is only a countable number of atoms thick, and the atomic-level roughness of the oxide-silicon interface layer is becoming increasingly difficult to control. This leads to increasing variation in device parameters like mobility and threshold voltage.

Process variations are also referred to as variability. Variability can be categorized, following the terminology used in [49], as follows (see Figure 6.3):
• Systematic process variation – the behavior of these physical parameter variations is well understood and can be predicted a priori by analyzing the layout of the design. Examples are variations due to optical proximity, CMP, metal fill, etc.
• Non-systematic process variation – these have uncertain or random behavior and arise from processes that are orthogonal to the design implementation. Examples are the primary contributors to process variations: RDF, LER and OTV.
• Environmental variation – includes power supply voltage and temperature variations.

Figure 6.3: General taxonomy of variation

It is common practice in the design flow to model systematic variations deterministically (in the advanced state of design, after gaining more information) and to model the non-systematic variations statistically. Depending on the spatial scale of variations, process variations can be further classified:
1.
Global-only variation (worst-case inter-die, including die from different wafers and different wafer lots) – Die-to-die variations result in a shift in the process from reticle to reticle, wafer to wafer, and lot to lot. For example, the gate lengths of all the devices on the same chip may be larger or smaller than the nominal value.
2. Local-only variation (within-die or intra-die) – Variations affect each device within a die differently. For example, some devices on a die have smaller gate lengths whereas other devices on the same die have larger gate lengths.
3. Full global plus local variation (worst-case on-die and inter-die, including die from different wafers and different wafer lots) – The combination of the above die-to-die and within-die variations.

Clearly, local-only variations are lower than full global plus local variations, since the former is subsumed by the latter. Further, local-only systematic variations affect devices differently depending on whether these devices are placed near or far away from each other [20]. In [20][91] it is shown that the biggest combinational logic block in an industry-standard 65nm technology fits within an area of 100μm x 100μm (considered to be dominated by the local-nearby component of local variations). Thus, for our work (which focuses only on the combinational part of logic blocks), we assume local variations to be comprised of local-nearby variations only.

The within-die (local-only) variations can be further classified as:
1. Spatially correlated variations – The kind of within-die variations which exhibit more similar characteristics for devices placed in a small neighborhood on the die than for those placed far apart.
2. Random or independent variations – The kind of within-die variations which are statistically independent of other device variations. Examples are RDF and LER.
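The spatial classes above can be illustrated with a simple additive threshold-voltage model (the additive, independent-normal form and all numeric values here are assumptions of this sketch, not the exact model used in this work): each device sees the nominal value, plus a global-only shift shared by every device on its die, plus an independent local-only term per device.

```python
# Illustrative additive model: Vth = nominal + global-only (per-die) shift
# + local-only (per-device) term. Values and sigmas are hypothetical.
import random

def sample_die(n_devices, vth_nom=0.30, sigma_g=0.02, sigma_l=0.01, seed=0):
    rng = random.Random(seed)
    global_shift = rng.gauss(0.0, sigma_g)   # identical for the whole die
    return [vth_nom + global_shift + rng.gauss(0.0, sigma_l)  # per-device term
            for _ in range(n_devices)]

die_a = sample_die(1000, seed=1)
die_b = sample_die(1000, seed=2)
mean_a = sum(die_a) / len(die_a)   # tracks nominal + die A's global shift
mean_b = sum(die_b) / len(die_b)   # a different die sees a different shift
```

The die means differ because of the global-only shift, while the spread of devices around each die's mean reflects the local-only component.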
Figure 6.4: Categorization of device variation [10]

The table in Figure 6.4 [10] shows an industry-specific categorization of device variability for 65nm. Here, variations are separated into rows according to spatial domain: those that involve the chip mean (global-only), those that vary within the chip (local-only) but have local or chip-to-chip correlation, and those that vary randomly from device to device (local). The columns identify variations arising from the process used to make the device, or originating from device behavior changes over time. Such a categorization of variability is useful as it separates issues requiring different statistical treatments when anticipating their circuit impacts.

6.3 Review of existing work on variability

The increasing importance of variability for delay and power consumption in CMOS circuits is addressed comprehensively in [10]. Statistical approaches [174] have evolved over time to effectively analyze variability and its associated effects. A considerable amount of existing and ongoing research addresses design-specific traits such as variability-aware design [148], statistical timing analysis [12], statistical cell characterization [153], statistical leakage prediction [169], statistical path selection [187] and even variability-aware subthreshold/near-threshold design [74]. But all these approaches ignore local-only variations and account for global-only variations by performing simulations at a small number of "corners". Recently, researchers have started addressing test-specific traits such as variability-aware fault modeling [78], delay testing [46] and delay validation [54], respectively. Again, these approaches consider the worst-case (full global plus local) variability as the complete process space, which results in pessimism in the final results.
Thus, none of the above-mentioned approaches take advantage of the fact that process monitors [11][20][144] can be used efficiently to make deterministic estimates of global-only variability for each particular die and thereby adaptively reduce the pessimism. Works such as [44][22] present pre-silicon variation models where the spatially correlated component of within-die (local-only) variations is addressed using grid-based models to reduce the pessimism, but they ignore the deterministic estimates of global-only variability provided by process monitors. The authors in [192] present an active learning framework for post-silicon variation modeling that reuses information from past wafers to validate and improve the model. In contrast, our approach will use information from past wafers to efficiently generate validation vectors. The approach in [142] shows how the PDF (Probability Distribution Function) of leakage under full global plus local variability can be arrived at by a weighted sum of local-only variability PDFs for leakage at discrete points on the global-only variability PDF. Our approach is exactly the reverse of this: the full global plus local variability PDF will be decomposed into local-only variability PDFs at discrete points in the global-only variability PDF. Moreover, none of the existing variability-aware works in the testing domain [54][174] try to incorporate the knowledge of full global plus local, global-only and local-only variability. Hence, ours will be the first approach to decompose the full global plus local variability and to demonstrate that this can reduce the costs of validation-related tasks, namely path selection, vector generation and vector application.

6.4 Effect of process variations on validation vector set

This research is motivated by the clear trend that marginalities constitute an increasing proportion of the misbehaviors first discovered during validation.
This increase in the importance of marginalities is caused by low-level effects and is aggravated by process variations. We experimented on the ISCAS89 benchmarks to analyze the effect of variability on the validation vector space (Chapter 5). We consider local-only as well as full global plus local variabilities, and for each level of variations we select the top 10% delay paths (∆ = 0.1*Tc, where Tc is the maximum circuit delay determined by performing ETA (Chapter 3) under nominal delay values (zero variability)) to generate the validation vector-space.

Figure 6.5: Validation vector generation for c17 – (a) local-only variability, (b) full global plus local variabilities

Figure 6.5(a) shows the value at each circuit line at the end of our approach for generating test vector-spaces (Chapter 5) for the logical path {R3, R7, F9, F10, R12, R13, F16}, as well as the resilient delay model curves (Chapter 3), for local-only variability. The number of vectors generated is 1*1*1*1 = 1. Figure 6.5(b) shows the same information for the same path for the case of full global variability. In this case, the number of vectors increases to 2*2*1*2 = 8. While the increase in the number of vectors for a path is small (from 1 to 8) for a small circuit like c17, we also explored the effect of increasing levels of variability for slightly larger circuits (the number of vectors required for a single top critical path in the ISCAS89 benchmark s1196 increases to 3,478). Table 6.1 shows the effect of nominal (zero), local-only, and full global plus local variabilities on the paths selected (with max delay greater than 0.9*Tc, where Tc is the clock period determined by Enhanced Timing Analysis [20] for the nominal case) and the vectors generated (size of vector-spaces) for several ISCAS89 benchmarks.
It should be noted that the results in Table 6.1 correspond to the number of paths selected and the number of vectors generated after we have finished the justification procedure, where certain paths are aborted because of the provided backtrack limit [54].

Table 6.1: Effect of process variations on the number of paths selected and the number of validation vectors

                        Paths selected                     Vectors generated
Circuit  Nominal  Local variations  Full global variations  Nominal     Local variations  Full global variations
s298     9        13                20                      9           22                154
s444     1        7                 22                      1           7                 22
s953     7        17                38                      1           162               3,422
s713     3        12                47                      6.5 x 10^4  1.43 x 10^5       1.73 x 10^7
s1196    12       26                52                      44          1,622             16,542
s9234    734      2,657             6,785                   84,562      2.37 x 10^6       6.53 x 10^9

It is evident from Table 6.1 that as the levels of variability increase, from zero variability (i.e., the nominal case), to local-only variability, and to full global plus local variability, the number of paths that must be targeted as well as the number of vectors generated for validation of each target path increase dramatically. This empirical observation prompted us to identify key properties pertaining to the effects of increasing variability on the various steps of our approach for generating validation vectors:
• As variability increases (super-set), the envelopes expand and subsume all the points in the envelopes for lower levels (sub-set) of variability. Thus the envelopes of full global plus local variability subsume the envelopes of local-only variability.
• With increase in variability (super-set), the timing ranges (calculated by our ETA [82]) at each circuit line are a super-set of those for lower levels of variations.
• With increase in variability, the number of paths in the target set that are functionally sensitizable (FS) and maximum delay sensitized (MDS) increases.
• With increase in variability, the size of the mother vector-space (obtained after MDS, but before SIR (Chapter 3)) either expands or remains constant.
• With increase in variability, the number of sensitizable paths in the target set increases. Also, the test vector-spaces expand, and hence so does the total number of vectors needed for validation of the entire circuit. In particular, with higher variability, our timing conditions call for more enumeration at additional circuit lines, compared to lower variability.

As evident from Table 6.1 and the properties identified above, as the variability grows, the number of testable paths identified grows, and so does the number of vectors in the validation vector-space for each testable path. Hence, we develop a novel and innovative divide-and-conquer based approach that can dramatically reduce the expected size of the validation path and test sets, without compromising their resilience.

6.5 Segmentation of variation envelopes – divide and conquer

As mentioned above, the full global plus local variability comprises two components, namely global-only variability and local-only variability [12][49], as shown in Figure 6.6(a). (Here we assume that all components of variability are commonly approximated as normal distributions, for reasons explained later.) Hence, we propose to arrive at the full global plus local variability (μG+L, σG+L) using a segmented approach where (a) the global-only variability component (μG, σG) is segmented into many segmented envelopes, and (b) each segment is combined with the full local-only variability (μL, σL). Subsequently, the full global plus local variability (μG+L, σG+L) can be arrived at by a weighted/non-weighted sum of the contributions from the segmented envelopes (similar to the concept of a weighted sum at discrete points in [142]). Note that our approach is based on the assumption that ring-oscillator based test structures or process monitors such as [11][20][144], which estimate the global-only process shift in parameter values (such as delay), are available on most chips.
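Under the independence and normality assumptions stated above (assumptions of this sketch, consistent with but not asserted by the experimental data), the spread of the combined global-plus-local component satisfies sigma_G+L = sqrt(sigma_G^2 + sigma_L^2); a quick Monte Carlo run confirms this for illustrative sigma values.

```python
# Monte Carlo check that summing independent normal global-only and local-only
# components yields sigma_G+L = sqrt(sigma_G**2 + sigma_L**2).
# The sigma values are illustrative placeholders, not process data.
import math
import random

rng = random.Random(7)
sigma_g, sigma_l = 0.02, 0.01
samples = [rng.gauss(0, sigma_g) + rng.gauss(0, sigma_l) for _ in range(200_000)]

mean = sum(samples) / len(samples)
sigma_full = math.sqrt(sum((s - mean) ** 2 for s in samples) / len(samples))
predicted = math.sqrt(sigma_g ** 2 + sigma_l ** 2)  # about 0.0224

assert abs(sigma_full - predicted) / predicted < 0.02
```

The empirical standard deviation of the combined samples matches the analytical prediction to well within 2%.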
Based on the sizes of the individual segments, the segmented approach can be classified as:
• Uniform segmentation – segmented sections are of uniform width, say kσ (e.g., Figure 6.6(b) shows uniform segmentation with three segments of width 2σ each).
• Non-uniform segmentation – segmented sections are of varying widths, say k1σ, k2σ, … (e.g., Figure 6.6(c) shows non-uniform segmentation with four segments, where the central segment is of width σ, and the segments at the extremes are of width 2.5σ each). We propose to segment at finer granularity at the center, as the concentration of chips will be much greater near the center than at the extremes.

Figure 6.6: Segmentation of the full local plus global variability envelope for one parameter – (a) full global plus local, (b) uniform, (c) non-uniform

Note that the weighting is done as per the area under the curve (the probability of occurrence of the nominal operating point of the fabricated chip within the segmented interval). As mentioned earlier, our approach expresses variability in terms of the parameters of the devices in the gates. Transistor threshold voltage (Vth) is the most dominant contributor to device variability for the transistors used in today's CMOS gates [10][12] and can be considered as our basis for segmentation. From Section 6.4, it is evident that as the variabilities increase, the number of vectors in the validation test set increases. So we propose an approach to segment and quantize the full global-only variability envelope based on the information from process monitors (similar to those in [11][20][144]). A uniform three-way and a non-uniform three-way segmentation can be performed on the single device parameter Vth, as shown in Figure 6.6(b) and Figure 6.6(c) respectively. Note that the topmost curve in Figure 6.6(a) shows the single-curve representation of the full global plus local variability (μG+L, σG+L).
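The area-under-the-curve weights mentioned above can be computed directly from the normal CDF. A minimal sketch, assuming Vth varies normally and expressing segment bounds in units of σ (the four-segment bounds used for the non-uniform case are one possible reading of Figure 6.6(c), not the thesis's exact choice):

```python
# Segment weights = areas under the standard normal curve between segment
# bounds (in units of sigma), assuming a normal distribution for Vth.
import math

def phi(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def segment_weights(bounds):
    """Probability mass falling in each segment [bounds[i], bounds[i+1])."""
    return [phi(hi) - phi(lo) for lo, hi in zip(bounds, bounds[1:])]

uniform = segment_weights([-3, -1, 1, 3])            # three 2-sigma segments
nonuniform = segment_weights([-3, -0.5, 0, 0.5, 3])  # one reading of Fig. 6.6(c)
```

The central uniform segment carries about 68% of the mass (most dies land there), while the extreme segments carry about 16% each; the three segments together cover about 99.7% of all dies.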
Later, via experiments on ISCAS benchmarks, we will show the benefits of segmentation: even for a small benchmark such as c17, the one-parameter (Vth) three-way segmentation based adaptive approach (to be explained next) can reduce the validation vector set by as much as 5X (from 15 to 3). Ideally, all possible sub-envelopes within the global-only variability envelope must be considered. But, since there are an infinite number of such sub-envelopes, we simplify test development by quantizing the nominal operating points for which we consider a sub-envelope. Note that each sub-envelope corresponds to one process monitor present in the fabricated chip, and we consider the nominal operating points to be at the extremities of each sub-envelope. Hence, for the case in Figure 6.6(b), the nominal operating points will be shifted, corresponding to the global variability component, to -3σ, -σ, +σ and +3σ.

Based on the number of parameters considered during segmentation, the segmented approach can be classified as:
• One-parameter segmentation – segment along Vth
• Two-parameter segmentation – segment along Vth and Leff

Figure 6.7 shows the 3-way (uniform) segmentation for the two dominant delay variability parameters [1] – Vth (horizontal axis) and Leff (vertical axis). In Figure 6.7 the rectangles represent the projection of the base of the combined (Vth and Leff) PDF. Consider the full global-only and full local-only variability envelope in Figure 6.7(b) of {(-3σ, +3σ) for Vth and (-3σ, +3σ) for Leff}, represented as a rectangle of size 6σ*6σ = 36σ^2. A smaller rectangle in Figure 6.7(c) represents a global-only variability sub-envelope of size 2σ*2σ = 4σ^2: {(-σ, +σ) for Vth and (-σ, +σ) for Leff}. Note that 3-way segmentation in two parameters corresponds to 3*3 = 9 sub-envelopes; similarly, for n parameters there will be 3^n sub-envelopes covering the n-dimensional full global variability surface.
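The 3^n sub-envelope count, and the weight of each sub-envelope, can be sketched as follows (treating Vth and Leff as independent is an assumption of this sketch; the joint weight of a sub-envelope is then the product of the per-parameter segment weights):

```python
# Enumerate sub-envelopes for n parameters, each segmented 3 ways, and compute
# each sub-envelope's weight as a product of per-parameter normal-curve areas
# (independence of the parameters is assumed for this illustration).
import itertools
import math

def phi(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Per-parameter weights of the three 2-sigma segments over (-3s, +3s).
seg_weights = [phi(b) - phi(a) for a, b in [(-3, -1), (-1, 1), (1, 3)]]

n_params = 2
sub_envelopes = list(itertools.product(range(3), repeat=n_params))
joint = {s: math.prod(seg_weights[i] for i in s) for s in sub_envelopes}

heaviest = max(joint, key=joint.get)  # the central sub-envelope (1, 1)
```

For two parameters this yields 9 sub-envelopes, with the central one carrying roughly 0.68^2, i.e. close to half of all dies.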
Figure 6.7: Variability envelopes for two parameters – (a) full global plus local, (b) full global-only and full local-only, (c) uniform segmentation, (d) non-uniform segmentation

Figure 6.7(d) shows the 3-way (non-uniform) segmentation for the two parameters, where we have 9 sub-envelopes of varying sizes (four of size 6.25σ^2, four of size 1.25σ^2, and one of size σ^2). (These figures are simplified; our approach assumes that the two parameters have different standard deviations, namely σ1 and σ2.) Also, based on the number of segments generated, the uniform/non-uniform segmented approach can be classified as m-way segmentation (i.e., having m segments; refer to [53] for more details). Segmentation can be done at a finer granularity, but this will increase the complexity of all parts of our framework, from the preprocessing step of characterizing the resilient delay model (Chapter 3) to the final step of generating vectors (Chapter 5). Also, as the number of variability parameters considered increases, the complexity grows exponentially. Later, in the experimental results section, we will show that segmentation at this granularity gives us about a 10X reduction in validation vector volume without any explosive increase in the run-time complexity of our framework.

6.6 Usage of sub-envelopes during validation vector generation

The segmented envelopes shown in Figure 6.6 and Figure 6.7 are incomplete, as these figures show only how the global-only variability envelope is segmented (based on information from on-chip process monitors). The actual full global plus local variability envelope will be arrived at by superimposing the full local-only variability envelope at the quantized points (which are the extremities of the segmented global-only variability envelope) corresponding to each sub-envelope. Such an arrangement for 3-way uniform segmentation, corresponding to Figure 6.7(c), is shown in Figure 6.8(b).
The complete full global plus local variability envelope is given by the large rectangle in Figure 6.8(a); the sub-envelopes for uniform segmentation are given by the rectangles shown in Figure 6.8(b). Sub-envelopes for non-uniform segmentation can be arrived at in a similar way. Note that the extended rectangle (dashed) represents the effect of local-only variability on top of global-only variability (solid).

Figure 6.8: Application of the global plus local variability sub-envelopes for two parameters – (a) full global plus local envelope, (b) uniform segmentation

The variations in each parameter are typically modeled by a distribution. A normal distribution is commonly assumed, since it is often a fairly good approximation of the empirical reality and simplifies analytical derivations. In such cases, the numerical value of variability in a parameter may correspond to some multiple of its standard deviation, typically 3σ or higher. In such a scenario, we have the following important observations:
1. The full global variability envelope does not denote the entirety of all possible variations. For example, if we assume that the numerical values for each parameter in the example in Figure 6.8(a) correspond to 3 times the standard deviation (σ) of the respective parameter, then the full global plus local variability envelope in Figure 6.8(a) represents 99.4% of all possible chips fabricated using that process.
2. Each sub-envelope in Figure 6.8(b) denotes die with different nominal values of their parameters as well as the worst-case global plus local variability. The size of a sub-envelope must be greater than or equal to the worst-case local-only variability so as to ensure the resilience of the vector-spaces generated. We adhere to this rule, as for each segmented envelope we consider the full local-only variability possible for that case and only segment the global-only variability envelope.
3.
Each sub-envelope in Figure 6.8(b) represents sets of die that may be fabricated with different probabilities. This arises from the fact that practical distributions for variations are non-uniform. In particular, if we assume a normal distribution for each of the key parameters, the probability of occurrence of a variability sub-envelope decreases as we move away from the center of the variability envelope. Hence, in practice we can ignore some parts of the variability envelope (provided the probability of occurrence is sufficiently low) and can thus reduce the number of vectors drastically.

Figure 6.9 shows the weight associated with each sub-envelope derived from the full global plus local variability envelope for 2-parameter 3-way segmentation (both uniform – Figure 6.9(b), and non-uniform – Figure 6.9(c)). We will use these weights to arrive at our adaptive validation vector set (see below).

Figure 6.9: Validation vector and path sets for s1196 – (a) no segmentation, (b) uniform segmentation, (c) non-uniform segmentation

The sub-envelopes for the quantized nominal operating points can be used in the following two ways:
• Non-adaptive: Every sub-envelope is used for every fabricated copy of the chip under validation. In this case, for every copy of the chip, the validation test set (VTS) is the union of the VTS for the individual sub-envelopes, i.e., |VTS| = |VTS_1 ∪ VTS_2 ∪ … ∪ VTS_m|, where m is the number of sub-envelopes.
• Adaptive: For every copy of the chip, we perform measurements on a set of test structures to determine the nominal parameters for the chip. Then we perform validation only using the VTS for the corresponding sub-envelopes. In such an approach, the VTS for each copy of the chip is the union of the VTS generated for the sub-envelopes that correspond to its nominal point. Note that the sub-envelopes near the corners are less likely to occur than those at/near the center of the global variations envelope (which is evident from the probabilities shown in Figure 6.9).
Hence, the expected number of vectors required for the entire batch of chips, E(|VTS|), is the sum of the number of validation vectors required for the individual sub-envelopes, |VTS_i|, weighted by the corresponding probabilities of occurrence, i.e., E(|VTS|) = Σ_i p_i × |VTS_i|, where p_i is the probability of occurrence of sub-envelope i.

6.7 Experimental results on ISCAS benchmarks

Note that our experiments used a resilient simultaneous delay model for both to-controlling and to-non-controlling transitions (Chapter 3). First, we select TC as the maximum circuit delay (under nominal delay values) computed by enhanced timing analysis (Chapter 3). Then we fix the timing threshold (∆) at 10% for target path selection. Then, using our timing dependent framework (Chapter 5), we generated the validation test vector-space for different values of full global plus local, global-only and local-only variabilities in circuit parameters.

6.7.1 Experimental results on super-threshold circuits

Table 6.2 shows the validation vector-sets and path-sets based on our approach for the medium-sized ISCAS benchmark s1196 implemented as a super-threshold circuit with Vdd = 1.2V. The results clearly demonstrate the benefits of our segmentation based approach, as it reduces the total number of vectors drastically from 16,542 (full global plus local variability) to 1,142 (adaptive uniform two-parameter three-way segmentation). This number further reduces to 947 using adaptive non-uniform two-parameter three-way segmentation (we explain the benefits of non-uniform segmentation below). A similar reduction in the selected path set is also observed (from 52 (full global plus local) to 13 (non-uniform adaptive)). We also report results for 1-parameter 6-way adaptive and non-adaptive approaches in Table 6.2.
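The two |VTS| formulas above amount to a straightforward calculation. The sketch below is illustrative only: the per-sub-envelope vector counts and probabilities are made-up placeholders, not data from our experiments.

```python
def non_adaptive_vts_size(vts_sizes):
    """Non-adaptive: every chip is validated with every sub-envelope's VTS,
    so the total is the plain sum over sub-envelopes."""
    return sum(vts_sizes)

def expected_adaptive_vts_size(vts_sizes, probabilities):
    """Adaptive: each chip uses only the VTS of its own sub-envelope, so the
    expected size is the weighted sum E(|VTS|) = sum_i p_i * |VTS_i|."""
    return sum(p * n for n, p in zip(vts_sizes, probabilities))

# Illustrative 3-way segmentation of one parameter: the central sub-envelope
# is far more likely than the two extremes (placeholder numbers).
sizes = [300, 1200, 4500]    # |VTS_i| for each sub-envelope (assumed)
probs = [0.02, 0.96, 0.02]   # probability of each sub-envelope (assumed)

print(non_adaptive_vts_size(sizes))              # 6000 vectors for every chip
print(expected_adaptive_vts_size(sizes, probs))  # 1248.0 vectors per chip on average
```

With the low-probability extreme sub-envelopes weighted down, the adaptive expectation is dominated by the cheap central sub-envelope, which is exactly the effect exploited in the tables that follow.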
Table 6.2: Validation vector and path-sets - s1196 (full global plus local variability)

Approach                          Paths   Vectors
Full global                       52      16,542
1-parameter segmentation:
  3-way non-adaptive              42      6,142
  3-way adaptive (uniform)        25      1,964
  3-way adaptive (non-uniform)    21      1,903
  6-way non-adaptive              42      6,024
  6-way adaptive (uniform)        24      1,480
  6-way adaptive (non-uniform)    20      1,394
2-parameter segmentation:
  3-way non-adaptive              42      5,836
  3-way adaptive (uniform)        17      1,142
  3-way adaptive (non-uniform)    13      947

Figure 6.9(b) also shows the validation vector-set (at the top of each rectangle) and path-set (at the bottom of each rectangle) corresponding to 2-parameter, 3-way uniform segmentation for ISCAS benchmark s1196. Each of the nine sub-envelopes contains three values corresponding to the number of paths selected, the associated probability of occurrence, and the number of vectors generated, respectively. It is evident from Figure 6.9 that the segments corresponding to the bottom-left (negative) side of the distribution result in zero or a very small number of vectors, in accordance with our observation in the previous section that such variability shifts the nominal operating point towards the negative side, rendering almost all of the paths non-critical. Similarly, it can be seen that the sub-envelopes at the extreme top-right corner of the full global plus local variability envelope result in the highest number of validation vectors (as well as paths selected), due to the extremely high levels of variation at those corners. Figure 6.9(c) shows the same for 2-parameter, 3-way non-uniform segmentation. It is evident from Figure 6.9(c) that the non-uniform adaptive approach can further reduce the validation vector and path sets. The reduction is primarily due to shrinking the central sub-envelope, whose contribution in terms of probability was very high (about 50%) in the uniform adaptive approach, to a lower value (of about 15%) in the non-uniform adaptive approach.
Although increasing the size of the sub-envelopes at the extremes increases the associated probabilities, the corresponding increase in the validation vector and path-sets for such sub-envelopes is relatively small. This is primarily because, at the extremes, the earlier sub-envelope (corresponding to uniform segmentation) has already accounted for most of the vectors and paths identified by the new sub-envelope (corresponding to non-uniform segmentation). Thus the reduction in vectors for the non-extreme cases (along with their reduced probabilities) dominates the increase in vectors for the extreme cases (along with their increased probabilities).

Table 6.3 shows the number of vectors in the validation test set for the non-adaptive as well as adaptive (uniform) versions of our one-parameter three-way approach for a few larger ISCAS89 benchmarks (with Vdd = 1.2V). It can be observed that our uniform adaptive approach can reduce the validation vector set for the full-chip based benchmark s9234 (247 inputs, 489,708 logical paths) by 10X (but requires only 3X characterization effort).

Table 6.3: Validation vector-sets for one-parameter segmentation

         Full global       One-parameter three-way segmentation
         No. of vectors    No. of vectors               Reduction
                           Non-adaptive   Adaptive      Non-adaptive   Adaptive
s298     154               90             28            1.71X          5.5X
s444     22                13             7             1.69X          3.4X
s953     3,422             1,672          370           2.04X          9.2X
s713     1.73 x 10^7       8.76 x 10^6    1.42 x 10^6   1.97X          12.1X
s9234    6.53 x 10^9       3.86 x 10^9    6.66 x 10^8   1.69X          9.8X

Table 6.4 shows the number of vectors in the validation test set for the non-adaptive as well as adaptive (uniform and non-uniform) versions of our two-parameter three-way approach. It can be observed that our uniform adaptive approach can reduce the validation vector set for s9234 by about 19X (but requires only 9X characterization effort), whereas the non-uniform adaptive approach (though requiring the identical characterization effort of 9X) can further reduce the validation vector set, by up to 23X.
This additional reduction is due to the reduced weight (probabilities) of the sub-envelopes at the non-extremes (see Figure 6.9(c)). Note that we assume the full global plus local, local-only, and global-only variabilities to be uncorrelated (for the worst-case variability in the 65nm industrial library provided to us) and to follow the normal distribution. The probability of occurrence of each sub-envelope is calculated and subsequently multiplied by the |VTS| for that sub-envelope. The cumulative total over all sub-envelopes gives the expected VTS for the circuit under consideration in our adaptive approach. It can be observed that the knowledge of local variability, along with the adaptive approach (both uniform and non-uniform), can significantly reduce the VTS with little increase in characterization effort, which is a one-time cost. Also, note that during segmentation, correlation between parameters is accounted for, since in our method we only segment the single variable per parameter responsible for the statistics of the distribution, without affecting the correlation. However, this kind of segmentation without assuming independence between parameters reduces the guarantee of capturing the worst case (and has to be accounted for in the validation budget), because a part of each individual distribution, and consequently of the correlated component, is left unaccounted for. Conceptually, though, we increase the chance of finding the worst case, since we are now dealing with a conditional probability (worst-case gate delay, given that global variability is limited to a particular spread along with full local variability), in contrast to the unconditional probability (worst-case gate delay considering full global plus local variability).
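The sub-envelope probability calculation described above can be sketched directly from the normal CDF. The snippet below assumes two independent, normally distributed parameters and a uniform 3-way segmentation of a ±3σ envelope; the cut points are illustrative, not the actual values from our 65nm library.

```python
import math

def normal_cdf(x):
    """Standard normal CDF, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def segment_probabilities(cuts):
    """Probability mass of each segment of one parameter; `cuts` are segment
    boundaries in units of sigma, e.g. [-3, -1, 1, 3] for uniform 3-way
    segmentation of a +/-3-sigma envelope."""
    return [normal_cdf(b) - normal_cdf(a) for a, b in zip(cuts, cuts[1:])]

# Uniform 2-parameter 3-way segmentation: assuming independent parameters,
# the weight of sub-envelope (i, j) is the product of per-parameter masses.
cuts = [-3.0, -1.0, 1.0, 3.0]
p = segment_probabilities(cuts)
weights = [[p[i] * p[j] for j in range(3)] for i in range(3)]

# The central sub-envelope carries most of the weight; corner sub-envelopes
# are rare, which is why they can be weighted down in the expected-VTS sum.
print(round(weights[1][1], 3))  # central weight, about 0.466
print(round(weights[0][0], 3))  # corner weight, about 0.025
```

Multiplying each sub-envelope's weight by its |VTS| and summing gives the expected adaptive VTS size for the batch.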
Table 6.4: Validation vector-sets for two-parameter segmentation

         Two-parameter three-way segmentation - Reduction
         Non-adaptive   Adaptive (uniform)   Adaptive (non-uniform)
s298     2.02X          6.16X                6.49X
s444     1.83X          3.67X                4.05X
s953     2.18X          14.5X                15.5X
s713     2.1X           20.1X                23.8X
s9234    1.9X           19.1X                22.6X

6.7.2 Experimental results on near- and sub-threshold circuits

Based on our observations from Chapter 4 about the effect of variability on the delay of ultra-low voltage CMOS circuits, we decided to evaluate our approach on near-threshold (with Vdd = 0.5V) and sub-threshold (with Vdd = 0.2V) circuits, and we report the results in Table 6.5. Note that here 1P and 2P represent one-parameter three-way segmentation and two-parameter three-way segmentation, respectively. Also, NA, AU and ANU represent non-adaptive, adaptive uniform and adaptive non-uniform, respectively.

Table 6.5: Validation vector and path-sets for near- and sub-threshold circuits

Near-threshold circuits (Vdd = 0.5V) - Reduction
         NA 1P   NA 2P   AU 1P   AU 2P   ANU 1P   ANU 2P
s298     1.9X    2.3X    5.8X    6.8X    5.9X     7.1X
s444     2.1X    2.4X    3.7X    4.4X    4.1X     4.9X
s953     2.3X    2.6X    10.1X   15.6X   10.5X    16.5X
s713     2.2X    2.6X    12.7X   20.6X   13.3X    24.6X
s9234    1.9X    2.2X    10.2X   19.4X   11.1X    25.2X

Sub-threshold circuits (Vdd = 0.2V) - Reduction
         NA 1P   NA 2P   AU 1P   AU 2P   ANU 1P   ANU 2P
s298     2.4X    2.7X    6.5X    7.8X    7.2X     7.5X
s444     2.7X    3.2X    5.2X    6.2X    5.8X     6.0X
s953     2.8X    3.3X    12.3X   16.4X   13.6X    17.2X
s713     2.5X    2.9X    14.4X   21.5X   16.3X    25.9X
s9234    2.3X    2.7X    12.6X   20.7X   15.9X    27.1X

The results from Table 6.5 clearly demonstrate the benefits of our segmentation based approach for near- and sub-threshold circuits, as the reduction in the validation vector-set is higher than that for super-threshold circuits (Table 6.3 and Table 6.4).
Note that the reduction in the validation vector-set for s9234 has increased from 23X in super-threshold to 25X in near-threshold, and further to 27X in sub-threshold circuits. This increased reduction is due to the increased effect of variability on delay for near-threshold and sub-threshold circuits, complemented by the reduced weight (probabilities) of the sub-envelopes at the non-extremes.

6.8 Summary

Since increasing process variations have a profound influence on the effect of delay marginalities, as evident from the quantitative increase in validation complexity and costs (e.g., sizes of path sets and vector-spaces), we decided to segment and quantize the full global plus local variability envelope based on the information regarding global-only variability (obtained from process monitors). Our results demonstrate that our proposed divide-and-conquer approaches (adaptive and non-adaptive), which segment the global-only variability envelope into sub-envelopes that, along with the full local-only variability, collectively represent the full global plus local variability envelope, can dramatically reduce the expected size of the validation test set. Furthermore, we observed that a non-uniform segmentation based on the probability of occurrence of an envelope can further reduce the validation vector set.

CHAPTER 7

FUTURE WORK: GENERATING VECTORS FOR CROSSTALK VALIDATION

Besides MIS, crosstalk-induced delay is another effect that significantly affects delay. We define crosstalk marginality as the crosstalk-induced slowdown in the presence of normal process variations, and we propose an approach to extend our delay marginality validation approach to generate vectors for crosstalk marginality validation.

7.1 Crosstalk-induced delay

Consider a capacitive coupling between two circuit lines x and y.
A transition at y is delayed (compared to the case where the coupling capacitance does not exist) if a transition in the opposite direction occurs at x within a relatively short time [8][33][34][105].

Figure 7.1: Crosstalk-induced slow-down

The coupling effect also causes the transition at x to arrive later (compared to the case where the coupling does not exist) (see Figure 7.1). This phenomenon is called crosstalk-induced slow-down or delay.

7.2 Testing for crosstalk-induced delay defects

In a fabricated copy of a circuit it is possible that, due to the presence of defects and process variations, the crosstalk-induced delay (slow-down) is greater than that estimated by timing verification, and thus a timing error is created in the circuit [84]. Existing approaches are classified as single-aggressor [35][64][84][100][109] and multiple-aggressor approaches [43][68][104][190][191]. Most of the earlier works [35][43][64][68][100][104][109][190][191] are timing independent and work only on very basic pin-to-pin delay models. A few approaches [64][109] are not able to invoke the maximum delay even for the basic pin-to-pin delay model. To the best of our knowledge, all of the existing approaches are variation-unaware, and hence not suitable for crosstalk validation.

7.3 Surrogates for a crosstalk target

In [84] the authors presented a timing-independent framework based on defining a set of surrogates for each crosstalk slow-down target and generating vectors for the same. Any test set consisting of a vector for each of the testable surrogates is claimed to be a test for the corresponding crosstalk slow-down target. Consider the crosstalk site shown in Figure 7.2 with lines A and V involved in crosstalk.
Considering line V as the victim, the two crosstalk slowdown targets are (A,R,V,F) and (A,F,V,R), where R and F respectively represent a rising and a falling transition, and (A,R,V,F) represents a crosstalk target with A as the affecting line and V as the victim line, where the two lines have rising and falling transitions, respectively [84]. Such crosstalk sites can be easily identified from the layout topology by selecting pairs of lines with sufficiently high coupling capacitances.

Figure 7.2: An example crosstalk site

In [84] the authors define a surrogate as a triplet of three sub-paths: a sub-path that ends at the affecting line, a sub-path that ends at the victim line, and a sub-path that starts at the victim line and ends at one of the primary outputs. A total of n*m*p surrogates, written as triplets of sub-paths (IA_i, IV_j, VO_k), where 1 ≤ i ≤ n, 1 ≤ j ≤ m and 1 ≤ k ≤ p, are defined in association with the crosstalk slow-down target (A,R,V,F). In crosstalk validation the objective is to generate a set of vectors which is guaranteed to contain at least one vector that will excite the crosstalk effect at the crosstalk site and detect it at the output with the maximum possible delay. A simple method to attain this objective is to excite the crosstalk between A (aggressor) and V (victim) in all possible ways and propagate it to an output in all possible ways. We call this approach target-exhaustive. In the target-exhaustive approach we generate the cone-exhaustive vectors for the super-cone which comprises the fan-in cones of A and V and the fan-in cone covering the edges of the fan-out cone of V, as shown in Figure 7.3. Due to the astronomical number of vectors required, target-exhaustive is impractical. Hence, there is a need for a new and efficient approach for crosstalk validation.
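The n*m*p surrogate enumeration described above is just a Cartesian product over the three sub-path sets. A minimal sketch follows, with hypothetical sub-path labels; the real sub-paths would come from traversing the fan-in and fan-out cones of the circuit.

```python
from itertools import product

def enumerate_surrogates(ia_subpaths, iv_subpaths, vo_subpaths):
    """All n*m*p surrogate triplets (IA_i, IV_j, VO_k) for one crosstalk
    slow-down target: sub-paths ending at the affecting line A, sub-paths
    ending at the victim line V, and sub-paths from V to a primary output."""
    return list(product(ia_subpaths, iv_subpaths, vo_subpaths))

# Hypothetical labels for a target (A, R, V, F).
ia = ["IA1", "IA2"]          # n = 2 sub-paths ending at A
iv = ["IV1", "IV2", "IV3"]   # m = 3 sub-paths ending at V
vo = ["VO1"]                 # p = 1 sub-path from V to a primary output

surrogates = enumerate_surrogates(ia, iv, vo)
print(len(surrogates))  # 6 = n * m * p
```

The pruning ideas in Section 7.4 aim to shrink this product before vector generation, since the full list grows multiplicatively with the three sub-path counts.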
Figure 7.3: The target-exhaustive approach

7.4 Proposed approach and key ideas

Due to the unavailability of precise quantitative values of delays, a simple method of selecting the sub-path with the longest delay from any input to the victim line does not guarantee the desired transition with the highest arrival time at the victim line. In [84] the authors presented an all-inclusive approach of enumerating paths/sub-paths. Their approach translates requirements such as "excite a falling transition at line V_i with the highest possible arrival time" as: (i) enumerate every possible sub-path from the inputs to V_i, and (ii) for each sub-path, generate a vector that propagates a transition along the sub-path without any reduction/increase in arrival time due to off-path transitions. The set of vectors generated using the above two-step process is guaranteed to include one or more vectors that will cause a falling transition at V_i with the highest possible arrival time in a resilient manner [83]. Since our problem of crosstalk validation necessitates a resilient approach, we propose to use the same approach, with intelligent usage of the available timing information to derive conditions that will prune our surrogate list and vectors considerably.

7.4.1 Surrogate list pruning based on timing information

It is prevalent in the literature [34][35][43][64][68][100][104][109][190][191] that the problem of crosstalk slowdown arises only when opposite transitions occur at the A and V lines within some specific skew. We term this skew δ_Xtalk (the skew that can cause the crosstalk slowdown effect). If the timing ranges of the A and V lines are further apart than δ_Xtalk (see Figure 7.4), then we can eliminate such surrogates from our surrogate list for a target.
The sufficient conditions for elimination are:

A_V(R/F)S - A_A(F/R)L > δ_Xtalk   (7-a)
A_A(F/R)S - A_V(R/F)L > δ_Xtalk   (7-b)

Figure 7.4: Timing ranges for a surrogate candidate for elimination, with a very early transition at A.

7.4.2 Identification of FS testable surrogates

Functional sensitization conditions [80] are the minimum requirement for any path to be sensitizable (testable). Hence we propose to apply the functional sensitization conditions and perform implications on all the sub-paths of a surrogate to identify FS testable surrogates. We can use the arrival time values computed by ETA [82] to further eliminate surrogates based on conditions (7-a) and (7-b).

7.4.3 Surrogate list pruning due to timing dependent logic conditions

Functional sensitization conditions [80] are timing independent. However, additional constraints can be imposed using the limited timing information available, to reduce the number of valid choices at lines.

Figure 7.5: Timing dependent functional sensitization conditions for a to-controlling transition at the on-path input of a 2-input NAND gate

Consider the case of a to-controlling transition at on-path input X, as shown in Figure 7.5, where the off-path transition at Y precedes the on-path transition. In a unit delay model, any final to-controlling transition at the side-input that succeeds a to-controlling transition at the on-path input is forbidden, since such a transition at the side-input will kill the transition at the on-path input (Figure 7.5(a)). So, for such cases the FS conditions at the side-input reduce to {S1, CR, TR, H1} (values with a final value of one). Along similar lines, a later-arriving to-non-controlling transition at the side-input is also forbidden (Figure 7.5(b)), and the corresponding FS conditions at the side-input will be {S1, CF, TF, H1} (values with an initial value of one). Table 7.1 shows the complete timing dependent functional sensitization conditions along with their timing ranges for the unit delay model.
Table 7.1: Timing dependent FS conditions under unit delay model

Timing case       Equations                            Conditions
1. Y before X     A_Y(R/F)L < A_X FS                   {S1, CR, TR, H1}
2. Y overlaps X   timing ranges of X and Y overlap     {S1, CR, TR, CF, TF, H1, H0}
3. Y after X      A_Y(R/F)S > A_X FL                   {S1, CF, TF, H1}

Along similar lines, the timing dependent FS conditions can be derived for the resilient delay model. Consider the case of a transition at the side-input Y before the transition at the on-path input X. We are interested in the range where we can definitely forbid a falling transition at Y, because it will dominate the on-path transition at X. We know from Chapter 3 that the region of interference between transitions at X and Y for an early-arriving signal at Y is δ_YX F. Hence, if a transition at X occurs δ_YX F after the transition at Y, it is guaranteed that the transition at the output is due only to the transition at Y. Hence the side-input dominates in such cases, and a to-controlling transition at Y is forbidden. Along similar lines, in Table 7.2 we show the timing dependent FS conditions for the resilient delay model. Here α_XZ(R) is the maximum pin-to-pin delay from input X to output Z for a falling transition at X.

Table 7.2: Timing dependent FS conditions under resilient delay model

Timing case                    Equations                                  Conditions
1. Y long before X             A_Y(R/F)L + δ_YX F < A_X FS                {S1, CR, TR, H1}
2. Y before X and no overlap   A_X FS > A_Y(R/F)L >= A_X FS - δ_YX F      {S1, CR, TR, CF, TF, H1, H0}
3. Y overlaps X                timing ranges of X and Y overlap           {S1, CR, TR, CF, TF, H1, H0}
4. Y after X and no overlap    A_X FL < A_Y(R/F)S <= A_X FL + α_XZ(R)     {S1, CR, TR, CF, TF, H1, H0}
5. Y long after X              A_Y(R/F)S > A_X FL + α_XZ(R)               {S1, CF, TF, H1}

Timing dependent FS conditions can be obtained for a to-non-controlling on-path transition in a similar manner. Such timing dependent FS sensitization conditions help us prune our FS testable surrogate list, with no extra search effort, by identifying more conflicts during implications.
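Table 7.1's three cases reduce to simple comparisons on arrival-time ranges. The sketch below is a hypothetical helper covering only the unit delay model (a to-controlling, i.e. falling, on-path transition at X of a NAND gate); each signal's timing is a (shortest, longest) arrival-time pair and the condition sets use the value names from the table.

```python
def unit_delay_fs_conditions(x_range, y_range):
    """Timing dependent FS conditions at side-input Y for a to-controlling
    (falling) transition at on-path input X, per Table 7.1 (unit delay model).
    Each range is (shortest, longest) arrival time."""
    x_s, x_l = x_range
    y_s, y_l = y_range
    if y_l < x_s:
        # Case 1: Y strictly before X -> only values with a final value of 1.
        return {"S1", "CR", "TR", "H1"}
    if y_s > x_l:
        # Case 3: Y strictly after X -> only values with an initial value of 1.
        return {"S1", "CF", "TF", "H1"}
    # Case 2: the timing ranges of X and Y overlap -> no value is forbidden.
    return {"S1", "CR", "TR", "CF", "TF", "H1", "H0"}

print(sorted(unit_delay_fs_conditions((5, 7), (1, 3))))  # case 1: Y before X
```

A resilient-model version would follow Table 7.2 by widening the case boundaries with δ_YX F and α_XZ(R) instead of using the bare range endpoints.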
7.4.4 Surrogates with non-overlapping timing ranges

Our target surrogate list consists of all those candidates where the arrival timing ranges at A and V are within the skew range δ_Xtalk. We can utilize the timing information to intelligently deal with the remaining surrogates, those with non-overlapping timing ranges (see Figure 7.6). It has been observed [130] that the crosstalk slowdown effect increases considerably as the transitions at A and V occur closer to each other. Hence, in the cases with an early transition at A (V), we try to obtain a transition at A (V) with the maximum arrival time using our Maximum Delay Sensitization (MDS) conditions for the sub-paths from the primary inputs (PIs) to A (V). Our MDS conditions guarantee excitation of the maximum delay at the output along a path, and hence can reduce the skew between A and V for the surrogate under consideration.

Figure 7.6: Timing ranges for a surrogate candidate with non-overlapping timing ranges, for application of MDS conditions

A partial ordering amongst the choices in the FS sensitization conditions can be derived in a similar way as for the MDS conditions in Chapter 5. The selective enumeration based on such an ordering can be done in a way similar to the side-input refinement (SIR) in Chapter 5. For every triplet of a surrogate, the leaf node comprises a vector sub-space obtained after performing SIR on all three sub-paths. Addition of a new vector-space and deletion of a vector-space can be done along the lines of the algorithm proposed in Section 5.5.

BIBLIOGRAPHY

[1] M. Abramovici et al., “A reconfigurable design-for-debug infrastructure for SOCs”, In Proceedings of Design Automation Conference, 2006, pp. 7-12.
[2] A. Agarwal, F. Dartu and D. Blaauw, “Statistical gate delay model considering Multiple Input Switching”, In Proceedings of Design Automation Conference, 2004, pp. 658-663.
[3] N. Ahmed, M. Tehranipoor, and V.
Jayaram, “Timing-based delay test for screening small delay defects”, In Proceedings of IEEE Design Automation Conference, 2006, pp. 320-325.
[4] T. B. Alexander et al., “Verification, Characterization, and Debugging of the HP PA-7200 Processor”, In Hewlett-Packard Journal, Feb. 1996, pp. 34-43.
[5] B. Amelifard et al., “A Current Source Model for CMOS logic cells considering multiple input switching and stack effect”, In Design, Automation & Test in Europe Conference, 2008, pp. 568-573.
[6] C. Amin et al., “A Multi-port Current Source Model for Multiple-Input Switching Effects in CMOS Library Cells”, In Proceedings of Design Automation Conference, 2006, pp. 247-252.
[7] E. Anis and N. Nicolici, “Low Cost Debug Architecture using Lossy Compression for Silicon Debug”, In Proceedings of Design, Automation, and Test in Europe, 2007.
[8] H. B. Bakoglu, “Circuits, Interconnects, and Packaging for VLSI”, Addison-Wesley, 1990.
[9] P. L. Barreiro and J. P. Albandoz, “Population and sample: Sampling Techniques”, Management Mathematics for European Schools.
[10] K. Bernstein et al., “High-Performance CMOS variability in the 65-nm regime and beyond”, In IBM Journal of Research and Development, Vol. 50, Issue 4.5, 2006, pp. 433-449.
[11] M. Bhushan et al., “Ring oscillators for CMOS process tuning and variability control”, In IEEE Transactions on Semiconductor Manufacturing, Vol. 19, Issue 1, 2006, pp. 10-18.
[12] D. Blaauw et al., “Statistical Timing Analysis: From basic principles to state of the art”, In IEEE Transactions on Computer Aided Design of Integrated Circuits and Systems, 2008, Vol. 7, Issue 4, pp. 589-607.
[13] T. Bojan, I. Frumkin, and R. Mauri, “Intel first ever converged core functional validation experience: Methodologies, challenges, results and learning”, In Microprocessor Test and Verification, 2008.
[14] S. Borkar et al., “Parameter variations and impact on circuits and microarchitecture”, In Design Automation Conference, 2003, pp. 338-342.
[15] S.
Bose, P. Agrawal and V. D. Agrawal, “Generation of Compact Delay Tests by Multiple Path Activation”, In Proceedings of International Test Conference, 1993, pp. 714-723.
[16] S. Bose, P. Agrawal and V. D. Agrawal, “Logic System for Path Delay Test Generation”, In Proceedings of European Design Automation Conference, 1993, pp. 200-205.
[17] B. H. Calhoun and D. Brooks, “Can Subthreshold and Near-Threshold Circuits Go Mainstream”, In IEEE MICRO, Vol. 30, Issue 4, 2010, pp. 80-85.
[18] B. H. Calhoun, A. Wang, and A. P. Chandrakasan, “Modeling and Sizing for Minimum Energy Operation in Subthreshold Circuits”, In IEEE Journal of Solid-State Circuits, Vol. 40, No. 9, 2005, pp. 1778-1786.
[19] Y. Cao, P. Gupta, A. B. Kahng, D. Sylvester and J. Yang, “Design Sensitivities to variability: Extrapolations and assessments in nanometer VLSI”, In IEEE International ASIC/SOC Conference, 2002, pp. 411-415.
[20] B. Cha and S. K. Gupta, “Efficient Trojan Detection via Calibration of Process Variation”, In Asian Test Symposium, 2012.
[21] H. Chang and S. Sapatnekar, “Statistical Timing Analysis under Spatial Correlation”, In IEEE Transactions on Computer-Aided Design of Circuits and Systems, Vol. 24, No. 9, 2005, pp. 1467-1482.
[22] H. Chang and S. S. Sapatnekar, “Statistical Timing Analysis considering Spatial Correlations using a single PERT-like Traversal”, In International Conference on Computer-Aided Design, 2003, pp. 621-625.
[23] Y.-S. Chang, S. K. Gupta and M. A. Breuer, “Test Generation for Ground Bounce in Internal Logic Circuitry”, In Proceedings of VLSI Test Symposium, 1999, pp. 95-104.
[24] Y.-S. Chang, S. K. Gupta and M. A. Breuer, “Test Generation for Maximizing Ground Bounce Considering Circuit Delay”, In Proceedings of VLSI Test Symposium, 2003, pp. 151-157.
[25] Y.-S. Chang, S. K. Gupta and M. A. Breuer, “Test Generation for Maximizing Ground Bounce for Internal Circuitry with Reconvergent Fan-Outs”, In Proceedings of VLSI Test Symposium, 2001, pp. 358-366.
[26] L. C. Chen et al., “Using Transition Test to Understand Timing Behavior of Logic Circuits on UltraSPARC T2 Family”, In International Test Conference, 2009, pp. 1-10.
[27] L. C. Chen, S. K. Gupta, and M. A. Breuer, “A New Framework for Static Timing Analysis, Incremental Timing Refinement, and Timing Simulation”, In Asian Test Symposium, 2000, pp. 102-107.
[28] L. C. Chen, S. K. Gupta, and M. A. Breuer, “A New Gate Delay Model for Simultaneous Switching and Its Applications”, In Proceedings of Design Automation Conference, 2001, pp. 289-294.
[29] L. C. Chen, S. K. Gupta, and M. A. Breuer, “Gate Delay Modeling for Multiple to non controlling stimuli.” - Unpublished
[30] L. C. Chen, S. K. Gupta, and M. A. Breuer, “High Quality Robust Tests for Path Delay Faults”, In VLSI Test Symposium, 1997, pp. 88-93.
[31] L. C. Chen, S. K. Gupta, and M. A. Breuer, “Incremental Timing Refinement on a min-max delay model”, Computer Engineering Technical Report No. 00-01, Electrical Engineering - Systems Dept., University of Southern California, April 2000.
[32] L. C. Chen, T. M. Mak, S. K. Gupta, and M. A. Breuer, “Crosstalk test generation on pseudo industrial circuits: a case study”, In International Test Conference, 2001, pp. 548-557.
[33] W. Y. Chen, S. K. Gupta and M. A. Breuer, “Analytical Models Crosstalk Delay and Pulse Analysis for Non-Ideal Inputs”, In Proceedings of International Test Conference, 1997, pp. 809-818.
[34] W. Y. Chen, S. K. Gupta and M. A. Breuer, “Analytical Models for Crosstalk Excitation and Propagation in VLSI Circuits”, In IEEE Transactions on Computer-Aided Design, Oct. 2002, Vol. 21, Issue 10, pp. 1117-1131.
[35] W. Y. Chen, S. K. Gupta and M. A. Breuer, “Test Generation for Crosstalk-Induced Faults: Framework and Computational Results”, In Journal of Electronic Testing: Theory and Applications, Vol. 18, No. 1, Feb. 2002, pp. 17-28.
[36] K. T. Cheng and H. C.
Chen, “Delay Testing for Non-Robust Untestable Circuits”, In Proceedings of International Test Conference, 1993, pp. 954-961.
[37] K. T. Cheng, S. Devadas, and K. Keutzer, “Robust Delay-Fault Test Generation and Synthesis for Testability under a Standard Scan Design Methodology”, In Proceedings of Design Automation Conference, 1991, pp. 80-86.
[38] C. Chesneau, “A Tail bound for sums of independent random variables: application to the symmetric Pareto Distribution”, Applied Mathematics e-Notes, 2009, pp. 300-306.
[39] T. W. Chiang et al., “An Efficient Gate Delay Model for VLSI Design”, In International Conference on Computer Design, 2007, pp. 450-455.
[40] S. H. Chou, “Computationally efficient characterization of standard cells for statistical static timing analysis”, ME Thesis, Dept. of Electrical Engineering and Computer Science, MIT.
[41] M. Choudhury et al., “Analytical Model for TDDB-based Performance Degradation in Combinational Logic”, In Design, Automation & Test in Europe Conference, 2010, pp. 423-428.
[42] I. L. Chuang and M. A. Nielsen, “Quantum Computation and Quantum Information”, Cambridge Series on Information, 2000.
[43] S. Chun, T. Kim and S. Kang, “ATPG-XP: test generation for maximal crosstalk-induced faults”, In IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, September 2009, Vol. 28, Issue 9, pp. 1401-1413.
[44] B. Cline et al., “Analysis and Modeling of CD Variation for Statistical Static Timing”, In International Conference on Computer-Aided Design, 2006, pp. 60-66.
[45] B. D. Cory, R. Kapur, and B. Underwood, “Speed Binning with Path Delay Test in 150-nm technology”, In IEEE Design & Test of Computers, Vol. 20, Sept.-Oct. 2003, Issue 4, pp. 41-45.
[46] B. Courtois and C. Visweswariah, “At-speed Testing in the Face of Process Variations”, In VLSI Test Symposium, 2009, pp. 237.
[47] J. F. Croix and D. F.
Wang, “Blade and Razor: Cell and Interconnect Delay Analysis Using Current-Based Models”, In Proceedings of Design Automation Conference, 2003, pp. 386-389.
[48] F. Dartu, N. Menezes, J. Qian, and L. T. Pillage, “A Gate Delay Model for High Speed CMOS Circuits”, In Proceedings of Design Automation Conference, 1994, pp. 576-580.
[49] B. P. Das, “Random Local Delay Variability: On chip measurement and modeling”, PhD Thesis, IISc, 2009.
[50] P. Das and S. K. Gupta, “A gate delay model considering MIS for ultra low power CMOS circuits”, USC technical report No. CENG-2012-4, 2012.
[51] P. Das and S. K. Gupta, “A systematic methodology to identify delay marginalities in a design during first silicon validation”, USC technical report No. CENG-2010-4, 2010.
[52] P. Das and S. K. Gupta, “Capturing Variability in advanced gate delay models”, USC technical report No. CENG-2010-3, 2010.
[53] P. Das and S. K. Gupta, “Efficient post-silicon validation via segmentation of process variation envelope - Global vs local variations”, USC technical report No. CENG-2012-6, 2012.
[54] P. Das and S. K. Gupta, “On generating vectors for accurate post-silicon delay characterization”, In Proceedings of Asian Test Symposium, 2011, pp. 251-260.
[55] S. Devadas, “Validatable Non-robust delay-fault testable circuits via logic synthesis”, In IEEE Transactions on Computer Aided Design of Integrated Circuits and Systems, 1992, Vol. 11, Issue 12, pp. 1559-1573.
[56] S. Devadas, A. Ghosh and K. Keutzer, “An observability-based code coverage metric for functional simulation”, In Proceedings of International Conference on Computer Aided Design, 1996.
[57] R. G. Dreslinski et al., “Near-Threshold Computing: Reclaiming Moore’s Law through Energy Efficient Integrated Circuits”, In Proceedings of the IEEE, Vol. 98, No. 2, 2010, pp. 253-266.
[58] M. A. El-Moursy and E. G. Friedman, “Shielding Effect of On-Chip Interconnect Inductance”, In IEEE Transactions on VLSI Systems, 2005, pp. 396-400.
[59] C. C. Enz, F. Krummenacher, and E. A. Vittoz, “An Analytical MOS transistor model valid in all regions of operation and dedicated to low-voltage and low-current applications”, In Journal of Analog Integrated Circuits and Signal Processing, Vol. 8, No. 1, 1995, pp. 83-114. [60] H. Fatemi, S. Nazarian, and M. Pedram, “Statistical Logic Cell Delay Analysis Using a Current-based Model”, In Proceedings of Design Automation Conference, 2006, pp. 253-256. [61] S. Fisher et al., “An Improved Model for Delay/Energy Estimation in Near-Threshold Flip-Flops”, In International Symposium on Circuits and Systems, 2011, pp. 1065-1068. [62] P. Franco and E. J. McCluskey, “Three-Pattern Tests for Delay Faults”, In Proceedings of VLSI Test Symposium, 1994, pp. 452-456. [63] F. Frustaci, P. Corsonello and S. Perri, “Analytical Delay Model Considering Variability Effects in Subthreshold Domain”, In IEEE Transactions on Circuits and Systems II: Express Briefs, Vol. 59, No. 3, 2012, pp. 168-172. [64] P.-Fu et al., “Non-robust Test Generation for Crosstalk-Induced Delay Faults”, In Asian Test Symposium, 2005, pp. 120-125. [65] K. Fuchs, F. Fink and M. H. Schulz, “Dynamite: an Efficient Automatic Test Pattern Generation System for Path Delay Faults”, In IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 10, Oct. 1991, pp. 1323-1335. [66] R. Galivanche, “On the Need to Test for Systematic Failures”, Keynote Talk, International Test Synthesis Workshop, 2006. [67] S. Ganapathy et al., “Circuit propagation delay estimation through multivariate regression based modeling under spatio-temporal variability”, In Design, Automation & Test in Europe Conference, 2010, pp. 417-422. [68] K. P. Ganeshpure and S. Kundu, “On ATPG for multiple aggressor crosstalk faults in presence of gate delays”, In IEEE International Test Conference, 2007, pp. 1-7. [69] A. Goel and S.
Vrudhula, “Statistical waveform and current source based standard cell models for accurate timing analysis”, In Proceedings of Design Automation Conference, 2008, pp. 227-230. [70] R. Goyal and N. Kumar, “Current Based Delay Models: A Must For Nanometer Timing”, technical report, CADENCE – unpublished. [71] A. Gupta, “Variation Aware Custom IC Design Report”, Solido Design Automation Report. [72] S. Gupta and S. S. Sapatnekar, “Compact Current Source Models for Timing Analysis under Temperature and Body Bias Variations”, In IEEE Transactions on VLSI Systems, 2011, pp. 1-14. [73] S. K. Gupta, “Validation of First-Silicon: Motivation and Paradigm”, Unpublished. [74] S. Hanson et al., “Ultralow-voltage, minimum-energy CMOS”, In IBM Journal of Research and Development, Vol. 50, Issue 4, 2006, pp. 469-490. [75] D. M. Harris, B. Kellar, J. Karl, and S. Kellar, “A Transregional Model for Near-Threshold Circuits with Application to Minimum-Energy Operation”, In International Conference on Microelectronics, 2010, pp. 64-67. [76] S. Hatami and M. Pedram, “Efficient representation, stratification, and compression of variational CSM library waveforms using robust principal component analysis”, In Design, Automation & Test in Europe Conference, 2010, pp. 1285-1290. [77] S. Hatami, “Gate Delay Modeling and Static Timing Analysis in ASIC Design considering Process Variations”, PhD Dissertation, submitted to USC, EE-Systems, 2011. [78] F. Hopsch et al., “Variability-Aware Fault Modeling”, In Asian Test Symposium, 2010, pp. 87-93. [79] I. D. Huang and S. K. Gupta, “On Generating Vectors That Invoke High Circuit Delays - Delay Testing and Dynamic Timing Analysis”, In Proceedings of Asian Test Symposium, 2007, pp. 485-492. [80] I. D. Huang and S. K. Gupta, “Selection of Paths for Delay Testing”, In Proceedings of Asian Test Symposium, 2005, pp. 208-215. [81] I. D.
Huang, “Timing-Oriented Approach for Delay Testing and Dynamic Timing Analysis”, PhD Dissertation, submitted to USC, EE-Systems, 2007. [82] I. D. Huang, S. K. Gupta, and M. A. Breuer, “Dynamic Timing Analysis”, USC technical report CENG-2004-06, 2004. [83] S. Irajpour, “Testing for Crosstalk- and Bridge-induced Delay Faults”, PhD Dissertation, submitted to USC, EE-Systems, 2007. [84] S. Irajpour, S. K. Gupta, and M. A. Breuer, “Timing-Independent Testing of Crosstalk in the Presence of Delay Producing Defects Using Surrogate Fault Models”, In International Test Conference, 2004, pp. 1024-1033. [85] Y.-M. Jiang and K.-T. Cheng, “Exact and Approximate Estimations for Maximum Instantaneous Current of CMOS Circuits”, In Proceedings of Design Automation & Test in Europe, 1998, pp. 698-702. [86] Y.-M. Jiang, K.-T. Cheng and A. Krstic, “Estimation of Maximum Power and Instantaneous Current Using a Genetic Algorithm”, In Proceedings of IEEE Custom Integrated Circuits Conference, 1997, pp. 135-138. [87] A. B. Kahng and S. Muddu, “Efficient Gate Delay Modeling for Large Interconnect Loads”, In IEEE Multi-Chip Module Conference, 1996, pp. 202-207. [88] S. Kajihara, K. Kinoshita, I. Pomeranz and S. M. Reddy, “A Method for Identifying Robust Dependent and Functionally Unsensitizable Paths”, In International Conference on VLSI Design, 1996, pp. 82-87. [89] S. Kajihara, T. Shimono, I. Pomeranz and S. M. Reddy, “Enhanced Untestable Path Analysis Using Edge Graphs”, In Proceedings of Asian Test Symposium, 2000, pp. 139-144. [90] R. Kanj, R. Joshi and S. Nassif, “Mixture Importance Sampling and Its Application to the Analysis of SRAM Designs in the Presence of Rare Failure Events”, In Proceedings of Design Automation Conference, 2006, pp. 69-72. [91] F. Kashfi et al., “A 65nm 10GHz pipelined MAC structure”, In IEEE International Symposium on Circuits and Systems, 2008, pp. 460-463. [92] B. K. Kaushik, S. Sarkar, R. P. Agarwal and R. C.
Joshi, “Waveform Analysis and Delay Prediction in Simultaneously Switching CMOS Gate Driven Inductively and Capacitively coupled On Chip Interconnects”, In IEEE Dallas Circuits and Systems Workshop, 2007, pp. 1-4. [93] I. Keller, K. H. Tam, and V. Kariat, “Challenges in Gate level Modeling for Delay and SI at 65nm and Below”, In Proceedings of Design Automation Conference, 2008, pp. 468-473. [94] R. Kendall, “Worst Case Analysis Methods for Electronic Circuits and Systems to Reduce Technical Risk and Improve System Reliability”, Technical Report, INTUITIVE Research and Technology Corporation. [95] J. Ryan Kenny, “Prototyping advanced military radar systems”, Defense Tech Briefs, March 2008. [96] J. Keshava, N. Hakim, and C. Prudvi, “Post-silicon validation challenges: How EDA and academia can help”, In Proceedings of Design Automation Conference, 2010, pp. 3-7. [97] S. V. Kodakara et al., “Model Based Test Generation for Microprocessor Architecture Validation”, In Proceedings of International Test Conference, 2007, pp. 465-472. [98] A. Kokrady and C. P. Ravikumar, “Fast, Layout-Aware Validation of Test-Vectors for Nanometer-Related Timing Failures”, In Proceedings of International Conference on VLSI Design, 2004, pp. 597-602. [99] A. Krstic and K. T. Cheng, “Generation of High Quality Tests for Functional Sensitizable Paths”, In Proceedings of VLSI Test Symposium, 1995, pp. 374-379. [100] A. Krstic et al., “Delay Testing Considering Crosstalk-Induced Effects”, In Proceedings of International Test Conference, 2001, pp. 558-567. [101] R. Kundu and R. D. Blanton, “Timed Test Generation for Crosstalk Switch Failures in Domino CMOS”, In Proceedings of VLSI Test Symposium, May 2002, pp. 379-385. [102] J. Kwong and A. P. Chandrakasan, “Variation-Driven Device Sizing for Minimum Energy Sub-threshold Circuits”, In Proceedings of International Symposium on Low Power Electronics and Design, 2006, pp. 8-13. [103] W. K.
Lam et al., “Delay fault Coverage, Test Set Size, and Performance Trade-offs”, In IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 1995, pp. 32-44. [104] J. Lee and M. Tehranipoor, “A Novel Test Pattern Generation Framework for Inducing Maximum Crosstalk Effects on Delay-Sensitive Paths”, In Proceedings of IEEE International Test Conference, 2008, pp. 1-10. [105] K. T. Lee, C. Nordquist, and J. A. Abraham, “Automatic Test Pattern Generation for Crosstalk Glitches in Digital Circuits”, In Proceedings of VLSI Test Symposium, 1998, pp. 34-39. [106] K. T. Lee, C. Nordquist, and J. A. Abraham, “Test Generation for Crosstalk Effects in VLSI Circuits”, In Proceedings of International Symposium On Circuits and Systems, May 1996, Vol. 4, pp. 628-631. [107] Y.-K. Lee and D.-S. Wang, “A study on the techniques of estimating the probability of failure”, In Journal of Chungcheong Mathematical Society, 2008, pp. 573-583. [108] Y. Levendel and P. R. Menon, “Transition Faults in Combinational Circuits: Input Transition Test Generation and Fault Simulation”, In Proceedings of International Symposium On Fault-Tolerant Computing, 1986, pp. 278-283. [109] H. Li, P. Shen and X. Li, “Robust Test Generation for Precise Crosstalk-induced Path Delay Faults”, In IEEE VLSI Test Symposium, 2006, pp. 300-305. [110] J. C. Lin and S. Reddy, “On Delay Fault Testing in Logic Circuits”, In IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 6, Sept. 1987, pp. 694-703. [111] T. Lin et al., “Analytical Delay Variation Modeling for Evaluating Sub-threshold Synchronous/Asynchronous Designs”, In IEEE NEWCAS Conference, 2009, pp. 69-72. [112] Y. Lin and V. D. Agrawal, “Statistical Leakage and Timing Optimization for Submicron Process Variation”, In International Conference on VLSI Design, 2007, pp. 439-444. [113] J.
Liou et al., “False Path Aware Statistical Timing Analysis and efficient path selection for delay testing and timing validation”, In Proceedings of Design Automation Conference, 2002, pp. 566-569. [114] J. Liou et al., “Modeling, testing, and analysis for delay defects and noise effects in deep submicron devices”, In IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 22, Issue 6, 2003, pp. 756-769. [115] J. Liou et al., “Fast Statistical Timing Analysis by Probabilistic Event Propagation”, In Proceedings of Design Automation Conference, 2001, pp. 661-666. [116] N. Lotze, J. Goppert and Y. Manoli, “Timing Modeling for Digital Sub-threshold Circuits”, In Proceedings of Design Automation and Test in Europe Conference, 2010, pp. 299-302. [117] X. Lu et al., “Longest Path Selection for Delay Test under Process Variations”, In Proceedings of Asia South Pacific Design Automation Conference, 2004, pp. 98-103. [118] H. Luo et al., “Modeling of PMOS NBTI Effect considering Temperature Variation”, In Proceedings of International Symposium On Quality Electronic Design, 2007, pp. 139-144. [119] Y. K. Malaiya, M. N. Li, J. M. Bieman, and R. Karcich, “Software reliability growth with test coverage”, In IEEE Transactions on Reliability, Vol. 51, Issue 4, Dec. 2002, pp. 420-426. [120] D. Markovic et al., “Ultralow-Power Design in Near-Threshold Region”, In Proceedings of IEEE, Vol. 98, No. 2, 2010, pp. 237-252. [121] D. A. Mathaikutty, S. V. Kodakara, and A. Dingankar, “Design fault directed test generation for microprocessor validation”, In Design and Test in Europe, 2007. [122] P. Mishra and N. Dutt, “Functional Validation of Programmable Architectures”, In Proceedings of EUROMICRO Systems on Digital System Design, August 2004, pp. 12-19. [123] S. Mitra, S. Seshia and N. Nicolici, “Post-silicon validation: Opportunities, challenges and recent advances”, In Proceedings of Design Automation Conference, 2010, pp. 12-17. [124] F. Moll and A.
Rubio, “Detectability of Spurious Signals with Limited Propagation in Combinational Circuits”, In Proceedings of Asian Test Symposium, 1998, pp. 34-39. [125] F. Moll and A. Rubio, “Methodology of Detection of Spurious Signals in VLSI Circuits”, In Proceedings of European Test Conference, 1993, pp. 491-496. [126] A. Nahir et al., “Bridging pre-silicon verification and post-silicon validation”, In Proceedings of Design Automation Conference, 2010, pp. 94-95. [127] S. Naidu, “Timing Yield Calculation using an impulse-train approach”, In Proceedings of Asia Pacific Design Automation Conference, 2002, pp. 219-224. [128] S. Nassif, “Delay Variability: Sources, Impact and Trends”, In Proceedings of Solid-State Circuits Conference, 2000, pp. 368-369. [129] S. Natarajan, S. Patil and S. Chakravarty, “Path Delay Fault Simulation on Large Industrial Designs”, In Proceedings of VLSI Test Symposium, 2006, pp. 16-23. [130] S. Nazarian and M. Pedram, “Crosstalk Affected Delay Analysis in nanometer technologies”, In International Journal of Electronics, September 2008, Vol. 95, No. 9, pp. 903-937. [131] J. Nyathi and B. Bero, “Logic Circuits Operating in Subthreshold Voltages”, In International Symposium on Low Power Electronic Design, 2006, pp. 131-134. [132] K. Okada et al., “A Statistical Gate Delay Model for Intra-chip and Inter-chip Variabilities”, In Proceedings of Asia and South Pacific Design Automation Conference, 2004, pp. 31-36. [133] Y. Osaki et al., “Delay-Compensation Techniques for Ultra-Low-Power Subthreshold CMOS Digital LSIs”, In IEEE International Midwest Symposium on Circuits and Systems, 2009, pp. 503-506. [134] P. Patra, “On the cusp of a validation wall”, In IEEE Design & Test of Computers, Vol. 24, Mar-Apr 2007, Issue 2, pp. 193-196. [135] A. Pierzynska and S. Pilarski, “Non-Robust versus Robust”, In Proceedings of IEEE International Test Conference, 1995, pp. 124-131. [136] I. Pomeranz and S. M.
Reddy, “Forming N-Detection Test Sets from One-Detection Test Sets Without Test Generation”, In Proceedings of International Test Conference, 2005, pp. 527-535. [137] I. Pomeranz, S. M. Reddy and P. Uppaluri, “NEST: a Non-Enumerative Test Generation Method for Path Delay Faults in Combinational Circuits”, In IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 14, December 1995, pp. 1505-1515. [138] A. K. Pramanick and S. M. Reddy, “On the Computation of Ranges of Detected Delay Fault Sizes”, In Proceedings of International Conference on Computer Aided Design, 1989, pp. 126-129. [139] A. K. Pramanick and S. M. Reddy, “On the Detection of Delay Faults”, In Proceedings of International Test Conference, 1988, pp. 845-856. [140] W. Qiu and D. M. H. Walker, “An Efficient Algorithm for Finding the K Longest Testable Paths Through Each Gate in a Combinational Circuit”, In Proceedings of International Test Conference, 2003, pp. 592-601. [141] W. Qiu et al., “K Longest Paths Per Gate (KLPG) Test Generation for Scan-Based Sequential Circuits”, In Proceedings of International Test Conference, 2004, pp. 223-231. [142] R. Rao et al., “Statistical Analysis of Subthreshold Leakage Current for VLSI Circuits”, In IEEE Transactions on VLSI Systems, Vol. 12, No. 2, 2004, pp. 131-139. [143] A. Raychowdhury et al., “Computing With Subthreshold Leakage: Device/Circuit/Architecture Co-Design for Ultralow-Power Subthreshold Operation”, In IEEE Transactions on VLSI Systems, Vol. 13, No. 11, 2005, pp. 1213-1224. [144] S. Reda and S. R. Nassif, “Analyzing the impact of Process Variations on parametric measurements: Novel models and Applications”, In Proceedings of Design Automation Conference, 2009, pp. 375-380. [145] N. J. Roherer et al., “PowerPC 970 in 130 nm and 90 nm technologies”, Digest of Technical Papers, IEEE Solid-State Circuits Conference, 2004, pp. 68-69. [146] R. Y. Rubinstein, “Simulation and the Monte Carlo Method”, New York: Wiley, 1981. [147] A.
Rubio, N. Itazaki, X. Xu, and K. Kinoshita. “An Approach to the Analysis and Detection of Crosstalk Faults in Digital VLSI Circuits”, In IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Volume 13, Mar. 1994, pp. 387-394. [148] S. S. Sapatnekar, “Variability and Statistical Design”, In IPSJ Transactions on System LSI Design Methodology, Vol. 1, 2008, pp. 18-32. [149] C. Seger, “Integrating design and verification – from simple idea to practical system”, In IEEE/ACM International Conference on Formal Methods and Models for Co-Design, 2005. [150] M. Shao et al., “Explicit Gate Delay Model for Timing Evaluation”, In Proceedings of International Symposium on Physical Design, 2003, pp. 32-38. [151] K. Shimizu and D. L. Dill, “Deriving a Simulation Input Generator and a Coverage Metric from a Formal Specification”, In Proceedings of Design Automation Conference, 2002, pp. 801-806. [152] K. Shimizu and D. L. Dill, “Using Formal Specifications for Functional Validation of Hardware Designs”, In IEEE Transactions on Design & Test of Computers, Vol. 19, Issue 4, Aug. 2002, pp. 96-106. [153] L. G. Silva, Z. Zhu, J. R. Phillips, L. M. Silveira, “Variation-Aware, Library Compatible Delay Modeling Strategy”, In IFIP International Conference on Very Large Scale Integration, 2006, pp. 122-127. [154] M. Slimani, F. Silveira and P. Matherat, “Variability-speed-consumption trade-off in near threshold operation”, In Proceedings of PATMOS, 2011, pp. 308-316. [155] G. L. Smith, “Model for Delay Faults Based on Paths”, In Proceedings of International Test Conference, pp. 342-349, 1985. [156] U. Sparmann, D. Luxenburger, K. T. Cheng and S. M. Reddy, “Fast Identification of Robust Dependent Path Delay Faults”, In Proceedings of Design Automation Conference, 1995, pp. 119-125. [157] M. D. Springer, “The Algebra of Random Variables”, Wiley, 1979. [158] J. Sridharan and T. 
Chen, “Gate Delay Modeling with Multiple Input Switching for Static (Statistical) Timing Analysis”, In Proceedings of VLSI Design, 2006, pp. 323-328. [159] C. L. Su et al., “Reducing Power Consumption at the Control Path of High Performance Microprocessors”, In IEEE Design and Test of Computers, 1994, pp. 1-20. [160] C.-Y. Su and C.-W. Wu, “A probabilistic method for path delay fault testing”, In Journal of Information Science and Engineering, 2000, pp. 783-794. [161] J. B. Sulistyo and D. S. Ha, “A new Characterization Method for Delay and Power Dissipation of Standard Library Cells”, In IEEE Journal of VLSI Design, 2002, pp. 667-678. [162] D. Tadesse et al., “Accurate Timing Analysis using SAT and Pattern-Dependent Delay Models”, In Design, Automation & Test in Europe Conference, 2007, pp. 1-6. [163] K. T. Tang and E. G. Friedman, “Lumped Vs Distributed RC and RLC Interconnect Impedances”, In Proceedings of Midwest Symposium on Circuits and Systems, 2000, pp. 136-139. [164] P. A. Thaker, V. D. Agrawal and M. E. Zaghloul, “Validation Vector Grade (VVG): A New Coverage Metric for Validation and Test”, In Proceedings of VLSI Test Symposium, 1999, pp. 182-188. [165] J. R. Tolbert and S. Mukhopadhyay, “Accurate Buffer Modeling with Skew Propagation in Subthreshold Circuits”, In International Symposium on Quality Electronic Design, 2009, pp. 91-96. [166] S. Tragoudas and D. Karayiannis, “A Fast Non-Enumerative Automatic Test Pattern Generator for Path Delay Faults”, In IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 18, July 1999, pp. 1050-1057. [167] S. Tsai and C. Huang, “A False-Path Aware Formal Static Timing Analyzer Considering Simultaneous Input Transitions”, In Proceedings of Design Automation Conference, 2009, pp. 25-30. [168] A. Valentian et al., “Modeling Subthreshold SOI Logic for Static Timing Analysis”, In IEEE Transactions on VLSI Systems, Vol. 12, No. 6, 2004, pp. 662-668. [169] V.
Veetil et al., “Efficient Smart Sampling based Full-Chip Leakage Analysis for Intra-Die Variation Considering State Dependence”, In Proceedings of Design Automation Conference, 2009, pp. 154-159. [170] V. Veetil, K. Chopra, D. Blaauw, and D. Sylvester, “Fast Statistical Static Timing Analysis Using Smart Monte Carlo Techniques”, In IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2011, pp. 852-865. [171] V. Veetil, D. Sylvester, and D. Blaauw, “Fast and Accurate Waveform Analysis with Current Source Models”, In Proceedings of International Symposium on Quality Electronic Design, 2008, pp. 53-56. [172] N. Verma, J. Kwong and A. P. Chandrakasan, “Nanometer MOSFET Variation in Minimum Energy Subthreshold Circuits”, In IEEE Transactions on Electronic Devices, Vol. 55, No. 1, 2008, pp. 163-174. [173] B. Vermeulen and N. Nicolici, “Post-silicon validation and debug”, Embedded Tutorial (Session 10A), European Test Symposium, 2008. [174] C. Visweswariah, “Death, Taxes and Failing Chips”, In Proceedings of Design Automation Conference, 2003, pp. 343-347. [175] J. A. Waicukauski et al., “Transition Fault Simulation by Parallel Pattern Single Fault Propagation”, In Proceedings of International Test Conference, 1986, pp. 542-549. [176] X. Wang, A. Kasnavi and H. Levy, “An efficient method for fast delay and SI calculation using current source models”, In Proceedings of International Symposium on Quality Electronic Design, 2008, pp. 58-61. [177] Z. Wang et al., “Dynamic Compaction for High Quality Delay Test”, In Proceedings of VLSI Test Symposium, 2008, pp. 243-248. [178] C. H-P. Wen et al., “On A Software Based Self-Test Methodology and Its Application”, In Proceedings of VLSI Test Symposium, 2005, pp. 107-113. [179] P. Woo, “Structured ASICs - A risk management tool”, D&R Industry Articles (www.desgn-reuse.com). [180] S. H. Wu, S. Chakravarty and L. C.
Wang, “Impact of Multiple Input Switching on Delay Test under Process Variation”, In Proceedings of VLSI Test Symposium, 2010, pp. 87-92. [181] L. Xie et al., “Bound-based Identification of Timing-Violating Paths under Variability”, In Asia South Pacific Design Automation Conference, 2009, pp. 278-283. [182] L. Xie et al., “False Path Aware Timing Yield Estimation under Variability”, In VLSI Test Symposium, 2009, pp. 161-166. [183] S. Yanamanamanda et al., “Uncertainty modeling of Gate Delay considering multiple input switching”, In Proceedings of International Symposium on Circuits and Systems, 2005, Vol. 3, pp. 2457-2460. [184] J. Yuan et al., “Modeling Design Constraints and Biasing in Simulation Using BDDs”, In Proceedings of International Conf. of Computer Aided Design, 1999, pp. 584-589. [185] J. Zeng et al., “On correlating structural tests with functional tests for speed binning of high performance design”, In Proceedings of International Test Conference, 2004, pp. 31-37. [186] B. Zhai, S. Hanson, D. Blaauw, and D. Sylvester, “Analysis and Mitigation of Variability in Subthreshold Design”, In Proceedings of International Symposium on Low Power Electronics and Design, 2005, pp. 20-25. [187] Y. Zhan et al., “Statistical Critical Path Analysis Considering correlation”, In International Conference on Computer-Aided Design, 2005, pp. 699-704. [188] J. Zhang, N. Patil and S. Mitra, “Probabilistic Analysis and Design of Metallic-Carbon-Nanotube-Tolerant Digital Logic Circuits”, In IEEE Transactions on Computer-Aided Design, Vol. 28, Issue 9, 2009, pp. 1307-1320. [189] L. Zhang, I. Ghosh and M. S. Hsiao, “A Framework for Automatic Design Validation of RTL Circuits Using ATPG and Observability-Enhanced Tag Coverage”, In IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 25, Issue 11, Nov. 2006, pp. 1550-1565. [190] M. Zhang, H. Li and X.
Li, “Path Delay Test Generation Toward Activation of Worst Case Coupling Effects”, In IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2010, Vol. 99, pp. 1-14. [191] M. Zhang, H. Li, and X. Li, “Multiple Coupling Effects Oriented Path Delay Test Generation”, In IEEE VLSI Test Symposium, 2008, pp. 383-388. [192] C. Zhuo et al., “Active Learning Framework for Post-Silicon Variation Extraction and Test Cost Reduction”, In International Conference on Computer-Aided Design, 2010, pp. 508-515. [193] Collett ASIC/IC Verification Study, 2004. [194] International Technology Roadmap for Semiconductors, 2012. [195] Synopsys: Accelerating Innovation, “PrimeTime – Golden Timing Signoff Solution and Environment”, 2012, Retrieved May 5, 2012, from: http://www.synopsys.com/Tools/Implementation/SignOff/Documents/primetime_ds.pdf [196] Wolfram Mathworld, “Student’s t distribution”, Retrieved July 12, 2012, from: http://mathworld.wolfram.com/Studentst-Distribution.html
APPENDIX
INCORPORATING RESILIENCE AND VALIDATION BUDGET
As discussed in Chapter 2, an approach that is resilient must capture known and dominant gate delay phenomena as well as variability with manageable complexity, and must compute tight timing ranges – including the worst case for at least (100 - ε)% of fabricated chips, where ε is the user-specified validation budget determined by economic considerations – in an efficient manner. Also, the validation budget ε necessitates that our approach focus on significant marginalities (present in at least (100 - ε)% of fabricated chips), as redesign driven by errors identified during validation is economically feasible only when marginalities threaten the economic viability of the design by causing a significant fraction of fabricated chips to have erroneous behavior (see Chapter 1). Let ε be the total validation budget, i.e., the percentage of chips we can afford to ignore during validation. There are two major components of ε.
(i) ε_path – validation budget for limited path selection, i.e., the chips missed by our approach because we did not select all truly critical paths, and
(ii) ε_vector – validation budget for inadequate vectors, i.e., the chips that fail when our vectors are applied.
Thus, ε is a function of ε_path and ε_vector, i.e., ε = fu_1(ε_path, ε_vector). Please note that ε_vector = fu_2(ε_path), since validation vectors are generated only for the limited set of selected paths. Also, ε_path has two components:
(i) ε_char – validation budget for limited characterization, i.e., chips with gates (or instances of gates) having delay larger than the bound (μ_D + k*σ_D) considered during our Monte Carlo based characterization, and
(ii) ε_conv – validation budget for path delay analysis using truncated PDFs, i.e., chips with paths whose gate delays are within the bounds but whose path delay is larger than the estimated path delay due to approximations in our timing analysis.
Thus, ε_path is a function of ε_char and ε_conv, i.e., ε_path = fu_3(ε_char, ε_conv). Please note that ε_conv = fu_4(ε_char), since timing analysis is performed only for gate instances with limited characterization. Note that k can be determined from ε_char using approaches from [90], and ε_conv can be determined using techniques from [113][115]. Also, precise characterization of the functions fu_1, fu_2, fu_3 and fu_4 is beyond the scope of this dissertation. Now we present the reasoning behind how resilience (in accordance with the validation budget ε) is incorporated into our whole framework and its key components, namely, delay characterization (ε_char), timing analysis (ε_conv), path selection (ε_path), and vector generation (ε_vector). We begin with the assumptions that the circuit simulator (Spectre), transistor and parasitics models (Spectre models), and variability models are golden, i.e., accurate and error free.
A.1.
Validation budget for resilient characterization ε_char
A resilient characterization approach must capture all known delay phenomena accurately. This necessitates that the input waveforms and initial conditions for which simulations are performed during characterization (using an accurate transistor-level simulator) must be complete, and the simulations performed must capture all corner cases. The final delay values are obtained either explicitly, i.e., from cases on which simulations are performed, or implicitly, i.e., via interpolation. Hence, for cases not explicitly simulated, the interpolation used must not miss anything (especially a corner case). We begin by identifying a list of known delay phenomena and associated simulation scenarios [28][29]. We assume this list to be complete and proceed with our approach. Thus, the percentage of chips whose worst case is missed because of the limited set of delay phenomena considered during characterization must be accounted for in the validation budget for characterization, ε_char. In Section 3.3.2 (Figure 3.3) we showed that our characterization setup drives the gate inputs with a wide range of realistic waveforms whose attributes are the key variables, namely transition times, skews, and initial state of internal capacitances, of our timing functions, namely delay and transition time [28]. To account for the effect of interconnect delays, the driver load can be extended from a simple capacitive load to advanced load models, such as single- or multiple-stage lumped or distributed RC [163] or RLC [92] loads, or single- or multiple-stage π models of RC [87][130] or RLC loads [87][58]. Thus, our characterization setup can generate a complete and realistic set of input waveforms and in turn can reduce ε_char. Also, in an empirical model, given the input waveforms, the golden (accurate) simulator with golden transistor-level models takes care of all the relevant effects.
Each simulation scenario is characterized by factors such as input vectors, input transition times, skews, etc. The empirical method of [28][29] varies these waveform attributes to capture known delay phenomena and their effects in empirical models. This method can be extended to account for other effects, such as NBTI (Negative Bias Temperature Instability) [118] and TDDB (Time Dependent Dielectric Breakdown) [41], by identifying the requirements of the targeted delay phenomena and exercising them via input waveforms. Thus, our characterization setup can capture all known gate delay phenomena and can be extended to emerging ones (again reducing ε_char). Our characterization and simulation setup allows us to simulate known gate delay phenomena along with variability (see Section 3.3.5 in Chapter 3), and we capture these in empirical equations of output delay and transition times as functions of input variables (skew, slew, initial state of internal capacitances). This approach takes advantage of the fact that while circuit parameters can be pre-characterized before simulation/test generation, only the timing variables change during the simulation/test generation process [28][29]. In [28][29] it is also shown that the empirical delay model explicitly bounds the delay effects associated with simulation scenarios with varying input vectors, input transition times, skews, and the initial state of internal capacitances. In [31] it has been validated that the 3- or 4-point approximations can accurately capture the general shape of delay curves for a fixed value of transition times. Hence, the approximation associated with interpolation also has to be accounted for in ε_char. The curve shown in Figure A.1 corresponds to the max delay for the to-controlling case of a 2-input NAND gate.
Figure A.1 shows how the maximum delay X can be calculated for a particular gate from the delay model based on the three-point approximations (timing functions) of Section 3.3.4.
Figure A.1: Calculation of max delay using three-point approximations
Note that the max delay X is a function of the three (skew, delay) pairs – (D0, S0), (D1, S1) and (D2, S2) – and skew Si and delay Di are, in turn, functions of transition time, relative skew, capacitive load, and the charge on the internal capacitances (see Section 3.3.4). For the trivial cases of asymptotes with extreme values of skew between simultaneous transitions (S < S2 or S > S1), X can be calculated directly from D2 or D1, but for the intermediate ranges S2 to S0 and S0 to S1, X is derived using linear interpolation (slope calculation [52]). In our approach, we bound all simulation results obtained during characterization using a pair of 3- or 4-point approximations (functions with 3 or 4 (skew, delay) points) as shown in Figure 3.8 and Figure 3.10 (Section 3.3.5). Though we explicitly simulate all known corner cases, we must also ensure that any interpolation used for cases not explicitly simulated results in delay values within the bounds ((μ ± k*σ), considered during our Monte Carlo based characterization of the delay model). Note that a part of ε_char determines k. Since the delay vs. skew function (the V-shaped curve of Figure 3.10) is a well-behaved function (continuous, differentiable, and monotonic in parts), because the underlying physical phenomena (multiple charge/discharge paths and the current load added by the Miller effect) are linear over small intervals of time (skews) [31], it can be bounded by the associated linear interpolation (straight-line approximation). Also, due to the proximity of transistors in basic gates, the gate delay function is highly correlated under variability (variability does not drastically alter the shape of the gate delay vs. skew curve).
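The piecewise-linear evaluation described above can be sketched as follows. The point names (S2, D2), (S0, D0), (S1, D1) mirror the text; the function itself, its signature, and the numeric example values are hypothetical illustrations, not the dissertation's implementation.

```python
def max_delay_3pt(s, s2, d2, s0, d0, s1, d1):
    """Three-point piecewise-linear approximation of max delay vs. skew.

    Assumes S2 < S0 < S1. Outside [S2, S1] the curve sits on the
    asymptotes D2 and D1; between the corner points it is linear.
    """
    if s <= s2:
        return d2                                   # left asymptote
    if s >= s1:
        return d1                                   # right asymptote
    if s <= s0:                                     # segment (S2,D2)-(S0,D0)
        return d2 + (d0 - d2) * (s - s2) / (s0 - s2)
    return d0 + (d1 - d0) * (s - s0) / (s1 - s0)    # segment (S0,D0)-(S1,D1)

# Illustrative numbers only: apex delay 10 at skew 0, asymptotes 3 and 4.
print(max_delay_3pt(0.0, -2.0, 3.0, 0.0, 10.0, 2.0, 4.0))   # 10.0 (apex)
print(max_delay_3pt(1.0, -2.0, 3.0, 0.0, 10.0, 2.0, 4.0))   # 7.0 (interpolated)
```

The same skeleton extends to the 4-point variant by adding one more linear segment between the extra pair of corner points.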
Note that our approach focuses on CMOS technology, where the process is mature, the phenomena are mostly known, and clear-cut, mathematically analyzed models for components are available. Hence, our approach for empirical modeling is fairly complete. In the future, however, if our approach is to be applied to a technology without prior knowledge of delay-associated phenomena, our empirical approach using comprehensive stimuli (for delay-related effects and the effect of initial conditions) and the related interpolation must be applied systematically, in accordance with the validation budget ε_char, and with appropriate caution.

In Chapter 2, we mentioned that our approach focuses on significant marginalities, as redesign based on validation learning is economically feasible only when marginalities threaten the economic viability of the design by causing a significant fraction, i.e., at least (100 - ε)%, of fabricated chips to have erroneous behavior, even in the absence of defects and even when the variations in the fabrication process (process variations) are within the normally expected levels, i.e., even when there is no abnormal process drift. Along these lines, let X denote the random variable representing the max delay for a particular gate g, and let G denote the estimated max delay of gate g, derived using our approach and the comprehensive set of characterization attributes (type, transition times, skew, output load, state of internal capacitances). Thus, we estimate the percentage of chips in which the worst-case delay X of gate g exceeds the estimated value G, where the percentage of failure = prob(X ≥ G) × 100. We perform Monte Carlo simulations to capture variability and then calculate the probability of failure, prob(X ≥ G), in our characterization step.
Let X_D0, X_D1 and X_D2 denote the random variables representing the max delay, and let D0, D1, D2 denote the max delay values estimated by our approach for gate g at relative skew (between the transitions at the inputs) values of S0, S1, and S2, respectively. We perform a sufficiently large number of Monte Carlo simulations (see Section A.4) and map the resultant data into normal distributions X_Di (with mean μ_Di and standard deviation σ_Di), where i = 0, 1, 2. Note that modeling the max delay distribution as a normal distribution is a widely used assumption, as the gate delay distribution under variability does follow the normal distribution [2][12][60][69][171][183][158], unlike some transistor parameter distributions under variability (L_eff follows a log-normal distribution) [12]. Also, Di can be estimated as (μ_Di + k·σ_Di), where k must be selected in accordance with the component ε_char of significant marginality. Thus, Monte Carlo simulations provide us with the probability of failure at the corner points, prob(X_Di ≥ Di), where i = 0, 1, 2, as shown in Figure A.2. As mentioned earlier, the delay vs. skew function is well behaved and highly correlated under variability (just like the bounded curves in Figure A.3). Thus, if for a chip the delay exceeds the estimated curve at any one point (say the corner point D1), it will definitely exceed it at all the other points (D2, D0 and along the lines D1D0 and D2D0). Hence, the probability of failure can be approximated as prob(X ≥ G) = prob(X_D1 ≥ G).

Figure A.2: Estimating probability of failure at corner points of the three-point approximation for MIS-TC

Our curve fitting approach represents min (D_min) and max (D_max) delays as functions of input variables - transition times (T_R), skew (δ) and charge on internal capacitances (N_c) (Sections 3.3.4 and 3.3.5).
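A minimal sketch of the corner-point estimate described above: fit a normal distribution to Monte Carlo delay samples and compute prob(X ≥ D) for D = μ + k·σ. The sample generator and all numbers are hypothetical stand-ins for a real characterization run:

```python
import math
import random
import statistics

def failure_probability(samples, k):
    """Fit a normal distribution to Monte Carlo delay samples and
    return (D, prob(X >= D)), where D = mu + k*sigma is the estimated
    max delay at one corner point (Section A.1)."""
    mu = statistics.fmean(samples)
    sigma = statistics.stdev(samples)
    d = mu + k * sigma
    # Upper-tail probability of N(mu, sigma) above d, via erf.
    prob = 0.5 * (1.0 - math.erf((d - mu) / (sigma * math.sqrt(2.0))))
    return d, prob

# Hypothetical Monte Carlo run: delays around 100 ps, sigma ~ 5 ps.
random.seed(7)
samples = [random.gauss(100.0, 5.0) for _ in range(5000)]
d1, p1 = failure_probability(samples, k=3.0)
print(round(p1, 5))   # ~0.00135 for k = 3
```

Under the normal-fit assumption, the failure probability at each corner point depends only on k, which is exactly how k is tied back to the ε_char budget.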
Let D_SP denote the delay calculated by our golden simulator for realistic waveforms, and let ρ denote variability; we have placed constraints on our curve fitting techniques such that the delays (D_min, D_max) calculated by our approach bound D_SP (this being true for the cases explicitly simulated (D0, D1, D2) as well as for any associated interpolation in the form of straight lines joining (D0, D1) and (D0, D2)). Hence, D_min(T_R, δ, N_c) ≤ D_SP(T_R, δ, N_c, ρ) ≤ D_max(T_R, δ, N_c) for all realistic values of T_R, δ, N_c, and ρ (determined by our characterization setup). This also holds for output transition times.

Figure A.3: Bounding approximations on the intertwined curves (hypothetical)

Our bounding approach also takes care of cases where simulation results do not follow the above simple pattern. Figure A.3 shows how bounding can be performed on a pair of intertwined waveforms using our constrained approach. In practice, however, such cases do not occur because, due to the extreme proximity of transistors in a basic gate, the simulated variation waveforms are highly correlated (as evident from our experiments as well as the existing literature [77]). Hence, our empirical delay model with bounding approximation bounds the gate delay in accordance with ε_char. Thus, we devise a method to estimate the probability of the actual gate delay exceeding the gate delay estimated by our approach (in accordance with the validation budget ε_char), and hence we substantiate our claim of incorporating resilience while capturing variability during characterization. Later we will show how the gate delay PDFs are used to arrive at the path delay PDFs and subsequently the percentage of chips with actual max delay greater than the estimated max delay.

Note that our approach for incorporating variability is predominantly based on Monte Carlo analysis [94], performed by random selection of the parameters, within the worst-case limits, by a computer simulation program.
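The bounding constraint D_min ≤ D_SP ≤ D_max can be verified mechanically during characterization. A minimal sketch (the function and values are illustrative, not from our flow):

```python
def bounds_hold(d_min, d_max, sim_delays):
    """Check the curve-fitting constraint of Section A.1 at one
    (T_R, delta, N_c) point: every simulated (variability) delay
    sample must lie inside the fitted [D_min, D_max] envelope."""
    return all(d_min <= d <= d_max for d in sim_delays)

# Illustrative simulated delays at a single input point (ps):
sim = [98.2, 101.7, 99.9, 103.4]
print(bounds_hold(95.0, 105.0, sim))   # True: envelope holds
print(bounds_hold(95.0, 102.0, sim))   # False: 103.4 escapes the bound
```

A failing check would indicate that the fitted pair of three/four-point approximations must be loosened (or k increased) for that input point.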
A Monte Carlo analysis, when used properly, can give the probability of a gate/circuit/design output characteristic lying within a given range, also known as a confidence interval [146]. Monte Carlo sampling, the crux of any Monte Carlo analysis, is one of the techniques used to estimate the probability of failure (P_f), i.e., the probability that a given design parameter exceeds a particular threshold. There are two predominant sampling techniques [107]: (i) Monte Carlo sampling and (ii) importance sampling. Monte Carlo sampling generally requires a large number of simulations for accurate estimation. As a countermeasure, importance sampling concentrates the sampling points in the region of highest importance, i.e., the region that contributes most to the failure probability, and hence requires a sampling PDF. However, with importance sampling the accuracy of the estimate depends on the choice of sampling function, and an optimal sampling function is difficult to find [146][107]. Using the approach in [90], we can show that standard Monte Carlo sampling is sufficient for capturing variability in a resilient delay model if we are willing to ignore rare failure events (events with probabilities below a particular threshold, i.e., ≤ 0.002). We can ignore such rare failure events because delay marginality validation must focus on effects that cause a significant fraction of fabricated chips to have erroneous behavior. The analysis in [90] also shows that the number of simulations required in a Monte Carlo sampling based approach to cover the 95% confidence interval, ignoring rare failure events, is on the order of thousands, which is similar to the number we have performed in our characterization. Moreover, industry-standard library design files have a well-established Monte Carlo based variability analysis flow, which can be easily incorporated into our characterization flow.
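The "order of thousands" figure can be reproduced with a standard rule of thumb for sizing a Monte Carlo run: under the normal approximation to the binomial estimator, estimating a failure probability p to relative error r at confidence level z requires roughly N = z²(1 - p)/(r²·p) samples. This is a generic sketch, not the specific analysis of [90]:

```python
import math

def mc_samples_needed(p, rel_err, z=1.96):
    """Rough number of standard Monte Carlo samples needed to
    estimate a failure probability p to within relative error
    rel_err at ~95% confidence (z = 1.96), via the normal
    approximation to the binomial estimator."""
    return math.ceil(z * z * (1.0 - p) / (rel_err * rel_err * p))

# Illustrative: the smallest failure probability the framework
# cares about (rarer events are ignored, per Section A.1).
print(mc_samples_needed(p=0.002, rel_err=0.5))   # -> 7668, i.e. thousands
```

Smaller target probabilities drive the sample count up quickly, which is why ignoring rare failure events keeps standard Monte Carlo sampling practical here.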
Other variance reduction techniques [90][170], such as quasi-Monte Carlo sampling, stratified sampling and Latin hypercube sampling, can be incorporated to reduce the variance of Monte Carlo estimators [146], but they are mostly beneficial for estimating tail probabilities [90], which is not central to our objective of delay marginality validation. Moreover, all these approaches suffer from the problems of dimensionality and slow convergence [170]. Also, these approaches have been evaluated for Monte Carlo based SSTA [170] at the full-chip level, but the effect of such sampling methods on gate characterization (which cannot take advantage of the topological information that full-chip, block-based SSTA can exploit) is yet to be evaluated. Also, implementing the segmentation-based divide and conquer approach (Chapter 6) without assuming independence between parameters reduces the guarantee of capturing the worst case (and has to be accounted for in the validation budget ε_char), because a part of each individual distribution, and subsequently its correlated component, is unaccounted for.

A.2. Validation budget for resilient timing ε_conv

In timing analysis, arrival times of gate output signals are calculated based on the values at the gate inputs. Also, the required time at a gate output may result in a timing requirement associated with the arrival time at one of the gate's inputs. We derive our transfer functions for forward and backward calculation based on the approach in [31], which uses the bitonic relationship between delay and input transition time and explores all possible transition times within the min-max range in the input transition time vs. delay curves to arrive at the output arrival time and output transition time ranges.
Thus, the timing ranges calculated by our approach include the worst case and are tight except for the inherent sources of looseness (which are either accounted for in ε_char or will be accounted for in ε_conv): partially specified vectors, variability, and the looseness associated with the approximations used to reduce complexity. The timing analysis provides us with path delays, which can then be used for essential tasks such as path selection and vector generation. However, the estimation of path delay is a function of arrival times and gate delays. Later, we will show how convolution of gate delay PDFs can provide the path delay PDF. But instead of simple convolution of gate delay PDFs, we need to convolve the input arrival time PDFs with the gate delay PDF to arrive at the output arrival time PDF, which eventually yields the path delay PDF. For simplicity of calculation, we assume that the delay dependence on transition time lies in the monotonic region (shown as region (a) in Figure A.4 for the to-controlling transition). This is a realistic assumption for all circuits in which the designer does not allow extremely slow-to-rise or slow-to-fall transitions to occur inside the circuit. The looseness associated with this assumption needs to be accounted for in ε_conv. Thus, the arrival time calculation function reduces to a simple equation comprising sum and maximum operators, and the standard statistical method of convolution can be applied [12].

Figure A.4: Max arrival time calculation for MIS-TC [28]

We note that convolution with truncated PDFs (with missing tail probability), as produced by our Monte Carlo sampling (for significant marginalities), is a nontrivial task and beyond the scope of this research.
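As a concrete sketch of the discrete convolution used to combine delay PDFs: sample each gate's delay distribution onto a common unit grid as a PMF, convolve PMFs to obtain the path delay PMF, and sum the upper tail to get Prob(Y ≥ T). All PMF values below are hypothetical:

```python
def convolve(p, q):
    """Discrete convolution of two delay PMFs sampled on the same
    unit grid: out[k] = sum_i p[i] * q[k - i] (Section A.2)."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out

def prob_at_least(pmf, threshold):
    """Prob(Y >= T) for a PMF indexed by delay bins."""
    return sum(pmf[threshold:])

# Illustrative two-gate path: each gate's delay PMF on a 1-unit grid.
g1 = [0.2, 0.5, 0.3]          # delays 0, 1, 2 units
g2 = [0.1, 0.6, 0.3]          # delays 0, 1, 2 units
path = convolve(g1, g2)       # path delay PMF over delays 0 .. 4
print([round(x, 3) for x in path])
print(round(prob_at_least(path, 3), 3))   # Prob(Y >= 3) -> 0.42
```

This is the shifting-and-scaling discrete-sum form of convolution; convolving an arrival time PMF with a gate delay PMF to get the output arrival time PMF works identically.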
The methods proposed in [113][115][127], where the gate delay is represented as a discrete distribution generated by sampling and normalizing the continuous distributions and the convolution (discrete sum) operation is performed on these PDFs using shifting and scaling operators, can be used to perform the convolution under consideration. Moreover, techniques to bound the tail probabilities [38] can also be adapted to speed up the calculation. Note that the approach of [31] shrinks timing ranges as more and more values become specified (in Section 5.3, we have proven the completeness of the chosen value system). Also, backward implications provide timing requirements that all test vectors must satisfy, while the required time on a line in timing analysis guarantees (in accordance with the validation budget ε_conv, which accounts for all the associated approximations) that no test vector can excite the specified transition outside the min-max range. Hence our arrival and transition timing ranges are indeed bounding and subsume the actual minimum and maximum arrival and transition times (again in accordance with ε_conv).

Let Y denote the estimated max delay for a given path π. Also, let the path comprise n gates, with the max delay of gate g_i denoted by X_i. For a simple first-order analysis, Y = X_1 + X_2 + … + X_n. Note that the PDF f(y) is arrived at by performing convolution [160] of the PDFs f_i(x) (see Figure A.5). Thus, the probability that a path exceeds the timing threshold is given by Prob(Y ≥ T) = Σ_{y ≥ T} f(y) in the discrete case, or ∫_T^∞ f(y) dy in the continuous case.

Figure A.5: Convolution to arrive at path PDF using gate PDFs [160]

A.3. Validation budget for resilient path selection ε_path

Now, consider that the design has p (logical) paths and, based on the approach of [79], consider that t paths are selected for testing based on the design-specific clock period T and a user-specified timing threshold TT (where TT = Δ·T), determined from the validation budget ε_path.
Now, for each of the p - t paths not considered for testing, we calculate the percentage of chips in which the delay of these paths can exceed the given clock period T on silicon. Table A.1 shows the validation budget due to limited path selection for testing (ε_path). It reports the percentage of chips failing validation because of limited path selection for vector generation. The percentage of such chips can be calculated as the cumulative sum, over paths, of the probability that the estimated max delay Y of the path is greater than the clock period T, conditioned on the ETA-obtained (Chapter 3) delay Z of the path being less than the timing threshold TT. Here, Y_i (Z_i) represents the max estimated (ETA-obtained) delay of a path π_i (i = 1 to p - t) not selected for testing, and T_j (TT_j) (j = 1 to k) the various values of clock periods (timing thresholds decided in accordance with ε_path).

Table A.1: Percentage of chips affected by limited path selection

             T_1                                        ...   T_k
P_1          Prob[(Y_1 ≥ T_1) | (Z_1 < TT_1)]           ...   Prob[(Y_1 ≥ T_k) | (Z_1 < TT_k)]
P_2          Prob[(Y_2 ≥ T_1) | (Z_2 < TT_1)]           ...   Prob[(Y_2 ≥ T_k) | (Z_2 < TT_k)]
...          ...                                        ...   ...
P_(p-t)      Prob[(Y_(p-t) ≥ T_1) | (Z_(p-t) < TT_1)]   ...   Prob[(Y_(p-t) ≥ T_k) | (Z_(p-t) < TT_k)]
% of chips   [P_1 + P_2 + ... + P_(p-t)] × 100          ...   [P_1 + P_2 + ... + P_(p-t)] × 100

Note that, as mentioned earlier, instead of simple convolution of gate delay PDFs, we need to convolve the input arrival time PDFs with the gate delay PDF to arrive at the output arrival time PDF, which eventually results in the path delay (ε_path = fu_3(ε_char, ε_conv)). Thus, we can arrive at the conditional probabilities Prob[(Y ≥ T) | (Z < TT)] of Table A.1 and then estimate the max delay for the paths not selected for testing.

A.4. Validation budget for resilient vector-space ε_vector

Now, for the selected t paths, vectors are generated using our timing-oriented, variation-aware approach (Chapter 5).
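Returning to Table A.1: a single cell, Prob[(Y ≥ T) | (Z < TT)], can be estimated directly from joint (Y, Z) Monte Carlo samples of an unselected path. A minimal sketch with hypothetical values:

```python
def cond_prob_exceed(samples, t, tt):
    """Estimate Prob(Y >= T | Z < TT) from joint (Y, Z) Monte Carlo
    samples, as used to fill one cell of Table A.1.  Y is the
    estimated max path delay, Z the ETA-obtained delay."""
    kept = [(y, z) for (y, z) in samples if z < tt]
    if not kept:
        return 0.0
    return sum(1 for (y, _) in kept if y >= t) / len(kept)

# Hypothetical (Y, Z) pairs, in ps, for one unselected path:
samples = [(9.5, 8.0), (10.4, 8.9), (10.8, 9.4), (9.9, 9.6), (11.2, 9.1)]
print(cond_prob_exceed(samples, t=10.0, tt=9.5))   # -> 0.75
```

Summing such estimates over the p - t unselected paths, for each (T_j, TT_j) column, gives the "% of chips" row of the table.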
These generated vectors are applied to a sample of M chips selected for validation. Mostly, simple probability sampling is used, because the whole population of chips is not known and only a few chips are fabricated for validation (a sample that is not at all indicative of the true population traits seen in high volume manufacturing). Note that the advanced methods of population sampling [9], such as stratified sampling, clustered sampling, or systematic sampling, which depend on known population traits, cannot be used. Assuming that the vector generation approach is complete, one can estimate the percentage of chips failing validation. Let m chips show a delay-related error, i.e., the delay of these chips exceeds the clock period. Let D represent the max delay of a chip when the validation vectors are applied. Then, the fraction of chips failing validation for the given sample is prob(D ≥ T) = m/M. The percentage of chips failing validation (in accordance with ε_vector, the validation budget due to chips failing on our validation vectors) for the complete population can then be estimated using standard statistical population sampling techniques such as the Student's t distribution [196].

A.5. Validation budget for resilient framework ε

Thus, as mentioned earlier, the estimated validation budget (ε) accounting for significant marginalities comprises:

• ε_path - the percentage of chips not validated completely because of limited path selection but which can have max delay greater than the clock period (from Table A.1). Also, ε_path must account for ε_char and ε_conv, as mentioned earlier.

• ε_vector - the percentage of chips failing validation tests when our vectors are applied.
Based on this estimated validation budget, the design is either allowed to go to high volume manufacturing (within the permissible value of ε) or redesign is performed and the whole validation procedure is repeated until the number of chips failing validation is small enough to indicate that mass production will be economically viable (profitable).

Analytically proving the completeness of a vector generation approach is a challenging problem and one of the major reasons behind the non-existence of any form of statistical ATPG [174]. Hence, in all existing statistical approaches [12][174], paths are statistically selected but vectors are deterministically generated in a timing-independent manner. Our variation-aware delay validation framework (Chapter 5) generates vectors using the available timing information (arrival time ranges at on-path and off-path inputs), and path sensitization conditions are modified (considering only the qualitative properties of the gate delay phenomena of [28][29]) for maximum delay invocation. Approximations in incorporating variability (ignoring the tails of gate delay distributions) may lead to certain inaccuracies in arrival time distributions and subsequently in vector generation. We are yet to devise a methodology to accurately estimate this inaccuracy, because vectors generated using wider distributions (bigger tails) would only be non-inferior, not necessarily superior.
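The population extrapolation of Section A.4 (m of M sampled chips failing, extended to the population via the Student's t distribution) can be sketched as follows. The abridged t-table and the Bernoulli standard-error choice are illustrative assumptions, not the dissertation's exact procedure:

```python
import math

# Two-sided 95% Student's t critical values by degrees of freedom
# (abridged standard table).
T95 = {4: 2.776, 9: 2.262, 19: 2.093, 29: 2.045, 49: 2.010}

def failure_fraction_ci(m, big_m):
    """95% confidence interval for the population failure fraction,
    from m failing chips out of a validation sample of M chips,
    using the Student's t distribution (Section A.4)."""
    p = m / big_m
    # Sample standard error of a Bernoulli proportion.
    se = math.sqrt(p * (1.0 - p) / big_m)
    t = T95[big_m - 1]
    return max(0.0, p - t * se), min(1.0, p + t * se)

# Illustrative: 3 of 20 validation chips exceed the clock period.
lo, hi = failure_fraction_ci(3, 20)
print(round(lo, 3), round(hi, 3))
```

The wide interval for such a small M illustrates why validation samples, unlike high-volume production lots, support only coarse estimates of ε_vector.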
Abstract
Despite advances in design and verification, it is becoming increasingly common for high-performance designs to misbehave on silicon. This is due to performance issues, such as timing bugs, which cause a significant percentage of fabricated chips to have delay failures that are first discovered only during post-silicon validation. Delay marginality is one such important variation-induced timing bug that is often missed by existing validation approaches, as it changes delay to produce errors in a significant fraction of fabricated chips, even in the absence of defects and even when the variations in the fabrication process are within the normally expected levels, i.e., even when there is no abnormal process drift. Thus, it becomes imperative to improve the quality of validation with respect to delay marginalities.

One obvious approach would have been to adapt and apply the existing delay testing approaches (where generated vectors are applied to a fabricated copy of the chip to measure delay) to generate vectors for delay validation, i.e., vectors that are guaranteed to expose all the delay marginalities. However, several recent case studies using fabricated chips show that existing path delay testing approaches generate vectors that fail to expose the delay marginalities on silicon. The primary reason for this anomaly can be attributed to the inability of existing approaches to account for the effects on delay of advanced delay phenomena, such as Multiple Input Switching (MIS), and of process variations. This necessitates the development of a variation-aware framework that can provide an accurate estimate of post-silicon delay.

The goal of this dissertation is to develop new models and methodologies to accurately estimate the delay of any given fabricated copy of a design. We identify the key strengths of validation and formulate our problem of discovering delay marginalities.
We also identify the major phenomenon affecting gate delay, namely Multiple Input Switching (MIS), and successfully develop new models (based on a new notion of bounding approximations) and methods that account for inaccuracies and variability at practical run-time complexities and are suitable for both pre- and post-silicon timing tasks. We show that, compared to existing delay models, our delay models are much more accurate and provide order-of-magnitude reductions in runtime. Subsequently, we adapt and significantly extend existing delay testing approaches to generate vectors for delay validation, i.e., vectors that are guaranteed to detect the worst-case delay for a significant fraction (within a user-specified validation budget) of fabricated chips. We show that our vectors, being MIS- and variation-aware, invoke much higher delay than vectors generated by existing vector generation approaches. Finally, we develop a variability-aware, divide and conquer based method for efficient post-silicon validation that gives a substantial reduction in test-application time and hence in test cost. Our experimental results demonstrate that we have developed the first variation-aware resilient framework for post-silicon delay validation.
Asset Metadata
Creator: Das, Prasanjeet (author)
Core Title: A variation aware resilient framework for post-silicon delay validation of high performance circuits
School: Viterbi School of Engineering
Degree: Doctor of Philosophy
Degree Program: Electrical Engineering
Publication Date: 02/14/2013
Defense Date: 01/10/2013
Publisher: University of Southern California (original), University of Southern California. Libraries (digital)
Tags: delay model, delay validation, marginality, multiple input switching, OAI-PMH Harvest, post-silicon, process variations
Language: English
Contributor: Electronically uploaded by the author (provenance)
Advisor: Gupta, Sandeep K. (committee chair), Golubchik, Leana (committee member), Pedram, Massoud (committee member)
Creator Email: prasan1400@gmail.com, prasanjd@usc.edu
Permanent Link (DOI): https://doi.org/10.25549/usctheses-c3-219698
Unique identifier: UC11293712
Identifier: usctheses-c3-219698 (legacy record id)
Legacy Identifier: etd-DasPrasanj-1435.pdf
Dmrecord: 219698
Document Type: Dissertation
Rights: Das, Prasanjeet
Type: texts
Source: University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions: The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the a...
Repository Name: University of Southern California Digital Library
Repository Location: USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA