Timing and power analysis of CMOS logic cells under noisy inputs
(USC Thesis Other)
TIMING AND POWER ANALYSIS OF CMOS LOGIC CELLS UNDER NOISY INPUTS

by

Hanif Fatemi

A Dissertation Presented to the
FACULTY OF THE GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the Requirements for the Degree
DOCTOR OF PHILOSOPHY
(ELECTRICAL ENGINEERING)

December 2007

Copyright 2007 Hanif Fatemi

Dedication

To my beloved wife “Soraya”, to my dear mother “Fatemeh” and to the memory of my father “Ali”, who will always be in my heart and my mind.

Acknowledgments

I would like to gratefully thank my advisor, Professor Massoud Pedram, for his exceptional advisement, continuous guidance, and inspiration during the past five years. I owe a great deal to him for all the valuable technical and non-technical lessons that he taught me.

I spent many hours thinking about how to thank my dear wife Soraya for her never-ending support and love. Words fall short in describing her kindness and understanding. I will always be gratefully indebted to her for all the sacrifices she made during the years that I was studying and working towards my PhD.

I would like to thank my dear mother for all the love, devotion and encouragement she has given me from the day I was born to this day. She has always been a constant and loving support in every difficult moment of my life. I would like to thank my dear grandmother, my dear sister, Maryam, and my beloved nephew, Mohammad, for all of their encouragement and support. I would also like to thank Kourosh Tavakoli for being a true friend.

The memory of my father has always been in my heart and mind. Knowing that he would be happy for my achievements has been one of the biggest motivations in my life.
Table of Contents

Dedication
Acknowledgments
List of Figures
List of Tables
Abstract
CHAPTER 1: INTRODUCTION
  1.1 Semiconductor Industry: A Brief Overview
  1.2 Integrated Circuit Design Flow
  1.3 Dissertation Contribution
  1.4 Motivation
  1.5 Outline of the Dissertation
CHAPTER 2: LOGIC CELL DELAY ANALYSIS UNDER NOISY INPUT WAVEFORMS
  2.1 Introduction
  2.2 Background on Current-based Cell Delay Modeling
  2.3 The Proposed Current-Based Cell Delay Model
  2.4 Experimental Results
  2.5 Summary
CHAPTER 3: A CURRENT-BASED METHOD FOR SHORT CIRCUIT POWER CALCULATION UNDER NOISY INPUT WAVEFORMS
  3.1 Introduction
  3.2 Short Circuit Energy Calculation Approaches
  3.3 CSPC: Current-based Short-circuit Power Calculator
  3.4 Experimental Results
  3.5 Summary
CHAPTER 4: A CURRENT SOURCE MODEL FOR CMOS LOGIC CELLS CONSIDERING MULTIPLE INPUT SWITCHING AND STACK EFFECT
  4.1 Introduction
  4.2 Stack Effect in Multiple Input Switching
  4.3 CS Modeling – Multiple Input Switching
    4.3.1 Modeling the Internal Nodes
    4.3.2 MSCSM: Pre-characterization
    4.3.3 Voltage Calculation
  4.4 Experimental Results
  4.5 Summary
CHAPTER 5: A CURRENT SOURCE MODEL FOR CMOS LATCHES AND FLIP-FLOPS
  5.1 Introduction
  5.2 Current Source Modeling for Latches
    5.2.1 Steady-state transparent mode (CLK=1)
    5.2.2 Steady-state hold mode (CLK=0)
    5.2.3 Transition mode (switching CLK)
  5.3 CSM for Sequential Cells: Pre-characterization
    5.3.1 Non-linear Voltage-Controlled Current Sources
    5.3.2 Non-linear Voltage-Dependent Capacitances
  5.4 Voltage Calculation Based on the CSM of Sequential Cells
    5.4.1 Computing V_Q(t_{k+1})
    5.4.2 Computing V_Q_bar(t_{k+1})
  5.5 Current Source Modeling for Master-Slave Flip-Flops
  5.6 Current Source Modeling for SR Latches
  5.7 Experimental Results
  5.8 Summary
CHAPTER 6: POWER OPTIMAL MTCMOS REPEATER INSERTION FOR GLOBAL BUSES
  6.1 Introduction
  6.2 Delay Model
  6.3 Power Dissipation Model
    6.3.1 Switching Power Dissipation
    6.3.2 Short-Circuit (SC) Power Dissipation
    6.3.3 Leakage Power Dissipation
    6.3.4 Average Power Dissipation
  6.4 Power Optimization for MTCMOS Design
    6.4.1 Power and Delay Modeling
    6.4.2 Sleep Signal Delivery Circuitry
    6.4.3 Problem Formulation
  6.5 Experimental Results
  6.6 Summary
CHAPTER 7: CONCLUSIONS
  7.1 Summary of Contributions and Applications
  7.2 Possible Extensions
REFERENCES

List of Figures

Figure 1-1. Standard IC Design Flow
Figure 2-1. Pessimism of a voltage-based (Elmore-based) method in delay calculation
Figure 2-2. Our proposed current-based circuit model of a logic cell
Figure 2-3. The actual waveforms and the equivalent waveforms produced by our model for some crosstalk-induced noisy waveforms
Figure 2-4. Comparison between our model and HSPICE for a minimum-size inverter in (a) and (b) and a minimum-size NAND2 in (c), given single (a), double (b) and triple (c) aggressor crosstalk-induced noisy waveforms
Figure 2-5. (a) Absolute errors in calculated delays vs. SPICE for an inverter of size x; (b) delay accuracy improvement by our model over the KTV model [35]
Figure 2-6. (a) Absolute errors in calculated delays vs. SPICE for an AOI22 of size 10x; (b) delay accuracy improvement
Figure 2-7. Waveform similarity (mean square error) comparison to HSPICE for our model and the KTV model for (a) an inverter and (b) an AOI22
Figure 3-1. Short circuit current measurement during cell pre-characterization
Figure 3-2. Comparison between CSPC and HSPICE for a minimum-size inverter in (a) and (b) and a minimum-size NAND2 in (c), given single (a), double (b) and triple (c) aggressor crosstalk-induced noisy waveforms
Figure 3-3. Energy dissipation for different noisy inputs
Figure 3-4. Absolute short circuit energy calculation error vs. HSPICE for an AOI22 of size 10x under noisy waveforms
Figure 3-5. HSPICE and CSPC waveforms for the example of a glitch
Figure 3-6. Short circuit energy calculation errors of CSPC vs. HSPICE for an AOI22 of size 10x under glitches
Figure 4-1. (a) Transistor-level diagram of a NOR logic cell; (b) two identical NOR logic cells. The first input transitions of the logic cells are different, while the second ones are identical
Figure 4-2. The voltage of internal nodes
Figure 4-3. Change in the output voltage as a result of the stack effect (only the second transition is shown). The difference in delay of the two NOR2 cells is 23.2%
Figure 4-4. A simple MIS for a NAND2 logic cell without internal nodes
Figure 4-5. Internal node modeling
Figure 4-6. Complete model for MSCSM
Figure 4-7. MSCSM waveforms compared to HSPICE simulations for the fast and slow cases shown in Figure 4-1
Figure 4-8. Using MSCSM to accurately model glitches
Figure 4-9. The internal node voltage waveform from HSPICE and from the proposed model for the glitch example shown in Figure 4-8
Figure 4-10. Effect of multiple input switching: inputs A and B (In1 and In2) are changing simultaneously, and input A is coupled with an aggressor
Figure 4-11. Delay error vs. noise injection time
Figure 5-1. A positive level-sensitive CMOS latch
Figure 5-2. (a) A pass transistor, (b) its 4-D CSM, (c) its 3-D CSM (for node G; the same model is used w.r.t. G_bar), (d) the decoupled version of the 3-D CSM
Figure 5-3. (a) Latch of Figure 5-1 in transparent mode, (b) its CSM
Figure 5-4. (a) Latch of Figure 5-1 in hold mode, (b) its CSM
Figure 5-5. CSM for the CMOS latch in Figure 5-1
Figure 5-6. I_Q(V_Q, V_Q_bar, V_CLK) and I_Q_bar(V_Q, V_Q_bar) characterization
Figure 5-7. A positive edge-triggered flip-flop
Figure 5-8. (a) NAND-based SR latch, (b) CSM for the SR latch
Figure 5-9. Crosstalk-induced noise at the D input
Figure 6-1. Buffer model
Figure 6-2. One stage of repeaters with interconnect model
Figure 6-3. The model for one stage of two adjacent coupled bus lines
Figure 6-4. Sharing of sleep transistors among different bus lines
Figure 6-5. Using asymmetric inverters in the sleep signal delivery circuitry

List of Tables

Table 1. Runtime and error comparison between CSPC and HSPICE
Table 2. Waveform similarity (normalized RMSE) comparison with HSPICE
Table 3. Waveform similarity (normalized RMSE) comparison with HSPICE for different cells in different technologies
Table 4. Probability of different switching scenarios on the coupling capacitances
Table 5. Technology parameters used in the simulation setup
Table 6. Power consumption results for different designs and activity mode factors χ. Frequency = 1GHz
Table 7. Power consumption results for different delay penalties. Frequency = 1GHz, L = 10mm, χ = 10%
Table 8. Design parameters for the optimized MTCMOS design. Frequency = 1GHz, L = 10mm, χ = 10%
Table 9. Comparing the proposed technique with a two-step approach to designing MTCMOS repeaters

Abstract

This dissertation investigates the effect of capacitive crosstalk on the behavior of CMOS cells and presents a new cell modeling technique for the purpose of noise, delay and power analysis. In particular, a current-based logic cell model for cell timing analysis in the presence of crosstalk-induced noisy inputs is introduced. This model enables accurate calculation of the electrical waveform at the cell output under noise-affected input waveforms of arbitrary shape. This current source (CS) model is subsequently extended to handle multiple input switching (MIS) while considering the effect of the internal node voltages of the transistor stacks in the cell (a.k.a. the stack effect). An application of the proposed CS model to short-circuit power analysis is presented. In addition, a CS model for CMOS register cells, i.e., latches and master-slave flip-flops, is described. Experimental results for the proposed CS models demonstrate close-to-SPICE accuracy with up to three orders of magnitude speedup compared to HSPICE. The scope of this dissertation is not limited to delay and power analyses: it also investigates the problem of power-optimal repeater insertion for global buses in the presence of crosstalk noise and subject to delay constraints.
CHAPTER 1: INTRODUCTION

The continuous quest for high-performance and low-power circuits has driven the technology scaling of ICs, which has become the driving force behind the prosperity of the semiconductor industry. The first semiconductor chips held one transistor each. Subsequent advances added more and more transistors and, as a consequence, more and increasingly complex functions were integrated on a chip. The exponential growth of the semiconductor industry over the past few decades has revolutionized every facet of our lives. All this has been made possible by aggressive scaling of the metal-oxide-semiconductor (MOS) transistor feature size with every technology generation.

1.1 Semiconductor Industry: A Brief Overview

In 1965, Gordon Moore observed that plotting the number of transistors that can be most economically manufactured on a chip gives a straight line on a semi-logarithmic scale [65]. At the time, he found that the transistor count was doubling roughly every year (a rate he revised in 1975 to every two years; the widely quoted 18-month figure came later). This observation has been called Moore’s law and has become a self-fulfilling prophecy. A corollary of Moore’s law is that transistors become faster, consume less power, and are cheaper to manufacture each year. These improvements have accelerated in recent years. In 1971, the Intel 4004 microprocessor used transistors with a minimum dimension of 10µm. Thirty-two years later, in 2003, the Intel Pentium 4 used transistors with a minimum dimension of 130nm, an improvement of nearly two orders of magnitude (10µm / 130nm ≈ 77) over three decades. This physical scaling cannot continue indefinitely as device dimensions approach the size of individual atoms. However, many predictions of the limits to scaling have already proven to be wrong. Each new technology generation has led to an exponential increase in on-chip device density, faster chips and a reduced cost per usable transistor.
In state-of-the-art (sub-100nm) technologies, there are about half a billion CMOS devices on a chip. Integrated circuits with such a large number of CMOS devices are referred to as VLSI (Very Large Scale Integration) circuits. Technology scaling in the sub-100nm regime has resulted in significant IC fabrication challenges, because manufacturing variations and tolerances in the process parameters do not scale proportionally with the device and wire sizes. As VLSI transistor counts have grown exponentially, designers have come to rely on an increasing level of automation, in the form of computer-aided design (CAD) tools, to improve productivity and keep up with the growing demands of the semiconductor industry. In the following subsection, we review the standard circuit design methodology followed in industry and the important role played by CAD tools in the design flow.

1.2 Integrated Circuit Design Flow

A design flow is a set of optimizations and tool invocations that allows a designer to progress from the specification of a design to the final chip implementation. Due to the high complexity of ICs and the reduced time to market, CAD tools have become an indispensable part of every step in the design process, from the requirements and specification of a design, through logic synthesis and place and route, to final fabrication. Each step in the design flow involves sophisticated CAD tools that have evolved over the past few decades. Figure 1-1 shows a simplified flowchart of the typical sequence of steps practiced in the semiconductor industry for the design of an Application-Specific IC (ASIC).
[Figure 1-1. Standard IC Design Flow. Blocks: Design Entry (RTL description; timing, power and area constraints), Synthesis, Floorplanning, Timing/SI/Power Analysis and Optimization (static timing analysis, signal integrity analysis, power analysis, design for testability), Place & Route (Layout), Sign off, Fabrication; failing checks ("Not OK") loop back to earlier steps.]

The starting point of an integrated circuit design is the design entry, also known as the design specification. The design specification describes the external interface of the design, its inputs and outputs, as well as the functional and temporal relationships between the inputs and the outputs. The designer uses a Hardware Description Language (HDL), and possibly behavioral/architectural synthesis, to transform this design specification into a register-transfer level (RTL) description. The next step is to synthesize the RTL description into generic gates and registers and then to optimize the logic to improve speed, power dissipation and area. Obtaining area, power and delay estimates of the resulting circuit requires a prediction of the layout of the design, which specifies the location of the logic gates on the fabricated chip. Floorplanning is the first step in the design flow in which an approximation of the design layout is generated.
Rather than placing a design flat (all cells at the same level of hierarchy), modules are clustered into different areas, chosen according to the communication needs of these modules.

Timing analysis verifies whether the delay of the synthesized circuit (from the previous design flow steps) meets the design requirements. Timing can be verified using a circuit simulator such as HSPICE, which accounts for various physical effects and interactions between different circuit components and accurately analyzes circuit delay and power. However, as the circuit size increases, the time taken by SPICE to perform even one simulation can be very long. Static Timing Analysis (STA), on the other hand, can evaluate the delay of timing paths efficiently. STA is a method used to compute the nominal timing of a circuit without performing extensive circuit simulation with HSPICE. The term static in STA refers to the fact that the timing analysis is done independently of the logic values at the inputs. STA relies on simplified delay models for the gates and interconnects to compute the circuit delay. Because of its accuracy and efficiency, STA has become indispensable for verifying the timing of synchronous digital integrated circuits [36]. The inputs to the timing analyzer at this step of the design flow are derived from the basic timing of the library gates (intrinsic gate delays) and from routing loads, which can be estimated or derived from the floorplanning and global routing data. STA checks both max and min delay. Max-delay analysis checks whether all flip-flops meet their setup time requirement; the setup time is the time for which the data must be stable before the clock edge. Min-delay analysis is used for the hold time check: a hold time violation occurs if the data at the input of a flip-flop does not remain stable for the required time after the clock edge has arrived.
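For reference, the two checks can be written in their common textbook form. This is an illustration, not notation taken from this dissertation, and it assumes that the launching and capturing flip-flops share an ideal clock (no skew or jitter); t_cq is the clock-to-Q delay, t_comb the combinational path delay, t_setup and t_hold the flip-flop setup and hold times, and T_clk the clock period:

```latex
\begin{aligned}
\text{setup (max-delay) check:}\quad & t_{cq,\max} + t_{comb,\max} + t_{setup} \le T_{clk} \\
\text{hold (min-delay) check:}\quad  & t_{cq,\min} + t_{comb,\min} \ge t_{hold}
\end{aligned}
```

The first inequality bounds the longest combinational path for a given clock frequency; the second bounds the shortest path independently of the clock period.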
Based on the required clock frequency, these two checks impose constraints on the delay of the combinational block. STA is used to check whether the delay of the combinational circuit satisfies these constraints, so that the sequential circuit works correctly [36][57]. To estimate the delay of a circuit using STA, the circuit is transformed into a circuit graph in which each node corresponds to a gate and each edge corresponds to a connection between gates. The delay of the longest path in the circuit determines the circuit delay. Algorithms for finding the longest path in a graph can be used to identify the longest path in the circuit and thus the circuit delay. The result of STA is a set of critical nodes and paths that do not satisfy the timing constraints. The designer can then optimize these critical gates and paths to reduce their delay and meet the timing constraints [13][57][64][65]. Among industrial STA CAD tools, PrimeTime from Synopsys is commonly used in different steps of the design flow, most notably for sign-off.

Signal integrity (SI) is assured by electronic circuit tools and techniques that ensure electrical signals are of sufficient quality for proper operation. Signal integrity analysis ensures that the delay of the metal wires of a synthesized circuit meets the specified delay constraints in the presence of noise sources. SI analysis also ensures that the noise remains within an acceptable range. Signal integrity tools attempt to identify and remove effects that cause a design to malfunction. The noise arises from switching on neighboring metal wires, which couples into a victim wire through the capacitance between them. In CMOS technologies, this coupling is primarily capacitive, but in general noise may also arise from mutual inductance, substrate coupling, non-ideal gate operation, and other sources.
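The longest-path computation described above can be sketched as follows. This is a generic textbook formulation, not the algorithm of any particular tool: each gate (graph node) carries a fixed, hypothetical delay, primary inputs arrive at time 0, and the latest arrival at each gate output is computed in topological order (Kahn's algorithm), which takes time linear in the size of the graph. Slews, rise/fall differences and interconnect delay are ignored, and the function and gate names are illustrative.

```python
from collections import defaultdict, deque


def longest_path(gate_delay, edges):
    """Return (circuit_delay, critical_path) for a combinational circuit graph.

    gate_delay: dict mapping gate name -> delay (hypothetical fixed values)
    edges:      list of (driver, load) pairs, one per gate-to-gate connection
    """
    fanin, fanout = defaultdict(list), defaultdict(list)
    indegree = {g: 0 for g in gate_delay}
    for u, v in edges:
        fanout[u].append(v)
        fanin[v].append(u)
        indegree[v] += 1

    # Kahn's algorithm: visit gates in topological order.
    ready = deque(g for g, d in indegree.items() if d == 0)
    order = []
    while ready:
        g = ready.popleft()
        order.append(g)
        for h in fanout[g]:
            indegree[h] -= 1
            if indegree[h] == 0:
                ready.append(h)
    if len(order) != len(gate_delay):
        raise ValueError("graph has a cycle; STA expects a combinational DAG")

    # Latest arrival at each gate output = latest fanin arrival + gate delay.
    arrival, pred = {}, {}
    for g in order:
        worst = max(fanin[g], key=arrival.get, default=None)
        arrival[g] = (arrival[worst] if worst is not None else 0.0) + gate_delay[g]
        pred[g] = worst

    # Trace the critical path back from the latest-arriving gate output.
    end = max(arrival, key=arrival.get)
    path = [end]
    while pred[path[-1]] is not None:
        path.append(pred[path[-1]])
    return arrival[end], path[::-1]


delays = {"g1": 1.0, "g2": 2.0, "g3": 1.5, "g4": 1.0}
wires = [("g1", "g3"), ("g2", "g3"), ("g3", "g4"), ("g1", "g4")]
print(longest_path(delays, wires))  # (4.5, ['g2', 'g3', 'g4'])
```

Replacing `max` with `min` in the arrival computation gives the earliest arrival times that a min-delay (hold) check would use.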
As technology scales down, wires are made relatively taller (to reduce the sheet resistance of the metal lines) and are laid out closer to each other. As a result, the coupling capacitance between adjacent lines increases and coupling noise becomes a major concern, which requires sophisticated tools and techniques to address. Signal integrity analysis also ensures that the voltage drop on the power network and the ground bounce on the ground network are within a reasonable range. Power supply noise includes IR drop and L·di/dt drop. These power and ground voltage fluctuations can degrade the performance of VLSI circuits. In general, induced noise can have drastic consequences for digital designs: it can make a design work incorrectly in some cases, or even fail completely, and it can make the design slower than expected. The cost of such a failure is very high, and includes photo-mask costs, engineering costs and the opportunity cost of a delayed product introduction. Therefore, electronic design automation tools have been developed to analyze, prevent, and correct these problems. PTSI (PrimeTime Signal Integrity) from Synopsys and Celtic from Cadence are two of the most commonly used SI analysis tools [57][13][64]. However, there is an ongoing research and development effort in the CAD community to improve noise and signal integrity analysis tools.

Power analysis is an important step in designing VLSI circuits. Power consumption depends on the switching activity of the gates. Power analysis can be performed for a particular set of test vectors by running a simulator and evaluating the total capacitance switched in the circuit at each clock transition. At each step of the design flow, if the power consumption is too high, the designer will typically have to revisit a prior design step to resolve the issue. Among commercial power analysis tools, PrimePower and PowerMill from Synopsys are commonly used.
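The vector-based estimate described above can be sketched as follows. This is a generic zero-delay estimate, equivalent to summing α·C·V_DD²·f over the nodes, assuming that some logic simulator has already produced one logic value per node per clock cycle; glitches, short-circuit power and leakage are ignored, and the function name, node names and capacitance values are hypothetical.

```python
def switching_power(node_cap, waveforms, vdd, f_clk):
    """Average switching power (W) from per-cycle logic values of each node.

    node_cap:  dict node -> switched capacitance in farads
    waveforms: dict node -> list of 0/1 values, one sample per clock cycle
    vdd:       supply voltage (V); f_clk: clock frequency (Hz)
    """
    lengths = {len(values) for values in waveforms.values()}
    if len(lengths) != 1 or min(lengths) < 2:
        raise ValueError("need equal-length waveforms with at least 2 samples")
    n_cycles = lengths.pop() - 1

    energy = 0.0
    for node, values in waveforms.items():
        # Each 0->1 transition draws C*Vdd^2 from the supply: half is dissipated
        # in the pull-up while charging, half later in the pull-down.
        rises = sum(1 for a, b in zip(values, values[1:]) if (a, b) == (0, 1))
        energy += rises * node_cap[node] * vdd ** 2

    # Average power = total supply energy / simulated time (n_cycles periods).
    return energy * f_clk / n_cycles


caps = {"n1": 10e-15, "n2": 5e-15}                  # hypothetical node loads
samples = {"n1": [0, 1, 0, 1, 0], "n2": [1, 1, 0, 0, 1]}
p_avg = switching_power(caps, samples, vdd=1.0, f_clk=1e9)
print(f"{p_avg * 1e6:.2f} uW")  # 6.25 uW
```

Counting only rising transitions keeps the accounting per supply draw; counting all transitions and using ½·C·V_DD² per transition gives the same result.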
Timing, signal integrity and power analysis are often part of an iterative process that ensures that the synthesized circuit meets the specified design constraints. If the constraints are not met, typically the synthesis process is repeated or the layout is modified to meet them [28][41][42][65].

Place and route is the step in IC design during which the layout of a larger block of the circuit is created from the layouts of smaller blocks. The layout gives detailed information about the location of the gates of the synthesized circuit. Usually, the design flow requires an iterative approach to finalize the layout: signal integrity, timing and power analysis are revisited using the detailed information obtained by RC extraction from the layout, to make sure that the design meets the power and timing constraints. When the constraints are met, the layout data can be sent for fabrication. Astro from Synopsys is among the commercial tools for this step of the design flow [21].

Semiconductor device fabrication is the process that creates IC chips. It involves multiple sequences of photographic and chemical processing steps in which the electronic devices are created on a wafer made of a semiconductor material. Silicon is the most common material in today's fabrication processes.

As mentioned above, CAD tools are the fundamental means for the different steps of the IC design flow. There has been a continuous effort in both academia and the electronic design automation (EDA) industry to improve the quality of CAD tools in terms of performance, capacity (memory requirements) and accuracy of results. New techniques and tools are being developed to address aggressive technology scaling. This dissertation investigates the effect of capacitive crosstalk on the behavior of CMOS cells and presents a new cell modeling technique for the purpose of noise, delay and power analysis.
In the following chapters, we address the shortcomings of current timing, noise and power analysis tools in the presence of noisy waveforms and propose a new current-based cell modeling approach to overcome them. The cell modeling proposed in this dissertation can be used in timing and signal integrity as well as power analysis CAD tools.

1.3 Dissertation Contribution

This dissertation introduces a current-based cell modeling approach for the purpose of cell delay, noise and power analysis. The non-linear behavior and parasitic effects inside a logic cell are pre-characterized to overcome the shortcomings of previous techniques in modeling such effects. The high accuracy of this model in constructing the output voltage waveform from a noisy input enables a number of important applications. Application of the proposed computational engine to power analysis will be discussed in the form of a current-based short circuit power calculator (CSPC), which can construct the actual shape of the short circuit current waveform for any type of input voltage waveform, including glitches. The accuracy improvement of the proposed CS model over existing approaches is significant. We will also show how to extend this model to handle multiple input switching (MIS), considering the effect of internal node voltages (the stack effect). In addition, a current source model for some CMOS latches and flip-flops will be presented. Finally, we will address the problem of power-optimal repeater insertion for global buses in the presence of crosstalk noise. More precisely, an MTCMOS design technique is employed whereby high-Vth sleep transistors are inserted between the virtual supply and the actual supply to reduce leakage power consumption in the idle mode of circuit operation. Next, the repeater sizes, repeater distances, and the sizes of the sleep transistors are concurrently calculated so as to minimize the total power dissipation.
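The central mechanism of a current-based cell model, computing the output waveform from a pre-characterized current table, can be sketched as follows. This is a deliberately simplified single-input illustration, not the dissertation's model: the output node is assumed to obey c_total · dV_out/dt = I(V_in, V_out); the table is filled from a hypothetical square-law inverter (`toy_inverter_current`, with assumed k and V_t) rather than from SPICE characterization; Miller and voltage-dependent capacitances are lumped into one constant `c_total`; and forward Euler is used only for brevity. Because the input is an arbitrary sampled waveform rather than a ramp, a noisy input can be fed in directly.

```python
import numpy as np

VDD = 1.0                            # supply voltage (V), assumed
GRID = np.linspace(0.0, VDD, 21)     # hypothetical characterization grid (V)
STEP = GRID[1] - GRID[0]


def toy_inverter_current(vin, vout, k=1e-4, vt=0.3):
    """Hypothetical square-law inverter: net current (A) into the output node.

    Stands in for values that would be pre-characterized with SPICE.
    """
    def drain_current(vov, vds):
        vov = max(vov, 0.0)
        vds = min(vds, vov)          # saturation: Vds clamped at the overdrive
        return k * (vov * vds - 0.5 * vds ** 2)

    i_pull_up = drain_current(VDD - vin - vt, VDD - vout)   # PMOS charges output
    i_pull_down = drain_current(vin - vt, vout)             # NMOS discharges it
    return i_pull_up - i_pull_down


# Pre-characterized table: I_TABLE[i, j] = I(GRID[i], GRID[j]).
I_TABLE = np.array([[toy_inverter_current(vi, vo) for vo in GRID] for vi in GRID])


def lookup_current(vin, vout):
    """Bilinear interpolation of I_TABLE at (vin, vout)."""
    x = float(np.clip(vin, 0.0, VDD)) / STEP
    y = float(np.clip(vout, 0.0, VDD)) / STEP
    i0 = min(int(x), len(GRID) - 2)
    j0 = min(int(y), len(GRID) - 2)
    fx, fy = x - i0, y - j0
    return ((1 - fx) * (1 - fy) * I_TABLE[i0, j0]
            + fx * (1 - fy) * I_TABLE[i0 + 1, j0]
            + (1 - fx) * fy * I_TABLE[i0, j0 + 1]
            + fx * fy * I_TABLE[i0 + 1, j0 + 1])


def simulate_output(vin_wave, dt, c_total, vout0):
    """Forward-Euler solution of c_total * dVout/dt = I(Vin, Vout)."""
    vout = [vout0]
    for vin in vin_wave[:-1]:
        i_out = lookup_current(vin, vout[-1])
        vout.append(float(np.clip(vout[-1] + dt * i_out / c_total, 0.0, VDD)))
    return np.array(vout)


dt = 1e-12
t = np.arange(0.0, 1e-9, dt)
vin = np.clip(t / 50e-12, 0.0, VDD)           # clean rising input ramp
vout = simulate_output(vin, dt, c_total=10e-15, vout0=VDD)
t50 = t[np.argmax(vout <= 0.5 * VDD)]         # first sample at or below 50%
print(f"output 50% crossing at {t50 * 1e12:.0f} ps")
```

Replacing `vin` with a crosstalk-distorted waveform changes the computed output directly, which a delay table indexed only by input slew cannot capture. A practical flow would use a characterized multi-dimensional table and a more robust integrator.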
In the problem formulation, the effect of crosstalk coupling capacitance on propagation delay and on (switching and short circuit) power dissipation is considered.

1.4 Motivation

The shape of the voltage waveforms should be considered in order to achieve accurate timing and noise analysis results in sub-90nm CMOS designs. Conventional analysis tools model the voltage waveform of a circuit node with a single reference point, i.e., a signal arrival time and a constant rise/fall slope (transition time). This implies that a waveform subjected to crosstalk noise is essentially modeled by a saturated ramp. The key fact about the shape of a voltage waveform is that different waveforms with identical arrival times and slews applied to the input of a logic cell (or at the near end of an interconnect line) can result in very different propagation delays through that cell (or at the far end of the line). Therefore, as signal integrity problems such as crosstalk-induced slowdown or glitches become major concerns of circuit designers in the sub-90nm regime, the use of a saturated ramp to convey signal waveform information starts to seriously impact the quality and robustness of timing and noise analysis tools. A fast method to analyze the effect of crosstalk noise on the output voltage waveform, and consequently on the cell delay, is therefore very desirable. The high accuracy of CS models makes them attractive for use inside a signoff timing analysis tool. Once a set of critical paths is identified by a standard static timing analysis tool, CS models of the logic cells along a target critical path may be utilized to provide an accurate, yet highly efficient, evaluation of the timing criticality and/or noise susceptibility of the path in question. Close-to-SPICE accuracy at orders of magnitude higher speed than SPICE makes CSM-based analysis very attractive.
The scope of this thesis is not limited to delay and power analyses in the presence of crosstalk. The effect of crosstalk noise on delay and power should also be considered in the context of delay-constrained power-optimization problems. One such problem is power-optimal repeater insertion for global buses in the presence of crosstalk noise.

1.5 Outline of the Dissertation

This dissertation consists of eight chapters. This chapter presented the introduction and the motivation of this research. Chapter 2 explains logic cell delay analysis methods under noisy input waveforms; our current-based logic cell model is discussed, and experimental results supporting the accuracy of the proposed method are presented. The high accuracy of the model proposed in chapter 2 leads to several applications, which are presented in the following chapters. In chapter 3, a short circuit energy calculator based on our logic cell model is proposed. In chapter 4, a current source model for multiple input switching is presented, which is capable of considering the effect of internal node voltage values (stack effect) on the cell output voltage waveform and hence on the cell delay. Chapter 5 presents a current source model for CMOS register cells. The feedback loops present in register cells are accurately modeled, which in turn enables us to correctly capture the timing behavior of CMOS register cells. In chapter 6, we address the problem of power-optimal repeater insertion for global buses in the presence of crosstalk noise, using an MTCMOS technique in which high-Vth sleep transistors are inserted to reduce the leakage power consumption in the idle mode. Our summary and conclusions are presented in chapter 7, and finally the references are provided in chapter 8.
CHAPTER 2: LOGIC CELL DELAY ANALYSIS UNDER NOISY INPUT WAVEFORMS

The drastic downscaling of layout geometries to 90nm and below has resulted in a significant increase in the packing density and the operating frequency of VLSI circuits. An unfortunate side effect of this technology advancement has been the aggravation of noise effects, such as capacitive crosstalk noise, in VLSI circuits. This is mainly because metal wires have become narrower and thicker (and in fact longer in the case of global interconnects) and are laid out closer to one another, which in turn increases the capacitive coupling noise. Furthermore, IC manufacturing process variations, device/interconnect aging phenomena, and dynamic circuit parameter changes (such as power plane fluctuations and temperature gradients in the substrate) give rise to a rather significant deviation of the electrical parameters of the circuit components from their designed (nominal) values. These effects can produce excessive timing uncertainty, which in turn requires sophisticated crosstalk-aware delay analysis techniques and tools. Conventional logic cell delay modeling approaches are not able to accomplish this task. In this chapter we introduce a new current-based model that can correctly calculate the output voltage waveform of a logic cell under a noisy input waveform.

2.1 Introduction

Conventional voltage-based cell delay modeling approaches are not accurate, mainly because they calculate the propagation delay and output transition time of a CMOS logic cell subjected to a noisy input waveform by approximating this noisy waveform with a saturated ramp signal and then using cell library delay lookup tables, indexed by the input transition time and the output load, to report the output timing information.
Modeling the input waveform as a saturated ramp may, however, result in significant errors in the timing parameters of interest, because the actual output waveform can be very different from the one implied by a simple saturated ramp input. The key fact about the shape of a voltage waveform is that different waveforms with identical arrival times and slews applied to the input of a logic cell (or at the near end of an interconnect line) can result in very different propagation delays through that cell (or at the far end of the line). Therefore, as signal integrity problems such as crosstalk-induced slowdown or glitches become major concerns of circuit designers, the use of a saturated ramp to convey signal waveform information starts to seriously impact the quality and robustness of timing and noise analysis tools. Conventional voltage-based cell delay modeling approaches generally use 2-D lookup tables with the input slew and output load as the keys and the output slew and gate delay as the table outputs. The resulting pre-characterized lookup tables are therefore inherently incompatible with arbitrary waveform shapes and hence fall apart in processing noisy inputs.

Figure 2-1. Pessimism of a voltage-based (Elmore-based) method in delay calculation (voltage vs. time: noisy input v_in, Elmore-based equivalent ramp Γ_eff, Elmore-based output, and HSPICE output).

The goal of cell timing analysis is conventionally stated as: Given a noisy waveform at the input of a cell, find an equivalent input voltage waveform that, when applied to the cell, generates an output waveform that is as close as possible to the actual output waveform in terms of its arrival time and slew. As silicon technology is driven to nanometer dimensions, conventional voltage-based lookup tables are nearing the end of their useful life. In [30] and [47] the common voltage-based cell timing analyzers are reviewed and their shortcomings are highlighted.
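A minimal sketch of the 2-D table lookup that such conventional voltage-based models perform is shown below, assuming bilinear interpolation between grid points (cells at the table edges are extrapolated linearly). The function names, axes and delay values are hypothetical and are not taken from any particular cell library.

```python
from bisect import bisect_right


def _bracket(axis, x):
    """Index i with axis[i] <= x <= axis[i + 1]; clamped so edge cells extrapolate linearly."""
    i = bisect_right(axis, x) - 1
    return max(0, min(i, len(axis) - 2))


def lookup_2d(slews, loads, table, slew, load):
    """Bilinear interpolation in table[slew_index][load_index] (e.g., a delay or output-slew table)."""
    i, j = _bracket(slews, slew), _bracket(loads, load)
    u = (slew - slews[i]) / (slews[i + 1] - slews[i])
    v = (load - loads[j]) / (loads[j + 1] - loads[j])
    return ((1 - u) * (1 - v) * table[i][j] + u * (1 - v) * table[i + 1][j]
            + (1 - u) * v * table[i][j + 1] + u * v * table[i + 1][j + 1])


# Hypothetical delay table (seconds): rows = input slew, columns = output load.
SLEWS = [50e-12, 100e-12, 200e-12]
LOADS = [1e-15, 4e-15, 16e-15]
DELAY = [[20e-12, 30e-12, 60e-12],
         [25e-12, 35e-12, 65e-12],
         [35e-12, 45e-12, 75e-12]]

# A noisy input must first be collapsed to one slew value before this lookup,
# which is exactly where its shape information is lost.
print(lookup_2d(SLEWS, LOADS, DELAY, 75e-12, 4e-15))
```

Whatever the shape of the input waveform, only the single (slew, load) pair reaches the table, so two inputs with equal slews always receive the same delay.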
As an example, Figure 2-1 shows the pessimism in the delay calculation of one such voltage-based technique reviewed in [30] and [47]. This technique, known as the Elmore-based method, passes an equivalent saturated ramp, Γ_eff, through the latest 0.5Vdd crossing point of the noisy voltage waveform. The slope is then selected such that the area enclosed by that line and the straight lines v1(t) = 0.5×Vdd and v2(t) = Vdd is equal to the area enclosed by the noisy input and lines v1 and v2. Note that this method was originally developed to handle noisy inputs. However, in the case of multiple 0.5Vdd crossing points, which are common in the presence of crosstalk noise, this method may lead to an erroneous output waveform and hence erroneous delay results. To consider the actual shape of the waveform more effectively, the problem is restated more generally as follows: Given a noisy voltage waveform at the input of a cell, determine the output voltage waveform that has the minimum error with respect to the actual output waveform shape. Current-based logic cell timing analysis has been shown to be more accurate than voltage-based analysis [20][35]. In fact, some industrial current-based timing analyzers, such as CCSM and ECSM, are already in use [14]. In this chapter we present our current-based model, which utilizes pre-characterized tables to capture both the parasitic and the nonlinear behavior of the logic cell. We compare our model with both SPICE and the most accurate current-based approach in the literature. The high accuracy of our model in output voltage waveform calculation leads to several applications; short circuit energy calculation and multiple input switching analysis of logic cells under noisy input waveforms are explained in the next two chapters. The remainder of this chapter is organized as follows. In section 2.2 previous current-based models are discussed. Section 2.3 describes our CSM logic cell delay analysis.
Sections 2.4 and 2.5 present the experimental results and conclusions, respectively.

2.2 Background on Current-based Cell Delay Modeling

Current-based cell timing analyzers generally base their delay calculations on the amount of current flowing into or out of a cell. Current-based modeling is a physical model patterned after the actual construction of transistors. It improves delay calculation accuracy by modeling a cell's output drive as a current source rather than a voltage source; current sources are more effective at tracking nonlinear transistor switching behavior. Current-based logic cell delay modeling has been introduced as an alternative to cope with the shortcomings of the conventional approaches. The authors of [20] propose a current-based model, Blade, in which a pre-characterized current source is utilized to capture the nonlinear behavior of the logic cell with respect to the input and output voltage values. First, I_out(V_in, V_out), the amount of current sourced by a cell in response to DC voltage levels on the input and output pins of interest, is determined, and a lookup table (denoted the cell I-V table) is created for each cell by sweeping the DC values of the input and output voltages and measuring the current sourced by the cell output pin. However, a response derived exclusively from the DC-based I-V table results in an overly optimistic timing analysis, because the DC sweep of the input and output ignores the effects of parasitic elements. A calibration procedure is therefore performed to account for the cell parasitic effects. This procedure determines an internal capacitive load which, when applied to the Blade model, results in a transient waveform that matches the shape of a SPICE-generated waveform for the cell under identical conditions. Once the waveform shapes have been matched, a time shift is calculated from the time difference between the 50% points of the SPICE output and the calibrated Blade output.
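The DC sweep that produces such a cell I-V table can be sketched as follows. In a real flow each entry comes from a SPICE DC analysis; here, purely as a hypothetical stand-in, a long-channel square-law inverter with made-up parameters plays the role of the simulator, so the numbers carry no meaning for any real technology. Because the table is built from DC operating points only, it holds no parasitic information, which is why the calibration step described above is needed.

```python
def square_law_ids(v_gs, v_ds, v_t, k):
    """Drain current of a long-channel square-law MOSFET (hypothetical SPICE stand-in)."""
    if v_gs <= v_t:
        return 0.0                                  # cut-off; leakage ignored
    v_ov = v_gs - v_t
    if v_ds < v_ov:
        return k * (v_ov * v_ds - 0.5 * v_ds ** 2)  # triode (linear) region
    return 0.5 * k * v_ov ** 2                      # saturation region


def inverter_iout(v_in, v_out, vdd=1.2, v_tn=0.35, v_tp=0.35, k_n=200e-6, k_p=100e-6):
    """DC current sourced by the inverter output pin: PMOS pull-up minus NMOS pull-down."""
    i_p = square_law_ids(vdd - v_in, vdd - v_out, v_tp, k_p)
    i_n = square_law_ids(v_in, v_out, v_tn, k_n)
    return i_p - i_n


def build_iv_table(n_points=13, vdd=1.2):
    """Sweep DC V_in and V_out over [0, vdd]; store I_out(V_in, V_out) as table[in_idx][out_idx]."""
    grid = [vdd * m / (n_points - 1) for m in range(n_points)]
    table = [[inverter_iout(v_in, v_out, vdd=vdd) for v_out in grid] for v_in in grid]
    return grid, table


grid, table = build_iv_table()
print(f"I_out(0 V, 0 V) = {table[0][0]:.3e} A")    # pull-up on, output low: sources current
print(f"I_out(Vdd, Vdd) = {table[-1][-1]:.3e} A")  # pull-down on, output high: sinks current
```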
The parasitic effects are not modeled accurately in [20]; in particular, the Miller and input parasitic effects are ignored. Keller et al. [35] present the most recent and accurate current-based model (referred to as KTV [Keller, Tseng, and Verghese] throughout this chapter) for handling waveforms with arbitrary shapes in the presence of noise. Similar to Blade, a pre-characterized current source is used. The parasitic components, namely the Miller and the output capacitances, are assumed to be constant regardless of the input and output voltage values. Based on our observations, these capacitive effects can vary by orders of magnitude depending on the cell input and output voltage values, so the constant-value assumption can create significant inaccuracy, especially for complex cells. Furthermore, the input parasitic capacitance is ignored during characterization. Finally, this model does not address the effect of process variations on cell delay analysis. The main motivation for creating a new model is that the existing current-based models, the most accurate models in industry and the literature, sometimes exhibit rather large errors compared to SPICE. In the next sections we introduce our proposed current-based model and compare its results with SPICE and the existing cell delay models.

2.3 The Proposed Current-Based Cell Delay Model

This section explains our current-based logic cell model for the purpose of timing analysis. Our model, shown in Figure 2-2, consists of two main components: parasitic capacitances that model the parasitic behavior at the input and output nodes and the Miller effect between the nodes, and a current source at the output node that models the nonlinear behavior of the logic cell [27].

Figure 2-2. Our proposed current-based circuit model of a logic cell (input capacitance C_i, Miller capacitance C_M between input and output, output capacitance C_o, and current source I_o(V_i, V_o); terminal currents i_i and i_o at voltages V_i and V_o).

Each component is in turn a function of the input and output voltage values.
As a result, our proposed cell model is represented by the following KCL equation, which models the currents at the input and output pins of the cell during switching:

$$ i_o + I(V_i,V_o) + \big(C_o(V_i,V_o) + C_M(V_i,V_o)\big)\frac{\Delta V_o}{\Delta t} - C_M(V_i,V_o)\frac{\Delta V_i}{\Delta t} = 0 \qquad (1) $$

The last two terms in Equation (1) capture, respectively, the current charging the total output-to-ground capacitance, C_M + C_o, and the current injected through the Miller capacitance, C_M. The amount of current sourced by a cell in response to DC voltage levels on the input and output pins of interest, I(V_i, V_o), is determined for each logic cell by sweeping the DC values of the input and output voltages and measuring, in SPICE, the current sourced by the cell output pin. To model the nonlinear behavior of a cell with respect to its input and output voltage values, a 2-D lookup table keyed by the input and output voltage values is constructed to store the values of I(V_i, V_o). C_M(V_i, V_o) and C_o(V_i, V_o) are characterized through a series of SPICE-based transient simulations, in which saturated ramp voltages are applied to the input and/or output nodes and the output current is monitored; these values are also stored in 2-D lookup tables. Precise estimation of the output load is crucial for accurate delay calculation of a cell. Consider the case in which a cell drives other cells: the input parasitic capacitances of the fanout cells should be considered as part of the load in the delay calculation of the driver cell. The following equation is used to characterize the parasitic effect seen at the input of a cell:

$$ i_i = \big(C_i(V_i,V_o) + C_M(V_i,V_o)\big)\frac{\Delta V_i}{\Delta t} - C_M(V_i,V_o)\frac{\Delta V_o}{\Delta t} \qquad (2) $$

Similar to C_M and C_o, the input parasitic capacitance, C_i, is also a function of the input and output voltage values. A transient analysis is used to determine C_i.
In this analysis, a saturated ramp is applied to the input while the output node is connected to a DC voltage source, and the input current, i_i, is measured. Although C_i is in fact a function of the input and output voltage values, in practice only an input-voltage-dependent C_i can be efficiently utilized. This is because, when calculating the delay of a logic cell during STA, the output voltage values of its fanout cells are unknown, so the calculation of the C_i values of the fanout cells cannot make use of any information about the output voltage levels of those fanout cells. That is why making C_i dependent on V_o is not useful in practice. Note also that Equation (1) suffices to find the output current and voltage values; Equation (2) is only used to characterize C_i. Having characterized the current source and parasitic capacitances of a cell, Equation (1) can be rewritten by expressing the output current, i_o, as a function of the output load and output voltage. For a capacitive load, C_L, Equation (1) becomes:

$$ C_L\frac{\Delta V_o}{\Delta t} + C_o\frac{\Delta V_o}{\Delta t} + I(V_i,V_o) + C_M\frac{\Delta V_o}{\Delta t} - C_M\frac{\Delta V_i}{\Delta t} = 0 \qquad (3) $$

Solving Equation (3) for the output voltage at successive time points t_k gives:

$$ V_o(t_{k+1}) = V_o(t_k) + \frac{1}{C_L + C_o + C_M}\Big\{ C_M\big(V_i(t_{k+1}) - V_i(t_k)\big) - I(V_i,V_o)\,\Delta t \Big\} \qquad (4) $$

2.4 Experimental Results

To show the effectiveness of our current-based model, it was compared with HSPICE [31]. Waveforms of arbitrary shapes, ranging from a simple saturated ramp to crosstalk-induced noisy waveforms with voltage fluctuations as high as 85% of Vdd, were applied. The experiments involved various logic cells, such as simple inverter and NAND gates, as well as complex cells such as AOI (And-Or-Invert). Figure 2-3 shows a comparison with HSPICE for some examples of crosstalk-induced noisy waveforms applied to a minimum sized inverter in our 130nm library.
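The update rule of Equation (4), which yields the model waveforms compared against HSPICE here, can be sketched as follows. This is a minimal explicit (forward-Euler) reading of Equation (4), assuming that I, C_M and C_o are evaluated at the voltages of time t_k; the lookup tables are abstracted as Python callables, and the toy switch-like current source and constant capacitances in the usage example are hypothetical placeholders, not characterized data. Following the sign of Equation (4), the current source is taken as positive when it discharges the output node.

```python
def simulate_output(times, v_in, i_cell, c_miller, c_out, c_load, v_out0):
    """Step Equation (4) over a sampled input waveform of arbitrary (e.g., noisy) shape.

    i_cell(vi, vo):   current source I(V_i, V_o); positive values discharge the output node
    c_miller(vi, vo): Miller capacitance C_M(V_i, V_o)
    c_out(vi, vo):    output capacitance C_o(V_i, V_o)
    c_load:           lumped capacitive load C_L
    """
    v_out = [v_out0]
    for k in range(len(times) - 1):
        dt = times[k + 1] - times[k]
        vi, vo = v_in[k], v_out[k]
        c_m = c_miller(vi, vo)
        c_total = c_load + c_out(vi, vo) + c_m
        dv = (c_m * (v_in[k + 1] - v_in[k]) - i_cell(vi, vo) * dt) / c_total
        v_out.append(vo + dv)
    return v_out


# Hypothetical usage: a rising input ramp into an inverter-like cell with a toy current source.
VDD, G = 1.2, 1e-4   # supply (V) and toy on-conductance (S), both made up


def toy_current(vi, vo):
    # Pull-down strength grows with vi, pull-up strength with (VDD - vi).
    return G * ((vi / VDD) * vo - (1 - vi / VDD) * (VDD - vo))


times = [k * 1e-12 for k in range(1001)]               # 1 ps steps, 1 ns total
v_in = [min(VDD, VDD * t / 50e-12) for t in times]     # 50 ps rising ramp
v_out = simulate_output(times, v_in, toy_current,
                        c_miller=lambda vi, vo: 0.5e-15,
                        c_out=lambda vi, vo: 1e-15,
                        c_load=4e-15, v_out0=VDD)
```

With these values the output first rises slightly above VDD, because the input edge couples through C_M before the pull-down has turned on, and then discharges toward 0 V. An explicit update like this needs a time step well below the effective RC constant of the output node to remain stable.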
The output waveforms generated by our model closely match those of HSPICE.

Figure 2-3. The actual and equivalent waveforms by our model for some crosstalk-induced noisy waveforms.

Figure 2-4 shows the comparison with HSPICE for other examples of crosstalk-induced noisy waveforms applied to a minimum sized inverter with an FO4 load in our 130nm cell library. Figure 2-4(a) is for the case where only one aggressor injects the noise. The transition time at the input node of the aggressor and victim lines is set to 300ps. The input voltage and the output voltage obtained by our CSM model as well as by HSPICE are depicted. Figure 2-4(b) shows another example with an identical experimental setup, except that there are two aggressor lines. Figure 2-4(c) illustrates the results for a minimum size FO4-loaded NAND3 in which crosstalk noise is injected into one of the inputs through three aggressors, while the other two inputs are held at a steady, non-controlling, high logic value. The transition times at the input driver of the aggressor lines and at the NAND input victim line are set to 300ps. Next, the accuracy improvement over the most recent and accurate logic cell delay model, i.e., KTV [35], is discussed. Figure 2-5 compares the absolute delay errors of our model and KTV with respect to HSPICE for a minimum size inverter in our 130nm cell library. The input line of the inverter is coupled through a 50fF coupling capacitance to an aggressor line. Both the victim and aggressor lines are driven by minimum sized inverters. The cell under consideration has an FO4 load. The arrival time of the signal transition at the input of the victim line driver is set to 0ps, while that of the aggressor driver (i.e., the noise injection time) is swept from 100ps to 200ps with a time step of 1ps.
Compared to KTV, the accuracy of delay calculation for the minimum sized inverter is improved by 8.8% on average and 17.3% at maximum (cf. Figure 2-5(b)). Figure 2-6 shows the absolute delay error trend for a similar experiment performed on an AOI22 cell of size 10x, where x is the minimum size AOI22. The coupling capacitance is 80fF, and the arrival time of the aggressor line input driver is swept from 100ps to 250ps with a time step of 1ps. The accuracy improvement in this case is 52.1% on average and 93.4% at maximum. The high accuracy of our model is mainly due to the accurate modeling of parasitic effects during cell characterization, where the dependency of such effects on the input and output voltage values is considered. In general, the error in the 50%-Vdd cell propagation delay is less than 0.7% on average (2.4% at maximum) compared to HSPICE for the cells in our 130nm library.

Figure 2-4. Comparison between our model and HSPICE for a minimum size inverter in (a) and (b) and a minimum size NAND2 in (c), given single (a), double (b), and triple (c) aggressor crosstalk-induced noisy waveforms (each panel: voltage (V) vs. time (s) for the noisy input voltage, the HSPICE output voltage, and the output voltage of our model).

Figure 2-5. (a) Absolute delay errors vs. SPICE for an inverter of size x (delay error vs. HSPICE (s) against noise injection time at cell input (s), for the KTV model and our model). (b) Delay accuracy improvement (%) of our model over the KTV model [35] against noise injection time at cell input (s).

As discussed earlier, the shape of the waveform strongly impacts delay calculation; therefore, delay and output slew metrics may not be sufficient to characterize the waveform shape. Our model is able to compute output waveforms whose shapes are close to those obtained from SPICE. Mean square error (MSE) is a good metric for comparing waveform similarity. Figure 2-7 shows the MSE of the output voltage waveforms computed by our cell model as well as by KTV with respect to HSPICE. This value is lower for our model than for KTV in most cases. In fact, the average MSE improvement for the inverter and the AOI22 in the aforementioned experimental setup is 11.3% and 24.5%, respectively.

Figure 2-6. (a) Absolute delay errors vs. SPICE for an AOI22 of size 10x (delay error vs. HSPICE (s) against noise injection time (s), for our model and the KTV model). (b) Delay accuracy improvement (%) of our model vs. the KTV model against noise injection time at cell input (s).

Figure 2-7. Waveform similarity (mean square error vs. HSPICE against noise injection time at cell input (s)) for our model and the KTV model for (a) the inverter and (b) the AOI22.

2.5 Summary

A new current-based cell delay model was developed to accurately capture various cell parasitic and nonlinear effects in the computation of the output voltage waveform in the presence of crosstalk-induced noise. The experimental results showed the high accuracy of our cell delay model compared to HSPICE as well as the improvement over existing cell delay models. The average and maximum errors in delay calculation for the cells in our 130nm library are less than 0.7% and 2.4%, respectively. The high accuracy of this model enables us to apply it to different applications. In particular, we use this model to address the problem of short circuit energy calculation in chapter 3. Furthermore, an extension of this model for handling multiple input switching will be discussed in chapter 4.

CHAPTER 3: A CURRENT-BASED METHOD FOR SHORT CIRCUIT POWER CALCULATION UNDER NOISY INPUT WAVEFORMS

The short circuit current is highly dependent on the input and output voltage values. Therefore, the actual shape of the voltage signal waveforms at the input and output of the cell should be considered in order to precisely calculate the short circuit energy. For example, approximating crosstalk-induced noisy waveforms with saturated ramps can lead to short circuit energy estimation errors as high as orders of magnitude for a minimum sized inverter. To resolve this shortcoming, in this chapter we introduce a new Current-based Short-circuit Power Calculator, denoted by CSPC. The current-based logic cell model explained in the previous chapter is utilized to construct the output voltage waveform for a given noisy input waveform.
The input and output voltage waveforms are then used to calculate the short circuit current and hence the short circuit energy dissipation. A pre-characterization process is executed for each logic cell in the standard cell library to model the relevant electrical parameters, e.g., the parasitic capacitances and nonlinear current sources. Additionally, CSPC is capable of calculating the short circuit energy dissipation caused by glitches in VLSI circuits, which in some cases can be a key contributor to the total circuit energy dissipation.

3.1 Introduction

Accurate power estimation is a critical step in the analysis and design of CMOS circuits in nanometer process technologies. The difficulty is mostly due to (a) input pattern dependence, e.g., accurate average power estimation requires knowledge of a “typical” or “expected” input stream, and (b) variability of the shape of the input signal waveform due to variations in key physical and electrical characteristics of CMOS logic cells and interconnects and/or different sources of noise, such as DC drop on supply lines and crosstalk noise on signal lines. While the first issue has been addressed in the past by various statistical or probabilistic power estimation methodologies and frameworks [41][69], the latter has not received much attention from the low power design community. To partially address this shortcoming, this chapter develops a short circuit power calculation method for noisy input signal waveforms. Power consumption in CMOS VLSI circuits comprises three components: switching, short-circuit, and leakage. The switching component refers to the power consumed to cause a gate output transition and follows the well-known expression

$$ P_{sw} = \frac{1}{2}\,\alpha\, C_L V_{dd}^2 f $$

where f is the clock frequency and α is the expected number of output transitions per clock cycle. A detailed treatment can be found in [52]. The next component is the short circuit, or rush-through, power dissipation.
Short circuit power is consumed by the current flowing between the power rails (i.e., from the power supply to ground) through a direct current path that is temporarily established during an output transition. The short circuit current at each time instant therefore depends on the operating regions of the devices in the logic cell, which means that it depends on both the input and output voltage values. Prior work has tried to capture this dependence by treating the short circuit current as a function of the input transition time and the output load. A well-known equation for time-averaged short-circuit power dissipation is [63]:

$$ P_{sc} = \frac{1}{12}\,\alpha\, k\, \tau_{in}\,(V_{dd} - 2V_T)^3 f $$

where τ_in is the input transition time, V_T is the threshold voltage of the transistors, and k is the effective transconductance parameter of the logic gate. The leakage component of power dissipation (which is rising very fast compared to the switching and short circuit components due to lower V_T values and thinner gate oxides) accounts for sub-threshold current conduction, gate oxide tunneling currents, and reverse-biased p-n junction currents. A detailed treatment can be found in [45]. The focus of this chapter is on short circuit energy dissipation.¹ For years, it has been stated and generally accepted that the short circuit power can be made small (say, less than 10% of the switching power) by following a few simple design guidelines, e.g., do not overdrive a load, and do not allow the transition time (inverse of the slew rate) of intermediate signals to become too long. We will show in this chapter that short circuit energy dissipation can be comparable to other sources of energy dissipation even for a well-designed circuit in sub-90nm designs (e.g., refer to Figure 3-2(a) and (b) in section 3). This is mostly due to the increasing effect of noise, primarily crosstalk noise, and its impact on the shapes of voltage signal transitions. The increase in the transistor packing density as well as in the clock frequency of VLSI circuits increases the effect of capacitive crosstalk noise; interconnect lines get thicker and narrower (and longer in the case of global interconnects), which aggravates the crosstalk noise amplitude. This phenomenon in turn results in more distorted voltage signal waveforms and tends to increase the effective transition time of the signal waveforms that are subjected to crosstalk noise.

¹ Since the operating frequency of the circuit, f, is assumed to be fixed during the analysis and optimization steps considered in this chapter, and recalling the relation P = E·f, we use the terms “energy calculation” and “power calculation” interchangeably.

To the best of our knowledge, CSPC is the only model that can construct the actual shape of the short circuit current waveform for any type of input voltage waveform, including glitches. The accuracy improvement of our model over the existing approaches is significant. It is worth mentioning that our current-based approach utilizes the cell parasitic and current data that are pre-characterized for timing analysis purposes, and hence adds no extra complexity to the pre-characterization step. The remainder of this chapter is organized as follows. In section 3.2 an overview of existing short circuit energy calculation approaches is presented. CSPC, our current-based logic cell model for short circuit power calculation, is described in section 3.3. Section 3.4 presents our experimental results for different types of input waveforms and logic cells. Finally, section 3.5 provides our conclusions.

3.2 Short Circuit Energy Calculation Approaches

Most previous work on short circuit power has focused on the development of closed-form analytical expressions [34][43][48][55][56][63].
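As an illustration of such closed-form expressions, the switching-power and short-circuit-power formulas quoted in Section 3.1 can be evaluated directly; their ratio is the quantity behind the common "less than 10% of the switching power" rule of thumb. The sketch below assumes the formulas exactly as written there (equal n- and p-device threshold voltages V_T), and all parameter values in the usage example are hypothetical. Note that the input enters only through the single number τ_in, so the shape of a noisy waveform cannot influence the result.

```python
def switching_power(alpha, c_load, vdd, f):
    """P_sw = (1/2) * alpha * C_L * Vdd^2 * f."""
    return 0.5 * alpha * c_load * vdd ** 2 * f


def short_circuit_power(alpha, k, tau_in, vdd, v_t, f):
    """P_sc = (1/12) * alpha * k * tau_in * (Vdd - 2 V_T)^3 * f, with equal n/p thresholds."""
    overlap = vdd - 2.0 * v_t
    if overlap <= 0.0:
        # Pull-up and pull-down can never conduct at the same time: no direct path.
        return 0.0
    return alpha * k * tau_in * overlap ** 3 * f / 12.0


# Hypothetical values: 10 fF load, 1.2 V supply, 0.35 V thresholds, 1 GHz, alpha = 0.2.
p_sw = switching_power(alpha=0.2, c_load=10e-15, vdd=1.2, f=1e9)
p_sc = short_circuit_power(alpha=0.2, k=100e-6, tau_in=100e-12, vdd=1.2, v_t=0.35, f=1e9)
print(f"P_sw = {p_sw:.3e} W, P_sc = {p_sc:.3e} W, ratio = {p_sc / p_sw:.2%}")
```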
These approaches, which generally attempt to solve a set of differential equations for a switching inverter driving an effective capacitive load, lack accuracy due to their reliance on simplified device models and on assumptions about device operation during signal transitions. Another group of approaches pre-characterizes the average short circuit current with respect to the input signal transition time and the capacitive output load. This method is very similar to the one used in typical static timing analysis tools, where the logic cell delay and output voltage transition time are characterized as functions of the input transition time and capacitive output load. One such technique is the work by Dartu et al. [22], which pre-characterizes the short circuit energy for each cell as follows:

$$ E_{sc} = V_{dd}\int_0^{\infty} i_{sc}(t)\,dt = g(t_{in}, C_L) \qquad (5) $$

where i_sc(t) and E_sc denote the short circuit current and energy dissipation for one output signal transition, respectively. E_sc is empirically characterized in terms of k-factor type equations. The resulting pre-characterized lookup tables, g(t_in, C_L), are inherently incompatible with arbitrary waveform shapes and hence fall apart in processing noisy inputs such as crosstalk-induced noisy waveforms. More recently, Acar et al. [6] proposed a practical methodology that finds the maximum short circuit current in the linear and saturation regions of device operation and then uses a triangular waveform approximation based on those peak current values to predict the short circuit energy dissipation during an output transition of a CMOS gate. This methodology uses the timing rules of conventional static timing analysis tools, in which cell behavior is pre-characterized as a function of the input slew and output load capacitance. Unfortunately, these models are not well suited to crosstalk-induced noisy waveforms.
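To see why a single triangle can be a poor fit, the sketch below compares the energy of a sampled short circuit current waveform, integrated with the trapezoidal rule as in Equation (5), against a single-triangle estimate built from the waveform's peak value and its conduction window. The two-hump waveform and all numbers are hypothetical, and the single-triangle estimate is a generic illustration of the triangular idea, not the specific procedure of [6].

```python
def triangle(t, t0, t_peak, t1, i_peak):
    """Piecewise-linear triangular pulse: 0 at t0, i_peak at t_peak, 0 again at t1."""
    if t0 <= t <= t_peak:
        return i_peak * (t - t0) / (t_peak - t0)
    if t_peak < t <= t1:
        return i_peak * (t1 - t) / (t1 - t_peak)
    return 0.0


def energy_trapezoid(times, currents, vdd):
    """E = Vdd * integral of i(t) dt, using the trapezoidal rule on the samples."""
    charge = sum(0.5 * (currents[k] + currents[k + 1]) * (times[k + 1] - times[k])
                 for k in range(len(times) - 1))
    return vdd * charge


def energy_single_triangle(times, currents, vdd):
    """Estimate assuming one triangle with the observed peak spanning the conduction window."""
    active = [t for t, i in zip(times, currents) if i > 0.0]
    if not active:
        return 0.0
    return vdd * 0.5 * max(currents) * (active[-1] - active[0])


VDD = 1.2
times = [k * 1e-12 for k in range(301)]
# Hypothetical two-hump short circuit current, e.g. from a noisy input crossing the
# switching region twice: 50 uA peak at 50 ps, then 30 uA peak at 250 ps.
i_sc = [triangle(t, 0.0, 50e-12, 100e-12, 50e-6) + triangle(t, 200e-12, 250e-12, 300e-12, 30e-6)
        for t in times]

e_ref = energy_trapezoid(times, i_sc, VDD)        # about 4.8e-15 J
e_tri = energy_single_triangle(times, i_sc, VDD)
print(f"integrated: {e_ref:.3e} J, single triangle: {e_tri:.3e} J")
```

Here the single triangle nearly doubles the energy, because it fills the idle interval between the two humps; a triangle fitted to the first hump alone would instead miss the second one.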
More generally, this technique suffers from the fact that the short circuit current waveform cannot be modeled well by a triangular shape, especially for noisy input waveforms (cf. Figure 3-2). The major shortcoming of the previous techniques, for both cell output voltage and short circuit current calculation, is that they ignore the impact of the shape of the input voltage waveform through simplifying assumptions such as approximating the input waveform with a saturated ramp.

The objective of this chapter is to devise an accurate short circuit energy calculation technique. Our current-based model resolves all of the above-mentioned weaknesses of the previous techniques: it can process input voltage waveforms of arbitrary shape and hence accurately construct the corresponding output voltage waveform. The nonlinear behavior of the short circuit current is captured by generating, during a pre-characterization step, a lookup table for each cell with the input and output voltage values as its keys and the short circuit current value as its output.

We use the term hazard to refer to an unwanted, spurious full-rail transition on a signal line. Hazards give rise to both switching and short circuit power dissipation. A glitch, on the other hand, refers to an incomplete transition (e.g., a half-rail swing) on a signal line. Although glitches can give rise to switching power dissipation, their impact on circuit power is mostly in the form of short circuit power dissipation. It is easy to construct an input glitch for a CMOS inverter that creates a DC path between the power and ground rails through the inverter over a long period of time, resulting in short circuit power dissipation that far exceeds any switching power dissipation (even if the input glitch is passed on to the output). Glitches are thus an important contributor to circuit power dissipation.
Modeling the short circuit current of a glitch as a function of glitch characteristics such as its shape is a difficult task. Furthermore, timing analysis tools usually ignore signal glitches that do not change the circuit delay, even though such glitches can significantly increase the short circuit power dissipation of the circuit; power analysis tools therefore cannot and should not ignore them. Our current-based model can accept any type of glitch at the input of the logic cell and create the corresponding output voltage waveform in order to accurately construct the resulting short circuit current waveform.

3.3 CSPC: Current-based Short-circuit Power Calculator

This section explains our current-based logic cell model for the purpose of short circuit energy calculation. The main motivation for creating a new model is that existing short circuit current prediction models exhibit large errors compared to SPICE, mainly because they ignore the impact of the shape of the voltage signal waveform. To resolve this issue, our model accurately computes the output voltage waveform for a given input voltage waveform using a current-based model. The short circuit current value at each time instance can then be obtained from a pre-characterized lookup table that uses the input and output voltage values of the cell as its keys.

As mentioned before, accurate consideration of the shapes of the voltage waveforms at the input and output of a logic cell is crucial for calculating its short circuit current. Therefore, we use the current-based logic cell model presented in the previous chapter to construct an accurate output voltage waveform for a given noisy input. The accuracy of this model in output voltage construction was demonstrated in the previous chapter; in this chapter, we show how that accuracy helps in calculating the short circuit energy dissipation.
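The CSPC flow can be illustrated with a short sketch. It is a minimal illustration under stated assumptions, not the CSPC implementation itself. A hypothetical long-channel square-law inverter with made-up parameters (`VTN`, `VTP`, `K`) stands in for the SPICE-based pre-characterization. The minimum of the pull-up and pull-down currents fills the $I_{sc}(V_i, V_o)$ table, as in the pre-characterization step of this section. Bilinear interpolation, which is our choice, reads the table at each time sample, and the trapezoidal rule evaluates $E_{sc} = V_{dd}\int i_{sc}(t)\,dt$ as in Eq. (5). All function names are illustrative. In the actual flow, $V_o(t)$ would come from the current-based model rather than being supplied by hand.

```python
"""CSPC-style short circuit energy: table pre-characterization, lookup, integration.

Hypothetical sketch: a long-channel square-law inverter with made-up
parameters replaces the SPICE runs of the real pre-characterization step.
"""

VDD = 1.2   # supply voltage (V) of the 130nm library used in the text
VTN = 0.35  # hypothetical NMOS threshold voltage (V)
VTP = 0.35  # hypothetical |PMOS threshold voltage| (V)
K = 2e-4    # hypothetical transconductance factor (A/V^2)


def square_law(vgs, vds, vt):
    """Hypothetical drain current (A) of a long-channel transistor, vds >= 0."""
    if vgs <= vt:
        return 0.0
    vov = vgs - vt
    if vds < vov:                                   # linear region
        return K * (vov * vds - 0.5 * vds * vds)
    return 0.5 * K * vov * vov                      # saturation region


def build_isc_table(n=61):
    """DC sweep of V_i and V_o from 0 to VDD; I_sc = min(pull-up, pull-down)."""
    grid = [VDD * k / (n - 1) for k in range(n)]
    table = [[min(square_law(vi, vo, VTN),               # pull-down NMOS
                  square_law(VDD - vi, VDD - vo, VTP))   # pull-up PMOS
              for vo in grid]
             for vi in grid]
    return grid, table


def lookup(grid, table, vi, vo):
    """Bilinear interpolation of I_sc(V_i, V_o), clamped to the table range."""
    step = grid[1] - grid[0]
    last = len(grid) - 2
    vi = min(max(vi, grid[0]), grid[-1])
    vo = min(max(vo, grid[0]), grid[-1])
    i = min(int((vi - grid[0]) / step), last)
    j = min(int((vo - grid[0]) / step), last)
    u = (vi - grid[i]) / step
    w = (vo - grid[j]) / step
    return ((1 - u) * (1 - w) * table[i][j] + u * (1 - w) * table[i + 1][j]
            + (1 - u) * w * table[i][j + 1] + u * w * table[i + 1][j + 1])


def short_circuit_energy(times, v_in, v_out, grid, table):
    """E_sc = VDD * integral of i_sc(t) dt (trapezoidal rule), in joules."""
    i_sc = [lookup(grid, table, a, b) for a, b in zip(v_in, v_out)]
    charge = sum(0.5 * (i_sc[k] + i_sc[k + 1]) * (times[k + 1] - times[k])
                 for k in range(len(times) - 1))
    return VDD * charge
```

For example, calling `build_isc_table()` once per cell and then `short_circuit_energy(times, v_in, v_out, grid, table)` on sampled waveforms returns the energy in joules. Because the table is read point by point along the actual waveforms, an input that lingers near mid-rail while the output is also mid-rail keeps both devices conducting and can accumulate more energy than a clean ramp with the same end points.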
The short circuit current of a logic cell is a nonlinear function of the cell input and output voltages. Therefore, we pre-characterize the short circuit current of each cell with a 2-D lookup table whose keys are the input and output voltage values and whose output is the short circuit current. Given the input voltage waveform, the output voltage waveform can be constructed using our current-based model.

The short circuit current is pre-characterized with SPICE. For each cell, the common current flowing through the pull-up and pull-down paths of the logic cell is evaluated while the input and output voltages are set to DC values ranging from 0 to $V_{dd}$. This pre-characterization is similar to the one explained in Section 2.3, which was performed to measure $I_o(V_i, V_o)$. Figure 3-1 shows this process for a simple inverter. The zero-voltage sources $V_{M1}$ and $V_{M2}$ are placed to measure the currents flowing through the pull-up and pull-down paths of the cell, while $V_{CH1}$ and $V_{CH2}$ drive the input and output nodes with DC values. The short circuit current, $I_{sc}(V_i, V_o)$, is simply the minimum of the currents passing through $V_{M1}$ and $V_{M2}$. A 2-D lookup table is then created to store the $I_{sc}(V_i, V_o)$ values. Note that this table models the nonlinear behavior of the cell short circuit current with respect to the input and output voltage values.

Figure 3-1. The short circuit current measurement during cell pre-characterization. (Schematic: inverter between $V_{dd}$ and ground with input $V_i$ and output $V_o$; zero-volt sources $V_{M1}$ and $V_{M2}$; DC sources $V_{CH1}$ and $V_{CH2}$.)

3.4 Experimental Results

To show the effectiveness of CSPC, we compared it with HSPICE simulations [31]. Figure 3-2 shows the comparison with HSPICE for several crosstalk-induced noisy waveforms applied to a minimum-size inverter and a minimum-size NAND3, each with an FO4 load, from our 130nm cell library. Figure 3-2(a) is for the case where only one aggressor injects the noise.
The transition time at the input nodes of the aggressor and victim lines is set to 300ps. The input voltage, output voltage, and short circuit current waveforms obtained by CSPC and by HSPICE are depicted. The CSPC-generated waveforms closely match the corresponding HSPICE waveforms. Figure 3-2(b) shows another example with an identical experimental setup, except that there are two aggressor lines. This figure shows that the accuracy of CSPC does not degrade noticeably even when the input voltage waveform is severely distorted. The short circuit energy dissipation for Figure 3-2(a) is 2.68fJ (2.78fJ) according to HSPICE (CSPC). The results for Figure 3-2(b) are 15.65fJ (15.74fJ). This constitutes more than a 5X rise in short circuit energy dissipation when the number of aggressors is increased from one to two. The reason is that, as the number of aggressor lines increases, the time interval during which both the NMOS and the PMOS transistors conduct becomes longer, which in turn significantly raises the short circuit energy consumption.

Figure 3-2(c) illustrates the results for a minimum-size FO4-loaded NAND3 in which crosstalk noise is injected into one of the inputs through three aggressors, while the other two inputs are held at a steady, non-controlling (high) logic value. The transition times at the input drivers of the aggressor lines and of the NAND input victim line are set to 300ps. The short circuit energy dissipation for this case is 27.71fJ (28.01fJ) according to HSPICE (CSPC); i.e., the error of CSPC is less than 1.1% in this case.

The switching energy consumption per signal transition of the inverter in the above experiments (Figure 3-2(a) and (b)) is measured as 8.89fJ. This gives an $E_{sc}/E_{sw}$ ratio (i.e., the ratio of short circuit to switching energy per transition) of 30.1% and 176.0% for the cases of Figure 3-2(a) and Figure 3-2(b), respectively.
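As a quick arithmetic check, the ratios and errors quoted above follow directly from the reported HSPICE energies (no new data is introduced):

```latex
\frac{E_{sc}}{E_{sw}}\bigg|_{(a)} = \frac{2.68\,\mathrm{fJ}}{8.89\,\mathrm{fJ}} \approx 0.301,
\qquad
\frac{E_{sc}}{E_{sw}}\bigg|_{(b)} = \frac{15.65\,\mathrm{fJ}}{8.89\,\mathrm{fJ}} \approx 1.760,
\qquad
\frac{E_{sc,(b)}}{E_{sc,(a)}} = \frac{15.65}{2.68} \approx 5.84,
\qquad
\frac{|28.01 - 27.71|}{27.71} \approx 1.08\,\%
```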
Figure 3-2(a). (Plot: voltage (V) and current (A) vs. time (s); traces: noisy input voltage, output voltage (HSPICE), output voltage (our model), short circuit current (HSPICE), short circuit current (our model).)

Figure 3-2(b). (Plot: same axes and traces as (a), plus the short circuit current for the equivalent ramp input.)

Figure 3-2(c). (Plot: same axes and traces as (a).)

Figure 3-2. Comparison between CSPC and HSPICE for a minimum-size inverter in (a) and (b) and a minimum-size NAND3 in (c), given single (a), double (b), and triple (c) aggressor crosstalk-induced noisy waveforms.

These examples clearly demonstrate how severely noisy input signals can increase the short circuit energy dissipation, even for a reasonable logic cell input transition time and output load. To show the effect of noise on the short circuit energy dissipation, we swept the noise injection time. Changing the noise injection time changes the time interval during which both transistors conduct. The resulting energy dissipation values for different noise arrival times are shown in Figure 3-3. For some of the cases, the input and output waveforms are also presented (the red waveform is the input and the blue waveform is the output).
Figure 3-3 shows that different noisy waveforms result in different energy dissipation values. The increasing trend is due to the increase in the time during which both transistors conduct.

Figure 3-3. Energy dissipation for different noisy inputs.

To confirm the advantage of CSPC over conventional techniques, we implemented the technique by Dartu et al. [22], in which an input waveform must be approximated with an effective saturated ramp to be compatible with the pre-characterized lookup tables. Figure 3-2(b) illustrates the short circuit current waveform for one such ramp approximation. The corresponding short circuit energy dissipation is calculated as 7.1fJ, which is less than half of the actual short circuit energy dissipation caused by the noisy waveform (the ramp-based value is only 45.9% of the 15.45fJ reported by HSPICE, i.e., an underestimation of about 54%). This underlines the fact that the shape of the waveform cannot be ignored during short circuit power calculation.

To investigate the accuracy of CSPC for a complex logic cell, we studied an AOI22 (AND-OR-Invert) cell of size 10x, where x denotes the minimum size of an AOI22. The cell is FO4-loaded. One of its input nodes is attacked through a coupling capacitance of 80fF, and the other inputs are set to their non-controlling values. The corresponding aggressor and victim lines are driven by 10x inverters. The arrival time of the signal transition at the input of the victim line driver is set to 10ps, while that of the aggressor line driver (i.e., the noise injection time) is swept from 100ps to 250ps with a time step of 1ps. Figure 3-4 illustrates the percentage error of the short circuit energy dissipation calculated for the AOI22 compared with HSPICE. The average (maximum) error of the short circuit energy calculation for the AOI22 cell is 1.16% (3.35%). We repeated this experiment for different FO4-loaded logic cells of different sizes.
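The ramp-based baseline of [22] requires collapsing a noisy waveform into a single saturated ramp. The text does not specify how the equivalent ramp was fitted, so the sketch below assumes one plausible rule: a line is drawn through the last crossings of the 10% and 90% thresholds and extrapolated to a full 0-to-$V_{dd}$ ramp. The thresholds, the "last crossing" rule, and all function names are hypothetical.

```python
"""Effective saturated-ramp fit for a rising, possibly noisy, input waveform.

Hypothetical illustration: the 10%/90% thresholds and the use of the *last*
threshold crossings are assumptions, not the procedure of [22].
"""


def last_rising_crossing(times, volts, level):
    """Time of the last upward crossing of `level`, by linear interpolation."""
    t_cross = None
    for k in range(len(times) - 1):
        v0, v1 = volts[k], volts[k + 1]
        if v0 < level <= v1:
            frac = (level - v0) / (v1 - v0)
            t_cross = times[k] + frac * (times[k + 1] - times[k])
    if t_cross is None:
        raise ValueError("waveform never rises through the requested level")
    return t_cross


def effective_ramp(times, volts, vdd, lo=0.1, hi=0.9):
    """Fit a saturated 0 -> vdd ramp; return (t_start, t_transition) in seconds."""
    t_lo = last_rising_crossing(times, volts, lo * vdd)
    t_hi = last_rising_crossing(times, volts, hi * vdd)
    t_transition = (t_hi - t_lo) / (hi - lo)   # scale the lo-hi slew to 0-100%
    t_start = t_lo - lo * t_transition
    return t_start, t_transition


def saturated_ramp(t, t_start, t_transition, vdd):
    """Voltage of a saturated ramp at time t."""
    return min(vdd, max(0.0, vdd * (t - t_start) / t_transition))


# Usage: a clean 1 ns ramp is recovered, while a late dip below 90%
# (e.g., crosstalk after the transition) stretches the equivalent ramp.
VDD = 1.2
times = [k * 1e-12 for k in range(3001)]
clean = [saturated_ramp(t, 0.5e-9, 1e-9, VDD) for t in times]
noisy = [v - (0.4 if 1.6e-9 <= t < 1.8e-9 else 0.0) for t, v in zip(times, clean)]
clean_fit = effective_ramp(times, clean, VDD)
noisy_fit = effective_ramp(times, noisy, VDD)
```

With the dip after the transition, the fitted transition time grows from 1 ns to roughly 1.5 ns. The resulting ramp nevertheless carries no trace of the dip itself, and that lost shape information is what a waveform-aware model such as CSPC retains.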
An automated test was performed to validate CSPC against HSPICE for different logic cell types, using an experimental setup similar to that of the previous AOI22 experiment. For each logic cell, 150 noisy input waveforms were applied by sweeping the noise injection time. For each noisy input, the transient analysis period and step size were set to 4ns and 3.3ps, respectively. Table 1 summarizes the average and maximum errors in the short circuit energy calculation for these logic cells. The runtime of CSPC is essentially independent of the number of transistors in the logic cell, whereas the number of transistors greatly affects the runtime of HSPICE. For example, the HSPICE simulation of XOR2 takes almost 3 times as long as that of NAND2, whereas the runtime of CSPC is about the same for both cells.

Figure 3-4. Absolute short circuit energy calculation error vs. HSPICE for an AOI22 of size 10x under noisy waveforms. (Plot: short circuit energy calculation error (%) vs. noise injection time (s).)

Next, we demonstrate the accuracy of CSPC in calculating the short circuit energy dissipation of glitches. Figure 3-5 shows an example of a glitch induced by a 50fF coupling capacitance on a quiet victim line, which drives the input of a minimum-size inverter with an FO4 load. The output voltage waveforms constructed by CSPC and computed by HSPICE are also depicted. The inverter output is not logically affected by the glitch; therefore, the glitch will typically be ignored by timing analysis or validation tools. However, the corresponding short circuit energy dissipation measured by HSPICE is 3.5fJ. This amount is in fact comparable to the short circuit energy dissipation measured for complete signal transitions at the input of the inverter; e.g., compare it with the 2.68fJ reported by HSPICE for the case of Figure 3-2(a).
Table 1. Runtime and error comparison between CSPC and HSPICE.

Logic cell   Avg. error (%)   Max. error (%)   CSPC runtime   HSPICE runtime   Speedup
INV 10x      1.11             2.13             8.28 ms        24.4 s           2940
NAND2 10x    1.23             3.29             8.56 ms        52.4 s           6120
XOR2 10x     1.41             3.52             9.44 ms        149.2 s          15800
AOI22 10x    1.16             3.35             9.00 ms        60.8 s           6750

An AOI22 of size 10x was also studied under an experimental setup similar to the one used for Figure 3-4. This time, however, the cell input under crosstalk attack was kept quiet. In addition, the arrival time of the aggressor line was set to a constant value, while its transition time was swept from 200ps to 400ps with a time step of 1ps. Figure 3-6 shows the absolute error of the short circuit energy calculation for the corresponding 200 glitch cases.

CSPC was coded in C and Perl. All experiments discussed in this section were performed on a Sun Fire V880 machine with an UltraSPARC III 750MHz processor running the Sun Solaris operating system.

Figure 3-5. HSPICE and CSPC waveforms for the example of a glitch. (Plot: voltage (V) and current (A) vs. time (s); traces: glitch (input voltage), output voltage (HSPICE), output voltage (our model), short circuit current (HSPICE), short circuit current (our model).)

Figure 3-6. Short circuit energy calculation errors of CSPC vs. HSPICE for an AOI22 of size 10x under glitches. (Plot: short circuit energy calculation error (%) vs. aggressor input transition time (s).)

3.5 Summary

An accurate technique to calculate the short circuit energy dissipation of logic cells was presented. The short circuit current was shown to be highly dependent on the input and output voltage values, and hence on the shapes of the waveforms. This fact has generally been ignored by conventional short circuit estimation techniques.
To address this issue, we used our current-based logic cell model, which can accurately construct the output voltage waveform for a given input waveform of arbitrary shape subjected to noise. The input and output voltage waveforms are then used to calculate the short circuit current and hence the energy dissipation. A pre-characterization process is executed for each cell to model electrical parameters such as the parasitic capacitances and the nonlinear current sources. Our model can also account for glitches in the short circuit energy calculation. HSPICE-based experimental results show the high accuracy of our technique.

CHAPTER 4: A CURRENT SOURCE MODEL FOR CMOS LOGIC CELLS CONSIDERING MULTIPLE INPUT SWITCHING AND STACK EFFECT

This chapter presents a current source model (CSM) for CMOS logic cells that accurately captures not only the effect of multiple input switching but also the effect of the internal node voltages of transistor stacks (i.e., the stack effect). Because they neglect these internal node voltages, previous CSMs can incur delay estimation errors of up to 20% in the presence of multiple input switching. We present a multiple input switching CSM that accurately captures the stack effect. We show how to characterize the various components of the model and how to use it to calculate the output voltage waveform. The output waveform of the proposed CSM matches that of HSPICE very closely.

4.1 Introduction

As discussed in the previous chapters, the drastic downscaling of layout geometries to 65nm and below has resulted in a significant increase in the packing density and operating frequency of VLSI circuits. An unfortunate side effect of this technology advancement has been the aggravation of noise effects such as capacitive crosstalk noise.
In the ASIC design flow, combinational and sequential logic cells are pre-characterized for input-to-output propagation delay and output slew as functions of the input slew and the effective output load ($C_{eff}$). This characterization implicitly assumes that the voltage waveforms driving the inputs of a logic cell, or produced at its output, have a saturated ramp form. This approach is inherently incompatible with arbitrary voltage waveform shapes and thus falls short when dealing with noisy inputs such as crosstalk-induced noisy waveforms. Therefore, current source modeling has been introduced and applied to noise and delay analyses.

The CSM in Chapter 2 was developed for single input switching (SIS). For multiple-input gates, the cell was modeled based on a single input change. Not modeling multiple input switching (MIS) for timing can result in as much as 100% error in stage delay and slew calculation. If the inputs of a logic cell such as a NAND2 switch simultaneously, the cell delay is significantly different from the case in which one of the inputs has been stable for a long time. Most timing tools apply SIS cell delay/slew models even if the timing windows of the input signals predict an MIS event. This can result in a significant underestimation of cell delay/slew and make delay analysis optimistic [9][18]. As with single input switching in the presence of noise, CSM analysis is necessary for MIS.

In [9], the authors present an extension of CSM to cope with the multiple input switching problem. Each input and output pin of the cell is modeled with a voltage-dependent current source and a nonlinear capacitor, and each component in this model depends on all input voltages and the output voltage. However, the effect of internal node voltages is completely ignored. We have observed that this simple model may result in a 20% error in delay calculation in some cases.
Therefore, in this chapter we present a complete CSM that is not only capable of handling simultaneous input switching but also accurately captures the effect of internal node voltages.

The remainder of this chapter is organized as follows. Section 4.2 provides the background and motivation for the problem. Section 4.3 presents our MSCSM (Multiple Input Switching Current Source Model). Section 4.4 is dedicated to simulation results, and Section 4.5 concludes the chapter.

4.2 Stack Effect in Multiple Input Switching

Various CSMs for SIS are essentially similar in the sense that they all model the output current of the logic cell with a voltage-dependent current source. The CSM for an SIS combinational logic cell is described in Chapter 2.

Figure 4-1. (a) Transistor-level diagram of a NOR logic cell (transistors M1 to M4, inputs A and B, internal node N with capacitance $C_N$, load $C_L$). (b) Two identical NOR logic cells driving $C_L$: for Out1, A: 1 → 1 → 0 and B: 0 → 1 → 0; for Out2, A: 0 → 1 → 0 and B: 1 → 1 → 0. The first input transitions of the two logic cells are different, while the second ones are identical.

In the following, we demonstrate that a current source model that considers only the input and output nodes cannot accurately characterize a logic cell under multiple input switching. For the sake of presentation and without loss of generality, in the remainder of this section we limit the discussion to 2-input NOR logic cells. The key concepts and analyses are similar for other types of logic cells and for cells with more than two inputs. In a NOR2 logic cell, when the inputs change to ‘00’ and the output node becomes ‘1’, not only is the load capacitance $C_L$ charged to $V_{dd}$, but the capacitance of the internal node N, i.e., $C_N$ in Figure 4-1(a), is also fully charged from its initial voltage $V_N$ to $V_{dd}$. Clearly, a higher initial value of $V_N$ requires less output current to charge up this internal node, and therefore the output transition to ‘1’ becomes faster.
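A first-order charge balance makes the preceding statement concrete. As a simplifying assumption of ours, not part of the original analysis, let the output start at 0 V and lump all parasitics other than $C_N$ into $C_L$. Let the pull-up stack also supply all of the charge delivered to N and Out during the ‘00’ transition. The charge that the pull-up path must deliver is then given by the first line below. The second line uses the initial internal node voltages derived in the next paragraph for the two input histories of Figure 4-1(b):

```latex
Q_{pu} \approx C_L V_{dd} + C_N\bigl(V_{dd} - V_N(0)\bigr)
% Out1: V_N(0) \approx V_{dd} + \Delta V_1,  Out2: V_N(0) \approx |V_{t,p}| + \Delta V_2
Q_{pu}^{(Out2)} - Q_{pu}^{(Out1)} \approx C_N\bigl(V_{dd} - |V_{t,p}| + \Delta V_1 - \Delta V_2\bigr)
```

This extra charge competes with $C_L V_{dd}$, so its relative impact on the output transition shrinks as $C_L$ grows. This is consistent with the stack effect mattering most for lightly loaded cells.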
The exact value of $V_N$ depends on the previous states of the inputs (we ignore the effect of sub-threshold leakage current). Consider the two different “input history” cases depicted in Figure 4-1(b). In the first case, the inputs of the NOR2 change from ‘10’ (A=‘1’, B=‘0’) to ‘11’ and then to ‘00’. In input state ‘10’, node N is connected to the supply voltage, so its voltage is $V_{dd}$. When input B changes to ‘1’, some charge is injected into N through the gate-drain capacitance of M3, and the voltage of N rises slightly above $V_{dd}$. Therefore, right before the ‘00’ transition, $V_N = V_{dd} + \Delta V_1$. In the second case, the inputs of the NOR2 logic cell start from ‘01’ and then make the ‘11’ and ‘00’ transitions. This time, in input state ‘01’, the final voltage of node N is $|V_{t,p}|$. When input A changes to ‘1’, some charge is injected into node N through the Miller capacitance, which slightly increases the voltage of this node to $|V_{t,p}| + \Delta V_2$. Figure 4-2 presents the SPICE waveforms of the internal node voltage under these two scenarios.

From the above discussion, the 50% propagation delay of the ‘11’ to ‘00’ input transition is expected to be lower in the first case (i.e., for Out1) than in the second case (i.e., for Out2). The SPICE results reported in Figure 4-3 confirm this expectation. Notice that the delay of the ‘11’ to ‘00’ transition differs by 23.2% between the two cases, which is a significant difference. This effect is called the internal node voltage effect or the stack effect. MIS CSMs that ignore this important effect will thus produce inaccurate timing, especially for lightly loaded logic cells.

Figure 4-2. The voltage of internal nodes. (Plot: voltage (V) vs. time (s); traces: inputs A and B, internal nodes N1 and N2.)

Figure 4-3. Change in the output voltage as a result of the stack effect (only the second transition is shown). The difference in delay between the two NOR2 cells is 23.2%. (Plot: voltage (V) vs. time (s); traces: inputs A and B, outputs Out1 and Out2.)

4.3 CS Modeling: Multiple Input Switching

This section explains the components of the CS model of a combinational logic cell when multiple inputs switch simultaneously. For the sake of simplicity, we assume that no more than two inputs switch at the same time. More precisely, even when the logic cell has more than two inputs, we model the cell based on at most two varying inputs (all other inputs are set to their non-controlling values). This modeling decision trades off the accuracy of the results against the space and time complexity of model characterization and evaluation.

The authors in [9] introduced a CSM for MIS logic cells. In their model, each input and output pin of the cell is modeled with a voltage-dependent current source and a nonlinear capacitor, and each component depends on all input voltages and the output voltage. This model may be quite inaccurate because it ignores the internal node voltages. As shown in the experimental results, an error of more than 20% can be expected if the stack effect is ignored.

Figure 4-4. A simple MIS model for a NAND2 logic cell without internal nodes. (a) The cell with inputs A and B and output Out. (b) The model: input capacitances $C_A$ and $C_B$, Miller capacitances $C_{mA}$ and $C_{mB}$, output capacitance $C_o$, and output current source $I_o(V_A, V_B, V_o)$.

A simple CS model that extends the model described in Chapter 2 to handle the MIS case is shown in Figure 4-4(b). The major shortcoming of this MIS model is that it does not account for the initial voltage and stored charge of the internal node capacitances.

4.3.1 Modeling the Internal Nodes

In the following subsections, we describe how to model the initial internal node voltages and account for their effect on the output voltage calculation. The modeling of the internal node is based on the following observation.
The voltage at node ‘Out’ in Figure 4-5(a) is a function of the voltage at node N. Moreover, $V(N)$ itself depends on the voltages at nodes A, B, and Out, as well as on the circuit parameters. Therefore, node N acts as both an input and an output. To calculate $V(N)$, we model the circuit at node N with a voltage-dependent current source and a voltage-dependent capacitance (cf. Figure 4-5(b)). For simplicity, we do not model the Miller effect between node N and the other nodes; our simulation results indicate that this simplification still yields acceptable accuracy.

Figure 4-5. Internal node modeling. (a) Transistor stack with internal node N (supply $V_{dd}$, inputs A and B, output Out, node capacitance $C_N$). (b) Model of node N: current source $I_N(V_A, V_B, V_N, V_o)$ and capacitance $C_N$.

4.3.2 MSCSM: Pre-characterization

This subsection explains how to characterize the dependent current sources and the various capacitances of the proposed CSM.

Figure 4-6. Complete model for MSCSM: input capacitances $C_A$ and $C_B$; Miller capacitances $C_{mA}$ and $C_{mB}$; output capacitance $C_o$ and output current source $I_o(V_A, V_B, V_N, V_o)$; internal node capacitance $C_N$ and current source $I_N(V_A, V_B, V_N, V_o)$.

As shown in Figure 4-6, the proposed MSCSM consists of six nonlinear capacitive components, namely, input and output parasitic capacitances that model the parasitic loading at the input and output nodes of the logic cell, Miller capacitances that model the capacitive coupling between the input nodes and the output node, and the internal node capacitance. The model also has two nonlinear current sources, one at the output node and one at the internal node. Each component is in turn a function of the voltages at the input, output, and internal nodes. As a result, the cell model is represented by the following two KCL equations, which account for the currents at the output pin and at the internal node of the cell during switching.
$$i_o + I_o(\vec V) + \left\{C_o(\vec V) + C_{mA}(\vec V) + C_{mB}(\vec V)\right\}\frac{\partial V_o}{\partial t} - C_{mA}(\vec V)\frac{\partial V_A}{\partial t} - C_{mB}(\vec V)\frac{\partial V_B}{\partial t} = 0 \qquad (6)$$

$$I_N(\vec V) + C_N(\vec V)\frac{\partial V_N}{\partial t} = 0 \qquad (7)$$

where $\vec V = (V_A, V_B, V_N, V_o)$ is a 4-D voltage vector and $i_o$ is the current at the ‘Out’ node (which is sourced by the load, or by the voltage source connected to ‘Out’ in the pre-characterization step). The Miller capacitances $C_{mA}$ and $C_{mB}$, the output capacitance $C_o$, and the internal node capacitance $C_N$ are pre-characterized with a series of SPICE transient simulations in which saturated ramp voltages are applied to the input, output, and/or internal nodes while the output current is monitored. 4-D tables are used to store the $C_{mA}$, $C_{mB}$, $C_o$, and $C_N$ values. Ramp waveforms with different slopes are used, the capacitance values are calculated for each slope, and finally the average value of each parasitic capacitance is stored. Changing the slope of the ramp voltage waveform in the pre-characterization process has a very small effect on the pre-characterized capacitance values.

The values of the current sources $I_o$ and $I_N$ in response to DC voltage levels on the inputs, output, and internal node are also determined for each logic cell. These DC voltages are swept from $-\Delta v$ to $V_{dd} + \Delta v$, where $\Delta v$ is a safety margin for cases in which the voltage rises above $V_{dd}$ or falls below zero. The currents at the output and internal nodes are measured in SPICE, and the $I_o$ and $I_N$ current sources are characterized for different $\vec V = (V_A, V_B, V_N, V_o)$ values. As a result, to model the nonlinear behavior of a logic cell with respect to its input, output, and internal node voltages, 4-D lookup tables are created to store the values of $I_o(\vec V)$ and $I_N(\vec V)$.

Precise estimation of the output load is critical for accurate output voltage calculation of a cell.
The output node of a cell is usually connected to several fan-out cells, either directly or through an interconnect line. The input parasitic capacitances of the fan-out cells should thus be considered part of the load when calculating the output voltage of the driver cell. The following equation is used to characterize the parasitic capacitance seen at input A of a cell; a similar formula can be used for node B.

$$i_A = \left\{C_A(\vec V) + C_{mA}(\vec V)\right\}\frac{\partial V_A}{\partial t} - C_{mA}(\vec V)\frac{\partial V_o}{\partial t} \qquad (8)$$

A SPICE-based transient analysis is similarly used to determine $C_A$ and $C_B$. In this analysis, a saturated ramp is applied to one input while the output node is connected to a DC voltage source, and the input current is measured. Although the input parasitic capacitances $C_A$ and $C_B$ are in fact functions of the input and output voltages, in practice only input-voltage-dependent $C_A$ and $C_B$ can be used efficiently. This is because when the output voltage waveform of a logic cell is calculated, the output voltages of its fan-out cells are unknown, so the $C_A$ and $C_B$ values of the fan-out cells cannot use any information about those output voltage levels. That is why making $C_A$ and $C_B$ dependent on $V_o$ is not useful in practice.

4.3.3 Voltage Calculation

The logic cell pre-characterization steps of the model are load-independent, because the model components are characterized as functions of the input, output, and internal node voltages rather than the input slew and the effective output capacitance. Therefore, the output voltage waveform can be constructed for a given input voltage waveform in the presence of an arbitrary load. Note that the current drawn by the load can always be written as a function of the output voltage of the logic cell and the load components.
Using this load current, a KCL equation can be written at the cell output node as a function of the cell output and input voltages, the pre-characterized cell components, and the load electrical parameters. For simplicity, in the remainder of this section we show the KCL equation for a simple capacitive load $C_L$ (i.e., the load current is simply $C_L\,\partial V_o/\partial t$). Writing the KCL at the output node in discrete time, with step $\Delta t = t_{k+1} - t_k$, and solving for the output voltage gives:

$$V_o(t_{k+1}) = V_o(t_k) + \frac{C_{mA}\left[V_A(t_{k+1}) - V_A(t_k)\right] + C_{mB}\left[V_B(t_{k+1}) - V_B(t_k)\right] - I_o\bigl(\vec V(t_k)\bigr)\,\Delta t}{C_L + C_o + C_{mA} + C_{mB}} \qquad (9)$$

The internal node voltage $V_N$ is evaluated with Equation (10):

$$V_N(t_{k+1}) = V_N(t_k) - \frac{I_N(\vec V)\,\Delta t}{C_N} \qquad (10)$$

Equations (9) and (10) are used iteratively. The updated value of $V_o$ at $t_{k+1}$ from (9) is used to evaluate $I_N(\vec V)$ in (10). Similarly, the updated $V_N$ at $t_{k+1}$ is used when $V_o(t_{k+2})$ is calculated.

The internal node effect is smaller when the fan-out load is much larger than the diffusion capacitances of the driver cell, because the additional output current needed to charge the internal capacitances becomes less significant when the output current is large. The complete MSCSM can therefore be used selectively for different logic cells based on their output load: the simple MSCSM of Figure 4-4(b) can be used for logic cells that drive a relatively large load, while the complete MSCSM of Figure 4-6 should be used otherwise.

4.4 Experimental Results

To study the accuracy of the proposed model, we performed extensive simulations and compared our MSCSM with HSPICE [31]. A 130nm cell library with a supply voltage of 1.2V was used in these simulations. The experiments involved common logic cells, i.e., NAND and NOR cells.
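The iterative evaluation of Eqs. (9) and (10) described in Section 4.3.3 can be sketched as a simple time-stepping loop. This is a minimal illustration under assumptions, not the authors' implementation. The names `simulate_mscsm`, `constant`, and `toy` are hypothetical. The `model` object bundles the pre-characterized components as callables of $(V_A, V_B, V_N, V_o)$; in the real flow these would be lookups into the 4-D tables. The toy NAND2-like component values below are made up, with constant capacitances, a linear pull-down conductance, and a linear discharge path for node N.

```python
"""Minimal forward-Euler evaluation of Eqs. (9) and (10) for a capacitive load.

Hypothetical sketch: `model` bundles the pre-characterized MSCSM components as
callables of (V_A, V_B, V_N, V_o); here they are toy stand-ins, not table data.
"""
from types import SimpleNamespace


def simulate_mscsm(times, v_a, v_b, v_o0, v_n0, c_load, model):
    """Return (V_o, V_N) samples for sampled input waveforms v_a and v_b."""
    v_o, v_n = [v_o0], [v_n0]
    for k in range(len(times) - 1):
        dt = times[k + 1] - times[k]
        vec = (v_a[k], v_b[k], v_n[k], v_o[k])
        c_ma, c_mb = model.c_ma(*vec), model.c_mb(*vec)
        c_total = c_load + model.c_o(*vec) + c_ma + c_mb
        # Eq. (9): Miller feed-through of the input changes minus the cell current.
        v_o_next = v_o[k] + (c_ma * (v_a[k + 1] - v_a[k])
                             + c_mb * (v_b[k + 1] - v_b[k])
                             - model.io(*vec) * dt) / c_total
        # Eq. (10): node N, evaluated with the freshly updated V_o.
        vec_n = (v_a[k], v_b[k], v_n[k], v_o_next)
        v_n_next = v_n[k] - model.i_n(*vec_n) * dt / model.c_n(*vec_n)
        v_o.append(v_o_next)
        v_n.append(v_n_next)
    return v_o, v_n


def constant(value):
    """Toy stand-in for a table entry that ignores its voltage arguments."""
    return lambda va, vb, vn, vo: value


# Toy NAND2-like components with made-up values (not characterized data):
# the pull-down path (output and node N) conducts only when both inputs are high.
toy = SimpleNamespace(
    io=lambda va, vb, vn, vo: 1e-4 * vo if min(va, vb) > 0.6 else 0.0,
    i_n=lambda va, vb, vn, vo: 5e-5 * vn if min(va, vb) > 0.6 else 0.0,
    c_o=constant(1e-15),
    c_ma=constant(0.2e-15),
    c_mb=constant(0.2e-15),
    c_n=constant(0.5e-15),
)
```

With both inputs held high, the toy output discharges through the pull-down conductance. A step on input A alone only couples a bump of $C_{mA}\,\Delta V_A/(C_L + C_o + C_{mA} + C_{mB})$ onto the output, which is exactly the first term of Eq. (9).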
In the first experiment, we compared the accuracy of our MSCSM in modeling the delay of a NOR2 gate for the fast and slow transitions shown earlier. The result of this simulation is shown in Figure 4-7. The figure shows that MSCSM captures the effect of the internal node and accurately models the delay. The maximum error of MSCSM for these two cases is 3%, while the CSM presented in Figure 4-4 results in a 22% error. Figure 4-8 compares the results of HSPICE and the proposed model when a glitch occurs at the output of a NOR cell. The MSCSM models the complete logic cell, and the output waveforms it generates closely follow the HSPICE waveforms. Figure 4-9 displays the internal node voltage waveform for the glitch example shown in Figure 4-8. The red line is the HSPICE simulation result and the brown one is the result of the proposed model.

Figure 4-7. MSCSM waveforms compared to HSPICE simulations for the fast and slow cases (voltage vs. time in ns; traces IN1, IN2, and OUT1 and OUT2 from SPICE and MSCSM).

Figure 4-8. Using MSCSM to accurately model glitches (voltage vs. time in ns; traces In1, In2, and OUT from SPICE and MSCSM).

Figure 4-9. The internal node voltage waveform obtained with HSPICE and with the proposed model for the glitch example shown in Figure 4-8 (voltage vs. time in ns; traces In1, In2, and N from SPICE and MSCSM).

Figure 4-10 shows another example simulated by MSCSM. Here input B has a constant voltage while input A is coupled to an aggressor. The shape of the waveform greatly impacts the accuracy of timing analysis, and delay and output slew metrics alone may not be sufficient to reconstruct the shape of the waveform. Our model computes output waveforms whose actual shape is close to that obtained from SPICE. We use the Root Mean Squared Error (RMSE) as a metric to compare waveform similarity.
RMSE is defined as:

$$ \mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{k=1}^{N}\big(V_{SPICE}(t_k) - V_{MSCSM}(t_k)\big)^2} \qquad (11) $$

V_SPICE and V_MSCSM are the output voltages of the logic cell at a given time. For each experiment, k=1 corresponds to t_1, the time at which the noisy input starts to change, whereas k=N corresponds to t_N, when the output node reaches its stable final value (either high or low). We finally normalize the RMSE to V_dd to remove the effect of V_dd scaling. Input line A of the NOR2 logic cell is coupled through a 50fF coupling capacitance to an aggressor line. Both the victim and aggressor lines are driven by minimum-sized inverters, and the NOR2 logic cell has an FO2 load. To generate different noisy waveforms, the arrival time of the signal transition at the input of the victim line driver is set to 2.2ns, while that of the aggressor driver (i.e., the noise injection time) is swept over a 1ns period, from 2ns to 3ns, with a time step of 10ps. Figure 4-11 shows the 50% delay error between the MSCSM and HSPICE output waveforms over the range of noise injection times. The average RMSE is 1.4% of V_dd, which confirms that our voltage waveform closely matches that produced by HSPICE.

Figure 4-10. Effect of multiple input switching: inputs A and B (In1 and In2) are changing simultaneously (voltage vs. time in ns; traces In1, In2, and OUT from SPICE and MSCSM).

Figure 4-11. Delay error (ps) vs. noise injection time (ns).

4.5 Summary

In this chapter we presented an accurate current source model (CSM) for multiple-input switching in CMOS logic cells that effectively captures the internal node voltage effect. This model, called MSCSM, is especially useful for delay and noise analysis of lightly loaded cells.
We showed that, because they neglect the effect of internal node voltages, previous multiple-input switching current source models such as [9] may produce significant delay calculation errors, especially for lightly loaded cells. We also showed that the accuracy of our proposed technique, which captures the internal node effect, is comparable to that of HSPICE. More precisely, the maximum delay error of our model is 3%, while that of a CSM without this capability is about 22%.

CHAPTER 5: A CURRENT SOURCE MODEL FOR CMOS LATCHES AND FLIP-FLOPS

A current source model for register cells, i.e., latches and master-slave flip-flops, is presented. Timing and noise analysis using current source models will not become a reality unless accurate current source models are developed for sequential cells. In addition to the multi-stage logic nature of sequential cells, the key difficulty is the presence of feedback loops. Our proposed model addresses these problems by characterizing the cell with suitable current source and parasitic components. Given input and clock voltage waveforms of arbitrary shape, our model can accurately compute the output voltage waveform of a register cell, and hence the timing parameters associated with the cell. All timing arcs, such as the clock-to-Q propagation delay, can be accurately calculated. This in turn enables accurate timing analysis of the next stage of the circuit, which is fed by a latch or a flip-flop. Experimental results for our current source sequential cell model demonstrate close-to-SPICE waveforms with a speedup of three orders of magnitude.

5.1 Introduction

The incompatibility of voltage-based pre-characterization data with noisy waveforms necessitates additional waveform-aware characterization steps of the logic cells for the purpose of noise analysis.
For example, in [50], noise analysis is performed for feedback loops to check whether the noise transferred from the output back to the input is strong enough to change the state of the circuit. A key advantage of our CS model is that it can handle any type of input voltage waveform, including full-swing hazardous pulses and partial glitches, e.g., a crosstalk-induced noise pulse or glitch. Consequently, no extra characterization steps, such as the ones in [50], are needed. In addition to being C_eff-independent and able to handle waveforms of arbitrary shape, CSM is compelling because, instead of propagating only delay and slew values, it can propagate the whole voltage waveform, e.g., as a set of <time, voltage> pairs. CSM can perform this propagation along the whole timing path from primary input to primary output. The high accuracy of CS models makes them attractive for use inside a signoff timing analysis tool. Once a set of critical paths is identified by a standard static timing analysis tool, CS models of the logic cells along a target critical path may be used to provide an accurate yet highly efficient evaluation of the timing criticality and/or noise susceptibility of that path. Close-to-SPICE accuracy at speeds orders of magnitude higher than those of SPICE tools makes CSM-based analysis very attractive. For example, [35] describes an efficient CS-based technique for worst-case aggressor alignment that can reduce the pessimism of conventional voltage-based techniques by 50%. All previous CSM approaches have targeted combinational logic cells. However, each combinational part of the circuit is fed by, and has its outputs captured by, a set of sequential cells. Therefore, the lack of a CSM for sequential circuit elements makes it impossible to have a complete CSM-based solution for the delay and noise analysis and optimization steps.
Our CSM for sequential cells makes it possible to construct the exact voltage waveforms at their outputs, and hence to drastically reduce the pessimism of timing arc calculations and setup/hold tests. To the best of our knowledge, we are the first to introduce CS modeling of sequential cells. One deficiency of typical sequential cell models is that they report an unknown output if the setup/hold time tests are violated. A key benefit of the proposed model for CMOS register cells is that the output waveform can be computed even when setup/hold time violations occur, which can be very useful for diagnostic purposes. The remainder of this chapter is organized as follows. Section 5.2 presents our sequential CSM. Section 5.3 describes the pre-characterization steps. Section 5.4 shows how to perform the voltage calculation for the output of the latch. Sections 5.5 and 5.6 explain the CSM for master-slave flip-flops and SR latches, respectively. Finally, the experimental results and conclusions are presented.

5.2 Current Source Modeling for Latches

This section explains the components of the CS model of a CMOS latch as well as the characterization process. As mentioned earlier, the CS model can be used to calculate the output voltage waveform for an input voltage waveform of arbitrary shape, including one with noise-induced glitches. These glitches can cause functional errors if they are latched into sequential cells. Unlike typical voltage-based models that need additional pre-characterization, our CSM can construct the output voltage waveform and detect whether or not the latch has failed. Most sequential cells, such as flip-flops and latches, have at least one feedback loop to store a logic state. As an example, Figure 5-1 shows a simple latch with a data input D, a clock input CLK, a true output Q, and a complementary output Q_bar.
The goal is to devise a current-based model capable of computing the output voltage waveforms (at nodes Q and/or Q_bar) given the input voltage waveforms at the data and clock nodes. Note that the input waveforms can be noisy, which in turn can make the output waveforms noisy. The feedback loop is the most challenging part of such a model, because noise transferred to the output node through the path from the inputs can be magnified by the feedback and fed back to the input. The model must accurately account for this feedback magnification effect.

Figure 5-1. A positive level-sensitive CMOS latch (data input D, clocks CLK and CLK_bar, outputs Q and Q_bar).

Transmission gates (pass transistors) are commonly used in sequential circuits; therefore, it is important to provide a CSM for the pass transistor shown in Figure 5-2(a). A pass transistor acts mainly as a nonlinear resistor between its input and output nodes. Its resistance is set by its gate inputs (VG and its inverse VG_bar) as well as by the input and output voltage levels. This nonlinear resistance can be well modeled by a dependent current source (cf. Figure 5-2(b)). Because the effects of both G and G_bar must be considered, the dependency becomes four-dimensional (Vin, Vout, VG, VG_bar). However, this makes the pre-characterization process too complex. Moreover, because the resulting lookup tables would have high dimensionality, maintaining and using them during timing analysis becomes more expensive. To remedy this situation, we devised the following characterization technique, which reduces one 4-D dependency to two 3-D dependencies. The characterization of the pass transistor and its model is divided into two parts, one with respect to node G and another with respect to G_bar.
In the first part, G_bar is forced to the HIGH voltage level. This turns off the PMOS transistor so that the NMOS transistor can be characterized. Each component in this model then depends on three voltage values, Vin, Vout, and VG (cf. Figure 5-2(c)).

Figure 5-2. (a) A pass transistor, (b) its 4-D CSM, in which I, C_IN, C_G, and C_OUT are all functions of (V_IN, V_OUT, V_G, V_G_bar), (c) its 3-D CSM for node G, in which all elements are functions of (V_IN, V_OUT, V_G) (the same model is used with respect to G_bar), (d) the decoupled version of the 3-D CSM, with current sources I_IN and I_OUT at the input and output nodes.

Similarly, the second part of the characterization models the PMOS transistor by forcing G to the LOW voltage level, which turns off the NMOS transistor. The circuit model looks similar to the one in Figure 5-2(c), except that the component dependencies are now on Vin, Vout, and VG_bar. For a complete model of the pass transistor, the two nonlinear current sources I(Vin,Vout,VG) and I(Vin,Vout,VG_bar) should be placed in parallel. For simplicity, only I(Vin,Vout,VG) is shown in the figures. The current sources in Figure 5-2(b) and (c) can be decoupled into two current sources at the input and output nodes, resulting in Figure 5-2(d). Note that C_IN and C_OUT in Figure 5-2(d) include the parasitic effects seen at the input and output nodes, respectively. There are also Miller capacitances between nodes G and IN and between G and OUT. These Miller capacitances are decoupled and merged into C_IN and C_OUT. To calculate the output voltage waveform, it is sufficient to write the KCL equation at the output node.
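The decomposition of the 4-D pass-transistor dependency into two 3-D tables can be illustrated with a small Python sketch. Trilinear interpolation between grid points is our assumption, since the text does not specify how the lookup tables are interpolated. `Table3D` and `pass_gate_current` are hypothetical names. The PMOS table takes VG_bar as its third coordinate.

```python
from bisect import bisect_right


def _bracket(axis, x):
    """Return (i, f) with axis[i] <= x <= axis[i+1] and f the fractional position.

    x is clamped to the table range, so the lookup never extrapolates.
    """
    x = min(max(x, axis[0]), axis[-1])
    i = min(bisect_right(axis, x) - 1, len(axis) - 2)
    return i, (x - axis[i]) / (axis[i + 1] - axis[i])


class Table3D:
    """Trilinear interpolation over a regular 3-D characterization grid."""

    def __init__(self, ax0, ax1, ax2, values):
        # values[i][j][l] holds the SPICE-measured quantity at (ax0[i], ax1[j], ax2[l]).
        self.axes = (ax0, ax1, ax2)
        self.values = values

    def __call__(self, x0, x1, x2):
        (i, fi), (j, fj), (l, fl) = (_bracket(a, x) for a, x in zip(self.axes, (x0, x1, x2)))
        total = 0.0
        for di, wi in ((0, 1.0 - fi), (1, fi)):
            for dj, wj in ((0, 1.0 - fj), (1, fj)):
                for dl, wl in ((0, 1.0 - fl), (1, fl)):
                    total += wi * wj * wl * self.values[i + di][j + dj][l + dl]
        return total


def pass_gate_current(i_nmos, i_pmos, v_in, v_out, v_g, v_g_bar):
    """Total pass-gate current: the two 3-D sources placed in parallel.

    i_nmos is characterized with G_bar forced HIGH (PMOS off) and i_pmos with
    G forced LOW (NMOS off); each table sees only three of the four voltages.
    """
    return i_nmos(v_in, v_out, v_g) + i_pmos(v_in, v_out, v_g_bar)
```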
Therefore, the capacitive and current source elements connected to IN and G in Figure 5-2(c) are not needed for that purpose. However, they are needed when the pass transistor acts as a load. At each time instant, the latch can be in one of three modes: transparent, hold, or transition. To obtain an accurate CSM, the behavior of the latch in each mode must be carefully investigated. In the following discussion we describe the CSM for each mode. This step-by-step presentation helps convey the intuition behind our complete model. Note that we do not need to select different models for different modes of operation. We will present a complete CSM (Figure 5-5) that covers all modes and adapts itself to calculate the output voltage in any mode.

5.2.1 Steady-state transparent mode (CLK=1)

In this mode CLK = 1 (and CLK_bar = 0), and the latch is transparent, i.e., Q = D. The pass transistor connected to the input data is conducting, while the pass transistor in the feedback path is OFF. The inverter between Q and Q_bar operates and passes the inverse of Q, and hence the inverse of D, to Q_bar (cf. Figure 5-3(a)). The latch CSM in this mode is obtained by connecting the CSMs of the inverter and the pass transistor in series, resulting in the model depicted in Figure 5-3(b). Note that Figure 5-3(b) shows only the elements of the pass gate at node Q. The circuit components related to nodes D and CLK are similar to those in Figure 5-2(d) and are not repeated here. Recall that these components are useful only when the latch acts as a load.

Figure 5-3. (a) Latch of Figure 5-1 in transparent mode, (b) its CSM, with elements I_Q_bar(V_Q,V_Q_bar), C_Q(V_Q,V_Q_bar), C_Q_bar(V_Q,V_Q_bar), C_M(V_Q,V_Q_bar), I_D(V_D,V_Q,V_CLK), and C_D(V_D,V_Q,V_CLK).

5.2.2 Steady-state hold mode (CLK=0)

In this mode CLK = 0 (and CLK_bar = 1), so the pass transistor in the feedback loop is conducting.
A feedback loop is thus established in which the two inverters drive one another around the loop, while the input data is disconnected from the rest of the latch circuit (cf. Figure 5-4(a)). As mentioned before, the feedback loop is the most challenging part of a sequential element. The inverter model of Figure 2-2 can be used back to back for this case (see Figure 5-4(b)).

Figure 5-4. (a) Latch of Figure 5-1 in hold mode, (b) its CSM, with elements I_Q_bar(V_Q,V_Q_bar), I_Q(V_Q,V_Q_bar), C_Q(V_Q,V_Q_bar), C_Q_bar(V_Q,V_Q_bar), and C_M(V_Q,V_Q_bar).

5.2.3 Transition mode (switching CLK)

This mode exists when the clock signal is making a falling or rising transition, i.e., when CLK or CLK_bar is not in a steady (high or low) state. The two pass transistors may be in an intermediate region between their ON and OFF states. Therefore, the link from the input data to the rest of the latch circuit and/or the feedback path can be in a transient state. In the hold mode, the feedback loop is completely closed and the two cross-coupled inverters are connected back to back. In the transition mode, by contrast, the current delivered to Q through the feedback path is controlled by CLK (and CLK_bar). If CLK=1, this current is zero. If CLK=0, it equals the output current of the feedback inverter, i.e., IQ in Figure 5-4(b). To account for this controlling behavior of the CLK signal, we add one more dimension to IQ (a dependency on VCLK), written as IQ(VQ,VQ_bar,VCLK). This current also depends on VCLK_bar. That dependency can be modeled by a parallel current source IQ(VQ,VQ_bar,VCLK_bar), which is not shown in the figure. This model also works when the feedback loop is open, i.e., CLK=1. In that case, however, the pass transistor on the path between D and Q is conducting.
Therefore, the CS model should be a superset of the CSMs in Figure 5-2(b) and Figure 5-4(b), with IQ made dependent on VCLK. The result is the complete CSM for the latch, depicted in Figure 5-5. The CSM of Figure 5-5 can handle waveforms of arbitrary shape at both the D and CLK inputs. It can construct the voltage waveforms at nodes Q and Q_bar in any operating mode of the latch. The transition mode is one of the main factors that come into play when a setup or hold time test is violated. In the experimental section we present examples of such cases to show the accuracy of our CSM-based sequential cell modeling.

Figure 5-5. CSM for the CMOS latch in Figure 5-1, with elements I_Q_bar(V_Q,V_Q_bar), I_Q(V_Q,V_Q_bar,V_CLK), C_Q(V_Q,V_Q_bar), C_Q_bar(V_Q,V_Q_bar), C_M(V_Q,V_Q_bar), I_D(V_D,V_Q,V_CLK), and C_D(V_D,V_Q,V_CLK).

5.3 CSM for sequential cells: Pre-characterization

In the following subsections we describe how to pre-characterize the CSM of Figure 5-5 using a SPICE tool.

5.3.1 Non-linear Voltage-Controlled Current Sources

The current sources at nodes Q and Q_bar capture the nonlinear behavior of the latch. To characterize IQ(VQ,VQ_bar,VCLK) and IQ_bar(VQ,VQ_bar), the CLK, CLK_bar, Q, and Q_bar pins are forced to DC values using DC power supplies. The analysis for CLK_bar is similar to that for CLK, so from now on we do not repeat it. Each power supply is swept from zero to (VDD+∆), where the safety margin ∆ accounts for cases in which the clock and/or output voltages overshoot beyond VDD. The currents sourced by the Q and Q_bar pins are measured in SPICE. As in Figure 2-2, the current sourced by the Q_bar pin depends only on the DC voltages of Q and Q_bar, since it is determined by the inverter between these two pins. As a result, a 2-D lookup table is constructed to store the values of IQ_bar. However, the current IQ sourced at node Q depends on all three voltage sources.
This is because the transmission gate in the feedback loop controls the current sourced by the Q pin. Figure 5-6 shows the characterization setup for IQ(VQ,VQ_bar,VCLK) and IQ_bar(VQ,VQ_bar). Note that the DC voltage source applied to the CLK_bar pin (the inverse of the voltage applied to CLK) is not shown in the figure. To characterize ID(VD,VQ,VCLK), we apply DC voltage sources to the three terminals of the transmission gate and measure the current in SPICE. For this characterization we divide the latch into two parts (Figure 5-6(a) and Figure 5-6(b)). We then measure the current sourced by the Q node in two separate steps, as IQ(VQ,VQ_bar,VCLK) and ID(VD,VQ,VCLK). The alternative would be to take the complete latch circuit (Figure 5-1), connect four DC voltage sources to the D, Q, Q_bar, and CLK pins, and create a 4-D table for the current sourced by the Q pin. However, such a table would contain many entries that are not of interest. We therefore use the simpler approach, which results in two 3-D tables instead of a single 4-D table. This is clearly more desirable from a memory efficiency viewpoint.

Figure 5-6. IQ(VQ,VQ_bar,VCLK) and IQ_bar(VQ,VQ_bar) characterization setup: (a) the feedback part of the latch (nodes Q, Q_bar, CLK, and CLK_bar), driven by sources CH1, CH2, and CH3; (b) the input transmission gate between D and Q, driven by sources CH1, CH2, and CH3.

5.3.2 Non-linear Voltage-Dependent Capacitances

These nonlinear capacitances capture the parasitic effects that exist in the latch. CQ_bar(VQ,VQ_bar) captures the parasitic capacitance at the output pin (Q_bar) of the latch. CM(VQ,VQ_bar) is the Miller capacitance between the Q and Q_bar nodes. Similarly, CQ(VQ,VQ_bar) is the parasitic effect seen at node Q in Figure 5-6(a). Finally, CD(VD,VQ,VCLK) models the parasitic capacitance seen at node Q in Figure 5-6(b). CQ and CQ_bar are represented as functions of VQ and VQ_bar. Their dependency on VCLK is a second-order effect and is negligible.
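The memory argument for two 3-D tables can be quantified with simple arithmetic. With n sweep points per voltage axis, one 4-D table has n^4 entries, whereas the two 3-D tables have 2n^3, a reduction by a factor of n/2. The Python sketch below assumes, purely for illustration, VDD = 1.2V, a margin ∆ = 0.3V, and a 50mV sweep step. These values are not taken from the characterization in this work, and the function names are hypothetical.

```python
def sweep_grid(vdd, margin, step):
    """DC sweep points from 0 to VDD + margin (inclusive) with a uniform step."""
    n = round((vdd + margin) / step) + 1
    return [i * step for i in range(n)]


def entry_counts(points_per_axis):
    """Compare one 4-D current table with the two 3-D tables used for the latch."""
    single_4d = points_per_axis ** 4       # I(V_D, V_Q, V_Q_bar, V_CLK)
    split_3d = 2 * points_per_axis ** 3    # I_Q(V_Q, V_Q_bar, V_CLK) + I_D(V_D, V_Q, V_CLK)
    return single_4d, split_3d


grid = sweep_grid(1.2, 0.3, 0.05)          # assumed VDD, margin, and step
print(len(grid), entry_counts(len(grid)))
```

With these assumed values the sweep has 31 points per axis, which gives 923,521 entries for the single 4-D table versus 59,582 for the two 3-D tables, about 15.5 times fewer.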
Nevertheless, we use values of these parameters averaged over different VCLK values. To better understand the pre-characterization process for these capacitors, we first write the KCL equation at node Q_bar of the model in Figure 5-5:

$$ i_o + I_{Q\_bar}(V_Q, V_{Q\_bar}) - C_M(V_Q, V_{Q\_bar})\,\frac{dV_Q}{dt} + \big[C_M(V_Q, V_{Q\_bar}) + C_{Q\_bar}(V_Q, V_{Q\_bar})\big]\frac{dV_{Q\_bar}}{dt} = 0 \qquad (12) $$

Several transient analysis steps are performed to characterize the capacitive elements of our CSM. To pre-characterize the Miller capacitance CM, a saturated ramp input voltage is applied to node Q (voltage source CH1 in Figure 5-6(a)). Simultaneously, DC voltage sources are applied to nodes Q_bar and CLK (CH2 and CH3 in Figure 5-6(a)), and these DC values are swept from 0 to (VDD + ∆). The terms containing dV_Q_bar/dt in Eqn. (12) are thus zero. With this setup, the current sourced by the Q_bar pin (the current i_o through CH2) is monitored. IQ_bar(VQ,VQ_bar) is then evaluated at the corresponding voltages of the Q and Q_bar nodes and plugged into the equation. Since IQ_bar(VQ,VQ_bar) has already been characterized (Section 5.3.1), the only unknown in Eqn. (12) is CM(VQ,VQ_bar), which can therefore be calculated. We observed that the slope of the ramp input does not change the characterization results. Nevertheless, we examined ramp signals with different slopes and filled the lookup tables with parameter values averaged over all ramps. A similar procedure is used to characterize the capacitance CQ_bar(VQ,VQ_bar), except that the ramp voltage is applied to node Q_bar while CLK and Q are forced to DC values. As for the Miller and output capacitances, a transient analysis is used to determine CQ. In this analysis, a saturated ramp is applied to input Q (CH1 in Figure 5-6(a)), DC voltage sources are applied to CH2 and CH3, and the input current i_CH1 is monitored.
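The Miller-capacitance extraction from Eq. (12), described above, can be sketched in Python. With Q_bar held by a DC source, dV_Q_bar/dt = 0 and Eq. (12) gives C_M = (i_o + IQ_bar(VQ,VQ_bar)) / (dV_Q/dt). The sketch assumes that the runs for different ramp slopes are sampled at the same VQ points, so their estimates can be averaged pointwise. This mirrors the averaging over ramp slopes mentioned in the text. The function name and the data layout are hypothetical.

```python
def extract_miller_cap(ramp_runs, i_qbar, v_qbar):
    """Estimate C_M(V_Q, V_Q_bar) at one fixed DC V_Q_bar from Eq. (12).

    With Q_bar held by a DC source, dV_Q_bar/dt = 0 and Eq. (12) gives
        C_M = (i_o + I_Q_bar(V_Q, V_Q_bar)) / (dV_Q/dt).
    ramp_runs : one list per ramp slope of (v_q, dvq_dt, i_o) samples taken on
        the linear part of the ramp. All runs are assumed to be sampled at the
        same V_Q points (a simplifying assumption of this sketch).
    i_qbar : already-characterized 2-D current table, callable (v_q, v_q_bar).
    Returns (V_Q, C_M) pairs, with C_M averaged over all ramp slopes.
    """
    per_run = [
        [(i_o + i_qbar(v_q, v_qbar)) / dvq_dt for v_q, dvq_dt, i_o in run]
        for run in ramp_runs
    ]
    v_points = [v_q for v_q, _, _ in ramp_runs[0]]
    averaged = [sum(vals) / len(vals) for vals in zip(*per_run)]
    return list(zip(v_points, averaged))
```

Repeating this for each swept DC value of VQ_bar fills one row of the 2-D C_M table at a time.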
The KCL equation at node Q for the corresponding CSM of Figure 5-6(a) is:

$$ i_{CH1} + I_Q(V_Q, V_{Q\_bar}, V_{CLK}) + \big[C_Q(V_Q, V_{Q\_bar}) + C_M(V_Q, V_{Q\_bar})\big]\frac{dV_Q}{dt} - C_M(V_Q, V_{Q\_bar})\,\frac{dV_{Q\_bar}}{dt} = 0 \qquad (13) $$

where dV_Q_bar/dt is zero. Therefore, the CQ values can be calculated and stored in a 2-D table. A similar approach is used to characterize the parasitic capacitances of the transmission gates. As stated before, our model accounts for the Miller effects between CLK and pins Q and D by decoupling those capacitances and adding them to the capacitance of each pin.

5.4 Voltage Calculation Based on the CSM of Sequential Cells

An immediate benefit of applying our latch CSM in timing and noise analysis is that its pre-characterization steps become load-independent. This is because the model elements are characterized as functions of the input and output voltage values instead of the input slew and effective output capacitance. Therefore, the output voltage waveforms at nodes Q and Q_bar can be constructed for given input voltage waveforms (at CLK and D) in the presence of any arbitrary load. Writing the KCL equations at nodes Q and Q_bar of our model enables us to calculate the voltage waveforms at these nodes. The KCL equation at node Q is:

$$ I_D(V_D, V_Q, V_{CLK}) + I_Q(V_Q, V_{Q\_bar}, V_{CLK}) - C_M(V_Q, V_{Q\_bar})\,\frac{dV_{Q\_bar}}{dt} + \big[C_D(V_D, V_Q, V_{CLK}) + C_Q(V_Q, V_{Q\_bar}) + C_M(V_Q, V_{Q\_bar})\big]\frac{dV_Q}{dt} = 0 \qquad (14) $$

Another KCL equation can be written at node Q_bar. Note that the current drawn by the load can always be written as a function of the output voltage of the cell driving the load and the electrical elements of the load. Therefore, the KCL equation at node Q_bar can be written as a function of the cell output and input voltages, the pre-characterized cell components, and the load electrical parameters.
To simplify the discussion, we show the KCL equation for a simple capacitive load C_L (hence the load current is simply C_L dVQ_bar/dt):

$$ I_{Q\_bar}(V_Q, V_{Q\_bar}) - C_M(V_Q, V_{Q\_bar})\,\frac{dV_Q}{dt} + \big[C_{Q\_bar}(V_Q, V_{Q\_bar}) + C_M(V_Q, V_{Q\_bar}) + C_L\big]\frac{dV_{Q\_bar}}{dt} = 0 \qquad (15) $$

When the output load is a CMOS circuit, such as another sequential circuit, the CSM of that circuit should be substituted for C_L. The CSM elements typically depend on the input and output voltages. For example, when the output load is a latch, a capacitance and a current source that both depend on the input and output voltages of the load must be considered. However, the output voltage of the load is not yet known. In such cases, we assume that the model elements depend only on the known voltage (the input voltage of the load). We therefore average the entries of the corresponding lookup tables over the rows or columns associated with the unknown voltage.

Next we show how the CSM simulator constructs the voltage waveforms of Q and Q_bar. We first assume that the voltages of Q and Q_bar at time t_k are known (calculated by the CSM in the previous steps). The voltage values at node D at any time instant are also known. Node D is either a primary input or the output of some combinational cell, whose waveform is calculated with a combinational CSM. Note that the CSM analysis (i.e., the calculation of the noisy output voltage) of the last combinational stage can easily be performed simultaneously with that of the latch. More precisely, VD, VQ, and VQ_bar can always be calculated together by using the CSM of the last combinational stage along with the CSM of the sequential cell, and a similar approach can be used to solve for VD, VQ, and VQ_bar iteratively. For simplicity, however, the next subsections show only the calculation of VQ and VQ_bar.
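The averaging described above, for a load whose output voltage is still unknown, can be sketched in Python as follows. The sketch assumes a 2-D load table indexed by (V_in, V_out). Linear interpolation in V_in is our choice, since the text does not specify it. `collapse_output_axis` is a hypothetical name.

```python
def collapse_output_axis(table, v_in_axis):
    """Reduce a 2-D load table T[i][j] = f(V_in[i], V_out[j]) to a function of V_in.

    The load's output voltage is unknown while the driver is being solved,
    so each row is averaged over all V_out entries.
    Returns a callable that linearly interpolates the averaged values in V_in
    (clamped to the table range).
    """
    avg = [sum(row) / len(row) for row in table]

    def lookup(v_in):
        v = min(max(v_in, v_in_axis[0]), v_in_axis[-1])
        for i in range(len(v_in_axis) - 1):
            lo, hi = v_in_axis[i], v_in_axis[i + 1]
            if v <= hi:
                f = (v - lo) / (hi - lo)
                return (1 - f) * avg[i] + f * avg[i + 1]
        return avg[-1]  # only reached for a single-point axis

    return lookup
```

For example, a capacitance table [[1.0, 3.0], [2.0, 4.0]] on the V_in axis [0.0, 1.2] collapses to the averages 2.0 and 3.0, and the lookup returns 2.5 at V_in = 0.6.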
Our CSM simulator constructs the voltage waveforms of Q and Q_bar in two steps, as explained next.

5.4.1 Computing VQ(t_{k+1})

Eqn. (14) can be discretized and solved for VQ, resulting in:

$$ V_Q(t_{k+1}) = V_Q(t_k) + \frac{C_M(V_Q, V_{Q\_bar})\big(V_{Q\_bar}(t_k) - V_{Q\_bar}(t_{k-1})\big) - \Big[I_D\big(V_D(t_{k+1}), V_Q(t_k), V_{CLK}(t_{k+1})\big) + I_Q\big(V_Q(t_k), V_{Q\_bar}(t_k), V_{CLK}(t_{k+1})\big)\Big]\Delta t}{C_Q(V_Q, V_{Q\_bar}) + C_M(V_Q, V_{Q\_bar}) + C_D(V_D, V_Q, V_{CLK})} \qquad (16) $$

where VQ(t_{k+1}) denotes the voltage of node Q at the next time instant. The nonlinear parameters of the model are evaluated at VD(t_{k+1}), VCLK(t_{k+1}), VQ(t_k), and VQ_bar(t_k).

5.4.2 Computing VQ_bar(t_{k+1})

Having calculated VQ(t_{k+1}) from Eqn. (16), we rewrite Eqn. (15) with the updated value of VQ, which results in:

$$ V_{Q\_bar}(t_{k+1}) = V_{Q\_bar}(t_k) + \frac{C_M(V_Q, V_{Q\_bar})\big(V_Q(t_{k+1}) - V_Q(t_k)\big) - I_{Q\_bar}\big(V_Q(t_{k+1}), V_{Q\_bar}(t_k)\big)\,\Delta t}{C_{Q\_bar}(V_Q, V_{Q\_bar}) + C_M(V_Q, V_{Q\_bar}) + C_L} \qquad (17) $$

where the nonlinear parameters of the model are evaluated at VD(t_{k+1}), VCLK(t_{k+1}), VQ(t_{k+1}), and VQ_bar(t_k). Similarly, the updated value VQ_bar(t_{k+1}) is used in the next application of Eqn. (16) to calculate VQ(t_{k+2}). Note that Eqns. (16) and (17) are coupled. To calculate VQ(t_{k+1}) and VQ_bar(t_{k+1}) exactly, we would therefore iterate between these equations until a fixed point is reached. However, our experiments show that revisiting Eqns. (16) and (17) after the first iteration changes VQ(t_{k+1}) and VQ_bar(t_{k+1}) only insignificantly. Therefore, to keep the run time low, we limit ourselves to a single iteration. When there is no feedback (i.e., CLK=1 and IQ≈0), solving Eqns. (16) and (17) is similar to calculating the output voltage of an inverter whose input comes from a transmission gate.
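The two-step update of Eqs. (16) and (17) can be sketched in Python. As in the text, the sketch performs one pass per time step without iterating to a fixed point. The model elements are hypothetical callables that stand in for the lookup tables of Figure 5-5, and the CLK_bar dependencies are omitted, as in the figures. The sign convention (a positive model current removes charge from its node) follows Eqs. (14) to (17). The function names are illustrative only.

```python
def latch_step(m, vd_next, vclk_next, vq, vqb, vqb_prev, dt, c_load):
    """Advance the latch CSM of Figure 5-5 by one time step.

    m : dict of hypothetical lookup-table callables:
        'i_d'(V_D, V_Q, V_CLK), 'i_q'(V_Q, V_Q_bar, V_CLK), 'i_qb'(V_Q, V_Q_bar),
        'c_d'(V_D, V_Q, V_CLK), and 'c_q', 'c_qb', 'c_m'(V_Q, V_Q_bar).
    Returns (V_Q(t_{k+1}), V_Q_bar(t_{k+1})) after a single pass.
    """
    # Eq. (16): elements evaluated at V_D(t_{k+1}), V_CLK(t_{k+1}), V_Q(t_k), V_Q_bar(t_k).
    cm = m["c_m"](vq, vqb)
    c_total_q = m["c_d"](vd_next, vq, vclk_next) + m["c_q"](vq, vqb) + cm
    i_total_q = m["i_d"](vd_next, vq, vclk_next) + m["i_q"](vq, vqb, vclk_next)
    vq_next = vq + (cm * (vqb - vqb_prev) - i_total_q * dt) / c_total_q
    # Eq. (17): elements re-evaluated with the updated V_Q(t_{k+1}).
    cm = m["c_m"](vq_next, vqb)
    c_total_qb = m["c_qb"](vq_next, vqb) + cm + c_load
    vqb_next = vqb + (cm * (vq_next - vq) - m["i_qb"](vq_next, vqb) * dt) / c_total_qb
    return vq_next, vqb_next


def simulate_latch(m, t, vd, vclk, vq0, vqb0, c_load):
    """Run latch_step over sampled D and CLK waveforms; returns (V_Q, V_Q_bar) lists."""
    vq, vqb = [vq0], [vqb0]
    for k in range(len(t) - 1):
        vqb_prev = vqb[k - 1] if k > 0 else vqb[k]  # V_Q_bar(t_{k-1}) for Eq. (16)
        q, qb = latch_step(m, vd[k + 1], vclk[k + 1], vq[k], vqb[k], vqb_prev,
                           t[k + 1] - t[k], c_load)
        vq.append(q)
        vqb.append(qb)
    return vq, vqb
```

Adding a fixed-point loop would mean calling the two updates repeatedly with the newest VQ and VQ_bar. The single pass shown here reflects the run-time trade-off described above.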
When the feedback loop is closed (i.e., CLK=0 and ID≈0), Eqns. (16) and (17) update each other's current sources (IQ and IQ_bar), which accurately models the magnification effect of the feedback loop. The remaining mode of operation, the transition mode, is captured by the dependency of ID and IQ on VCLK.

5.5 Current Source Modeling for Master-Slave Flip-Flops

An edge-triggered (master-slave) flip-flop comprises two level-sensitive latches: a negative level-sensitive latch and a positive level-sensitive latch. The first-stage latch is referred to as the master latch, while the second-stage latch is called the slave latch. Figure 5-7 shows a positive edge-triggered flip-flop. When CLK is low, the output of the master negative level-sensitive latch, QM_bar, follows the D input, while the slave positive level-sensitive latch holds its previous value. When CLK makes a rising transition from 0 to 1, the master latch stops sampling the data input and holds the last data value at the time of the clock transition. The slave latch becomes transparent, passing the stored master value QM_bar to the output of the slave latch, Q. The D input cannot affect the output because the master is disconnected from the D input. When CLK makes a falling transition from 1 to 0, the slave latch holds its value and the master starts sampling the input again.

Figure 5-7. A positive edge-triggered flip-flop (input D, master output QM_bar, output Q, clocks CLK and CLK_bar).

To develop the CSM for a master-slave flip-flop, the latch CSM of Section 5.2.3 (Figure 5-5) can be substituted for both the master and the slave latches. Therefore, for given data and clock inputs, QM_bar (the first stage of Figure 5-7) can be calculated using Equations (16) and (17). The same approach is used to calculate the voltage at node Q (the second stage of Figure 5-7). We take into account the loading effect of the slave latch on the master latch output, QM_bar.
5.6 Current Source Modeling for SR Latches

In this section we briefly explain how a CS model can be created for a different type of latch, such as an SR latch. Figure 5-8(a) shows an SR latch implemented with a pair of cross-coupled NAND cells. We use a simple multiple-input switching CSM for each NAND and then combine them to create the CSM for the SR latch. The resulting CSM is depicted in Figure 5-8(b). This model is easily extended to the complete multiple-input switching scheme described in Chapter 4, which considers the effect of internal nodes. The current sources at nodes Q and Q_bar are characterized by 3-D lookup tables. In theory, the capacitances at the input and output nodes of the NAND depend on the voltages at the cell terminals. In practice, however, they are much less sensitive to these voltages than the nonlinear current sources are. Therefore, the capacitance lookup tables can have significantly fewer entries than the current-source lookup tables. The voltages at Q and Q_bar are calculated as in Section 5.4.

Figure 5-8. (a) NAND-based SR latch (inputs S and R, outputs Q and Q_bar), (b) CSM for the SR latch, with current sources I_Q(V_S,V_Q,V_Q_bar) and I_Q_bar(V_R,V_Q,V_Q_bar), capacitances C_Q, C_Q_bar, C_S, and C_R, and Miller capacitances C_MQQ_bar, C_MRQ_bar, and C_MSQ.

5.7 Experimental Results

Our CSM simulator was implemented in C and Perl. All experiments discussed in this section were performed on a Sun Fire V880 machine with an UltraSPARC III 750MHz processor running the Sun Solaris operating system. To evaluate our CSM models, we use HSPICE to provide the “golden” results. In our experiments we considered voltage waveforms of arbitrary shapes, from simple saturated ramps to crosstalk-induced noisy waveforms with voltage fluctuations as high as 85% of V_dd. It is crucial to capture any noise effect at the output of the latch and to determine whether it can change the state of the latch through the feedback loop(s).
The experiments were performed using latches and flip-flops of different types (described below), and the output Q and Q_bar waveforms were compared with those of HSPICE. An example of such an experiment is shown in Figure 5-9(a). Despite the noisy D input, the noise does not cause an erroneous change of state: Q_bar changes from High to Low as expected. The Q and Q_bar waveforms generated by our model closely match the HSPICE waveforms. Figure 5-9(b) shows another example, in which the noise on the latch input prevents the intended change of state: the Q_bar signal remains High and causes a functional error. Figure 5-9(b) thus demonstrates the ability of our CSM to accurately reconstruct the output of the latch and hence report any failure in the state of the circuit.

Figure 5-9. Crosstalk-induced noise at the D input: (a) the latch changes state as expected; (b) the latch does not change state, causing a functional error. Each panel plots voltage versus time (sec) for the noisy D input, CLK, and Q and Q_bar from both HSPICE and our model.

The shape of the waveform strongly affects the accuracy of timing analysis, and delay and output slew alone may not be sufficient to reconstruct the shape of the waveform. Our model computes output waveforms whose actual shape closely matches SPICE. We use the Root Mean Squared Error (RMSE) as a metric to compare waveform similarity. RMSE is defined as:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{k=1}^{N}\left(V_{SPICE}(t_k) - V_{CSM}(t_k)\right)^2}$$

where V_SPICE and V_CSM are the voltages at the output of the latch (Q or Q_bar) at time t_k.
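A direct transcription of the RMSE definition, together with the normalization to Vdd used for the tables in this section, is sketched below. It assumes both waveforms have already been resampled onto the same time points t_1 to t_N; the function name is hypothetical.

```python
import math


def normalized_rmse(v_spice, v_csm, vdd):
    """RMSE between two waveforms sampled at the same instants t_1..t_N, divided by Vdd."""
    if len(v_spice) != len(v_csm) or not v_spice:
        raise ValueError("waveforms must be non-empty and sampled at the same time points")
    n = len(v_spice)
    sq_err = sum((a - b) ** 2 for a, b in zip(v_spice, v_csm))
    return math.sqrt(sq_err / n) / vdd


# Toy example: a constant 12 mV deviation on a 1.2 V supply is 1% of Vdd.
print(normalized_rmse([0.0, 0.6, 1.2], [0.012, 0.612, 1.212], vdd=1.2))
```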
For each experiment, k=1 corresponds to t_1, the time at which the noisy input D starts to change, whereas k=N corresponds to t_N, when both Q and Q_bar reach their stable final values (either high or low). Finally, we normalize the RMSE to Vdd to remove the effect of Vdd scaling. To generate different noisy waveforms for D, the noise injection time (defined as the arrival time of the aggressor that attacks the D signal) is swept from 100ps to 600ps with a step size of 5ps. The CLK signal was kept fixed at 1.6ns. Note that in some of these cases an unwanted change of the latch state may occur. Table 2 shows the normalized RMSE for some of these cases in a 130nm library for the Q_bar output. Latch 1 and Latch 2 are transmission-gate-based latches (Figure 5-1) of different sizes, whereas Latch 3 is an SR latch (Figure 5-8). FF1 is a master-slave flip-flop as depicted in Figure 5-7. As reported in Table 2, the normalized RMSE does not exceed about 2.6% of Vdd, which confirms that our voltage waveforms closely match those produced by HSPICE.

Table 2. Waveform similarity (Normalized RMSE) comparison with HSPICE

Noise injection time (ps) | 200 | 300 | 400 | 500 | 600
Latch 1 | 10.5e-3 | 9.9e-3 | 11.5e-3 | 8.1e-3 | 13.1e-3
Latch 2 | 8.7e-3 | 7.4e-3 | 11.6e-3 | 9.3e-3 | 11.2e-3
Latch 3 | 21.3e-3 | 19.5e-3 | 22.6e-3 | 21.2e-3 | 25.8e-3
FF1 | 14.3e-3 | 16.5e-3 | 15.6e-3 | 17.3e-3 | 15.2e-3

To see the effect of technology scaling on the accuracy of our CSM, we performed experiments using the Predictive Technology Model (PTM) [32] for 90nm and 65nm. For each cell and each technology, we calculated the average normalized RMSE. These results are reported in Table 3. In addition to sweeping the noise injection time from 100ps to 600ps, the CLK signal was also swept from 1ns to 1.9ns with a step size of 5ps. This resulted in 9000 different configurations for each cell under evaluation. Note that the CSM-based calculator is on average three orders of magnitude faster than HSPICE while producing waveforms of nearly the same accuracy.
Table 3. Waveform similarity (Normalized RMSE) comparison with HSPICE for different cells in different technologies

Library | Cell | Average normalized RMSE (Q) | Average normalized RMSE (Q_bar) | Runtime speedup vs. HSPICE
130nm | Latch 1 | 13.5e-3 | 11.1e-3 | 1220
130nm | Latch 2 | 14.1e-3 | 12.2e-3 | 1220
130nm | Latch 3 | 26.5e-3 | 23.3e-3 | 2130
130nm | FF1 | 6.5e-3 | 7.3e-3 | 1110
90nm | Latch 1 | 12.5e-3 | 10.1e-3 | 1230
90nm | Latch 2 | 14.5e-3 | 12.8e-3 | 1330
90nm | Latch 3 | 27.4e-3 | 23.6e-3 | 2160
90nm | FF1 | 6.9e-3 | 7.9e-3 | 1150
65nm | Latch 1 | 12.9e-3 | 10.6e-3 | 1290
65nm | Latch 2 | 14.7e-3 | 13.3e-3 | 1290
65nm | Latch 3 | 27.5e-3 | 24.5e-3 | 2170
65nm | FF1 | 7.6e-3 | 8.2e-3 | 1410

5.8 Summary

A CSM for sequential cells such as transparent latches and master-slave flip-flops was presented. The main challenge is the presence of feedback loops, which our model addresses by creating the necessary current sources and parasitic components. Given input and clock voltage waveforms of arbitrary shapes, the model can accurately compute the output voltage waveform of a register cell and hence the timing parameters associated with the cell. This was shown to considerably reduce the pessimism in timing parameters. Experimental results for our current-source sequential cell model demonstrate close-to-SPICE waveforms with a significant runtime speedup.

CHAPTER 6: POWER OPTIMAL MTCMOS REPEATER INSERTION FOR GLOBAL BUSES

This chapter addresses the problem of power-optimal repeater insertion for global buses in the presence of crosstalk noise. The MTCMOS technique, which inserts high-Vth sleep transistors, is used to reduce leakage power consumption in the idle mode. We simultaneously calculate the repeater sizes, the repeater distances, and the size of the sleep transistors to minimize power dissipation. The effect of crosstalk coupling capacitance on propagation delay and on (switching and short-circuit) power dissipation is considered.
Experimental results show that, depending on the activity factor of the circuit, the proposed technique can significantly reduce the power consumption of global bus interconnects.

6.1 Introduction

As CMOS technology continues to scale down toward ultra-deep-submicron (UDSM) nodes, more functionality is being integrated on a single die. This increased integration enlarges the die and, consequently, increases both the number and the length of global interconnects. Interconnect delay becomes the dominant factor in determining the overall performance of integrated circuits. Since the delay of an unbuffered interconnect is quadratic in its length, repeater insertion has been widely used to reduce the delay. As shown in [10], repeaters can be optimally sized and spaced to minimize interconnect delay. The size of an optimal repeater is typically much larger than that of a minimum-sized repeater. Since millions of repeaters will be inserted to drive global interconnects, these repeaters will consume significant power, particularly if delay-optimal repeaters are used [17]. Several works have traded tolerable extra delay for power savings in interconnects. The authors of [11] and [17] provided analytical methods to compute power-optimal repeater sizes and distances per unit length. The power analysis should accurately account for switching, leakage and short-circuit power. As technology scales down, wires are laid out closer to each other, which increases the capacitive coupling noise on the interconnect lines. This affects both delay and power consumption in interconnects. In addition to the switching power on the coupling capacitances, in [26] we showed that short-circuit power consumption increases significantly in the presence of crosstalk noise. Therefore, this effect should also be considered in the design of power-optimal repeaters. Moreover, technology scaling has resulted in a large increase in leakage current.
Leakage power has grown exponentially and has become a significant fraction of total chip power consumption [58]. The authors of [54] studied the applicability of MTCMOS to repeater design for leakage power savings; however, they did not provide a mathematical solution for the simultaneous optimal sizing of the sleep transistors and repeaters and the choice of insertion length. In addition, the effect of crosstalk on delay and power was not taken into account in the optimal design. This chapter studies the opportunity to minimize the average power consumption of bus lines during both active and standby modes by simultaneously computing repeater sizes, repeater insertion lengths, and sleep transistor sizes, subject to a delay constraint in the presence of crosstalk noise. We consider the worst-case crosstalk for the delay constraint. However, assuming worst-case crosstalk is not realistic for power optimization, because the objective is to minimize the average power (as opposed to the maximum power). Therefore, we show how to estimate the average power as a function of the probabilities of the different types of transitions on the coupled lines. We also discuss the circuitry that delivers the sleep signals to the sleep transistors.

6.2 Delay Model

Consider a uniform interconnect line with resistance r per unit length, capacitance c per unit length, and total length L. Suppose the line is divided into L/l segments, and identical repeaters of size s, with unit driving resistance r_s, unit input capacitance c_g and unit output capacitance c_p, are inserted at each segment (see Figure 6-1). Figure 6-2 shows one stage of the repeater chain with the interconnect model in between. For a segment comprising a repeater that drives an interconnect segment of length l, terminated with a repeater of the same size and driven by a step input, the delay and the transition time are ln2·τ and (ln9/0.8)·τ, respectively.
Note that

$$\tau = r_s(c_g + c_p) + \frac{r_s}{s} c l + r l s c_g + \frac{1}{2} r c l^2$$

With a finite input slew rate, the contribution of the input transition time t_r to the repeater delay can be represented by γ·t_r [55], where, for a rising input, γ is calculated as

$$\gamma = \frac{1}{2} - \frac{1 - V_{tn}/V_{dd}}{1 + \alpha_n}$$

where V_tn is the threshold voltage of the NMOS and α_n is the NMOS alpha-power parameter. Similarly, for a falling transition, γ is calculated from the PMOS parameters. An average value of γ is used. Therefore, the delay of one repeater stage is given by (ln2 + γ·ln9/0.8)·τ.

Figure 6-3 shows the delay model for two adjacent bus lines, where c_c is the coupling capacitance per unit length. We assume zero skew between the transitions launched into the lines. The worst-case delay occurs when the transitions on the two lines are in opposite directions.

Figure 6-1. Buffer model (driving resistance r_s/s, output capacitance c_p·s, input capacitance c_g·s)

Figure 6-2. One stage of repeaters with interconnect model (wire resistance r·l, with wire capacitance c·l/2 at each end)

We model the Miller effect on the coupling capacitance (to create the worst-case delay conditions) by rewriting the time constant τ as follows:

$$\tau = r_s(c_g + c_p) + \frac{r_s}{s}(c + 2c_c) l + r l s c_g + \frac{1}{2}(c + 2c_c) r l^2 \tag{18}$$

The total delay of the interconnect line is proportional to τ·(L/l).
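The stage-delay expressions above can be evaluated numerically as in the following sketch. The function names are hypothetical, parameters may be in any consistent set of units, and γ is averaged over rising and falling inputs as the text describes (using |V_tp| for the PMOS is an assumption of the sketch). The device values in the usage lines are placeholders, not extracted data.

```python
import math


def gamma_alpha_power(vt, vdd, alpha):
    """Input-slew coefficient: gamma = 1/2 - (1 - Vt/Vdd) / (1 + alpha).

    For a falling transition, pass |V_tp| and the PMOS alpha-power parameter.
    """
    return 0.5 - (1.0 - vt / vdd) / (1.0 + alpha)


def tau_coupled(rs, cg, cp, r, c, cc, l, s):
    """Worst-case stage time constant of Eq. (18): coupling doubled by the Miller effect."""
    c_w = c + 2.0 * cc
    return rs * (cg + cp) + (rs / s) * c_w * l + r * l * s * cg + 0.5 * c_w * r * l ** 2


def stage_delay(tau, gamma):
    """(ln 2 + gamma * ln 9 / 0.8) * tau: step-input delay plus gamma * t_r."""
    return (math.log(2.0) + gamma * math.log(9.0) / 0.8) * tau


# Placeholder values in normalized units (hypothetical, not from Table 5).
gamma = 0.5 * (gamma_alpha_power(0.25, 1.1, 1.3) + gamma_alpha_power(0.25, 1.1, 1.4))
tau = tau_coupled(rs=1.0, cg=1.0, cp=0.5, r=0.1, c=1.0, cc=2.0, l=3.0, s=20.0)
print(gamma, stage_delay(tau, gamma))
```

Setting cc=0 in tau_coupled recovers the uncoupled τ given before Equation (18).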
Therefore, minimizing the total delay is equivalent to minimizing the time constant per unit length, τ/l:

$$\frac{\tau}{l} = \frac{r_s(c_g + c_p)}{l} + \frac{r_s}{s}(c + 2c_c) + r s c_g + \frac{1}{2} r (c + 2c_c) l \tag{19}$$

With a derivation similar to that in [10], the worst-case delay per unit length of the interconnect line (in the presence of crosstalk) is minimized when

$$l_{opt} = \sqrt{\frac{2 r_s (c_g + c_p)}{r (c + 2c_c)}}, \qquad s_{opt} = \sqrt{\frac{r_s (c + 2c_c)}{r c_g}} \tag{20}$$

and

$$\left(\frac{\tau}{l}\right)_{opt} = 2\sqrt{r_s c_g\, r (c + 2c_c)}\left(1 + \sqrt{\frac{1}{2}\left(1 + \frac{c_p}{c_g}\right)}\right) \tag{21}$$

It has been shown in [11] and [17] that the delay per unit length (and therefore the total delay) is insensitive to moderate deviations of the repeater size and of the distance between repeaters from their delay-optimal values. Hence, significant power and area can be saved by accepting a small delay penalty: one can use repeaters smaller than s_opt and segment lengths longer than l_opt and achieve significant power savings. To address this power optimization problem accurately, we first present the power dissipation model of global buses and then introduce our power-optimal repeater design methodology.

6.3 Power Dissipation Model

The power dissipation of a global bus line has three components: switching power, short-circuit power, and leakage power.

6.3.1 Switching Power Dissipation

Switching power is due to the charging and discharging of the load capacitance. The switching power of one stage can be calculated as:

$$P = \alpha f V_{dd}^2 \left(s(c_g + c_p) + l c\right) \tag{22}$$

where α is the switching activity of the inverter, f is the frequency, and V_dd is the supply voltage. Note that Equation (22) does not include the switching power consumed on the coupling capacitances. When only one of the lines switches, the coupling capacitance c_c·l charges or discharges with a voltage change of V_dd; therefore, its coupling energy consumption is 0.5·c_c·l·V_dd².
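Returning briefly to the delay model, the closed forms in Equations (19) to (21) can be checked numerically: at (l_opt, s_opt), Equation (19) should reproduce (τ/l)_opt, and moving away from the optimum should increase τ/l only slowly. The sketch below assumes arbitrary consistent units; the function names and parameter values are placeholders.

```python
import math


def tau_per_length(rs, cg, cp, r, c, cc, l, s):
    """Time constant per unit length, Eq. (19)."""
    c_w = c + 2.0 * cc  # Miller-doubled coupling for the worst case
    return rs * (cg + cp) / l + rs * c_w / s + r * s * cg + 0.5 * r * c_w * l


def delay_optimal(rs, cg, cp, r, c, cc):
    """Closed-form l_opt and s_opt (Eq. 20) and (tau/l)_opt (Eq. 21)."""
    c_w = c + 2.0 * cc
    l_opt = math.sqrt(2.0 * rs * (cg + cp) / (r * c_w))
    s_opt = math.sqrt(rs * c_w / (r * cg))
    tpl_opt = 2.0 * math.sqrt(rs * cg * r * c_w) * (1.0 + math.sqrt(0.5 * (1.0 + cp / cg)))
    return l_opt, s_opt, tpl_opt


# Placeholder values in normalized units.
params = dict(rs=2.0, cg=1.0, cp=0.5, r=0.1, c=1.0, cc=2.0)
l_opt, s_opt, tpl_opt = delay_optimal(**params)
# Relative delay penalty of smaller, more widely spaced repeaters.
penalty = tau_per_length(**params, l=1.43 * l_opt, s=0.7 * s_opt) / tpl_opt - 1.0
print(l_opt, s_opt, tpl_opt, penalty)
```

The penalty printed for the shrunken, stretched repeaters illustrates the flatness of τ/l around its minimum that the power optimization exploits.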
When two adjacent bus lines switch simultaneously in opposite directions, the coupling capacitance c_c·l charges or discharges with a voltage change of 2V_dd. Therefore, the total energy consumed by the drivers of both lines is 0.5·c_c·l·(2V_dd)² [60]. Finally, when two adjacent bus lines make transitions in the same direction, no coupling energy is consumed. To estimate the average switching power consumption of a single stage of the repeater chain, we make the following assumptions: (i) there is no temporal or spatial correlation between the data transmitted on the two adjacent bus lines; (ii) the probability of transmitting a ‘1’ (‘0’) is p (1−p). As a result, the probability of a transition between two consecutive data bits on a single bus line is k_1 = 2p(1−p). To calculate the average coupling power, we need the probability of each type of transition on the coupling capacitance between two adjacent lines. Table 4 presents these probabilities for all possible scenarios. Note that k_2 + k_3 + k_4 + k_5 = 1. Using k_1 to k_5, the average switching power consumption of one stage of two adjacent bus lines (Figure 6-3) can be written as:

$$P_{sw} = 2k_1 \times 0.5 f V_{dd}^2 \left(s(c_g + c_p) + l c\right) + k_2 \cdot 0.5 f (2V_{dd})^2 l c_c + k_3 \cdot 0.5 f V_{dd}^2 l c_c \tag{23}$$

Figure 6-3. The model for one stage of two adjacent coupled bus lines (each line has repeaters r_s/s, c_p·s, c_g·s and a wire segment r·l with c·l/2 at each end; the coupling capacitance c_c·l is modeled as c_c·l/2 at each end)

Without loss of generality, and for the sake of presentation, we limit ourselves to two adjacent lines; the analysis for three or more bus lines is similar.
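Under assumptions (i) and (ii), the transition probabilities listed in Table 4 and the switching power of Equation (23) can be evaluated as in the following sketch. It takes k_1 = 2p(1−p), the probability that one line toggles; the function names and numeric inputs are hypothetical.

```python
def transition_probs(p):
    """k1..k5 for two independent lines, each carrying a '1' with probability p."""
    q = 1.0 - p
    k1 = 2.0 * p * q                        # one line toggles
    k2 = 2.0 * p**2 * q**2                  # opposite directions
    k3 = 4.0 * p * q * (p**2 + q**2)        # one switches, the other is quiet
    k4 = p**4 + 2.0 * p**2 * q**2 + q**4    # both quiet
    k5 = 2.0 * p**2 * q**2                  # same direction
    return k1, k2, k3, k4, k5


def switching_power(p, f, vdd, s, cg, cp, c, cc, l):
    """Average switching power of one stage of two coupled lines, Eq. (23)."""
    k1, k2, k3, _, _ = transition_probs(p)
    self_load = 2.0 * k1 * 0.5 * f * vdd**2 * (s * (cg + cp) + l * c)  # both lines
    opposite = k2 * 0.5 * f * (2.0 * vdd) ** 2 * l * cc   # c_c*l swings by 2*Vdd
    one_quiet = k3 * 0.5 * f * vdd**2 * l * cc            # c_c*l swings by Vdd
    return self_load + opposite + one_quiet  # same-direction moves add no coupling energy


k = transition_probs(0.5)
print(k, sum(k[1:]))  # (0.5, 0.125, 0.5, 0.25, 0.125) 1.0
```

Correlated data or bus encoding would only change transition_probs; switching_power stays the same, which matches the remark that follows on revising k_1 to k_5.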
In general, if the input pattern and the spatio-temporal correlation between the data bits of a single line or of two adjacent lines are available, a number of probabilistic techniques such as [41][42][69] can be used to estimate k_1 to k_5. Furthermore, several encoding techniques have been proposed for minimizing the coupling effect in static on-chip bus structures [12][53][59]. Some approaches have also been introduced to find a permutation of the bus lines that minimizes crosstalk effects [33][70]. The impact of these optimization techniques can be captured by appropriately revising the equations for k_1 to k_5; the rest of the analysis remains the same.

Table 4: Probability of different switching scenarios on the coupling capacitances

Transition type | Occurrence probability
Opposite direction | P(↑↓) = k_2 = 2p²(1−p)²
One switches and the other is quiet | P(↑−) = k_3 = 4p(1−p)(p² + (1−p)²)
Both quiet | P(−−) = k_4 = p⁴ + 2p²(1−p)² + (1−p)⁴
Same direction | P(↑↑) = k_5 = 2p²(1−p)²

The coupling power also depends on the relative switching times of the line drivers [60]. For global buses, we assume zero skew between the drivers' switching times. One can consider the relative delay between the transitions of the two lines and use an approach similar to [60] to compute the effect of relative delay on coupling power.

6.3.2 Short-Circuit (SC) Power Dissipation

SC power is consumed by the current that flows between the power rails through a direct path that is temporarily established during an output transition [63]. Several techniques have been proposed to estimate SC power dissipation [6][26][48][63]. SC power is a function of the input transition time, the output load capacitance, and the transistor size. Most previous works on power-optimal repeater design either ignore SC power consumption or use an inaccurate approximation of it.
We use the closed-form formula presented in [48], which captures the dependence of SC power consumption on the circuit parameters. SC power consumption increases significantly in the presence of crosstalk noise [26]. Therefore, as with switching power, we formulate the average short-circuit power consumption based on the probabilities of the transition types on adjacent bus lines. As shown in [48], the SC energy consumption of an inverter during a full signal switch (such as a falling transition followed by a rising one) can be approximated as

0 0 22 2 4 2 dr dd SC dsat out d r sI t V E VGC sHI t = +⋅ (24)

where H and G are technology-dependent parameters [48] and I_d0 is the average saturated drain current of the NMOS and PMOS transistors of the minimum-sized inverter. Due to the shielding effect of the interconnect resistance, the repeater sees a capacitance smaller than C_total, where C_total is the sum of the repeater parasitic capacitances, the interconnect capacitance and the coupling capacitances (considering the Miller effect for the given transition type), e.g.,

$$C_{total}(\uparrow\downarrow) = (c_p + c_g)s + (c + 2c_c)l \tag{25}$$

Using the effective capacitance approach, the capacitance seen by the repeater for opposite-direction transitions is written as:

$$C_{out} = C_{eff}(\uparrow\downarrow) = c_p s + \frac{(c + 2c_c)l}{2} + \delta\left(c_g s + \frac{(c + 2c_c)l}{2}\right) \tag{26}$$

where δ < 1 and depends on l and s. The ratio of C_eff to C_total is therefore also a function of l and s. Similar to [17], we calculate ω, the average ratio of C_eff to C_total over the different transition types; this average ratio is used for short-circuit evaluation. In addition, because crosstalk affects the transition time, different values of t_r are used (obtained from the different τ values corresponding to the different coupling capacitances).
Therefore, the average short-circuit power consumption of the repeater (for one falling or rising transition) can be estimated as the sum of the per-transition SC power for opposite-direction, one-quiet and same-direction transitions, weighted by k_2, k_3 and k_5, respectively, with ω and t_r evaluated for each transition type:

() () () () 0 0 22 2 2 2 2 dr dd SC dsat d r total fs I t V Pk VG C sHI t ω ↑↓ ↑↓ ↑↓ ↑↓ ⋅⋅ =⋅ ⋅+⋅ (27) () () () () () () () () 00 00 22 2 2 22 3 5 22 2 22 dr dd dr dd dsat d r dsat d r total total fs I t V fs I t V k k VG C sHI t V G C sHI t ωω ↑− ↑↑ ↑− ↑↑ ↑− ↑↑ ↑− ↑↑ ⋅⋅ ⋅ ⋅ +⋅ + ⋅ ⋅+⋅ ⋅ +⋅

6.3.3 Leakage Power Dissipation

The third source of power dissipation is leakage current. In current CMOS technologies, the major components of leakage current are the sub-threshold and gate-tunneling currents [23]. The sub-threshold leakage is the drain-source current of a transistor operating in the weak-inversion region, which can be expressed as follows [23]:

$$I_{sub} = A_{sub}\,\mu_0 C_{ox}\frac{W}{L_{eff}}\, e^{\frac{q\left(V_{GS} - V_{t0} - \gamma' V_{SB} + \eta V_{DS}\right)}{n' kT}}\left(1 - e^{-\frac{q V_{DS}}{kT}}\right) \tag{28}$$

where A_sub = (kT/q)² e^1.8, µ_0 is the zero-bias mobility, C_ox is the gate oxide capacitance per unit area, W and L_eff denote the width and effective length of the transistor, k is the Boltzmann constant, T is the absolute temperature, and q is the elementary charge. In addition, V_t0 is the zero-bias threshold voltage, γ′ is the linearized body-effect coefficient, η denotes the drain-induced barrier lowering (DIBL) coefficient, and n′ is the sub-threshold swing coefficient of the transistor. Since V_DS of the OFF transistor is V_dd, which is much larger than kT/q ≈ 26mV, the sub-threshold leakage power of an NMOS transistor can be written as

$$P_{sub,nmos} = A'_{sub}\,\mu_{nmos} W e^{-\lambda V_{tn}} \tag{29}$$

where A′_sub = A_sub V_dd C_ox/L_eff · exp(ληV_dd) and λ = q/(n′kT) are technology constants. A similar formula can be derived for a PMOS transistor.
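The exponential form of Equation (28) implies that I_sub changes by one decade for every n′(kT/q)·ln10 of gate-source voltage. The sketch below transcribes Equation (28) and checks that property; all device parameter values passed to it are hypothetical placeholders, not the 45nm values used in Section 6.5.

```python
import math

K_B = 1.380649e-23     # Boltzmann constant (J/K)
Q_E = 1.602176634e-19  # elementary charge (C)


def i_sub(vgs, vds, vsb, w, l_eff, mu0, cox, vt0, gamma_p, eta, n_p, temp):
    """Sub-threshold drain current in the form of Eq. (28)."""
    v_th = K_B * temp / Q_E                  # thermal voltage kT/q
    a_sub = v_th ** 2 * math.exp(1.8)
    exponent = (vgs - vt0 - gamma_p * vsb + eta * vds) / (n_p * v_th)
    return a_sub * mu0 * cox * (w / l_eff) * math.exp(exponent) * (1.0 - math.exp(-vds / v_th))


# Hypothetical device values (SI units).
dev = dict(vds=1.1, vsb=0.0, w=1e-6, l_eff=45e-9, mu0=0.03, cox=0.02,
           vt0=0.25, gamma_p=0.1, eta=0.08, n_p=1.5, temp=300.0)
decade = dev["n_p"] * K_B * dev["temp"] / Q_E * math.log(10.0)
print(i_sub(vgs=decade, **dev) / i_sub(vgs=0.0, **dev))  # ~10
```

For the OFF transistor (V_GS = 0, V_DS = V_dd), the last factor is essentially 1, which is the simplification behind Equation (29).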
Therefore, the sub-threshold leakage power dissipation of a repeater can be written as

$$P_{sub} = p \cdot P_{sub,pmos} + (1 - p) \cdot P_{sub,nmos} \tag{30}$$

where p is the probability that the input of the inverter is at logic 1. If the ratio of the PMOS transistor width to the NMOS transistor width is β, (30) can be rewritten as:

$$P_{sub} = \frac{A'_{sub}\, s W_{min}}{1 + \beta}\left(p\,\beta\,\mu_{pmos} e^{-\lambda V_{tp}} + (1 - p)\,\mu_{nmos} e^{-\lambda V_{tn}}\right) = K_{sub} \cdot s \tag{31}$$

where W_min is the minimum size of the inverter. Gate tunneling is the other major source of leakage power. The dominant gate-tunneling leakage in CMOS circuits is the gate-to-channel tunneling current of the ON NMOS transistors, which can be modeled [23][38] as follows:

$$I_{tunnel} = A_{tunnel} W_{nmos} L_{eff}\left(\frac{V_{ox}}{t_{ox}}\right)^2 e^{-\frac{B t_{ox}}{V_{ox}}} \tag{32}$$

where A_tunnel and B are technology constants, and t_ox is the oxide thickness. V_ox is the potential drop across the oxide. When the transistor is ON, V_ox = V_gs − ψ_s, where ψ_s is the surface potential of the transistor [62]. Ignoring the gate-tunneling leakage of the PMOS, the gate-tunneling leakage power of an inverter, P_tunnel, can be calculated as

$$P_{tunnel} = \frac{A'_{tunnel}}{1 + \beta}\, p\, s\, W_{min} = K_{tunnel} \cdot s \tag{33}$$

where A′_tunnel = A_tunnel L_eff V_dd ((V_dd − ψ_s)/t_ox)² exp(−B t_ox/(V_dd − ψ_s)) is a coefficient independent of the size and threshold voltage of the inverter [23][38].

6.3.4 Average Power Dissipation

Having obtained the equations for the different components of power dissipation in Equations (23), (27), (31) and (33), the total average power dissipation of one stage of two adjacent bus lines in the active mode of circuit operation can be written as

$$P_{active} = P_{sw} + 2P_{sc} + 2P_{sub} + 2P_{tunnel} \tag{34}$$

The factor 2 is due to the presence of two repeaters in one stage of two adjacent lines. Note that the two repeaters on adjacent lines have already been taken into account for P_sw in Equation (23).
In the standby mode, however, the only sources of power dissipation are sub-threshold and gate-tunneling leakage; thus,

$$P_{standby} = 2P_{sub} + 2P_{tunnel} \tag{35}$$

The average power consumption can be obtained as a weighted sum of the power consumption in the active and standby modes:

$$P_{total} = \chi P_{active} + (1 - \chi) P_{standby} \tag{36}$$

where χ is the active mode factor of the circuit, i.e., the fraction of time the circuit is in the active mode.

Figure 6-4. Sharing of sleep transistors among different bus lines

6.4 Power Optimization for MTCMOS Design

6.4.1 Power and Delay Modeling

MTCMOS technology provides low-leakage, high-performance operation by using high-speed, low-Vt transistors for logic cells and low-leakage, high-Vt devices as sleep transistors [5]. Sleep transistors disconnect logic cells from the supply and/or ground to reduce leakage in the standby mode. Bus lines spend a large fraction of the time in the standby mode; therefore, sleep transistors can be used to reduce total power. The drawback is the increased delay in the active mode due to the additional resistance of the sleep transistors. Since repeaters are inserted at identical distances, sleep transistors can be shared between repeaters on different data lines. Figure 6-4 shows the case of two adjacent bus lines; sleep transistors can similarly be shared among more than two bus lines. With sleep transistors, both leakage components are substantially smaller in the standby mode. In the standby mode, the virtual ground node (i.e., the drain terminal of the sleep transistor) charges to a voltage near V_dd [5]; hence, the potential drop across the oxide of the ON NMOS transistors becomes very small and, from Equation (32), the gate-tunneling leakage of the inverter becomes negligible.
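The bookkeeping of Equations (34) to (36) is simple arithmetic, and the sketch below spells it out. With sleep transistors, the standby argument would instead be the MTCMOS standby power, since Equation (38) has the same weighted form. Function names and the numbers in the usage line are hypothetical.

```python
def active_power(p_sw, p_sc, p_sub, p_tunnel):
    """Eq. (34): P_sw already covers both lines; the other terms are per repeater."""
    return p_sw + 2.0 * (p_sc + p_sub + p_tunnel)


def standby_power(p_sub, p_tunnel):
    """Eq. (35): without sleep transistors only the two repeaters' leakage remains."""
    return 2.0 * (p_sub + p_tunnel)


def average_power(chi, p_active, p_standby):
    """Eq. (36): time-weighted average; chi is the fraction of time in active mode."""
    if not 0.0 <= chi <= 1.0:
        raise ValueError("chi must lie in [0, 1]")
    return chi * p_active + (1.0 - chi) * p_standby


# Hypothetical per-stage values (uW) for a bus that is active 10% of the time.
print(average_power(0.1, active_power(20.0, 2.0, 5.0, 3.0), standby_power(5.0, 3.0)))
```

With a small χ, the standby term dominates the result, which is the regime in which sleep transistors pay off.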
The sub-threshold leakage current and power dissipation can be calculated from Equation (28) as

$$P_{standby,MTCMOS} = V_{dd} I_{sub,standby} = V_{dd} A_{sub}\,\mu_0 C_{ox}\frac{W_{slp}}{L_{eff}}\, e^{\lambda\left(-V_{t,high} + \eta V_{dd}\right)} = K_{standby,MTCMOS} \cdot s_{slp} \tag{37}$$

where W_slp and V_t,high denote the size and threshold voltage of the sleep transistor, K_standby,MTCMOS is the sub-threshold leakage power of a minimum-size sleep transistor, and s_slp is the size of the sleep transistor normalized to that of the minimum-size transistor. Using the MTCMOS technique, the total power of one stage of two adjacent bus lines can be written as:

$$P_{total,MTCMOS} = \chi P_{active} + (1 - \chi) P_{standby,MTCMOS} \tag{38}$$

To account for the effect of MTCMOS on the worst-case delay constraint, we need to consider two cases:

I) Adjacent bus lines switch in opposite directions; therefore, the sleep transistor contributes to a single falling transition. Using Equation (18), the time constant of one stage can be written as:

$$d_1 = r_s(c_g + c_p) + \frac{r_s}{s}(c + 2c_c)l + r l s c_g + \frac{1}{2}(c + 2c_c) r l^2 + \frac{r_{slp}}{W_{slp}}\left[s(c_g + c_p) + (c + 2c_c)l\right] \tag{39}$$

II) Adjacent lines switch in the same direction. When there are two simultaneous falling transitions, twice as much current must be sunk through the sleep transistor; therefore, the resistance of the sleep transistor is doubled for delay estimation. More precisely,

$$d_2 = r_s(c_g + c_p) + \frac{r_s}{s} c l + r l s c_g + \frac{1}{2} c r l^2 + \frac{2 r_{slp}}{W_{slp}}\left[s(c_g + c_p) + c l\right] \tag{40}$$

Note that the sleep transistors increase the delay only for falling transitions at the output node of the repeaters. Therefore, we introduce new time constants d_1′ = (τ_1 + d_1)/2 and d_2′ = (τ_2 + d_2)/2, where τ_1 (as in Equation (18)) and τ_2 are the time constants for opposite-direction and same-direction transitions without any sleep transistors, respectively. The worst-case delay per stage is then determined by max{d_1′, d_2′}.
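Equations (39) and (40) and the averaged constants d_1′ and d_2′ can be combined into a worst-case stage time constant, as sketched below. The sketch takes r_slp as the sleep transistor's resistance per unit width (so r_slp/W_slp is its on-resistance), which is an assumption about a symbol the text does not define; the function name is hypothetical.

```python
def mtcmos_stage_delay(rs, cg, cp, r, c, cc, l, s, r_slp, w_slp):
    """Return (max(d1', d2'), d1', d2') per Eqs. (18), (39), (40)."""
    c_opp = c + 2.0 * cc  # opposite-direction switching: Miller-doubled coupling
    # Time constants without sleep transistors (tau_1 is Eq. (18)).
    tau1 = rs * (cg + cp) + (rs / s) * c_opp * l + r * l * s * cg + 0.5 * c_opp * r * l ** 2
    tau2 = rs * (cg + cp) + (rs / s) * c * l + r * l * s * cg + 0.5 * c * r * l ** 2
    # Sleep-transistor resistance in series with the whole stage load.
    d1 = tau1 + (r_slp / w_slp) * (s * (cg + cp) + c_opp * l)      # Eq. (39)
    d2 = tau2 + (2.0 * r_slp / w_slp) * (s * (cg + cp) + c * l)    # Eq. (40): shared sleep device
    # Only falling outputs see the sleep transistor, so average with the rising case.
    d1p = 0.5 * (tau1 + d1)
    d2p = 0.5 * (tau2 + d2)
    return max(d1p, d2p), d1p, d2p


print(mtcmos_stage_delay(1, 1, 1, 1, 1, 1, 1, 1, r_slp=1.0, w_slp=1.0))
# (10.0, 10.0, 7.5)
```

With a large enough W_slp, the same-direction case can no longer dominate and the worst case reverts to the Miller-dominated d_1′.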
6.4.2 Sleep Signal Delivery Circuitry

An important issue in the design of MTCMOS circuits is how to deliver the sleep signal to all MTCMOS transistors in the design. The sleep signal should be fast enough to minimize the transition time of the system from the standby mode to the active mode [5]. An improperly designed sleep signal driver circuit results in unnecessary switching and leakage power consumption. To minimize the delay of the standby-to-active transition and to reduce the power consumption of the sleep signal delivery circuit, we use asymmetric inverters in this network, as depicted in Figure 6-5. In this figure, the weak transistors are minimum-sized and have high threshold voltages. The rationale is that only the rise delay of the sleep signal determines the wake-up delay of the circuit. The fall delay of the sleep signal, on the other hand, determines the active-to-standby delay, which is not critical. The sleep signal delivery circuit of Figure 6-5 not only minimizes the sleep signal propagation delay but also linearly reduces the switching power dissipation of the delivery circuit, owing to the selective use of minimum-size transistors. At the same time, it exponentially reduces the leakage power of the delivery circuit during the active mode of operation by using high-threshold-voltage transistors in each inverter (which are OFF in the active mode).

Figure 6-5. Using asymmetric inverters in the sleep signal delivery circuitry (W: weak transistor; S: strong transistor; legend: weak PMOS, strong NMOS, strong PMOS and weak NMOS transistors)

6.4.3 Problem Formulation

Equation (21) gives the optimal worst-case delay per unit length for non-MTCMOS bus lines. Next, we consider the same problem for MTCMOS bus lines.
Suppose a target end-to-end delay per unit length of the interconnect line is given, expressed as ∆% more than (τ/l)_opt. Given this target delay, we need to calculate the values of l, s, and W_slp that minimize the total power dissipation. The total power of an interconnect of length L is equal to P_total-MTCMOS·(L/l), where P_total-MTCMOS was given in Equation (38). Therefore, the following constrained minimization problem should be solved:

$$\begin{aligned} \min_{l,\,s,\,W_{slp}}\ & P(l, s, W_{slp}) \\ \text{s.t.}\ & (1)\ Q(l, s, W_{slp}) \le T_{req} \\ & (2)\ R(l, s, W_{slp}) \le T_{req} \end{aligned} \tag{41}$$

where P ≡ P_total-MTCMOS/l; Q ≡ d_1′/l; R ≡ d_2′/l; and T_req ≡ (1 + ∆)(τ/l)_opt.

The optimization problem can be solved using the Lagrangian relaxation technique. In this technique, the constraints are relaxed and added to the objective function after being multiplied by non-negative coefficients, called Lagrange multipliers:

$$F = P + \lambda_1 (Q - T_{req}) + \lambda_2 (R - T_{req}) \tag{42}$$

From the Lagrange method, the solution of optimization problem (41) must satisfy the following set of (Kuhn-Tucker) optimality conditions:

$$\frac{\partial F}{\partial s} = 0;\quad \frac{\partial F}{\partial l} = 0;\quad \frac{\partial F}{\partial W_{slp}} = 0;\quad \lambda_1 (Q - T_{req}) = 0;\quad \lambda_2 (R - T_{req}) = 0 \tag{43}$$

These equations are solved numerically, and the triplet (l, s, W_slp) that results in the minimum P_total-MTCMOS/l is selected.

6.5 Experimental Results

To study the efficacy of the proposed technique, we conducted a comprehensive set of experiments. To extract the parameters used in the optimization problems, we performed transistor-level simulation of devices in HSPICE [31] using a 45nm predictive technology model. All simulations were carried out at a frequency of 1GHz and a die temperature of 100°C. The extracted technology parameters are reported in Table 5. The MOSEK optimization toolbox [44] was used to solve the mathematical program. Two coupled bus lines, as described in this chapter, are used in our experiments.
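The chapter solves problem (41) with Lagrangian relaxation and MOSEK. As a much cruder but easy-to-audit alternative, a coarse exhaustive grid search over (l, s, W_slp) can locate a feasible minimum, which can also serve as a starting point or sanity check. In the sketch below, the power and delay functions are hypothetical callables standing in for P, Q and R; it does not reproduce the chapter's results.

```python
from itertools import product


def grid_minimize(power_per_len, q_delay, r_delay, t_req, l_grid, s_grid, w_grid):
    """Exhaustively minimize P(l, s, W) subject to Q <= t_req and R <= t_req.

    Returns (P, l, s, W) for the best feasible grid point, or None if none is feasible.
    """
    best = None
    for l, s, w in product(l_grid, s_grid, w_grid):
        if q_delay(l, s, w) > t_req or r_delay(l, s, w) > t_req:
            continue  # violates constraint (1) or (2) of problem (41)
        p = power_per_len(l, s, w)
        if best is None or p < best[0]:
            best = (p, l, s, w)
    return best


# Toy stand-ins (hypothetical): power grows with every variable,
# while Q falls with s and R falls with W_slp.
grid = [1, 2, 3, 4]
print(grid_minimize(lambda l, s, w: l + s + w,
                    lambda l, s, w: 4.0 / s,
                    lambda l, s, w: 3.0 / w,
                    2.0, grid, grid, grid))  # (5, 1, 2, 2)
```

The cost grows with the product of the grid sizes, so in practice such a search would only bracket the region in which the Kuhn-Tucker conditions of Equation (43) are then solved.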
After the bus lines were optimized, the corresponding design values were exported to a SPICE netlist, and detailed HSPICE simulations were performed to measure the worst-case delay and the average power consumption of the buffer chain. We first calculated the average power consumption when the worst-case delay is optimized; these values are reported in Table 6 as P_D. The measurements were done for different active mode factors, χ. The power-optimal solutions with a 10% delay penalty for different χ, without MTCMOS sleep transistors and with only two degrees of freedom (s and l), are reported as P_P in the table. Finally, the power-optimal solutions with MTCMOS sleep transistors are reported as P_M. When the fraction of time the circuit is in the active mode (i.e., χ) is small, the dominant component of power consumption is standby leakage. Therefore, the MTCMOS technique yields significant power savings compared with P_D and P_P. As χ increases, the power saving diminishes. Since the active mode factor of global buses is usually very small, the power saving achieved by our technique is high. Note that the sleep signal was delivered by the circuit shown in Figure 6-5, and its power dissipation overhead was included in the total power consumption results.

In the second set of experiments, whose results are presented in Table 7, we compared the efficacy of the proposed technique for different values of delay penalty. More precisely, χ was set to 10% and the delay penalty ∆ was varied from 5% to 40%. For each case, P_P and P_M were measured by HSPICE simulation. As the delay penalty increases, both P_P and P_M decrease, i.e., the power saving grows; this saving saturates as ∆ increases further. Table 8 reports the optimal parameter values for the power-optimized design using the MTCMOS technique.
The design parameters are normalized with respect to the delay-optimized repeater size (s opt ) and insertion length (l opt ). It is observed that by increasing ∆ both repeater and sleep sizes are decreasing. However, decrease in the sizes diminishes as the delay budget increases. Finally, we compared our results with a two-step approach to design MTCMOS repeaters. In this two-step approach, first the power-optimal solution with no sleep transistor is found; then the size of the sleep transistors is calculated based on the power-optimal l and s values of the first step. We assume equal ∆% in each step of this approach. Therefore for a fair comparison we have to compare the two-step 109 approach results with our solution with (2∆+∆ 2 )% ≈ 2∆% delay penalty. Table 9 compares the average power consumption achieved by our technique with that of two-step approach, denoted as P T . It is seen that on average, our approach gives about 9.5% improvements in average power consumption over the two-step solution. Table 5: Technology Parameters Used in the Simulation Setup Technology Parameter V dd (V) V t,low (V) V t,high (V) K MTCMOS (µW/µm) K sub,NMOS (µW/µm) Value 1.1 0.25 0.35 58 881 Technology Parameter K sub,PMOS (µW/µm) K tunnel (µW/µm) r (Ω.mm) c c (fF/mm) c ( fF/mm) Value 301 273 1099.9 53.68 19.41 Table 6: Power consumption results for different designs activity mode factor χ. Frequency=1GHz. χ P D (µW) P P (µW) P M (µW) P M reduction over P D (%) P M reduction over P P (%) 1% 59.1 24.2 9.9 83.3 59.3 2% 66.1 28.0 11.6 82.4 58.6 5% 87.3 39.4 22.4 74.4 43.2 10% 122.6 58.4 46.3 62.2 20.7 20% 193.1 96.3 89.3 53.8 7.3 30% 263.7 134.2 132.9 49.6 1.0 Table 7: Power consumption results for different delay penalties. 
Frequency = 1 GHz, L = 10 mm, χ = 10%.

∆     P_P (µW)   P_M (µW)   P_M reduction over P_D (%)   P_M reduction over P_P (%)
5%    73.1       56.1       54.2                         23.2
10%   58.4       46.3       62.2                         20.7
15%   51.2       41.1       66.5                         19.7
20%   49.1       36.7       70.0                         25.3
25%   43.0       36.1       70.5                         15.9
30%   38.0       32.7       73.4                         14.0
35%   37.7       29.3       76.1                         22.3
40%   33.2       29.0       76.4                         12.7

Table 8: Design parameters for the optimized MTCMOS design. Frequency = 1 GHz, L = 10 mm, χ = 10%.

∆     s/s_opt   l/l_opt   W_slp/s_opt
5%    0.79      1.21      3.89
10%   0.70      1.43      2.90
15%   0.63      1.57      2.47
20%   0.57      1.71      2.20
25%   0.53      1.82      2.01
30%   0.51      1.93      1.88
35%   0.48      2.07      1.77
40%   0.45      2.14      1.68

Table 9: Comparing the proposed technique with a two-step approach to designing MTCMOS repeaters.

Delay penalty   P_T (µW)   P_M (µW)   P_M reduction over P_T (%)
5%              56.7       56.1       0.9
10%             49.6       46.3       6.8
15%             44.6       41.1       8.0
20%             40.2       36.7       8.7
25%             39.7       36.1       9.0
30%             35.8       32.7       8.7
35%             35.3       29.3       17.1
40%             34.8       29.0       16.8

6.6 Summary

This chapter addressed the problem of power-optimal repeater insertion for global buses in the presence of crosstalk noise. We used the MTCMOS technique, inserting high-Vth sleep transistors to reduce the leakage power consumption in the idle mode. By accurately modeling the different components of the power consumption and the delay, a mathematical problem was formulated for minimizing the average power under a timing constraint. Detailed HSPICE simulations showed that by considering the effect of crosstalk on both delay and power consumption, and by using the MTCMOS technique, the average power consumption of the bus lines can be reduced by more than 50% with a small delay penalty of 5%.

CHAPTER 7: CONCLUSIONS

7.1 Summary of Contributions and Applications

In this dissertation, a new and compact logic cell model for timing, noise, and power analysis was presented. The current-based cell modeling technique presented in this dissertation can be used in CAD tools at different steps of the design flow.
For example, our current source (CS) based cell models can be deployed inside signal integrity (SI) tools. Noise analysis and crosstalk delay change analysis are important parts of signal integrity analysis. As discussed throughout this dissertation, previous noise and delay analysis techniques show large inaccuracies when noisy inputs are applied to logic cells; our CS models overcome this shortcoming.

Another application of the proposed modeling is path-based timing analysis (PBA). PBA adds up the gate and wire delays along a specific path. The calculation is simpler since there is no max/min operation at the converging or diverging nodes of the timing graph, but the paths of interest must be identified by a standard static timing analysis tool before PBA is run. Path-based analysis is generally less pessimistic and can be used for sign-off analysis; therefore, more accurate modeling of the cells along the path is required during PBA. Once a set of critical paths is identified, the proposed CS models of the logic (combinational and sequential) cells on a given path can be used to provide an accurate evaluation of the timing of that path.

Through the course of this research, we concluded that as technology scales down and the effect of crosstalk noise increases, using an accurate cell model such as the one proposed in this thesis becomes essential. In chapter 2, we presented a new current-based cell delay model that accurately captures various cell parasitic and nonlinear effects in the computation of the output voltage waveform in the presence of crosstalk-induced noise. The output current of the logic cell is modeled with a voltage-dependent current source. A DC analysis step is performed to pre-characterize this current source as a function of the input and output voltages of the cell.
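How a current-source model of this kind turns a DC table into an output waveform can be sketched as below. This is a minimal sketch, not the dissertation's implementation, under several assumptions: the cell is a single-input, inverter-like cell; the DC table I_o(V_i, V_o) is filled from a hypothetical square-law toy device (`mos_current`, `toy_dc_current`) instead of a real DC characterization; the table uses the 16 × 16 uniform grid mentioned in Section 7.2 and is read back with bilinear interpolation; and the output and Miller capacitances (C_o and C_M, whose voltage-dependent forms are described next) are held constant. The output node is integrated with forward Euler on (C_L + C_o + C_M) dV_o/dt = I_o(V_i, V_o) + C_M dV_i/dt. All function names and element values are illustrative.

```python
VDD = 1.1                 # supply voltage (V), as in Table 5
N = 16                    # grid points per axis -> 16 x 16 = 256 table entries
STEP = VDD / (N - 1)      # uniform grid spacing (V)


def mos_current(vgs, vds, k=1e-3, vt=0.3):
    """Hypothetical square-law drain current (A); stands in for real device data."""
    vov = vgs - vt
    if vov <= 0.0 or vds <= 0.0:
        return 0.0
    if vds < vov:                                  # triode region
        return k * (vov * vds - 0.5 * vds * vds)
    return 0.5 * k * vov * vov                     # saturation region


def toy_dc_current(vi, vo):
    """DC current into the output node of a toy inverter (PMOS pull-up minus NMOS pull-down)."""
    return mos_current(VDD - vi, VDD - vo) - mos_current(vi, vo)


# Stand-in for the DC characterization step: sample I_o on a uniform (V_i, V_o) grid.
TABLE = [[toy_dc_current(i * STEP, j * STEP) for j in range(N)] for i in range(N)]


def lookup(vi, vo):
    """Bilinear interpolation of I_o(V_i, V_o); voltages are clamped to [0, VDD]."""
    fx = min(max(vi / STEP, 0.0), N - 1.0)
    fy = min(max(vo / STEP, 0.0), N - 1.0)
    i, j = min(int(fx), N - 2), min(int(fy), N - 2)
    tx, ty = fx - i, fy - j
    return ((1 - tx) * (1 - ty) * TABLE[i][j] + tx * (1 - ty) * TABLE[i + 1][j]
            + (1 - tx) * ty * TABLE[i][j + 1] + tx * ty * TABLE[i + 1][j + 1])


def simulate(vin, t_end, dt=0.5e-12, c_load=10e-15, c_o=1e-15, c_m=0.5e-15):
    """Forward Euler on (C_L + C_o + C_M) dVo/dt = I_o(Vi, Vo) + C_M dVi/dt."""
    c_tot = c_load + c_o + c_m
    t, vo = 0.0, VDD                               # input starts low, output high
    ts, vos = [t], [vo]
    while t < t_end:
        dvi_dt = (vin(t + dt) - vin(t)) / dt       # input slope over this step
        vo += (lookup(vin(t), vo) + c_m * dvi_dt) / c_tot * dt
        t += dt
        ts.append(t)
        vos.append(vo)
    return ts, vos


def ramp(t, t_rise=50e-12):
    """Clean rising input; any sampled noisy waveform could be used instead."""
    return VDD * min(max(t / t_rise, 0.0), 1.0)


def falling_crossing(ts, vs, level):
    """First time a waveform falls through `level` (linear interpolation), else None."""
    for k in range(1, len(vs)):
        if vs[k - 1] >= level > vs[k]:
            frac = (vs[k - 1] - level) / (vs[k - 1] - vs[k])
            return ts[k - 1] + frac * (ts[k] - ts[k - 1])
    return None


ts, vos = simulate(ramp, t_end=200e-12)
t50 = falling_crossing(ts, vos, VDD / 2)
delay_ps = (t50 - 25e-12) * 1e12                   # the ramp reaches VDD/2 at 25 ps
print(f"50% delay: {delay_ps:.1f} ps, final V_o: {vos[-1]:.4f} V")
```

Because the waveform is computed from the instantaneous input voltage rather than from a single input slew, the same `simulate` call accepts a noise-affected input function unchanged; the small rise of V_o above VDD at the start of the ramp is the Miller coupling term at work.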
The CS model for a combinational logic cell with a single switching input includes three nonlinear capacitive components: the input and output parasitic capacitances, C_i and C_o, which capture the parasitic loading at the input and output nodes of the cell, and the Miller capacitance, C_M, which captures the capacitive coupling between the input and output nodes. All of these parasitics are modeled as functions of the input and output voltages. The experimental results showed the high accuracy of our cell delay model compared to HSPICE, as well as its improvement over existing cell delay models: the average and maximum errors in the delay calculation for the cells in our 130nm library are less than 0.7% and 2.4%, respectively.

The high accuracy of this model enables us to apply it to other problems. We therefore used it to address short-circuit energy dissipation calculation in chapter 3, where we described how to pre-characterize the short-circuit current as a function of the input and output voltages of the cell. The resulting computational engine showed a large run-time improvement over SPICE, with close-to-SPICE estimates of the short-circuit energy dissipation. Previous short-circuit energy estimation methods cannot handle noisy input waveforms. We compared our proposed short-circuit power calculation method with that of [22]; our method showed more than 45% accuracy improvement in estimating the short-circuit energy dissipation in the presence of crosstalk-induced noisy waveforms.

In chapter 4, we extended our current-based logic cell model to handle multiple input switching while considering the effect of internal node voltages (the stack effect). We showed that, because they neglect the internal node voltages, previous multiple input switching current source models may exhibit significant errors in delay calculation, especially for lightly loaded cells.
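Returning to the short-circuit computation of chapter 3, the energy step can be sketched as follows once a short-circuit current I_sc(V_i, V_o) has been pre-characterized. This is a minimal sketch under stated assumptions: the energy drawn from the supply through the direct VDD-to-GND path is taken as E_sc = V_dd ∫ I_sc(V_i(t), V_o(t)) dt and evaluated with the trapezoidal rule on sampled waveforms; `isc_toy` is a hypothetical stand-in for a characterized table that approximates the through current by the smaller of the NMOS and PMOS currents; and the output waveform is a hypothetical mirror of the input, whereas in practice it would come from the CS model's waveform computation.

```python
import math

VDD = 1.1  # supply voltage (V), as in Table 5


def mos_current(vgs, vds, k=1e-3, vt=0.3):
    """Hypothetical square-law drain current (A)."""
    vov = vgs - vt
    if vov <= 0.0 or vds <= 0.0:
        return 0.0
    if vds < vov:                                  # triode region
        return k * (vov * vds - 0.5 * vds * vds)
    return 0.5 * k * vov * vov                     # saturation region


def isc_toy(vi, vo):
    """Hypothetical stand-in for a pre-characterized I_sc(V_i, V_o) table:
    the direct VDD-to-GND current is approximated by the smaller of the
    NMOS and PMOS currents, so it vanishes whenever either device is off."""
    return min(mos_current(vi, vo), mos_current(VDD - vi, VDD - vo))


def short_circuit_energy(ts, vis, vos, isc=isc_toy):
    """E_sc = VDD * integral of I_sc(V_i(t), V_o(t)) dt, by the trapezoidal rule."""
    cur = [isc(vi, vo) for vi, vo in zip(vis, vos)]
    charge = sum(0.5 * (cur[k] + cur[k - 1]) * (ts[k] - ts[k - 1])
                 for k in range(1, len(ts)))
    return VDD * charge


# Sampled waveforms with a 1 ps step; the noisy input adds a hypothetical
# sinusoidal disturbance clipped to the rails.
ts = [k * 1e-12 for k in range(201)]
vin_clean = [VDD * min(t / 100e-12, 1.0) for t in ts]
vin_noisy = [min(max(v + 0.15 * math.sin(2 * math.pi * t / 40e-12), 0.0), VDD)
             for v, t in zip(vin_clean, ts)]
e_clean = short_circuit_energy(ts, vin_clean, [VDD - v for v in vin_clean])
e_noisy = short_circuit_energy(ts, vin_noisy, [VDD - v for v in vin_noisy])
print(f"E_sc clean: {e_clean * 1e15:.2f} fJ, noisy: {e_noisy * 1e15:.2f} fJ")
```

Because the integral only needs the sampled (V_i, V_o) pairs, a noisy input is handled exactly like a clean ramp, which is the property that slew-based short-circuit estimates lack.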
The accuracy of this multiple input switching model is comparable with HSPICE. More precisely, we showed that the maximum delay error of our model, which captures the internal node voltage effect, is 3%, while that of a CSM without this capability is about 22%.

In chapter 5, a CS model for sequential cells such as transparent latches and master-slave flip-flops was presented. Noise analysis of sequential cells is very important because a crosstalk noise (or glitch) applied to the input of a sequential cell may change the next state of the circuit. The key difficulty in the noise analysis of sequential cells is the presence of feedback loops. Our proposed model addresses this issue by characterizing the cell with suitable current source and parasitic components. Given input and clock voltage waveforms of arbitrary shapes, our CS model can accurately compute the output voltage waveform of a register cell. This in turn enables accurate timing analysis of the next stage of the circuit, which is fed by a latch or a flip-flop. We compared the results of our proposed CSMs with HSPICE. The high accuracy of the CS models, along with their significant speedup compared to HSPICE, makes them attractive for use inside a sign-off timing analysis tool.

The CS modeling presented in chapters 2 to 5 enables us to perform a complete timing analysis along a target path; as discussed above, this application is known as path-based analysis.

The effect of crosstalk noise on delay and power should be considered in power optimization problems subject to delay constraints. In chapter 6, we addressed the problem of power-optimal repeater insertion for global buses in the presence of crosstalk noise. We used the MTCMOS technique, inserting high-Vth sleep transistors to reduce the leakage power consumption in the idle mode.
By accurately modeling the different components of the power consumption and the delay in the presence of crosstalk, a mathematical problem was formulated and solved for minimizing the average power under a timing constraint. Our experiments showed that by using the proposed technique, the average power consumption of the bus lines can be reduced by more than 50% with a small delay penalty of 5% [25].

7.2 Possible Extensions

One of the major shortcomings of the current CS modeling is its extensive memory requirement. If 16 equally spaced voltage values between GND and Vdd are used to index the tables, each 2D lookup table has to store 16 × 16 = 256 values, and every added table dimension multiplies this by another factor of 16 (4,096 values for a 3D table). Some ideas for lowering the memory requirement are:

i) Use non-uniform sampling between GND and Vdd. The change in the DC current values is not uniform across input and output voltage values; therefore, the sampling step can be made coarser where the change is small, while enough points are stored in the critical input/output voltage range.

ii) Use more powerful interpolation techniques to estimate the values of the cell model parameters and hence reduce the size of the lookup tables.

In addition to the memory requirement, performance improvement is a never-ending challenge in the EDA industry. The current trend in the CAD community is to enhance performance by using multithreading. Multithreading is a programming language feature that allows the programmer to construct multiple independent threads of execution, so that an application can execute multiple functions simultaneously. Multithreading is expected to become increasingly important as the industry moves toward multi-core processors and multiprocessors.
With multiple cores or processors, threads can execute independently, so only multithreaded applications can take advantage of the additional processing power of a second processor core [39][66]. Therefore, an important future task is to adapt CS-based engines for effective use in a multithreaded programming environment.

Pre-characterizing the lookup tables for different corner cases is another challenging and interesting task; interpolation from the nominal case may also be useful when memory is a concern. Handling more than two simultaneously switching inputs is another possible extension of this work, although the memory and performance requirements make this an especially challenging problem.

Several extensions can be developed on top of our CS-based model for sequential cells. Analyzing different types of latches and flip-flops is one of them. In addition, automating the CS modeling of sequential cells is a very attractive and important task. Finally, several extensions to our buffer insertion technique may be developed, such as accounting for charge sharing between the coupled lines and performing simultaneous bus encoding and buffer insertion.

REFERENCES

[1] Abbaspour. S, Fatemi. H, and Pedram. M, "Parameterized block-based non-Gaussian statistical interconnect timing analysis," Proc. of Design Automation and Test in Europe, Mar. 2006.
[2] Abbaspour. S, Fatemi. H, and Pedram. M, "VITA: Variation-aware interconnect timing analysis for symmetric and skewed sources of variation considering variational ramp input," Proc. of Great Lakes Symposium on VLSI, Apr. 2005, pp. 426-430.
[3] Abbaspour. S, Fatemi. H, and Pedram. M, "Parameterized block-based non-Gaussian statistical gate timing analysis," Proc. of Asia and South Pacific Design Automation Conference, Jan. 2006.
[4] Abbaspour. S, Fatemi. H, and Pedram. M, "VGTA: Variation-aware gate timing analysis," Proc. of Int'l Conf. on Computer Design: VLSI in Computers and Processors, Oct. 2005.
[5] Abdollahi. A, Fallah. F, and Pedram. M, "An effective power mode transition technique in MTCMOS circuits," in Proc. of DAC, pp. 37-42, 2005.
[6] Acar. E, Arunachalam. R, Nassif. S, "Predicting short circuit power from timing models," Proc. of Asia and South Pacific Design Automation Conference, pp. 277-282, 2003.
[7] Agarwal. A, Dartu. F, Blaauw. D, "Statistical gate delay model considering multiple input switching," Proc. DAC, pp. 658-663, 2004.
[8] Agarwal. K, Sylvester. D, Blaauw. D, Liu. F, Nassif. S, Vrudhula. S, "Variational delay metrics for interconnect timing analysis," Proc. of the 41st Design Automation Conference, June 7-11, 2004.
[9] Amin. C, Kashyap. C, Menezes. N, Killpack. K, and Chiprout. E, "A multi-port current source model for multiple-input switching effects in CMOS library cells," in Proc. Design Automation Conference, pp. 247-252, 2006.
[10] Bakoglu. H and Meindl. J, "Optimal interconnection circuits for VLSI," IEEE Trans. on Electron Devices, vol. ED-32, no. 5, pp. 903-909, May 1985.
[11] Banerjee. K and Mehrotra. A, "A power-optimal repeater insertion methodology for global interconnects in nanometer designs," IEEE Trans. on Electron Devices, vol. 49, pp. 2001-2007, Nov. 2002.
[12] Benini. L, Micheli. G, Macii. E, "Power optimization of core-based systems by address bus encoding," IEEE Trans. on VLSI, vol. 6, no. 4, pp. 551-562, Dec. 1998.
[13] Bhardwaj. S, "Novel techniques for analysis and optimization of nano-scale digital circuits in the presence of process variation," PhD dissertation, Arizona State University, Dec. 2006.
[14] Buch. P, Dai. W, "Understanding ECSM and CCSM," http://www.magma-da.com/c/@SQJEgjiSi1J_6/Pages/MWAUnderstandingECSMandCCSM.html.
[15] Chang. H, Sapatnekar. S, "Statistical Timing Analysis Considering Spatial Correlations Using a Single PERT-like Traversal," Proc. ICCAD, pp. 621-625, 2003.
[16] Chang. H, Zolotov. V, Visweswariah. C, and Narayan. S, "Parameterized block-based statistical timing analysis with non-Gaussian and nonlinear parameters," DAC, pp. 71-76, June 2005.
[17] Chen. G and Friedman. E, "Low power repeaters driving RC interconnects with delay and bandwidth constraints," in Proc. of ASIC/SOC, pp. 335-339, 2004.
[18] Chen. L, Gupta. S, Breuer. M, "A New Gate Delay Model for Simultaneous Switching and Its Applications," Proc. DAC, 2001.
[19] Chern. T, Hajjar. A, "Statistical Timing Analysis of Coupled Interconnect Using Quadratic Delay-Change Characteristics," IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, vol. 23, no. 12, pp. 1677-1683, Dec. 2004.
[20] Croix. J, Wong. D, "Blade and razor: cell and interconnect delay analysis using current-based models," Proc. DAC, pp. 386-389, 2003.
[21] Dally. W and Poulton. J, Digital Systems Engineering, Cambridge, U.K.: Cambridge Univ. Press, 1998.
[22] Dartu. F, Menezes. N, and Pileggi. L, "Performance computation for precharacterized CMOS gates with RC loads," IEEE Trans. on Computer-Aided Design, pp. 544-553, May 1996.
[23] De. V, Keshavarzi. A, Narendra. S, "Techniques for leakage power reduction," in Design of High-Performance Microprocessor Circuits, A. Chandrakasan et al., Eds. Piscataway, NJ: IEEE, 2001.
[24] Fatemi. H, Abbaspour. S, Pedram. M, Tuncer. E, Ajami. A, "Statistical timing analysis of coupled interconnects," Proc. of Great Lakes Symposium on VLSI, Apr. 2006.
[25] Fatemi. H, Amelifard. B, Pedram. M, "Power optimal MTCMOS repeater insertion for global buses," in Proc. International Symposium on Low Power Electronics and Design (ISLPED), 2007.
[26] Fatemi. H, Nazarian. S, and Pedram. M, "A current-based method for short circuit power calculation under noisy input waveforms," in Proc. of ASP-DAC, pp. 774-779, 2007.
[27] Fatemi. H, Nazarian. S, and Pedram. M, "Statistical Logic Cell Delay Analysis Using a Current-Based Model," Proc. of Design Automation Conference (DAC), July 2006.
[28] Ghanta. P, "Stochastic performance modeling and analysis of VLSI circuits in the presence of process variation," PhD dissertation, Arizona State University, Aug. 2007.
[29] Gupta. P, Kahng. A, Sharma. P, and Sylvester. D, "Gate-length biasing for runtime-leakage control," IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, vol. 25, no. 8, pp. 1475-1485, Aug. 2006.
[30] Hashimoto. M, Yamada. Y, Onodera. H, "Equivalent waveform propagation for static timing analysis," IEEE Trans. on Computer-Aided Design of Integrated Circuits & Systems, vol. 23, no. 4, pp. 498-508, 2004.
[31] "HSPICE: The golden standard for Accurate Circuit Simulation," http://www.synopsys.com/products/mixedsignal/HSPICE/HSPICE.html.
[32] http://www.eas.asu.edu/~ptm/
[33] Jiang. I, Chang. Y, and Jou. I, "Crosstalk-driven interconnect optimization by simultaneous gate and wire sizing," IEEE Trans. on CAD, vol. 19, no. 9, pp. 999-1010, Sep. 2000.
[34] Kang. C, Abbaspour. S and Pedram. M, "Buffer sizing for minimum energy-delay product by using an approximation polynomial," Proc. of Great Lakes Symposium on VLSI, April 2003.
[35] Keller. I, Tseng. K, Verghese. N, "A robust cell-level crosstalk delay change analysis," Proc. ICCAD, pp. 147-154, 2004.
[36] Keutzer. K and Orshansky. M, "From blind certainty to informed uncertainty," ACM/IEEE Workshop on Timing Issues in the Specification and Synthesis of Digital Systems, 2002.
[37] Lawler. G, Introduction to Stochastic Processes, Chapman & Hall/CRC, 2000.
[38] Lee. D, Blaauw. D, and Sylvester. D, "Gate oxide leakage current analysis and reduction for VLSI circuits," IEEE Trans. on VLSI, vol. 12, no. 2, pp. 155-166, Feb. 2004.
[39] Lewis. B, Berg. D, Threads Primer: A Guide to Multithreaded Programming, Prentice Hall, 1998.
[40] Liu. Y, Nassif. S, Pileggi. L, and Strojwas. A, "Impact of Interconnect Variations on the Clock Skew of a Gigahertz Microprocessor," DAC, pp. 168-171, 2000.
[41] Marculescu. R, Marculescu. D, and Pedram. M, "Probabilistic modeling of dependencies during switching activity analysis," IEEE Trans. on CAD, vol. 17, no. 2, pp. 73-83, Feb. 1998.
[42] Marculescu. R, Marculescu. D, and Pedram. M, "Switching activity estimation considering spatiotemporal correlations," in Proc. of ICCAD, pp. 294-299, 1994.
[43] Maurine. P, Rezzoug. M and Auvergne. D, "Output transition time modeling of CMOS structures," IEEE International Symposium on Circuits and Systems, vol. 5, pp. 363-366, 2001.
[44] MOSEK Optimization Software, http://www.mosek.com
[45] Mukhopadhyay. S, Raychowdhury. A, and Roy. K, "Managing leakage power: Accurate estimation of total leakage current in scaled CMOS logic circuits based on compact current modeling," Proc. of the 40th Design Automation Conference, pp. 169-174, Jun. 2003.
[46] Nassif. S, "Modeling and Analysis of Manufacturing Variations," CICC, pp. 223-228, 2001.
[47] Nazarian. S, Pedram. M, Tuncer. E, Lin. T, "Sensitivity-based gate delay propagation in static timing analysis," ACM/IEEE Workshop on Timing Issues in the Specification and Synthesis of Digital Systems (TAU), pp. 20-25, Feb. 2005.
[48] Nose. K and Sakurai. T, "Analysis and future trend of short circuit power," IEEE Trans. on CAD, vol. 19, no. 9, pp. 1023-1030, Sept. 2000.
[49] Odabasioglu. A, Celik. M and Pileggi. L, "PRIMA: passive reduced order interconnect macro-modeling algorithm," IEEE Trans. on CAD, vol. 17, no. 8, pp. 645-653, 1998.
[50] Oh. N, Ding. L, Kasnavi. L, "Sequential cell noise immunity characterization using meta-stable point of feedback loop," Proc. Int'l Symp. on Quality Electronic Design (ISQED), 6 pages, 2006.
[51] Okada. K, Yamaoka. K, Onodera. H, "A statistical gate-delay model considering intra-gate variability," Proc. of ICCAD, pp. 908-913, 2003.
[52] Pedram. M, "Power minimization in IC design: principles and applications," ACM Trans. on Design Automation of Electronic Systems, vol. 1, no. 1, pp. 3-56, 1996.
[53] Ramprasad. S, Shanbhag. N, and Hajj. I, "A coding framework for low-power address and data busses," IEEE Trans. on VLSI, vol. 7, no. 2, pp. 212-221, June 1998.
[54] Rao. R, Agarwal. K, Sylvester. D, "Approaches to run-time and standby mode leakage reduction in global buses," in Proc. of ISLPED, pp. 188-193, 2004.
[55] Sakurai. T and Newton. A, "A simple MOSFET model for circuit analysis," IEEE Trans. on Electron Devices, vol. 38, no. 4, pp. 887-894, Apr. 1991.
[56] Sakurai. T, Newton. A, "Alpha-power law MOSFET model and its applications to CMOS inverter delay and other formulas," IEEE J. Solid-State Circuits, vol. 25, pp. 584-594, Apr. 1990.
[57] Sapatnekar. S, Timing, Springer-Verlag New York, Inc., 2004.
[58] Semiconductor Industry Association, International Technology Roadmap for Semiconductors, 2003 edition, http://public.itrs.net/
[59] Shin. Y, Chae. S, and Choi. K, "Partial bus-invert coding for power optimization of application-specific systems," IEEE Trans. on VLSI, vol. 9, no. 2, pp. 377-383, Apr. 2001.
[60] Sinha. D, Khalil. D, Ismail. Y, and Zhou. H, "A timing dependent power estimation framework considering coupling," in Proc. ICCAD, pp. 401-407, 2006.
[61] Sylvester. D, Nakagawa. O, and Hu. C, "Modeling the Impact of Back End Process Variation on Circuit Performance," Int. Symposium on VLSI Technology, Systems and Applications, pp. 58-61, 1999.
[62] Taur. Y and Ning. T, Fundamentals of Modern VLSI Devices, Cambridge University Press, 1998.
[63] Veendrick. H, "Short circuit dissipation of static CMOS circuitry and its impact on the design of buffer circuits," IEEE J. Solid-State Circuits, vol. SC-19, pp. 468-473, 1984.
[64] Visweswariah. C, Ravindran. K, Kalafala. K, Walker. S, Narayan. S, "First-order incremental block-based statistical timing analysis," Proc. of the 41st Design Automation Conference, June 7-11, 2004.
[65] Weste. N, Harris. D, CMOS VLSI Design: A Circuits and Systems Perspective, Addison Wesley, 2005.
[66] Wikipedia. Description of Multithreading. http://en.wikipedia.org/wiki/Multithreading
[67] Wikipedia. Description of Signal Integrity. http://en.wikipedia.org/wiki/Signal_Integrity
[68] Wikipedia. Description of Static Timing Analysis. http://en.wikipedia.org/wiki/Static_Timing_Analysis
[69] Xakellis. M and Najm. F, "Statistical estimation of the switching activity in digital circuits," Proc. of the 31st Design Automation Conference, pp. 728-733, Jun. 1994.
[70] Zhou. H and Wong. F, "Global routing with crosstalk constraints," in Proc. of DAC, pp. 374-377, 1998.
Abstract
This dissertation investigates the effect of capacitive crosstalk on the behavior of CMOS cells and presents a new cell modeling technique for the purpose of noise, delay, and power analysis. In particular, a current-based logic cell model for cell timing analysis in the presence of crosstalk-induced noisy inputs is introduced. This model enables accurate calculation of the electrical waveform at the cell output under noise-affected input waveforms of arbitrary shape. This current source (CS) model is subsequently extended to handle multiple input switching (MIS) while considering the effect of the internal node voltages of the transistor stacks in the cell (a.k.a. the stack effect). An application of the proposed CS model to short-circuit power analysis is presented. In addition, a CS model for CMOS register cells, i.e., latches and master-slave flip-flops, is described. Experimental results for the proposed CS models demonstrate close-to-SPICE accuracy with up to three orders of magnitude speedup compared to HSPICE. The scope of this dissertation is not limited to delay and power analysis: it also investigates the problem of power-optimal repeater insertion for global buses in the presence of crosstalk noise and subject to delay constraints.
Asset Metadata
Creator: Fatemi, Hanif (author)
Core Title: Timing and power analysis of CMOS logic cells under noisy inputs
School: Viterbi School of Engineering
Degree: Doctor of Philosophy
Degree Program: Computer Engineering
Publication Date: 12/13/2009
Defense Date: 10/01/2007
Publisher: University of Southern California (original), University of Southern California. Libraries (digital)
Tag: cell modeling, crosstalk noise, OAI-PMH Harvest, timing analysis
Language: English
Advisor: Pedram, Massoud (committee chair), Goldstein, Larry M. (committee member), Gupta, Sandeep K. (committee member)
Creator Email: fatemi@usc.edu
Permanent Link (DOI): https://doi.org/10.25549/usctheses-m979
Unique identifier: UC172363
Identifier: etd-Fatemi-20071213 (filename), usctheses-m40 (legacy collection record id), usctheses-c127-486086 (legacy record id), usctheses-m979 (legacy record id)
Legacy Identifier: etd-Fatemi-20071213.pdf
Dmrecord: 486086
Document Type: Dissertation
Rights: Fatemi, Hanif
Type: texts
Source: University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Repository Name: Libraries, University of Southern California
Repository Location: Los Angeles, California
Repository Email: cisadmin@lib.usc.edu