Effects of non-uniform substrate temperature in high-performance integrated circuits: Modeling, analysis, and implications for signal integrity and interconnect performance optimization
(USC Thesis Other)
EFFECTS OF NON-UNIFORM SUBSTRATE TEMPERATURE IN HIGH-PERFORMANCE INTEGRATED CIRCUITS: MODELING, ANALYSIS, AND IMPLICATIONS FOR SIGNAL INTEGRITY AND INTERCONNECT PERFORMANCE OPTIMIZATION

by Amir Hooshang Ajami

A Dissertation Presented to the FACULTY OF THE GRADUATE SCHOOL, UNIVERSITY OF SOUTHERN CALIFORNIA, In Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy (ELECTRICAL ENGINEERING)

May 2003

Copyright 2003 Amir H. Ajami

Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.

UMI Number: 3116654. Copyright 2003 by Ajami, Amir Hooshang. All rights reserved.

INFORMATION TO USERS: The quality of this reproduction is dependent upon the quality of the copy submitted. Broken or indistinct print, colored or poor quality illustrations and photographs, print bleed-through, substandard margins, and improper alignment can adversely affect reproduction. In the unlikely event that the author did not send a complete manuscript and there are missing pages, these will be noted. Also, if unauthorized copyright material had to be removed, a note will indicate the deletion.

UMI Microform 3116654. Copyright 2004 by ProQuest Information and Learning Company. All rights reserved. This microform edition is protected against unauthorized copying under Title 17, United States Code. ProQuest Information and Learning Company, 300 North Zeeb Road, P.O. Box 1346, Ann Arbor, MI 48106-1346.

UNIVERSITY OF SOUTHERN CALIFORNIA, THE GRADUATE SCHOOL, UNIVERSITY PARK, LOS ANGELES, CALIFORNIA 90089-1695

This dissertation, written by AMIR HOOSHANG AJAMI under the direction of his dissertation committee, and approved by all its members, has been presented to and accepted by the Director of Graduate and Professional Programs, in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY.

Director. Date: August 12, 2003. Dissertation Committee: Chair, Co-Chair.

DEDICATION

To My Beloved Family

ACKNOWLEDGEMENTS

There are many people I would like to thank for helping me through this journey and for making my years at USC among the most rewarding and challenging of my life. First and foremost, I would like to thank my advisor, Professor Massoud Pedram, for his continuous support and guidance throughout my Ph.D. studies. I would also like to thank my co-advisor, Professor Kaustav Banerjee of the University of California at Santa Barbara, for his exceptional inspiration and endless effort in helping me throughout the last two difficult years of my Ph.D. research work. I’m indebted to both of them for their great support and attention, and for being great academic inspirations for me.

I would also like to express my special appreciation to Professor Gandhi Puvvada, my mentor in the first two years of my Ph.D. studies. I had a very rewarding experience working as his head teaching assistant in the microprocessor lab. The best teacher I have ever seen, he taught me many things not only about digital design, but also about life. He was always there for me, and it was my honor to be his assistant for a short period of my life.

I would also like to thank my qualifying and dissertation committee members, Professor Peter Beerel, Professor Jean-Luc Gaudiot, Professor Won Namgoong, and Professor Mansour Rahimi, for their lively and exciting discussions and feedback during the qualifying and defense exams. In particular, I would like to thank Professor Peter Beerel, my first research advisor at USC, for his continuous support.
I’m also grateful to Professor Won Namgoong for our valuable discussions and his constant encouragement.

I would like to thank all my current and past colleagues in the low power CAD group at USC for helping me out on numerous occasions and for creating such a friendly environment; special thanks to Afshin Abdollahi, Yazdan Aghaghiri, Wei Chen, Wei-Chung Cheng, Payam Heydari, Ali Iranli, Chang-Woo Kang, and Peyman Rezvani for their friendship and support beyond our research circle. I would also like to thank all my colleagues in the EE-systems department; special thanks to Emil Ettelai, Shahdad Irajpoor, Reza Motaghian, Shahin Nazarian, Ali Taha, and Vida Vakilotojar.

Finally, I would like to acknowledge my beloved parents, Latifeh and Kazem, my brother Ardeshir, and my sister Azita, for their unconditional love, devotion, and support; especially my beautiful mom, for being a constant source of motivation and a role model for hard work and perseverance in life. I would also like to specially thank my aunt, Shafigheh Dastourband, who was always there for me. To them I dedicate this dissertation.

TABLE OF CONTENTS

DEDICATION
ACKNOWLEDGEMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT
CHAPTER 1 INTRODUCTION
1.1 Motivation
1.2 Sources of Power Dissipation
1.2.1 Average Chip Temperature
1.2.2 Trends in Interconnect Scaling
1.3 Impact of Temperature on Interconnect Reliability
1.4 Impact of Temperature on Chip Performance
1.5 Non-uniform Chip Temperature Profile
1.6 Thesis Contribution
1.7 Thesis Outline
CHAPTER 2 ANALYTICAL MODEL FOR INTERCONNECT THERMAL PROFILE
2.1 Methodology
2.2 Uniform Substrate Thermal Profile
2.3 Substrate Thermal Profile Calculation
2.4 Summary
CHAPTER 3 NON-UNIFORM TEMPERATURE-DEPENDENT INTERCONNECT DELAY MODEL
3.1 A Thermally-dependent Distributed RC Delay Model
3.2 Effect of Constant Substrate Temperature on Signal Delay
3.3 Effect of Substrate Thermal Non-uniformities on Signal Delay
3.4 Directional Thermal Gradients and Their Effects on Signal Delay
3.5 Summary
CHAPTER 4 IMPACT OF NON-UNIFORM INTERCONNECT TEMPERATURE ON CLOCK SKEW
4.1 Introduction
4.2 Thermally-dependent H-Tree Construction Technique
4.3 Experimental Results
4.4 Summary
CHAPTER 5 EFFECT OF NON-UNIFORM SUBSTRATE TEMPERATURE ON BUFFER INSERTION
5.1 Introduction
5.2 Temperature-dependent Driver Resistance
5.3 Optimal Buffer Insertion Technique
5.4 Experimental Results
5.5 Discussion
5.6 Summary
CHAPTER 6 ANALYSIS OF IR-DROP SCALING WITH IMPLICATIONS FOR DEEP SUBMICRON P/G DISTRIBUTION NETWORK DESIGNS
6.1 Introduction
6.2 Topology of Power Distribution Networks for IR-drop Analysis
6.3 Methodology for Power Network Planning
6.3.1 Power Network Electromigration Rule Satisfaction
6.4 Effects of Technology Scaling on the IR-drop Effect
6.4.1 Effects of Thin-film, Barrier Thickness and Interconnect Temperature
6.4.2 IR-drop in Global/Semi-global Power Network
6.4.3 IR-drop in Local Power Network
6.4.4 Effect of Hot Spots on the Worst-case IR-drop
6.5 Effects of the IR-drop on the Cell Performance and Clock Skew
6.6 Summary
CHAPTER 7 CONCLUSIONS AND FUTURE WORK
7.1 Thesis Summary
7.2 Future Work
7.2.1 Studying the Effects of Thermal Non-uniformities on EDA Flow
7.2.2 Reducing the Magnitude of the Non-uniform Substrate Thermal Gradients
CHAPTER 8 REFERENCES

LIST OF TABLES

Table 1. Comparison between different thermal profiles and their effects on clock skew.
Table 2. Parameters used in generating experimental results for three different technologies based on ITRS specifications.
Table 3. Technology parameters used in this work based on ITRS data for Cu. Tmax is the estimated maximum temperature in the topmost metal layer as per [35].
Table 4. Minimum number of (minimum-width) power tracks needed to be routed on the power grid at global and semi-global tiers (in order to satisfy the EM rules), calculated based on (3) for T=105 °C and Tmax. The jm value is the maximum allowable current density at Tmax for the global interconnect layer in the table.
Table 5.
Effective resistive (barrier plus thin-film) and temperature coefficient ratios for the global, semi-global and local tiers for various technology nodes.

LIST OF FIGURES

Figure 1-1. Supply voltage Vdd and clock frequency f as a function of the technology node (ITRS’01 [38]).
Figure 1-2. Average chip power consumption as a function of the technology feature size for high-performance chips w/heat sink (ITRS’01 [38]).
Figure 1-3. A simplistic cross section view of a chip containing the substrate and the interconnect lines.
Figure 1-4. Chip power density and average substrate temperature for different technology feature sizes in high-performance microprocessors (ITRS’99 [37]).
Figure 1-5. Finite-Element 3-D simulation results of different interconnect layer peak temperatures for some technology nodes based on ITRS parameters.
Figure 2-1. A point-to-point interconnect line passing over the substrate, separated by the insulator layer, and connected to the switching devices by using vias at its two ends.
Figure 2-2. Different configurations of metal lines and vias.
Figure 2-3. Thermal profile along the length of a 2000 µm long global interconnect line (Cu) with uniform substrate temperature using 0.1 µm and 0.25 µm technology node parameters provided by NTRS’97 [43].
Figure 2-4. Concept of using a 2-D mesh on the substrate surface for determining Tref(x) using the thermal resistance between each two adjacent nodes in the mesh by considering the power consumption of each block.
Figure 2-5. A 3-D mesh of the substrate consisting of thermal resistors and current sources.
Figure 2-6. Thermal profile along the length of a 2000 µm long interconnect (Cu) line with a linear substrate thermal profile using parameters of global wires of 0.1 µm and 0.25 µm technologies [43].
Figure 3-1. A distributed RC interconnect model driven by resistance Rd and terminated at load CL.
Figure 3-2. Percentage increase in delay with respect to the signal delay at 25 °C as a function of the line temperature.
Figure 3-3. Exposing a point-to-point interconnect to similar exponential thermal profiles in two different directions.
Figure 3-4. Performance degradation for T1(x) and T2(x) profiles of Figure 3.2.
Figure 3-5. Constant-peak normal thermal profile with variable median µ and standard deviation σ along an interconnect line.
Figure 3-6. Delay increase as a function of the median value and the standard deviation of a normal temperature distribution.
Figure 3-7. Gradually decreasing (increasing) interconnect thermal profile as an equivalent to sizing down (up) of a uniform resistance wire.
Figure 4-1. Portion of a clock tree with two fanout branches that have equal wire lengths.
Figure 4-2. Percentage of normalized delay difference between wires 1 and 2 as a function of location parameter x.
Figure 4-3. Percentage of normalized delay difference between wires 1 and 2 as a function of the temperature T? as shown in Figure 4-1.
Figure 4-4. A symmetric H-Tree clock distribution net.
Figure 4-5. Schematic of minimum-skew clock signal insertion for an interconnect line with non-uniform temperature profile.
Figure 5-1. Normalized driver resistance as a function of device temperature for different technology feature sizes.
Figure 5-2. Structure of standard buffer insertion in a uniform line with equal segmentation.
Figure 5-3. Delay improvement due to the temperature-aware buffer insertion technique in comparison to the standard buffer insertion technique for different numbers of buffers in different technologies based on ITRS.
Figure 5-4. Location of an inserted buffer in a 6660 µm line (0.18 µm technology): (a) standard technique (b) temperature-aware technique with only variable r (c) temperature-aware technique with only variable Rd (d) temperature-aware technique with both variable Rd and variable r.
Figure 5-5. Delay improvement due to the thermally-aware buffer insertion for one buffer as a function of different thermal gradients between the two ends of the line in comparison to the standard buffer insertion techniques for different technologies.
Figure 5-6. Delay improvement due to the thermally-aware buffer insertion for one buffer as a function of percentage of critical length in comparison to the standard buffer insertion techniques for different technology nodes.
Figure 6-1. RC model of a power bus network. Each intermediate node is connected to underlying circuit blocks modeled as time-varying current sources Is’s and on-chip decoupling capacitances Cdcap’s.
Figure 6-2. A local power distribution network for a typical standard cell design consisting of power trunks in a comb-line structure connecting to the semi-global power grid through metal2.
Figure 6-3. Magnified view of a trunk segment, containing a resistive network with inverters connected to the intermediate nodes.
Figure 6-4. Worst-case IR-drop (ΔVIR/Vdd) increase as a function of the technology node for combined global and semi-global power grids considering the effects of self-heating, while allocating 5% and 10% of the routing area to the power network, respectively.
Figure 6-5. Worst-case IR-drop (ΔVIR/Vdd) increase as a function of technology node for combined global and semi-global power grids considering the effects of self-heating, while allocating 10% of the routing area to the power network and assuming uniformly distributed on-chip decoupling capacitances.
Figure 6-6. Minimum required percentage of the allocated resources (global layer routing area and substrate area) to ensure a worst-case 10% voltage drop for future technologies, considering the maximum temperature on global/semi-global interconnects.
Figure 6-7. Worst-case IR-drop (ΔVIR/(Vdd-Vdd')) increase as a function of technology node in the presence of interconnect temperature (T) and surface scattering/barrier effects (S), in the local power trunk lines. Notice that in this graph (Vdd-Vdd') is the actual voltage over the two sides of the local power trunks, and N is the number of standard cells connected to the power trunk.
Figure 6-8. Total worst-case IR-drop (ΔVIR/Vdd) increase (as a result of Figure 6-5 and Figure 6-7) as a function of technology node in the presence of interconnect temperature (T) and surface scattering/barrier effects (S), while allocating 10% of the routing area to the power network and 5% of the substrate for decoupling capacitors.
Figure 6-9. Worst-case IR-drop (ΔVIR/Vdd) increase (based on Figure 6-5) as a function of technology node in the presence of hot spots as a function of thermal gradient magnitudes (°C).
Figure 6-10. Sensitivity of the cell delay (S^D_Vdd) to the fluctuations of the supply voltage Vdd for different technology nodes.
Y-axis values show the percentage increase in gate delay for each percent decrease in Vdd at the specific technology.
Figure 6-11. Maximum percentage of the delay difference among drivers connected to a local power trunk for different technologies at room temperature, and at maximum interconnect temperature.

ABSTRACT

The ever-increasing demand for complex ULSI (Ultra Large Scale Integration) circuits with higher performance is leading to higher clock frequencies and device packing densities, which result in large on-chip power dissipation. This large power consumption in turn causes a dramatic increase in device junction temperature. Furthermore, different switching activities and/or sleep modes of various functional blocks, together with dynamic power management policies, can be major sources of thermal non-uniformities over the silicon substrate. Without adequate thermal engineering, significantly non-uniform temperature distributions can lead to considerable interconnect thermal gradients and substrate hot-spots. Hence, thermal management is essential to the development of future generations of microprocessors, integrated network processors, and systems-on-a-chip (SOC). At the circuit level, temperature variations in the substrate and interconnect lines have important implications for circuit performance and reliability.

The research presented in this thesis focuses on the analysis and modeling of the non-uniform chip temperature profile and the study of its effects on different aspects of signal integrity and performance in very high-performance ULSI designs. This dissertation makes contributions in four distinct, yet related, areas.
First, a detailed analysis of the interconnect temperature distributions in the presence of non-uniform substrate thermal profiles is presented. To study the effect of non-uniform substrate temperature on signal performance in interconnects, a non-uniform temperature-dependent distributed RC interconnect delay model is proposed. Second, by using the proposed temperature-dependent RC delay model, it is shown that clock distribution networks are among the signal nets most vulnerable to substrate thermal non-uniformities. Subsequently, a thermally driven near-zero-skew clock routing methodology is proposed. Third, it is shown that non-uniform substrate temperatures can affect optimal buffer insertion techniques. Consequently, a new design methodology is provided to reduce the impact of these effects on the optimality of the buffer insertion. Finally, the effects of substrate hot-spots and technology scaling on the worst-case power distribution network voltage (IR) drop are examined. By introducing these studies and methodologies for the first time, it is shown how the presence of non-uniform substrate temperatures can severely degrade the performance of circuits resulting from conventional design flows.

Chapter 1 Introduction

1.1 Motivation

With the downscaling of the VLSI feature size, interconnects are becoming the dominant factor determining system performance and power dissipation. Due to the ever-increasing demand for very high performance ULSI circuits, aggressive VLSI technology scaling has dramatically reduced the interconnect metal pitch and increased the number of metal routing layers. This continuous interconnect scaling has resulted in higher current densities in the interconnect lines, which effectively increases the interconnect temperatures.
As a result, management of thermal effects is rapidly becoming one of the most challenging efforts in high performance chip design [33]. Furthermore, different activities and sleep modes of the functional blocks in high-performance chips cause significant temperature gradients on the substrate. It has been reported that thermal gradients of 40 °C exist in a typical high-performance microprocessor [32]. Low power design techniques such as dynamic power management [59] and clock gating can result in such thermal non-uniformities on the substrate. With circuits moving towards multi-GHz frequencies, it is expected that the magnitude of thermal gradients in the substrate will increase. In addition, as the minimum feature size shrinks, the topmost metal layers that carry the global signals get closer to the substrate [35]. As a result, the effect of the non-uniform substrate temperature on the interconnect temperature becomes more critical.

Although many research efforts have focused on the development of low power and new package designs for better chip reliability, the thermal problem still exists and deserves more attention. Thermal management is essential to the development of future generations of microprocessors, integrated network processors, and systems-on-a-chip. At the circuit level, thermal problems have important implications for performance and reliability [11],[22],[25].

The aggressive increase in interconnect current density has a notable impact on both interconnect reliability and signal performance. The degradation of interconnect reliability is mainly due to the electromigration (EM) phenomenon. Furthermore, increased current density in interconnects causes increased self-heating (Joule heating).
This self-heating effect results in a temperature rise in the interconnect lines, which exponentially reduces the interconnect EM time-to-failure [10]. Much work has been done to calculate the EM reliability lifetime of metal interconnects [13],[27],[52]. Although extensive research has been performed to determine the uniform chip temperature and predict the effect of temperature on interconnect reliability, few efforts have focused on analyzing the effects of temperature on interconnect performance. It is well known that the resistivity of a metal increases linearly with its temperature rise. In high performance ICs the peak chip temperature can rise up to 160 °C at the 100 nm technology feature size and is expected to rise to a much higher level for future technologies [27],[35]. Such a temperature rise can increase interconnect resistance significantly, which subsequently increases the signal delay in the interconnect line. It has been shown that such a temperature increase can alter the propagation delay of different critical paths and, in some extreme cases, cause timing violations [26]. Moreover, it has recently been shown that neglecting thermal gradients in the substrate (and consequently in the interconnects) can introduce major errors in signal delay calculations [4].

In order to fully understand the effects of chip power dissipation on the substrate and interconnect temperatures, one must first examine the different sources of power dissipation inside the chip.

1.2 Sources of Power Dissipation

For digital CMOS circuits there are four major sources of power dissipation. The average power dissipation, $P_{avg}$, can be expressed as [16]:

$$P_{avg} = P_{dyn} + P_{leakage} + P_{short\text{-}circuit} \qquad (1.1)$$

where

$$P_{dyn} = V_{dd} \cdot V_{swing} \cdot f \cdot \sum_{n} \left( C_i \cdot sw_i \right)$$

is the dynamic switching component of power consumption required to charge and discharge the load capacitance seen by the $n$ switching devices. Here $f$ is the switching frequency (usually equal to 0.5× the clock frequency), $C_i$ is the capacitive load of the $i$th switching device, $sw_i$ is the average switching activity factor ($0 < sw_i < 1$) for the $i$th switching device, $V_{dd}$ is the power supply voltage, and $V_{swing}$ is the voltage differential between the initial and final voltages across the terminals of the capacitors.

$$P_{short\text{-}circuit} = \sum_{n} \left( I_{sc} \cdot V_{dd} \right)$$

is due to the direct-path short-circuit current, $I_{sc}$, which arises when both the NMOS and PMOS transistors are active simultaneously, conducting current directly from supply to ground.

$$P_{leakage} = \sum_{n} \left( I_{leakage} \cdot V_{dd} \right)$$

is due to the leakage current, $I_{leakage}$, which can arise from reverse-bias diode currents and sub-threshold effects.

These three components of power dissipation are all associated with the switching devices. In well-designed circuits with relatively high threshold voltages, the leakage and short-circuit components are usually small. The dominant component is the one due to switching activity (i.e., $P_{dyn}$), although the leakage component is becoming increasingly significant for deep sub-micron technologies with lower device threshold voltages. Note that the leakage current also depends exponentially on the switching device temperature, which suggests that the leakage component of chip power consumption plays an important role at high temperatures. It should be noted that the capacitance $C_i$ in the switching component of power dissipation is predominantly due to the interconnect capacitances. Hence, it is the interconnect lines that are mainly responsible for the total chip power dissipation.
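As a rough illustration of how the three device-related terms in (1.1) combine, the following Python sketch evaluates each component for a list of switching devices. It is only a minimal sketch: the `Device` record, the function names, and the choice to give each device its own short-circuit and leakage current are assumptions made for the example and do not come from the dissertation.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Device:
    """Hypothetical per-device parameters (illustrative only)."""
    c_load: float   # C_i: capacitive load of the device, in farads
    sw: float       # sw_i: average switching activity factor, 0 < sw < 1
    i_sc: float     # average short-circuit current, in amperes (assumed per device)
    i_leak: float   # leakage current, in amperes (assumed per device)


def dynamic_power(devices: Sequence[Device], vdd: float, v_swing: float, f: float) -> float:
    """P_dyn = Vdd * Vswing * f * sum(C_i * sw_i)."""
    return vdd * v_swing * f * sum(d.c_load * d.sw for d in devices)


def short_circuit_power(devices: Sequence[Device], vdd: float) -> float:
    """P_short-circuit = sum(I_sc * Vdd) over the switching devices."""
    return sum(d.i_sc * vdd for d in devices)


def leakage_power(devices: Sequence[Device], vdd: float) -> float:
    """P_leakage = sum(I_leakage * Vdd) over the switching devices."""
    return sum(d.i_leak * vdd for d in devices)


def average_power(devices: Sequence[Device], vdd: float, v_swing: float, f: float) -> float:
    """P_avg = P_dyn + P_leakage + P_short-circuit, as in (1.1)."""
    return (dynamic_power(devices, vdd, v_swing, f)
            + leakage_power(devices, vdd)
            + short_circuit_power(devices, vdd))
```

With full-rail switching one would pass `v_swing = vdd`, in which case the dynamic term reduces to the familiar form proportional to the square of the supply voltage.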
Figure 1-1 shows how the supply voltage and chip clock frequency in a high-performance microprocessor change as the technology feature size is scaled down, based on ITRS’01 projected values [38]. The value of $V_{dd}$ decreases with technology scaling. However, due to the rapid increase in the clock frequency and in other power components (such as leakage and short-circuit power), the total power consumption generally increases as the technology feature size is scaled down (Figure 1-2).

[Figure 1-1 plot: frequency (GHz) and Vdd (volts) versus technology feature size (nm), from 150 nm down to 32 nm.]
Figure 1-1. Supply voltage $V_{dd}$ and clock frequency $f$ as a function of the technology node (ITRS’01 [38]).

[Figure 1-2 plot: average chip power consumption versus technology feature size (nm), from 150 nm down to 32 nm.]
Figure 1-2. Average chip power consumption as a function of the technology feature size for high-performance chips w/heat sink (ITRS’01 [38]).

1.2.1 Average Chip Temperature

Considering the 1-D heat transfer model provides a better understanding of the relationship between chip power consumption and thermal effects. Figure 1-3 shows a typical arrangement of the substrate and packaging material in a chip. With the 1-D heat transfer model, the average temperature of the chip, $T_{chip}$, can be estimated very easily as follows:

$$T_{chip} = T_0 + R_T \left( \frac{P}{A} \right) \qquad (1.2)$$

where $T_0$ is the ambient temperature, $P$ is the average power dissipation, and $A$ is the chip area.
Equation (1.2) is the dual of Ohm’s law in electrical systems, which states the linear relationship between the voltage difference of two electrical nodes and the current passing through their electrical path using the concept of electrical resistance. In a 1-D heat transfer model, the temperature difference between two thermal nodes has a linear relationship with the heat flow between the nodes. The proportionality constant is represented as $R_T$ in (1.2) and is called the thermal resistance. Here $R_T$ represents the substrate (silicon) layer plus the package and heat sink thermal resistances. The thermal resistance $R_T$ is a function of the physical dimensions and thermal characteristics of the substrate, packaging materials, and heat sink. Based on the operating chip temperature ($T_{chip}$ = 120 °C) for the 180 nm technology node and $T_0$ = 25 °C, $R_T$ is 4.75 cm²·°C/W. Assuming the same value for $R_T$ (considering the same packaging scheme and heat sink design), the die temperatures at other technology nodes can be estimated using (1.2).

Equation (1.2) states that the average chip power density is the dominant factor in determining the average chip temperature. It is assumed that the magnitude of the thermal resistance $R_T$ does not change dramatically with technology scaling. Even though the total chip power dissipation increases with technology scaling (based on Figure 1-2), the chip sizes also increase gradually. Figure 1-4 shows that the chip power density remains nearly constant for different technology nodes. As a result, it can be seen from the same figure that the average substrate temperature also remains roughly constant across technologies. Consequently, the substrate temperature in high-performance designs is usually assumed to be 110±10 °C for a wide range of technology nodes.
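The arithmetic behind (1.2) can be checked with a short sketch. The values $T_{chip}$ = 120 °C, $T_0$ = 25 °C and $R_T$ = 4.75 cm²·°C/W come from the text above; the function names and the 100 W / 5 cm² chip in the forward calculation are hypothetical, chosen only so that the power density matches the implied one.

```python
# Sketch of Equation (1.2): T_chip = T_0 + R_T * (P / A).

def chip_temperature(t_ambient_c, r_t, power_w, area_cm2):
    """Average chip temperature [deg C]; r_t is in cm^2 * degC / W."""
    return t_ambient_c + r_t * (power_w / area_cm2)


def implied_power_density(t_chip_c, t_ambient_c, r_t):
    """Invert (1.2) to get the power density P/A [W/cm^2]."""
    return (t_chip_c - t_ambient_c) / r_t


# Values quoted in the text for the 180 nm node.
density = implied_power_density(120.0, 25.0, 4.75)

# Hypothetical chip: 100 W dissipated over 5 cm^2 (same 20 W/cm^2 density).
t_chip = chip_temperature(25.0, 4.75, 100.0, 5.0)
print(density, t_chip)
```

Because $R_T$ multiplies the power density rather than the total power, a larger chip with proportionally larger power would give the same average temperature, which is the point made about Figure 1-4.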
This observation shows that the growing thermal problem with technology scaling is not related to the surface of the substrate. 3-D simulations show a sharp increase in the maximum chip temperature with scaling, which suggests that some other locations inside the chip experience this maximum temperature.

Figure 1-3. A simplistic cross-section view of a chip containing the substrate and the interconnect lines. [Diagram labels: Interconnects, Chip Devices, Package, Heat Sink, Ambient $T_0$ = 25 °C.]

Figure 1-4. Chip power density and average substrate temperature for different technology feature sizes in high-performance microprocessors (ITRS’99 [37]). [Plot: Power Density and Chip Temp versus Technology Feature Size (nm), from 180 nm down to 35 nm.]

The reason for the thermal discrepancy between the interconnect lines and the substrate is that, in addition to the three power components mentioned in (1.1), some power dissipation also results from metal self-heating caused by the current flow in the interconnect network. Although interconnect self-heating constitutes only a small fraction of the total power dissipation in the chip, the temperature rise in the interconnect lines due to self-heating can be significant. This is because interconnects are located far away from the substrate and the heat sink, separated by several layers of insulating materials that have lower thermal conductivities than that of the substrate.
In fact, full-chip thermal analysis using finite element simulations has shown that the maximum temperature in the chip increases rapidly with scaling due to increased self-heating of the interconnects, despite the fact that the chip power density (power per unit area) remains nearly constant over a wide range of technology nodes as per the ITRS parameters [35],[37]. The maximum temperature occurs at the top of the chip where the global interconnects are located.

Figure 1-5. Finite-element 3-D simulation results of different interconnect layer peak temperatures for some technology nodes based on ITRS parameters. [Plot: Temp (°C) versus Distance from Substrate (µm), with curves for the 180 nm, 130 nm, 100 nm, 70 nm, 50 nm, and 35 nm nodes.]

Experimental results have shown that the temperature of interconnects within a very high-performance chip with an ambient temperature of 25 °C can rise up to 160 °C in some cases (at the 100 nm technology feature size). 3-D simulations have shown that downscaling the technology feature size causes the peak temperature in the interconnect lines to increase rapidly (Figure 1-5). As an example, at the 50 nm technology feature size, the peak temperature in long global interconnects can reach up to 210 °C [35].

1.2.2 Trends in Interconnect Scaling

As VLSI technology feature size continues to be scaled aggressively, a rapid increase in functional density and chip size is observed. This has resulted in an increasing number of interconnect levels and a reduction in interconnect pitch in order to realize all of the inter-device and inter-block communications.
The number of interconnect levels is expected to increase in the near future, from 7 levels at the 180 nm node to 10 levels at the 35 nm node [38]. The increase in the number of interconnect levels moves the topmost interconnect layers further away from the silicon substrate (in comparison to the local interconnect metal lines), making heat dissipation more difficult. Additionally, decreasing interconnect pitch increases thermal coupling through thermal cross talk. Furthermore, the critical dimensions of contacts and vias are also decreasing with scaling, resulting in higher current densities in these structures. To reduce capacitive cross talk, low-k dielectric materials have been introduced recently. Due to the poor thermal conductivity of low-k materials, it is projected that thermal effects in interconnects with low-k insulating layers could become another serious thermally related design constraint.

1.3 Impact of Temperature on Interconnect Reliability

The unavoidable increase in the current density of metal lines has a notable impact on interconnect reliability. There have been major efforts to study and model the effect of current density on the interconnect failure time, mostly due to electromigration (EM) induced failures. EM is the directional migration of interconnect metal atoms caused by the flow of electrons. Black [13] expressed the median time to failure (MTF) due to electromigration as

$$MTF = A \cdot J^{-n} \cdot \exp\left(\frac{Q}{kT}\right)$$

where $A$ and $n$ are constants, $J$ is the average current density, $Q$ is the effective atom activation energy, $k$ is the Boltzmann constant and $T$ is the interconnect temperature. A lot of work has been done to systematically compute the reliability of each interconnect based on Black’s
equation and detect the unreliable segments in signal and power lines [27]. More recent works show that the temperature rise in interconnects due to self-heating, as a function of RMS current, changes the overall line temperature. This degrades the interconnect reliability exponentially. As a result, the self-consistent median time to failure formula suggested by [10] considers the effects of both the average and RMS currents on the failure time. These studies reveal the importance of temperature in the overall system reliability, which calls for a detailed thermal analysis of the chip.

1.4 Impact of Temperature on Chip Performance

In contrast to the extensive efforts to determine chip temperature profiles and model temperature-dependent reliability, few efforts have been made to consider the effect of temperature on signal performance and integrity. It is well known that the resistance of an interconnect increases linearly with the temperature of the line. Consequently, the delay of a signal passing through an interconnect line becomes strongly dependent on the temperature profile of the line. It has been shown that the effect of temperature on the resistance can even change the timing-critical paths and, in some cases, affect the functionality of the circuit as well [26]. The severe increase in the interconnect temperature reported in Figure 1-5 demands a more detailed study of the temperature dependence of the interconnect performance. Moreover, even though the substrate temperature does not increase as much as the interconnect temperature with scaling, the performance of the switching devices on the substrate is very dependent upon their temperature, which will be shown later.
As a result, the temperature-dependent switching activity of the devices should be considered more seriously during the optimization procedures in each EDA flow step.

1.5 Non-uniform Chip Temperature Profile

It has been reported that significant temperature gradients can occur on the silicon substrate due to different activity levels and/or different sleep modes of various functional blocks in high-performance microprocessor chips [58]. Using these different switching policies is a critical step in low-power design schemes. Dynamic power management (DPM) [59] and clock gating of functional blocks can be major sources of such thermal gradients over the substrate. Some researchers [54] provided techniques to derive the temperature profile along the substrate surface as well as the self-heat generation in each interconnect. Reference [22] also provides some design rules to account for the neighborhood thermal effects on the temperature of each interconnect.

The existence of thermal gradients along the substrate causes temperature non-uniformities along the lengths of long global wires, which are usually used to realize clock nets and inter-block communications. Hence, it is important to analyze and quantify the impact of non-uniform temperature profiles on interconnect performance. The presence of non-uniform temperature distributions along global wires can also affect clock skew, wire sizing, and buffer insertion. In general, it is expected that chip thermal non-uniformities affect most of the EDA flow optimization steps. Hence, for deep sub-micron processes, an increasing interplay between temperature, performance, and reliability is expected to occur due to increasing thermal effects. Therefore, traditional circuit-level performance benchmarks such as signal integrity and noise are also interlinked with the thermal problems.
As VLSI technology progresses toward sub-100 nm feature sizes, these effects will become increasingly dominant.

1.6 Thesis Contribution

Based on the previous remarks, this thesis addresses the following issues:

• Analytical modeling of the non-uniform interconnect temperature distribution

Due to the non-homogeneous nature of the switching activity profile of different blocks on the substrate, the chip temperature profile is generally non-uniform. For this reason, a convenient way of deriving the interconnect temperature in the presence of substrate thermal gradients is needed. An analytical solution to calculate the temperature along the interconnect line in the presence of different thermal gradient profiles on the underlying substrate is presented. The actual thermal boundary conditions and arrangements of the interconnect lines and vias/contacts inside the chip are used to obtain the thermal profiles along each metal line.

• Introducing a non-uniform temperature-dependent RC interconnect delay model

In order to examine how the non-uniform interconnect temperature affects the overall performance of the signals, a new temperature-dependent RC delay model is introduced. It is well known that metal resistivity changes linearly with the temperature of the interconnect line. By taking this fact into consideration, the new RC delay model employs a non-uniform resistance profile along the length of the line. Using the proposed delay model and with the help of different examples, the effects of interconnect thermal non-uniformities on signal performance are demonstrated.
It is observed that the direction of the thermal gradient along the length of the long global interconnects is an important factor in determining the magnitude of the degradation of the signal delay in that line. This suggests that the presence of a substrate thermal gradient results in a non-uniform degradation of signal delay on the long global interconnect lines. New guidelines are proposed to maintain better signal propagation times while considering the thermal non-uniformities over the substrate.

• Proposing a new temperature-dependent zero-skew clock routing methodology

Based on the observations above, it is clear that many EDA flow steps will be affected by non-uniform substrate temperatures. Due to their symmetric nature, H-Tree clock structures will be affected the most by these substrate thermal gradients. In conventional circuit designs, the H-Tree methodology is a well-known strategy to ensure near-zero skew clock routing, which in turn assures the correct functionality of the register components. However, once thermal non-uniformities are introduced and the temperature-dependent RC delay model is used, it can be observed that the signal propagation delay does not remain symmetric in different branches of the conventional H-Tree. As a result, a new clock tree routing methodology is presented which compensates for the non-uniform substrate temperature and guarantees a near-zero skew clock routing tree. Experimental results show that if the effects of temperature non-uniformities on the clock tree are not accounted for, a significant amount of skew is added at the clock tree sinks, which consequently endangers the functionality of the sequential components.
• Introducing a novel temperature-dependent buffer insertion technique

Another important step in the EDA flow that is affected by temperature non-uniformities is optimal buffer insertion. Usually, buffer insertion is performed to improve the signal performance and reduce the propagation delay in a signal net. The goal is to find the number, the sizes, and the final placement of the inserted buffers in order to minimize the signal propagation delay from the source to a critical sink. The presence of substrate thermal gradients dramatically changes the optimality of the conventional buffer insertion techniques. Not only are the effects of temperature on the signal performance in interconnect lines important, but the switching performance of an inserted buffer is also dependent on the temperature of its assigned location over the substrate surface. In fact, experimental results show that the dependency of the device switching activity on the substrate temperature is greater than the dependency of the interconnect performance on the interconnect temperature. As a result, both of these dependencies should be taken into account while performing the buffer insertion step. A novel temperature-dependent buffer insertion technique is presented. It is shown that when the effects of temperature on interconnect and device performance are neglected, the final placement of the inserted buffers is a non-optimal solution.

• Studying the effect of technology scaling and hot-spots on the IR-drop phenomenon in power/ground networks

A detailed analysis of the power-supply voltage (IR) drop scaling in DSM technologies is presented.
More precisely, the effects of interconnect temperature, electromigration, interconnect technology scaling (including the resistivity increase of Cu interconnects due to electron surface scattering and finite barrier thickness), and substrate hot spots are taken into consideration during this analysis. It is shown that the IR-drop effect in the power/ground (P/G) network increases rapidly with technology scaling, and that well-known countermeasures such as wire sizing and decoupling capacitor insertion, with the resource allocation schemes typically used in present designs, may not be sufficient to limit the voltage fluctuations over the power grid in future technologies. It is also shown that such voltage drops on the power lines of switching devices in a clock net can introduce a significant amount of skew, which in turn degrades the signal integrity.

1.7 Thesis Outline

The remainder of the thesis is organized into six main chapters. Chapter 2 provides a thorough background of interconnect temperature calculation techniques and explains the effect of non-uniform substrate thermal gradients on the interconnect temperature. Chapter 3 introduces the effects of interconnect thermal profiles on the RC-distributed signal delay and provides some design rules to be used for better performance in the presence of a non-uniform thermal profile. Chapter 4 studies the effects of temperature on clock skew and provides a new thermally-dependent clock tree generation technique with near-zero skew to ensure the integrity of the clock signal. Chapter 5 examines the temperature-dependent optimal buffer insertion technique and discusses the variability of the optimal performance-derived point due to non-uniform thermal profiles. A new optimization technique is provided to include the effects of non-uniform temperature on performance-driven buffer insertion.
In Chapter 6, the effects of technology scaling, including interconnect temperature, barrier and thin-film effects, and substrate hot spots, on the worst-case IR-drop of the power distribution network are studied. Finally, Chapter 7 summarizes the main contributions of this thesis and outlines possible directions for future investigations.

Chapter 2

Analytical Model for Interconnect Thermal Profile

2.1 Methodology

The temperature distribution as a function of position ($r$) and time ($t$) in a closed structure is governed by the following heat diffusion equation and proper boundary conditions:

$$\nabla \cdot \left(k \nabla T(r)\right) + Q(r) = \frac{\partial}{\partial t}\, c_p T(r) \qquad (1.3)$$

subject to some defined initial values. $T$ is the time-dependent temperature at each location, $k$ is the thermal conductivity of the solid material as a function of temperature (W/(m·°C)), $c_p$ is the specific heat (J/(kg·°C)) of the material constituting the structure, and $Q$ is the heat generation rate. In a general multi-layer structure, $k$ and $Q$ are position-dependent, i.e., they are functions of $r$. In a 3-D space ($x$, $y$, $z$), the heat diffusion equation (1.3) in any material can be written as follows [19]:

$$\frac{\partial}{\partial x}\left(k \frac{\partial T}{\partial x}\right) + \frac{\partial}{\partial y}\left(k \frac{\partial T}{\partial y}\right) + \frac{\partial}{\partial z}\left(k \frac{\partial T}{\partial z}\right) + Q^* = \delta\, c_p \frac{\partial T}{\partial t} \qquad (1.4)$$

where $Q^*$ is the rate of heat generation per unit volume (W/m³) and $\delta$ is the solid density (kg/m³). In general, a boundary condition for solving the diffusion equation (1.4) can be written as follows:

$$k \frac{\partial T}{\partial n_i} + h_i T = f_i \qquad (1.5)$$

Here $\partial/\partial n_i$ denotes differentiation along the outward-drawn normal at the boundary surface $s_i$, $h_i$ is the heat transfer coefficient of surface $s_i$ (W/(m²·°C)), and $f_i$ is an arbitrary function of position in space.
Even though the thermal conductivity $k$ of a material is generally a function of temperature and position, its variation in conductors is small, so it is usually assumed to be constant in the interconnects. In addition, the four sidewalls and the top surface of the chip containing the interconnect lines are assumed to be insulated (which is generally a valid assumption). This means that the interconnect lines do not exchange energy through the four sidewalls and the top surface. The only side through which the interconnect lines can exchange heat is the underlying substrate, which is connected to the heat sink. Under these assumptions and working under the steady-state condition, the system of heat equation (1.4) and boundary conditions can be reduced as follows:

$$k\left(\frac{\partial^2 T}{\partial x^2} + \frac{\partial^2 T}{\partial y^2} + \frac{\partial^2 T}{\partial z^2}\right) + Q^*_{eff} = 0 \qquad (1.6)$$

subject to specified initial conditions. Note that $Q^*_{eff}$ is the effective volumetric heat generation, which also accounts for the heat loss rate per unit volume and thereby captures the boundary condition (heat loss) at the bottom side of the interconnect line. A 3-D finite element thermal simulation would need to be employed in order to find an exact solution [47]. On the other hand, in a long global interconnect the length of the line is much larger than the thickness and the width of the line. As a result, the thermal gradients along the thickness and width of the interconnect line can be ignored when focusing on long VLSI interconnects. Consequently, many researchers have used a simplified version of (1.6) and employed the 1-D heat diffusion equation to avoid the large computation time required by FEM simulators while still generating acceptable results [49]. In that case, (1.6) can be reduced as
follows:

$$\frac{d^2 T}{dx^2} = -\frac{Q^*_{eff}}{k_m} \qquad (1.7)$$

where $k_m$ is the thermal conductivity of the metal. To derive the effective volumetric heat generation $Q^*_{eff}$, consider an interconnect line passing over the substrate as shown in Figure 2-1. The interconnect line is connected to the substrate through vias at its two ends. The major source of heat generation in a chip is the power dissipation due to the dynamic and static activity of the cells lying on the substrate. In addition, the power dissipation in the interconnect line is also a source of heat generation. For the interconnect line shown in Figure 2-1, the power dissipation $P_g$ in a partial metal length $\Delta x$ can be expressed as:

$$P_g(x) = I_{rms}^2\, \Delta R_e(x) \qquad (1.8)$$

where $I_{rms}$ is the root mean square current passing through the line.

Figure 2-1. A point-to-point interconnect line passing over the substrate, separated by the insulator layer, and connected to the switching devices by using vias at its two ends. [Diagram labels: Interconnect ($k_m$), Insulator, Substrate, length $L$, position $x$.]

The electrical resistance of the interconnect line, $R_e$, has a linear relationship to its temperature and can be written as follows:

$$R_e(x) = R_0\left(1 + \beta \cdot T(x)\right) \qquad (1.9)$$

where $R_0$ is the resistance per unit length at a reference temperature, $\beta$ is the temperature coefficient of resistance (1/°C), and $T(x)$ is the temperature profile along the length of the interconnect line. Furthermore, the initial resistance $R_0$ can be expressed as:

$$\Delta R_0(x) = \rho_i \frac{\Delta x}{w\, t_m} \qquad (1.10)$$

where $\rho_i$ is the electrical resistivity of the interconnect at the reference temperature, $t_m$ is the interconnect thickness and $w$ is the width of the interconnect.
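To make the chain (1.8) to (1.10) concrete, the following sketch computes the Joule heating of one short metal segment. The 2 mA RMS current and 0.32 µm width echo an example used later in this chapter; the resistivity, temperature coefficient, thickness, segment length and local temperature are hypothetical placeholders, and the function names are not from the thesis.

```python
# Sketch of Equations (1.8)-(1.10) for a single segment of length dx.

def segment_resistance_ref(rho_i, dx, w, t_m):
    """(1.10): resistance of the segment at the reference temperature [ohm]."""
    return rho_i * dx / (w * t_m)


def segment_resistance(r0, beta, temp_c):
    """(1.9): linear temperature dependence of the resistance [ohm]."""
    return r0 * (1.0 + beta * temp_c)


def joule_power(i_rms, r_e):
    """(1.8): self-heating power dissipated in the segment [W]."""
    return i_rms ** 2 * r_e


RHO_I = 2.2e-8      # resistivity at reference temperature [ohm*m] (assumed)
BETA = 0.004        # temperature coefficient of resistance [1/degC] (assumed)
W = 0.32e-6         # line width [m]
T_M = 0.5e-6        # line thickness [m] (assumed)
DX = 1.0e-6         # segment length [m] (assumed)
I_RMS = 2.0e-3      # RMS current [A]

r0 = segment_resistance_ref(RHO_I, DX, W, T_M)
r_hot = segment_resistance(r0, BETA, 100.0)   # assumed local T(x) = 100 degC
p_g = joule_power(I_RMS, r_hot)
print(r0, r_hot, p_g)
```

The dependence of `r_hot` on the local temperature is what couples the electrical and thermal problems: the generated heat depends on $T(x)$, which in turn depends on the generated heat.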
On the other hand, the energy loss due to heat transfer between the interconnect and the substrate through the insulator for a partial length $\Delta x$ is:

$$P_l(x) = \frac{T_{line}(x) - T_{sub}(x)}{\Delta R_T(x)} \qquad (1.11)$$

where:

$$\Delta R_T(x) = \frac{t_{ins}}{k^*_{ins}\, w\, \Delta x} \qquad (1.12)$$

$P_l(x)$ is the heat flow from the interconnect to the substrate, $T_{line}$ is the interconnect temperature, $T_{sub}$ is the underlying substrate temperature, $R_T$ is the insulator thermal resistance, $t_{ins}$ is the insulator thickness, and $k^*_{ins}$ is the effective insulator thermal conductivity. $k^*_{ins}$ is a shape-dependent parameter that accounts for the effect of the geometry of the heat-conducting body on the thermal conductivity. In the case of heat flow by conduction between two identical flat plates with insulated edges, $k^*_{ins}$ is simply the thermal conductivity $k_{ins}$. In the case of a rectangular shape parallel to an infinite plate, the simple approximation introduced by Bilotti [14] can be used. It assumes a quasi 2-D model where heat flows only through the bottom side and partially through the two sides along the length of the rectangular shape (interconnect). In this case, the effective thermal conductivity $k^*_{ins}$ can be expressed as $k_{ins}(1 + 0.88\, t_{ins}/w)$, which provides results with 3% accuracy for test cases with $w/t_{ins} > 0.4$. However, in submicron technologies, the geometrical dimensions of global lines do not satisfy this condition. Hence, using Bilotti’s estimation for the effective thermal conductivity results in slightly higher than actual values for the peak temperature in global lines. In reality, heat flows from all sides of the rectangular body (i.e., the interconnect). A more precise expression for $k^*_{ins}$, presented in [7], takes this fact into account and gives a more accurate effective thermal conductivity as follows:

$$k^*_{ins} = k_{ins} \cdot \frac{t_{ins}}{w} \cdot 1.685 \cdot \left[\log\left(1 + \frac{t_{ins}}{w}\right)\right]^{-0.59} \cdot \left(\frac{t_{ins}}{t_m}\right)^{-0.078} \qquad (1.13)$$

The authors in [25] have used this approximation and validated its accuracy against 3-D FEM simulations, with negligible error for rectangular-shaped interconnect lines.

Based on the above observations, the net heat energy gain per unit volume is:

$$Q^*_{eff} = \frac{P_g(x) - P_l(x)}{w\, t_m\, \Delta x} \qquad (1.14)$$

Using the simplified heat equation (1.7), the summarized interconnect heat flow equation can be written as follows:

$$\frac{d^2 T(x)}{dx^2} = \lambda^2\, T(x) - \lambda^2\, T_{ref}(x) - \theta \qquad (1.15)$$

$$\lambda^2 = \frac{1}{k_m}\left(\frac{k^*_{ins}}{t_m\, t_{ins}} - \frac{I_{rms}^2\, \rho_i\, \beta}{w^2\, t_m^2}\right) \qquad (1.16)$$

$$\theta = \frac{I_{rms}^2\, \rho_i}{w^2\, t_m^2\, k_m} \qquad (1.17)$$

where $\lambda$ and $\theta$ are constants for a specified technology and interconnect layer assignment. Equation (1.15) and its coefficients will be the basis of subsequent interconnect temperature calculations. Note that in order to have a unique solution for (1.15), two initial conditions must be provided. Equation (1.15) shows that the underlying substrate temperature, $T_{ref}(x)$, plays an important role in determining the temperature of the line. This value is usually assumed to be constant throughout the substrate surface. Although this is a valid assumption for short local interconnects, it is not true in the case of long global lines in the upper metal layers. Because of the different switching activities of various cells on the substrate surface, a non-uniform temperature profile along the substrate surface is inevitable. In this study two cases have been analyzed: 1) a uniform thermal profile over the underlying substrate and 2) a non-uniform thermal profile over the underlying substrate.

2.2 Uniform Substrate Thermal Profile

Assume that $T_{ref}(x)$ is a constant for all positions along the length of the line. The two initial conditions that are needed to solve (1.15) can be derived using the interconnect line and via/contact setup.
For one segment of a signal net there are four possible configurations, depicted in Figure 2-2, based on the location and connection of the vias. Here the routes between the substrate and metal layer 1 and between metal layer 1 and metal layer 2 are examined. These configurations can easily be extended in the same manner to the other metal layers. The vias are assumed to get as hot as the layer immediately beneath them. In reality (and especially in Al-Cu technology), due to their smaller cross-sectional area and higher electrical resistivity, vias can get much hotter [34], unless they have been arranged in some sort of via array instead of just one via contact. In the current analysis, it is assumed that the router uses via arrays wherever possible. Considering Figure 2-2(a), one can see that the two end vias create a thermally conductive path between the metal layer and the substrate. Due to the very small thermal resistivity of the vias, the temperature at the two ends of the metal line is assumed to be equal to the temperature of the substrate. For example, the initial conditions in Figure 2-2(a) to solve (1.15) can be written as follows:

$$T(x = 0) = T_{ref}, \qquad T(x = L) = T_{ref} \qquad (1.18)$$

where $0 < x < L$ and $T_{ref}$ is the constant substrate temperature. By solving the differential equation (1.15) with the constant coefficients given by (1.16) and (1.17), the line temperature can be written as follows:

$$T(x) = T_{ref} + \frac{\theta}{\lambda^2}\left(1 - \frac{\sinh \lambda x + \sinh \lambda (L - x)}{\sinh \lambda L}\right) \qquad (1.19)$$

Assuming a uniform substrate temperature of 100 °C, the interconnect line thermal profiles for a 2000 µm long global interconnect line corresponding to Figure 2-2(a) for two different technologies are depicted in Figure 2-3. The parameters for the 250 nm and 100 nm technologies are extracted from NTRS’97 guidelines [43].

Figure 2-2. Different configurations of metal lines and vias.

Figure 2-3. Thermal profile along the length of a 2000 µm long global interconnect line (Cu) with uniform substrate temperature using 0.1 µm and 0.25 µm technology node parameters provided by NTRS’97 [43]. [Plot: T(x) (°C) versus Position x (micron), with one curve per technology node.]

Distance $d$ is called the heat diffusion length; it is a function of $1/\lambda$, and is strongly dependent on the thickness of the insulator between the metal and the substrate and on the effective current density flowing through the metal. Using (1.19) and assuming a constant current density in all the metal layers of a signal net, the diffusion length $d$ is larger for the higher-level metal layers due to their greater underlying insulator thickness. As an example, for an interconnect with an RMS current of 2 mA in a metal layer with width 0.32 µm and an underlying oxide layer with thickness 1.2 µm, the diffusion length $d$ is approximately 40 µm. In addition, the peak value of the temperature in Figure 2-3 is equal to $T_{ref} + \theta/\lambda^2$. Although decreasing $\lambda$ increases the value of the diffusion length, for a long global line it also increases the peak value ($\theta/\lambda^2$) sharply. For interconnects whose lengths are comparable to the heat diffusion length, the line temperature does not reach the maximum peak value. Using this concept, the authors in [28] have introduced a new technique to lower the peak temperature by adding extra dummy vias separated by a distance less than the diffusion length.
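The closed-form profile (1.19) is easy to evaluate numerically. The sketch below does so for a 2000 µm line over a 100 °C substrate, as in the text. It assumes, for illustration only, that the diffusion length equals $1/\lambda$ with the 40 µm value quoted above, and it uses a hypothetical peak rise $\theta/\lambda^2$ of 15 °C; neither number nor the function name comes from the thesis.

```python
import math

# Sketch of Equation (1.19) with the boundary conditions (1.18).

def uniform_substrate_profile(x, length, lam, theta, t_ref):
    """Line temperature at position x for a uniform substrate temperature."""
    shape = (math.sinh(lam * x) + math.sinh(lam * (length - x))) / math.sinh(lam * length)
    return t_ref + (theta / lam ** 2) * (1.0 - shape)


L_UM = 2000.0            # line length [um]
LAMBDA = 1.0 / 40.0      # [1/um], assuming d = 1/lambda = 40 um
T_REF = 100.0            # uniform substrate temperature [degC]
PEAK_RISE = 15.0         # hypothetical theta / lambda^2 [degC]
THETA = PEAK_RISE * LAMBDA ** 2

t_start = uniform_substrate_profile(0.0, L_UM, LAMBDA, THETA, T_REF)
t_mid = uniform_substrate_profile(L_UM / 2.0, L_UM, LAMBDA, THETA, T_REF)
t_end = uniform_substrate_profile(L_UM, L_UM, LAMBDA, THETA, T_REF)

# A line only one diffusion length long stays well below the peak value.
t_mid_short = uniform_substrate_profile(20.0, 40.0, LAMBDA, THETA, T_REF)
print(t_start, t_mid, t_end, t_mid_short)
```

Both ends are pinned to the substrate temperature by the vias, the middle of the long line saturates at $T_{ref} + \theta/\lambda^2$, and the short line stays much cooler, which is the effect the dummy-via technique of [28] relies on.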
As will be seen in Chapter 3, the delay of the interconnect configuration depicted in Figure 2-2(b) is lower than that of Figure 2-2(c), because the configuration in Figure 2-2(b) has a rising thermal profile while the one in Figure 2-2(c) has a decaying thermal profile.

2.3 Substrate Thermal Profile Calculation

As shown above, the temperature of the substrate has a major effect on the temperature of the interconnect lines. However, because cells have different switching activities, they contribute to the chip power consumption in a non-uniform manner. This non-uniform distribution of power generation on the substrate surface causes a non-uniform substrate thermal profile. Considering the different switching activities and power consumptions of the cells, the substrate thermal profile $T_{sub}(x)$ cannot be a constant value at all locations $x$ on the substrate surface. The quality of the extracted $T_{sub}(x)$ depends on how accurately one can estimate the power consumption of the cells or macro-cells in different steps of the EDA flow. Some of the techniques that are used to find the substrate thermal profile are given in [54]. Due to the duality between thermal and electrical networks, the easiest way to map the substrate thermal profile is to model the substrate as a 3-D grid and solve the system of thermal relations between every two nodes in the grid while considering the packaging and the ambient temperature as additional thermal nodes. To calculate the temperature over the die, the difference among the switching activities of the individual cells needs to be considered. However, it is difficult to capture the point-by-point temperature of the
substrate, so the die can be broken into a few grids, perhaps 20x40x6 for the length, width and thickness, respectively. A simpler technique is to use a 2-D mesh over the substrate surface (Figure 2-4), based on the concept of transfer thermal resistance. The transfer thermal resistance $R^{T}_{ij}$ of a location $j$ (in the 2-D mesh or 3-D grid) with respect to a point heat source $i$ is defined as:

$$R^{T}_{ij} = \frac{T_j}{P_i} \qquad (1.20)$$

which is the dual of the electrical relationship between the current in an electrical resistance and the voltages at its two ends. Using the finite difference method [54], one can find the transfer thermal resistance values of all surface nodes with respect to any single source node by sampling the temperature of these nodes due to one unit of dissipated heat at the source node.

Figure 2-4. Concept of using a 2-D mesh on the substrate surface for determining $T_{ref}(x)$ using the thermal resistance between each two adjacent nodes in the mesh by considering the power consumption of each block.

[Figure 2-5 labels: Thermal Resistance with Packaging; Ambient Temperature.]

Figure 2-5. A 3-D mesh of the substrate consisting of thermal resistors and current sources.

In general, by using a 3-D grid structure over the substrate, an $n \times n$ thermal resistance matrix for the 3-D nodes can be formulated using (1.20), where $n$ is the number of nodes in the grid. Adding up the power consumption of all the cells in each grid gives the total power dissipated in that grid.
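The bookkeeping described above can be sketched in a few lines of Python. Because heat conduction is linear, definition (1.20) implies that the temperature at node $j$ is the sum of $R^{T}_{ij} P_i$ over all sources $i$. The function names, the 3-grid resistance matrix, and the cell powers below are hypothetical values chosen only for illustration, and the sketch leaves out how the ambient reference and packaging enter the model.

```python
import numpy as np

def grid_power(cell_powers, cell_to_grid, n_grids):
    """Sum the power of all cells that fall inside each grid."""
    power = np.zeros(n_grids)
    np.add.at(power, cell_to_grid, cell_powers)  # accumulates repeated indices
    return power

def grid_temperature(r_transfer, power):
    """Superpose every heat source: T_j = sum_i R[j, i] * P_i."""
    return np.asarray(r_transfer) @ np.asarray(power)

# Hypothetical 3-grid example (illustrative numbers, not from the source).
R = np.array([[10.0, 4.0, 2.0],
              [4.0, 10.0, 4.0],
              [2.0, 4.0, 10.0]])             # transfer thermal resistance, K/W
cells = np.array([0.05, 0.05, 0.2, 0.05])    # power of each cell, W
owner = np.array([0, 0, 1, 2])               # grid index that contains each cell
P = grid_power(cells, owner, 3)
T = grid_temperature(R, P)
```

The matrix is taken to be symmetric here, so the order of the two indices does not matter in this toy case.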
By calculating the thermal resistance between each pair of nodes on the grid using the 1-D heat conduction model, a system of equations can be built from which the temperature at each grid node can be calculated. Using the transfer thermal resistance matrix $R^{T}_{n \times n}$, one can calculate the temperature distribution $T = [T_1, T_2, \ldots, T_n]^t$ at the nodes of the grid due to a given heat distribution $P = [P_1, P_2, \ldots, P_n]^t$ by solving:

$$\begin{bmatrix} R^{T}_{11} & R^{T}_{12} & \cdots & R^{T}_{1n} \\ R^{T}_{21} & \cdots & \cdots & R^{T}_{2n} \\ \vdots & & & \vdots \\ R^{T}_{n1} & \cdots & \cdots & R^{T}_{nn} \end{bmatrix} \begin{bmatrix} P_1 \\ P_2 \\ \vdots \\ P_n \end{bmatrix} = \begin{bmatrix} T_1 \\ T_2 \\ \vdots \\ T_n \end{bmatrix} \qquad (1.21)$$

For simplicity, the thermal resistance can be calculated only between adjacent faces, so for each grid cube there are 6 sides for which the thermal resistance must be calculated. In that case the thermal resistance matrix is sparse. The power at each cell can be calculated using $0.5\,C V^2 f \alpha$ for the dynamic part and around 20 percent of that value as the leakage power consumption, or it can be extracted from the power table provided by the cell library. Note that the boundary conditions must be considered, which here are the ambient temperature and the packaging thermal resistances. In practice the ambient temperature is 27 °C and the thermal conductivity of the packaging is around 7 W/m°C for the sides, 2000 W/m°C for the top, and 8800 W/m°C for the bottom [26]. In the interest of computational efficiency one can choose fewer grids. If run time is not an issue, more grids can be selected in each direction to map a highly accurate thermal profile for the surface of the substrate. However, because this procedure depends on finding the power map of the cells on the substrate, $T_{ref}(x)$ is a design-dependent function.
For this reason, and as an illustration, a linear substrate temperature distribution along the length of an interconnect is used and its effect on the interconnect temperature $T(x)$ is observed. $T_1(x) = ax + b$ is used and the non-homogeneous differential heat equation (1.16) is solved for the configuration shown in Figure 2-2(a) with proper initial conditions. The resulting thermal profile along the line can be expressed as:

$$T(x) = \frac{\theta}{\lambda^2}\left(1 - e^{-\lambda x} - \frac{1 - e^{-\lambda L}}{\sinh \lambda L}\,\sinh \lambda x\right) + ax + b \qquad (1.22)$$

Figure 2-6 shows the thermal profile in an interconnect using the linear substrate thermal profile $T_1(x)$ (with a gradient from 30 °C to 100 °C).

[Figure 2-6 plot: $T(x)$ (°C) versus position $x$ (µm, 0 to 2000), with curves for the 0.1 µm and 0.25 µm technology nodes.]

Figure 2-6. Thermal profile along the length of a 2000 µm long interconnect (Cu) line with a linear substrate thermal profile using parameters of global wires of 0.1 µm and 0.25 µm technologies [43].

2.4 Summary

In this chapter, a convenient way of deriving the interconnect temperature in the presence of substrate thermal gradients was presented. Due to the non-homogeneous nature of the switching activity profile of different blocks on the substrate, the chip temperature profile is generally non-uniform. For this reason, an analytical solution to calculate the temperature along the interconnect line in the presence of different thermal gradient profiles on the underlying substrate is needed.
Using the actual thermal boundary conditions and arrangements of the interconnect lines and vias/contacts inside the chip, the general 3-D heat diffusion equation has been adaptively simplified such that the system of equations (1.15) and its constants can be easily used to derive interconnect temperature profiles analytically. Through different examples the simplicity and accuracy of the derived analytical model has been demonstrated.

Chapter 3
Non-uniform Temperature-dependent Interconnect Delay Model

3.1 A Thermally-dependent Distributed RC Delay Model

The resistance of the interconnect has a linear relationship with its temperature and can be written as follows:

$$r(x) = r_0\,(1 + \beta \cdot T(x)) \qquad (2.1)$$

where $r_0$ is the unit length resistance at reference temperature and $\beta$ is the temperature coefficient of resistance (1/°C). Consider an interconnect with length $L$ and uniform width $w$ that is driven by a driver with on-resistance $R_d$ and junction capacitance $C_P$ and terminated by a load with capacitance $C_L$ as depicted in Figure 3-1.

Figure 3-1. A distributed RC interconnect model driven by resistance $R_d$ and terminated at load $C_L$. The line is partitioned into $n$ equal segments, each with length $\Delta x$.

Using a distributed RC Elmore delay model, the delay $D$ of a signal passing through the line can be written as follows:

$$D = R_d\left(\left(\sum_{i=1}^{n} c_0(x_i)\,\Delta x\right) + C_L + C_P\right) + \sum_{i=1}^{n} r_0(x_i)\,\Delta x\left(\sum_{j=i}^{n} c_0(x_j)\,\Delta x + C_L\right) \qquad (2.2)$$

where $c_0(x)$ and $r_0(x)$ are the unit length capacitance and unit length resistance at location $x$, respectively. As the number of the partitions approaches infinity the Elmore delay can be rewritten as:

$$D = R_d\left(C_P + C_L + \int_0^L c_0(x)\,dx\right) + \int_0^L r_0(x)\left(\int_x^L c_0(\tau)\,d\tau + C_L\right)dx \qquad (2.3)$$
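The discrete sum (2.2) translates almost directly into code. The following Python sketch evaluates it for a temperature-dependent resistance $r_0(1 + \beta T(x))$ and a constant unit-length capacitance; the function name `elmore_delay`, its argument names, and the midpoint sampling of each segment are assumptions made for this illustration and are not part of the original text.

```python
import numpy as np

def elmore_delay(r_d, c_p, c_l, r0, c0, beta, length, temp_profile, n=2000):
    """Discretised Elmore delay of (2.2) with r(x) = r0 * (1 + beta * T(x)).

    temp_profile must accept a NumPy array of positions and return
    the line temperature at those positions.
    """
    dx = length / n
    x = (np.arange(n) + 0.5) * dx                       # segment midpoints (assumed sampling)
    r_seg = r0 * (1.0 + beta * temp_profile(x)) * dx    # resistance of each segment
    c_seg = np.full(n, c0 * dx)                         # capacitance of each segment
    # Capacitance of segments i..n plus the load, as in the inner sum of (2.2).
    downstream = np.cumsum(c_seg[::-1])[::-1] + c_l
    return r_d * (c_seg.sum() + c_l + c_p) + float(np.sum(r_seg * downstream))
```

As `n` grows, the value approaches the integral form (2.3); for a line at the reference temperature it approaches $R_d(C_P + C_L + c_0 L) + r_0(c_0 L^2/2 + C_L L)$.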
The innermost (third) integral in (2.3) represents the downstream capacitance seen by the interconnect from location $x$. It is assumed that the unit length capacitance does not change with temperature variations along the interconnect length (which is generally a valid assumption). It is also assumed that the temperature distribution inside the driver is uniform under the steady-state condition. Hence $R_d$ will be constant at the chosen operating temperature of the cell. Equation (2.3) can be simplified to the following:

$$D = D_0 + (c_0 L + C_L)\,\rho_0 \beta \int_0^L T(x)\,dx - c_0 \rho_0 \beta \int_0^L x\,T(x)\,dx \qquad (2.4)$$

where:

$$D_0 = R_d\,(C_P + C_L + c_0 L) + \left(\frac{c_0 \rho_0 L^2}{2} + \rho_0 L C_L\right) \qquad (2.5)$$

Here $c_0$ is the constant unit length capacitance and $\rho_0$ is the unit length resistance at the reference temperature (written $r_0$ in (2.1)). $D_0$ is the Elmore delay of the interconnect corresponding to the unit length resistance at 0 °C (or the reference temperature). From (2.4) it is clear that in order to calculate the actual temperature-dependent delay, the areas under $T(x)$ and $x\,T(x)$ need to be computed.

3.2 Effect of Constant Substrate Temperature on Signal Delay

To get an idea of how much temperature can degrade the delay, the worst-case scenario is assumed by using a uniform thermal profile at the peak temperature over the entire length of the interconnect. Choosing electrical and thermal parameters for Al-Cu interconnects with $\beta = 3 \times 10^{-3}$ (1/°C) and using $r_{sh} = 0.077$ (Ω/sq) at room temperature (25 °C) and $c_{sh} = 0.2$ (fF/sq) as the unit sheet resistance and capacitance, respectively, the variations of Elmore delay with temperature in an interconnect line with $w = 0.32$ µm, $R_d = 10$ Ω, and $C_L = 1000$ fF for different lengths in µm are summarized in Figure 3-2. As Figure 3-2 shows, for each 20-degree increase in temperature there is roughly a 5 to 6 percent increase in the Elmore delay for the long global wires.
Although assuming a constant temperature along the interconnect gives an upper bound on the delay increase, the actual variations of temperature along the interconnect lines in (2.4) need to be estimated and applied. This is necessary mainly because non-uniform interconnect temperature has an unavoidable impact on the wire planning. More specifically, the non-uniform temperature profile along the interconnect line can severely affect the clock skew and buffer insertion, and these effects cannot be addressed by simply accounting for a uniform worst-case maximum temperature along the interconnect line.

[Figure 3-2 plot: percentage delay increase versus temperature (°C, 30 to 150), with curves for L = 100, 400, 700, 1000, and 2000.]

Figure 3-2. Percentage increase in delay with respect to the signal delay at 25 °C as a function of the line temperature.

3.3 Effect of Substrate Thermal Non-uniformities on Signal Delay

As an example, consider having exponential temperature distributions along the interconnect length. Observing the behavior of the line under exponential thermal profiles is important in the sense that, as we saw, most of the solutions to the interconnect heat transfer equation (1.15) have an exponential component. By applying an exponential thermal distribution $T(x) = a \cdot \exp(-bx)$ to an interconnect and using (2.4), the Elmore delay is as follows:

$$D = D_0 + \frac{\rho_0 \beta a}{b}\left[\left(c_0 L + C_L - \frac{c_0}{b}\right) + \left(\frac{c_0}{b} - C_L\right)e^{-bL}\right] \qquad (2.6)$$

where $D_0$ is defined by (2.5). For the sake of analysis, consider two different exponential thermal profiles $T_1(x)$ and $T_2(x)$ along an arbitrary interconnect as depicted in Figure 3-3.
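As a quick numerical check on (2.6), the Python sketch below compares the closed form with a trapezoidal evaluation of the two integrals in (2.4), and applies both to a decaying exponential and to its mirror image (equal areas, opposite directions). The function names and all parameter values in the usage lines are illustrative assumptions; in particular the per-length capacitance value is made up for the example.

```python
import numpy as np

def _trapezoid(y, x):
    """Plain trapezoidal rule, written out to avoid version-specific NumPy helpers."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

def increase_numeric(temp_profile, rho0, c0, c_l, beta, length, n=20001):
    """D - D0 from (2.4), using numerical areas under T(x) and x*T(x)."""
    x = np.linspace(0.0, length, n)
    t = temp_profile(x)
    return rho0 * beta * ((c0 * length + c_l) * _trapezoid(t, x)
                          - c0 * _trapezoid(x * t, x))

def increase_exponential(a, b, rho0, c0, c_l, beta, length):
    """D - D0 from (2.6) for T(x) = a * exp(-b * x); b may be negative."""
    return (rho0 * beta * a / b) * ((c0 * length + c_l - c0 / b)
                                    + (c0 / b - c_l) * np.exp(-b * length))

# Illustrative values (assumed): 2000 um line, 170 C and 90 C end temperatures.
L, rho0, c0, c_l, beta = 2000.0, 0.24, 0.2e-15, 1e-12, 3e-3
b = np.log(170.0 / 90.0) / L
decaying = increase_exponential(170.0, b, rho0, c0, c_l, beta, L)   # hot end at the driver
rising = increase_exponential(90.0, -b, rho0, c0, c_l, beta, L)     # hot end at the load
```

With these numbers the rising profile yields the smaller delay increase, because the temperature near the driver is weighted by the largest downstream capacitance.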
Using (2.4), calculation shows that the interconnect Elmore delay is more adversely affected by $T_1(x)$ than by $T_2(x)$, even though the underlying areas for both $T_1(x)$ and $T_2(x)$ in Figure 3-3 are equal along the length of the line. Figure 3-4 compares the performance degradation in the presence of $T_1(x)$ and $T_2(x)$ for two different wire lengths, 1000 µm and 2000 µm, with the same electro-thermal characteristics that were mentioned before. In both cases the lower-bound temperature is kept constant at 30 °C. By increasing the upper-bound value for these functions, it can be observed that using $T_2(x)$ causes less delay increase than that caused by using $T_1(x)$. This shows that assuming a constant temperature along the wire (with peak value) is not accurate enough in planning wire routings and clock-skew analysis, as illustrated later in more detail.

[Figure 3-3 plot: $T(x)$ versus position $x$ (0 to 2000), showing profiles $T_1$ and $T_2$.]

Figure 3-3. Exposing a point-to-point interconnect to similar exponential thermal profiles in two different directions.

The above observation demonstrates that if one has the choice, choosing thermal profile $T_2(x)$ over $T_1(x)$ is preferable. Figure 3-4 also demonstrates that optimizing thermal profiles is as important as minimizing interconnect length for delay optimization.

[Figure 3-4 plot: delay degradation versus temperature (°C, 30 to 140), with curves for L=2000, T1; L=2000, T2; L=1000, T1; and L=1000, T2.]

Figure 3-4.
Performance degradation for $T_1(x)$ and $T_2(x)$ profiles of Figure 3-3.

It must be noted that the substrate thermal map is strongly dependent on the design, synthesis, floorplanning, and placement routines. As a result, analytical modeling of hot spots in the substrate can be a tedious task. However, to approximate a hot spot, one can assume a Gaussian thermal distribution along the length of a wire, with median point $\mu$, constant peak temperature $T_{max}$, and standard deviation $\sigma$, as depicted in Figure 3-5.

[Figure 3-5 plot: $T(x)$ (°C) versus position $x$ (0 to 2000).]

Figure 3-5. Constant-peak normal thermal profile with variable median $\mu$ and standard deviation $\sigma$ along an interconnect line.

By applying $T(x) = T_{max} \cdot \exp(-(x-\mu)^2 / 2\sigma^2)$ to (2.4) the interconnect performance degradation can be observed. The movement of the median $\mu$ along the length of the line changes the value of the delay degradation, and its effect on performance is also strongly dependent on the value of the deviation $\sigma$. For the same $\sigma$, the delay is always lower for $\mu = L$ than for $\mu = 0$ ($0 < x < L$), which again shows the effectiveness of a gradual increase in the temperature along the line from source to sink. For the same median $\mu$, any increase in the deviation $\sigma$ increases the delay, because the profile with the larger $\sigma$ is nowhere cooler.

Figure 3-6 shows the increase in the delay of a wire with length 2000 µm as a function of different $\mu$'s and $\sigma$'s with $T_{max} = 120$ °C and the same electrical and thermal properties as described above for Figure 3-2.
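The hot-spot experiment can be mimicked with a short Python sketch that sweeps the median of a constant-peak Gaussian profile along the line and evaluates the temperature-dependent part of (2.4). It uses the identity $(c_0 L + C_L)\int T\,dx - c_0 \int x\,T\,dx = \int T(x)\,(c_0 (L - x) + C_L)\,dx$. The function names and the numerical values in the test-style usage are assumptions for illustration, not the settings behind Figure 3-6.

```python
import numpy as np

def delay_increase(temp_profile, rho0, c0, c_l, beta, length, n=4001):
    """Temperature-dependent part of (2.4), evaluated with the trapezoidal rule."""
    x = np.linspace(0.0, length, n)
    weight = c0 * (length - x) + c_l           # downstream capacitance seen at x
    y = rho0 * beta * temp_profile(x) * weight
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

def gaussian_profile(t_max, mu, sigma):
    """Constant-peak normal profile T(x) = t_max * exp(-(x - mu)^2 / (2 sigma^2))."""
    return lambda x: t_max * np.exp(-(x - mu) ** 2 / (2.0 * sigma ** 2))

def sweep_hot_spot(t_max, sigma, rho0, c0, c_l, beta, length, steps=41):
    """Delay increase for hot-spot medians spread evenly over [0, length]."""
    mus = np.linspace(0.0, length, steps)
    incs = np.array([delay_increase(gaussian_profile(t_max, m, sigma),
                                    rho0, c0, c_l, beta, length) for m in mus])
    return mus, incs
```

Calling `sweep_hot_spot` for several values of `sigma` and locating the maximum of each returned array gives the median at which the degradation peaks for that deviation.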
It can be observed that as $\mu$ moves along the line, the location at which the maximum increase in delay occurs is also a function of the deviation $\sigma$.

[Figure 3-6 plot: delay increase versus median value (0 to 2000), with curves for sigma = 100, 400, 800, and 1000.]

Figure 3-6. Delay increase as a function of the median value and the standard deviation of a normal temperature distribution.

3.4 Directional Thermal Gradients and Their Effects on Signal Delay

The last two examples illustrate that the delay degradation is strongly dependent on the specific thermal distribution function. From a resistance point of view, fluctuations of temperature along the line are equivalent to sizing a wire with uniform resistance. In sections with higher temperature, the wire is equivalent to a thinner uniform resistance wire, and in sections with lower temperature, the wire acts like a thicker wire with uniform resistance as shown in Figure 3-7. By recalling the optimization policy for uniform resistance non-uniform wire sizing [21], the best shape for such a line is a decaying exponential from the source of the signal to the destination. Considering the two previous examples of temperature profiles, when the temperature gradually increases from location 0 to $L$ the line has a better performance than when there is a gradual decrease in the temperature along the length of the line. Keeping in mind that a gradual increase in the line temperature is equivalent to a gradual decrease in the size of a uniform resistance line, the results are therefore analogous to optimal uniform resistance non-uniform wire sizing (assuming a constant capacitance).

Figure 3-7. Gradually decreasing (increasing) interconnect thermal profile as an equivalent to sizing down (up) of a uniform resistance wire.
3.5 Summary

In this chapter, to examine how the non-uniform interconnect temperature affects the overall performance of the signals, a new temperature-dependent RC delay model is introduced. It is well known that the metal resistivity changes linearly with the temperature of the interconnect line. By taking this fact into consideration, the new RC delay model employs a non-uniform resistance profile along the length of the line. Using the proposed delay model and with the help of different examples, the effects of interconnect thermal non-uniformities on the signal performance are demonstrated. It is observed that the direction of the thermal gradient along the length of the long global interconnects is an important factor in determining the magnitude of the degradation of the signal delay in that line. This suggests that the presence of a substrate thermal gradient results in a non-uniform degradation of signal delay on the long global interconnect lines. New guidelines are proposed to maintain better signal propagation times while considering the thermal non-uniformities over the substrate.

Chapter 4
Impact of Non-uniform Interconnect Temperature on Clock Skew

4.1 Introduction

As shown in Chapter 3, the increase in the Elmore delay can be significant at high temperatures. Moreover, delay variations arising from non-uniform interconnect thermal profiles cannot be accounted for by estimating a worst-case delay based on a uniform maximum temperature along the wires. Consequently, a serious problem may arise, namely skew fluctuations in a clock signal net. This may in turn degrade the performance of the circuit. Assume a clock net with two fanouts as illustrated in Figure 4-1.
For simplicity assume that both wires 1 and 2 have the same lengths, widths, and electro-thermal characteristics (as used in Chapter 3) and are routed on the same layer. Assuming different but uniform temperature profiles along both wires, the signal skew can be extracted from Figure 3-2 by estimating the difference in delay corresponding to the two uniform temperature profiles. A more realistic case arises if one of the wires develops a non-uniform thermal profile along its length due to some underlying thermal gradients over the substrate. In the worst case, one can assume that a section of the line is at one temperature and the rest of the line is at another temperature, as shown in Figure 4-1 (for wire 2), where the length $x$ is at temperature $T_2$ and the length $(L-x)$ is at temperature $T_3$.

Figure 4-1. Portion of a clock tree with two fanout branches that have equal wire lengths.

Figure 4-2 depicts the percentage of the normalized delay increase between wires 1 and 2 as a function of the position $x$ at which the thermal gradient occurs, where wire 1 is at a uniform temperature of 100 °C and both wires are 2000 µm in length. It can be observed that as $x$ approaches zero, the percentage of delay increase reaches its maximum value since the hotter section of the wire $(L-x)$ (which is at $T_3$) extends over the entire length of the line. Now assume that with wire 1 remaining at temperature $T_1$, wire 2 has a certain section of fixed length $x$ where the temperature is lower (or higher) than the rest of the wire. We proceed to study the effect of the magnitude of the gradient between these two sections $x$ and $L-x$ in wire 2 on the normalized delay difference.
Assume that section $x$ of wire 2 is at a uniform temperature $T_2$ of 80 °C while wire 1 is still at a uniform temperature of 100 °C.

[Figure 4-2 plot: normalized delay difference (%) versus position $x$ (0 to 1800), with curves for T2=120, T3=160 and T2=140, T3=160.]

Figure 4-2. Percentage of normalized delay difference between wires 1 and 2 as a function of location parameter $x$.

Figure 4-3 shows that the percentage of normalized delay difference between wires 1 and 2 is a function of the magnitude of the temperature gradient in wire 2. It can also be observed that the magnitude of the thermal gradient is an important factor in the signal skew fluctuations. In this example, due to the specific definition of the thermal gradient, the skew becomes zero at a certain location along the length of the wire. The above analysis shows the importance of considering the effects of the non-uniform interconnect temperature on the clock skew. Due to the high currents driven through the clock wires, clock nets usually exhibit the highest Joule heating among signal nets, and since they span a large area over the die, the probability that they will experience some thermal gradients is much higher than that for the shorter signal nets. As a result, careful consideration of non-uniform temperature profiles is necessary in clock skew estimation along the clock signal net. It can be shown that, given the non-uniform thermal profiles along wires 1 and 2, one can calculate the effective ratio of the length of wire 1 to that of wire 2 such that the signal skew is eliminated. This design rule may be used in bottom-up merging clock tree generation techniques to ensure a near-zero clock skew routing in the presence of substrate thermal non-uniformities [1].

[Figure 4-3 plot legend: x=600, x=1200.]
[Figure 4-3 plot: horizontal axis $T_3$ (°C), 100 to 180.]

Figure 4-3. Percentage of normalized delay difference between wires 1 and 2 as a function of the temperature $T_3$ as shown in Figure 4-1.

4.2 Thermally-dependent H-Tree Construction Technique

The goal of the clock signal distribution network is to maintain a zero (or near-zero) skew through it. To ensure zero skew clock distribution, a symmetric H-Tree structure or a bottom-up merging technique can be used [60]. For simplicity and without loss of generality, for this analysis an H-Tree clock topology consisting of trunks (vertical stripes) and branches (horizontal stripes) is considered as depicted in Figure 4-4. In general, the top-level segments of the tree are wider than the lower level segments. Furthermore, the top-level global segments of the tree are assigned to the upper metal layers and low-level local segments are routed using the lower metal layers. The problem arises from the fact that trunk 1 and branches 2 of the H-Tree are long. Hence, they are exposed to the thermal non-uniformities in the underlying substrate. Such non-uniformity results in different signal delays at the two ends of trunk 1 and branches 2 of the H-Tree, hence there will be a non-zero skew along the tree. The temperature effects therefore result in a scenario where the symmetric H-tree cannot guarantee zero skew. If, for example, trunk 1 experiences a non-uniform thermal profile, the clock driver must be connected to this segment at a place other than the center of the segment. This also suggests that during a bottom-up binary merge construction of the clock tree [18], the actual temperature-dependent delay must be considered. The presence of thermal gradients of more than 30 °C in some designs [32] justifies the importance of this kind of analysis.
Notice that the steady-state thermal profile of the substrate is considered. Even though the dynamic behavior of the chip causes transient changes in the cell switching activities, because of the large time constant for the temperature propagation in the substrate (around a few ms [26]), the locations of the hot spots are in fact quite stable. Consider the global trunk 1 in the H-Tree depicted in Figure 4-5.

Figure 4-4. A symmetric H-Tree clock distribution net.

The goal is to find the division point $x$ along the length of the segment ($L$) such that when the clock signal driver is connected to that point, the delays at the two ends of trunk 1 are the same. This will in turn minimize the effect of non-uniform temperature gradients on skew.

Figure 4-5. Schematic of minimum-skew clock signal insertion for an interconnect line with non-uniform temperature profile.

Assume an interconnect thermal profile $T(x)$ along the length $L$ of trunk 1. By using the delay model described in Chapter 3, the propagation delay can be written from the source to the two ends of the trunk. By doing so and assuming balanced loads at the two ends $p$ and $q$ of the trunk and using (2.4), the optimum length $l$ for ensuring zero clock skew can be obtained by solving the following equation:

$$\beta \int_0^{l} T(x)\,dx + l - A = 0 \qquad (2.7)$$

where $A$ is a constant and can be written as follows:

$$A = \frac{1}{L c_0 + 2 C_L}\left(\frac{c_0 L^2}{2} + L C_L + \beta\,(L c_0 + C_L)\int_0^L T(x)\,dx - c_0 \beta \int_0^L x\,T(x)\,dx\right) \qquad (2.8)$$

Given circuit parameters $L$, $C_L$, $c_0$, and $T(x)$, the constant $A$ can be computed easily and (2.7) can be solved in order to obtain the optimum position for the clock signal connection to the net segment.
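One way to solve (2.7) numerically is bisection, since its left-hand side increases monotonically with $l$ whenever $1 + \beta T(x) > 0$. The Python sketch below is a minimal illustration under that assumption; the function name `zero_skew_tap`, the trapezoidal integration, and the tolerance are choices made here and are not prescribed by the text.

```python
import numpy as np

def _trapezoid(y, x):
    """Plain trapezoidal rule."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

def zero_skew_tap(temp_profile, c0, c_l, beta, length, n=4001, tol=1e-6):
    """Solve (2.7) for the tap position l, with the constant A taken from (2.8)."""
    x = np.linspace(0.0, length, n)
    t = temp_profile(x)
    a_const = (c0 * length ** 2 / 2.0 + length * c_l
               + beta * (length * c0 + c_l) * _trapezoid(t, x)
               - c0 * beta * _trapezoid(x * t, x)) / (length * c0 + 2.0 * c_l)

    def residual(l):
        xs = np.linspace(0.0, l, n)
        return l + beta * _trapezoid(temp_profile(xs), xs) - a_const

    lo, hi = 0.0, length          # residual(lo) < 0 and residual(hi) > 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if residual(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)
```

For a uniform profile the routine returns the midpoint of the segment, which is a convenient sanity check before applying it to a measured or simulated $T(x)$.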
From (2.7) and (2.8), it is seen that with a constant thermal profile $T(x)$ along the length of the interconnect, a zero skew can be guaranteed by connecting the clock signal at $l = L/2$. In fact, even a non-uniform but symmetrical thermal profile with the symmetry axis at $l = L/2$ will result in a zero clock skew when the driver is connected to the middle of the line. From (2.7), it can also be seen that a gradually decreasing (increasing) thermal profile along the length of the line from 0 to $L$ (from $p$ to $q$) results in an optimum length $l^*$ less than (greater than) $L/2$.

4.3 Experimental Results

The behavior of temperature-dependent clock skew for a 2000 µm interconnect line with identical electro-thermal characteristics as those in Chapter 3 is now examined by applying three different interconnect thermal profiles. More precisely, the effects of linear, exponential, and normal (Gaussian distribution with constant peak amplitude) thermal profiles on the clock skew will be considered. Since the global clock lines are thermally long, the thermal effects of vias/contacts at the junction of the interconnect line and the driver/receiver are neglected. In the first two cases, different scenarios based on high temperature levels ($T_H$) and low temperature levels ($T_L$) have been examined (Table 1). Column 3 shows the value of $l$ at which, by inserting the signal to the H-Tree segment, a zero clock skew is guaranteed. The reported normalized skew percentage in column 4 represents the ratio of the clock skew when $l = L/2$ over the delay from the driver to any endpoint of the interconnect line when $l = l^*$. The third set of thermal profiles uses a constant-peak amplitude normal distribution with peak $T_{max}$ at 100 °C, mean $\mu$ (µm) and standard deviation $\sigma$ (µm), which approximates the behavior of a hot spot on the substrate.
As this profile is symmetric, by applying a distribution with median $L/2$, the zero skew is guaranteed. Moving the hot spot along the length of the line clearly increases the skew. It is clear from Table 1 that neglecting the effects of thermal profiles on the delay fluctuations changes the skew by as much as 10 percent. The above discussion suggests that for a given thermal profile $T(x)$, one can adjust the length $l$ using (2.7) and (2.8) to maintain a zero clock skew. The circuit designer can place the cells such that the hot spots have a symmetrical position relative to the higher-level segments of the clock tree, or can route the clock tree such that the higher-level segments are symmetrical relative to the underlying hot spots. Because the number of these high-level clock segments is small, it is feasible to adjust the position of the clock tree segment or the cell placement over the substrate to maintain a nearly symmetric thermal profile along the clock segments.

Thermal profile | Parameters | $l^*$ | Normalized skew % ($l = L/2$)
$T(x) = ax + b$, with $a = (T_H - T_L)/L$, $b = T_L$ | $T_H$=170, $T_L$=90 | 1042 | 5.42
 | $T_H$=170, $T_L$=110 | 1032 | 3.98
 | $T_H$=170, $T_L$=130 | 1021 | 2.65
 | $T_H$=170, $T_L$=150 | 1012 | 1.29
$T(x) = a \cdot e^{-bx}$, with $a = T_H$, $b = (1/L)\ln(T_H/T_L)$ | $T_H$=170, $T_L$=90 | 957.5 | 5.24
 | $T_H$=170, $T_L$=110 | 968.66 | 3.63
 | $T_H$=170, $T_L$=130 | 979.5 | 2.40
 | $T_H$=170, $T_L$=150 | 989.7 | 1.19
$T(x) = T_{max} \cdot e^{-(x-\mu)^2/2\sigma^2}$ | $\mu$=2000, $\sigma$=1000 | 1210 | 7.78
 | $\mu$=1000, $\sigma$=400 | 1000 | 0.0
 | $\mu$=500, $\sigma$=400 | 827 | 10.7
 | $\mu$=300, $\sigma$=700 | 911 | 9.57

Table 1. Comparison between different thermal profiles and their effects on clock skew.

4.4 Summary

In this chapter, it was shown that non-uniform temperature distributions along long global wires in high-performance ICs can have a significant impact on the interconnect performance and the worst-case clock skew.
An analytical model that helps designers deal with non-uniformities in the interconnect thermal profile during clock net routing has been presented for the first time. The proposed clock routing methodology ensures a near-zero skew clock tree by accounting for the substrate non-uniform temperature profiles.

Chapter 5
Effect of Non-uniform Substrate Temperature on Buffer Insertion

5.1 Introduction

Buffer insertion is an effective technique to reduce the interconnect delay. Some earlier works build the fanout tree and insert the buffers simultaneously [49]. However, most of the buffer insertion techniques start with a fixed tree topology for the fanout net and insert buffers into the tree topology later. Reference [56] finds the optimal delay in a fanout tree by permitting only one non-inverting buffer to be inserted in each segment of the tree. References [5] and [44] describe wire segmentation algorithms that allow buffer insertion in a pin-to-pin wire segment of an RC fanout tree. This thesis studies the wire segmentation algorithm by considering the influence of the substrate thermal gradients on the performance of the global interconnects and the transistor switching speed. By using a distributed RC temperature-dependent delay model, it is shown that in the presence of the substrate temperature gradients the techniques provided in [5] and [44] become non-optimal. To have the maximum efficiency through the buffer insertion, buffers must be sized and placed very carefully along the interconnect line. More precisely, [5] and [44] propose equal distances between adjacent buffers in the inserted buffer chain in order to minimize the signal delay.
For a non-uniform interconnect thermal profile, it is shown that the distances between the adjacent buffers do not remain equal and that they are strongly dependent on the thermal profile of the underlying substrate.

5.2 Temperature-dependent Driver Resistance

In addition to the dependency of the line resistance on the interconnect temperature profile that was discussed in Chapter 3, some CMOS device parameters are also dependent on the substrate temperature, including the threshold voltage ($V_T$), the mobility ($\mu$) and the energy bandgap of silicon ($E_g$). It can be shown that thermally dependent variations of mobility and energy bandgap are usually small and not comparable to the threshold voltage variations. For the sake of simplicity, assume that the major parameter affected by temperature is the threshold voltage. The first-order approximation of the rate of threshold voltage variation as a result of the thermal gradients can be written as follows [57]:

(2.9)

where T is the device temperature and q is the charge of the electron. For silicon, $E_g/q$ is equal to 1.12 V. The variation of threshold voltage directly affects the current drawn from the power source and the transistor switching performance. Note that $C_L$ and $C_p$ shown in Figure 3-1 will not change with temperature variations. In its simple form, the device driver resistance can be written as follows [60]:

$R_d = \dfrac{L_{eff}/w}{\mu C_{ox}(V_{dd} - V_T)}$ (2.10)

where $L_{eff}$ is the effective channel length, w is the channel width, $\mu$ is the mobility, $C_{ox}$ is the gate oxide capacitance and $V_{dd}$ is the power supply voltage. From (2.9) and (2.10), it is obvious that threshold voltage variation would cause $R_d$ variations, and the rate of driving resistance change due to thermal gradients can be written as follows:

$\dfrac{\Delta R_d}{R_d} = \dfrac{(E_g/q) + V_T}{V_{dd} - V_T} \cdot \dfrac{\Delta T}{T}$ (2.11)
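As a quick numerical illustration of (2.11), the following minimal sketch evaluates the relative change in driver resistance for a given temperature step. Only $E_g/q$ = 1.12 V comes from the text; the function name, the threshold voltage of 0.4 V, the supply voltages used in the loop and the treatment of T as an absolute temperature in kelvin are assumptions made for illustration.

```python
EG_OVER_Q = 1.12  # volts, the value quoted in the text for silicon


def relative_rd_change(vdd, vt, temp_k, delta_t):
    """Return dRd/Rd as given by (2.11) for a temperature step delta_t.

    vdd and vt are in volts. temp_k is the device temperature, assumed
    here to be an absolute temperature in kelvin; delta_t is in kelvin.
    """
    if vdd <= vt:
        raise ValueError("vdd must exceed vt")
    return (EG_OVER_Q + vt) / (vdd - vt) * (delta_t / temp_k)


if __name__ == "__main__":
    # Hypothetical threshold voltage of 0.4 V and a 10 K step around 25 C.
    for vdd in (1.8, 1.2):
        change = relative_rd_change(vdd, 0.4, 298.15, 10.0)
        print(f"Vdd = {vdd} V: dRd/Rd = {change:.3%}")
```

Under these assumptions the lower supply voltage gives the larger relative change, because the denominator $V_{dd} - V_T$ shrinks.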
It can be seen that the rate of driver resistance variation due to temperature fluctuation is strongly dependent on the power supply voltage and the threshold voltage (both of which are technology dependent). Figure 5-1 shows the normalized driver resistance as a function of device temperature for different technology nodes based on ITRS specifications [37] (with the assumption of having unit driver resistance at T = 25 °C). The SPICE 0.5 µm data has been extracted from [39]. Note that the driver resistance at 25 °C is the base of the normalization (although its actual value depends on the technology feature size).

[Figure 5-1 plot: normalized driver resistance versus temperature (°C); curves for 0.5 µm (SPICE), 0.5 µm, 0.25 µm, 0.18 µm and 0.13 µm.]

Figure 5-1. Normalized driver resistance as a function of device temperature for different technology feature sizes.

5.3 Optimal Buffer Insertion Technique

The goal in buffer insertion is to find the number, size and exact location of the inserted buffers along the length of the line, such that the delay is minimized. In [44] it was shown that, in a given technology, there is a critical length between each two buffers of a buffer chain that minimizes the delay between the first and the last buffer. In that work, it was assumed that the source, the sink and the buffers have the same size, which results in the same output driver resistance $R_d$ and gate input capacitance $C_L$ for all of them. It was shown that the critical length and the optimal size of the inserted buffers are dependent on the process technology and the interconnect layer assignment and are not dependent on the driver specifications or the number of inserted buffers.
The critical length ($l_{opt}$) and optimal buffer size ($s_{opt}$) are as follows:

$l_{opt} = \sqrt{\dfrac{2 r_0 (c_0 + c_p)}{r c}}, \qquad s_{opt} = \sqrt{\dfrac{r_0 c}{r c_0}}$ (2.12)

where $r_0$, $c_0$, r, c and $c_p$ are the minimum-size transistor output resistance (kΩ), minimum-size transistor input capacitance (pF), unit-length interconnect resistance (kΩ/µm), unit-length interconnect capacitance (pF/µm) and parasitic output capacitance (pF) for the minimum-sized transistor, respectively. In [44] it was assumed that the interconnect is a homogeneous line, so r and c are uniform along the length of the line. Having an interconnect with length L, the authors in [5] have shown that the optimal number of buffers k with size b that minimizes the delay of the line can be written as follows:

(2.13)

where $R_d$, $C_L$ and $C_p$ are the buffer output resistance, input capacitance and junction capacitance, respectively. In order to have maximum delay reduction, the buffers must be sized by $s_{opt}$ as stated in (2.12) ($R_d = r_0/s_{opt}$, $C_L = c_0 s_{opt}$, $C_p = c_p s_{opt}$). Moreover, it has been shown that the optimal spacing of the buffers is at equal increments of L/(k+1) on the interconnect line [5]. In this scenario, the propagation delay constant for each segment between two adjacent buffers will be the same for all the k+1 segments, as illustrated in Figure 5-2.

[Figure 5-2 diagram: buffer chain along a line of length L with equal segments of length L/(k+1).]

Figure 5-2. Structure of standard buffer insertion in a uniform line with equal segmentation.

In Figure 5-2 the buffers $b_0$ and $b_s$ are the driver and the sink of the interconnect, respectively, and they are assumed to be the same size as the buffers $b_i$ (i = 1, 2, 3, ..., k). However, employing drivers and sinks with different sizes than the buffers can be easily addressed in (2.12) and (2.13), as was shown in [5].
In that case, the distance between the driver ($b_0$) and the first buffer ($b_1$) and the distance between the last buffer ($b_k$) and the sink ($b_s$) are not equal to the distances between any two adjacent buffers. However, the driver and the sink can always have the same size as the inserted buffers by cascading up or down from some suitable buffers in the library. For the sake of simplicity, assume that the size of the driver, the sink and the inserted buffers are all the same through the rest of this work. As seen in (2.1), the interconnect resistance is strongly dependent on the line temperature, and any existing gradient in the substrate temperature will affect the signal propagation delay. This suggests that the proposed technique of an equally segmented interconnect does not result in an optimal delay reduction in the presence of a non-uniform substrate thermal profile. For an interconnect from the source $b_0$ to a single sink $b_s$, the goal is to find the locations of k buffers to be inserted along the length of the interconnect in order to minimize the signal propagation delay subject to a non-uniform substrate thermal profile $T_{ref}(x)$ along the length of the interconnect. The capacitance per unit length c is assumed to be constant and the resistance along the length of the line is a linear function of $T_{ref}(x)$ as stated in (2.1). It is assumed that all the k buffer sizes are equal to each other, and that this size can be found by using (2.12). Considering both a variable size for each buffer and a variable distance between each two adjacent buffers makes the problem extremely complicated to solve. As a result, this thesis just tries to find the exact location of each buffer along the length of the interconnect line.
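The buffer size and spacing referred to by (2.12) can be checked numerically. The sketch below assumes that (2.12) takes the commonly quoted closed forms $l_{opt} = \sqrt{2 r_0 (c_0 + c_p)/(r c)}$ and $s_{opt} = \sqrt{r_0 c/(r c_0)}$, and that the $R_d$, $C_L$ and $C_p$ entries of Table 2 (0.18 µm column) are minimum-size values; both are assumptions. With them, the results land within roughly 2% of the tabulated $l_{opt}$ and $s_{opt}$. The helper `best_buffer_count` is a hypothetical brute-force stand-in for (2.13) based on an Elmore-style delay of an equally segmented uniform line; it is not the expression from [5].

```python
import math


def critical_length(r0, c0, cp, r, c):
    """Assumed form of l_opt: sqrt(2*r0*(c0 + cp) / (r*c)), SI units."""
    return math.sqrt(2.0 * r0 * (c0 + cp) / (r * c))


def optimal_size(r0, c0, r, c):
    """Assumed form of s_opt: sqrt(r0*c / (r*c0)), multiple of minimum size."""
    return math.sqrt(r0 * c / (r * c0))


def uniform_delay(k, length, r, c, rd, cl, cp):
    """Elmore-style delay of a uniform line with k equally spaced buffers."""
    seg = length / (k + 1)
    per_stage = rd * (c * seg + cl + cp) + r * seg * (0.5 * c * seg + cl)
    return (k + 1) * per_stage


def best_buffer_count(length, r, c, rd, cl, cp, k_max=50):
    """Brute-force search over k = 0..k_max for the smallest delay."""
    return min(range(k_max + 1),
               key=lambda k: uniform_delay(k, length, r, c, rd, cl, cp))


if __name__ == "__main__":
    # 0.18 um column of Table 2, converted to ohms, farads and metres.
    r0, c0, cp = 8e3, 1.9e-15, 4.8e-15
    r, c = 36.3e3, 269e-12
    l_opt = critical_length(r0, c0, cp, r, c)
    s_opt = optimal_size(r0, c0, r, c)
    k = best_buffer_count(6.66e-3, r, c, r0 / s_opt, c0 * s_opt, cp * s_opt)
    print(f"l_opt = {l_opt * 1e3:.2f} mm, s_opt = {s_opt:.0f}, k = {k}")
```

For the 6660 µm line used later in Figure 5-4 the search selects a single buffer, which agrees with the one-buffer case shown there.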
It must be noted that buffer insertion is generally performed after initial floor planning and cell placement, at which point an initial thermal map of the substrate is known. For simplicity, it is assumed that inserting new buffers along the length of an interconnect line does not considerably change the power consumption map of the underlying substrate. This is due to the fact that the power dissipation of these newly inserted buffers is a small fraction of the power dissipation due to the switching activities of the densely placed cells in the surrounding areas. Using the grid-based fast thermal simulation for the substrate [26], the average power consumption at each grid cell (which has many individual cells in it) contributes to the overall heat generation of that area. As a result, it is assumed that the temperature of each inserted buffer will eventually reach a steady-state value equal to the local substrate temperature. Based on (2.11), the cell driver resistance is a linear function of the local substrate temperature and can be written as follows:

$R_d(x) = R_{d0}\,(1 + \beta_c \cdot T_{ref}(x))$ (2.14)

where $R_d(x)$ is the driver resistance profile of the transistors with thermal profile $T_{ref}(x)$, $R_{d0}$ is the cell driver resistance at the reference temperature (namely 25 °C) and $\beta_c$ is the temperature coefficient of the driver resistance (1/°C). $\beta_c$ can be extracted from SPICE simulations at different temperatures or by using (2.11). Based on Figure 5-2, with buffer locations $x_1, x_2, x_3, \ldots, x_k$ as variables, the path delay from source to sink can be written as follows:

$D = \sum_{i=1}^{k+1} \left( \int_{x_{i-1}}^{x_i} r(\tau)\,\bigl(c\,(x_i - \tau) + C_L\bigr)\, d\tau \right) + \sum_{i=1}^{k+1} R_d(x_{i-1})\,(c\,x_i - c\,x_{i-1} + C_L + C_p)$ (2.15)

where $x_0$ and $x_{k+1}$ are constants equal to zero and L, respectively.
The first term is the interconnect delay while the second term represents the gate delay. Depending on the form of $T_{ref}(x)$ and the dependency of r(x) and $R_d(x)$ on it (as shown in (2.1) and (2.14)), the path delay (2.15) may or may not behave as a convex objective function. If it is convex, the global minimum can be obtained by solving the system of k equations obtained by setting the partial derivatives of D with respect to the variables $x_i$ (i = 1, 2, 3, ..., k) to zero. In general, the derivative of D with respect to the position of the i-th buffer ($1 \le i \le k$) can be written as follows:

$\dfrac{\partial D}{\partial x_i} = c \int_{x_{i-1}}^{x_i} r(\tau)\, d\tau - c\, r(x_i)\,(x_{i+1} - x_i) + c\,\bigl(R_d(x_{i-1}) - R_d(x_i)\bigr) + \left.\dfrac{dR_d}{dx}\right|_{x_i} (c\,x_{i+1} - c\,x_i + C_L + C_p)$ (2.16)

In the case of a non-convex optimization problem, solving these k equations may result in a local minimum. One can use the Quasi-Newton method to approximate the delay objective locally by a quadratic function that can be further minimized near-globally with a total order of convergence of at least two [40].

5.4 Experimental Results

Now the effects of the non-uniform interconnect temperature profile on the buffer insertion problem are examined. Even though the method presented in the previous section derives the location of the inserted buffers, the optimal number of buffers, k, still needs to be found. Due to the non-uniform thermal profile of the interconnect, the line resistance per unit length is a function of the position along the length of the line, with a minimum $r_{min}$ and a maximum $r_{max}$. These two values are used in (2.13) to find the optimal number of buffers that need to be inserted into a line. Using the extreme values of the line resistance per unit length ($r_{min}$ and $r_{max}$) may result in different values for k. In that case both values are examined and the minimum delay resulting from using the suitable number of buffer stages for each case is used.
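To make the optimization concrete, the sketch below evaluates the path delay in the form of (2.15), that is, an interconnect integral plus a gate term per segment, and locates a single buffer (k = 1) on a line with a linear temperature profile. It is an illustration under stated assumptions rather than the procedure used in this work: the temperature coefficients passed to `optimise_single_buffer` are hypothetical, the line and device values are taken from the 0.18 µm column of Table 2 with the 6660 µm line of Figure 5-4, and a golden-section search is swapped in for the Quasi-Newton method because one variable and a unimodal objective make it sufficient.

```python
import math


def simpson(f, a, b, n=200):
    """Composite Simpson rule on [a, b] with an even number n of panels."""
    if b == a:
        return 0.0
    h = (b - a) / n
    total = f(a) + f(b)
    for j in range(1, n):
        total += (4 if j % 2 else 2) * f(a + j * h)
    return total * h / 3.0


def path_delay(xs, length, r_of_x, rd_of_x, c, cl, cp):
    """Path delay for buffers at interior positions xs (form of (2.15))."""
    pts = [0.0] + list(xs) + [length]
    delay = 0.0
    for left, right in zip(pts, pts[1:]):
        # Interconnect term: r(t) times the capacitance downstream of t.
        delay += simpson(lambda t: r_of_x(t) * (c * (right - t) + cl),
                         left, right)
        # Gate term: the driver at `left` charges wire, load and parasitic.
        delay += rd_of_x(left) * (c * (right - left) + cl + cp)
    return delay


def golden_min(f, lo, hi, tol=1e-8):
    """Golden-section search; assumes f is unimodal on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    x1, x2 = b - inv_phi * (b - a), a + inv_phi * (b - a)
    f1, f2 = f(x1), f(x2)
    while b - a > tol:
        if f1 < f2:
            b, x2, f2 = x2, x1, f1
            x1 = b - inv_phi * (b - a)
            f1 = f(x1)
        else:
            a, x1, f1 = x1, x2, f2
            x2 = a + inv_phi * (b - a)
            f2 = f(x2)
    return 0.5 * (a + b)


def optimise_single_buffer(beta_r, beta_d, t_left=25.0, t_right=100.0):
    """Best position (m) of one buffer; beta_r, beta_d are hypothetical."""
    length = 6.66e-3          # m, the 6660 um line of Figure 5-4
    r0, c = 36.3e3, 269e-12   # ohm/m and F/m, 0.18 um column of Table 2
    s = 174.0                 # s_opt from Table 2
    rd0, cl, cp = 8e3 / s, 1.9e-15 * s, 4.8e-15 * s

    def temp(x):
        return t_left + (t_right - t_left) * x / length

    def r_of_x(x):
        return r0 * (1.0 + beta_r * temp(x))

    def rd_of_x(x):
        return rd0 * (1.0 + beta_d * temp(x))

    return golden_min(
        lambda x: path_delay([x], length, r_of_x, rd_of_x, c, cl, cp),
        0.0, length)


if __name__ == "__main__":
    for label, br, bd in (("uniform", 0.0, 0.0), ("r only", 0.004, 0.0),
                          ("Rd only", 0.0, 0.01), ("both", 0.004, 0.01)):
        print(f"{label}: x = {optimise_single_buffer(br, bd) * 1e6:.1f} um")
```

With these hypothetical coefficients the buffer sits at the midpoint for a uniform line, moves toward the hot end when only r depends on temperature, and moves toward the cool end when only $R_d$ does. The exact positions depend on the coefficients chosen and are not expected to reproduce the values reported in this chapter.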
Table 2 shows the parameters used in these experiments for different technology nodes based on ITRS specifications [37]. By using a simple linear function (ax + b) of the position x for the substrate temperature $T_{ref}(x)$, the temperature-dependent buffer insertion technique is examined. Note that in reality the substrate thermal profile is dependent on the design, synthesis, floor planning and placement techniques, and the temperature profile along the substrate may not be a linear function of x.

Parameter | 0.18 µm | 0.13 µm | 0.1 µm
r (kΩ/m) | 36.3 | 60.1 | 103.9
c (pF/m) | 269 | 240 | 154
$C_L$ (fF) | 1.9 | 1.7 | 1.5
$R_d$ (kΩ) | 8 | 9.5 | 10
$C_p$ (fF) | 4.8 | 3.5 | 2.5
$l_{opt}$ (mm) | 3.33 | 2.5 | 2.22
$s_{opt}$ | 174 | 151 | 110
$V_{DD}$ (V) | 1.8 | 1.5 | 1.2

Table 2. Parameters used in generating experimental results for three different technologies based on ITRS specifications.

In this example, it is assumed that a 75 °C thermal gradient between the two ends of the wire is present (from 25 °C to 100 °C). Furthermore, it is also assumed that the left side ($b_0$ in Figure 5-2) of the interconnect line is the cooler side at 25 °C. For illustrative purposes, consider two cases while optimizing the objective function (2.15): (i) non-uniform driver resistance $R_d$, uniform interconnect resistance per unit length r, and (ii) non-uniform driver resistance $R_d$, non-uniform interconnect resistance per unit length r. Figure 5-3 shows the percentage of performance improvement for temperature-aware buffer insertion in comparison to the standard buffer insertion techniques using buffers from the library with optimal size and optimal length between each two adjacent buffers defined by (2.12) for different technology nodes. In Figure 5-3, the vertical axis shows the percentage decrease in the signal propagation delay.
The graphs labeled with Rd are those related to the analysis where only $R_d$ is considered as a temperature-dependent variable, while the graphs labeled with (Rd+r) consider both temperature-dependent $R_d$ and r. It can be seen that as the feature size shrinks, the effects of the substrate thermal non-uniformities on signal performance become more critical. It can also be observed that as the interconnect length increases (which results in an increased number of inserted buffers), the improvement in the signal delay becomes less than that for the shorter lines, and it will eventually saturate to a lower bound. One notable fact is that the performance improvement in the case of only variable $R_d$ is greater than in the case with both variable $R_d$ and variable r. From the interconnect resistance point of view, in order to minimize the signal delay, an increasing thermal profile from source to sink requires the inserted buffers to move to the right side to reduce the length of the interconnect between each two adjacent buffers in the areas with higher interconnect temperature. This can be seen in Figure 5-4. Figure 5-4(a) shows the standard buffer insertion result for one buffer. Figure 5-4(b) shows that by considering only a variable interconnect resistance (r) and constant $R_d$, the location of the inserted buffer must be shifted to the right side to reduce the length of the section with higher temperature. However, from the driver resistance point of view, an increasing thermal profile from source to sink forces the inserted buffer to move to the left side to reduce the device resistance as much as possible, as shown in Figure 5-4(c). From Figure 5-4(b) and Figure 5-4(c), it can be observed that the movement of inserted buffers is more severe in the case of having a temperature-dependent $R_d$ instead of having a temperature-dependent r.
This is expected, as the magnitude of $R_d$ is much greater than that of the interconnect resistance per unit length. In addition, the device driver resistance dependency on the temperature is much more severe than that for the interconnect resistance. As a result, by considering both variable $R_d$ and variable r, the inserted buffer lies between the computed locations of the buffer for the cases of Figure 5-4(b) and Figure 5-4(c), as shown in Figure 5-4(d).

Figure 5-5 shows the dependency of delay improvement on the magnitude of the thermal gradient between the two ends of the wire for different technologies. It can be observed that as the gradient increases, the standard buffer insertion techniques become less efficient. This shows the importance of having a temperature-aware buffer insertion technique that takes into account the fact that in a high-performance design a 40 °C to 50 °C thermal gradient is inevitable and that these gradients tend to increase for future technologies.

[Figure 5-3 plot: percentage delay improvement versus number of buffers k; curves for 0.18 (Rd), 0.18 (Rd+r), 0.13 (Rd), 0.13 (Rd+r), 0.1 (Rd) and 0.1 (Rd+r).]

Figure 5-3. Delay improvement due to the temperature-aware buffer insertion technique in comparison to the standard buffer insertion technique for different numbers of buffers in different technologies based on ITRS.

[Figure 5-4 diagram: buffer positions (a) x = 3330 µm, (b) x = 3430 µm, (c) x = 81.5 µm, (d) x = 154.6 µm.]

Figure 5-4. Location of an inserted buffer in a 6660 µm line (0.18 µm technology): (a) standard technique (b) temperature-aware technique with only variable r (c) temperature-aware technique with only variable $R_d$ (d) temperature-aware technique with both variable $R_d$ and variable r.

[Figure 5-5 plot: percentage delay improvement versus temperature gradient (°C) for different technologies.]

Figure 5-5. Delay improvement due to the thermally aware buffer insertion for one buffer as a function of different thermal gradients between the two ends of the line in comparison to the standard buffer insertion techniques for different technologies.

Figure 5-6 shows the performance improvement as a function of the percentage of optimal length ($l_{opt}$) defined by (2.12). It shows that thermally-aware buffer insertion will be more effective in interconnects with critical lengths less than the optimal length between each two adjacent buffers provided by Table 2. This suggests that the optimal buffer size and total optimal length of the wire connecting the source and the sink can also be adjusted in the presence of non-uniform thermal profiles along the length of the interconnect line.

[Figure 5-6 plot: percentage delay improvement versus ratio of the line length to $(k+1) \cdot l_{opt}$; curves for 0.18 and 0.13.]

Figure 5-6. Delay improvement due to the thermally-aware buffer insertion for one buffer as a function of percentage of critical length in comparison to the standard buffer insertion techniques for different technology nodes.

5.5 Discussion

As was shown, the variation of cell output resistance is a very important factor in moving the solution of the buffer insertion optimization away from its optimal point. Cell temperature is the main factor in $R_d$ variations.
The fluctuation of the cell driving resistance $R_d$ affects the switching performance of the cell dramatically. As a result, considering substrate temperature is a very important factor in signal performance related problem formulations. By studying (2.10), one can see that, in addition to the cell temperature, technology feature size and power supply voltage also control the driving resistance variations. Power supply voltage variations are caused mainly by the IR-drop effect. Generally, the cells on the substrate are connected to the power supply and ground through vias/contacts. The power supply network is usually in the form of a mesh or grid. Having a high current density drawn from each grid segment and a long distance between the cell power contact and the power supply causes a voltage drop generated by the resistance of the power path. This IR-drop causes voltage variations on the power contacts of different cells along the substrate surface. As a result, each cell has a power supply dependent driving resistance and switching performance. It was shown that IR-drop could cause serious problems for the functionality and performance of a circuit. The authors in [48] showed that having 10% IR-drop could cause up to 10% clock skew. The major skew generation is because of the $R_d$ variation that affects the switching performance of different cells in the clock tree. In the same way, it can be shown that $R_d$ variation caused by substrate thermal effects can cause significant clock skew [60]. Recalling the results of $R_d$ variation on the optimal buffer insertion in the previous section, one can conclude that the IR-drop also has a major impact on the buffer insertion policy. On the other hand, substrate thermal gradients will affect both the signal interconnect and the power grid resistances.
It is expected that those gradients affect the IR-drop as well and in the worst-case scenario increase it significantly. As a result, substrate thermal gradients can affect the overall signal performance through 1) the effect of temperature on the cell performance through $R_d$ variations, 2) the effect of temperature on the power grid IR-drop (through the resistance-dependency of the power grid interconnect lines) and 3) the effect of temperature on the global signal interconnect line (through the resistance-dependency of the signal metal lines). We believe that in order to obtain optimal results in different EDA flow steps, one must take all three of these thermal effects into consideration (with the understanding that as the technology feature size shrinks these effects become much more severe).

5.6 Summary

In this chapter, an analytical model for addressing the effects of substrate temperature non-uniformities on the position of inserted buffers along the interconnect lines has been formulated. It is observed that the delay degradation caused by the effects of temperature on the cell driver on-resistance is much more severe than that caused by the interconnect resistance thermal dependency. It is shown that as VLSI technology scales down, these non-uniform thermal effects will become more severe and must be taken into account in the design methodology to ensure near-optimality of the performance.

Chapter 6 Analysis of IR-drop Scaling with Implications for Deep Submicron P/G Distribution Network Designs

6.1 Introduction

This section presents a detailed analysis of the power-supply voltage (IR) drop scaling in DSM technologies.
More precisely, the effects of temperature, electromigration and interconnect technology scaling (including resistivity increase of Cu interconnects due to electron surface scattering and finite barrier thickness) are taken into consideration during this analysis. It is shown that the IR-drop effect in the power/ground (P/G) network increases rapidly with technology scaling and that using well-known countermeasures such as wire-sizing and decoupling capacitor insertion with resource allocation schemes that are typically used in the present designs may not be sufficient to limit the voltage fluctuations over the power grid for future technologies. It is also shown that such voltage drops on power lines of switching devices in a clock net can introduce a significant amount of skew, which in turn can degrade the signal integrity. With CMOS process technology scaling down to 0.13 µm and below, IR-drop is becoming an extremely important phenomenon, determining the performance and reliability of ULSI designs. The IR-drop effect manifests itself in P/G distribution networks and can adversely influence the performance of the signal nets including the clock tree [16],[48]. Aggressive interconnect scaling increases the resistance per unit length of wires and the average current density, which results in a significant voltage drop along the global wires. Since the supply voltage level is also reduced with technology scaling (which plays an important role in low power design techniques), the IR-drop effect becomes even more problematic since the ratio of the voltage drop to the ideal supply voltage level increases, which in turn degrades the switching speed of the CMOS gates and their DC noise margins. An excessive voltage drop in the power grid may also result in a functional failure in dynamic logic and a timing violation in static logic.
It has been shown that a 10% voltage drop in a 0.18 µm design increases the propagation delay of the gates by up to 8% [48]. As a result, the main challenge in the design of the power distribution network is to achieve a minimum acceptable voltage fluctuation across the chip (nominally up to a maximum value of 10% of $V_{dd}$) while satisfying the electromigration (EM) reliability rule for the power network segments and to realize a power distribution network by consuming a minimum routing area of the interconnect metal layers [51]. A critical issue in the analysis of the power distribution network is the large size of the problem. Simulating all the nonlinear devices in the chip together with a non-ideal power grid is not computationally feasible. Thus, the simulation is usually carried out in two separate steps. First, the non-linear devices are simulated assuming perfect supply voltages and the currents drawn by the devices are calculated. Next, the devices are modeled as independent time-varying current sources. The error incurred by ignoring this non-linearity is usually negligible. By performing these two steps, the problem of power grid analysis is reduced to solving a linear network [41]. In addition to the wire-sizing technique, in order to reduce the effect of switching noise on the P/G network, decoupling capacitors are added near the switching devices over the substrate [8]. These capacitors act as local reservoirs of charge for switching circuits and reduce the effect of the power supply glitches. Optimal value and placement of on-chip decoupling capacitors is an essential step to maintain a robust P/G network [23],[45].

In addition to the increase in the resistance per unit length of the metal layers with technology scaling, some physical effects such as electron surface scattering and finite barrier thickness contribute significantly to the overall metal resistivity of the local thin lines. Furthermore, it has been shown that as technology feature size is reduced, the peak chip temperatures that occur on the global metal layers increase rapidly due to the self-heating effect [35]. This can cause a further increase in the metal resistivity. Most of the recent research reports on the IR-drop effect have mainly focused on the methodology of efficient computation of the voltage drop values for each gate in typical P/G networks [20],[31],[42]. In this chapter, various effects of technology scaling and temperature issues in analyzing the IR-drop phenomenon are considered. It is shown that the IR-drop effect in the power/ground (P/G) network increases rapidly with technology scaling and that using well-known countermeasures such as wire-sizing and decoupling capacitor insertion with resource allocation schemes that are commonly used in present designs may not be sufficient to limit the voltage fluctuations over the power grid for future technologies, and new guidelines should be introduced.

6.2 Topology of Power Distribution Networks for IR-drop Analysis

The function of the power network is to carry current from the power chip pads to all of the cells in the design. The power network often has complex and tight electrical specifications, making its design a challenging task. It often consists of a top-level grid network that distributes current from the power chip pads (which are uniformly distributed on the chip area) to the local power trunks and low-level distribution structures that distribute the current from these trunks to the cells.
The top-level grid itself is made of global and/or semi-global wire lines that are connected together through vias or a stack of vias. Initially, the number and width of the horizontal and vertical lines in the global/semi-global power grids are determined based on the EM rules. Simulating the power grid requires solving a set of differential equations that are formed through a typical approach like the modified nodal analysis (MNA) [46] as follows:

$G \cdot x(t) + C \cdot \dot{x}(t) = u(t)$ (2.17)

where G is the grid conductance matrix, C is the matrix of grid capacitances (including the decoupling capacitances) and inductances, x(t) is the time-varying vector of grid node voltages and currents through the inductors, and u(t) is the vector of time-varying current sources attached to grid nodes. The set of differential equations (2.17) can be solved efficiently by using the backward Euler technique. In this work, an RC model of the MNA has been used for the sake of simplicity. Because the grid matrix is very sparse, an iterative conjugate gradient method is used to solve this linear system, which also exploits the symmetry and positive definiteness of the grid matrix. Figure 6-1 shows the RC network model used for extracting the system of (2.17). Time-varying current sources and decoupling capacitances are connected to each intermediate node in the global/semi-global grid. The amount of the current and decoupling capacitance can be derived by examining the power consumption profile and device count of the underlying functional blocks on the substrate connected to each grid node. By solving the system of (2.17), the voltages on each node of the global and semi-global grids are known, which will then feed the local trunks.
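The backward Euler and conjugate gradient steps described above can be sketched on a toy network. Discretizing (2.17) with step h gives $(G + C/h)\,x_{n+1} = u_{n+1} + (C/h)\,x_n$, a symmetric positive definite system for an RC grid. The three-node chain, its element values and all function names below are hypothetical, and dense Python lists are used only for readability; a real grid would need sparse storage.

```python
def conjugate_gradient(a, b, tol=1e-12, max_iter=1000):
    """Solve a x = b for a symmetric positive definite matrix a."""
    n = len(b)
    x = [0.0] * n
    r = list(b)              # residual b - a x for the start x = 0
    p = list(r)
    rs_old = sum(v * v for v in r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        ap = [sum(a[i][j] * p[j] for j in range(n)) for i in range(n)]
        alpha = rs_old / sum(p[i] * ap[i] for i in range(n))
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * ap[i] for i in range(n)]
        rs_new = sum(v * v for v in r)
        p = [r[i] + (rs_new / rs_old) * p[i] for i in range(n)]
        rs_old = rs_new
    return x


def backward_euler_step(g, c, x_prev, u_next, h):
    """One step of G x + C dx/dt = u using the backward Euler rule."""
    n = len(x_prev)
    lhs = [[g[i][j] + c[i][j] / h for j in range(n)] for i in range(n)]
    rhs = [u_next[i] + sum(c[i][j] * x_prev[j] for j in range(n)) / h
           for i in range(n)]
    return conjugate_gradient(lhs, rhs)


def simulate_chain(steps=500, h=1e-9):
    """Hypothetical 3-node chain fed from one pad; returns node voltages."""
    vdd, g_pad, g_w, i_load, c_d = 1.8, 10.0, 5.0, 0.01, 1e-9
    # Node 0 sees the pad through g_pad (Norton source g_pad * vdd).
    g = [[g_pad + g_w, -g_w, 0.0],
         [-g_w, 2.0 * g_w, -g_w],
         [0.0, -g_w, g_w]]
    c = [[c_d, 0.0, 0.0], [0.0, c_d, 0.0], [0.0, 0.0, c_d]]
    u = [g_pad * vdd - i_load, -i_load, -i_load]
    x = [vdd] * 3
    for _ in range(steps):
        x = backward_euler_step(g, c, x, u, h)
    return x


if __name__ == "__main__":
    print([round(v, 4) for v in simulate_chain()])
```

With constant load currents the node voltages settle to the DC solution, in which the drop grows with distance from the pad.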
In standard cell-based designs, by putting a power trunk adjacent to (or on top of) a cell row, power can be distributed among cells in that row (Figure 6-2). Notice that a second trunk is needed for the ground network. The ground distribution network is not discussed in this thesis since its analysis is similar. Accordingly, a number of cells (usually between 50 and 100 cells) that belong to the same row of the same functional block in the design are connected to a single power trunk. The power trunks are usually routed in Metal1 and are connected together on one side by using a strip of Metal2, making a comb-like structure as shown in Figure 6-2. To achieve better results both in terms of the local IR-drop and EM reliability, one can use out-of-block extensions and connect both sides of the power trunks together. By using a stack of vias, one or both sides of the local power trunks are connected to one or more nodes on the top-level global/semi-global power grid. For simplicity and without loss of generality, inverters are used to represent the cells that are powered by the local power trunk. A circuit model of the local power trunk is depicted in Figure 6-3, where N-2 identical inverters are assumed to be connected to a power trunk. Capacitors $C_{d_i}$ include both the built-in (n-well) decoupling capacitors and the add-on (thin-oxide) decoupling capacitances. The total on-chip n-well decoupling capacitor is determined by the area, depth and perimeter of each n-well. In high-performance switches, thin-oxide decoupling capacitors should be placed in close proximity to the highly active switching devices.

Figure 6-1. RC model of a power bus network. Each intermediate node is connected to underlying circuit blocks modeled as time-varying current sources $I_s$ and on-chip decoupling capacitances $C_{decap}$.
In Figure 6-3 the distance between consecutive inverters connected to the trunk is equal, resulting in equal resistances R_1 through R_{N-1}. Treating the trunk as a resistive-only network for the time being, given the end voltages V_1 and V_N and modeling the current drawn by each inverter as a current source I_i, the voltage V_i at each intermediate node in Figure 6-2 is calculated as follows:

[Equation (2.18): expression for V_i in terms of V_1, V_N, the segment resistances R_j and the inverter currents I_j; not legible in this transcript.]

Note that the resistive voltage drop derived in (2.18) is the worst-case scenario, since the current sources I_i depend on the magnitudes of the V_{i+1}. In order to calculate the actual IR-drop, one must use the nonlinear voltage-dependent source current given by the I_ds of the switching device and repeatedly solve (2.18) until the solution converges. Notice that the effects of the decoupling capacitors and of the interconnect capacitance per unit length have been neglected in the derivation of (2.18). Using this model, together with the solution of the linear network equations for the power grid through (2.17), one can solve for the IR-drop of the entire network in an iterative manner. The degree of IR-drop is design-dependent and varies based on the location of the cells connected to the grid, the switching activity of each cell, and the location of the power pad connections to the grid. As a result, in our experimental setup the IR-drop in a local power trunk is examined by attaching a reasonable number of inverters to it. These local power trunks are connected to the global/semi-global power grids. To emphasize the worst-case scenario, it is assumed that all inverters connected to the grid segment switch at the same time. It is also assumed that, by using a ball grid type of pin assignment in the problem setup, the power pads are uniformly distributed over the chip area.
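The iterative IR-drop calculation for a local trunk described above can be sketched as follows. Rather than a closed-form expression, this sketch writes Kirchhoff's current law at every intermediate node of the equal-resistance trunk, which gives the tridiagonal system -V[k-1] + 2 V[k] - V[k+1] = -R I[k], and solves it with the Thomas algorithm. The function names, the numerical values, and in particular the linear `inverter_current` model are hypothetical placeholders; a real flow would use the I_ds characteristic of the switching device.

```python
def solve_trunk(v_left, v_right, r_seg, currents):
    """Node voltages of a resistive trunk with fixed end voltages.

    currents[k] is the current (A) drawn at intermediate node k and every
    segment has resistance r_seg (ohm).  The tridiagonal KCL system
    (diagonal 2, off-diagonals -1) is solved with the Thomas algorithm.
    """
    m = len(currents)
    rhs = [-r_seg * i for i in currents]
    rhs[0] += v_left
    rhs[-1] += v_right
    c_prime = [0.0] * m
    d_prime = [0.0] * m
    c_prime[0] = -1.0 / 2.0
    d_prime[0] = rhs[0] / 2.0
    for k in range(1, m):                      # forward elimination
        denom = 2.0 + c_prime[k - 1]
        c_prime[k] = -1.0 / denom
        d_prime[k] = (rhs[k] + d_prime[k - 1]) / denom
    v = [0.0] * m
    v[-1] = d_prime[-1]
    for k in range(m - 2, -1, -1):             # back substitution
        v[k] = d_prime[k] - c_prime[k] * v[k + 1]
    return v

def inverter_current(v, v_dd, i_nom):
    # Hypothetical placeholder: current scales linearly with the local supply.
    return i_nom * v / v_dd

def iterate_ir_drop(v_end, v_dd, r_seg, n_cells, i_nom, tol=1e-9, max_iter=100):
    """Repeat the resistive solve with voltage-dependent currents until stable."""
    v = solve_trunk(v_end, v_end, r_seg, [i_nom] * n_cells)   # worst-case start
    for _ in range(max_iter):
        currents = [inverter_current(vk, v_dd, i_nom) for vk in v]
        v_new = solve_trunk(v_end, v_end, r_seg, currents)
        if max(abs(a - b) for a, b in zip(v, v_new)) < tol:
            return v_new
        v = v_new
    return v

# Illustrative values: 50 cells, 0.05 ohm per segment, 1 mA nominal per cell.
v_worst = solve_trunk(1.7, 1.7, 0.05, [1e-3] * 50)
v_conv = iterate_ir_drop(v_end=1.7, v_dd=1.8, r_seg=0.05, n_cells=50, i_nom=1e-3)
print("worst-case minimum node voltage: %.5f V" % min(v_worst))
print("converged minimum node voltage:  %.5f V" % min(v_conv))
```

With constant nominal currents the first solve reproduces the worst-case drop; the fixed-point loop then relaxes it, since each cell draws less current as its local supply voltage falls.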
[Figure 6-2 labels: Power Trunk (M1), VIA1, Standard Cell Row, Metal2 going towards the semi-global power grid]

Figure 6-2. A local power distribution network for a typical standard cell design consisting of power trunks in a comb-like structure connecting to the semi-global power grid through Metal2.

Figure 6-3. Magnified view of a trunk segment, containing a resistive network with inverters connected to the intermediate nodes.

To alleviate the large transient current flowing through the inductance of the global/semi-global grid and limit the IR-drop, decoupling capacitances must be placed throughout the chip. Nominally, the charge stored on these capacitors supplies the required transient current for 10% of the clock period. The charge is replenished during the remaining 90% of the CPU clock cycle. To calculate the amount of decoupling capacitance needed to maintain a limited voltage drop, one can use the following formula [8]:

$$ P = p \, ( C_T \, V_{dd}^2 \, f ) \tag{2.19} $$

where P is the total chip power consumption, p is the probability that a power transition occurs, V_dd is the supply voltage, f is the clock frequency and C_T is the effective chip capacitance. Assuming a maximum voltage variation of 10%, the computed decoupling capacitance needed for future technologies varies in a range of 39-72 nF/cm^2 (for 0.18 to 0.07 µm, respectively) [15]. Using the heuristic rule given in [17], the amount of decoupling capacitance needed to keep the voltage swing limited (~10%) can be deduced from:

$$ C_{decap} = \frac{9P}{f \, V_{dd}^2} \tag{2.20} $$

For a metal-insulator-metal (MIM) capacitor having a dielectric with T_ox,eq = 1 nm, the capacitance is about 34.5 fF/µm^2.
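The decoupling capacitance estimate can be illustrated with a short calculation. This sketch assumes that the heuristic (2.20) sizes the decoupling capacitance at nine times the switched capacitance P/(f V_dd^2), which limits the charge-sharing droop to 10%; that form is an assumption, chosen because it reproduces the On_Chip C_Decap row of Table 3 to within about 1 nF. The function names are hypothetical, and the area figure uses the 34.5 fF/µm^2 MIM density quoted above.

```python
# Table 3 inputs per technology node (um): (P in W, f in Hz, V_dd in V).
TABLE3 = {
    0.18: (90.0, 1.0e9, 1.8),
    0.13: (130.0, 1.7e9, 1.5),
    0.10: (160.0, 3.0e9, 1.2),
    0.07: (170.0, 5.0e9, 0.9),
}
MIM_DENSITY = 34.5e-15   # F per um^2 for T_ox,eq = 1 nm (value from the text)

def decap_farads(power, freq, v_dd, ripple=0.10):
    """Decoupling capacitance for a given allowed fractional voltage droop.

    Assumed form: C_decap = (1/ripple - 1) * P / (f * V_dd^2),
    i.e. 9 * P / (f * V_dd^2) for a 10% droop.
    """
    c_switched = power / (freq * v_dd ** 2)
    return (1.0 / ripple - 1.0) * c_switched

def decap_area_mm2(c_decap, density=MIM_DENSITY):
    """Plate area in mm^2 needed to realize c_decap at the given density."""
    return c_decap / density / 1.0e6     # um^2 -> mm^2

for node, (p, f, v) in sorted(TABLE3.items(), reverse=True):
    c = decap_farads(p, f, v)
    print("%.2f um: C_decap = %.0f nF, MIM area = %.2f mm^2"
          % (node, c * 1e9, decap_area_mm2(c)))
```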
Using this capacitance density and (2.20), one can calculate the amount and the area of the total decoupling capacitance needed in each technology [36]. These values are indicated in Table 3 for different technologies.

6.3 Methodology for Power Network Planning

A key concern for the P/G network design is the large amount of current that flows through the interconnect lines, especially on the global layers, which gives rise to EM-induced failures. EM is the transport of mass in the metal under an applied current density and is widely regarded as a major wear-out or failure mechanism of VLSI interconnects [13]. When current flows through the interconnect metal, an electron wind is set up opposite to the direction of current flow. These electrons, upon colliding with the metal ions, impart sufficient momentum to displace the metal ions from their lattice sites, creating vacancies. These vacancies condense to form voids that result in an increase of interconnect resistance or even an open circuit. The EM lifetime reliability of metal interconnects is modeled by the well-known Black's equation, given by:

$$ TTF = A \cdot j^{-n} \cdot \exp\!\left( \frac{Q}{k_B T_m} \right) \tag{2.21} $$

where TTF is the time-to-fail (typically for 0.1% cumulative failure), A is a constant that depends on the geometry and microstructure of the interconnect line, and j is the average current density. The exponent n is typically 2 under nominal conditions, Q is the activation energy for grain-boundary diffusion (~0.5 eV for 0.1 µm Cu), k_B is Boltzmann's constant, and T_m is the metal temperature. The typical goal is to achieve a 10-year lifetime at 100 °C, for which (2.21) and accelerated testing data produce a design rule value for the acceptable current density, j_0, at the reference temperature, T_ref. However, this design rule value does not comprehend self-heating [10]. Based on the technology roadmap values provided by ITRS [37], the maximum allowable current density, j_0, at a specific temperature, T_ref, is given for different technologies. On the other hand, it is well known that interconnects at different metal layers experience different temperatures [35]. For higher-layer interconnects, the distance between the metal lines and the substrate increases, which in turn increases the effective thermal resistance of the underlying dielectric, causing an increase in the metal temperature. As a result, the global interconnects get hotter than the local interconnect lines. Based on the values of j_0 and T_ref given in ITRS for different technologies, the acceptable current density j_m for which the EM lifetime rule remains satisfied at a new temperature T_m is easily calculated by using the following relationship, which can be deduced from (2.21):

$$ j_m = j_0 \, \exp\!\left( \frac{Q}{n \, k_B} \left( \frac{1}{T_m} - \frac{1}{T_{ref}} \right) \right) \tag{2.22} $$

The ITRS guidelines give the current density limits for satisfying EM only at a specific temperature for each technology. By using (2.22), the maximum allowable current densities at other temperatures can be easily calculated.

6.3.1 Power Network Electromigration Rule Satisfaction

Using the above discussion, the first step in planning the power network on the global and semi-global levels is to satisfy the EM rules. Table 3 shows the different parameters for future CMOS technologies based on ITRS guidelines [37]. Using (2.22) and the maximum temperatures at the global and semi-global tiers in different technologies, one can calculate the maximum allowable current density j_m for the global and semi-global tiers at each technology node.
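The temperature derating of the EM current density limit follows from requiring equal TTF in Black's equation (2.21) at (j_0, T_ref) and (j_m, T_m), which gives j_m/j_0 = exp((Q/(n k_B)) (1/T_m - 1/T_ref)). A minimal sketch of that calculation is given below. It uses Q = 0.5 eV and n = 2 from the text, and assumes T_ref = 105 °C (the reference temperature quoted in the caption of Table 4); with the global-tier T_max values of Table 4 it gives ratios close to the j_m/j_0 row of that table. The function name is hypothetical.

```python
import math

K_B_EV = 8.617e-5    # Boltzmann constant in eV/K

def current_density_ratio(t_m_c, t_ref_c=105.0, q_ev=0.5, n=2.0):
    """j_m / j_0 that keeps the EM lifetime unchanged at metal temperature t_m_c.

    Temperatures are given in degrees Celsius and converted to kelvin.
    """
    t_m = t_m_c + 273.15
    t_ref = t_ref_c + 273.15
    return math.exp(q_ev / (n * K_B_EV) * (1.0 / t_m - 1.0 / t_ref))

# Global-tier maximum temperatures from Table 4 (node in um: T_max in C).
for node, t_max in ((0.18, 120.0), (0.13, 130.0), (0.10, 162.0), (0.07, 170.0)):
    print("%.2f um: j_m/j_0 = %.2f at %.0f C"
          % (node, current_density_ratio(t_max), t_max))
```

Because the ratio enters the EM-limited track count inversely, a hotter tier needs proportionally more metal to carry the same current.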
With the knowledge of the total power consumption and the power supply voltage, one can calculate the maximum current drawn from the power supply. Dividing this value by the number of power pads, which is usually half of the number of P/G pads given in the ITRS guidelines (and is usually 2/3 of the total number of I/O pads in today's chip manufacturing technologies), one can calculate approximately the average current drawn from each power pad. Note that a ball grid type of I/O packaging has been assumed here. The maximum current drawn from each power pad is a limiting factor on the EM rule for the power grid interconnects in the area surrounding that pad. In general, the minimum number of minimum-width gridlines required in a global power network in order to satisfy the EM rules can be calculated approximately as follows:

$$ \#\mathrm{Tracks} = \frac{1}{w} \left( \frac{1}{ar} \times \frac{P / V_{dd}}{N_{pad} \, j_m} \right)^{0.5} \tag{2.23} $$

where w is the minimum width of each power track at the corresponding metal layer (global or semi-global; it is usually half of the defined pitch), ar is the aspect ratio, P is the total power consumption, V_dd is the supply voltage, N_pad is the number of power pads, and j_m is the maximum allowable current density that satisfies the EM rule at the corresponding layer (global or semi-global). Using (2.23) and Table 3, the minimum number of minimum-width gridlines needed in different technologies to satisfy the EM rules in the global and semi-global tiers is calculated and shown in Table 4. It can be seen from Table 4 that, going from the global tiers to the semi-global ones, the power grid becomes gradually denser, which is expected because of the decrease in pitch (which makes semi-global lines more resistive than the global ones).
Also, note that with technology scaling the power grids become much denser at both the global and semi-global tiers.

Node (µm)                      0.18     0.13     0.1      0.07
j_0 (A/cm^2)                   5.8E5    9.6E5    1.4E6    2.1E6
Chip size (mm^2)               450      450      622      713
V_dd (V)                       1.8      1.5      1.2      0.9
Frequency (MHz)                1000     1700     3000     5000
P (W)                          90       130      160      170
On_Chip C_Decap (nF)           250      305      333      377
T_max (°C)                     120      140      150      175
# of P/G pads                  1536     2018     2018     2560
Global pitch (nm)              1050     765      560      390
Semi-global pitch (nm)         640      465      340      240
Global layer line ar           2.2      2.5      2.7      2.8
Semi-global line ar            2.0      2.2      2.4      2.5
R-local (kΩ/m)                 76.23    125.96   219.56   435.5

Table 3. Technology parameters used in this work based on ITRS data for Cu. T_max is the estimated maximum temperature in the topmost metal layer as per [35].

Node (µm)                      0.18     0.13     0.1      0.07
T_max global (°C)              120      130      162      170
T_max semi-global (°C)         117      126      150      160
j_m/j_0                        0.74     0.62     0.36     0.33
# global tracks @ 105 °C       526      796      1451     2346
# global tracks @ T_max        705      1281     3964     7230
# semi-global tracks @ 105 °C  1559     2448     4429     6940
# semi-global tracks @ T_max   2050     3600     10016    18385

Table 4. Minimum number of (minimum-width) power tracks that must be routed on the power grid at the global and semi-global tiers in order to satisfy the EM rules, calculated based on (2.23) for T = 105 °C and T_max. The j_m value in the table is the maximum allowable current density at T_max for the global interconnect layer.

6.4 Effects of Technology Scaling on the IR-drop Effect

6.4.1 Effects of Thin-film, Barrier Thickness and Interconnect Temperature

In ULSI interconnects, metal resistivity begins to increase as the minimum dimension of the metal line becomes comparable to the mean free path of the electrons.
This is because surface scattering starts to make a considerable contribution to the resistivity compared to the contribution of bulk scattering. The resistivity ρ of a thin-film metal can be expressed in terms of the bulk resistivity ρ_0 as [6]:

$$ \frac{\rho_0}{\rho} = 1 - \frac{3}{2k} (1 - p) \int_1^{\infty} \left( \frac{1}{x^3} - \frac{1}{x^5} \right) \frac{1 - e^{-kx}}{1 - p \, e^{-kx}} \, dx \tag{2.24} $$

where k = d/λ_mfp, d is the smallest dimension of the film (the width in our case), λ_mfp is the bulk mean free path of the electrons and p is the fraction of electrons that are elastically reflected at the surface. For copper, p = 0.47 and λ_mfp = 421 Å at 0 °C [24]. Moreover, since the temperature alters the mean free path of the electrons, the temperature coefficient α of the thin film is also different from its bulk temperature coefficient α_0. Another effect that is responsible for increased resistivity is the presence of the barrier material for Cu interconnects. Since the resistivity of the barrier material is extremely high compared to that of Cu, it can be assumed that Cu carries all the current. Therefore, the effective area through which current conduction takes place is reduced, or equivalently the effective resistivity of a metal line of the same drawn dimensions increases. This becomes more of a problem as metal lines scale, since it is very difficult to scale the thickness of the barrier material. This work considers the impact of both of the above-mentioned effects and estimates the effective resistivity increase. The effective resistivity and temperature coefficient ratios for the global, semi-global and local tier metals for various technology nodes are given in Table 5. It is also well known that interconnect resistance changes linearly with its temperature.
This relationship can be written as R = r_0 (1 + β·ΔT), where r_0 is the unit length resistance at the reference temperature and β is the temperature coefficient of resistance (1/°C). However, by including the effects of scattering and thin-film, this equation can be rewritten as follows:

$$ R = r_0 \, \frac{\rho}{\rho_0} \left( 1 + \beta \, \frac{\alpha}{\alpha_0} \, \Delta T \right) \tag{2.25} $$

Node (µm)              0.18     0.13     0.1      0.07
(Global) ρ/ρ_0         1.066    1.090    1.125    1.186
(Semi-global) ρ/ρ_0    1.113    1.358    1.222    1.334
(Local) ρ/ρ_0          1.158    1.222    1.315    1.485
(Global) α/α_0         0.953    0.935    0.912    0.875
(Semi-global) α/α_0    0.923    0.895    0.858    0.803
(Local) α/α_0          0.902    0.867    0.82     0.752

Table 5. Effective resistivity (barrier plus thin-film) and temperature coefficient ratios for the global, semi-global and local tiers for various technology nodes.

As will be seen later, in order to reduce the maximum voltage drop, the global and semi-global tier metals usually have widths that are many times larger than the minimum-width lines. As a result, the barrier-thickness effect needs to be considered only for the local lines. On the other hand, the global and semi-global tiers are the hottest lines inside the chip and the effect of their self-heating is considerably large. As a result, for global and semi-global lines the effect of line temperature should be considered.

6.4.2 IR-drop in Global/Semi-global Power Network

Based on the minimum required number of power gridlines (for a specific wire width) as calculated from (2.23) for both the global and semi-global grids at each technology node, the system of linear equations (2.17) can be built for the combined global and semi-global power grids and solved to find the voltage at each node. Nodes at the semi-global power grid distribute the power to the local power trunks through a via or a stack of vias and/or Metal2.
Hence, by finding the worst-case IR-drop over the nodes at the semi-global level and accounting for the drop over the vias, one can find the voltage at the power pin of the drivers in the local blocks. In this way one can quantify how severely the global and semi-global power grids affect the IR-drop over the power grid in the worst case. Two cases are considered as detailed next.

Case I) No decoupling capacitors: Using the number of tracks provided by Table 4, the resulting voltage drop values would be drastic. Ideally, the maximum voltage drop should be less than 10% in order to ensure correct functioning of the circuit. Using the minimum-sized tracks would result in huge voltage drops. As a result, an optimization procedure should be used that attempts to minimize the voltage drop such that a fixed percentage of the routing area is allocated to the power network, while satisfying the EM rules. A maximum allocation of 5 to 10% of the routing area to the power network is a usual policy in current technologies. Figure 6-4 shows the worst-case voltage drop in different technologies for 5% and 10% allocation of the routing area to the power network, respectively, while accounting for the effect of interconnect temperature (without considering the on-chip decoupling capacitances).

Case II) Uniformly-distributed decoupling capacitors: From Figure 6-4, it can be seen that even with wire sizing up to the allowed budget of the routing area, the voltage drop will be more than the maximum allowable margin of 10%. As discussed earlier, insertion of on-chip decoupling capacitors near the switching devices on the substrate decreases the peak magnitude of the voltage drop.
Based on the total amount of decoupling capacitance that was calculated by (2.20) and reported in Table 3, one can assume that the decoupling capacitors are uniformly distributed over the substrate surface.

[Figure 6-4 plot: curves for T_max and T = 27 °C at 10% and 5% routing area; x-axis: Tech. Node (micron)]

Figure 6-4. Worst-case IR-drop (ΔV_IR/V_dd) increase as a function of the technology node for combined global and semi-global power grids considering the effects of self-heating, while allocating 5% and 10% of the routing area to the power network, respectively.

Figure 6-5 shows the worst-case IR-drop in global/semi-global grids while using a projected amount of on-chip decoupling capacitance and 10% of the routing area for the power network. From Figure 6-5, it can be seen that by using the suggested on-chip decoupling capacitors, the worst-case IR-drop reduces to an acceptable margin for current technologies. However, as the technology node scales towards the sub-100 nm regime, the maximum voltage drop violates the 10% voltage swing rule. In the above experiments the area of the total on-chip decoupling capacitors is about 5% of the total substrate area.

[Figure 6-5 plot: curves for T_max and T = 27 °C; x-axis: Tech. Node (micron)]

Figure 6-5. Worst-case IR-drop (ΔV_IR/V_dd) increase as a function of technology node for combined global and semi-global power grids considering the effects of self-heating, while allocating 10% of the routing area to the power network and assuming uniformly distributed on-chip decoupling capacitances.

Observations made in the two previous cases show that for future technologies, assigning 10% of the routing area to the power network and 5% of the
substrate area to the decoupling capacitors will not be sufficient to limit the maximum voltage drop to less than the desired value of 10%. As a result, new resource allocation limits should be determined for new technologies. Figure 6-6 shows the minimum required percentage of the allocated resources to ensure a worst-case 10% voltage drop for future technologies.

[Figure 6-6 plot: curves for routing area and decoupling capacitor area; x-axis: Tech. Node (micron)]

Figure 6-6. Minimum required percentage of the allocated resources (global layer routing area and substrate area) to ensure a worst-case 10% voltage drop for future technologies, considering the maximum temperature on global/semi-global interconnects.

6.4.3 IR-drop in Local Power Network

The worst-case voltage drops from the previous section can now be imposed on the two sides of each power trunk in order to examine the IR-drop phenomenon in the local power trunks. At this point, and for the sake of simplicity, assume that the voltages at the two ends of the segment are equal. Figure 6-7 shows the worst-case IR-drop increase as a function of the technology node in the local power trunks. Note that the actual voltage applied at the two ends of the power trunk is V_dd − V_dd', where V_dd' is the IR-drop found at the previous stage, which can be extracted from Figure 6-5 for different technologies. To extract the total worst-case IR-drop, one must combine the results of the two previous steps. Using Figure 6-5 and Figure 6-7, Figure 6-8 summarizes the total IR-drop increase as a function of technology node in the presence of barrier/thin-film and temperature effects. Also note that the worst-case IR-drops for different technologies are based on the assumption of power pads uniformly distributed all over the chip area, which is the emerging trend in the industry.
With a periphery-only power pad distribution scheme, the worst-case IR-drop will be much more severe than the results shown in Figure 6-5 and Figure 6-8.

[Figure 6-7 plot: curves for N=100+T+S, N=100+T, N=100, N=50+T+S, N=50+T and N=50; x-axis: Tech. Node (micron)]

Figure 6-7. Worst-case IR-drop (ΔV_IR/(V_dd − V_dd')) increase as a function of technology node in the presence of interconnect temperature (T) and surface scattering/barrier effects (S), in the local power trunk lines. Notice that in this graph (V_dd − V_dd') is the actual voltage at the two sides of the local power trunks, and N is the number of standard cells connected to the power trunk.

[Figure 6-8 plot: x-axis: Tech. Node (micron)]

Figure 6-8. Total worst-case IR-drop (ΔV_IR/V_dd) increase (as a result of Figure 6-5 and Figure 6-7) as a function of technology node in the presence of interconnect temperature (T) and surface scattering/barrier effects (S), while allocating 10% of the routing area to the power network and 5% of the substrate to decoupling capacitors.

6.4.4 Effect of Hot Spots on the Worst-case IR-drop

As shown earlier, insertion of decoupling capacitances is an effective way to control the maximum IR-drop in the power network. In reality, the magnitudes of the current sources connected to the power grid are not uniformly distributed. Due to different switching activities and/or sleep modes of various functional blocks, the distribution of current sources over the power network is generally non-uniform. As a result, to obtain an optimal decoupling
capacitor insertion scheme, the decoupling capacitances should be distributed non-uniformly according to the activity profiles of the different blocks over the substrate (contrary to Case II, which was discussed earlier). The existence of such non-uniformly distributed switching activities on the substrate results in substrate thermal gradients and in extreme cases leads to the creation of hot spots. Some heuristics have been proposed to distribute decoupling capacitances to hot spot neighborhoods in order to minimize the worst-case IR-drop on the power networks [23]. However, the existence of such hot spots along the substrate surface introduces non-uniform temperature profiles along the lengths of the long global interconnects. More specifically, the power distribution network spans the whole substrate area and is exposed to the thermal non-uniformities of the substrate surface. It has been shown that any kind of thermal non-uniformity on the substrate surface affects long global interconnects. With the power consumption profile of the blocks over the substrate, one can easily determine the substrate thermal profile. To derive the thermal profile of a long global interconnect passing over the substrate, one can use the following:

$$ \frac{d^2 T_{line}(x)}{dx^2} = \lambda^2 \, T_{line}(x) - \lambda^2 \, T_{sub}(x) - \theta \tag{2.26} $$

where T_line(x) and T_sub(x) are the interconnect thermal profile and the substrate thermal profile along the length of the interconnect, respectively, and λ and θ are two constants that can be derived from the physical dimensions of the interconnect line and the insulator and the thermo-electrical properties of the interconnect metal.

[Figure 6-9 plot: curves for the 0.18 and 0.13 µm nodes; x-axis: Thermal Gradient (°C)]

Figure 6-9.
Worst-case IR-drop (ΔV_IR/V_dd) increase (based on Figure 6-5) for different technology nodes in the presence of hot spots, as a function of the thermal gradient magnitude (°C).

It is also well known that the resistivity of a metal has a linear relationship to the thermal profile of the line. As a result, due to the non-uniformity of the substrate temperature, the resistance of the power network is distributed non-uniformly. Specifically, the resistance of the segments on top of the hot spots is higher than that of the rest of the power network segments. Therefore, when the actual temperature-dependent resistivity of the global interconnect is considered, the IR-drop at the nodes in the proximity of the hot spots is expected to worsen. The thermal effects of hot spots on the interconnect resistance are examined next. To model a hot spot, a Gaussian thermal profile with a constant peak and a constant standard deviation, T(x) = T_max · exp(−(x − µ)^2 / (2σ^2)), is assumed. Figure 6-9 shows the variation of the worst-case IR-drop as a function of the magnitude of the thermal gradient of a hot spot over the substrate surface. The hot spot was located through a thermal mapping step in a previous stage. In this experimental setup the resistances of both the global interconnects and the vias are functions of the underlying substrate temperature. Figure 6-9 shows that by neglecting the thermal effects of hot spots on the resistivity of the global layers, one cannot predict the worst-case voltage drop of hot devices correctly, and consequently the amount of inserted decoupling capacitance proposed by current heuristics is not sufficient.
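The effect of a Gaussian hot spot on the resistance profile of a power line can be sketched as follows. This is a simplified illustration, not the experimental setup of this section: it assumes that the line temperature follows the substrate profile directly (instead of being obtained from the interconnect heat equation), adds a uniform base temperature under the Gaussian rise, and uses the linear relation R = r_0 (1 + β ΔT) per segment. The function names and all numerical values, including β, are hypothetical.

```python
import math

def hotspot_profile(x, t_base, t_rise, mu, sigma):
    """Temperature at position x: uniform base plus a Gaussian rise at mu."""
    return t_base + t_rise * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

def segment_resistances(length, n_seg, r0, beta, t_ref, profile):
    """Resistance of each of n_seg equal segments of a line.

    r0 is the resistance per unit length at t_ref, beta the temperature
    coefficient (1/C) and profile(x) the temperature at position x,
    sampled at each segment midpoint.
    """
    dx = length / n_seg
    values = []
    for k in range(n_seg):
        x_mid = (k + 0.5) * dx
        values.append(r0 * dx * (1.0 + beta * (profile(x_mid) - t_ref)))
    return values

# Illustrative numbers: 1000 um line, 0.1 ohm/um, 40 C hot spot at the center.
uniform = segment_resistances(1000.0, 100, 0.1, 0.004, 27.0, lambda x: 27.0)
hot = segment_resistances(
    1000.0, 100, 0.1, 0.004, 27.0,
    lambda x: hotspot_profile(x, t_base=27.0, t_rise=40.0, mu=500.0, sigma=100.0))

print("total resistance, uniform substrate: %.2f ohm" % sum(uniform))
print("total resistance, with hot spot:     %.2f ohm" % sum(hot))
print("hottest segment is %.1f%% more resistive than at 27 C"
      % (100.0 * (max(hot) / uniform[0] - 1.0)))
```

Feeding such per-segment resistances into the grid conductance matrix is what makes the IR-drop near a hot spot larger than a uniform-temperature analysis predicts.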
6.5 Effects of the IR-drop on the Cell Performance and Clock Skew

The performance of each cell connected to the local trunk segment is strongly dependent on the fluctuations of the power supply voltage (V_dd). A simple short-channel model for transistors in the saturation region can be used to derive the sensitivity of the gate delay to changes in V_dd. The I_ds can be expressed as follows:

[Equation (2.27): expression for I_ds; missing from this transcript.]

where C_ox is the oxide capacitance, V_gs is the gate-to-source voltage, v_sat is the carrier saturation velocity, V_ds is the drain-to-source voltage and V_t is the threshold voltage. The gate delay sensitivity to the power supply voltage fluctuations can be written as follows:

$$ S^{D}_{V_{dd}} = - \frac{V_{dd} V_t - V_t^2 + E_c L \, V_{dd} + E_c L \, V_t}{(V_{dd} - V_t + E_c L)(V_{dd} - V_t)} \tag{2.28} $$

where E_c is the critical electric field, L is the channel length (E_c L = 1.4 V) and V_t is assumed to be V_dd/5. As shown in Figure 6-10, with technology scaling the dependency of the gate delay on the power supply voltage fluctuations becomes more severe. Notice that Figure 6-10 depicts the sensitivity of the gate delay to power supply voltage variations as a function of technology node. As an example, it can be seen from Figure 6-10 that for each 10% decrease in the power supply voltage in the 0.18 µm technology, an 8.5% increase in the gate delay should be expected.

[Figure 6-10 plot: x-axis: Tech. Node (micron)]

Figure 6-10. Sensitivity of the cell delay (S^D_Vdd) to the fluctuations of the supply voltage V_dd for different technology nodes. Y-axis values show the percentage increase in gate delay for each percent decrease in V_dd at the specific technology.

Figure 6-11 shows the maximum percentage of delay difference among the devices connected to a semi-global grid for different technologies. This delay difference will appear as skew among the devices in a clock circuitry.
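The delay sensitivity expression (2.28) can be evaluated with a few lines of code. The sketch below computes the closed form -(V_dd V_t - V_t^2 + E_c L V_dd + E_c L V_t) / ((V_dd - V_t + E_c L)(V_dd - V_t)) with E_c L = 1.4 V and V_t = V_dd/5, and cross-checks it numerically. The cross-check assumes a delay proportional to V_dd/I_ds with the common velocity-saturated form I_ds ∝ (V_dd - V_t)^2 / (V_dd - V_t + E_c L) and V_t held fixed; that current model is an assumption made here for illustration, chosen because differentiating it reproduces the closed form. Function names are hypothetical.

```python
def delay_sensitivity(v_dd, v_t, ec_l=1.4):
    """Closed-form relative delay change per relative V_dd change."""
    num = v_dd * v_t - v_t ** 2 + ec_l * v_dd + ec_l * v_t
    den = (v_dd - v_t + ec_l) * (v_dd - v_t)
    return -num / den

def relative_delay(v_dd, v_t, ec_l=1.4):
    # Assumed model: delay ~ V_dd / I_ds,
    # with I_ds ~ (V_dd - V_t)^2 / (V_dd - V_t + Ec*L).
    return v_dd * (v_dd - v_t + ec_l) / (v_dd - v_t) ** 2

def numeric_sensitivity(v_dd, v_t, ec_l=1.4, rel_step=1e-6):
    """Central-difference estimate of d(ln delay)/d(ln V_dd) at fixed V_t."""
    dv = v_dd * rel_step
    d_plus = relative_delay(v_dd + dv, v_t, ec_l)
    d_minus = relative_delay(v_dd - dv, v_t, ec_l)
    return (d_plus - d_minus) / (2.0 * dv) * v_dd / relative_delay(v_dd, v_t, ec_l)

# Supply voltages of the four technology nodes in Table 3.
for node, v_dd in ((0.18, 1.8), (0.13, 1.5), (0.10, 1.2), (0.07, 0.9)):
    s = delay_sensitivity(v_dd, v_dd / 5.0)
    print("%.2f um: sensitivity = %.3f" % (node, s))
```

For the 0.18 µm node this gives a sensitivity of about -0.87, in line with the roughly 8.5% delay increase per 10% supply decrease quoted above.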
It can be seen from Figure 6-11 that technology scaling can introduce a considerable amount of skew into the clock tree by affecting the performance of the clock buffers through a non-uniform voltage drop over the power network.

[Figure 6-11 plot: curves for T_max and T = 27 °C; x-axis: Tech. Node (micron)]

Figure 6-11. Maximum percentage of the delay difference among drivers connected to a local power trunk for different technologies at room temperature and at maximum interconnect temperature.

6.6 Summary

This section highlighted the growing importance of the IR-drop effects with technology scaling. The effects of temperature, electromigration reliability and interconnect technology scaling, including the resistivity increase of Cu interconnects due to electron surface scattering and finite barrier thickness, were taken into consideration for the IR-drop analysis. Severe performance degradation and/or functional alterations due to power network IR-drop suggest that the IR-drop issue will become an increasingly important factor in determining P/G network interconnect design policies and signal integrity guidelines. It was shown that new resource allocation guidelines for P/G metal area and on-chip decoupling capacitors should be provided for future technologies in order to limit the maximum voltage swing in the P/G distribution networks. It was also shown that, when the non-uniform temperature effects of the substrate hot spots on the resistivity of global interconnects are considered, the decoupling capacitances allocated to the hot spot region should be modified accordingly.
Chapter 7 Conclusions and Future Work

7.1 Thesis Summary

In this dissertation it was shown that chip temperature is rising rapidly due to the continued down-scaling of the technology feature size, even though the device power density remains relatively constant. This increase is due to the notable self-heating of the interconnects. The power generated in the interconnects is not as large as the device power generation, but because metal lines are much farther from the heat sink than the substrate, they can contribute considerably to the overall chip temperature. Due to the non-uniform power generation map along the substrate surface, the substrate generally has a non-uniform thermal profile. It was shown that non-uniform temperature distributions along long global wires in high-performance ICs could have significant implications for interconnect performance and other critical design metrics such as the clock skew. A detailed analysis of the impact of non-uniform temperature distributions on interconnect performance was presented using a new distributed RC delay model that incorporates non-uniform interconnect temperature dependency. The model was applied to analyze a wide variety of interconnect layouts and temperature profiles. Analytical models for accurate interconnect temperature distributions arising from non-uniform substrate temperature profiles were derived using fundamental heat diffusion equations. It was shown that the clock skew would be significantly affected by interconnect temperature non-uniformities. It was also shown that optimal buffer insertion is strongly dependent on the non-uniform substrate thermal profile. These studies suggest that incorporating thermal analysis is necessary in performing various design optimization steps in high-performance ICs.
7.2 Future Work
Based on the current study and the experimental results, future research should focus on two directions:
• Studying the effects of the non-uniform substrate thermal profile on other EDA optimization steps.
• Proposing effective and feasible methodologies to reduce or eliminate the substrate or interconnect temperature non-uniformities.
7.2.1 Studying the Effects of Thermal Non-uniformities on EDA Flow
As was seen earlier, some of the traditional and well-accepted optimization schemes in the EDA flow can be strongly affected by non-uniform substrate temperature. It is expected that most of the back-end flow routines will move away from their optimal points once the effects of substrate temperature are incorporated into their final results. Some of the most important steps that should be studied further are as follows:
• Considering the effect of thermal non-uniformities on clock mesh networks: The clock nets that were discussed in Chapter 4 had topologies in the form of a tree (H-Tree). Another methodology routes the clock with a mesh topology. In current high-performance chip clock distribution methodologies, designers use hybrid techniques in which a mesh is used for the global clock distribution and H-Trees are used for local sinks (or vice versa). Considering the effect of the non-uniform substrate thermal profile on the clock mesh distribution, new design guidelines must be provided to reduce the effect of these gradients. One way is to selectively size the clock network segments. Another way is to non-uniformly change the distance between two neighboring grid lines.
• Considering the effect of thermal non-uniformities on interconnect layer assignment: As was shown in Chapter 2, the temperature of an interconnect line depends on the insulator thickness. This means that the layer assignment of the interconnect is an important factor in determining the peak temperature of the line. Current layer assignment schemes in EDA tools do not consider this effect. Because the signal distribution network is usually in the form of a tree, the current density distribution is also a factor in calculating the temperature of the line. Overall, a modified layer assignment scheme that considers the assigned layer of each segment and incorporates the proper current distribution for it can improve the overall performance of the signal distribution network in comparison to the standard routines.
• Considering the effect of thermal non-uniformities on gate sizing and wire sizing: As was shown in Chapter 5, buffer insertion and sizing is strongly dependent on the interconnect thermal profile. In the same way, it is expected that gate sizing and wire sizing schemes are affected by the non-uniform thermal profile of the underlying substrate. The wire-sizing scheme can be combined with optimal layer assignment to optimize the propagation delay of a signal. On the other hand, determining the optimal size of the gates (for the sake of performance optimization) also depends on the substrate temperature in the neighborhood of those gates. By considering the output driving resistance of each gate (which is strongly dependent on the gate temperature), one can implement new algorithms to optimize the delay of the circuit through gate sizing.
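To make the layer-assignment point concrete, the sketch below estimates the steady-state self-heating rise of a line above the substrate temperature with a quasi-one-dimensional model, rise = j_rms^2 * rho * t_metal * t_ins / k_ins, and ranks candidate layers by that rise. This simplified expression neglects lateral heat spreading and heat flow through vias, and the function names, layer stack, and material values are hypothetical; they only stand in for the more detailed treatment referred to in Chapter 2.

```python
def self_heating_rise(j_rms, rho, t_metal, t_ins, k_ins):
    """Temperature rise (K) of a line above the substrate.

    Quasi-1-D assumption: all Joule heat flows straight down through an
    insulator of thickness t_ins and thermal conductivity k_ins.
    j_rms   : RMS current density in the line (A/m^2)
    rho     : metal resistivity (ohm*m)
    t_metal : metal thickness (m)
    t_ins   : total insulator thickness between line and substrate (m)
    k_ins   : insulator thermal conductivity (W/(m*K))
    """
    return (j_rms ** 2) * rho * t_metal * t_ins / k_ins


def rank_layers_by_rise(layers, j_rms, rho, k_ins):
    """Sort candidate layers from coolest to hottest.

    layers : dict mapping layer name -> (t_metal, t_ins)
    Returns a list of (name, rise) tuples in ascending order of rise.
    """
    rises = {
        name: self_heating_rise(j_rms, rho, t_metal, t_ins, k_ins)
        for name, (t_metal, t_ins) in layers.items()
    }
    return sorted(rises.items(), key=lambda item: item[1])


if __name__ == "__main__":
    # Hypothetical stack: upper layers sit on thicker insulator.
    stack = {"M2": (0.5e-6, 1.5e-6), "M5": (0.8e-6, 5.0e-6), "M8": (1.2e-6, 9.0e-6)}
    for name, rise in rank_layers_by_rise(stack, j_rms=1e10, rho=2.2e-8, k_ins=1.4):
        print(f"{name}: {rise:.2f} K")
```

A thermally aware layer assignment routine could evaluate such an estimate per tree segment, using each segment's own current density, before trading it off against the usual delay and congestion objectives.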
7.2.2 Reducing the Magnitude of the Non-uniform Substrate Thermal Gradients
Another way to address thermally generated problems in ULSI circuits is to reduce the non-uniformity of the thermal profiles over the substrate and, subsequently, in the interconnect lines. Some of the proposed methods are as follows:
• Cell placement with the objective of uniform substrate temperature: It was shown that the non-uniform power generation map of the devices over the substrate is the main source of the substrate thermal non-uniformities. One way to reduce this problem is to selectively place the gates over the substrate such that the highly active gates remain far from each other, if possible. In this way the possibility of creating hot spots can be eliminated. It can be shown that by defining new thermally dependent objectives, new placement algorithms can be implemented such that the thermal non-uniformities over the substrate reach a minimum level. One disadvantage of this kind of placement is that other placement objectives, such as total wire length or critical path delay, will no longer be optimal. Therefore, the newly developed placement routines must consider all of these objectives simultaneously, so that after placement the circuit has an acceptable delay and total wire length while the substrate thermal gradients are minimal.
• Using dummy vias to reduce the thermal non-uniformity of the interconnect line: As was mentioned in Chapter 2, the thermal path between the interconnect and the substrate is one of the major factors in determining the interconnect peak temperature. If the effective thermal path thickness between the interconnect line and the substrate can be reduced by any means, a reduction in the temperature of the interconnect line can be expected.
For example, if some dummy vias can be inserted below the interconnect such that they do not interfere with the electrical signal distribution, then, because of the high thermal conductivity of the vias, the temperature of the interconnect at the locations of the inserted vias drops from its peak value. Moreover, if the distance between every two consecutive vias remains less than the thermal diffusion length of the interconnect line, then the peak temperature of the interconnect line drops rapidly. Using these facts, by inserting a dense array of dummy vias below an interconnect line with a non-uniform thermal profile, one can reduce the magnitude of the thermal gradients through adaptive placement of the vias. This method looks very promising, and the insertion of dummy vias is feasible in current technologies. The only disadvantage is that the routing density under the interconnect lines with dummy vias may increase, so routing may become more difficult than before.
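The role of the thermal diffusion length in the dummy-via idea can be sketched numerically. The Python code below assumes the commonly used one-dimensional result for a uniformly heated line segment whose two ends are held at the underlying temperature by ideal vias: the rise at distance x from the midpoint is rise_inf * (1 - cosh(x / L_H) / cosh(L / (2 L_H))), with L_H = sqrt(k_metal * t_metal * t_ins / k_ins). Treating the vias as perfect heat sinks, the function names, and the example values are all assumptions of this sketch, not results of this work.

```python
import math


def healing_length(k_metal, t_metal, k_ins, t_ins):
    """Thermal diffusion (healing) length of a line over an insulator.

    k_metal, k_ins : thermal conductivities of metal and insulator
    t_metal, t_ins : metal thickness and underlying insulator thickness
    Heat spreading in the insulator is neglected (assumption).
    """
    return math.sqrt(k_metal * t_metal * t_ins / k_ins)


def peak_rise_fraction(via_spacing, l_heal):
    """Midpoint temperature rise between two ideal vias, as a fraction of
    the rise the same line would reach with no vias at all."""
    return 1.0 - 1.0 / math.cosh(via_spacing / (2.0 * l_heal))


if __name__ == "__main__":
    # Hypothetical Cu line on oxide; values are illustrative only.
    l_h = healing_length(k_metal=400.0, t_metal=1e-6, k_ins=1.4, t_ins=5e-6)
    for multiple in (0.5, 1.0, 2.0, 5.0, 10.0):
        frac = peak_rise_fraction(multiple * l_h, l_h)
        print(f"via spacing = {multiple:4.1f} x L_H -> peak rise fraction {frac:.3f}")
```

In this idealized model the peak rise between vias stays a small fraction of the via-free value as long as the spacing is at or below the diffusion length, which is consistent with the qualitative argument above; an adaptive scheme would tighten the spacing only over the hotter parts of the line.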
Asset Metadata
Creator
Ajami, Amir Hooshang (author)
Core Title
Effects of non-uniform substrate temperature in high-performance integrated circuits: Modeling, analysis, and implications for signal integrity and interconnect performance optimization
School
Graduate School
Degree
Doctor of Philosophy
Degree Program
Electrical Engineering
Publisher
University of Southern California (original), University of Southern California. Libraries (digital)
Tag
engineering, electronics and electrical, OAI-PMH Harvest
Language
English
Contributor
Digitized by ProQuest (provenance)
Advisor
Banerjee, Kaustav (committee chair), Pedram, Massoud (committee chair), Namgoong, Won (committee member), Rahimi, Mansour (committee member)
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-c16-626057
Unique identifier
UC11340134
Identifier
3116654.pdf (filename), usctheses-c16-626057 (legacy record id)
Legacy Identifier
3116654.pdf
Dmrecord
626057
Document Type
Dissertation
Rights
Ajami, Amir Hooshang
Type
texts
Source
University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions
The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the au...
Repository Name
University of Southern California Digital Library
Repository Location
USC Digital Library, University of Southern California, University Park Campus, Los Angeles, California 90089, USA