A thermal management design for system-on-chip circuits and advanced computer systems
A THERMAL MANAGEMENT DESIGN FOR SYSTEM-ON-CHIP CIRCUITS AND ADVANCED COMPUTER SYSTEMS

Copyright 2002 by Herming Chiueh

A Dissertation Presented to the FACULTY OF THE GRADUATE SCHOOL, UNIVERSITY OF SOUTHERN CALIFORNIA, In Partial Fulfillment of the Requirements for the Degree DOCTOR OF PHILOSOPHY (ELECTRICAL ENGINEERING)

August 2002

Herming Chiueh

Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.

UMI Number: 3094315

UMI Microform 3094315. Copyright 2003 by ProQuest Information and Learning Company. All rights reserved. This microform edition is protected against unauthorized copying under Title 17, United States Code. ProQuest Information and Learning Company, 300 North Zeeb Road, P.O. Box 1346, Ann Arbor, MI 48106-1346

UNIVERSITY OF SOUTHERN CALIFORNIA, THE GRADUATE SCHOOL, UNIVERSITY PARK, LOS ANGELES, CALIFORNIA 90089-1695

This dissertation, written by Herming Chiueh under the direction of the dissertation committee, and approved by all its members, has been presented to and accepted by the Director of Graduate and Professional Programs, in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY.

Date: August 6, 2002

DEDICATION

To my parents, for their devotion to my education.

ACKNOWLEDGMENTS

I am indebted to the supervision of my advisor, Professor John Choma, Jr., and co-advisor, Professor Jeffrey Draper, both of whom have given me the training that has broadened my perspective and technical skills. I am grateful to them not only for their technical guidance but also for their personal education and encouragement throughout my Ph.D. study at USC.
I would also like to thank the members of my oral defense committee, Professors Satwindar Sadhal, Jeffrey Draper and John Choma, Jr., for their supervision of my dissertation. I am thankful to Mr. Jay Block for his valuable discussions in the early stage of my research. The project that he initiated, the Integrated Thermal Management System, is the foundation upon which I built this work. I would also like to thank Mr. Chih-Hsiu Lin, Dr. Po-Chung Chen and Professor Satwindar Sadhal for their valuable discussions on the analytical thermal model and its mathematical derivation.

The USC Information Sciences Institute has been an incredible environment to work in. The main reason for this was the people who have been in this group over time: Professor Jeffrey Draper, Mr. Sumit Mediratta, Mr. Jeff Sondeen, Dr. Louis Luh, Mr. Chang-Woo Kang, Mr. Ihn Kim, Mr. Joog-Seok Moon and Mr. Taek-Jun Kwon. Their knowledge of VLSI design and CAD tools has made everything possible in our research projects. I would like to thank Ms. Vay Birdow for her excellent administrative support.

This research is supported by DARPA contracts DABT63-95-0136 and F30602-98-2-0180.

Last, but not least, there is a special group of people I would like to thank. We were separated physically, but they have been with me throughout these years. My father Shan-Pen, mother Ming-Hwa, sister Yawen, brother-in-law Jeen-Feng and my best friend Hsiao-Feng consistently provided infinite love, support, amusement and encouragement and served as a sanity check from time to time. Without them I would never have accomplished what I have so far. Without them by my side I do not plan to accomplish much in the future.
CONTENTS

DEDICATION
ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT
1 Background of Research Problem
  1.1 System-on-Chip (SOC) Design: A Roadmap to Thermal Catastrophe
  1.2 Thermal Behavior Review of Integrated Circuits
    1.2.1 Anatomy of Thermal Behavior in Silicon
    1.2.2 Previous Research in On-chip Temperature Measurement
    1.2.3 Previous Research in On-chip Thermal Analysis
    1.2.4 Useful Thermal Factors for IC Designers
  1.3 Impact of Thermal Effects on Circuit Behavior
    1.3.1 Circuit Behavior Impacts from Temperature Gradient
    1.3.2 Temporal High Temperature Impacts Reliability and Lifetime
    1.3.3 Electro-Thermal Analysis
    1.3.4 Thermal Catastrophe: The Recall of Intel Pentium III 1.13 GHz Processor
  1.4 Thermal Management System for Integrated Circuits Design
  1.5 Cooling Mechanisms for Thermal Management Systems
  1.6 Challenges to Implementing a Complete Thermal Management System for System-on-Chip Design
2 Research Objective
3 Design of Thermal Management System
  3.1 Architecture Design of Thermal Management System
    3.1.1 Basic Building Blocks of Thermal Management System for Computer Systems
    3.1.2 Detailed Design of Thermal Management Circuit
    3.1.3 Operating Modes
  3.2 Thermal Management Algorithms
  3.3 Analysis and Evaluation of Temperature Gradients
    3.3.1 Goal, Assumptions and Scope of Model
    3.3.2 Derivation of Analytical Model
    3.3.3 Calibration and Validation
4 Implementations and Results
  4.1 Implementation: Embedded System Version
    4.1.1 TMIC for PowerPC Based Multi-Computer Systems
    4.1.2 Circuit Implementation and Experimental Result
  4.2 Implementation: SOC Design
    4.2.1 Architecture: SOC Implementation
    4.2.2 Programmable Watchdog Unit with Multi-Channel Input
    4.2.3 Architecture Enhancements to Embedded System Version
    4.2.4 CAD Flow and Implementation
  4.3 Component: Multi-Level Controller
    4.3.1 Review of Integrated Circuit Based Fan Controller
    4.3.2 Basic Building Blocks of Fully Integrated Fan Controller
    4.3.3 Experimental Result
  4.4 Thermal Evaluation Chip and Model Calibration and Validation
    4.4.1 Design of Thermal Evaluation Chip
    4.4.2 Experimental Results from TEC Measurement
    4.4.3 Calibration and Validation of Analytical Model
  4.5 Thermal Management Algorithms
    4.5.1 Implementation of ACPI Standards
    4.5.2 Implementation of Flexible Standards
  4.6 System Integration
    4.6.1 VLSI Implementation of SOC Design
    4.6.2 Simulation Results
  4.7 Summary
5 Research Contribution and Impact
  5.1 Comprehensive Analysis and Characterization
  5.2 Novel Architecture
  5.3 Low System Overhead for Target Systems
  5.4 Complete Solution of Thermal Management
  5.5 Circuit Impacts
6 Conclusion
Reference List
Alphabetized Bibliography
Appendix A: Detailed Mathematical Derivation of Analytical Thermal Model
Appendix B: Detailed Simulation Result

LIST OF TABLES

1.1 Thermal conductivity of materials used in integrated circuits
1.2 Useful thermal characteristics from previous research
1.3 Summary of device parameters for CMOS active components
1.4 Approximate performance summary of CMOS passive components
1.5 Temperature dependence of some major intrinsic integrated circuit failure mechanisms
1.6 Thermal specifications for some Intel Pentium processors
4.1 Register assignment in programmable unit
4.2 Detailed assignment of configuration and report registers
4.3 Specification of designed circuitry layout
4.4 Comparison between different fan controller designs
4.5 Module #1 temperature with different heat sources
4.6 Thermal and physical parameters of TEC
4.7 Temperature from models and measurement
4.8 Circuit summaries

LIST OF FIGURES

1.1 Roadmap to system-on-chip design
1.2 Overview of heat dissipation in integrated circuits
1.3 Ids change vs dT at room temperature
1.4 Typical diagram for electro-thermal simulation
3.1 Basic building blocks of thermal management system for advanced computer systems
3.2 Block diagram of the dynamic thermal management circuit
3.3 Flow chart of thermal management system
3.4 Assumptions of heat source and thermal boundary conditions
4.1 Block diagram of Temperature Monitor Interface Circuit
4.2 Microphotograph of thermal management chip
4.3 ITEM system node board
4.4 Detailed block diagram for SOC implementation
4.5 Offset temperature monitor
4.6 Threshold temperature monitor
4.7 Block diagram and system integration of proposed fan controller unit
4.8 Circuitry board for design verification
4.9 Layout of the fan controller circuitry
4.10 Layout of circuit module
4.11 Microphotograph picture of the test chip
4.12 Spatial temperature offset from measurement
4.13 Package temperature
4.14 Temperature distribution on die surface
4.15 Temperature profile on a portion of die surface
4.16 Temperature plot of module #4 over time
4.17 Typical flow chart of ACPI standard
4.18 ACPI implementation by designed thermal management system
4.19 Schematic of final system integration
4.20 Simulation results of the multi-level controller
4.21 Layout of final thermal management circuitry
4.22 Layout of target SOC system
4.23 Demonstration of thermal management system's function
B.1 Detailed simulation results #1
B.2 Detailed simulation results #2
B.3 Detailed simulation results #3
ABSTRACT

Increases in circuit density and clock speed in modern VLSI designs have brought thermal issues into the spotlight of high-speed integrated circuit design. Local overheating in one spot of a high-density circuit, such as a CPU or high-speed mixed-signal circuit, can cause a whole system to crash. Clock synchronization problems, parameter mismatching and other coefficient changes due to temperature gradients generated by uneven heat-up of on-chip circuitry are the major reasons for system failure. The early stage of this project has successfully characterized the local heat-up problem through an analytical model and a test chip. The impact of temperature gradients on circuit behavior is also evaluated. Since the outcome of this analysis has proven the importance of this matter, a systematic solution to thermal management has been proposed. Instead of the worst-case thermal management used in conventional systems, this design targets nominal power dissipation and requires the system to actively manage its thermal activity, including monitoring thermal activity and reacting to specified conditions through the control of cooling mechanisms, such as an integrated multi-stage fan controller, to ensure operation within specification.

Continuing work includes further improvement in the circuit and architecture as well as analyzing and optimizing the performance of this design. The limitations and advantages of using this thermal management design are also assessed to establish guidelines for designing and analyzing modern system-on-chip designs, which are often limited by thermal factors because of inherent power density and sizes. The success of this project offers an opportunity for modern system-on-chip designs to incorporate thermal management techniques to enhance system stability and performance.
This design yields intricate control and optimal management with little system overhead and minimum hardware requirements, and it provides the flexibility to support different management algorithms.

Chapter 1 Background of Research Problem

1.1 System-on-Chip (SOC) Design: A Roadmap to Thermal Catastrophe

The evolution of modern semiconductor technology has carried the design of advanced computer systems and portable communication systems into the system-on-chip era. Recent process innovations, including embedded DRAM [1-3], embedded SRAM [4], embedded Flash [5], and embedded processor cores [1-3], have made it possible to implement a complete computer system in a single chip. A parallel line of process development, which introduced new materials such as copper interconnect [6] and low-K dielectric material [7] along with other novel materials and processes, has created a better environment for mixed-signal design by supplying low-noise, low-resistance interconnect and high-Q passive components [8]. With deep sub-micron processes such as 0.13 µm and the mixed-signal technologies offered by many different foundries, higher circuit density and different technologies are merged in a single chip.

The bright future of system-on-chip designs has been predicted and exploited in many different research projects [3,9,10]. Embedded system design [11], personal communication systems [8], and high-speed processor design [1] have taken advantage of this evolution. Figure 1.1 shows a roadmap from these novel technologies to a system-on-chip design. System-on-chip design has become the trend in advanced computer systems and mixed-signal designs.
Figure 1.1 Roadmap to system-on-chip design (diagram labels: High Speed Integrated Circuits, Copper Interconnect, High Q Passive Component, Embedded SRAM, Mixed-signal Process, Embedded Processor, Low K Material, Embedded Flash, Embedded DRAM, System-On-Chip, Portable Communication System, High-Speed Mixed-Signal Design, Advanced Computer System)

However, the introduction of new technologies also introduces new problems. System-on-chip design yields higher circuit density and larger die size. These two factors raise both the power density and the total power per chip. For example, a modern processor design like Intel's Pentium III contains 28 million transistors and has a total power of 18-36 watts in a single chip with a die size of 104.6 mm². This means a power density of 0.17-0.34 watts per mm² and a current density of 0.1-0.2 A per mm². In this range of total power and power density, thermal issues like temperature gradient and heat dissipation become a very serious problem. Besides the power issue, new technologies also create limitations in thermal properties; e.g., low-K dielectric introduces lower thermal conductivity in the metal layers. More metal layers with low-K material exacerbate the problem.

All of these factors indicate that the future of system-on-chip design is headed for a thermal catastrophe. One example is the recall of the Intel Pentium III 1.13 GHz processor [12-14]. According to references [12-14], this processor failed at certain temperatures that are within the specification range. As this example highlights, thermal issues limit the trend of SOC and modern VLSI designs. To solve this problem, a complete analysis and review of thermal behavior in VLSI design is necessary.
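The power-density and current-density figures quoted above for the Pentium III example can be reproduced with a few lines of arithmetic. The sketch below is only an illustration: the total power and die area come from the text, while the supply voltage of 1.7 V is an assumption introduced here (the text does not state it) so that current can be derived from power.

```python
# Figures quoted in the text for the Pentium III example.
TOTAL_POWER_W = (18.0, 36.0)   # low and high total power, watts
DIE_AREA_MM2 = 104.6           # die size, mm^2

# ASSUMPTION: core supply voltage in volts. Not stated in the text;
# chosen only to illustrate how a current density follows from power.
ASSUMED_VDD_V = 1.7


def power_density(power_w, area_mm2):
    """Power per unit die area, W/mm^2."""
    return power_w / area_mm2


def current_density(power_w, area_mm2, vdd_v):
    """Supply current per unit die area, A/mm^2, using I = P / V."""
    return power_w / (vdd_v * area_mm2)


for p in TOTAL_POWER_W:
    pd = power_density(p, DIE_AREA_MM2)
    cd = current_density(p, DIE_AREA_MM2, ASSUMED_VDD_V)
    print(f"P = {p:4.1f} W: {pd:.2f} W/mm^2, {cd:.2f} A/mm^2")
```

Under the assumed supply voltage, the two power levels land on the 0.17-0.34 W/mm² and roughly 0.1-0.2 A/mm² ranges given in the text.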
Beyond that, thermal effects that influence circuit behavior should be analyzed. Section 1.2 provides a complete thermal analysis for VLSI design. Section 1.3 covers the thermal impact on circuit behavior and includes a technical analysis of the Intel Pentium III 1.13 GHz processor recall. After this analysis, current solutions, including thermal management and cooling methods, are reviewed in Section 1.4 and Section 1.5. Concluding these reviews, Section 1.6 discusses a direction and breakthrough toward a better solution for thermal issues in SOC design.

1.2 Thermal Behavior Review of Integrated Circuits

1.2.1 Anatomy of Thermal Behavior in Silicon

Table 1.1 lists the thermal conductivity of materials used in semiconductor fabrication.

Group | Material | Thermal Conductivity* (W/mK) | Reference source
Metals | Aluminum | 235-237 | [15]
Metals | Copper | 386-390 | [15]
Metals | Silver | 427-430 | [15]
Semiconductors | Diamond | 1000-2000; 100 (film) | [16]
Semiconductors | Silicon | 100-150 | [17]
Semiconductors | Germanium | 64-77 | [17]
Semiconductors | GaAs | 37-45 | [17]
Insulator | SiO2 | 0.5-1.2 | [17]
Dielectric Films | Low K material | 0.2-0.7 | [16]

Table 1.1 Thermal conductivity of materials used in integrated circuits

Figure 1.2 presents an overview of thermal conditions for general integrated circuits. Viewing Figure 1.2 in conjunction with Table 1.1 gives a basic picture of thermal conduction in integrated circuit design. There are two heat sources in this figure, Q-transistors and Q-metal, which represent the heat generated by transistor junctions and metal interconnects. The heat generated in the transistor junction is caused by the voltage drop of the moving electrons in the MOS transistor source-drain channel. In interconnect, the heat is caused by the voltage drop of electrons in the metal.
Q-transistors is much higher than Q-metal, because the power consumed by the current and voltage drop in the junction is much higher than the power consumed in interconnect, where the voltage drop is comparatively small. Thus, Q-transistors dominates the thermal analysis from the heat sink to the silicon chip. However, as the thermal conductivity values in Table 1.1 show, silicon dioxide (SiO2), the insulator material used in metal layers, has a very small thermal conductivity (0.5-1.2 W/mK) compared with metal and the silicon substrate. The self-heating of interconnects is therefore an important issue for accurate analysis, especially when low-K material is used in copper and mixed-signal processes [18,19].

Figure 1.2 Overview of heat dissipation in integrated circuits (diagram labels: N-level interconnects, Si chip; thermal capacitances C interconnect, C transistor, C chip, C heat sink; temperatures T interconnect, T transistors, T chip; heat sources Q metal, Q transistors; thermal resistances R dielectric, R spreading, R chip, R convection)

After locating the heat sources, the general package shown at the bottom of Figure 1.2 helps in understanding the heat dissipation path. A roughly equivalent RC thermal network is drawn on the left side of this figure. From top to bottom, the different layers have been modeled as corresponding thermal resistances and thermal capacitors. Using the thermal conductivity values provided in Table 1.1, the heat flows can be easily identified in this figure. Since the metal layers are mostly surrounded by silicon dioxide, and there is no contact on the upper side besides the pad contact or wire bonding, the heat flow through the upper side can generally be neglected. The dominant heat flow is the one from the sources to the lower side, from the circuit side to the package and then to the heat sink.
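The dominant downward heat path described above can be sketched as a series thermal network. The example below is a minimal illustration, not the model developed in this dissertation: the resistance names follow the labels in Figure 1.2, but every numerical value is hypothetical, and the transient function collapses the whole network into a single lumped RC pole.

```python
import math

# HYPOTHETICAL thermal resistances in K/W along the downward path.
# Names follow the labels in Figure 1.2; values are for illustration only.
R_PATH_K_PER_W = [("R_spreading", 0.3), ("R_chip", 0.2), ("R_convection", 1.5)]


def steady_state_temperatures(q_w, t_ambient_c, path=R_PATH_K_PER_W):
    """Return node temperatures (deg C) from the transistor layer to ambient.

    Each resistance drops q_w * R kelvin, like a voltage drop across a
    resistor carrying a constant current.
    """
    temps = {}
    t = t_ambient_c + q_w * sum(r for _, r in path)
    temps["T_transistors"] = t
    for name, r in path:
        t -= q_w * r
        temps["after_" + name] = t
    return temps


def first_order_step(q_w, t_ambient_c, r_total, c_total, time_s):
    """Lumped single-pole response to a power step applied at time zero."""
    tau = r_total * c_total  # thermal time constant, seconds
    return t_ambient_c + q_w * r_total * (1.0 - math.exp(-time_s / tau))


if __name__ == "__main__":
    # 20 W dissipated by the transistors, 25 deg C ambient (both hypothetical).
    for node, temp in steady_state_temperatures(20.0, 25.0).items():
        print(f"{node}: {temp:.1f} C")
```

Because the upward path through the interconnect dielectric is neglected here, the sketch mirrors the simplification made in the text rather than a full network with R dielectric and the interconnect capacitance.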
This characteristic arises because the thermal conductivities of the silicon, the package, and the heat sink are all higher than that of the interconnect layers. Of course, different packages will introduce different heat flows and different conditions, but the scenario provided in this section applies in most cases.

1.2.2 Previous Research in On-chip Temperature Measurements

For modern VLSI designs, the previous section provides a basic overview of heat transfer inside circuit layers, but detailed information about the temperature as a function of space and time is necessary to avoid thermal catastrophe. In recent years, much research has been conducted in both measurement and analysis. Limitations exist in both.

In measurement, many research projects have used infrared imaging to capture the thermal distribution of working chips [20,21]. Special-purpose IR devices with Si lenses have been built [22] to provide higher precision and thereby obtain detailed information. An IR imaging scan is useful for acquiring the steady-state temperature on the chip surface with limited spatial precision (10 µm [22]). Since the IR imaging sensor or camera operates at a speed far below the clock rate of modern VLSI designs, it is almost impossible to obtain temperature readings in the time domain. The laser flash method [23] is another novel technique; it measures the temperature of metal layers from the temperature-dependent change in reflection. This method is very useful in interconnect reliability analysis. In previous research, Ju & Goodson [24] have shown the results of a failed-interconnect analysis with its spatial and temporal temperature offset along a segment of interconnect.
Although the above methods are useful in measuring the on-chip temperature and its offset to a certain degree, natural limitations of temperature measurement in integrated circuits apply to them. Both measure only the surface temperature of the chip, and according to the basic overview in the previous section, the surface temperature is very likely to differ from the junction temperature of the transistors. This is especially true in modern processes, which typically have more than five interconnect layers. The combination of insulators and metals in the interconnect layers makes it difficult to predict the junction temperature from the sensors. Although conductivity modeling for interconnect layers [25] has been done, it is useful only for predicting the overall thermal conductivity of the interconnect layers and not for mapping the spatial temperature reading to the spatial junction temperature of transistors. However, junction temperature is a very important parameter for circuit simulation. Much recent research combines thermal analysis and circuit simulation, feeding each junction temperature into the circuit to perform an electro-thermal analysis [20,21,26-37]. The details of electro-thermal analysis are presented in Section 1.3.

In consideration of this limitation, on-chip temperature sensors are more suitable for measuring junction temperature. Much research on on-chip temperature sensors has been performed. Szekely et al. [38] provide an excellent review of on-chip temperature sensors for CMOS processes. Modern designs have made it possible to build temperature sensors with small size, high accuracy and low power dissipation [38,39]. In many sensor designs, a proportional-to-absolute-temperature (PTAT) circuit [39] or a single diode [40,41] is used as the sensing block of the on-chip temperature sensor.
Some high-speed, high-density circuits also include diodes [40,41] or ring oscillators [42] with external connections to allow the temperature to be measured from outside the circuit. Several commercial parts also provide interface circuits that measure the temperature and provide digital outputs [43,44]. These are all useful tools for temperature measurement on a chip.

1.2.3 Previous Research in On-chip Thermal Analysis

Refer to Figure 1.2 again. The thermal analysis of integrated circuits can be divided into two categories: package and substrate. For either category, analytical approaches and numerical approaches are well investigated. In this section, research in each category is discussed for both analytical and numerical approaches.

(1) Thermal Analysis for Packages:

The thermal role of IC packages is to deliver the heat generated by the circuit to the package/air interface or the package/heat-sink interface. The temperature offset between the package and the transistor junctions determines how much heat can be extracted from the chip. That is, to deliver the same amount of heat, the temperature offset will be the same, regardless of the exact package temperature. Therefore, the junction temperature will be proportionally higher or lower. So, in order to understand the junction temperature, it is important to understand how much heat can be delivered by the package. These are the factors important to thermal analysis of integrated circuits.

Considering thermal behavior, two issues are very important for IC packages: material and package style. The package material determines the thermal conductivity of the package.
When two packages have the same shape and style, this factor determines how efficiently heat can be delivered by a package, and thus the temperature offset needed to deliver the same amount of heat. This factor is straightforward since it is proportional to the material's thermal conductivity. The other issue is the shape or style of the IC package. Different package designs influence thermal behavior significantly. Design or analysis of a good package is never an easy job [45]. Much research has focused on the detailed thermal analysis of packages [46,47]. However, a lot of recent research has been directed to compact models [48-50] of packages [47,51]. These research efforts focus on providing a simple model that applies first-order thermal networks to predict heat flow and dissipation efficiency. These compact models are intentionally simple, but since measurement feedback is always used to extract the thermal characterization, the models are usable by IC designers to choose a package that fits their thermal requirements.

From the point of view of system-on-chip and IC designers, current analysis and models for IC packages are not thorough enough. However, for application engineers, since off-chip measurement is possible and easy compared with measurement on the chip, the combination of measurement and compact models is deemed to provide sufficient thermal characteristics to design a system.

(2) Thermal Analysis for Substrate:

Thermal analysis for substrates focuses on the prediction of junction temperatures, since junction temperature is the most important thermal characteristic that influences circuit performance. A detailed discussion is available in the next section.
As mentioned before, these analyses can be categorized into analytical approaches and numerical approaches. Analytical approaches are usually based on Green's Function solutions of the heat transfer equation [52,53]. A Green's Function solution calculates the transient temperature response at a monitor site due to a heat source, as a function of their relative locations. For a design with m heat sources and n monitor sites, m × n equations are needed. Despite its accuracy, the quadratic increase in the number of equations limits the scalability of this approach. Furthermore, Green's Function solutions are difficult to find except under the most simplistic thermal boundary conditions. As a result, analytical approaches are most suitable for localized analysis with few circuit elements or when the boundary conditions can be ignored during the simulation period.

Numerical approaches usually employ one of the following methods: finite-difference (FDM) [26], finite-element (FEM) [28,29,54], boundary-element (BEM) [55], or the box integration method [56,57]. Typically the substrate is discretized and modeled as a 3-dimensional lumped circuit network, which is simulated either separately (relaxation-based) or integrated into the electrical netlist for fully coupled simulation. Both steady-state and dynamic simulations are possible. Numerical methods can be scaled to handle large circuits by using more grid lines in the network, which slows down the calculation, or by assigning a larger substrate volume to each network node, at the price of reduced accuracy. These different meshing approaches or thermal circuit models generate different results for the same integrated circuits, often differing by as much as two orders of magnitude in thermal transport properties. Furthermore, these methods tend to produce intricate mathematics, and the solutions are often elusive.
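To make the finite-difference idea concrete, the sketch below solves the steady-state heat equation in one dimension with uniform heat generation and fixed end temperatures. It is a deliberately reduced illustration, not the 3-dimensional lumped network used in the cited work: the slab length, heat generation rate and boundary temperatures are hypothetical, and only the conductivity is picked from the silicon range in Table 1.1.

```python
def solve_1d_steady_heat(n_nodes, length_m, k_w_per_mk, q_w_per_m3,
                         t_left, t_right):
    """Solve k * T'' + q = 0 on a uniform grid with fixed end temperatures.

    Central differences give, for each interior node i,
        -T[i-1] + 2*T[i] - T[i+1] = q * dx**2 / k
    which is a tridiagonal system solved here with the Thomas algorithm.
    """
    dx = length_m / (n_nodes - 1)
    n = n_nodes - 2                      # number of interior unknowns
    rhs = [q_w_per_m3 * dx * dx / k_w_per_mk] * n
    rhs[0] += t_left                     # known boundary values move to the RHS
    rhs[-1] += t_right

    # Forward sweep (sub- and super-diagonal are -1, main diagonal is 2).
    c_prime = [0.0] * n
    d_prime = [0.0] * n
    c_prime[0] = -1.0 / 2.0
    d_prime[0] = rhs[0] / 2.0
    for i in range(1, n):
        denom = 2.0 + c_prime[i - 1]
        c_prime[i] = -1.0 / denom
        d_prime[i] = (rhs[i] + d_prime[i - 1]) / denom

    # Back substitution.
    interior = [0.0] * n
    interior[-1] = d_prime[-1]
    for i in range(n - 2, -1, -1):
        interior[i] = d_prime[i] - c_prime[i] * interior[i + 1]
    return [t_left] + interior + [t_right]


if __name__ == "__main__":
    # HYPOTHETICAL case: 1 mm slab, k = 150 W/mK (silicon range in Table 1.1),
    # 1e9 W/m^3 uniform generation, both ends held at 25 deg C.
    profile = solve_1d_steady_heat(11, 1e-3, 150.0, 1e9, 25.0, 25.0)
    print(max(profile))
```

Refining the grid increases the number of unknowns, which is the accuracy-versus-cost trade-off noted above; in this one-dimensional case the exact solution is a parabola, so the central-difference result matches it at the grid nodes up to rounding.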
To conclude, both numerical and analytical approaches for on-chip temperature prediction are difficult because of the dimension of the problem, the complexity of the mathematics, or, most importantly, the lack of feedback from measurement, since the possible measurement locations are limited to the surface layer and the circuit layers, where an infrared scanner or a circuit-based temperature sensor can be applied or implemented. Though an intricate model simulation can produce an impressive thermal distribution graph, the integrated circuit designer should keep in mind that the graph cannot be fully verified by measurement.

1.2.4 Useful Thermal Factors for IC Designers

In the previous sections of this chapter, current research and the limitations of measurement and modeling have been reviewed. It is important to extract useful thermal information for integrated circuit designers. Table 1.2 lists some useful measurement and modeling results. From Table 1.2, two characteristics are important for circuit designers. The first is the temperature gradient (i.e., the temperature offset in steady state) on the chip. The other is the temporal temperature offset over a short period of time. The former produces a mismatch in electronic characteristics due to the temperature difference. The latter impacts the long-term reliability of integrated circuits. These are the thermal characteristics that circuit designers must consider and should try to solve or overcome. Section 1.3 discusses the impact on circuit behavior, and the proposed solution is presented in Section 1.6.
Thermal Characteristic | Range
Temperature gradient on a chip | 5-35 K [31,38,58]
Temporal temperature variation in interconnect (temperature offset along a short metal interconnect in a short period of time) | 50-100 K [24,58,59]
Short-term temperature impact for low-K materials and standard insulator (for ESD analysis; applied time 100 ns, pulse energy 10 pJ) | low-K temperature = 1000 K [25]; standard insulator temperature = 800 K [25]
Interconnect failure temperature | BOOK [24]

Table 1.2 Useful thermal characteristics from previous research

1.3 Impact of Thermal Effects on Circuit Behavior

The thermal characteristics that influence circuit behavior have been identified in the previous section. This section analyzes these impacts in detail. Section 1.3.1 discusses the impact of the temperature gradient on circuit characteristics. Section 1.3.2 discusses temporal overheating and circuit lifetime and reliability, and also addresses methods of prevention and their limitations. Section 1.3.3 reviews current electro-thermal simulation, which combines circuit and thermal simulation to achieve more accurate results, and discusses its weaknesses. Finally, Section 1.3.4 uses the Intel Pentium III processor recall as an example of the analysis presented in this section.

1.3.1 Circuit Behavior Impacts from Temperature Gradient

(1) Device Behavior

The thermal coefficients for active devices used in CMOS processes are summarized in Table 1.3 [60]. A summary for passive devices is presented in Table 1.4 [60]. These summaries show that the variation due to temperature gradient is high for some device parameters. These variations affect high-density circuits, power amplifiers, and mixed-signal designs. A review of some effects is discussed later.

Device Parameter | Expression | Comments
Mobility of electrons | $\mu = K_{\mu} T^{-1.5}$ | For low doping levels; applies to the MOS transistor channel and bulk.
Threshold voltage | $V_t(T) = V_t(T_0) - \alpha (T - T_0)$ | $\alpha \approx 2.3$ mV/°C; valid over the range of 200 to 400 K.
MOS transistor Ids | $I_{ds} = \frac{\mu C_{ox} W}{2L} \left[ 2 (V_{GS} - V_t) V_{ds} - V_{ds}^2 \right]$ | Ids depends on $\mu$ and $V_t$.
Diode Is (reverse biased) | $\frac{1}{I_s} \frac{dI_s}{dT} = \frac{3}{T} + \frac{1}{T} \frac{V_{G0}}{V_T}$ | When VD = 0.6 V, the reverse saturation current doubles for a 5°C increase in temperature.
Diode ID (forward biased) | $\frac{1}{I_D} \frac{dI_D}{dT} = \frac{3}{T} - \left[ \frac{V_D - V_{G0}}{T V_T} \right]$ | When VD = 0.6 V, the forward diode current doubles for a 10°C increase in temperature.
Diode VD (forward biased) | $\frac{dV_D}{dT} = -\left[ \frac{V_{G0} - V_D}{T} \right] - \frac{3 V_T}{T}$ | When VD = 0.6 V, the temperature dependence at room temperature is -2.3 mV/°C.

In the expressions above, $V_t$ denotes the threshold voltage and $V_T = kT/q$ the thermal voltage.

Table 1.3 Summary of device parameters for CMOS active components

Component Type | Range of Values | Relative Accuracy | Temperature Coefficient | Voltage Coefficient | Absolute Accuracy
Poly/poly capacitor | 0.3-0.4 fF/μm² | 0.06% | 25 ppm/°C | 50 ppm/V | 20%
MOS capacitor | 0.35-0.5 fF/μm² | 0.06% | 25 ppm/°C | 20 ppm/V | 10%
Diffused resistor | 10-100 Ω/sq. | 2% (5 μm width) | 1500 ppm/°C | 200 ppm/V | 35%
Poly resistor | 30-200 Ω/sq. | 2% (5 μm width) | 1500 ppm/°C | 100 ppm/V | 30%
Ion impl. resistor | 0.5-2k Ω/sq. | 1% (5 μm width) | 400 ppm/°C | 800 ppm/V | 5%
p-well resistor | 1-10k Ω/sq. | 2% | 8000 ppm/°C | 10k ppm/V | 40%
Pinch resistor | 5-20k Ω/sq. | 10% | 10k ppm/°C | 20k ppm/V | 50%

Table 1.4 Approximate performance summary of CMOS passive components

(2) Circuit Impacts

Most analog designers are aware of the thermal coefficients of different devices, since design parameters such as voltage and current always depend on the temperature coefficients of more than one active or passive device.
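As a worked illustration of how the coefficients in Table 1.4 interact with the on-chip gradients in Table 1.2, the following minimal sketch estimates the fractional drift of a passive component for a given temperature difference. It assumes a simple first-order (linear) temperature coefficient; the function name and the choice of a 35 K difference (the upper end of the gradient range in Table 1.2) are illustrative assumptions rather than part of the original analysis.

```python
# First-order estimate of passive-component drift across an on-chip
# temperature difference. Coefficients are taken from Table 1.4; the linear
# model and the helper name are assumptions made for illustration.

TEMPCO_PPM_PER_C = {
    "poly/poly capacitor": 25,
    "MOS capacitor": 25,
    "diffused resistor": 1500,
    "poly resistor": 1500,
    "ion-implanted resistor": 400,
    "p-well resistor": 8000,
    "pinch resistor": 10000,
}


def fractional_drift(tempco_ppm_per_c: float, delta_t: float) -> float:
    """Return the fractional change in value for a temperature difference delta_t."""
    return tempco_ppm_per_c * 1e-6 * delta_t


if __name__ == "__main__":
    delta_t = 35.0  # K, upper end of the gradient range in Table 1.2
    for name, tc in TEMPCO_PPM_PER_C.items():
        print(f"{name:24s} {100 * fractional_drift(tc, delta_t):6.2f} %")
```

Under these assumptions, two nominally matched poly resistors that sit 35 K apart differ by about 5.25%, which is larger than the 2% relative accuracy listed for that component in Table 1.4.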
Phenomena such as thermal runaway [61] are well understood in bipolar design, especially for power amplifiers. To prevent thermal runaway, system instability, and long-term overheating that reduces the lifetime of semiconductor circuits, a thermal shutdown mechanism [62] is frequently provided in high-power designs. When the circuit overheats, it electronically shuts down the power-demanding devices to reduce the temperature. At the same time, the circuit ceases to operate [62]. However, thermal shutdown is the worst-case thermal management scheme. For modern mixed-signal and SOC designs, it is not practical to cease operation, since a complex task might not be able to resume unless its current state is preserved. Thus, more sophisticated controls, as proposed in this research, are necessary.

A careful look at the above summary also reveals a challenge for mixed-signal designers. Since many designs are limited to digital fabrication processes for system integration or cost reasons, some of the passive devices provided by the digital process have comparatively high temperature coefficients. This characteristic influences the design methodology for conventional analog designs [61].

(3) Digital Circuits Impacts: CMOS Inverter Characteristics

For digital integrated circuit design, the most important characteristic to investigate is the transfer characteristic of an inverter. From previous research [63], as the temperature of a MOS device increases, the effective carrier mobility, μ, decreases. This results in a decrease in Ids, which is related to temperature T by
$I_{ds} \propto T^{-1.5}$

Therefore, Ids is the most important factor in digital circuits for determining the average power dissipation of the whole system. Figure 1.3 shows the relationship between Ids and temperature difference at an ambient temperature of 27°C. For a spatial temperature gradient of 50°C, for example, Ids decreases by 20.6%, which means a 20.6% change in the power dissipation and switching speed of digital circuits. Under the same condition, the input threshold of the inverter transfer function shifts by 0.4 V, and the threshold voltage shifts by 0.2 V in both NMOS and PMOS devices.

[Figure 1.3: Ids change vs. dT at room temperature. Horizontal axis: temperature gradient, 0 to 140.]

1.3.2 Temporal High Temperature Impacts Reliability and Lifetime

Traditionally, accelerated life tests on integrated circuits and other semiconductor components have been carried out using steady-state temperature as the failure-accelerating operating condition. While this is still appropriate for a few failure mechanisms, recent research has indicated that the acceleration factors and circuit lifetime are often governed by temporal temperature and spatial temperature gradient rather than by steady-state temperature [64]. Table 1.5 gives an overview of which type of temperature dependence dominates the development of integrated circuit failure mechanisms [65]. From this review, the reliability of integrated circuits depends in most cases on temporal high temperature and spatial temperature gradient. For steady-state temperature, most lifetime-related factors do not react until the temperature is higher than 150°C. In general, temporal overheating in devices and interconnect [24] is a key accelerating issue in reliability.
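The 20.6% decrease in Ids quoted in Section 1.3.1 for a 50°C temperature difference can be checked directly from the proportionality between Ids and $T^{-1.5}$. The sketch below is a hedged illustration: it assumes that only the mobility term varies (the threshold-voltage shift is ignored) and that the ambient temperature is 27°C; the function name is hypothetical.

```python
# Relative change of I_ds with temperature, assuming I_ds is proportional
# to T**-1.5 (mobility-limited) and ignoring the threshold-voltage shift.

def ids_relative_change(delta_t_c: float, ambient_c: float = 27.0) -> float:
    """Fractional change of I_ds when the device is delta_t_c hotter than ambient."""
    t0 = ambient_c + 273.15          # ambient temperature in kelvin
    t1 = t0 + delta_t_c              # heated device temperature in kelvin
    return (t1 / t0) ** -1.5 - 1.0


if __name__ == "__main__":
    for dt in (10, 25, 50, 100):
        print(f"dT = {dt:3d} C -> I_ds change = {100 * ids_relative_change(dt):6.1f} %")
```

For a 50°C difference this evaluates to a decrease of roughly 20.6%, in agreement with the figure given in the text.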
Failure mechanism | Dominant temperature accelerator | Comments
Hot carrier effects | 1/T | Dependence on T decreases above -55°C. Independent of temperature in the range 20 to 100°C.
TDDB, time-dependent oxide breakdown | T | Weak dependence on T. Voltage is the strong accelerator.
ESD | T | Resistance to ESD reduces with increase in T.
Electromigration | ∇T | Dependent on T only when T > 150°C. Current density is the major accelerator.
Spiking (contact electromigration) | T | Independent of temperature when T < 400°C.
Corrosion | dT/dt, T | Mildly dependent on T. Strongly dependent on humidity.
Die bond | ΔT | Independent of T.

T: steady-state temperature, ΔT: temperature cycling, ∇T: spatial temperature gradient, dT/dt: time-dependent temperature change

Table 1.5 Temperature dependence of some major intrinsic integrated circuit failure mechanisms

1.3.3 Electro Thermal Analysis

As previously discussed, temperature gradients play an important role in circuit behavior. For analog and digital circuits, assigning the junction temperature is important for obtaining accurate circuit simulation results. In traditional circuit simulation, however, this issue has been ignored: the temperature coefficient and sweep simulation are used only to simulate the whole circuit at the package temperature, a habit inherited from the hybrid circuit era. More careful circuit designers can assign a junction temperature offset in circuit simulators such as SPICE, but the problem of determining which temperature to assign remains. For hybrid circuits, designers can obtain the package temperature of each component, but, as discussed previously, measurements of on-chip components are not guaranteed. For this reason, many research efforts [26-31] have proposed electro-thermal simulations that combine circuit simulation and thermal simulation to predict the junction temperature.
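The coupling of circuit simulation and thermal simulation described above can be sketched as a fixed-point iteration. The following is a toy illustration only: both "simulators" are hypothetical one-line stand-ins (a linear power-versus-temperature model and a single thermal resistance), and all parameter values are assumptions chosen so that the loop converges. It does not represent any particular tool cited in the text.

```python
# Toy fixed-point loop mimicking a coupled electro-thermal procedure:
# circuit step -> power, thermal step -> junction temperature, repeat.
# Both models below are hypothetical stand-ins, not real simulators.

def circuit_power(t_junction: float, p_ref: float = 20.0,
                  t_ref: float = 27.0, alpha: float = 0.004) -> float:
    """Stand-in for a circuit simulation: power (W) at a junction temperature (C)."""
    return p_ref * (1.0 + alpha * (t_junction - t_ref))


def thermal_junction_temp(power: float, t_ambient: float = 27.0,
                          r_th: float = 2.0) -> float:
    """Stand-in for a thermal simulation: one thermal resistance (C/W) to ambient."""
    return t_ambient + r_th * power


def electro_thermal_solve(tol: float = 1e-6, max_iter: int = 100):
    """Alternate the two steps until temperature and power both stop changing."""
    t_j = 27.0                      # initial guess: junction at ambient
    p = circuit_power(t_j)          # initial circuit simulation
    for iteration in range(1, max_iter + 1):
        t_new = thermal_junction_temp(p)     # thermal analysis from current power
        p_new = circuit_power(t_new)         # re-simulate circuit at new temperature
        converged = abs(t_new - t_j) < tol and abs(p_new - p) < tol
        t_j, p = t_new, p_new
        if converged:
            return t_j, p, iteration
    raise RuntimeError("electro-thermal iteration did not converge")


if __name__ == "__main__":
    t_j, p, n = electro_thermal_solve()
    print(f"T_j = {t_j:.3f} C, P = {p:.3f} W after {n} iterations")
```

In this toy model the loop contracts only while the product of `r_th`, `p_ref`, and `alpha` stays below one; at or above that value the iteration fails to settle, which is one simple way such a coupled procedure can fail to converge.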
Figure 1.4 presents a typical flow for implementing an electro-thermal simulation [35-37]. The procedure always starts with a circuit simulation to obtain the power dissipation information. Then, using layout-to-thermal analysis tools or hand-calculated projections, a thermal analysis is performed to predict the junction temperature. The junction temperatures are then assigned back to the circuit simulation. These steps are repeated until both the junction temperature and the power dissipation results converge. The procedure is time consuming: it takes much longer than a pure circuit simulation or a pure thermal simulation, because many more simulations must be run in both directions before the results converge. Moreover, in some special cases, the procedure might never converge.

In addition, electro-thermal simulation is limited by the capabilities of both the electrical simulation and the thermal simulation [20,31-34,37]. On the electrical side, most simulators, such as HSPICE, are incapable of updating the junction temperature assignment during a run; this means only steady-state results can be obtained. On the thermal side, the heat source (power dissipation) is fixed, since the electrical simulation is not dynamic. So, achieving a transient electro-thermal simulation that matches reality is impossible with current tools.

Some research [29] has used a special circuit layer to represent the thermal phenomena, making it possible to simulate the thermal and electrical behavior together. However, until now, this method only reports the dynamic reading of the junction temperature and lacks the ability to change the junction temperature in the electrical simulation. A feedback technique needs to be implemented to convert the thermal simulation output to the electrical side.
Such an undertaking is not straightforward, and until now the accuracy of this method has been doubted, since the thermal models that can be used in this way are very limited. Furthermore, it is almost impossible for the feedback mechanism to reproduce the same thermal coefficient as the original device models, and there are only very limited automatic ways to add this component without changing circuit characteristics such as the impedance of circuit nodes.

Although electro-thermal simulation has many weaknesses and limitations, recent research [32-34] is enthusiastic about this direction, since it is currently the only way in which electrical simulation and thermal simulation can cooperate to achieve some result. However, without a structural enhancement and rewrite of the source code of SPICE or other circuit simulators, the accuracy and efficiency of electro-thermal simulation are limited.

[Figure 1.4: Typical diagram for electro-thermal simulation. Blocks: Circuit Layout, Package Information, Circuit Netlists, Thermal Networks, Circuit Simulation, Power Dissipation Information, Thermal Simulation, Junction Temperature Prediction, Converge?]

1.3.4 A Thermal Catastrophe Case Study: The Recall of Intel Pentium III 1.13 GHz Processor

To conclude Section 1.3, the recall of the Intel Pentium III 1.13 GHz processor is used as an example of thermal impacts on high-density circuit design. In August 2000, Intel Corporation recalled the Pentium III 1.13 GHz processor, announcing that the processor would malfunction at certain temperatures [12-14]. To understand the reasons for this malfunction or circuit failure, the thermal specification from Intel and the knowledge provided in the previous sections are used.
Table 1.6 gives the thermal specification of the Intel Pentium III from 866 MHz to 1.13 GHz [66]. Only the frequency range from 866 MHz to 1.13 GHz, which shares the same silicon technology and layout, is taken from the data book. These parts are essentially identical circuits: the processor supplier sorts identical chips into different frequency grades after testing and applies different voltages to them, which is a very common practice in the processor industry [67]. Since the circuit and the package are the same, the only factor that can be controlled is the core voltage. When different voltages are applied, the circuit can operate at different frequencies. However, this influences the power density and therefore the junction temperature and other temperature specifications.

The Max Tjunction in this table represents the maximum junction temperature allowed on the chip. The Tjunction Offset represents the temperature offset between the reading of the on-die temperature-sensing diode and the Max Tjunction. Therefore, the real junction temperature is (Measured Tdiode + Tjunction Offset + error) according to the data book [68]. The Max Tjunction and Max Tcover constrain the maximum package temperatures. These variables should be designed to let the junction temperature be sufficiently higher than the package temperature so that heat can be dissipated.

Processor Core Freq. (MHz) | Proc. Core Voltage (V) | L2 Cache Size (Kbytes) | Thermal Design Power (W) | Power Density (W/cm²) | Max Tjunction (°C) | Tjunction Offset (°C) | Max Tcover (°C)
866 | 1.65 | 256 | 22.9 | 35.9 | 80 | 3.3 | 75
933 | 1.65 | 256 | 25.5 | 39.9 | 75 | 3.7 | 75
1.0 GHz | 1.70 | 256 | 26.1 | 40.9 | 70 | 3.8 | 70
1.0B GHz | 1.70 | 256 | 26.1 | 40.9 | 70 | 3.8 | 70
1.13 GHz | 1.80 | 256 | 35.5 | 55.5 | [illegible] | 5.2 | 70

Values are listed for CPUID 0686h.

Table 1.6 Thermal specifications for some Intel Pentium III processors

Using this information, several inferences can be drawn about the thermal characteristics of this circuit and the reason why it malfunctions due to thermal issues.

• A big gap in Tjunction has been specified between the 866 MHz processor and the 1.13 GHz processor. This means that the higher-speed, higher-power-density processor will not work within the original specification. A lower junction temperature is required for system stability.

• The value of the Tjunction Offset, which can represent the temperature gradient between the hottest point and the sensing diode, was increased in the specification. Since this value is based on measurement, and the sensing diode is designed to be near the hottest point, this indicates that the higher-power-density processor has a large temperature gradient on the chip.

• The Max Tcover remains the same, but according to the manufacturer's thermal design guideline [68], 70°C is impossible to reach, since the Tcover should be about 30°C lower than the Tjunction for an 866 MHz processor. The reasonable value for the 1.13 GHz processor should therefore be much lower than 32°C, since the difference will not remain the same when both the power density and the temperature gradient increase.

From the above observations, the manufacturer had already adjusted the specification to achieve a much lower junction temperature and package temperature for the 1.13 GHz processor.
Although the specified package temperature is difficult to reach in a conventional computer chassis, according to reports [12-14,67] all the testing indicated that the system failed even when all the adjusted specifications were followed. It can be concluded that the only reason for the circuit failure is the large temperature gradient on the chip.

1.4 Thermal Management System for Integrated Circuits Design

Given the previous discussion of the thermal characteristics of IC design, their impact on circuit behavior, and the limitations of electro-thermal simulation, thermal characteristics should not only be analyzed and measured; temperature issues on an integrated circuit should also be managed. Previous practices in thermal management usually addressed only cooling mechanisms and analysis [17]. However, thermal management for integrated circuit design and advanced computer systems has more requirements. Beyond the passive heat dissipation mechanisms, such as heat sinks and fans, that are widely used in system design, active mechanisms to detect and properly handle an overheating event have become a necessity [11]. Such a capability guarantees that the system will operate within a certain temperature specification to avoid failure.

The ACPI (Advanced Configuration and Power Interface) standard is an example specification for active power and thermal management in personal computer systems [69-71]. Even the ACPI standard is quite limited, though, as it simply supports extra control to turn a cooling mechanism on or off and to shift between different active cooling levels to maintain the package temperature within specification. Several research efforts have proposed advanced implementations using both software and hardware design to implement ACPI and other thermal management algorithms [69, 72].
With these designs, not only can ACPI be implemented, but more intricate thermal management algorithms can also be utilized for modern computer systems.

However, as die size and power density increase in this system-on-chip era, the management of package temperature is no longer sufficient to solve the problem. The on-chip temperature gradient discussed in the previous section has become a major factor in system performance. Post-fabrication approaches to addressing on-chip temperature offsets are also needed as die size and power density increase. Without such an approach, some circuit behavior becomes unacceptable (as in Intel's case), which makes management and control of the on-chip temperature offset as important as the reduction of package temperature.

With the above functions, systems can deal with an out-of-spec situation either by slowing down or shutting off certain portions of the overheating circuitry to cool down the hot spots, or by invoking an active cooling mechanism to control the package temperature. Intricate control and optimal management of temperature distribution can be applied with this design. Since sophisticated control of temperature offset and package temperature not only prevents system failure but also enables the whole circuit to run at higher clock speeds, owing to improved matching of circuit behavior and clock synchronization, this design increases system performance and enhances system stability.

1.5 Cooling Mechanisms for Thermal Management Systems

A complete thermal management system cannot be implemented without a cooling mechanism. Traditional cooling mechanisms such as heat sinks and air flow are well investigated and modeled [73].
Even where a model is not sufficiently accurate, measurement can always provide sufficient information about their properties. Recent research [74] proposed depositing a diamond passivation layer and other materials on the integrated circuit to help heat dissipation. However, these methods are still in the research stage.

Several active cooling methods using micromachining have been proposed recently; micro-channel design [75] and electrokinetic micro-coolers [76] are examples. These approaches require extra micromachining process steps on the opposite side of the circuit. Though these are all research projects with preliminary results, they exhibit great potential to cooperate with thermal management systems to control the package temperature and eliminate the temperature gradient.

The other potential cooling method is embedded in the power management design of modern processors [77,78]. Many state-of-the-art processors provide mechanisms to modulate system speed and core voltage [77] as well as the speed of specific portions or pipeline stages [78]. This mechanism is designed for use in portable designs to extend battery life [78]. However, in cooperation with a thermal management system, the mechanism can be used as a dynamic cooling mechanism to turn off or slow down the overheating portion of a circuit. This will lead to higher overall power and performance with better thermal control, which is the opposite of its original goal of power optimization. However, with a good combination of power and thermal management under different conditions, this mechanism can be utilized for both high-performance and low-power optimization.

Besides these directions, the other important approach is to integrate more conventional or modern cooling circuit blocks into the SOC design.
This focus is important since many trade-offs should be considered, especially power consumption and any required high voltage, since these are common requirements of conventional cooling circuits. For example, a fan is commonly used as a conventional IC cooling mechanism. However, conventional methods do not control the fan; it stays on all the time to cool the system. Recent research [79] and industry standards [69-71] have proposed monitoring and controlling the fan speed inside modern computer systems for performance and power considerations. Most control mechanisms suggest that the fan speed can be altered to suit different system environments. Most approaches in the PC industry introduce SMB-compatible fan controllers using an analog fabrication process. Recent research [11,79] and industry design [69] have proposed a pure digital multi-stage fan controller, which is a suitable design to embed in SOC systems, although design issues such as circuit power dissipation, control signals, and circuit size are crucial factors for a successful fan controller.

1.6 Challenges to Implementing a Complete Thermal Management System for System-on-Chip Designs

Based on the previous discussion in this chapter, the following six aspects must be addressed to build a complete thermal management system for SOC designs:

• Architecture Design: The thermal management system must require minimal system resources. The flexibility to support different thermal management schemes and computer systems is also highly desirable.

• Circuit Implementation: The overhead of circuit implementation in terms of increased area and power dissipation should be minimal. Process compatibility and SOC integration are the goals of this circuit design.
Certain circuits to monitor the temperature gradient and control signals for different processors should be modularized to provide flexibility.

• Active Cooling: The target computer system should provide power management at different levels. A fully integratable cooling mechanism based on pure digital design is the best choice to control the package temperature.

• Thermal Management Algorithm: The final design should support algorithms and heuristics that fit the designed architectures, as well as the flexibility to enable implementations on different scales of computer systems.

• Thermal Evaluation and Modeling: Thermal models should be accurate enough to predict the temperature ranges to be used in thermal management system designs.

• System Integration: A system integration of the above components is necessary to verify the role and function of the different components.

This thesis details the following optimizations and trade-off evaluations necessary to design a thermal management system for SOC designs and advanced computer systems.

(1) Since an on-chip monitoring mechanism is included, complicated electro-thermal or numerical thermal simulation can be omitted. However, an analytical model that provides sufficient information, such as temperature range and qualitative guidelines for the circuit designer, is useful.

(2) Architecture and circuit implementation are constrained to be compatible with the system's process (most likely a digital process) and must not place too much extra load on the system. An interrupt-based system is implemented, and preprogramming is necessary to provide flexibility and simplify the architecture.

(3) With respect to system integration, an elaborate cooling system that requires extra process steps should not be chosen, although the proposed system has the potential to cooperate with such novel micro-machining cooling methods.
Instead, the pure digital design for a fan controller is attractive if the circuit block is small enough. Processors with power management can exploit this capability when combined with thermal management systems.

To prove the concepts described, many implementation steps are needed to produce a prototype system and provide the platform to verify the components both separately and fully integrated.

The rest of this dissertation is organized as follows. Chapter 2 states the research objective. Chapter 3 describes the design of the proposed thermal management system. Chapter 4 presents the implementation and experimental results of the architecture. Chapter 5 discusses the impact and breakthrough of this design, followed by conclusions in Chapter 6.

Chapter 2 Research Objective

The ultimate goal of this research is to build a complete thermal management system for system-on-chip designs. Based on the previous discussion in Chapter 1, the following problems need to be solved:

1. An integrated computer architecture design that provides the platform to implement the thermal management system. The system resources required should be minimal; the flexibility to support different thermal management schemes and computer systems should be achieved.

2. The circuit implementations for the different blocks of thermal management systems, providing minimal circuit complexity, process compatibility, and SOC integration. Certain circuits to monitor the temperature gradient and control signals for different processors should be modularized to provide software flexibility.

3. Thermal management algorithm/heuristic design for different scales of computer systems, to be demonstrated on the designed computer architecture for thermal management.

4. An integrated multi-stage fan controller with minimum power dissipation and sufficient stages for thermal management to keep the temperature within specification, compatible with the pure digital processes of SOC designs.

5. A compact analytical thermal model with measurement feedback to provide sufficient information about the temperature gradient and thermal conductivity for scalable chip sizes. The models should be accurate enough to predict the temperature ranges to be used in thermal management system designs.

6. A system integration of the above items and different computer systems to verify the modularity of each component, providing a general and sufficient thermal management system for SOC and high-speed circuit designs.

The result of this project offers a novel technique to implement a thermal management system for system-on-chip designs. Circuits such as high-speed processors, high-speed mixed-signal designs, and radio frequency power stages can take advantage of the proposed architecture to accurately control temperature and its distribution entirely on the chip. The analytical models and thermal evaluation chip implemented in this project also provide heat dissipation design guidelines for the critical devices mentioned. The results of using the proposed models and architecture are enhanced system stability and higher performance with the same technology, owing to optimal management of thermal behavior.

Chapter 3 Design of Thermal Management System

3.1 Architecture Design of Thermal Management System

In this section, the architecture design of the proposed thermal management system is presented. The basic building blocks, the detailed organization, and the operation modes are discussed in Section 3.1.1, Section 3.1.2, and Section 3.1.3, respectively.
This section provides a foundation for the later discussion of implementation and thermal management algorithms. All the technical information about the hardware and software implementations in this section is based on this proposed architecture.

3.1.1 Basic Building Blocks of Thermal Management System for Computer Systems

The basic building blocks of thermal management systems for advanced computer systems are shown in Figure 3.1. Thermal management for computer systems ranging from giga-FLOPS multi-node computers to SOC designs, as well as personal desktop and laptop computers, shares the same building blocks. However, the temperature range, power density, and CPU resources vary among different computer systems. For SOC design, the ranges of these parameters dominate the solutions, such as the choice of temperature sensors, management algorithm, and active cooling methods. Based on this understanding, the choice among different implementation methods and the justification for the prototype solution used to prove the concepts are provided in the following items.

[Figure 3.1: Basic building blocks of the thermal management system for advanced computer systems. Blocks: Active Cooling Unit, Temperature Sensors, CPU, Thermal Management Algorithms, Thermal Management Unit]

• CPU/System: For a pure SOC design, the thermal management circuitry is ideally mapped into the CPU special-purpose register and interrupt space. However, practical implementation issues force the prototype design to be based on a hybrid CPU or computer system instead of a processor designed with an integrated thermal management system.

• Temperature Sensors: On-chip temperature sensors are the best solution for SOC designs. However, due to the selections of computer systems for our
prototype implementation, both on-chip temperature sensors and hybrid silicon temperature sensor parts will be used. • Active Cooling Unit: At lease two different methods to cool down the system in SOC design are necessary. One is an on-chip fully integrated fan controller, which will be discussed later. Another is, the processor is ability to use the offset temperature data to tune the speed of different execution units to maintain the offset temperature within spec. Tradeoffs for slowing down some execution units are necessary in a critical temperature situation to prevent system failure. The mechanisms provided in the SOC implementation or processors should cooperate with this circuit to provide the function of managing the offset temperature. • Thermal Management Unit: In Section 3.1.2, a detailed architecture design of a thermal management unit will be discussed. • Thermal Management Algorithms: The implementation in Section 3.2 provides the flexibility to implement different thermal management algorithms and heuristics. In Section 3.2.1, the capability of this design to implement industry standard ACPI algorithms as well as other algorithms will be discussed. 37 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. 3.1.2 Detailed Design of Thermal Management Circuit The detailed block diagram of the thermal management unit is shown in Figure 3.2. The scope of this research is the white boxes with shadows; the gray boxes are the functions provided by the system to effect sufficient control with this design. The function of every block is described in the following paragraphs. r Temperature " N Temperature \ Sensors w Acquire Unit V / Active Cooling Unit: Fan Controllers Program m able Unit R egisters and Masks ( > r ........... A ctive Cooling Unit: System Speed CP U /System <— ► Controller V J V J Program m able W atchdog Unit Output and Interrupt G enerator Figure 3.2. 
Block diagram of the dynamic thermal management circuit

• Temperature Acquire Unit: This unit is simply an interface to acquire temperatures from sensors. This circuit could be very different when applied to different temperature sensors. The major function of this circuit is to convert and latch the temperature input to parallel digital values.
• Programmable Unit: This unit contains threshold registers to program the high and low threshold values for each temperature sensor. Two offset temperature bound registers, for the upper and lower bounds of the offset temperature (the temperature difference between sensors), are also included. With these threshold values, the watchdog unit can generate interrupts for desired situations. Three fan-speed registers provide the setup for integrated fan controllers. Interrupt mask and offset mask registers indicate which interrupts should be enabled and which set of temperature sensors should be included for offset temperature monitoring. Finally, decoding circuitry and necessary configuration registers provide the communication signals between the processor and other circuit blocks.
• Watchdog Unit: This unit contains two monitoring circuits: the threshold monitor for each temperature sensor, and the offset temperature monitor. Both circuits are designed to minimize circuit area while providing sufficient speed to compare the sensors provided in the system.
• Output and Interrupt Generator: This unit provides data outputs that are read by the system CPU, such as temperature value, offset temperature value, and interrupt types.
• Active Cooling Unit-Fan Controller: There are two active cooling units: the integrated fan controller and the system speed controller provided by the system. Three fan controllers are provided in this circuit.
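The interaction between the programmable unit and the watchdog unit described above can be illustrated with a short behavioral sketch. This is not the circuit itself: the class name, field names, and event labels are hypothetical, and computing the offset as the maximum minus the minimum of the masked sensors is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class WatchdogSketch:
    """Behavioral sketch of the threshold and offset monitors (names are illustrative)."""
    high: List[int]             # per-sensor high thresholds (programmable unit)
    low: List[int]              # per-sensor low thresholds (programmable unit)
    offset_high: int            # upper bound of the offset temperature
    offset_low: int             # lower bound of the offset temperature
    offset_mask: List[bool]     # sensors included in offset monitoring
    interrupt_mask: List[bool]  # sensors allowed to raise threshold interrupts

    def check(self, temps: List[int]) -> List[Tuple[str, int]]:
        """Return the alert events that one set of sensor readings would raise."""
        events = []
        # Threshold monitor: compare each enabled sensor against its two limits.
        for i, t in enumerate(temps):
            if not self.interrupt_mask[i]:
                continue
            if t > self.high[i]:
                events.append(("HIGH", i))
            elif t < self.low[i]:
                events.append(("LOW", i))
        # Offset monitor: assumed here to be the spread of the masked sensors.
        selected = [t for t, m in zip(temps, self.offset_mask) if m]
        if len(selected) >= 2:
            offset = max(selected) - min(selected)
            if offset > self.offset_high:
                events.append(("OFFSET_HIGH", offset))
            elif offset < self.offset_low:
                events.append(("OFFSET_LOW", offset))
        return events


wd = WatchdogSketch(high=[80] * 4, low=[10] * 4, offset_high=15, offset_low=0,
                    offset_mask=[True, True, False, False],
                    interrupt_mask=[True] * 4)
events = wd.check([85, 60, 95, 50])
```

In this sketch the masks only select which comparisons are made; the interrupt type information that the output and interrupt generator reports corresponds to the event labels.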
3.1.3 Operating Modes

The operation of the thermal management system can be divided into three modes from the processor's point of view: the programming mode, the data acquiring mode, and the interrupt mode. Each mode requires different timing and data order definitions, which will be implemented in the programmable unit of the system. Detailed signal assignments and timing information are provided in Chapter 4, but the basic functionality of each mode is described in the following items.

• Programming Mode: This mode provides the function to program the threshold registers for temperature sensors and offset temperatures, mask registers for interrupt and offset temperature monitoring, and fan stage assignments for the integrated fan controllers. To conserve address space, the multiple temperature sensor registers will be mapped to the same address, with the configuration register setting specifying which set is actually being accessed.
• Data Acquiring Mode: This mode provides the capability for the processor to read data and status from the thermal management circuit in a polling fashion. Information such as current temperatures, offset temperatures, setups, and interrupt status can be acquired by the processor at any time to flexibly support different thermal management algorithms.
• Interrupt Mode: Interrupts are provided for designed alert conditions, and interrupt type information is also provided when the interrupt service routine reads the interrupt type register.

3.2 Thermal Management Algorithms

Figure 3.3 presents a typical flowchart to utilize the designed thermal management system. Based on the architecture proposed in Section 3.1, the final thermal management algorithm will take advantage of this design.
An implementation based on the following enhancements to industry-standard and other thermal management algorithms [69] is proposed.

• Monitoring multiple sensors, thresholds, and temperature gradients.
• Monitoring spatial temperature offsets in SOC designs to provide information for a processor to dynamically change the speed of different blocks.
• Using an interrupt mode of this design to minimize processor loading for thermal management routines.
• Using multi-stage on-chip fan controllers to accurately control the package temperature.
• Using the developed analytical model to calculate the amount of cooling effort that should be applied in certain circumstances.

The implementation of the industry-standard thermal management algorithm and an example of a thermal management mechanism with the above enhancements, both simulated in the final system integration, are presented in Chapter 4, Section 4.5 and Section 4.6.

[Figure 3.3 flow chart elements: Hardware reset; Software: initiate setup registers; Acquire temperature; Temperature interrupt; Sensors' temperature above threshold? (Yes/No); Temperature gradients above threshold? (Yes/No); Invoke cooling mechanism]
Figure 3.3. Flow chart of thermal management system

3.3 Analysis and Evaluation of Temperature Gradients

According to the analysis in Chapter 1 and the predictions and measurements here, the quantification of spatial and temporal temperature offsets shows that local heat-up cannot be neglected, especially for mixed-signal integrated circuits, multi-chip modules, circuit reliability analysis and system-on-chip design.
A compact model to predict the temperature gradient on the chip is necessary. It identifies the temperature gradient ranges in system-on-chip design, which provide the information needed to place temperature sensors as well as to choose a suitable critical value for thermal management algorithms. In this section, an analytical model based on general heat transfer assumptions and a general heat source set-up in integrated circuits is presented. By applying differential equation theory, a set of continuous equations is obtained that describes the thermal transport properties at each location of a die. The model provides clean, simple analytical solutions of temperature dissipation, and yields the temperature gradient and junction-temperature estimation of transistors on VLSI devices. In Section 3.3.1, assumptions and justifications for the model are described; in Section 3.3.2, the derivation is described; and in Section 3.3.3, the calibration and a discussion on how to utilize this model are presented. In Section 4.4, a thermal evaluation chip is presented to provide the calibration coefficients and to validate the model's prediction.

3.3.1 Goal, Assumptions and Scope of Model

The goal of this model is to develop a set of equations to calculate the temperature distribution with respect to geometry and time. A chip with a periodic heat source applied in the center of the upper face is defined. The heat transfer on five sides of the chip (the upper side and the four surrounding sides) is modeled as an adiabatic process, while the bottom side attached to the package is modeled as an isothermal process. Therefore, heat will dissipate from the center to the package. This assumption is based on the packaging style of most dies. The five adiabatic sides have a die-to-air interface, which has negligible heat transfer compared to the die-package interface.
The bottom side is isothermal, because the package is assumed adequate for dissipating heat immediately from the die to the outside environment. Figure 3.4 shows the heat source and dissipation assumptions.

[Figure 3.4 labels: single periodic heat source; adiabatic process: die-to-air interface (5 sides); isothermal process: die-to-package interface]
Figure 3.4 Assumptions of heat source and thermal boundary conditions

Using the above assumptions and boundary conditions, the goal and scope of the model development are defined as follows:

■ Predicts on-chip temperature distribution and its range.
■ Aids temperature sensor placement.
■ Predicts temperature gradients between sensors and target hot spots.

With the above goal and scope, a direction of model derivation is formulated. The assumptions defined earlier are suitable for this application, for the following reasons. First, a point heat source is chosen because it is more general than any other kind of heat source, and mathematically it enables a pure analytical solution. Second, using a point heat source model may cause problems when a sensor is very close to the heat source; however, in that case, a model is not necessary. Since the sensor is sufficiently close to the heat source, the temperature reading can be treated as the actual temperature. Third, a periodic delta function is used to represent the heat source pattern. Although its characteristic is not exactly the same as a heat source on the chip, it is similar enough to the digital switching behavior of most VLSI circuits. By using the delta function and point heat source assumptions, a compact, tractable mathematical model is easier to derive. Finally, a classical dynamic model is used, since the target range being sought does not require considering the domain of solid-state theory.
With these directions, an analytical model is derived in Section 3.3.2, and methods for associated calibration and validation are presented in Section 3.3.3.

3.3.2 Derivation of Analytical Model

Using the above assumptions as the boundary conditions, and applying partial differential equation theory, we get the following results. Define the temperature function as $u(x,y,z,t)$ with $(x,y,z) \in \Gamma$, $t > 0$, where $t$ is time, $\Gamma$ is the geometry of the die, $x, y, z$ represent the coordinates, $L$ is the length of the edge of the die, and $w$ is the thickness of the wafer. Assume there is a unit heat source placed at the point $(0,0,0)$ at $t = 0$. We want to know the temperature distribution of $\Gamma$ at any given moment for $t > 0$. The boundary condition is prescribed by a Neumann problem (adiabatic) on the 5 faces of $\Gamma$ other than the bottom face, and a Dirichlet (isothermal) problem on the bottom face. From partial differential equation theory, we know this problem has a unique solution for $u$. Let us assume $u(x,y,z,t) = u_1(x,t)\,u_2(y,t)\,u_3(z,t)$. Then from the Leibniz rule, we know

[equation garbled in transcript]

Therefore if we can solve $u_1(x,t)$, $u_2(y,t)$ and $u_3(z,t)$ satisfying the prescribed boundary conditions, then we can solve $u$. Each function has its own boundary condition and solution. Therefore, we obtain the thermal distribution function given in Equation 1.

[Equation (1), giving the temperature as a function of time and coordinates: garbled in transcript]

where $\gamma = \kappa / (\rho\, C_p)$, $\kappa$ is the thermal conductivity, $\rho$ is density, and $C_p$ is the heat capacity of silicon.

Finally, we compute the case when units of heat are added to the origin periodically. The temperature distribution function is a periodic function after infinitely many unit heat sources are added. Using Equation 1, a temperature function that sums up the periodic heat source can be derived as

[Equation (2): garbled in transcript; the surviving fragments show exponential terms involving $4k^2\pi^2\gamma$ and $(2k+1)^2\pi^2\gamma$ and a factor $\cos(2k\pi x / L)$]

where $0 < t < \tau$ and $f$ is the frequency.

We then combine Equations 1 and 2 to form Equation 3, which can be used to predict spatial and temporal temperature differences. A complete mathematical derivation is shown in Appendix A.

3.3.3 Calibration and Validation

Calibration and validation of the derived model are necessary, as follows from the discussion in Section 3.3.1 and the derivation in Section 3.3.2. A couple of the assumptions, namely the delta function and the point heat source, are simplifications of the real world. Fortunately, these factors lead to a systematic shift in the same factor (the heat source), which can be calibrated with a thermal evaluation chip that provides measurement information to calibrate the shift between the heat source pattern and boundary conditions. In Section 4.4, a description of such a thermal evaluation chip is presented, as well as measurement and calibration results.

Besides calibration, the thermal evaluation chip can also be used to validate the derived model using different combinations of sensors and heat sources. The basic idea is that calibration is performed using one sensor/heat source combination. After this, other combinations that use other sensors with the same heat source can be measured. Then the comparison between the measurement data and the model's prediction can be used to validate the model. In Section 4.4, a complete measurement and validation is presented.
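The calibrate-then-validate procedure can be sketched in a few lines. The sketch assumes that the systematic shift in the heat source acts as a single multiplicative factor on the predicted temperature rise; that assumption, the function names, and the use of relative error as the validation metric are illustrative choices, not part of the derived model.

```python
from typing import List


def calibrate(predicted_ref: float, measured_ref: float) -> float:
    """Scale factor that makes the model match the measurement at the
    reference sensor/heat source combination (assumed multiplicative)."""
    if predicted_ref == 0:
        raise ValueError("reference prediction must be non-zero")
    return measured_ref / predicted_ref


def validate(scale: float, predicted: List[float], measured: List[float]) -> List[float]:
    """Relative error of the calibrated predictions at the remaining sensors,
    all driven by the same heat source as the calibration run."""
    return [abs(scale * p - m) / abs(m) for p, m in zip(predicted, measured)]
```

A usage pattern would be to call `calibrate` once with the reference pair and then pass the resulting factor, together with the predictions and measurements for the other sensors, to `validate`; small relative errors would support the model.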
With the derived model, the problem indicated in Section 1.6 can be solved, and the objective in Chapter 2 can be completed as well. This derived model provides sufficient information needed for pre-implementation analysis of a thermal management system for SOC design.

Chapter 4 Implementations and Results

In this chapter, the implementations and experimental results of the designed thermal management system are presented. In Section 4.1, an implementation of an embedded system version is presented. In Section 4.2, an IP-based implementation for SOC design flow is presented. A multi-level fully integratable controller for fan and power management will be presented in Section 4.3. In Section 4.4, a thermal evaluation chip will be presented for the calibration and verification of the thermal distribution compact model discussed in Section 3.3. In Section 4.5, an implementation of a thermal management algorithm using this design is presented. In Section 4.6, the system integration of the designed architecture is presented, which shows the ability to integrate this IP into any SOC system with minimum effort. In the same section, an experimental set-up to integrate this design and its simulation results are also discussed. Finally, Section 4.7 will conclude this chapter.

4.1 Implementation: Embedded System Version

In this section, an implementation of the designed thermal management system for an embedded multi-computer system is presented. In Section 4.1.1, the design of the Thermal Management Interface Circuit (TMIC) for PowerPC-based multi-computer systems is presented. Following in Section 4.1.2, the circuit implementation and experimental results are presented.
4.1.1 TMIC for PowerPC Based Multi-Computer Systems

The Thermal Management Chip (TMC) was developed for use in an embedded multi-computer system with an integrated hierarchical thermal management scheme [11]. This system design allows temperature reading and thermal control activity to be performed locally at a node and also provides global control capability at the system host. Each node of this Integrated Thermal Management (ITEM) system contains a PowerPC 604 CPU [80], an Enhanced Router Interface (ERIF) [42], a number of DRAM devices, and a TMC. The TMC device contains an on-chip temperature sensor [39] with an integrated A/D converter [81] and a Thermal Management Interface Circuit (TMIC). The ERIF device is a custom component that contains a network router, network interface, PPC604 bus controller, and a DRAM controller. Additionally, the ERIF contains 3 embedded ring oscillators to serve as temperature indicators. These temperature sensors as well as the analog temperature sensor in the TMC are based on designs that have been optimized through previous research [39].

The Thermal Management Interface Circuit (TMIC) is the configuration and interface portion of the TMC. It contains configuration registers that allow system software to set the sampling rate for reading the temperature and a threshold value at which an interrupt is generated, as well as PowerPC 604 bus interface circuitry to communicate with a node processor. Figure 4.1 shows a block diagram of the TMIC. The detailed function of each portion is described in the following paragraphs.
[Figure 4.1 blocks: Ring Oscillator Power Selector; Oscillator Selector; Configuration Register (D0-D3); Clock Counter (SYSCLK, RST); Sample Register (D0-D7); Comparator; 8-bit down counter; PowerPC 604 Interface; Temperature Register; Temperature Reading MUX; Threshold Register (D0-D7); On-Chip Temperature Sensor/Digital Filter; Interrupt Enable; Comparator; Interrupt]
Figure 4.1 Block diagram of Temperature Monitoring Interface Circuit

• PowerPC 604 Interface: supports the bus arbitration policy and four-clock burst read and write mode [80] for PowerPC 604 processors. It translates the CPU address and control signals into corresponding internal control signals in order to access (read or write) the configuration, sample, and threshold registers and monitor (read) the sampled temperature value.
• Configuration Register: 4-bit register that contains 2 temperature sensor selection bits, one interrupt enable bit, and a threshold flag bit (read-only), which indicates that the sampled temperature has exceeded the specified threshold temperature. Bit assignments are optimized to reduce the number of clock cycles needed for checking the threshold flag. The sensor selection bits are used to specify which temperature sensor is currently being sampled (the embedded one on the TMC or one of the 3 external ring oscillator temperature sensors in the ERIF). The combination of the interrupt enable bit and threshold flag allows the system programmer to choose to implement polling-based or interrupt-driven (or some combination) temperature monitoring.
• Sampling Register: integer value that specifies how many clock cycles elapse between temperature samplings. This value is compared with the value of a continuously incrementing counter.
When the two values match, the current temperature sensor value is latched into the temperature register, and the counter is reset to 0. The value of the sampling register may be very different for each sensor or thermal management algorithm requirement.
• Threshold Register: integer value that specifies a threshold temperature. When the temperature register value exceeds this threshold value, the threshold flag is set, and an interrupt is generated if the interrupt enable flag is asserted.
• Temperature Register: 8-bit register that stores a value from the currently selected temperature sensor. The temperature register is updated at the periodic rate specified by the sampling register.
• Interrupt Generator: when the temperature reading exceeds the value of the threshold register, an interrupt is generated if the interrupt enable in the configuration register is asserted. The capability to disable interrupts provides the option for the system design to implement an active or passive thermal control algorithm.

As mentioned above, any one of three ring oscillators on the ERIF chip or the on-chip (TMC) temperature sensor can be monitored to provide the flexibility of measuring temperature from different locations in the system. For the ring oscillator temperature sensors, the TMC contains an 8-bit down counter that is used as a frequency counter, which is calibrated by SPICE simulations and measurements. Depending on the value of the sensor selection bits in the configuration register, either this 8-bit counter value or the 8-bit digital filter output of the on-chip sensor will be stored in the temperature register. The temperature sensor selector also controls the supply power of ring oscillators. By disabling unused temperature sensors, extraneous heat and noise sources may be eliminated.
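The register behavior described above can be summarized in a small behavioral sketch. It is a software model written for illustration, not the TMIC netlist: the class and attribute names are hypothetical, the counter is assumed to increment before it is compared with the sampling value, and the threshold flag is assumed to be re-evaluated at every sample, since the text does not state how the flag is cleared.

```python
class TmicSketch:
    """Behavioral sketch of the TMIC sampling and threshold logic (illustrative names)."""

    def __init__(self, sample_period: int, threshold: int, interrupt_enable: bool = True):
        self.sample_period = sample_period      # sampling register (clock cycles)
        self.threshold = threshold              # threshold register
        self.interrupt_enable = interrupt_enable
        self.counter = 0                        # continuously incrementing counter
        self.temperature = 0                    # 8-bit temperature register
        self.threshold_flag = False             # read-only flag in the configuration register

    def tick(self, sensor_value: int) -> bool:
        """Advance one clock cycle; return True if an interrupt is raised."""
        self.counter += 1
        if self.counter != self.sample_period:
            return False
        # Counter matches the sampling register: latch the reading and restart.
        self.counter = 0
        self.temperature = sensor_value & 0xFF
        self.threshold_flag = self.temperature > self.threshold
        return self.threshold_flag and self.interrupt_enable


# Interrupt-driven use: an interrupt appears on every third cycle while hot.
irq_model = TmicSketch(sample_period=3, threshold=100)
irqs = [irq_model.tick(v) for v in [90, 120, 110, 95, 95, 130]]

# Polling use: interrupts disabled, software reads the flag instead.
poll_model = TmicSketch(sample_period=2, threshold=100, interrupt_enable=False)
polled = [poll_model.tick(150), poll_model.tick(150)]
```

With interrupts disabled the same flag remains readable, which is the polling-based alternative that the interrupt enable bit makes possible.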
The TMC registers occupy two cache lines in the node memory map, one for the temperature register, the other for reading and writing the Configuration, Sample, and Threshold registers. The architecture parameters, such as register bit widths and address mapping, are based on justifications of SOC design but conserve resources by allowing only burst-mode memory accesses due to the limits of the off-the-shelf processor. Also, careful address and bit assignments were made to minimize the TMC pin count. However, these compromises and enhancements do not prevent this architecture from being fully integrated into a processor as an SOC design.

4.1.2 Circuit Implementation and Experimental Result

This chip was fabricated on an HP 0.5 µm single-poly 3-metal process through MOSIS. The die microphotograph is shown in Figure 4.2. The temperature sensor [39], digital filter [82] and TMIC are indicated on the photo. The TMIC occupies an area of 231.3 µm x 1094.4 µm and contains 3293 transistors. The layout of the thermal management interface circuit was generated by Powerview schematic capture tools [83] and Lager synthesis tools [84] using standard cells developed for the ERIF chip. These standard cells have been modified previously to fit sub-micron processes. Functionality of the TMIC was verified by Powerview simulations using Lager standard cell VHDL models and Berkeley IRSIM [85] at the transistor switch level. Both simulations indicate that this chip is fully functional at the required system clock of 50 MHz. An initial lot of 5 TMC dies were packaged in 40-pin DIPs for low-cost functionality testing. Upon successful results from this test, the remaining TMC dies were packaged in 40-pin LCC packages for inclusion in the ITEM system. A final system node board is shown in Figure 4.3. The TMC operates correctly at the targeted system speed of 50 MHz.
[Figure 4.2 labels: Thermal Management Interface Circuit, Temperature Sensor]
Figure 4.2 Microphotograph of Thermal Management Chip

[Figure 4.3 labels: DRAM, PowerPC]
Figure 4.3 ITEM system node board

The presented chip and node board validate the designed thermal management system in an embedded-system fashion. As mentioned in Section 2.3, the TMC can be used to implement but is not limited to the ACPI protocol. For instance, the temperature threshold can be set to any number of values to represent any number of critical situations. Fuzzy logic control and other algorithms requiring more levels of alerts can be applied. Also, with the capability of actively acquiring temperature measurements at any time, the CPU can verify a desired temperature response when it executes a cooling action. With this feedback, future actions such as increasing or reducing fan speed and clock rates can be applied for more complex management algorithms.

The designed architecture yields a balance of hardware and software that is required for the hierarchical thermal management scheme of the Integrated Thermal Management (ITEM) embedded multi-computer system. The TMC may be used to implement the industry-standard ACPI protocol. However, its flexible design enables it to implement more complex thermal management algorithms as well. The CAD tool flow used to implement the TMC is also noteworthy. Using this design flow, the design can readily be changed to accommodate other CPUs or temperature sensors. The integration and experimental results of the designed TMC have demonstrated the advantage of the proposed thermal management system. In Section 4.2, an enhanced architecture targeting SOCs is presented.
4.2 Implementation: SOC Design

A thermal management system implementation for SOC designs is presented in this section. In Section 4.2.1, the architecture and detailed organization of the SOC-fashion design is presented; enhancements and a comparison to the embedded system design are also discussed. Following in Sections 4.2.2, 4.2.3 and 4.2.4, the circuit implementations of important components are discussed.

4.2.1 Architecture: SOC Implementation

A detailed block diagram of the proposed SOC thermal management system is shown in Figure 4.4.

[Figure 4.4 blocks: SOC Processor; Host Interface; Programmable Unit: Configuration & Report Registers (0-2), Threshold Registers (0-3), Offset Threshold Registers, Temperature (0) to Temperature (3); Watchdog Unit: Offset Temperature Monitor, Threshold Temperature Monitor, Output and Interrupt Generator; Power Control/Active Cooling Unit: Multi-level controllers (Power/Cooling Level), Power Controller, Fan; Temperature Acquire Unit: Temperature Sensor Interface (0) to (3), Sensors]
Figure 4.4 Detailed block diagram for SOC implementation

This design is a complete implementation of the architecture proposed in Section 3.2. The functions of each block have already been discussed in previous chapters. In this chapter, the detailed implementation is discussed, together with a comparison of functionality and CAD flow between an embedded system implementation and an SOC implementation. The block diagram follows the more general architecture design defined in Section 3.2. The whole system is divided into four major units.
They are the programmable unit, watchdog unit, power control/active cooling unit and temperature acquire unit. Detailed information on each unit is presented as follows.

■ Programmable unit: This unit contains eight 8-bit and eight 16-bit special-purpose registers with simultaneous parallel outputs that configure the other portions of the circuit. The eight 8-bit registers can be expanded for more sensors when needed without implementing extra bus signals. The detailed assignment of these registers is shown in Table 4.1 and Table 4.2.
■ Watchdog unit: The unit contains 2 sets of monitoring registers: the threshold temperature monitor and the offset temperature monitor. The design of these two register sets as well as the watchdog unit is discussed in Section 4.2.2.
■ Active cooling unit: The active cooling unit uses a purely digital, fully integratable multi-level controller, which can be used for power management circuits that use a system management bus, or as a fan controller providing 256-level fan-speed control. The detailed design is discussed in Section 4.3.
■ Temperature acquire unit: The temperature acquire unit stores the temperature values from sensors to registers in the programmable unit. Different sensors require different conversions. However, unlike an on-board design, which must interface with a system management bus or other sensor output, in an SOC implementation this circuit is minimal, since the temperature sensors are integrated into the same chip as the rest of the system.

Table 4.1 Register assignments in programmable unit
Address Assignment | Register Name | Function
0-3 (8-bit) | Power/Cooling Level | Multi-level controller with 4 sets of 256-level output. Four 8-bit registers program the 4 sets of control values.
4-7 (8-bit) | Temperature | Temperature values. If a more accurate sensor is used, the value can be expanded to 16-bit.
8-11 (16-bit) | Threshold | Threshold value for each temperature sensor, in 2 portions: 8 MSB represent the high threshold and 8 LSB represent the low threshold.
12 (16-bit) | Offset threshold | 8 MSB high offset threshold, 8 LSB low offset threshold.
13 (16-bit) | Report | See Table 4.2
14-15 (16-bit) | Configuration | See Table 4.2

Table 4.2 Detailed assignments of configuration and report registers
Register | Bit Assignment | Function
Report Register (13) | 0-2 | Overflow report (T/F 1/0) (sensor 00-11)
Report Register (13) | 3-5 | Underflow report (T/F 1/0) (sensor 00-11)
Report Register (13) | 6-10 | Threshold overflow (T/F 1/0) (sensor combination 00-11, 00-11)
Report Register (13) | 11-15 | Threshold underflow (T/F 1/0) (sensor combination 00-11, 00-11)
Configuration Register I (14) | 0 | Threshold enable
Configuration Register I (14) | 1 | Offset enable
Configuration Register I (14) | 2 | Interrupt
Configuration Register I (14) | 3 | Interrupt mask
Configuration Register I (14) | 4-7 | Sensor enable
Configuration Register I (14) | 8-15 | 6 sets of offset combinations. The watchdog unit looks at the assigned sensor combinations for its offset value.
Configuration Register (15) | 0-15 | (no function listed)

4.2.2 Programmable Watchdog Unit with Multi-Channel Input

The circuit design of the watchdog unit is presented in Figure 4.5 and Figure 4.6. In both designs, the interconnect between the watchdog unit and the programmable unit is reduced by using a serial connection. Although the comparison speed will be limited by using this method, a significant amount of chip area that would otherwise be occupied by interconnect between the programmable unit and watchdog unit can be saved.
In addition, both watchdog components use a single comparator circuit with multiple inputs from different temperature readings to reduce circuit complexity. Thus, an implementation of more than 6 channels can be done with very little extra circuitry, which adds very little overhead if a user increases the number of sensors. However, in SOC designs where monitoring speed and response time are critical for the entire system or for some specific sensors, it would be easy to modify this design to add another watchdog unit for that purpose, or to use a parallel data bus for critical sensors. To summarize, a user can always derive a suitably balanced design using the block architecture developed in this research.

[Figure 4.5 labels: Highest Temperature (replace, read out); Current Temperature; Lowest Temperature (replace, read out); Selector with Masks]
Figure 4.5 Offset temperature monitor

[Figure 4.6 labels: High Threshold Serial In; MUX; High Threshold; Sensor Reading; Low Threshold; MUX; Low Threshold Serial In; Selector with Masks; Interrupt]
Figure 4.6 Threshold temperature monitor

4.2.3 Architecture Enhancements to Embedded System Version

Compared with the circuit in Section 4.1, the following enhancements are made in this circuit. First, the number of temperature sensors has been increased to fit the need of more complex systems to monitor temperature in several locations throughout the system. Second, the updated architecture provides simultaneous monitoring of multiple temperature sensors instead of the previous approach of monitoring a single sensor at a time. Third, circuits to monitor temperature offsets between sensors and thresholds for interrupts that provide alerts other than package temperature have been added.
Finally, the threshold values have been expanded to have upper and lower limits for each sensor for more robust and flexible controls.

4.2.4 CAD Flow and Implementation

The CAD tool flow for this SOC version is completely different from what was used in the embedded system version. The SOC version is implemented by a cell-based ASIC design flow, in which VHDL is used to define all the units, and Synopsys is used to synthesize this design using Artisan standard cells. The physical design is generated by Cadence place-and-route tools. The detailed synthesis results and system integration are discussed in Section 4.5. By using an ASIC CAD tool flow, it would be straightforward to port this design to any SOC design or any process technology with minimum effort.

4.3 Component: Multi-Level Controller

The multi-level controller used in the active cooling unit is discussed in this section. For SOC compatibility, this unit is designed to be a fully integrated, digital-process-compatible circuit that can translate the assigned power level. This circuitry can be applied to two kinds of devices: fan controllers and power management. Since many kinds of fan controllers and power management schemes can be defined for different systems, a more general solution that uses a multi-level controller for both power and fan applications is targeted in this research. A multi-level controller using a binary to thermometer code converter is designed. In addition, the frequencies used in power management and fan controllers are very similar: both are much slower than the system speed. This characteristic leads to the design in this section. Fan controller design issues are surveyed, and a resulting design is proposed.
Furthermore, by using the same circuit block with the fan-driving power transistor removed, a multi-level controller suitable for power management input is also attained. In Section 4.3.1, a review of integrated-circuit-based fan controllers is given. A fully integrated fan controller is presented in Section 4.3.2. Following in Section 4.3.3, an experimental result and design summary are discussed.

4.3.1 Review of Integrated Circuit Based Fan Controller

Linear drive and PWM (pulse-width modulation) drive are the two major classifications of fan controllers [69]. A linear drive provides a simple circuit design by using a DAC (digital-to-analog converter), but it requires a tachometer output as a feedback mechanism. A PWM drive is more attractive since it results in less power consumption for the fan controller, with a linear correlation between the input control signal and the active duty cycle of the fan. PWM drive also guarantees the operation of the fan at low fan speeds, while a linear drive design may encounter fan startup problems and very poor efficiency for the DC motor. Recent advances have shown the PWM drive to be beneficial in several aspects. However, most systems do not employ the PWM drive method due to the higher complexity and cost of the PWM circuit. In addition, both linear and PWM drives are generally implemented with linear circuits, which makes them hard to integrate into most on-chip systems. In Section 4.3.2, a fully integrated fan controller design is presented. The technical details of the building block, prototype, circuit design, and final layout are presented in the following sections. A fully integrated digital design is very rare for this kind of circuit but is necessary to fulfill the building-block requirements of the proposed architecture.
4.3.2 Basic Building Blocks of Fully Integrated Fan Controller

The basic block diagram of the proposed fan controller is shown in Figure 4.7. As shown in this block diagram, the fan controller unit is implemented as a purely digital design and includes only a decoder, a shift-register-based PWM unit, and a clock generator. The decoder is a binary-to-thermometer code translator, while the shift register has a parallel input and has its serial output connected back to its serial input. This controller unit can be easily integrated with the Thermal Monitoring Unit presented in Sections 4.1 and 4.2.

In this design, the PWM frequency (f_PWM) for the fan controller is set to 20 Hz (f_clock = 300 Hz), a tested value optimized for efficiency (transition loss), noise radiation, mechanical vibration, and smoothness of the fan output. With a 15-bit shift register, this PWM unit has 16 different output levels, which provides sufficient flexibility for thermal management systems. Instead of using a PWM with complex circuitry and high power consumption, a PWM using a single shift register at the designated clock speed is adequate for thermal management applications such as this one.

As shown in Figure 4.7, an NMOS transistor is used to switch the fan on the low side. By placing this switch on the low side of the fan, a variety of fan supply voltages is supported. While the transistor is on, V_DS is very small, which means there is almost no power consumption in the drive circuit compared with the power consumption in the fan. A feedback tachometer is not necessary since PWM drive guarantees linearity between the driving ratio and the fan speed/power consumption. A clamping diode is used to protect the NMOS transistor from voltage spikes triggered by the inductive fan during cut-off of the fan current.
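The decoder and recirculating shift register described in Section 4.3.2 can be illustrated with a short behavioral sketch. This is only a software model written for this discussion, not the VHDL or gate-level design of the thesis; the function names and the 4-bit input width (which yields the 15-bit register and 16 levels mentioned in the text) are assumptions.

```python
"""Behavioral sketch of a shift-register PWM (assumed model, not the thesis VHDL)."""

F_CLOCK_HZ = 300   # shift clock frequency quoted in the text
N_BITS = 4         # assumed decoder input width: 2**4 - 1 = 15 register bits


def thermometer_code(level: int, n_bits: int = N_BITS) -> list[int]:
    """Translate a binary level into a thermometer code of 2**n_bits - 1 bits."""
    width = 2 ** n_bits - 1
    if not 0 <= level <= width:
        raise ValueError(f"level must be in 0..{width}")
    return [1] * level + [0] * (width - level)


def pwm_period(level: int, n_bits: int = N_BITS) -> list[int]:
    """Return the serial output over one PWM period (one full rotation)."""
    register = thermometer_code(level, n_bits)    # parallel load from the decoder
    output = []
    for _ in range(len(register)):
        bit = register[-1]                        # serial output drives the switch
        output.append(bit)
        register = [bit] + register[:-1]          # serial out fed back to serial in
    return output


def duty_cycle(level: int, n_bits: int = N_BITS) -> float:
    """Fraction of the PWM period during which the output is high."""
    period = pwm_period(level, n_bits)
    return sum(period) / len(period)


# One PWM period takes 15 shift clocks, so 300 Hz / 15 = 20 Hz.
PWM_FREQUENCY_HZ = F_CLOCK_HZ / (2 ** N_BITS - 1)
```

In this model the duty cycle is simply level/15, which reflects the linear relation between the driving ratio and the control input that the text attributes to the PWM drive.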
[Figure 4.7 Block diagram and system integration of proposed fan controller unit. Recoverable block labels: Clock Generator, N-bit Decoder, (2^N - 1)-bit Parallel-in Serial-out Shift Register, FAN.]

4.3.3 Experimental Result

A prototype of this controller circuit has been implemented using discrete components, including a standard timer and CMOS logic devices. For verification purposes, the number of levels in the PWM unit was reduced to 8. Accordingly, the clock generator was run at a reduced speed of 168 Hz, which leads to a PWM frequency of 21 Hz. An NMOS power transistor was used to switch the fan. As shown in Figure 4.8, this prototype was implemented on a multi-purpose PCB with a flat cable connector connected to fan-stage assignment signals.

A fully integrated chip design (except for the power NMOS and diode) of this circuit is also laid out for fabrication in an HP 0.5 µm single-poly 3-metal process. By using the Powerview [83] schematic capture tool and Lager synthesis tools [84] with standard cells developed for previous designs, the layout of the fan controller is readily generated with a very small active area (267 µm x 188 µm). A picture of the circuit layout is shown in Figure 4.9. This circuit consists of 800 transistors operating at a low frequency of 300 Hz, at which power dissipation is negligible compared with the NMOS power transistor that drives the fan. Detailed information regarding the designed circuitry is shown in Table 4.3. A comparison of this design with traditional PWM drives and linear drives is shown in Table 4.4.

[Figure 4.8 Circuitry board for design verification]

[Figure 4.9 Layout of the fan controller circuitry]

Process: HP 0.5 µm single-poly 3-metal process. Pure digital design, easy to merge with Thermal Management system as well as CPU circuitry
Design Tools: Powerview, Lager, standard cell design
Size: 267 µm x 188 µm
Transistor Count: 800
Gate Statistics: 15 D flip-flops, 15 muxes, 13 inverters, 22 NAND gates
Operating Frequency: 300 Hz
Table 4.3 Specification of designed circuitry layout

Fan Controller Type | Traditional PWM Drives | Linear Drives | Proposed Design
Controller circuit complexity | High | Medium | Low
Driving efficiency (fan power / control-circuit power) | Medium | Low | High
System integration | Hybrid linear device required | Hybrid linear device required | Single-chip solution, pure digital design
Table 4.4 Comparison between different fan controller designs

From the prototype shown in Figure 4.8, the power consumption of the power NMOS is very small (< 0.2 W), and hence a surface-mount (SMT) power transistor and diode can be used in the final system integration to provide a compact product. The active area of the Thermal Monitor Unit together with the fan controller circuit is very small (232 µm x 1350 µm in a 0.5 µm process), so it can be further integrated inside a CPU or SOC design. The necessary configuration data can be programmed via the data bus of the CPU, so only 1 I/O pin is required for controlling the power NMOS. This circuitry has also been implemented as a VHDL specification that can be integrated with the complete thermal management system in Section 4.5. Detailed information and an implementation using that approach are discussed in Section 4.5.

4.4 Thermal Evaluation Chip and Model Calibration and Validation

In this section, a thermal evaluation chip (TEC) is designed and fabricated.
The design of this chip, which matches the assumptions stated in Section 3.3, enables it to be used to validate and calibrate the model discussed in that section. In Section 4.4.1, the design of the Thermal Evaluation Chip is presented. The experimental results and measurement data are shown in Section 4.4.2. In Section 4.4.3, a calibration and validation of the analytical model presented in Section 3.3 is described.

4.4.1 Design of Thermal Evaluation Chip

The goal of this design is to measure the spatial temperature differences on chip. Five circuit modules are located in a spiral floor plan to produce a variety of combinations of heat sources and sensors. A picture of an individual module is shown in Figure 4.10. Each module contains a pad driver, a polysilicon resistor, and a diode. The pad driver is used as the heat source. Poly resistors and diodes function as temperature sensors. A picture of the chip layout is shown in Figure 4.11. The chip was designed and fabricated using a MOSIS 0.5 µm process.

Each pad driver is considered a point heat source since it occupies less than 1% of the chip area. Several researchers have verified the accuracy of using diodes and resistors as on-chip temperature sensors [40]. Non-silicide polysilicon resistors, which are provided in the MOSIS 0.5 µm process, are used here. The diodes and resistors were calibrated using a thermal bath with a thermal chamber and test board. After the calibration, linear regression was applied to obtain the algebraic expression and lookup table for the temperature sensors.

With the above setup and calibration, different combinations of heat sources and temperature sensors have been measured to obtain spatial temperature differences. One set of experiments was conducted with a fixed heat source location and varying sensor locations.
Another set was conducted with a fixed temperature sensor location and varying heat source locations. Experimental results are shown in Section 4.4.2.

[Figure 4.10 Layout of circuit module. Recoverable labels: Pad Driver, Diode.]

[Figure 4.11 Microphotograph of the test chip. Circuit modules in a spiral pattern from center to border are labeled PAD1, PAD2, PAD3, PAD4, PAD5, respectively.]

4.4.2 Experimental Results from TEC Measurement

A selected subset of experimental results is presented in this section. Figure 4.12 presents the spatial temperature gradients when the center heat source is applied. The temperature of module #2 is 16°C higher than that of module #5 at 40 MHz. The measurement result of module #3 is used to calibrate the analytical model since it has a suitable distance to the sensor as well as a good range of temperature.

[Figure 4.12 Spatial temperature offset from measurement, center heat source. Series: sensor locations PADR1 to PADR5; x-axis: Frequency (MHz), 0 to 60; y-axis: temperature.]

A fixed sensor location is chosen and heat source locations are varied for the results shown in Table 4.5. The temperature of module #1 is measured while the location of the heat source varies among modules #2 through #5. As shown in the data, a temperature difference of 20.65°C is measured between 0.556 W applied at PAD5 and at PAD2. These results show that the spatial temperature offset generated by different locations is significant. In this case, the boundary conditions of the analytical model are not fully valid since the heat source is not in the center.
However, using this setup, the following important phenomenon can be observed: a large variance in temperature can occur for different floorplans of a circuit even when the total power on a chip is constant.

The floorplan scheme not only affects the temperature distribution on chip but also influences the package temperature. In Figure 4.13, the package temperature is measured while varying the heat source location. There is a 13.40°C difference between a heat source applied at the center versus the border. This result matches the prediction in Chapter 3 and confirms that the difference in heat dissipation patterns due to different floorplans for the same circuit is not negligible.

Fixed sensor location experiments: module #1 temperature (°C) by heat source location
Circuit Frequency (MHz) | #2 | #3 | #4 | #5
1.00 | 29.72 | 31.90 | 31.94 | 29.99
5.00 | 30.96 | 31.35 | 31.39 | 30.26
10.00 | 33.18 | 31.00 | 31.55 | 31.74
15.00 | 32.40 | 33.26 | 33.61 | 30.65
20.00 | 33.96 | 31.78 | 34.81 | 32.67
30.00 | 41.89 | 33.92 | 34.62 | 34.39
40.00 | 53.52 | 36.37 | 35.44 | 32.87
50.00 | 45.55 | 39.13 | 36.02 | 33.84
Table 4.5 Module #1 temperature with different heat sources

[Figure 4.13 Package temperature. Series: heat source locations Pad2 to Pad5; x-axis: Frequency (MHz), 0 to 60; y-axis: package temperature, 20 to 50.]

4.4.3 Calibration and Validation of Analytical Model

Using the Thermal Evaluation Chip described in Section 4.4.1 and the thermal and physical parameters listed in Table 4.6, a calibration and validation of the analytical model presented in Section 3.3 is discussed in this section.
Thermal conductivity: K = 150 W/(m·K)
Density of Si: ρ = 2.47 g/cm³ (2470 kg/m³)
Heat capacity of Si: Cp = 0.181 x 4.18 W·s/(kg·K)
Thermal diffusivity: γ = K / (ρ x Cp) (0.0000802674 cm²/s)
Length of chip: L = 1940 µm
Depth of the chip: W = 625.0 µm
Average heat power: 0.556 W
Frequency: 50 MHz x 2
Heat per time: 5.56 x 10^-9
Table 4.6 Thermal and physical parameters of TEC

The first step of calibration and validation is to determine the depth index needed for model derivation. Using the physical parameters in Table 4.6, a result from equation (3) for the die's top surface temperature, with a depth index in the range k = 20 to 200, is shown in Figure 4.14 and Figure 4.15. Suitable values for k depend on accuracy and model runtime constraints. As shown in these figures, a depth index of k = 200 is sufficient for the precision level that could be measured from the Thermal Evaluation Chip.

For a little more insight into the role of the depth index, observe Figures 4.14 and 4.15. In Figure 4.14, using a small k of 20, the equilibrium temperature needed to verify the temperature offset from the measurement is obtained. With a larger k value (k = 200 for Figure 4.15), a smaller portion of the chip can be examined in more detail. The expected ripple-style temperature surface, which is due to the periodic heat source, is attained. The swing on the surface of the chip is 0.02°C outside the heat source area.

Figure 4.14 Temperature distribution on die surface (z-axis represents relative temperature difference (°C); x, y represent location on the surface (m); k = 20)
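As a hedged sanity check on the derived entries of Table 4.6, the arithmetic can be reproduced in a few lines. The sketch assumes that the factor 0.181 is a heat capacity in cal/(g·K) and that 4.18 converts calories to joules, which gives about 756.6 J/(kg·K) in SI units; this reading of the units is an assumption made here, not a statement from the thesis. The variable names are illustrative only.

```python
# Worked check of the derived parameters in Table 4.6 (unit reading is an assumption).

K = 150.0                            # thermal conductivity, W/(m*K)
RHO = 2470.0                         # density, kg/m^3 (2.47 g/cm^3 in the table)
CP = 0.181 * 4.18 * 1000.0           # assumed: 0.181 cal/(g*K) -> J/(kg*K)

# Thermal diffusivity gamma = K / (rho * Cp); in SI units this comes out in m^2/s.
gamma_m2_per_s = K / (RHO * CP)

AVERAGE_POWER_W = 0.556              # average heat power from the table
SWITCHING_RATE_HZ = 50e6 * 2         # "50 MHz x 2" from the table

# Heat per switching event: average power divided by the event rate, in joules.
heat_per_event_j = AVERAGE_POWER_W / SWITCHING_RATE_HZ

print(f"gamma = {gamma_m2_per_s:.6e} m^2/s")
print(f"heat per event = {heat_per_event_j:.3e} J")
```

Under these assumptions the diffusivity evaluates to roughly 8.027 x 10^-5, which reproduces the numerical value listed in the table when the result is expressed in m²/s, and the heat per event reproduces the listed 5.56 x 10^-9.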
Figure 4.15 Temperature profile on a portion of die surface (z-axis represents relative temperature difference (°C); x, y represent location on the surface (m); k = 200)

After determining the adequate k range, the next step uses equation (3) to calculate the equivalence temperature of each sensor location assigned in the Thermal Evaluation Chip. Figure 4.16 shows the temperature plot of module #4 over time. This figure shows the influence of the periodic heat source as well as a significant error when the time is close to zero. These extraordinary values are due to the assumptions used in the derivation of the model. To correct for these errors, a better way to calculate the equivalence temperature is shown in equation (4).

$$\text{Equivalence Temperature} = \text{Heat-per-time} \times \frac{1}{T} \int_{t=0}^{T} u_t(x, y, z, t)\, dt \qquad (4)$$

[Figure 4.16 Temperature plot of module #4 over time]

In the above equation, the equivalence temperature is an integration of the temperature function u_t over space and time, which is then averaged by dividing by time T and multiplying by heat-per-time, which can be found in Table 4.6. Using the method described here, and applying this procedure to modules #2 through #5, a series of predicted temperatures from the model is presented in Table 4.7. However, the discussion of calibration in Chapter 3 indicates that a calibration factor is necessary to reconcile the heat source assumption with the actual heat source. Table 4.7 also presents the measured results and the calibrated model prediction, using a factor of 0.793 that was calibrated using module #4. In the last column, the error between the calibrated model results and the measurement results is also provided for discussion.
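The calibration step described above can be sketched numerically. The predicted and measured values below are the ones reported in Table 4.7; the dictionary layout and variable names are illustrative assumptions, and the snippet only reproduces the scaling arithmetic, not the analytical model itself.

```python
# Sketch of the single-point calibration using the values reported in Table 4.7.

predicted = {2: 18.23, 3: 8.03, 4: 4.01, 5: 0.12}   # original model output per module
measured = {2: 12.71, 3: 6.29, 4: 3.18, 5: 0.00}    # TEC measurements per module

CALIBRATION_MODULE = 4   # module used as the calibration point in the text

# Calibration factor: measured / predicted at the calibration module.
factor = round(measured[CALIBRATION_MODULE] / predicted[CALIBRATION_MODULE], 3)

# Apply the same factor to every module's prediction.
calibrated = {module: factor * value for module, value in predicted.items()}

# Relative error of the calibrated prediction, for modules with a nonzero reading
# that were not used for calibration.
relative_error = {
    module: abs(calibrated[module] - measured[module]) / measured[module]
    for module in (2, 3)
}

for module in sorted(calibrated):
    print(f"module #{module}: calibrated = {calibrated[module]:.2f}")
```

The factor evaluates to 0.793, and the calibrated values and errors agree with the tabulated ones to within rounding of the last digit.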
From Table 4.7, after calibration, the model yields a fairly precise prediction of the temperature of Module #3, with an error of only 1.1%. For Module #2, an acceptable error of 13% is achieved; at this particular location, this error is expected because of the rectangular heat source shape, which is bigger than the point heat source assumed in the model derivation. This is also the reason that Module #4 is selected as the calibration point: it has enough separation distance to treat Module #1 as a point heat source, and the temperature reading is significant enough for the calibration of other modules.

Module | Predicted temperature by original model | Measured results from TEC | Calibrated model results (factor 0.793) | Error
Module #2 | 18.23 | 12.71 | 14.45 | 13%
Module #3 | 8.03 | 6.29 | 6.36 | 1.1%
Module #4 | 4.01 | 3.18 | 3.18 | N/A
Module #5 | 0.12 | 0.00 | 0.09 | N/A
Table 4.7 Temperature from models and measurement

From the above results, the Thermal Evaluation Chip has validated and calibrated the thermal distribution model discussed in Chapter 3. The scaling factor obtained from the calibration procedure extends beyond the Thermal Evaluation Chip. The calibration factor of 0.793 can also be used for other digital circuits since this factor is due to the mismatch between a theoretical periodic heat source and real circuit switching behavior. Thus, a final temperature function is given in equation (5).

$$\text{Equivalence Temperature} = 0.793 \times \text{Heat-per-time} \times \frac{1}{T} \int_{t=0}^{T} u_t(x, y, z, t)\, dt \qquad (5)$$

4.5 Thermal Management Algorithms

In this section, an implementation of the ACPI (Advanced Configuration and Power Interface) thermal standard using the thermal management system is presented.
By implementing this industry-standard algorithm, the capability of this architecture is verified. Besides ACPI's thermal management schemes, implementations of more flexible and complex algorithms are also discussed, along with a direction for optimal thermal management algorithms for this architecture.

4.5.1 Implementation of ACPI Standards

Different approaches for a thermal management system can be easily implemented with the proposed architecture, since this system provides flexible ways for systems to read the temperature, set the threshold value for interrupt generation, and measure temperature values from different sensors. This section illustrates how the thermal management system can be used to implement an ACPI-compliant protocol [69-71], as an example for other thermal management algorithms.

A typical flow chart for an ACPI implementation [69] is shown in Figure 4.17. The system acquires a temperature and first determines if it is within the current granularity window, repositioning the window as needed. If the sampled temperature has exceeded the Passive Cooling (PSV), Active Cooling (ACx), or Critical Temperature (CRT) thresholds, the corresponding defined actions, such as reducing the CPU clock, activating the fan, or shutting down the system, take place. Most software approaches require a continuous loop, while hardware approaches use interrupts. The advantage of the hardware approach is that processor time is used for thermal management only when certain situations require it.

With the use of this design, all system actions needed to implement the ACPI protocol are triggered by the actions of acquiring the temperature and generating an interrupt when the sampled temperature exceeds the threshold. These two actions can be used to implement the ACPI protocol as shown in Figure 4.18.
Here, there is no concept of a granularity window; all temperature values are of importance. A threshold register is initialized to the PSV value in the first step. If an interrupt is generated, it indicates that the PSV value has been exceeded, so the CPU performs a predefined action for this threshold and resets the threshold register to the ACx value. This time, when an interrupt is generated, it indicates that the ACx value has been exceeded, so the CPU performs a more severe predefined action and resets the threshold register to the CRT value.

Since the thermal management module continuously samples the temperature and compares it against a programmable threshold, CPU resources are required only when significant events have occurred. In contrast, software implementations require CPU resources even under default circumstances because they rely on polling to sample temperature values and then must perform computation to determine if a significant event has occurred. A further danger with software implementations is that significant events may be easily missed if the polling period is not sufficiently small, in which case a system may crash before its designed reaction to the exception takes place. Compared to other hardware implementations, the proposed thermal management system reduces the number of different interrupts and requires the same amount of software cooperation. Furthermore, unlike most hardware implementations, this design provides the ability for the system to actively acquire temperature readings in addition to passively waiting for critical situations.

4.5.2 Implementation of Flexible Standards

This design can be used to implement the ACPI protocol but is not limited to it. For instance, the temperature threshold can be set to any number of values to represent any number of critical situations.
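The threshold-escalation flow of Section 4.5.1, generalized to an arbitrary ordered list of thresholds, can be sketched as a small software model. Everything named here is hypothetical: the `ThermalMonitor` class stands in for the hardware comparator, and the threshold temperatures and action strings are made-up illustration values, not figures from the thesis or from the ACPI specification.

```python
"""Sketch of interrupt-driven threshold escalation (hypothetical model and values)."""
from dataclasses import dataclass


@dataclass
class ThermalMonitor:
    """Stand-in for the hardware comparator and its threshold register."""
    threshold: float = float("inf")

    def sample(self, temperature: float) -> bool:
        """Return True (interrupt) when the reading exceeds the threshold register."""
        return temperature > self.threshold


def run(readings, levels):
    """Replay readings; levels is an ordered list of (name, threshold, action)."""
    monitor = ThermalMonitor(threshold=levels[0][1])   # initialize to first level
    stage = 0
    log = []
    for temperature in readings:
        # The CPU is involved only when the monitor raises an interrupt.
        while stage < len(levels) and monitor.sample(temperature):
            name, _, action = levels[stage]
            log.append((temperature, name, action))
            stage += 1
            # Reload the threshold register with the next, more severe level.
            monitor.threshold = (
                levels[stage][1] if stage < len(levels) else float("inf")
            )
    return log


# Hypothetical thresholds in degrees Celsius, for illustration only.
LEVELS = [
    ("PSV", 70.0, "passive cooling"),
    ("ACx", 80.0, "active cooling"),
    ("CRT", 95.0, "critical shutdown"),
]
EVENTS = run([60.0, 72.0, 75.0, 83.0, 99.0], LEVELS)
```

With these illustrative readings, `EVENTS` records one entry per crossed level in the order PSV, ACx, CRT. Adding more entries to `LEVELS` models the finer-grained alert schemes mentioned in Section 4.5.2 without changing the loop.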
Fuzzy logic control and other algorithms requiring more levels of alerts can be applied. Also, with the capability of actively acquiring temperature measurements at any time, the CPU can verify the desired temperature response when it executes a cooling action. With this feedback, actions such as increasing or reducing fan speed and clock rates can be applied for more complex management algorithms.

[Figure 4.17 Typical flowchart of ACPI standard. Recoverable labels: Poll Sensor; Below Granularity Window; Above Granularity Window; Increase Window; Decrease Window; PSV; ACx; CRT; Invoke Passive Cooling; Invoke Active Cooling; Invoke Critical Temperature Shutdown.]

[Figure 4.18 ACPI implementation by designed thermal management system. Recoverable labels: Software: Initiate, Set registers, Select temperature sensor, Store PSV to Threshold Register, Invoke Passive Cooling, Store ACx to Threshold Register, Invoke Active Cooling, Store CRT to Threshold Register, Invoke Critical Temperature Shutdown; Hardware: Hardware reset, Acquire Temperature, Temperature Above Threshold Temperature (Int / No).]

4.6 System Integration

In previous sections of this chapter, each component and its development roadmap for the proposed thermal management system were presented. In this section, the system integration of the designed architecture is presented. In Section 4.6.1, a VLSI implementation of an SOC design based on the architecture and design in Sections 4.2 and 4.3 is presented.
Following in Section 4.6.2, simulation results that demonstrate the function and behavior of the designed thermal management system are presented.

4.6.1 VLSI Implementation of SOC Design

Based on the design stated in Sections 4.2 and 4.3, the schematic of the final system integration is presented in Figure 4.19. Two major components are designed and implemented: the Thermal Management Unit and the Multi-level Controller. Both of these designs follow the specifications in Sections 4.2 and 4.3. Each module is designed and synthesized using Synopsys tools; the timing and circuit size of each module are optimized to adhere to the target system speed of 200 MHz. The layout of the whole thermal management unit and four multi-level controllers is generated using Cadence Silicon Ensemble. The design is based on TSMC 0.18 µm technology with Artisan standard cells. Table 4.8 summarizes circuit characteristics for the entire design as well as subcomponents.

[Figure 4.19 Schematic of final system integration. Recoverable signal names: READTEMP, D_LOAD, D_IN(0:7), INDEX(0:3), DATAIN(0:15), DATAOUT(0:15), FAN0(0:7) to FAN3(0:7), TEMP0(0:7) to TEMP3(0:7).]

In Figure 4.20, a simulation result of the multi-level controller is presented. The signal din shows the driving-ratio setting in hexadecimal form. The signal sout presents the pulse-width-modulated driving-ratio output. The result shows that the designed multi-level controller matches the performance of the prototype in Section 4.3 with 16 times as many levels (8 bits vs. 4 bits of ratio setting). The simulation results for the thermal management unit are discussed in Section 4.6.2.

[Figure 4.20 Simulation results of the multi-level controller. Recoverable signals: /E/din(0:7) stepping through FF, CC, AA, 88, 66, 44, 22, 00; /E/fclk; /E/d_load; /E/sout.]

Gate count: 12308
Power: Dynamic: 46.56 mW; Cell leakage power: 871.38 nW
Area: 600 x 600 µm
Component areas and percentages of total area:
Combinational logic: 135640 µm² (55.28%)
Flip-flops: 109724 µm² (44.72%)
Multi-level controllers: 4 x 45636 µm² (74.40%)
Thermal Management Unit: 62822.4 µm² (25.6%)
Total: 245365.25 µm² (cell area without interconnects)
Table 4.8 Circuit summaries

In Figure 4.21, a layout of the complete thermal management design is presented. The purpose of generating this layout is to project the overhead of implementing this circuitry in SOC designs. In Figure 4.22, a layout of a sample SOC design [86] is presented as a target system to incorporate this thermal management design. The thermal management circuitry occupies 600 x 600 µm, which is 0.44% of the area of the target SOC design. Considering that most of the circuitry (44.72% of the standard cells are register files) is not switching during operation, there is only a 1.06% power increase for the target system (26.6 mW / 2.5 W) to support the thermal management mechanisms. The overall delay for the thermal management system is 4.56 ns, which is sufficiently fast to integrate it into the target SOC system without any extra delay.

[Figure 4.21 Layout of final thermal management circuitry]

4.6.2 Simulation Results

A simulation result that demonstrates the behavior of the designed thermal management system is presented in Figure 4.23 (a detailed version of this figure is shown in Appendix B).
These results verify the circuit functionality and demonstrate a typical system response using the thermal management flow chart presented in Sections 3.2 and 4.5.

[Figure 4.23 Demonstration of thermal management system's function. Simulation waveforms from 0 to 2500 ns for signals /E/w_en, /E/r_en, /E/clk, /E/rst, /E/index(0:3), /E/datain(0:15), /E/dataout(0:15), /E/fan0(0:7), /E/fan1(0:7), /E/power0(0:7), /E/power1(0:7), /E/readtemp, /E/temp0(0:7) to /E/temp3(0:7), and /E/interrupt.]

The sequence of events depicted in Figure 4.23 is as follows:

1. Hardware reset and software initialization are performed from 0 ns to 800 ns, including the initialization of the threshold values and flags. The threshold values have been set to specify a temperature range of -64°C to 191°C, which encompasses the military standard for integrated circuits (-50°C to 125°C). Two multi-level controllers are defined as fan controllers, and the other two are defined as power controllers. All the sensors are enabled for threshold checking and offset checking. The initial readings from the four temperature sensors are 34°C, 26°C, 25°C, and 30°C. The initial duty cycle setting of the two fans is 50%, and both power controller duty cycles are set to 100%.
2. At a time stamp of 1000 ns, the temperature of Sensor #3 rises to 60°C, which triggers an interrupt. The system responds by increasing the driving ratio of Fan #2 and decreasing system power #1 at a time stamp of 1300 ns.
3. The increase of fan speed and reduction of power cause the system to cool down at a time stamp of 1450 ns. Thus, all temperature sensors remain in the target range, and the interrupt signal disappears.
4. As the temperature continues to decrease, the temperature of Sensor #3 drops quickly, which causes the temperature gradient to exceed a preconfigured limit. An interrupt is triggered at a time stamp of 1600 ns.
5. The processor responds by reducing the other power and increasing the speed of the other fan, which decreases the temperature at other locations and thereby eliminates the temperature gradient. Thus, the interrupt is withdrawn at a time stamp of 1900 ns.
6. From time stamp 2000 ns to 2400 ns, the processor optimizes the configuration values of the fan controllers and power management devices, which results in better performance with acceptable temperatures from the sensors.

The above simulation demonstrates the ability of the designed thermal management system by illustrating a case that handles interrupts from both local overheating and temperature gradients. While the real system behavior is an extension of the above simulation and may exhibit different combinations of sensors and different reactions, the simulation scenario validates a sufficient subset of the design to demonstrate its functionality under real-life circumstances.

4.7 Summary

This chapter presented implementations of the designed thermal management system and its important components that were proposed in Chapter 3. Two integrated circuits and two circuit module designs have been developed.
The implemented thermal management systems have been verified for both the embedded system version and the SOC version. A layout for the SOC version has been generated but not fabricated, since the target SOC system is still under development. It is not practical to fabricate a standalone prototype chip for this module, because the thermal management module will not work without the target system. However, since this module was developed using the same CAD tool flow as the first prototype of the SOC systems in our research group, previous fabrication experience indicates that this purely digital design will work within specification.

Chapter 5 Research Contribution and Impact

The implementations and results in the previous chapters achieved the research objectives defined at the beginning of this research. The approach and implementation have shown advantages in the design of thermal management systems for system-on-chip circuits and advanced computer systems. These unique advantages and breakthroughs are briefly discussed below.

5.1 Comprehensive Analysis and Characterization

This research proposes the first thermal management system for system-on-chip design. A complete study of thermal problems with respect to both fabrication technology and circuitry was completed in the early stage of this research. This study led to the design of the thermal evaluation chip and the development of the compact thermal model to characterize the thermal behavior of modern integrated circuit designs. The result of this analysis and characterization identified the importance of on-chip temperature gradient problems, which led to the novel architecture of the thermal management system in this system-on-chip era.

5.2 Novel Architecture

The architecture of the proposed thermal management system is based on the innovation of temperature gradient management.
Compared to other thermal management designs, standards, and products, this is the first thermal management design that focuses on monitoring and controlling the temperature offset between multiple sensors. It is also the first such system that targets system-on-chip designs and is intended to integrate with the target system. In addition, the architecture design of this research considers the ability to implement previous standard thermal management approaches, such as ACPI, in an SOC fashion. These design targets limit the process technology and circuit design of components; however, the challenges arising from these constraints have been overcome in the proposed novel architecture.

5.3 Low System Overhead for Target Systems

The implementation of the proposed architecture is optimized in two aspects: fabrication process and design flow. The traditional way to implement some of the components used in this design requires linear or high-power processes; many of these components have been redesigned using a purely digital methodology that provides the same function. Also, all the components have been designed using an ASIC CAD flow with standard cell libraries to provide the ability to integrate into any SOC design.

The proposed architecture also minimizes the overhead of thermal management processing by using an event-driven, interrupt-based management system. System resources are required by the thermal management system only at system initialization and at defined exception events. Such a design enables integration of thermal management with minimum modifications of the target system. The final system integration also shows that this design has low overhead in terms of circuitry and power increments.
The thermal management system requires only 0.44% more circuit area and 1.06% more power dissipation for integration into a sample SOC design.

5.4 Complete Solution of Thermal Management

The results of this research provide the first complete thermal management solution for SOC designs, consisting of "management," "monitoring," and "modeling" capabilities to cover all aspects of a mature thermal management system. The use of this thermal management system provides a means to optimize layout, reduce system and package cost, and provide circuit protection to prevent a thermal catastrophe.

5.5 Circuit Impacts

The analysis in the early stage of this research shows that by managing and eliminating the temperature gradients in SOC systems, there is a potential to reduce the critical path delay on integrated circuits by 13-15% (for temperature gradients of 30-40°C). Such a performance improvement is on about the same scale as the improvement between generations of modern processors.

To summarize, utilization of the proposed thermal management system not only provides circuit protection and prevents thermal catastrophe but also improves the performance of target SOC designs.

Chapter 6 Conclusion

This research has successfully developed a systematic solution to thermal management for system-on-chip designs. Instead of the worst-case thermal management used in conventional systems, this design targets nominal power dissipation and requires the system to actively manage its thermal activity, including monitoring temperature status and reacting to specified conditions through the control of cooling mechanisms, such as an integrated multi-stage fan controller, to ensure operation within specification.
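A minimal sketch of how a multi-stage fan controller of the kind mentioned above might select a discrete fan stage from a temperature reading is given below. The stage boundaries, the constant `STAGE_THRESHOLDS_C`, and the function `fan_stage` are hypothetical and chosen only for illustration; they do not describe the integrated fan controller developed in this research.

```python
from bisect import bisect_right

# Hypothetical stage boundaries in degrees Celsius, in ascending order.
# These values are illustrative assumptions, not figures from this design.
STAGE_THRESHOLDS_C = (40.0, 50.0, 60.0)


def fan_stage(temp_c, thresholds=STAGE_THRESHOLDS_C):
    """Map a temperature reading to a discrete fan stage.

    Stage 0 means the fan is off; each threshold that the reading has
    reached or passed raises the stage by one, up to len(thresholds).
    """
    return bisect_right(thresholds, temp_c)
```

Under these assumed boundaries, a reading below 40°C leaves the fan off, and a reading at or above 60°C selects the highest stage. A practical controller would likely add hysteresis around each boundary so that the stage does not oscillate when the temperature hovers near a threshold.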
The local heat-up problem has been quantified through an analytical model and a test chip. The impact of temperature gradients on circuit behavior has also been discussed. The significant contributions of this research are:

1. The designed architecture strengthens the stability and performance of modern computer systems and high-performance circuits. This yields one generation of improvement in modern processors or other high-speed circuit designs using the same silicon process.

2. The design yields a low-cost way to improve system lifetime and reliability for integrated circuits.

3. The design is the first complete thermal management solution for SOC designs, providing "management," "monitoring," and "modeling" capabilities while also optimizing layout, reducing system and package cost, and providing protection to the circuit to prevent a thermal catastrophe.

The success of this project offers an opportunity for modern system-on-chip designs to incorporate thermal management techniques to enhance system stability and performance. Circuits such as high-speed processors, high-speed mixed-signal designs, and radio-frequency power stages can take advantage of the proposed architecture to accurately control temperature and its distribution entirely on chip. This design yields intricate control and optimal management with little system overhead and minimum hardware requirements, and it provides the flexibility to support different management algorithms. The analytical models and thermal evaluation chip implemented in this project also provide heat dissipation design guidelines for the critical devices mentioned above. Using the proposed models and architecture yields increased system stability and performance with the same technology, owing to optimal management of thermal behavior.

Future work of this research includes at least two directions:

1. Interface circuitry to a larger variety of off-chip temperature sensors and cooling mechanisms should be developed. Such circuits expand the functionality of this design and enable it to integrate into different scales of computer systems. Thus, modern micro-machining cooling methods can be applied to SOC designs, and modern notebook computers can utilize this design as their thermal management core.

2. A placement tool should be developed as part of the electronic design automation (EDA) tool set for SOC designs. Such a tool would use the power information of each design module on the chip to generate an optimal floorplan with minimum package temperature and on-chip temperature gradients.

The development of these research areas expands this design to different scales of target computer systems and aids the thermal specification of target SOCs in the early stages of design.
Lasance, "Recent Achievement in the Thermal Characterization of Electronic Devices by Means of Boundary Condition Independent Compact Models," presented at 13th IEEE SEMI-THERM Symposium, Austin, Texas, 1997. Viewlogic Inc., "Powerview," http://www.viewlogic.com, 1999. K. Weed and A. Kirkpatrick, "An Experimental Investigation of the Emissivity of Various Electronic Packages," presented at 12th IEEE SEMI-THERM Symposium, 1996. I. L. Wemple and A. T. Yang, "Integrated circuit Substrate Coupling Models Based on Voronoi Tesselation," IEEE Trans, on CAD o f ICs and Systems, vol. 14, pp. 1459-1469, 1995. N. H. E. Weste and K. Eshraghian, Principles o f CMOS VLSI design: a system perspective, 2nd ed: Addison Wesley, 1992. S. Wong, J. Cho, H. Kang, and C. Ryu, "Reliability of Chemically Vapor Deposited (CVD) Copper Interconnections.," Journal o f Materials Chemistry and Physics, vol. 41, pp. 229-233, 2000. S. Wong, A. Loke, J. Wetzel, P. Townsend, R. Vrtis, and M. Zussman, "Electrical Reliability of Copper and Low-K Dielectric Integration.," presented at Materials Research Society Spring Meeting, San Francisco, California, 1998. S. Wunsche, C. Clauss, P. Schwarz, and F. Winkler, "Electro-Thermal Circuit Simulation Using Simulator Coupling," IEEE Trans on VLSI Systems, vol. 5, pp. 277-282, 1997. D. X. D. Yang, B. Fowler, and A. E. Gamal, "A Nyquist-rate pixel-level ADC for CMOS image sensors.," IEEE Journal o f Solid-State Circuits, vol. 34, pp. 348- 356, 1999. M. Zubert, A. Napieralski, and M. Napieralska, "The new general method for thermal and electro-thermal model reduction.," presented at 6th International Workshop on Thermal Investigations of ICs and Systems, Budapest, Hungary, 2000. Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. 
Appendix A: Detailed Mathematical Derivation of Analytical Thermal Model

Formulation of the problem:

Let $\Gamma = \{(x, y, z) \mid -\tfrac{1}{2} \le x \le \tfrac{1}{2},\ -\tfrac{1}{2} \le y \le \tfrac{1}{2},\ -1 \le z \le 0\}$, the domain implied by the boundary conditions used below. Assume a unit of heat is placed at the point $(0,0,0)$ at $t = 0$; we want to know the temperature distribution of $\Gamma$ at any later time. Let the temperature function be $u(x,y,z,t)$ with $(x,y,z) \in \Gamma$, $t > 0$. Assume the conductivity of $\Gamma$ is 1; then $u$ must satisfy the equation

$$\left(\Delta - \frac{\partial}{\partial t}\right) u(x,y,z,t) = 0.$$

The boundary condition is a Neumann (adiabatic) condition on the five faces of $\Gamma$ other than the bottom face, and a Dirichlet (isothermal) condition on the bottom face, i.e.

$$\left.\frac{\partial u}{\partial n}\right|_{\partial\Gamma \setminus D} = 0, \qquad u\big|_{D} = 0,$$

where $\partial\Gamma$ is the boundary of $\Gamma$ and $D$ is the bottom face of $\Gamma$. From the theory of partial differential equations, we know this problem has a unique solution $u$. Let us assume

$$u(x,y,z,t) = u_1(x,t)\,u_2(y,t)\,u_3(z,t);$$

then from the Leibniz rule we know

$$\left(\Delta - \frac{\partial}{\partial t}\right) u(x,y,z,t) = \left[\left(\Delta_x - \frac{\partial}{\partial t}\right) u_1(x,t)\right] u_2(y,t)\,u_3(z,t) + u_1(x,t) \left[\left(\Delta_y - \frac{\partial}{\partial t}\right) u_2(y,t)\right] u_3(z,t) + u_1(x,t)\,u_2(y,t) \left[\left(\Delta_z - \frac{\partial}{\partial t}\right) u_3(z,t)\right],$$

where $\Delta_x = \partial^2/\partial x^2$, $\Delta_y = \partial^2/\partial y^2$ and $\Delta_z = \partial^2/\partial z^2$. Therefore, if we can solve for $u_1(x,t)$, $u_2(y,t)$ and $u_3(z,t)$ satisfying the prescribed boundary conditions, then we can solve for $u$.

I. To solve $u_3(z,t)$:

We want to find $u_3(z,t)$ which satisfies

$$\Delta_z u_3(z,t) = \frac{\partial}{\partial t} u_3(z,t) \quad \text{s.t.} \quad \left.\frac{\partial u_3}{\partial z}\right|_{z=0} = 0 \ \text{ and } \ u_3(-1,t) = 0 \quad \forall t > 0,$$

and $u_3(z,0)$ is a $\delta$-function:

$$\delta_0(z) = \begin{cases} 0 & \text{if } z \neq 0 \\ \infty & \text{if } z = 0 \end{cases} \quad \text{and} \quad \int_{-1}^{0} \delta_0(z)\,dz = 1.$$

The heat kernel $H(z,w,t)$ is defined as the temperature at $z$ at time $t$ when a unit of heat is placed at $w$ at $t = 0$. A well-known formula for $H(z,w,t)$ is

$$H(z,w,t) = \sum_{k=0}^{\infty} e^{-\lambda_k t}\, f_k(z)\, f_k(w),$$

where the $f_k$ are the eigenfunctions of $\Delta_z$ under the given boundary conditions, $\Delta_z f_k = -\lambda_k f_k$ with $\lambda_k \ge 0$, normalized s.t.

$$\int_{-1}^{0} f_k(z)^2\,dz = 1.$$

Therefore,

$$u_3(z,t) = H(z,0,t) = \sum_{k=0}^{\infty} e^{-\lambda_k t}\, f_k(z)\, f_k(0).$$

Let us defer the evaluation of this sum for a moment.
To compute the $f_k$, we have the following equation:

$$f_k''(z) = -\lambda_k f_k(z) \quad \text{s.t.} \quad f_k'(0) = 0, \ f_k(-1) = 0, \quad k \in \mathbb{Z}^{+} \cup \{0\}.$$

It is easy to see that $\{f_k\}_{k=0}^{\infty}$ should be

$$\left\{ \sqrt{2}\cos\!\left(\frac{\pi}{2} z\right),\ \sqrt{2}\cos\!\left(\frac{3\pi}{2} z\right),\ \sqrt{2}\cos\!\left(\frac{5\pi}{2} z\right),\ \ldots,\ \sqrt{2}\cos\!\left(\frac{2k+1}{2}\pi z\right),\ \ldots \right\}$$

with eigenvalues

$$\lambda_k = \frac{\pi^2}{4},\ \frac{9\pi^2}{4},\ \frac{25\pi^2}{4},\ \ldots,\ \frac{(2k+1)^2\pi^2}{4},\ \ldots$$

Therefore,

$$u_3(z,t) = 2 \sum_{k=0}^{\infty} \exp\!\left(-\frac{(2k+1)^2\pi^2}{4}\, t\right) \cos\!\left(\frac{2k+1}{2}\pi z\right).$$

Similarly, $u_1(x,t)$ and $u_2(y,t)$ can be solved in this way:

$$f_k''(x) = -\lambda_k f_k(x) \quad \text{s.t.} \quad f_k'\!\left(-\tfrac{1}{2}\right) = f_k'\!\left(\tfrac{1}{2}\right) = 0.$$

It turns out that $\{f_k\}$ should be

$$\left\{ 1,\ \sqrt{2}\sin \pi x,\ \sqrt{2}\cos 2\pi x,\ \sqrt{2}\sin 3\pi x,\ \sqrt{2}\cos 4\pi x,\ \ldots,\ \sqrt{2}\sin (2k-1)\pi x,\ \sqrt{2}\cos 2k\pi x,\ \ldots \right\}$$

with eigenvalues $\{0,\ \pi^2,\ 4\pi^2,\ 9\pi^2,\ 16\pi^2,\ \ldots\}$. Since $\sin 0 = 0$, the sine terms vanish at the source point $x = 0$, and the normalized constant eigenfunction $f_0 = 1$ contributes the term 1, so we have

$$u_1(x,t) = 1 + 2 \sum_{k=1}^{\infty} e^{-4k^2\pi^2 t} \cos(2k\pi x),$$

$$u_2(y,t) = 1 + 2 \sum_{k=1}^{\infty} e^{-4k^2\pi^2 t} \cos(2k\pi y).$$

Therefore,

$$u(x,y,z,t) = 2 \left[ 1 + 2\sum_{k=1}^{\infty} e^{-4k^2\pi^2 t} \cos(2k\pi x) \right] \left[ 1 + 2\sum_{l=1}^{\infty} e^{-4l^2\pi^2 t} \cos(2l\pi y) \right] \sum_{m=0}^{\infty} \exp\!\left(-\frac{(2m+1)^2\pi^2}{4}\, t\right) \cos\!\left(\frac{2m+1}{2}\pi z\right).$$

Appendix B: Detailed Simulation Results

Figure B.1 Detailed simulation results #1
Figure B.2 Detailed simulation results #2 (axis range 1000 to 1800; signal label /E/interrupt)

Figure B.3 Detailed simulation results #3 (axis range 1900 to 2600; signal label /E/interrupt)
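The series solution of Appendix A can be checked numerically by truncating each sum. The Python sketch below is one possible way to do that and is not part of the original derivation: the function names (`u3`, `u_lateral`, `temperature`) and the truncation length `terms` are hypothetical. It assumes the box $-1/2 \le x, y \le 1/2$, $-1 \le z \le 0$ implied by the boundary conditions, and it assumes the lateral factors take the form $1 + 2\sum_{k \ge 1} e^{-4k^2\pi^2 t}\cos(2k\pi x)$, so that the constant eigenfunction enters with weight 1.

```python
import math


def u3(z, t, terms=200):
    """Truncated z-factor: adiabatic at z = 0, isothermal (u = 0) at z = -1."""
    total = 0.0
    for k in range(terms):
        lam = (2 * k + 1) ** 2 * math.pi ** 2 / 4.0  # eigenvalue (2k+1)^2 pi^2 / 4
        total += math.exp(-lam * t) * math.cos((2 * k + 1) * math.pi * z / 2.0)
    return 2.0 * total


def u_lateral(x, t, terms=200):
    """Truncated x- or y-factor: adiabatic at x = -1/2 and x = 1/2."""
    total = 1.0  # assumed weight 1 for the constant eigenfunction f_0 = 1
    for k in range(1, terms):
        lam = 4.0 * k * k * math.pi ** 2  # eigenvalue (2k)^2 pi^2
        total += 2.0 * math.exp(-lam * t) * math.cos(2 * k * math.pi * x)
    return total


def temperature(x, y, z, t, terms=200):
    """Product solution u = u1(x,t) * u2(y,t) * u3(z,t) for a unit source at the origin."""
    return u_lateral(x, t, terms) * u_lateral(y, t, terms) * u3(z, t, terms)


if __name__ == "__main__":
    # Temperature under the source and on the isothermal bottom face.
    for t in (0.01, 0.1, 1.0):
        print(t, temperature(0.0, 0.0, 0.0, t), temperature(0.0, 0.0, -1.0, t))
```

Two properties make useful sanity checks for a truncated evaluation like this: the temperature on the bottom face $z = -1$ should be zero to rounding error, and the integral of each lateral factor over $[-1/2, 1/2]$ should stay at 1 for all $t$, since no heat leaves through the adiabatic side faces. The series converges slowly as $t \to 0$, so `terms` would need to grow for very short times.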
Asset Metadata
Creator
Chiueh, Herming (author)
Core Title
A thermal management design for system-on-chip circuits and advanced computer systems
School
Graduate School
Degree
Doctor of Philosophy
Degree Program
Electrical Engineering
Publisher
University of Southern California (original), University of Southern California. Libraries (digital)
Tag
Computer Science,engineering, electronics and electrical,OAI-PMH Harvest
Language
English
Contributor
Digitized by ProQuest (provenance)
Advisor
Choma, John (committee chair), Draper, Jeffrey (committee member), Sadhal, Satwindar (committee member)
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-c16-275266
Unique identifier
UC11339958
Identifier
3094315.pdf (filename),usctheses-c16-275266 (legacy record id)
Legacy Identifier
3094315.pdf
Dmrecord
275266
Document Type
Dissertation
Rights
Chiueh, Herming
Type
texts
Source
University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions
The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the au...
Repository Name
University of Southern California Digital Library
Repository Location
USC Digital Library, University of Southern California, University Park Campus, Los Angeles, California 90089, USA