Optimizing power delivery networks in VLSI platforms
(USC Thesis Other)
OPTIMIZING POWER DELIVERY NETWORKS IN VLSI PLATFORMS

by Woojoo Lee

A Dissertation Presented to the FACULTY OF THE GRADUATE SCHOOL, UNIVERSITY OF SOUTHERN CALIFORNIA, In Partial Fulfillment of the Requirements for the Degree DOCTOR OF PHILOSOPHY (ELECTRICAL ENGINEERING)

May 2015

Copyright 2015 Woojoo Lee

Abstract

While power minimization and dynamic power management (DPM) techniques have been heavily explored to improve the power efficiency of devices inside various VLSI platforms, one critical factor is often overlooked: the power conversion efficiency of the power delivery network (PDN). The PDN is an essential part of a VLSI platform, delivering power from a power source to all devices in the platform. Modern VLSI platforms are equipped with many devices, each requiring its own supply voltage level, which is typically different from those of the other devices in the platform. For example, a smartphone platform is powered by a secondary (rechargeable) Li-Ion battery comprised of a single battery cell providing a 3.7V voltage level. This cell output voltage must be converted and regulated to different pre-determined voltage levels and distributed to various devices in the platform, such as processors, memory, display, and GPS. To support these different voltage levels, the PDN consists of multiple voltage regulators (VRs, also known as DC-DC converters), which play a pivotal role in power conversion and regulation. In practice, the VRs in the PDN inevitably dissipate power, and the power dissipated by all VRs inside a platform can add up to a considerable power loss. For example, an experiment with a modern smartphone platform shows that 25-40% of the battery power is normally dissipated in the PDN.

This dissertation focuses on the power conversion efficiency of the PDN and proposes novel methods to improve it.
First, targeting PDNs in smartphone platforms, I propose optimization methods to enhance the power conversion efficiency of each single VR in a PDN. Starting from detailed models of the VR designs, two optimization methods are presented: (i) static switch sizing (S3), which maximizes the efficiency of a VR under statistical loading profiles, and (ii) dynamic switch modulation (DSM), which achieves high efficiency under dynamically varying load conditions. To verify the efficacy of the proposed methods, a PDN characterization procedure for actual smartphone platforms is also proposed. The procedure is as follows: (i) group the modules in the smartphone platform together and use profiling to estimate their average and peak power consumption levels, and (ii) build an equivalent VR model for the power delivery path from the battery source to each group of modules and use linear regression to estimate the conversion efficiency of the corresponding equivalent VR. Experimental results demonstrate that S3 achieves a 6% enhancement in power conversion efficiency, which translates to a 19% reduction in power loss under general smartphone usage. DSM accomplishes a similar improvement under the same conditions, and it can also enhance efficiency across varying load conditions.

Next, targeting PDNs in chip multi-core processors (CMPs), I present optimization methods to enhance the power conversion efficiency of multiple VRs in a PDN. The emerging trend toward CMPs that support dynamic voltage and frequency scaling (DVFS) is driven by user requirements for high performance and low power. To overcome the limitations of conventional chip-wide DVFS and achieve the maximum possible energy saving, per-core DVFS is being enabled in recent CMP offerings. While per-core DVFS reduces the power consumed by the CMP, the power dissipated by the many VRs needed to support per-core DVFS becomes critical.
Therefore, I focus on the dynamic control of the multiple VRs in the CMP platform. Starting with a proposed platform with a reconfigurable VR-to-core power distribution network, two optimization methods are presented to maximize the system-wide energy savings: (i) reactive VR consolidation, which reconfigures the network to maximize the power conversion efficiency of the VRs under the pre-determined DVFS levels of the cores, and (ii) proactive VR consolidation, which determines new DVFS levels to maximize the total energy savings without any performance degradation. Along with the optimization methods for a PDN composed of homogeneous VRs, I also discuss a PDN with heterogeneous VRs, which is proposed to increase the benefits of VR consolidation by incorporating VRs with a larger load-current driving capability. Results from detailed simulations based on realistic experimental setups demonstrate up to 36% VR energy loss reduction and 9% total energy saving.

Then, I move to OLED display platforms and propose a method to optimize the PDN in a large OLED display platform. Dynamic voltage scaling (DVS) has proven effective in minimizing the power consumption of OLED displays while causing only minimal image distortion. This technique has been extended to zone-specific DVS, which divides the panel area into zones and applies independent DVS to each zone based on the displayed content. Zone-specific DVS has not been applied to large-area OLED displays, in part because of the high overhead of a dedicated VR for each zone and the low conversion efficiency when the load current of a VR lies outside the desirable range. To address this issue, I exploit a reconfigurable power delivery network architecture, comprised of a small number of VRs, a switch network, and an online controller, to realize fine-grained (zone-specific) DVS in large-area OLED display panels.
The proposed framework consistently achieves high power conversion efficiency and significant energy saving while preserving the image quality. The experimental results show that up to 36% power savings can be achieved in a 65" 4K Ultra high-definition OLED display by using the proposed framework.

To my parents, wife and family. Thank you for all of the unwavering love, support, encouragement and dedication.

Acknowledgements

Many thanks go to my advisor, Professor Massoud Pedram, for his unwavering support and invaluable advice on this research; his dedication to my work is very much appreciated. Other professors at USC to whom I am grateful are Professors Sandeep Gupta, Aiichiro Nakano, Murali Annavaram, Paul Bogdan and Shahin Nazarian. Thanks also to Professor Naehyuck Chang at KAIST. I would also like to thank my supportive colleagues at the SPORT lab: Yanzhi Wang, Qing Xie, Xue Lin, Di Zhu, Tiansong Cui, Shuang Cheng, Mohammad Javd Dousti, Alireza Shafaei, and Majid Ghasemi-Gol. I would like to thank my friends from SNU, Donghwa Shin, Younghyun Kim, Sangyoung Park, and Jaehyun Park.

Table of Contents

Abstract
Dedication
Acknowledgements
List of Figures
Chapter 1 Introduction
1.1 Background and Motivation
1.1.1 Power conversion efficiency of the PDN
1.1.2 VR models and characteristics
1.1.3 Related Research and Limitations
1.1.4 Contributions
1.1.5 Outline
Chapter 2 Optimizing the PDN in a smartphone platform
2.1 Introduction
2.2 VR Optimization
2.2.1 Static switch sizing (S3)
2.2.2 Dynamic switch modulation (DSM)
2.3 Power delivery network Characterization
2.3.1 Equivalent VR model
2.3.2 Module grouping and regression analysis
2.4 Experimental work
2.4.1 Experimental setup
2.4.2 Coefficient identification
2.4.3 Default widths extraction
2.4.4 Simulation results: static switch sizing
2.4.5 Simulation results: dynamic switch modulation
2.5 Summary
Chapter 3 Optimizing the PDN in a multicore platform
3.1 Introduction
3.2 Dynamic Reconfiguration of the VR-to-core network
3.2.1 Proposed multicore platform
3.2.2 Reactive VRCon
3.2.3 Proactive VRCon
3.2.4 Design considerations
3.3 Heterogeneous PDN
3.3.1 Proposed design of the heterogeneous PDN
3.3.2 VRCon for the heterogeneous PDN
3.4 Experimental work
3.4.1 Experimental setup
3.4.2 Homogeneous PDN results
3.4.3 Heterogeneous PDN results
3.5 Summary
Chapter 4 Optimizing the PDN in an OLED display platform
4.1 Introduction
4.2 OLED-DVS with the zoned, large-area OLED display platforms
4.2.1 Preliminary: OLED-DVS
4.2.2 Zoned OLED display panel
4.2.3 Buck-Boost VR characteristics
4.3 Design and Dynamic Control of PDN for OLED display platforms
4.3.1 PDN architectures in Multicore platform
4.3.2 Reconfigurable PDN for OLED displays
4.3.3 Dynamic Reconfiguration Algorithm of PDN for OLED Display platforms
4.4 Experimental Work
4.4.1 Simulation Framework
4.4.2 Simulation results
4.5 Summary
Chapter 5 Conclusion
References

List of Figures

1.1 Measured traces of the power conversion efficiency of the PDN in the Qualcomm Snapdragon MDP MSM8660 [1].
1.2 Power conversion efficiency traces: simulation result from Parsec-Streamcluster in Sniper [2] with LTC3618 [3].
1.3 Circuit diagram of a buck type inductive VR.
1.4 Simulation results of the VR efficiency and power loss for various load conditions.
1.5 Circuit diagram of a low-dropout linear regulator (LDO).
2.1 Conceptual diagram of the PDN in a smartphone platform.
2.2 Load current distributions of one core in MSM8660 and a result of the derived $f(I_{out})$.
2.3 Statistical data for the smartphone usage patterns, sourced from [4].
2.4 Circuit diagram for dynamic switch modulation.
2.5 Concept of DSM operation with two parallel-connected PMOS switches.
2.6 Simulated power conversion efficiencies by changing the widths of the PMOS switch in Figure 1.3.
2.7 Flowchart to classify $f(I_{out})$ into three different cases.
2.8 Types I and II equivalent VR models.
2.9 Conversion efficiencies for all groups.
2.10 A part of traces of total power consumption: measured data and modeled data.
2.11 Relation between the power conversion efficiency and $W$: Group 7.
2.12 The ratio of the power consumed by Camera digital to the power consumed by all the modules in Group 4.
2.13 Load current distribution of display modules in Group 7, according to the 10 brightness levels.
3.1 Diagram of the proposed multicore platform.
3.2 Example cases in which the reactive VRCon can be applied.
3.3 Design flows to determine the VRs and the number of network switches in the proposed platform. Per-core* in the figure means that a designer puts more weight on the energy saving of the VR by setting it to achieve the best efficiency in the normal operation condition of each core.
3.4 A part of the proposed platform with the heterogeneous VRs: each group has $R$ big VRs and $M-R$ little VRs.
3.5 A part of the per-core DVFS results of Barnes and Streamcluster from the Sniper simulation with 4-core setup.
3.6 Topology of 16 cores (four 4-core processors) in Sniper simulation.
3.7 VR schematic used in the SPICE simulation.
3.8 Efficiency and power loss vs. load current for LTC3816 with (a) Si4840DY, (b) Si4838DY and (c) Si4442DY.
3.9 VRCon result from Fig. 3.5.
4.1 AMOLED driver structure based on the DVS-friendly circuit [5].
4.2 An example of the IR-drop in a 4x4 zoned panel. (a) original 4K image, (b) the structure of the panel and power supply board, and (c) IR-drop values of the sub-panels.
4.3 Buck-boost VR. (a) the VR schematic and (b) the conversion efficiency vs. $I_{load}$ and $V_{out}$.
4.4 Geometrically divided OLED display panel with multiple VRs connected by the switch network. The switch network is partitioned into sub-networks.
4.5 Examples of applying OLED-DVS to 4K images in a 4x4 zoned 65" OLED display panel. The red (blue) box indicates an extreme case of a sub-panel with low (high) luminance pixels.

Chapter 1 Introduction

The power delivery network (PDN) is an essential part of a VLSI platform, delivering power from a power source to all devices in the platform. Because the PDN inevitably dissipates power, which can amount to a considerable power loss in a platform, minimizing the power loss of the PDN has become an important issue in VLSI platform design. This dissertation presents optimization methods to minimize the power loss of the PDN and hence improve its power efficiency. Both circuit-level and system-level approaches are discussed, and the proposed methods are verified on three specific VLSI platforms: smartphones, chip multi-core processors (CMPs), and OLED display platforms.
1.1 Background and Motivation

1.1.1 Power conversion efficiency of the PDN

Due to limits on the availability of the energy source in many VLSI platforms (ranging from handheld devices to portable electronics to deeply embedded devices), there has been a surge of interest in minimizing the power consumption of these platforms, and it has become a primary driver for platform designers. While power minimization and dynamic power management (DPM) techniques have been heavily explored to improve the power efficiency of devices inside various VLSI platforms, one critical factor is often overlooked: the power dissipation from the PDN.

[Figure 1.1: Measured traces of the power conversion efficiency of the PDN in the Qualcomm Snapdragon MDP MSM8660 [1]. Axes: efficiency (%) versus time (s). Annotated mean: 67.93%.]

[Figure 1.2: Power conversion efficiency traces: simulation result from Parsec-Streamcluster in Sniper [2] with LTC3618 [3]. Axes: efficiency (%) versus time (ms). Annotated means: 75.18% and 46.38%.]

Modern VLSI platforms, such as smartphones, CMPs and OLED displays, are equipped with many devices, each requiring its own supply voltage level, which is typically different from those of the other devices in the platform. For example, a smartphone platform is powered by a secondary (rechargeable) Li-Ion battery comprised of a single battery cell providing a 3.7V voltage level. This cell output voltage must be converted and regulated to different pre-determined voltage levels and distributed to various devices in the platform, such as processors, memory, display, and GPS. Consequently, the PDN consists of voltage regulators (VRs, also known as DC-DC converters), which play a pivotal role in power conversion and regulation.
In practice, the VRs in the PDN inevitably dissipate power, and the power dissipated by all VRs inside a platform can add up to a considerable power loss. For example, Figure 1.1 shows my experiment, in which 25-40% of the battery power is normally dissipated in the PDN of a modern smartphone platform [6]. Similarly, Figure 1.2 shows example traces recorded while delivering power to a core in a multicore platform, where at times more than 53% of the power is dissipated in the PDN [7, 8]. Therefore, reducing this power loss can appreciably reduce the total power consumption of the platforms.

The power conversion efficiency of a VR (simply called VR efficiency in the remainder of this dissertation) is the ratio of the power consumed by a device to the power consumed by both the device and the VR. Modern VRs exhibit high peak power conversion efficiency, but their efficiency can drop dramatically under adverse load conditions (i.e., out-of-range output current levels). In other words, a state-of-the-art VR can exhibit low conversion efficiency when there is a mismatch between the VR characteristics and its load [9, 10, 11]. This dissertation mainly focuses on this mismatch problem and presents circuit level and system level solutions to overcome it. To address the mismatch problem in detail, the following subsection discusses the VR models and characteristics.

1.1.2 VR models and characteristics

VRs are typically classified into three types according to their circuit implementation and operating principles: inductive VRs, low-dropout linear regulators (LDOs), and capacitive VRs. Inductive VRs achieve very high power conversion efficiency over a wide range of output loads. This type of VR can step up the output voltage so that it becomes higher than the input voltage (i.e., boost), step down the output voltage so that it is lower than the input voltage (i.e., buck), or both (i.e., buck-boost).
On the other hand, the output voltage of an LDO can only be lower than its input voltage. In general, LDOs offer low-noise output voltage, low area overhead, and ease of integration. However, because of their low power conversion efficiency, they are normally used only to power noise-sensitive RF or analog modules in the platforms. Capacitive VRs have lower area overhead than inductive VRs and achieve better power conversion efficiency than LDOs. However, unlike inductive VRs, whose power conversion efficiency depends only on the parasitics of their components, the conversion efficiency of capacitive VRs is limited by their output resistance. Thus, it drops significantly as the conversion ratio moves away from the ideal ratio of a given topology and operating mode [12]. In this dissertation, I consider only inductive VRs and LDOs. Together, these two types of VRs can provide low-noise output voltages with high conversion efficiency to the various modules in VLSI platforms.

1.1.2.1 Inductive VR model

The buck type inductive VR consists of an inductor, a capacitor, two MOSFET (or power MOSFET) switches, and a pulse-width-modulation (PWM) controller. Note that MOSFET switches are typically used if the load current of the switches is small (i.e., less than 0.5A [13, 14]), while power FET switches are used to drive high load currents. Figure 1.3 shows the simplified schematic of the buck type inductive VR (simply called VR in the remainder of this dissertation). The PWM controller appropriately charges or discharges the output node to keep the output voltage of the VR at a desired target level. The high-frequency switching noise is rejected by the L-C filter, although a small but important portion of the noise appears as output voltage ripple. Major power losses arise from the on-resistance of the power switches and the parasitic resistance of the passive elements in the design.
In Figure 1.3, the PMOS switch is shown as sw1. Its ON-resistance and ON-state gate charge are denoted by $R_{sw1}$ and $Q_{sw1}$, respectively. Similarly, the NMOS switch, shown as sw2 in the figure, has an ON-resistance $R_{sw2}$ and a gate charge $Q_{sw2}$. The parasitic series resistances of the inductor, $L$, and the capacitor, $C$, are denoted by $R_L$ and $R_C$, respectively.

[Figure 1.3: Circuit diagram of a buck type inductive VR. Labeled elements: PWM controller, switches with $R_{sw1}$, $Q_{sw1}$, $R_{sw2}$, $Q_{sw2}$, inductor $L$ with $R_L$, capacitor $C$ with $R_C$, DC-DC converter (buck type), loads.]

Depending on the physical source of the power consumption, the VR power loss is modeled by three components: conduction loss, switching loss, and controller power consumption, denoted by $P_{conduction}$, $P_{switching}$, and $P_{controller}$, respectively [9, 10]. The power loss in the VR, $P_{inductive}$, is the sum of the three terms:

\[
\begin{aligned}
P_{inductive} &= P_{conduction} + P_{switching} + P_{controller} && (1.1) \\
&= I_{out}^2 \left( R_L + D R_{sw1} + (1-D) R_{sw2} \right) \\
&\quad + \frac{(\Delta I)^2}{12} \left( R_L + D R_{sw1} + (1-D) R_{sw2} + R_C \right) \\
&\quad + V_{in} f_{sw} \left( Q_{sw1} + Q_{sw2} \right) + V_{in} I_{controller}, && (1.2)
\end{aligned}
\]

where the first and second terms of (1.2) account for the DC and AC conduction losses, respectively; the third and fourth terms of (1.2) are the switching loss and the controller power consumption, respectively; $I_{out}$ is the output current; $V_{in}$ and $V_{out}$ are the input and output voltages; $D$ and $(1-D)$ are the PWM duty ratios of the PMOS and NMOS switches, respectively; $f_{sw}$ is the switching frequency; $I_{controller}$ is the current used in the control logic section of the VR; and $\Delta I = (1-D) V_{out} / (L_f f_{sw})$ is the amplitude of the maximum current ripple at the inductor.

[Figure 1.4: Simulation results of the VR efficiency and power loss for various load conditions. Axes: efficiency and power loss versus load current (linear scale), divided into Region I and Region II.]
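The loss model in (1.1) and (1.2) can be evaluated numerically. The following is a minimal sketch rather than the simulation setup used in this dissertation: every component value in `EXAMPLE_PARAMS` is a hypothetical placeholder, $L_f$ is read as the inductance of the inductor $L$, and the duty ratio is approximated by the ideal buck relation $D = V_{out}/V_{in}$, which the text does not state. Efficiency is computed from the ratio definition given in Section 1.1.1.

```python
# Hypothetical component values, chosen only for illustration.
EXAMPLE_PARAMS = {
    "r_l": 0.05,           # inductor series resistance R_L (ohm)
    "r_c": 0.01,           # capacitor series resistance R_C (ohm)
    "r_sw1": 0.10,         # PMOS on-resistance R_sw1 (ohm)
    "r_sw2": 0.10,         # NMOS on-resistance R_sw2 (ohm)
    "q_sw1": 1e-9,         # PMOS gate charge Q_sw1 (C)
    "q_sw2": 1e-9,         # NMOS gate charge Q_sw2 (C)
    "l_f": 4.7e-6,         # inductance L_f (H), assumed to be that of L
    "f_sw": 1e6,           # switching frequency f_sw (Hz)
    "i_controller": 1e-3,  # controller current I_controller (A)
}


def inductive_vr_loss(i_out, v_in, v_out, p):
    """Return (P_conduction, P_switching, P_controller) per Eq. (1.1)-(1.2)."""
    duty = v_out / v_in  # assumption: ideal buck duty ratio
    r_dc = p["r_l"] + duty * p["r_sw1"] + (1.0 - duty) * p["r_sw2"]
    delta_i = (1.0 - duty) * v_out / (p["l_f"] * p["f_sw"])
    p_conduction = i_out**2 * r_dc + delta_i**2 * (r_dc + p["r_c"]) / 12.0
    p_switching = v_in * p["f_sw"] * (p["q_sw1"] + p["q_sw2"])
    p_controller = v_in * p["i_controller"]
    return p_conduction, p_switching, p_controller


def inductive_vr_efficiency(i_out, v_in, v_out, p):
    """Efficiency (%) as load power over load power plus VR loss."""
    p_out = v_out * i_out
    p_loss = sum(inductive_vr_loss(i_out, v_in, v_out, p))
    return 100.0 * p_out / (p_out + p_loss)


if __name__ == "__main__":
    for i_out in (0.001, 0.01, 0.1, 0.3, 1.0, 2.0):
        eta = inductive_vr_efficiency(i_out, 3.7, 1.2, EXAMPLE_PARAMS)
        print(f"I_out = {i_out:5.3f} A  efficiency = {eta:5.1f} %")
```

With these placeholder values, the load-independent switching and controller terms dominate at light load, while the $I_{out}^2$ conduction term dominates at heavy load, which is the qualitative split between Region I and Region II in Figure 1.4.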
Finally, the conversion efficiency of an inductive buck VR, $\eta_{inductive}$, can be written as:

\[
\eta_{inductive} = \frac{P_{out}}{P_{in}} = \frac{V_{out} I_{out}}{V_{out} I_{out} + P_{inductive}} \times 100\ (\%) \qquad (1.3)
\]

Based on the VR schematic in Figure 1.3 and parameters extracted from the 45nm BSIM4 predictive technology model (PTM) for bulk CMOS [15], the VR efficiency is simulated over a range of output currents, as shown in Figure 1.4. The load currents in the figure are conceptually divided into two regions to show that the main sources of the VR power loss are $P_{switching}$ and $P_{controller}$ in Region I, and $P_{conduction}$ in Region II. While Region II shows relatively high efficiency, the efficiency in Region I drops dramatically under adverse output current conditions. From (1.2), the power losses due to the PMOS switch, $P_{pmos}$, and the NMOS switch, $P_{nmos}$, may be expressed as:

\[
P_{pmos} = C_{ox} W_p L_{min} \frac{m}{m-1} V_{in}^2 f_{sw} + \frac{D\, I_{out}^2}{\mu_p C_{ox} \frac{W_p}{L_{min}} \left( V_{in} - |V_{pth}| \right)}, \qquad (1.4)
\]

\[
P_{nmos} = C_{ox} W_n L_{min} \frac{m}{m-1} V_{in}^2 f_{sw} + \frac{(1-D)\, I_{out}^2}{\mu_n C_{ox} \frac{W_n}{L_{min}} \left( V_{in} - V_{nth} \right)}. \qquad (1.5)
\]

In (1.4) and (1.5), $C_{ox}$ is the gate capacitance per unit area. $W_p$ is the gate width of the PMOS power FET, and $W_n$ is the gate width of the NMOS power FET. $L_{min}$ is the minimum gate length of the given technology. $\mu_p$ is the hole mobility in the PMOS device, and $\mu_n$ is the electron mobility in the NMOS device. $V_{pth}$ and $V_{nth}$ are the threshold voltages of the PMOS and NMOS devices, respectively. $m$ is the tapering factor for the (super buffer-like) gate driver of the power FETs.

The output ripple of the VR, $\Delta V$, is strictly limited by the normal operating conditions of the processor. Typically, $\Delta V$ must be less than 10% of the nominal output level. The PWM frequency, $f_{sw}$, and the values of the passive components $L$ and $C$ significantly affect the magnitude of $\Delta V$.
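Returning briefly to the switch loss expression (1.4), one consequence can be worked out directly. This is a sketch under the assumption that every quantity other than $W_p$ is held fixed; the shorthand $a_p$ and $b_p$ is introduced here for illustration only. The first term of (1.4) grows linearly with the width and the second falls inversely with it, so a loss-minimizing width exists:

```latex
P_{pmos}(W_p) = a_p W_p + \frac{b_p}{W_p},
\qquad
a_p = C_{ox} L_{min} \frac{m}{m-1} V_{in}^2 f_{sw},
\qquad
b_p = \frac{D\, I_{out}^2 L_{min}}{\mu_p C_{ox} \left( V_{in} - |V_{pth}| \right)}

\frac{dP_{pmos}}{dW_p} = a_p - \frac{b_p}{W_p^2} = 0
\;\Longrightarrow\;
W_p^{*} = \sqrt{\frac{b_p}{a_p}}
= \frac{I_{out}}{C_{ox} V_{in}}
\sqrt{\frac{D\,(m-1)}{m\, \mu_p f_{sw} \left( V_{in} - |V_{pth}| \right)}},
\qquad
P_{pmos}(W_p^{*}) = 2\sqrt{a_p b_p}
```

Under this assumption the minimizing width is proportional to $I_{out}$, so a width chosen for one load current is no longer the minimizer at another. The same steps apply to (1.5) with $(1-D)$, $\mu_n$ and $V_{nth}$ in place of $D$, $\mu_p$ and $|V_{pth}|$.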
Using the same notation as in the previous subsection, $\Delta V$ may be expressed as [16]:

\[
\Delta V = \frac{\left( V_{out} + V_{sw2} + V_L \right) \left( 1 - \dfrac{V_{out} + V_{sw2} + V_L}{V_{in} - V_{sw1} + V_{sw2}} \right)}{8 L C f_{sw}^2}, \qquad (1.6)
\]

where $V_{sw1}$, $V_{sw2}$, and $V_L$ are the voltage drops across sw1, sw2, and $L$, respectively. According to (1.4), (1.5) and (1.6), the higher $f_{sw}$ is, the smaller $\Delta V$ is, but the power dissipation $P_{switching}$ goes up. On the other hand, a smaller value of $f_{sw}$ gives rise to a need for a bigger $L$ or $C$ in order to meet the specified $\Delta V$ requirement.

So far, the buck type inductive VR model has been introduced; it will be used in Chapters 2 and 3. In Chapter 4, I will use the boost type inductive VR, whose model will be introduced in detail in Section 4.2.3.

1.1.2.2 LDO model

A typical LDO consists of an error amplifier, a pass transistor, and a feedback resistor network. The power loss of the LDO, denoted by $P_{LDO}$, is given by:

\[
P_{LDO} = I_{out} \left( V_{in} - \gamma V_{ref} \right) + I_q V_{in}, \qquad (1.7)
\]

[Figure 1.5: Circuit diagram of a low-dropout linear regulator (LDO). Labeled elements: amplifier, $V_{ref}$, $R_1$, $R_2$, loads.]

where $V_{ref}$ is the reference voltage in the error amplifier; $\gamma = (R_1 + R_2)/R_2$ corresponds to the voltage divider's gain coefficient; and $I_q$ denotes the quiescent current of the LDO. Unlike the switching VR, in which the MOSFET switches dominate the total power loss, the pass transistor in the LDO has a negligible impact on its total power loss [9]. Therefore, the power loss due to the internal resistance of the pass transistor does not need to be explicitly accounted for in the model. Thus, the conversion efficiency of the LDO, $\eta_{LDO}$, may be expressed as:

\[
\eta_{LDO} = \frac{V_{out} I_{out}}{V_{in} I_{in}} = \frac{\gamma V_{ref} I_{out}}{V_{in} \left( I_{out} + I_q \right)} \qquad (1.8)
\]

1.1.3 Related Research and Limitations

As mentioned above, state-of-the-art (inductive) VRs can exhibit low conversion efficiency when there is a mismatch between the VR characteristics and the load.
To tackle this drawback, a few approaches have been proposed at both the circuit level and the system level.

1.1.3.1 Circuit level approach

A few recent papers have studied the components of the VR in order to improve the efficiency of a single VR [9, 11, 17, 14, 18]. Reference [9] showed that sizing the MOSFET switches inside the VR can control the peak efficiency region of the VR. Reference [11] derived the optimal sizes of the switches for a given load condition in a multicore platform. However, these works do not provide a detailed analysis for deriving the optimal switch sizes (i.e., they do not provide an analytical model). Furthermore, static switch sizing is applicable only when the load condition is given a priori. Any fixed sizing solution tends to result in suboptimal VR conversion efficiency under dynamically changing load conditions, which may be very different from the one for which the static sizing solution was originally obtained. Using multiple parallel MOSFET switches in the VR design has been presented in [17, 14]. These methods successfully widen the high-efficiency region of the VR. However, the different gate voltages needed for each switch set in [17] require additional VRs, which tends to cause area and control overheads. Furthermore, the number of switches (which was fixed to three in [17, 14]) and their sizes should be determined judiciously in order to achieve the maximum efficiency under given design specifications (i.e., for the possible ranges of the load currents of the various devices in a target platform). Meanwhile, a few works have focused on an alternative operating mode, such as pulse frequency modulation (PFM), that can be added to compensate for the degraded efficiency [14, 18]. Although the PFM mode mitigates the sharp efficiency drop in the low current region, its efficiency is typically lower than that of the PWM mode in the normal current region.
The design and control complexity of the VR also increases when switching between these two modes must be supported.

1.1.3.2 System level approach

In contrast to the circuit level approach above, little attention has been paid to the question of how to improve the efficiency of the PDN through system-level optimizations, although a few papers have explored VRs from a system perspective [9, 19, 20]. A DVFS policy that is aware of the VR efficiency characteristics was addressed in [9]. The optimal frequency of a core was derived to minimize the total energy consumption of both the core and the VR. However, reference [9] only accounts for a single-processor platform equipped with a single VR; there is still large potential to save more power in multi-core, multi-VR systems. In [19], the potential for energy saving in a CMP platform using per-core DVFS and fast transient responses of VRs was presented. To determine the optimal DVFS levels for each core, an offline algorithm based on integer linear programming (ILP) was proposed. However, this approach does not consider the power dissipated by the large number of VRs that per-core DVFS requires. Meanwhile, to tackle this drawback of per-core DVFS in a CMP platform, an offline approach that clusters cores onto the same voltage rail has been suggested [20]. K-means clustering was used to group cores that have similar DVFS levels, so as to reduce the number of VRs required in the system. However, reducing the VRs to a fixed number sacrifices part of the benefit of per-core DVFS, and may not guarantee energy saving in the VRs under dynamically changing workloads. In addition, clustering cores with similar voltage/frequency behavior may not be applicable to multi-threaded applications, where locking and synchronization issues must be carefully accounted for [21, 22].
For example, a delayed thread of an application on a clustered core may have to lock the other threads for synchronization, which can significantly delay the application.

1.1.4 Contributions

This dissertation covers both circuit level and system level approaches. First, two optimization methods are presented that exploit the sizes of the MOSFET switches inside a VR and modify them to improve the power conversion efficiency of the VR. These proposed methods are verified on a modern smartphone platform, along with a proposed procedure to characterize the target PDN. Then, two system level optimization methods are presented, which target a CMP that enables per-core DVFS. With a proposed CMP architecture equipped with a reconfigurable PDN, the proposed methods effectively reduce the power loss of the VRs. Next, the target platform is moved to an OLED display platform. I explore the PDN in a large, DVS-enabled OLED display, which also inevitably requires multiple VRs that induce a considerable amount of power dissipation. An architecture similar to the proposed reconfigurable PDN in the CMP, but more elaborate, is presented for the OLED display, along with a specific optimization algorithm.

1.1.4.1 Optimizing the PDN in a smartphone platform

This work first presents two optimization methods to minimize the power loss due to the VRs according to the load conditions. First, I propose a static switch sizing (S3) method. The objective is to statically perform optimal sizing of the output stage drivers of the VR (i.e., the power MOSFET switches) at design time, according to statistical information about the load behavior. This work takes a similar approach to reference [11] (this work was published concurrently with reference [11]), but provides the details of the optimization method, including PDN characterization and statistical analysis of the load conditions.
Next, I extend the multi-switching scheme to adaptively turn on/off the switches inside the VR, depending on the required amount of load current. This method, called dynamic switch modulation (DSM), enables dynamic control of the VR so as to minimize its power loss under dynamically changing load conditions. This dissertation provides sophisticated control policies for the multiple switches as well as design optimization algorithms to find the number of switches and their optimum sizes.

To apply the proposed optimization methods to the actual smartphone platform, I perform the PDN characterization. This work proposes a characterization procedure based on i) development of an equivalent VR model, ii) module grouping, and iii) linear regression. The proposed equivalent VR model can effectively model different types of VRs and their cascade connections to represent a power delivery path from the battery cell to a collection of load devices. Each equivalent VR model has its own conversion efficiency coefficients, and I perform characterization to identify these coefficients.

Extensive experimental results are also provided. I verify the accuracy of the power conversion efficiency characterization with real measurement data. The results point to the fact that the power conversion efficiency of the target smartphone platform is quite low. Next, the load current profiles for each module in the smartphone platform are collected. Finally, I apply the two proposed optimization methods (i.e., S3 and DSM) to ensure that the VRs operate at the most energy-efficient points. The experimental results demonstrate that S3 achieves a 6% overall efficiency enhancement, which translates to a 19% power loss reduction for the general smartphone usage pattern. The results of DSM show that it can achieve efficiency enhancements as high as those of S3. Furthermore, DSM can deliver the efficiency enhancement over the whole load current range.
1.1.4.2 Optimizing the PDN in a multicore platform

This work starts from the concept of combining cores that operate at the same voltage level and draw relatively small load currents, so that they are powered by a single VR. This approach can significantly reduce the VR power loss in the multi-core processor platform for the following two reasons: (i) the VR used to power multiple cores carries a relatively high load current and thus has higher efficiency according to the VR characteristics, and (ii) the VRs that are not used can be turned off to save power.

Based on this concept of VR consolidation, I propose a new design of the multi-core platform, which includes (multiple) sets of network switches to reconfigure the PDN. I then present two optimization methods to minimize the VR power loss and maximize the total energy saving. I first propose a reactive method that configures the PDN based on the sensed voltage/current level of each core. I then present a proactive method that decides the optimal voltage/frequency level of each core while maximizing the consolidation opportunities of the VRs, in order to minimize the energy consumption of the whole system. Along with the optimization methods for the PDN composed of homogeneous VRs, I will also discuss the PDN with heterogeneous VRs as ongoing work, which is proposed to increase the benefits of VR consolidation by including VRs with a larger load-current driving capability. I will also provide a detailed discussion of the design considerations for PDNs.

I validate the proposed methods on various applications from the PARSEC and SPLASH2 benchmark suites. I perform detailed multi-core processor simulation using the modified Sniper simulator [4], and SPICE circuit simulation with a commercial VR carefully selected for fair evaluation. Results demonstrate up to 35% VR energy loss reduction and 14% total energy saving.
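The two reasons given for VR consolidation can be illustrated with a minimal Python sketch. It assumes a quadratic loss model for an active VR and assumes that a turned-off VR dissipates nothing; the coefficients `a`, `b`, `c` and the core currents are hypothetical values chosen only for illustration, not measurements from this work.

```python
def vr_loss(i_out, a=0.5, b=0.02, c=0.05):
    """Loss (W) of one active VR under an assumed quadratic model a*I^2 + b*I + c."""
    return a * i_out ** 2 + b * i_out + c


def separate_loss(core_currents):
    """Every core keeps its own VR switched on."""
    return sum(vr_loss(i) for i in core_currents)


def consolidated_loss(core_currents):
    """One VR drives all cores; the unused VRs are off and assumed lossless."""
    return vr_loss(sum(core_currents))


light = [0.1, 0.1]   # amperes, hypothetical lightly loaded cores
heavy = [0.5, 0.5]   # amperes, hypothetical heavily loaded cores
for name, load in (("light", light), ("heavy", heavy)):
    print(name, round(separate_loss(load), 3), round(consolidated_loss(load), 3))
```

Under these assumed coefficients, consolidation lowers the total loss for the lightly loaded cores, because the fixed term `c` is paid only once, but raises it for the heavily loaded cores, where the quadratic term dominates. This is why the consolidation decision has to follow the actual load of each core.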
1.1.4.3 Optimizing the PDN in an OLED display platform

In this work, I present a power delivery architecture based on a reconfigurable switch network to maximize the efficacy of the DVS method in a large OLED panel with minimum overhead from the multiple VRs. The proposed reconfigurable PDN uses the minimum number of VRs while exploiting their full potential. Similar to the proposed reconfigurable PDN for the multicore platform, the basic concept of the proposed PDN is that grouping some sub-panels to be powered by a single VR can reduce the VR power loss significantly. For example, if the sub-panels that draw relatively small load currents are grouped together, the single VR carries a relatively high load current. Due to the characteristics of the VR efficiency, the VR may then operate at higher efficiency.

Of course, when grouping the sub-panels, I should also take into account the power consumption of the sub-panels and the power losses induced by IR drops. Therefore, I also propose an optimization algorithm to control the proposed PDN so as to minimize the power consumption of the whole system. This algorithm optimally divides the sub-panels into several groups and performs group-level DVS.

I validate the proposed methods on an AMOLED panel model that I develop for realistic experiments. I target a 65-inch TV platform that supports 4K UHD (4096 x 2160) resolution. I perform detailed simulations on the target platform with a commercial VR carefully selected for fair evaluation. Results demonstrate that up to 36% power savings can be achieved while satisfying IR drop and image quality constraints.

1.1.5 Outline

In this dissertation, I started by explaining the background and motivations for optimizing the PDN, and briefly mentioned my contributions on this topic. In the subsequent three chapters, I present detailed explanations of the key research tasks. In Chapter 2, I will introduce the optimization methods for the smartphone platform.
In Chapter 3, I will present the optimization methods for the multicore platform. In Chapter 4, I will focus on the PDN in the OLED display platform, and will present a specific optimization method. Finally, in Chapter 5, I will summarize the work and conclude this dissertation.

Chapter 2

Optimizing the PDN in a smartphone platform

2.1 Introduction

Growing demand for increased smartphone functionality and the need to support all kinds of popular applications on the smartphone have been driving the trend toward including many high-performance modules (such as high-speed processors, fast wireless interfaces, large and high-resolution displays, and sophisticated sensors) on the smartphone platform. The usability of smartphones has, however, been hampered by their low service time between successive charging operations. This is because the electrical energy storage density of modern batteries has been advancing at a relatively low pace compared to the rate at which functional and performance improvements have been made to the smartphone platform and components. The latter, however, comes at the expense of increased power consumption in the smartphone platform.

Consequently, there has been a surge of interest in reducing the power consumption of the smartphone platform. Some recent works have focused on developing power macro models for the modules in smartphone platforms [23, 24, 25, 26]. Similarly, dynamic power management (DPM) techniques [27, 28] have been widely investigated and employed in various platforms, including smartphones.

Figure 2.1: Conceptual diagram of the PDN in a smartphone platform. (A 3.7 V battery feeds DC-DC buck converters, a DC-DC buck/boost converter, and LDOs, which supply the CPU core at 0.8-1.225 V, DRAM VDD at 1.2 V, audio codec IO at 1.8 V, camera analog at 2.85 V, and display backlight at 3.8 V.)
While power modeling and DPM in smartphone platforms have been heavily investigated, there is one critical factor that has often been overlooked, and that is the power conversion efficiency of the PDN in smartphones. The PDN provides the battery power to all the modules. The conceptual diagram of the PDN in Figure 2.1 shows that it consists of VRs. In reality, VRs in the PDN of a smartphone inevitably dissipate power, and the power dissipation from all VRs inside the platform can result in a considerable amount of power loss. Given that the overall power efficiency of the PDN is the ratio of the power consumed by all the smartphone modules to the power drawn from the smartphone battery, Figure 1.1 shows that the overall power efficiency of a real smartphone platform is around 60% to 75%. Improving the power conversion efficiency can thus ensure appreciably longer battery life. In this chapter, I focus on the power conversion efficiency in the smartphone platform and introduce an optimization procedure for improving it.

2.2 VR Optimization

Optimizing VRs has the goal of reducing the power losses without incurring any performance degradation. This is because, unlike typical low-power design techniques that often exploit a trade-off between performance, service quality, and power efficiency, the VR optimization technique does not shut off or slow down the overall system.

Enhancing the overall efficiency of a VR can greatly increase the overall system power efficiency [29, 30]. VRs show very high overall efficiency under desirable operating conditions. However, their efficiency can be low if they are operating outside the recommended range of input and output voltages and load currents [9, 10]. Therefore, ensuring that each VR in the system is operating under the desirable operating conditions is an effective way of improving the system power efficiency.
For example, reference [31] presents a dynamic programming-based approach to design the structure of the PDN in a system while at the same time selecting the 'most suitable' VR or LDO for each node of the PDN. Reference [32] proposes the concept of parallel connections of high-frequency VRs for distributed energy storage systems. In contrast, the present chapter starts with a fixed conversion tree structure, but performs MOSFET switch reconfiguration based on the load current demands and VR characteristics, so as to improve the overall power conversion efficiency in a smartphone platform.

2.2.1 Static switch sizing (S3)

The gate widths of the switches have a substantial impact on the efficiency of the VR. From (1.4) and (1.5), $P_{pmos}$ and $P_{nmos}$ are convex functions of the gate width. A smaller gate width reduces the switching loss but increases the conduction loss, and vice versa for a larger gate width. For a given $I_{out}$, the function to find the optimum PMOS gate width is thus obtained by solving $dP_{pmos}/dW_p = 0$ [13, 14]:

$$W_{p,opt}(I_{out}) = \frac{I_{out}}{C_{ox} V_{in}} \sqrt{\frac{D\,(m-1)}{\mu_p \left(V_{in} - |V_{pth}|\right) f_{sw}\, m}} \tag{2.1}$$

The function to find the optimum NMOS gate width is derived in a similar manner, and its expression is as follows:

$$W_{n,opt}(I_{out}) = \frac{I_{out}}{C_{ox} V_{in}} \sqrt{\frac{(1-D)\,(m-1)}{\mu_n \left(V_{in} - V_{nth}\right) f_{sw}\, m}} \tag{2.2}$$

It is important that the optimum gate widths obtained from (2.1) and (2.2) satisfy a design constraint whereby the resulting output ripple of the VR, $\Delta V$, is less than its allowed limit. As described in (1.6), changing the switch sizes can affect $\Delta V$. If the derived optimum switch sizes violate the $\Delta V$ constraint, I will increase $L$ or $C$ for the VR. Finally, the power loss of the VR in (4.4) is recalculated to ensure that the overall transistor sizing plus the potential change to $L$ and $C$ reduces the net power loss. For reference, my experimental work in this chapter shows that the worst-case $\Delta V$ increment from the default switch sizes to the optimum switch sizes is 14%.
In other words, if $\Delta V$ for the default switch sizes is 5%, then the resulting $\Delta V$ should be less than 5.7% (i.e., 5 + 5 × 0.14). I thus assume that the $\Delta V$ changes are small enough to satisfy the voltage ripple constraints. Detailed results of the $\Delta V$ increment are presented in Section 2.4.4.

In (2.1) and (2.2), the optimum gate widths are derived for a fixed output current, $I_{out}$. However, $I_{out}$ in the smartphone differs depending on its usage pattern. Therefore, the goal here is to find the optimum gate widths such that the high-conversion-efficiency operating conditions of the VR match the current distribution that is produced by the actual usage profile of common smartphone applications. The optimization objective is thus to maximize the overall conversion efficiency of the smartphone based on its typical (expected) daily usage. Treating the total current used in the smartphone as a continuous random variable, I denote its probability density function by $f(I_{out})$.

Figure 2.2: Load current distributions of one core in the MSM8660 and a result of the derived $f(I_{out})$.

Because there are many mobile use cases generating various $I_{out}$ distributions, finding a general case of $f(I_{out})$ is challenging. I propose a method that utilizes statistical data on mobile device usage patterns and measured data from running mobile applications as benchmarks, as detailed next. First, I obtain a fine-grained classification of diverse mobile use cases. Next, I find mobile applications representing each distinct class of use cases. I perform extensive measurements of the output currents of the VRs in the smartphone platform while different applications are running. In addition, to derive the correct probability distribution $f(I_{out})$, I acquire the average runtime of each class of use cases (applications) from the previous studies published in [33, 4, 34].

Figure 2.2 shows example results of the derived $f(I_{out})$ distribution for a processor core in the Qualcomm MDP.
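The construction of $f(I_{out})$ described above can be sketched in Python as a usage-weighted mixture of per-class current histograms. This is only a minimal sketch under stated assumptions: the function names, the bin width, the current samples, and the usage shares below are hypothetical placeholders, not data from the measurements in this chapter.

```python
from collections import defaultdict
from math import sqrt


def load_current_pdf(samples_by_class, usage_share, bin_width):
    """Return {bin_center: probability} as a usage-weighted mixture of histograms."""
    total_share = sum(usage_share[c] for c in samples_by_class)
    pdf = defaultdict(float)
    for cls, samples in samples_by_class.items():
        weight = usage_share[cls] / total_share   # normalise the class shares
        for i_out in samples:
            center = (int(i_out // bin_width) + 0.5) * bin_width
            pdf[center] += weight / len(samples)  # each sample carries equal mass
    return dict(pdf)


def rms_current(pdf):
    """Square root of E[I_out^2] under the discrete approximation of f(I_out)."""
    return sqrt(sum(p * i ** 2 for i, p in pdf.items()))


# Hypothetical measurements (mA) and usage shares; not data from the dissertation.
samples = {"communication": [40, 55, 60], "media": [150, 170], "games": [210, 230, 220]}
share = {"communication": 0.6, "media": 0.3, "games": 0.1}
pdf = load_current_pdf(samples, share, bin_width=20)
print(sorted(pdf.items()), rms_current(pdf))
```

The root-mean-square current is computed here because the expected optimum widths derived from $f(I_{out})$ depend on the distribution only through the integral of $I_{out}^2 f(I_{out})$.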
To derive $f(I_{out})$, I ran 10 representative mobile applications ('Call', 'Facebook', 'Skype-videochat', 'Clock', 'Camera', 'GoogleMap', 'Neocore', 'SMS', 'System setting', and 'Youtube') on the MDP. Next, I classified the 10 applications into the seven classes presented in [4]: i) communication (contains 'SMS', 'Call', and 'Skype-videochat'), ii) browsing (contains 'Web browsing'), iii) media (contains 'Camera' and 'Youtube'), iv) productivity (contains 'Clock'), v) system (contains 'System setting'), vi) games (contains 'Neocore'), and vii) maps (contains 'GoogleMap'). I determined the average usage time of each class of applications based on the statistical data for mobile device usage patterns [4]. As shown in Figure 2.3, the reference introduced two representative smartphone use patterns (Pattern I and II), each of which has its own proportions of usage time for the aforesaid application classes.

Figure 2.3: Statistical data for the smartphone usage patterns, sourced from [4]. (a) Pattern I: communication 44%, productivity 19%, other 11%, browsing 10%, system 5%, maps 5%, media 5%, games 2%. (b) Pattern II: communication 49%, other 15%, browsing 12%, games 10%, media 9%, productivity 2%, maps 2%, system 1%.

From the derived $f(I_{out})$, (2.1) is modified to find the expected value of the optimum PMOS width from S3 ($W_{p,S3}$):

$$W_{p,S3} = \frac{\sqrt{\int I_{out}^2 f(I_{out})\, dI_{out}}}{C_{ox} V_{in}} \sqrt{\frac{D\,(m-1)}{\mu_p \left(V_{in} - |V_{pth}|\right) f_{sw}\, m}} \tag{2.3}$$

Similarly, the expected value of the optimum NMOS width from S3 ($W_{n,S3}$) can be calculated as follows:

$$W_{n,S3} = \frac{\sqrt{\int I_{out}^2 f(I_{out})\, dI_{out}}}{C_{ox} V_{in}} \sqrt{\frac{(1-D)\,(m-1)}{\mu_n \left(V_{in} - V_{nth}\right) f_{sw}\, m}} \tag{2.4}$$

Figure 2.4: Circuit diagram for dynamic switch modulation. (A PWM controller and gate drivers control $N$ parallel pairs of power MOSFETs, $W_{p1}, W_{p2}, \ldots, W_{pN}$ and $W_{n1}, W_{n2}, \ldots, W_{nN}$, between $V_{in}$ and $V_{out}$.)

2.2.2 Dynamic switch modulation (DSM)

The S3 is only applicable when the load condition is given a priori.
Any fixed sizing solution tends to result in suboptimal conversion efficiency under dynamically changing load conditions, which may be very different from the one for which the static sizing solution was originally obtained. Furthermore, the higher the variance of the load current distribution, the weaker the guarantee of optimality of the S3 solution.

The optimum efficiency under dynamically changing load conditions can be obtained by adaptively turning on or off some of the multiple parallel-connected switches [17, 14]. However, the different gate voltages needed for each switch set in [17] require additional VRs, which tends to cause area and control overheads. Furthermore, the number of switches (which was fixed to three in [17, 14]) and their sizes should be determined judiciously in order to achieve the maximum efficiency under the given design specifications (i.e., for the possible ranges of the load currents of various smartphone modules). My proposed approach is an extension of the multiple switch scheme, which I call dynamic switch modulation (DSM). The task is to find the optimum number of parallel-connected output driver switches, their sizes, and their on/off conditions under dynamically varying load conditions.

Figure 2.5: Concept of DSM operation with two parallel-connected PMOS switches. (Efficiency (%) versus $I_{out}$ for the effective widths $W_{eff,p,1} = W_{p1}$, $W_{eff,p,2} = W_{p2}$, and $W_{eff,p,3} = W_{p1} + W_{p2}$, with boundaries $I_{bd,p,1}$ and $I_{bd,p,2}$; the selected curve follows the highest efficiency in each range.)

Figure 2.4 shows a simple schematic drawing of the 'load-adaptive' VR. There are $N$ pairs of switches connected in parallel. These switches are arranged such that the first switch has the minimum width (denoted by $W_{p1}$ and $W_{n1}$), and the last switch has the maximum width (denoted by $W_{pN}$ and $W_{nN}$).
The maximum effective width (i.e., the sum of the widths of all parallel-connected FETs of the same type) is large enough to support the maximum output current, $I_{out,max}$. For a smaller $I_{out}$ value, some of the NMOS and PMOS switches are turned off. Depending on the $I_{out}$ value, a different on/off combination of the switches can be used to achieve the maximum conversion efficiency (which is equivalent to minimizing $P_{pmos}$ and $P_{nmos}$). I denote the effective width of the turned-on switch combination as $W_{eff,type,i}$, where $type$ is the switch type, i.e., $p$ (PMOS) or $n$ (NMOS), and $i$ denotes the $i$-th smallest effective width for the switch configuration (among all possible combinations of the same type of switch).

Figure 2.5 is an example of DSM on a VR using two parallel-connected PMOS switches, which can independently be turned on or off at any time. The two PMOS switches give rise to three effective widths for the PMOS switch, $W_{eff,p,1}$, $W_{eff,p,2}$, and $W_{eff,p,3}$. Consequently, the output current range is divided into three operation ranges. The result of DSM in the figure, identified as a thick (red) line, shows that the maximum efficiency in each output current range is achieved by adaptively turning on the appropriate combination of the two PMOS switches. It then follows that, for each output current range, the optimum switch combination must be found.

Note that the output current range can be divided into a larger number of bins by increasing the number of parallel-connected switches of the same type. A larger bin count greatly increases the flexibility to achieve high efficiency over a wider range of output current values. However, the increased area and power consumption due to the higher complexity of the control circuitry is an important consideration in determining the optimal number of switches ($N$). To determine the optimum $N$ and the size of each switch, I first investigate and determine the maximum and minimum effective widths of each type of switch.
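The DSM selection described above can be sketched in Python as follows. This is a minimal sketch, assuming a simplified loss model in which the switching loss grows linearly with the effective width and the conduction loss is inversely proportional to it; the lumped coefficients `alpha` and `beta`, the duty ratio, and the switch widths in the usage lines are hypothetical, normalized values.

```python
from itertools import combinations
from math import sqrt


def effective_widths(switch_widths):
    """All distinct sums over non-empty subsets, i.e. up to 2^N - 1 effective widths."""
    sums = {sum(combo)
            for r in range(1, len(switch_widths) + 1)
            for combo in combinations(switch_widths, r)}
    return sorted(sums)


def switch_loss(width, i_out, alpha, beta, duty):
    """Assumed loss model: switching term grows with width, conduction term shrinks."""
    return alpha * width + duty * i_out ** 2 / (beta * width)


def best_width(switch_widths, i_out, alpha, beta, duty):
    """Effective width with the lowest modelled loss at the given output current."""
    return min(effective_widths(switch_widths),
               key=lambda w: switch_loss(w, i_out, alpha, beta, duty))


def crossover_current(w_small, w_large, alpha, beta, duty):
    """Current at which two widths give equal loss under the model above."""
    return sqrt(alpha * beta * w_small * w_large / duty)


# Two switches with normalized widths 1 and 2 give effective widths 1, 2 and 3.
for i_out in (0.5, 2.5, 10.0):
    print(i_out, best_width([1, 2], i_out, alpha=1.0, beta=1.0, duty=0.5))
```

With two switches, the three effective widths split the current axis into three ranges, and the crossover currents mark where the preferred on/off combination changes, which mirrors the three operation ranges of Figure 2.5.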
For the maximum effective width, I use the constraint that it should be large enough to drive $I_{out,max}$. Therefore, the maximum effective width of the PMOS switch should satisfy the following constraint:

$$W_{eff,p,M} \ge \frac{I_{out,max}\, L_{min}}{\mu_p C_{ox} \left(V_{in} - |V_{pth}|\right)\left(V_{in} - V_{out,max} - R_L I_{out,max}\right)}, \tag{2.5}$$

where $M$ is the number of all possible switch combinations (i.e., $2^N - 1$) and $V_{out,max}$ is the maximum available output voltage of the VR. $I_{out,max}$ can be obtained from measurements or looked up from a data sheet. I determine the maximum effective width of the NMOS switches in a similar manner.

To determine the minimum size of the effective widths, I use an observation from my experimental work. Figure 2.6 shows the result of simulating the VR model in Figure 1.3 for various widths of the PMOS switch. The model parameters are determined from the 45nm BSIM4 predictive technology model (PTM) for bulk CMOS [15], with $f_{sw}$ = 330 MHz, $L$ = 6.8 nH, and $C$ = 4 nF. According to the results, shrinking the switches below a certain width does not achieve a high efficiency improvement even in the low current region. Therefore, the minimum effective widths should not be made too small.

Figure 2.6: Simulated power conversion efficiencies by changing the widths of the PMOS switch in Figure 1.3. (Efficiency (%) versus output current (mA) for widths scaled down from $W_p$ to $W_p/2$, $W_p/4$, $W_p/6$, $W_p/8$, $W_p/10$, and $W_p/12$.)

Next, I consider the boundary conditions in the output current regions. I define the $i$-th smallest boundary condition in the output current range as $I_{bd,type,i}$, where $type$ is the switch type and $i$ indexes the $i$-th smallest current value. Thus, $I_{bd,type,i}$ is the boundary condition between two consecutive switch combination regions, which have the corresponding optimum effective widths $W_{eff,type,i}$ and $W_{eff,type,i+1}$. The example with the two PMOS switches in Figure 2.5 shows that there are two boundary conditions, $I_{bd,p,1}$ and $I_{bd,p,2}$. From (1.4), equating the PMOS loss for two consecutive effective widths, the boundary condition for the PMOS switches may be calculated as:

$$I_{bd,p,i} = C_{ox} V_{in} \sqrt{\frac{\mu_p \left(V_{in} - |V_{pth}|\right) f_{sw}\, W_{eff,p,i}\, W_{eff,p,i+1}\, m}{D\,(m-1)}} \tag{2.6}$$

The boundary condition for the NMOS switches can be derived in the same way, and is expressed as:

$$I_{bd,n,i} = C_{ox} V_{in} \sqrt{\frac{\mu_n \left(V_{in} - V_{nth}\right) f_{sw}\, W_{eff,n,i}\, W_{eff,n,i+1}\, m}{(1-D)\,(m-1)}} \tag{2.7}$$

Finally, I derive the objective functions for PMOS and NMOS sizing that minimize the expected power loss of the PMOS ($P_{pmos}$) and NMOS ($P_{nmos}$) switches over the whole range of possible output current values:

$$\min \sum_{i=1}^{M-1} \int_{I_{bd,p,i}}^{I_{bd,p,i+1}} \left( \alpha W_{eff,p,i} + \frac{D\, I_{out}^2}{\beta W_{eff,p,i}} \right) f(I_{out})\, dI_{out}, \tag{2.8}$$

$$\min \sum_{i=1}^{M-1} \int_{I_{bd,n,i}}^{I_{bd,n,i+1}} \left( \alpha W_{eff,n,i} + \frac{(1-D)\, I_{out}^2}{\gamma W_{eff,n,i}} \right) f(I_{out})\, dI_{out}, \tag{2.9}$$

where $\alpha = C_{ox} L_{min} f_{sw} V_{in}^2\, m/(m-1)$, $\beta = \mu_p C_{ox} (V_{in} - |V_{pth}|)/L_{min}$, and $\gamma = \mu_n C_{ox} (V_{in} - V_{nth})/L_{min}$. $I_{bd,p,M}$ and $I_{bd,n,M}$ are equal to $I_{out,max}$, whereas $I_{bd,p,1}$ and $I_{bd,n,1}$ equal the minimum output current. $f(I_{out})$ is the load current distribution.

Solving (2.8) and (2.9) is not straightforward.
This is primarily because, as I stated before, the number of possible combinations ($M$) increases exponentially as the number of switches ($N$) grows. In addition, I also have to abide by other design considerations, e.g., limitations on the control complexity and area overhead. Therefore, $N$ should be carefully selected, i.e., it must be small enough so as not to significantly increase the control and area overheads, but large enough to enable DSM in response to varying load conditions. Even if $N$ is limited to a small number, assuming that $f(I_{out})$ follows a uniform distribution may not guarantee the optimality of the solution. This is because actual load conditions can be discretely distributed (e.g., for those modules whose ON/OFF operation is controlled by user activities, such as the camera, the SD card, and so on). I thus classify $f(I_{out})$ into three cases: discrete, continuous, and discretizable, as described in Figure 2.7. I present the heuristic solution of the switch selection and sizing problem for each case in the following subsections.

Figure 2.7: Flowchart to classify $f(I_{out})$ into three different cases. (Starting from the modules connected to the target DC-DC converter, the decision nodes ask whether at least one module has on/off operation controlled by a user, whether the operations of the on/off modules dominantly affect $f(I_{out})$, whether at least one module has multiple operation levels controlled by a user, whether the modules with multiple operation levels dominantly affect $f(I_{out})$, and whether statistical load profiles can be used. The outcomes are Discrete (Section 2.2.2.1), Continuous (Section 2.2.2.2), and Discretizable (Section 2.2.2.3).)

2.2.2.1 Discrete $f(I_{out})$

I define the state of $f(I_{out})$ as discrete when $f(I_{out})$ has (discretely) dominant load current values. For example, if a VR powers a set of modules that includes modules that can be controllably turned on/off, it may have several discrete load current values in its $f(I_{out})$.
If the discrete values are dominant in the distribution, the problem here is to select and size the switches so that the effective widths of the switches match the widths corresponding to the discrete current values, calculated by (2.1) and (2.2). According to the switch type, the calculated widths are included in a set $G_p$ (for PMOS) or $G_n$ (for NMOS). I then define cover so that 'a set $S$ covers a width $w$' means that there is an effective width configured by elements in $S$ that matches the value $w \pm \epsilon$. $\epsilon$ should be small enough. If the given design specification has enough switches ($N$) so that the effective widths can easily cover all the required widths in $G_p$ and $G_n$, then the problem can be solved straightforwardly. However, $N$ is likely to be quite small in a common design specification. With the given $N$, the problem is then to find a minimum set for each switch type ($T_p$ and $T_n$, where $|T_p|, |T_n| \le N$) that can cover the maximum number of the widths in $G_p$ and $G_n$.

Algorithm 1: To find a minimum set of the optimum widths of PMOS switches ($T_p$) under the given number of switches ($N$) and the discretized $I_{out}$

    Initialization
        define $I_{out,i}$, $N$, $\epsilon$    ▷ $I_{out,i}$ is the $i$-th discrete current value
        $W_i = W_{p,opt}(I_{out,i})$ and $G_p = \{W_1, W_2, \ldots, W_K\}$    ▷ from (2.1)
        $max \leftarrow 0$    ▷ $max$ will be updated to the maximum number of elements in $G_p$ covered by the set $T_p$ from OptP_widths

    function coverage($w$, $S$)
        if $|w| \le \epsilon$ then return 1
        for each $s \in S$ do
            if coverage($w - s$, $S \cap \{s\}^c$) = 1 then return 1
        return 0

    function OptP_widths($n$, $m$, $S$)    ▷ main function
        for $n \le i \le N$ do    ▷ $i$ is the number of switches in the set
            for $m \le j \le K$ do    ▷ to add $W_j$ into the set $S$
                $S \leftarrow S \cup \{W_j\}$, $c \leftarrow 0$
                for $1 \le k \le K$ do    ▷ to check $W_k$
                    if coverage($W_k$, $S$) = 1 then $c \leftarrow c + 1$
                    else if $|S| < i$ and $j \le k$ then
                        OptP_widths($i$, $k$, $S$)
                        $c \leftarrow c + 1$
                if $\sum_{s \in S} s \ge W_{eff,p,M}$ and $c \ge max$ then
                    $max \leftarrow c$, $T_p \leftarrow S$    ▷ to update $max$ and $T_p$
                if $max = K$ then break
        return $T_p$

Finally, I present an algorithm to solve the problem.
The function coverage in Algorithm 1 is a simple dynamic program that determines whether the current set of switches ($S$) can cover the required width ($w$). Performing OptP_widths in Algorithm 1 returns the set $T_p$ that covers the maximum number of elements in $G_p$. The optimum set for the NMOS switches can be obtained in a similar manner.

2.2.2.2 Continuous $f(I_{out})$

Some VRs power modules that have more than two operation levels, as set by the user preferences. The brightness level of the display module and the volume level of the speaker module are representative examples. If the load current of each module's operation level is known, then $f(I_{out})$ may belong to the discrete case. However, my experience with the Qualcomm MDP shows that the load current conditions of the various operation levels typically overlap. I thus cannot find discrete breakpoints in $f(I_{out})$. Furthermore, the user preference is random, so that all the load current conditions have the same probability of being chosen. I therefore define this case as continuous, and treat $f(I_{out})$ as a uniform distribution. In this case, finding a set of the effective widths ($G_p$ or $G_n$) can be formulated as a simple arithmetic progression problem of finding $M$ effective widths between the given minimum and maximum effective widths. Next, Algorithm 1 is applied to the resultant set of effective widths, so that I can find the switches that cover the maximum number of the effective widths.

2.2.2.3 Discretizable $f(I_{out})$

There are some VRs supplying power to modules that cannot be controlled by the user. In this case, I propose to use the statistical load profiles of the VR. As a result, not only can the VR deal with dynamically varying load conditions, but it is also more likely to be tuned for the actual load conditions, compiled from the typical smartphone use patterns. The way to obtain the statistical load profiles is described in Section 2.2.1.
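Since the covering test of Algorithm 1 is reused in all three cases, a minimal Python sketch of it may be helpful. Note that this sketch is not the recursive heuristic of Algorithm 1 itself: it assumes that $N$ and $K$ are small enough to enumerate candidate sets exhaustively, and the function names, the tolerance, and the widths in the usage lines are hypothetical.

```python
from itertools import combinations


def covers(switches, target, eps):
    """True if some non-empty subset of `switches` sums to within eps of `target`."""
    return any(abs(sum(combo) - target) <= eps
               for r in range(1, len(switches) + 1)
               for combo in combinations(switches, r))


def select_switches(required_widths, max_switches, min_total_width, eps):
    """Smallest switch set, drawn from the required widths, covering the most of them."""
    best_set, best_count = (), 0
    for size in range(1, max_switches + 1):
        for candidate in combinations(sorted(required_widths), size):
            if sum(candidate) < min_total_width:
                continue  # cannot drive the maximum output current
            count = sum(covers(candidate, w, eps) for w in required_widths)
            if count > best_count:  # strict: ties keep the smaller earlier set
                best_set, best_count = candidate, count
    return best_set, best_count


# Hypothetical required widths (normalized) with at most two switches.
print(select_switches([1.0, 2.0, 3.0], max_switches=2, min_total_width=3.0, eps=0.05))
```

In this example the pair of widths 1 and 2 covers all three required widths, since their sum supplies the third one, so two switches are enough. The exhaustive search grows combinatorially with the number of candidates, which is why a heuristic is preferable for larger problem sizes.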
I propose an approach that adapts K-means clustering in order to extract some discrete values from the load current values ($I_{out}$). The measured data of $I_{out}$ is initially modified to $I'_{out}$, in which the $i$-th value of $I_{out}$ is duplicated a number of times proportional to $f(I_{out})$, where the proportionality factor adjusts the weight of $f(I_{out})$. Then, $I'_{out}$ is divided evenly into $K$ parts, and the initial means of all parts are calculated. In the update procedure of the K-means clustering, the new means, set to be the centroids of the parts, are recalculated until the means converge. Finally, the sets of the resultant means for each type of switch become $G_p$ and $G_n$, respectively. Then they are applied to Algorithm 1 in order to find the minimum switch set that covers the maximum number of elements in $G_p$ and $G_n$.

2.3 Power delivery network characterization

Prior to verifying the efficacy of the proposed VR optimization methods in an actual smartphone platform, the power conversion efficiency of the PDN in the target platform should be characterized. However, the characterization is not a trivial task unless the PDN structure, the VR specifications, and all the node voltages and branch currents of the PDN are available. Such a white-box approach is generally not possible for commercial smartphone platforms.

In this chapter, I attempt a gray-box approach by introducing an equivalent VR concept. Modules in the platform are powered through the PDN, which is composed of sets of VRs, as shown in Figure 2.8. A VR set can be an empty set (direct connection), a single VR, a cascade connection of a VR and an LDO, (rarely) a cascade connection of multiple VRs, etc. The equivalent VR models the set of VRs on the path from the battery source to each module (or set of modules). In other words, the proposed equivalent VR abstraction treats the set of VRs as a single equivalent VR.
The abstraction enables a gray-box approach by which one can group modules in a smartphone platform by their required supply voltage levels, which can be obtained from datasheets. Power conversion efficiency improvement through the proposed VR optimization methods can be performed effectively once I identify the power conversion efficiency of the PDN in the smartphone platform.

2.3.1 Equivalent VR model

I classify the equivalent VR models as representing either a single VR or a cascaded connection of a VR (DC-DC switching VR) and an LDO, named Type I and Type II equivalent VRs, respectively. I assume that the battery output current flows through a voltage regulator in order to produce a constant voltage throughout the full discharge cycle of the battery. Without loss of generality, Type I and Type II equivalent VR models can represent most power conversion tree structures in the PDN [30, 31, 35]. Most digital logic components can be powered by a single VR from the battery to the module, which gives rise to the Type I VR model. A cascade connection of two or more VRs is rare, because increasing the number of cascaded VRs generally increases the cost and area overhead with little (or no) benefit in terms of the conversion efficiency. LDOs are often an indispensable component to provide low-ripple output voltage for switching-noise-sensitive RF and analog modules. It is uncommon to use a single LDO from the battery to a load device due to the required large dropout voltage and hence the loss of LDO power efficiency.

Figure 2.8: Types I and II equivalent VR models. (Type I: battery, DC-DC converter, subset of modules. Type II: battery, DC-DC converter, LDO, subset of modules. The input voltage and input current are taken at the battery side, and the output voltage and output current at the module side.)
Instead, it turns out to be more energy-efficient to first convert the battery voltage using a VR to an internal voltage slightly higher than the device voltage, and subsequently use an LDO for the final power conversion. According to (4.4) and (1.7), the power loss of the equivalent VR may be expressed as:

$$P_{eqv} = A\Big(I_q + \sum_{i=1}^{N} I_{mod,i}\Big)^2 + \alpha\gamma \sum_{i=1}^{N} I_{mod,i} + (B + \alpha\beta I_q), \tag{2.10}$$

where $N$ is the number of modules connected to the equivalent VR; $I_{mod,i}$ is the input current of the $i$-th module; parameter $A$ for the VR is given by $A = R_L + D R_{sw1} + (1 - D) R_{sw2}$; $B$ is the sum of the second, third, and last terms of (4.4); $\alpha = 0$ for Type I and $\alpha = 1$ for Type II equivalent VR; $\beta$ is the input voltage of the LDO; and $\gamma = (\beta - V_{ref} k)$. I can further simplify (2.10) by defining the output current of the equivalent VR, $I_{eqv\_out} = \sum_{i=1}^{N} I_{mod,i}$, and thus the power loss for both types of equivalent VR models can be expressed as:

$$P_{eqv} = a I_{eqv\_out}^2 + b I_{eqv\_out} + c, \tag{2.11}$$

where the coefficients $a$, $b$, and $c$ are derived from (2.10) and are largely dependent on the VR design specification, such as the power MOSFET gate width, inductor IR loss, controller loss, etc. [9]. Calculating those coefficients is the key step of the power conversion efficiency characterization.

2.3.2 Module grouping and regression analysis

Measurement (or estimation) of the output current of all the equivalent VRs makes it possible to estimate the unknown coefficients of the equivalent VR model. The input and output voltage levels of each equivalent VR can be obtained from the device datasheets. For example, the Qualcomm MDP MSM8660 [1] incorporates embedded power sensors that monitor and report the current values of each module in the platform with fine granularity. When the target platform does not provide embedded current sensors, I can estimate the module current values by activity profiling [23, 24, 25, 26].
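As a concrete illustration of the quadratic loss model (2.11) whose coefficients are to be estimated, the following minimal sketch evaluates the loss and the resulting conversion efficiency of one equivalent VR at a few output currents. The coefficient values, the output voltage, the units, and the function names are illustrative assumptions, not values from the characterization. Efficiency is taken as delivered power divided by delivered power plus loss, and the peak location $\sqrt{c/a}$ follows from minimizing the loss per unit of output current, $aI + b + c/I$.

```python
import math

def eqv_loss(i_out, a, b, c):
    """Power loss of an equivalent VR under the quadratic model (2.11)."""
    return a * i_out ** 2 + b * i_out + c

def efficiency(i_out, v_out, a, b, c):
    """Delivered power over (delivered power + VR loss); assumes i_out > 0."""
    p_out = v_out * i_out
    return p_out / (p_out + eqv_loss(i_out, a, b, c))

def peak_efficiency_current(a, c):
    """Current minimizing loss per ampere, a*I + b + c/I (requires a, c > 0)."""
    return math.sqrt(c / a)

# Hypothetical coefficients (SI units assumed: A, V, W), for illustration only.
A_COEF, B_COEF, C_COEF = 0.4, 0.2, 0.02
V_OUT = 1.2

i_star = peak_efficiency_current(A_COEF, C_COEF)
for i in (0.05, i_star, 0.5):
    eta = efficiency(i, V_OUT, A_COEF, B_COEF, C_COEF)
    print(f"I = {i:.3f} A -> efficiency = {eta:.3f}")
```

Below the peak current the constant term $c$ dominates the loss, and above it the quadratic term does, which is why the distribution of the load current matters when judging a VR design.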
Profiling various applications, which result in diverse usage patterns of the system modules, provides sufficient information and data to perform regression analysis and estimate the unknown coefficients. Linear regression analysis is a widely used method in system identification, requiring (i) a well-designed model and (ii) sufficient experimental data to extract the best-fit model coefficients. In reality, however, independent control of each module is a challenging task due to the lack of direct control knobs. For example, if I run an application that activates a camera module, currents flowing into the CPU, GPU, memory, and other associated components also ramp up and down correspondingly. I must thus apply linear regression analysis to the whole system (including all smartphone modules) simultaneously, while trying to vary the activity level of each module by running different applications. However, this method may not produce sufficient data to cover the whole range of activities for all smartphone modules, especially when the number of modules is large (e.g., the Qualcomm MDP has 27 embedded modules). This is a potential source of inaccuracy for regression analysis due to the weak training set issue.

I tackle the problem by performing a module grouping in order to reduce the number of unknown coefficients that must be determined during the characterization process. This grouping procedure reduces the burden of generating sufficient data to perform the linear regression analysis. The idea is that system modules that require the same operating voltage level can be combined into one group, and each group of modules is connected to the battery source via a single equivalent VR, as illustrated in Figure 2.8. This method matches well with low power design practices that try to minimize the number of VRs, due to their cost and internal power losses.

Table 2.1: Grouping results for Qualcomm MDP MSM8660.

| Group | Modules | Voltage |
| 1 and 2 | Group 1: CPU core0 and Group 2: CPU core1 | 0.8 - 1.225 V |
| 3 | Internal Memory, Audio DSP, and Digital core (includes GPU and modems) | 1.1 V |
| 4 | Audio codec Vdd, LPDDR2, ISM, DRAM, and Camera-digital | 1.2 V |
| 5 | Audio codec IO, IO PAD3, Display IO, DRAM Vdd1, Camera IO, PLL, and eMMC host interface | 1.8 V |
| 6 | Camera analog, Haptic, SD card, Touch screen, eMMC (Flash), IO PAD2, SD card, and Ambient light sensor | 2.85 V |
| 7 | Display memory and Display backlight | 3.8 V |

Given that the number of different voltage levels required by various modules in a smartphone platform is typically less than 10 [30, 35], the grouping procedure significantly reduces the number of parameters to be determined in linear regression. For example, the classification result of the Qualcomm MDP in Table 2.1 shows that the platform requires only seven groups although the module count is 27. Finally, the total power loss of the smartphone, $P_{loss}$, is given by:

$$P_{loss} = \sum_{k=1}^{G} P_{eqv,k} = \sum_{k=1}^{G} \big(a_k I_{eqv\_out,k}^2 + b_k I_{eqv\_out,k} + c_k\big), \tag{2.12}$$

where $G$ is the number of groups; $P_{eqv,k}$ is the power loss of the $k$-th equivalent VR corresponding to the $k$-th group of modules; $I_{eqv\_out,k}$ denotes the output current of the equivalent VR, which can be measured using embedded sensors in the Qualcomm MDP; and $a_k$, $b_k$, and $c_k$ are the coefficients of the equivalent VR model (to be determined by linear regression). I treat the battery voltage presented to the power conversion tree as being (nearly) constant, which is valid considering the function of the regulator between the battery cell/pack and the equivalent VR. Therefore, I may assume that $a_k$, $b_k$, and $c_k$ are constant values.
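The regression implied by (2.12) can be sketched as an ordinary least-squares problem. The sketch below assumes, hypothetically, that a trace of per-group output currents and the corresponding total loss are available as NumPy arrays; the synthetic data, the array shapes, and the function name `fit_equivalent_vr` are illustrative and are not taken from the measurement setup. Because the constant term of every group multiplies the same all-ones column, only the sum of the $c_k$ terms can be identified from such a fit, so the design matrix carries a single intercept column.

```python
import numpy as np

def fit_equivalent_vr(currents, p_loss):
    """Least-squares fit of P_loss = sum_k (a_k I_k^2 + b_k I_k) + c_sum.

    currents: (T, G) array of per-group output currents.
    p_loss:   (T,) array of total PDN loss samples.
    Returns (a, b, c_sum), where a and b have shape (G,).
    """
    currents = np.asarray(currents, dtype=float)
    n_samples, n_groups = currents.shape
    # Columns: squared currents, currents, and one shared intercept.
    design = np.hstack([currents ** 2, currents, np.ones((n_samples, 1))])
    coef, *_ = np.linalg.lstsq(design, np.asarray(p_loss, dtype=float), rcond=None)
    return coef[:n_groups], coef[n_groups:2 * n_groups], coef[-1]

# Synthetic, noise-free example with made-up coefficients (illustration only).
rng = np.random.default_rng(0)
true_a = np.array([0.4, 0.1, 0.2])
true_b = np.array([0.05, 0.2, 0.3])
true_c_sum = 0.06
I = rng.uniform(0.01, 0.3, size=(200, 3))
P = (I ** 2) @ true_a + I @ true_b + true_c_sum

a_hat, b_hat, c_hat = fit_equivalent_vr(I, P)
print(np.round(a_hat, 4), np.round(b_hat, 4), round(float(c_hat), 4))
```

With real traces the currents of different groups tend to rise and fall together, so the quality of the fit depends on how well the profiled applications decorrelate the columns of the design matrix.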
2.4 Experimental work

2.4.1 Experimental setup

The Qualcomm MDP MSM8660 is used as the actual smartphone platform. It runs Google Android OS 2.3 on top of a Snapdragon 1.5 GHz asynchronous dual-core CPU and is equipped with a 3D-supporting GPU, a 3.61" WVGA multi-touch screen, 1 GB of internal RAM, 16 GB of on-board flash, WiFi, Bluetooth, a GPS, dual-side cameras, etc. I perform power measurement of each module using the application profiling tool named Trepn™. Use of Trepn™ ensures higher accuracy of the measurements. Note, however, that my proposed method is independent of the measurement tools; e.g., I may use the activity profiling for power measurement provided by Google or techniques presented in the literature [23, 24, 25, 26]. The data collected from the MDP MSM8660 is then processed in MATLAB for the characterization as well as the optimization procedures.

2.4.2 Coefficient identification

As shown in Table 2.1, the Qualcomm MDP modules can be classified into seven groups based on their operating voltage levels. Some modules, such as the CPU cores in the MDP, use dynamic voltage and frequency scaling (DVFS) techniques that require a range of variable supply voltage levels. Consequently, I keep each CPU core in a separate group but treat the equivalent VRs of these groups as identical to each other. Group 7 is associated with the display, and therefore the backlight brightness level mostly determines the current demand in this group. The Group 7 coefficients are easy to identify because I can independently control the brightness of the display. In other words, I first perform the linear regression to identify the coefficients of the equivalent VR model of Group 7, separately from the other groups.

For the remaining six groups, I profile various applications and collect sufficient data for the regression analysis, as explained earlier. It is difficult to identify every $c_k$ coefficient of the $k$-th equivalent VR directly from the linear regression process.
Rather, I only extract $c_{ext}$, which corresponds to the sum of all the constant terms in (2.12), i.e., $c_{ext} = \sum_{k=1}^{G} c_k$. I find an approximate value for each $c_k$ as $c_k = c_{ext}\,(P_{group,k}/P_{group,total})$, where $P_{group,k}$ denotes the power consumption of Group $k$, and $P_{group,total}$ is the total power consumption of all the groups. The $P_{group,k}$ and $P_{group,total}$ values are available from the embedded sensors in the MDP. The extracted coefficients of the seven equivalent VRs are reported in Table 2.2. The power conversion efficiency of Group $k$, derived from $P_{group,k}/(P_{group,k} + P_{eqv,k})$, is shown in Figure 2.9.

Table 2.2: Extracted coefficients for each group.

| $k$ | $a_k$ | $b_k$ | $c_k$ |
| 1, 2 | 0.4427 | 0.0025 | 0.0170 |
| 3 | 0.4079 | 0.1742 | 0.0675 |
| 4 | 0.1152 | 0.1757 | 0.0077 |
| 5 | 0.1971 | 0.5232 | 0.0128 |
| 6 | 0.1814 | 0.2928 | 0.0320 |
| 7 | 0.4091 | 0.3871 | 0.0289 |

Figure 2.9: Conversion efficiencies for all groups. (Panels: (a) Groups 1 and 2, (b) Group 3, (c) Group 4, (d) Group 5, (e) Group 6, (f) Group 7. Each panel plots the measured current distribution, the measured efficiency (%), and the model against current (mA).)

I verify the characterization results of each equivalent VR. Figure 2.10 shows the comparison of the system power consumption trace between the real measurement, as reported by a built-in battery sensor, and the estimation obtained from my extracted equivalent VR coefficients. The trace includes 10 mobile applications, as stated in Section 2.2.1. I measure the error as a signal-to-noise ratio, and the resulting average error is 0.075. The standard deviation of the error is 0.059. The worst-case average error is 0.128 and is seen for 'Neocore' (a rare but important synchronization problem with the built-in sensor causes the extreme worst-case error here).

Figure 2.10: A part of traces of total power consumption: measured data and modeled data. (Power (W) versus time; legend: Measured, Model.)

I also run four completely new mobile benchmarks (different from the ones used for the regression analysis): 'Antutu' [36], 'Vellamo' [37], 'Quadrant' [38], and 'GLBenchmark' [39]. These benchmarks are designed to test the performance of various modules in the smartphone platform. In particular, 'Vellamo' includes HTML5 and METAL chapters to evaluate the mobile web browser performance and the mobile processors, respectively. 'GLBenchmark' and 'Antutu' include 3D tests for the GPU. 'Quadrant' performs CPU, memory, and I/O tests. Therefore, I believe these four new benchmarks are sufficient to evaluate my regression analysis. The resulting average error and standard deviation are: 0.047, 0.046 for 'Antutu'; 0.062, 0.040 for 'Vellamo:Metal'; 0.092, 0.052 for 'Vellamo:Html5'; 0.064, 0.045 for 'Quadrant'; and 0.065, 0.058 for 'GLBenchmark'. I have thus confirmed that the results of the power conversion efficiency characterization process are accurate enough for the subsequent optimization process.

2.4.3 Default widths extraction

Given that the PMOS switch typically has a smaller current per width than the NMOS switch, the PMOS switch is much larger than the NMOS switch in the VRs [14].
I thus focus on scaling the width of the PMOS switch, and the NMOS switch is sized to have the same resistance as the PMOS switch (i.e., the widths of both switches are in turn

Figure: (a) Efficiency: Group 7; (b) Power loss: Group 7. (Panel (a) plots efficiency (%) against current (mA) for the measured data and for W = W_def, W = 0.1 W_def, W = 0.5 W_def, and W = 1.5 W_def; panel (b) plots power loss (mW) against W' and marks the optimum.)
Table4: Example: DC-DCconvertertuningresultsoffourtypesofapplications. Clock Call Group W opt Gain η Gain P Gain η,max Gain P,max W opt Gain η Gain P Gain η,max Gain P,max Group1 0.2356 11.3508 34.5181 27.2733 43.1362 0.2852 8.7073 30.6121 23.7710 40.8894 Group2 0.1369 21.1175 42.8718 31.2166 47.0092 0.1991 12.9707 37.3828 27.1296 44.5621 Group3 0.3751 6.7329 15.7217 13.5717 24.4609 0.3751 6.6543 15.5894 13.5717 24.4609 Group4 0.1180 15.2996 34.9625 19.4536 38.4971 0.1704 10.2327 28.7502 12.4670 31.7882 Group5 0.1123 15.5010 30.7496 17.9247 33.0915 0.1263 13.0017 27.9058 15.4971 30.7201 Group6 0.1128 14.6603 34.4909 15.9501 35.8211 0.1269 14.0235 33.7792 16.0955 35.9118 Group7 0.0884 6.9630 23.5545 8.1285 25.3117 — — — — — Overall — 9.9381 24.8974 15.0994 30.8775 — 9.3516 23.5594 16.2542 30.9136 Webbrowsing(Facebook) Videochat(Skype) Group1 0.4216 4.4201 20.7700 11.9557 32.7724 0.4216 4.3408 20.544 12.8354 33.2795 Group2 0.2862 8.4227 30.8524 14.4969 38.0183 0.4230 4.1481 20.9525 13.4634 33.9652 Group3 0.3862 6.0869 14.5694 13.3955 24.1807 0.3751 6.6947 15.6603 13.5717 24.4609 Group4 0.2097 8.5744 25.9577 14.0064 33.1636 0.5375 1.6203 7.4004 2.5731 10.9563 Group5 0.1263 12.514 27.2880 15.7297 30.9577 0.1824 8.4489 21.3063 9.7984 23.4784 Group6 0.1128 14.3447 34.1397 15.9733 35.8437 0.2256 6.0727 24.9642 6.4484 25.8884 Group7 0.2210 1.9788 10.5936 2.0172 10.7499 0.1179 4.9756 19.7093 6.4110 22.5335 Overall — 5.7545 19.0159 8.8194 25.0937 — 5.3158 17.9998 9.0870 24.6192 Table5: Example: DC-DCconvertertuningresultsofsixtypes ofapplications. Application Gain η Gain P Gain η,max Gain P,max Setting 10.5807 25.6927 18.4262 32.4668 Camera 5.8590 19.0445 7.6900 23.1567 Game(Neocore) 6.2805 20.5779 8.4631 25.0661 Map(GoogleMap) 5.6135 18.7018 8.9731 25.1223 SMS 6.4957 20.4575 9.8490 26.5044 Media(Youtube) 6.6285 20.6710 9.4164 25.9968 Table6: Example: DC-DCconvertertuningresultsoftwotypes ofthesmartphoneusagepatterns. 
Usagepattern Gain η Gain P Gain η,max Gain P,max TypeI 6.2176 20.0263 13.6376 29.4715 TypeII 6.2955 20.0705 15.2870 30.0543 the display is the highest. Clock is measured under the median level of the backlight, WiFi on, and Setting is measured under the lowest backlight and WiFi off. For the case of Call, we consider autoturn-offscreenduringthecall. We apply the 10 types of applications in Table 4 and Table 5 to two representative types of smartphone usage patterns studied in [15]. The resulted distribution of the output currents from the firsttypeoftheusagepatternsisshowninFigure4. Table6shows theoptimizationresultsforbothtypesofusagepatterns. 5. CONCLUSIONS Thispapershowsthatsignificantpowerlossincursduringpower conversion from the battery to devices in modern smartphones. Thisisadownsideofactivesemiconductortechnologyscalingthat makes different technology devices require different supply volt- age levels. However, such a trend should not be discouraged be- cause of the advantages from technology scaling. Instead, this pa- per first introduces systematic system-level power conversion ef- ficiency enhancement for smartphones. First, we propose equiv- alent power converter concept that abstract a complicated power converter tree from the battery to a device into a single equivalent power converter. This again enables us to identify the model co- efficients from application profiling. The proposed identification canbeappliedtocommercialsmartphonesthatdonothavecurrent sensors. We demonstrated the accuracy of power conversion effi- ciency identification and how the current power converter setup is offset from the optimal operating conditions. The proposed power convertertuningshowed5%to18%overallpowerconversioneffi- ciency enhancement, which restores up to 32% power loss during powerconversion. W=0.1W def W=W def W=1.5W def 6. 
REFERENCES [1] A.Shye,B.Scholbrock,andG.Memik,“Intothewild: Studyingreal useractivitypatternstoguidepoweroptimizationsformobile architectures,”MICRO,2009. [2] L.Zhang,B.Tiwana,Z.Qian,Z.Wang,R.P.Dick,Z.M.Mao,and L.Yang,“Accurateonlinepowerestimationandautomaticbattery behaviorbasedpowermodelgenerationforsmartphones,” CODES/ISSS,2010. [3] M.DongandL.Zhong,“Self-constructivehigh-ratesystemenergy modelingforbattery-poweredmobilesystems,”MobiSys,2011. [4] A.Pathak,Y.C.Hu,M.Zhang,P.Bahl,andY.Wang,“Fine-grained powermodelingforsmartphonesusingsystemcalltracing,”EuroSys, 2011. [5] W.YuanandK.Nahrstedt,“Energy-efficientsoftreal-timeCPU schedulingformobilemultimediasystems,”SOSP,2003. [6] D.Shin,Y.Kim,N.Chang,andM.Pedram,“Dynamicvoltage scalingofoleddisplays,” DAC,2011. [7] C.Inseok,S.Hojun,andC.Naehyuck,“Low-powercolorTFTLCD displayforhand-heldembeddedsystems,”ISLPED,2002. [8] Y.Choi,N.Chang,andT.Kim,“DC-DCconverter-awarepower managementforlow-powerembeddedsystems,”IEEET. on Computer-AidedDesignofIntegratedCircuitsandSystems,2007. [9] C.Shi,B.C.Walker,E.Zeisel,E.B.Hu,andG.H.McAllister,“A highlyintegratedpowermanagementICforadvancedmobile applications,”CICC,2006. [10] B.AmelifardandM.Pedram,“Optimaldesignofthepower-delivery networkformultiplevoltage-islandsystem-on-chips,”IEEE T. on Computer-AidedDesignofIntegratedCircuitsandSystems,2009. [11] TexasInstruments,“Handset:smartphone,” Availableat: http://www.ti.com/solution/handset_smartphone. [12] Qualcomm,“Snapdragonłmdpmsm8660datasheet,” Available at: https://developer.qualcomm.com/develop/development- devices/snapdragon-mdp-msm8660. [13] G.A.Rincon-MoraandP.E.Allen,“Alow-voltage,lowquiescent current,lowdrop-outregulator,”IEEEJ.ofSolid-StateCircuits, 1998. [14] J.Xiao,A.Peterchev,J.Zhang,andS.Sanders,“Anultra-low-power digitally-controlledbuckconverterICforcellularphone applications,”APEC,2004. [15] F.Hossein,M.Ratul,K.Srikanth,L.Dimitrios,G.Ramesh,and E.Deborah,“Diversityinsmartphoneusage,”MobiSys,2010. 
Table4: Example: DC-DCconvertertuningresultsoffourtypesofapplications. Clock Call Group W opt Gain η Gain P Gain η,max Gain P,max W opt Gain η Gain P Gain η,max Gain P,max Group1 0.2356 11.3508 34.5181 27.2733 43.1362 0.2852 8.7073 30.6121 23.7710 40.8894 Group2 0.1369 21.1175 42.8718 31.2166 47.0092 0.1991 12.9707 37.3828 27.1296 44.5621 Group3 0.3751 6.7329 15.7217 13.5717 24.4609 0.3751 6.6543 15.5894 13.5717 24.4609 Group4 0.1180 15.2996 34.9625 19.4536 38.4971 0.1704 10.2327 28.7502 12.4670 31.7882 Group5 0.1123 15.5010 30.7496 17.9247 33.0915 0.1263 13.0017 27.9058 15.4971 30.7201 Group6 0.1128 14.6603 34.4909 15.9501 35.8211 0.1269 14.0235 33.7792 16.0955 35.9118 Group7 0.0884 6.9630 23.5545 8.1285 25.3117 — — — — — Overall — 9.9381 24.8974 15.0994 30.8775 — 9.3516 23.5594 16.2542 30.9136 Webbrowsing(Facebook) Videochat(Skype) Group1 0.4216 4.4201 20.7700 11.9557 32.7724 0.4216 4.3408 20.544 12.8354 33.2795 Group2 0.2862 8.4227 30.8524 14.4969 38.0183 0.4230 4.1481 20.9525 13.4634 33.9652 Group3 0.3862 6.0869 14.5694 13.3955 24.1807 0.3751 6.6947 15.6603 13.5717 24.4609 Group4 0.2097 8.5744 25.9577 14.0064 33.1636 0.5375 1.6203 7.4004 2.5731 10.9563 Group5 0.1263 12.514 27.2880 15.7297 30.9577 0.1824 8.4489 21.3063 9.7984 23.4784 Group6 0.1128 14.3447 34.1397 15.9733 35.8437 0.2256 6.0727 24.9642 6.4484 25.8884 Group7 0.2210 1.9788 10.5936 2.0172 10.7499 0.1179 4.9756 19.7093 6.4110 22.5335 Overall — 5.7545 19.0159 8.8194 25.0937 — 5.3158 17.9998 9.0870 24.6192 Table5: Example: DC-DCconvertertuningresultsofsixtypes ofapplications. Application Gain η Gain P Gain η,max Gain P,max Setting 10.5807 25.6927 18.4262 32.4668 Camera 5.8590 19.0445 7.6900 23.1567 Game(Neocore) 6.2805 20.5779 8.4631 25.0661 Map(GoogleMap) 5.6135 18.7018 8.9731 25.1223 SMS 6.4957 20.4575 9.8490 26.5044 Media(Youtube) 6.6285 20.6710 9.4164 25.9968 Table6: Example: DC-DCconvertertuningresultsoftwotypes ofthesmartphoneusagepatterns. 
(Figure 2.11 legend: $W = 0.1W_{def}$, $W = W_{def}$, $W = 1.5W_{def}$)
(Figure 2.11 legend: $W = 0.5W_{def}$, $W = W_{def}$, $W = 1.5W_{def}$)
Figure 2.11: Relation between the power conversion efficiency and $W$: Group 7.

Table 2.3: $W_{def}$ of the equivalent VR models.

Group    1, 2     3        4        5        6        7
W_def    1.2401   1.1033   1.3109   1.4033   1.4102   0.7368

linearly proportional to each other). From (1.4) and (1.5), the VR power loss model may be generally expressed as a function of $W$:

$$P_{VR} = \left(\frac{r_1}{W} + r_2\right) I_{out}^2 + r_3 W + r_4, \qquad (2.13)$$

where $W$ is linearly proportional to the widths of both the PMOS and NMOS switches, and $r_1$, $r_2$, $r_3$, and $r_4$ are constants.

Given that the two MOSFET switches in a VR dominate the power loss of the equivalent VR, and $I_q$ is small, I rewrite (2.11) as:

$$P_{eqv,k} = \left(\frac{r_{1,k}}{W_k} + r_{2,k}\right) I_{eqv\_out,k}^2 + b\, I_{eqv\_out,k} + r_{3,k} W_k + r_{4,k}, \qquad (2.14)$$

where $P_{eqv,k}$ and $I_{eqv\_out,k}$ are the power loss and the output current of the $k$-th equivalent VR, respectively, corresponding to the $k$-th group of modules; $W_k$, $r_{1,k}$, $r_{2,k}$, $r_{3,k}$ and $r_{4,k}$ are the coefficients of the equivalent VR model, which have been determined by linear regression. In the regression procedure, I carefully set the initial condition so as not to be trapped in a local minimum. The resultant coefficient $W_k$ is then the default value of $W$ ($W_{def}$) of the $k$-th equivalent VR, which is shown in Table 2.3.

2.4.4 Simulation results: static switch sizing

Figure 2.11 (a) shows an example in which $W$ changes the efficiency graph of Group 7. Figure 2.11 (b) shows that the power loss plots have a convex functional form in terms of $W$. From (2.3) and (2.14), the optimal $W$ of the $k$-th group is calculated by:

$$W_{opt,k} = \sqrt{\int_{I_{eqv\_out,k}} I_{out}^2\, f_k(I_{out})\, dI_{out}} \cdot \sqrt{\frac{r_{1,k}}{r_{3,k}}}, \qquad (2.15)$$

where $f_k(I_{out})$ is the $k$-th load current distribution.

In order to derive $f_k(I_{out})$, I use the collected loading profiles. As introduced in Section 2.2.1, I run the 10 representative mobile applications, and the loading profiles of all the modules in the $k$-th group are measured for each application.
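As a minimal sketch of how (2.15) could be evaluated on a measured profile, the snippet below replaces the integral with a sum over a discrete load current histogram. The function name and the coefficient values are assumptions for illustration only: the ratio $r_1/r_3$ is chosen so that $\sqrt{r_1/r_3}$ equals 0.1509/26 per mA, the ratio implied by the first Group 1 entry of Table 2.7, and is not a regression result from this work.

```python
import math

def optimal_width(currents_ma, probs, r1, r3):
    """Discrete form of (2.15): W_opt = sqrt(E[I^2]) * sqrt(r1 / r3)."""
    total = sum(probs)  # normalise in case the histogram does not sum to 1
    second_moment = sum(p * i * i for i, p in zip(currents_ma, probs)) / total
    return math.sqrt(second_moment) * math.sqrt(r1 / r3)

# Hypothetical coefficients (assumption): only the ratio r1/r3 matters here.
R1, R3 = (0.1509 / 26.0) ** 2, 1.0

# A load that always draws 26 mA.
w_single = optimal_width([26.0], [1.0], R1, R3)

# The same load with an occasional 300 mA burst (10% of the time).
w_mixed = optimal_width([26.0, 300.0], [0.9, 0.1], R1, R3)

print(round(w_single, 4), round(w_mixed, 4))
```

Because $W_{opt}$ depends on the second moment of the current, a rare high-current burst pulls the static optimum well above the width that suits the common low-current case, which is the limitation that dynamic switch modulation is meant to address.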
All the applications except 'Clock' and 'System setting' are run under the same setup, where WiFi is turned on and the backlight level of the display is the highest. 'Clock' is measured under the median level of the backlight with WiFi on, whereas 'System setting' is measured under the lowest backlight with WiFi off. For the case of 'Call', I consider the screen to be turned off automatically during the call. I derive two types of load current distribution, according to the two representative smartphone usage patterns, Pattern I and II, introduced in Figure 2.3. Figure 2.9 shows the resulting $f_k(I_{out})$ from Pattern I. From (2.14), the expected power loss of an equivalent VR can be generally expressed as:

$$E[P_{eqv}] = \left(\frac{r_1}{W} + r_2\right) \int I_{out}^2 f(I_{out})\, dI_{out} + b \int I_{out} f(I_{out})\, dI_{out} + r_3 W + r_4. \qquad (2.16)$$

I denote the efficiency and power loss for different setups as $\eta_{setup}$ and $P_{setup}$, where setup can be def or opt: def implies the default setup, whereas opt implies the optimal setup of the VR. Then $\eta_{setup}$ can be calculated by $P_{group}/(P_{group} + P_{setup})$, and $P_{setup}$ can be derived from (2.16) with $W = W_{setup}$. $P_{group}$ is the power consumed by all the modules in the group. Finally, I define the power conversion efficiency enhancement ($Gain_\eta$) and the power loss reduction ($Gain_P$) by

$$Gain_\eta = \left(\frac{\eta_{opt}}{\eta_{def}} - 1\right) \times 100\ (\%), \qquad Gain_P = \left(1 - \frac{P_{opt}}{P_{def}}\right) \times 100\ (\%). \qquad (2.17)$$

Table 2.4 shows the S3 results for both Pattern I and II, where the values of $W_{opt}$ are $W_{opt,I}$ for Pattern I and $W_{opt,II}$ for Pattern II. The overall power conversion efficiency enhancements for Pattern I and II are 6% and 5.5%, which correspond to 19% and 18% power loss reductions during power conversion, respectively.

To check how much the voltage ripple increases when changing from $W_{def,k}$ to $W_{opt,k}$, I define a parameter called voltage ripple change (%), calculated as $V_{opt,k}/V_{def,k} \times 100$, where $V_{def,k}$ and $V_{opt,k}$ are obtained by substituting $W_{def,k}$ and $W_{opt,k}$ in (1.6), respectively.
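The two gain definitions in (2.17) can be sketched as follows. The function names and the normalised power values are assumptions: the default loss of 0.4236 (relative to a group power of 1) is not a measured value, but is back-solved here so that the example lands near the overall Pattern I figures quoted in the text (about 6% and 19%).

```python
def efficiency(p_group, p_loss):
    """eta = P_group / (P_group + P_loss)."""
    return p_group / (p_group + p_loss)

def gains(p_group, p_def, p_opt):
    """Return (Gain_eta, Gain_P) in percent, following (2.17)."""
    eta_def = efficiency(p_group, p_def)
    eta_opt = efficiency(p_group, p_opt)
    gain_eta = (eta_opt / eta_def - 1.0) * 100.0
    gain_p = (1.0 - p_opt / p_def) * 100.0
    return gain_eta, gain_p

# Hypothetical normalised powers (assumption, not measured data).
P_GROUP = 1.0                      # power consumed by the modules
P_DEF = 0.4236                     # VR loss with the default width
P_OPT = P_DEF * (1.0 - 0.190699)   # VR loss after a 19.07% loss reduction

gain_eta, gain_p = gains(P_GROUP, P_DEF, P_OPT)
print(f"Gain_eta = {gain_eta:.2f} %, Gain_P = {gain_p:.2f} %")
```

Under these assumed numbers the default efficiency is roughly 70%, which illustrates why a loss reduction near 19% appears as an efficiency enhancement of only about 6%: the gain in efficiency is scaled by the share of the loss in the total input power.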
Throughout the regression results and possible range of output current for each group, $V_{sw1} + V_{sw2} = r_1 I_{out}$, and $V_L = r_2 I_{out}$. Then all the possible $V_{sw1}$, $V_{sw2}$ and $V_L$ are considered to derive the maximum voltage ripple change. The results for each group are reported in Table 2.5. Because the worst case is only 14%, and the equivalent VR model includes an LDO, I can safely state that the resulting voltage ripple will satisfy the design constraints.

Table 2.4: Static switch sizing (S3) results (%) of Patterns I and II.

k    W_opt,I   Gain_η    Gain_P    W_opt,II   Gain_η    Gain_P
1    0.2976    7.9718    29.7828   0.3365     6.4977    26.7376
2    0.1886    14.0471   38.7954   0.2073     12.3355   37.0320
3    0.3793    4.4908    11.4699   0.3869     3.9988    10.4214
4    0.1671    10.9028   29.7479   0.1834     9.7412    28.0067
5    0.1271    12.8754   27.7512   0.0900     12.4529   27.2171
6    0.1176    14.2869   34.0873   0.1228     13.6310   33.3380
7    0.2130    2.0874    11.0332   0.2145     2.0615    10.9291
t    -         6.0157    19.0699   -          5.5536    18.0396

*t indicates the overall result.

Table 2.5: Voltage ripple change (%) from $W_{def,k}$ to $W_{opt,k}$ for Patterns I and II.

Pattern   k=1    k=2    k=3    k=4    k=5    k=6    k=7
I         8.3    11.6   5.5    14.3   7.7    1.9    0.1
II        7.1    11.0   5.5    13.4   10.2   1.8    0.1

2.4.5 Simulation results: dynamic switch modulation

According to the classification flow in Figure 2.7, Groups 1, 2, 3 and 5 are classified as discretizable, Groups 4 and 6 are discrete, and Group 7 is continuous. In this section, only the results for smartphone usage Pattern I are presented for brevity (the results for Pattern II are almost the same as those for Pattern I).

As a result of the K-means clustering procedure with K = 7 and = 1000, each of Groups 1, 2, 3 and 5 has seven discrete load current values. Table 2.7 shows the resulting values and their corresponding optimum width values. From Algorithm 1, I can derive the set of switches of each group that covers the maximum number of the width values in Table 2.7.
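To make the clustering step concrete, here is a minimal sketch of one-dimensional K-means (plain Lloyd iterations) applied to load current samples. It is not the procedure or tool used in this work: the function name, the deterministic initialisation, and the sample currents are all assumptions for illustration.

```python
def kmeans_1d(samples, k, iterations=100):
    """Plain Lloyd iteration on scalar samples; returns the sorted cluster means."""
    data = sorted(samples)
    # Deterministic start (assumption): spread the initial centers over the sorted data.
    centers = [data[(2 * j + 1) * len(data) // (2 * k)] for j in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for x in data:
            nearest = min(range(k), key=lambda j: abs(x - centers[j]))
            clusters[nearest].append(x)
        # An empty cluster keeps its previous center.
        updated = [sum(c) / len(c) if c else centers[j] for j, c in enumerate(clusters)]
        if updated == centers:
            break
        centers = updated
    return sorted(centers)

# Hypothetical load current samples in mA (not measured data).
currents_ma = [20, 21, 22, 50, 51, 52, 98, 100, 102]
means_ma = kmeans_1d(currents_ma, k=3)
print(means_ma)
```

Each resulting mean current would then be mapped to its own optimal width through (2.15), which is how a table of mean currents and widths such as Table 2.7 can be read.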
The boundary conditions of the load current regions are calculated by $I_{bd,i} = \sqrt{W_i W_{i+1}\, r_3 / r_1}$, which is derived from (2.6).

Table 2.6 shows examples of the resulting efficiency enhancement ($Gain_{\eta,method}$) of Groups 1, 2 and 3, when = 0.4, and N = 3 (method = DSM1) or 4 (method = DSM2). The results from the S3 (method = S3) are also provided for comparison. I assume that the power losses of the controller are the same for all methods. The table includes the results of five applications. The results of the other five applications are omitted in this chapter, but they are similar to those of the applications in the table. In addition, to demonstrate the effectiveness of DSM under varying load conditions, three cases of (fixed) high load current conditions are also explored, although they are rarely observed when running the common applications.

Table 2.6 shows that $Gain_{\eta,DSM2}$ is slightly better than $Gain_{\eta,DSM1}$, and $Gain_{\eta,DSM1}$ is generally better than $Gain_{\eta,S3}$. For Groups 1 and 2, high efficiency enhancement, around 8% to 12%, is achieved by all the methods when the applications require low load current (i.e., System setting and Call in both groups, and Facebook and Neocore in Group 2). On the other hand, for the applications requiring higher load current (i.e., Skype-videochat in both groups, and Facebook and Neocore in Group 1), the efficiency enhancement of Groups 1 and 2 is not as high, around 1% to 4%. That is because the efficiencies of the default setup ($\eta_{def}$) are higher in the high load current conditions than in the low load current conditions.

Meanwhile, the S3 achieves high efficiency enhancement in the low load current conditions, as shown in Table 2.6. Its drawback is that the efficiencies in the high load current conditions are reduced; $Gain_{\eta,S3}$ can even be negative. On the other hand, DSM can achieve high efficiency enhancement over a wide load current range.
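The boundary rule $I_{bd,i} = \sqrt{W_i W_{i+1}\, r_3 / r_1}$ can be sketched as a simple lookup that picks one width per load current region. This is only an illustration: the function names are hypothetical, the width set is the Group 1 DSM1 set listed in Table 2.6, and the ratio $r_1/r_3$ is an assumed value taken from the width-to-current ratio of Table 2.7, not a regression result.

```python
import bisect
import math

def boundary_currents(widths, r1, r3):
    """I_bd,i = sqrt(W_i * W_{i+1} * r3 / r1) for consecutive widths in ascending order."""
    ws = sorted(widths)
    return [math.sqrt(a * b * r3 / r1) for a, b in zip(ws, ws[1:])]

def select_width(i_out, widths, r1, r3):
    """Pick the width whose load current region contains i_out."""
    ws = sorted(widths)
    return ws[bisect.bisect_right(boundary_currents(ws, r1, r3), i_out)]

def width_dependent_loss(i_out, w, r1, r3):
    """Only the W-dependent terms of (2.14): r1 * I^2 / W + r3 * W."""
    return r1 * i_out ** 2 / w + r3 * w

# Hypothetical coefficients (assumption): sqrt(r1/r3) = 0.1509/26 per mA.
R1, R3 = (0.1509 / 26.0) ** 2, 1.0
WIDTHS = [0.1509, 0.4404, 1.1547]   # Group 1, DSM1 width set from Table 2.6

for i_ma in (30.0, 69.0, 250.0):
    print(i_ma, select_width(i_ma, WIDTHS, R1, R3))
```

At a boundary current the two neighbouring widths give the same loss, so selecting by region is equivalent to choosing, among the available widths, the one with the smallest loss for the present current.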
For example, for Skype-videochat in Group 2, $Gain_{\eta,S3}$ is 1.8%, but $Gain_{\eta,DSM1}$ and $Gain_{\eta,DSM2}$ are still more than 4%. Furthermore, in the cases of Groups 1 and 2 when the load current conditions are 150, 250 and 350 mA, the results demonstrate that DSM keeps its efficiency enhancement even in the high current region, but the S3 does not. The results from Group 3 in Table 2.6 are similar.

Table 2.6: Efficiency enhancement results (%) of the dynamic switch modulation ($Gain_{\eta,DSM}$) and the static switch sizing ($Gain_{\eta,S3}$).

Group 1 - DSM1: N = 3, Width set = {0.1509, 0.4404, 1.1547}; DSM2: N = 4, Width set = {0.1509, 0.3017, 0.4404, 1.1547}

Case               Gain_η,S3   Gain_η,DSM1   Gain_η,DSM2
System setting     9.7606      10.0825       10.2681
Call               8.7008      8.6859        8.9136
Skype-videochat    3.8615      4.4978        4.5696
Facebook           3.9612      4.5575        4.6576
Neocore            3.9914      4.4510        4.4730
I_out = 150 mA     -3.7896     0.1598        0.4380
I_out = 250 mA     -9.5422     0.0685        0.0851
I_out = 350 mA     -13.7721    0.7423        0.8465

Group 2 - DSM1: N = 3, Width set = {0.1088, 0.4250, 0.9744}; DSM2: N = 4, Width set = {0.1088, 0.1918, 0.2514, 0.9744}

Case               Gain_η,S3   Gain_η,DSM1   Gain_η,DSM2
System setting     10.0681     11.7873       12.2481
Call               12.9531     12.3106       13.1478
Skype-videochat    1.8863      4.2109        4.2728
Facebook           7.8378      8.2429        8.7913
Neocore            12.1440     11.2321       12.2433
I_out = 150 mA     -6.4464     0.5639        0.5639
I_out = 250 mA     -13.9895    -0.1039       0.0045
I_out = 350 mA     -19.5088    0.3518        2.4955

Group 3 - DSM1: N = 3, Width set = {0.2970, 0.3945, 0.8983}; DSM2: N = 4, Width set = {0.2970, 0.3717, 0.4122, 0.8983}

Case               Gain_η,S3   Gain_η,DSM1   Gain_η,DSM2
System setting     4.4993      4.5576        4.6029
Call               4.8733      4.9012        4.9499
Skype-videochat    -0.0035     1.5326        1.5531
Facebook           3.2841      3.4753        3.5499
Neocore            -3.0773     0.4821        0.4636
I_out = 200 mA     2.7330      2.8415        8.3743
I_out = 250 mA     0.7476      4.7393        4.8130
I_out = 300 mA     -5.8153     2.1294        2.2558

Table 2.7: Results of K-means clustering for Groups 1, 2, 3 and 5: $I'_{out,k}$ is the $k$-th mean value (mA), and $W_{opt,k}$ is its corresponding optimal width.

Group   I'_out,1  W_opt,1   I'_out,2  W_opt,2   I'_out,3  W_opt,3   I'_out,4  W_opt,4
1       26        0.1509    39        0.2263    52        0.3017    69        0.4004
2       21        0.1088    28        0.1451    37        0.1918    48        0.2514
3       117       0.2970    138       0.3489    147       0.3717    156       0.3945
5       16        0.0968    17        0.1068    19        0.1155    20        0.1231

Group   I'_out,5  W_opt,5   I'_out,6  W_opt,6   I'_out,7  W_opt,7
1       98        0.5716    199       1.1547    300       1.7408
2       82        0.4250    188       0.9744    296       1.5341
3       163       0.4122    258       0.6540    354       0.8983
5       22        0.1350    25        0.1533    28        0.1719

The K-means clustering result for Group 5 in Table 2.7 shows that the gap between the minimum and maximum load current conditions is only 12 mA. Thus, only one switch set (N = 1) sized by the S3 would be enough. For DSM with N = 2, {0.0968, 0.1231} can be a set of widths of the switches.

Camera-digital in Group 4 is a module whose on/off operation is controlled by the user. Furthermore, as shown in Figure 2.12, it consumes the dominant share of the power (55% to 65%). When the camera is on, the average load current of Group 4 is 62.7269 mA, and when it is off, the average load current of Group 4 is 19.5208 mA. From (2.15), these current values correspond to switch widths of 0.5127 and 0.1595, respectively. Meanwhile, SD card and Camera analog are such modules in Group 6.

Figure 2.12: The ratio of the power consumed by Camera digital to the power consumed by all the modules in Group 4. (Plot of the ratio (%) over time (s); traces: Camera (camera on), Facebook (camera off), Skype-videochat (camera on), System setting (camera off); annotated means: 65% and 55%.)

Figure 2.13: Load current distribution of display modules in Group 7, according to the 10 brightness levels. (Plot of probability versus current (mA).)
Then I have four discrete load current values, 15, 29, 37 and 51 mA, according to the conditions of (SD card, Camera analog): off/off, off/on, on/off and on/on. These current values correspond to 0.1128, 0.2181, 0.2783 and 0.3836 as the optimum effective widths, respectively. When N = 2, {0.1128, 0.2783} can be a set of widths of the switches.

Table 2.8 shows the efficiency enhancement results of Groups 4 and 6 for four applications. The case of Group 4 is similar to the previous cases of Groups 1, 2, and 3: DSM performs as well as the S3 does in the low load current conditions (i.e., System setting and Neocore), but DSM also keeps positive enhancements even in the high load current conditions (i.e., Skype-videochat and Camera). On the other hand, the case of Group 6 shows that both methods have almost the same results. That is because the load current range of Group 6 is narrow, and the applications may not frequently require the maximum current.

Table 2.8: Efficiency enhancement results of Group 4 and 6.

                   Group 4                   Group 6
Case               Gain_η,S3   Gain_η,DSM    Gain_η,S3   Gain_η,DSM
System setting     16.7919     16.8653       12.9967     12.9725
Neocore            5.6650      5.6295        13.7939     13.7870
Skype-videochat    -1.4018     1.6159        5.3388      6.0069
Camera             0.1997      2.3499        6.0982      6.6365

Group 7 consists of two modules, Display memory and Display backlight. Display backlight has various brightness levels that can be set according to the user's preference. I divide the brightness range into 10 levels, and measure the load current of Group 7 for each level. The load current condition induced by each brightness level overlaps with the conditions of the adjacent levels. Figure 2.13 shows the resulting load current distribution of Group 7 when all the levels are equally likely to occur. Next, I select seven discrete current values forming an arithmetic sequence whose minimum and maximum current values are 11 and 66 mA, respectively. These current values correspond to the required width values.
From Algorithm 1 with = 0.01, a set, {0.0355, 0.1225, 0.2128}, is derived. All the possible effective widths from the set can cover the seven required width values (thus N = 3 is enough in this case). Finally, I obtain the enhancement results $Gain_{\eta,S3}$ = 3.9483% and $Gain_{\eta,DSM}$ = 4.3424% for the case in which all the levels are equally likely to be chosen.

For interested readers, I also provide Table 2.9 to show the detailed results for the 10 applications.

Table 2.9: Results of the power loss gain (%) for 10 applications.

Application        Gain_P,S3   Gain_P,DSM1   Gain_P,DSM2
Call               18.3237     18.8361       19.0967
Camera             15.1582     15.9335       16.1485
Clock              22.4631     23.3820       23.4811
Facebook           17.2478     18.3995       18.6534
GoogleMap          16.6835     18.3276       18.5324
Neocore            4.8926      11.1143       11.1352
Skype-videochat    9.5858      12.2300       12.2963
SMS                18.4505     19.4249       19.6605
System setting     21.0103     21.5473       21.6712
Youtube            17.7074     18.2816       18.5089

2.5 Summary

This chapter demonstrated that significant power loss occurs during power conversion in the PDN of a smartphone platform. To mitigate this problem, this chapter focuses on the VRs in the PDN and introduces two optimization methods for them. S3 was presented to configure the switches in VRs so that the optimal operating conditions of the VRs match the general load current conditions. The general load current distributions for all modules in the platform were derived from the measured loading profiles and smartphone usage patterns. DSM was also presented to overcome the limitation of the S3, which may not be optimal for dynamically varying load conditions. By exploiting the multi-switching scheme, detailed procedures to select and size the switches were introduced. To verify the presented methods in an actual smartphone platform, the PDN characterization procedure was performed. With the proposed equivalent VR model and grouping method, the power conversion efficiency of the PDN in the target smartphone platform could be characterized.
Finally, I applied the proposed optimization methods to the platform. The experimental results showed that the S3 achieves 6% overall efficiency enhancement, which translates to 19% power loss reduction for the general smartphone usage pattern. The DSM accomplishes a similar improvement under the same condition. Furthermore, it can also achieve high efficiency enhancement under various load conditions.

In the design flow, both the S3 and DSM methods can be applied only after obtaining the load current distributions for the modules. S3 is simple to implement, but may not produce the optimal transistor widths under dynamically changing load conditions, or even when the load distribution has a high variance. On the other hand, DSM has more control/area overhead than S3, but it can achieve high conversion efficiency enhancement under all load conditions. Note that if the load current distributions change because of newly added applications or changing usage patterns compared to those used for the initial optimization, the DSM method will continue to provide power efficiency enhancement because of its adaptability, whereas the S3 method will fail.

Chapter 3
Optimizing the PDN in a multicore platform

3.1 Introduction

By leveraging technology scaling to pack multiple processor cores on a single die, chip multi-core processors (CMPs) have been increasingly adopted in desktop and server applications, as well as mobile environments, due to the growing demand for high performance VLSI systems. CMPs have achieved high throughput in handling multiple applications by distributing them to different cores and executing them simultaneously. Furthermore, emerging challenging scientific and engineering problems that demand high performance computing and simulation have resulted in the advent of many-core processors. In spite of the benefits, developing such multi/many-core processors has hit a critical roadblock: power consumption. Due to the limited power budget and running/cooling cost, power consumption has become an overriding concern for CMP designs.

One of the most effective techniques to mitigate the power consumption of CMPs is to dynamically vary the supply voltage and operating frequency values applied to the processor cores in response to load conditions or workload characteristics (this is known as dynamic voltage and frequency scaling, or DVFS for short) [40, 41]. The conventional approach is to perform DVFS for all cores in a processor together (per-chip DVFS). This approach has not been able to take full advantage of the power saving that DVFS can potentially achieve. For instance, some of the cores may not need a high voltage/frequency level, but their level cannot be lowered because of the other cores. To surmount this shortcoming, applying DVFS to each individual core (per-core DVFS) has been suggested, and has resulted in excellent flexibility in controlling power [19, 20]. Unfortunately, this approach still has inevitable drawbacks such as a larger footprint, higher power conversion loss, and higher control complexity incurred by the more sophisticated power distribution network (PDN).

The PDN in the per-core DVFS platform provides power to each core from a power source. It consists of voltage regulators (VRs), which play a pivotal role in converting the voltage level of the power source to the required voltage levels of the target cores. Therefore, to support per-core DVFS, at least as many VRs as cores should be equipped in the platform, which can cause high area overhead. However, recent research work on on-chip VR designs shows that this overhead can be significantly mitigated by reducing the size of each VR [42, 18, 43]. Meanwhile, the VRs inevitably dissipate power, and power dissipation from all VRs inside a per-core DVFS platform can result in a considerable amount of power loss.
Given that a VR's power conversion efficiency (again, simply called VR efficiency in this dissertation) is the ratio of the power consumed by a core to the total power consumed by both the core and the VR, the state-of-the-art VRs exhibit high peak power conversion efficiency, but their efficiency can drop dramatically under adverse load conditions (i.e., out-of-range output current levels) [43, 7, 8]. Figure 1.2 shows an example of traces of the VR efficiency while delivering power to a core. Around 24% of the input power is dissipated by the VR in the high efficiency region (indicated by the red line), but more than 53% of the input power is consumed by the VR in the low efficiency region (the blue line) in the figure. Consequently, the VR efficiency is a critical concern and optimization objective for saving power in the platform.

A few recent papers have studied VR components in order to improve the efficiency of a single VR [14, 11, 10, 6]. Optimizing the switch sizes and the frequency of the pulse-width modulator (PWM) in the VR for a given workload has been studied in [11, 10]. Using multiple/parallel switches in the VR design has been presented in [14, 6]. In contrast, little attention has been paid to the question of how to improve the efficiency of a VR network through system-level optimizations, although a few papers have explored VRs from a system perspective [9, 19, 20]. A DVFS policy that is aware of the VR efficiency characteristics has been addressed in [9], where the optimal frequency of a core has been derived to minimize the total energy consumption in both the core and the VR. However, there is still large potential to save more power in multi-core and multi-VR systems. In [19], the potential of energy saving in the CMP using per-core DVFS and fast transient responses of VRs has been presented. To determine the optimal DVFS levels for each core, an offline algorithm based on integer linear programming (ILP) has been proposed.
But this approach does not consider the power dissipated by the large number of VRs that are indispensable to enable per-core DVFS. Meanwhile, to tackle the drawback of per-core DVFS, an offline approach to cluster the cores on the same voltage rail has been suggested [20]. K-means clustering has been used to group cores that have similar DVFS levels, so as to reduce the number of VRs required in the system. However, reducing the VRs to a fixed number loses part of the benefit of per-core DVFS, as noted above, and may not guarantee energy saving in VRs with dynamically changing workloads. In addition, clustering the cores with similar behaviors of the voltage/frequency levels may not be applicable for multi-threaded applications, where locking and synchronization issues should be carefully accounted for [21, 22]. For example, a delayed thread of an application on a clustered core may have to lock the other threads for synchronization, which can cause significant delay of the application.

In this chapter, I start from the concept of combining some cores, which operate at the same voltage level and draw a relatively small amount of load current, so that they are powered by a single VR. This approach can significantly reduce the VR power loss in the multi-core processor platform for the following two reasons: (i) the VR used to power multiple cores has a relatively high current load and thus has higher efficiency according to the VR characteristics, and (ii) the VRs that are not used can be turned off to save power. Based on this concept of VR consolidation, I propose a new design of the multi-core platform, which includes (multiple) sets of network switches to reconfigure the PDN. I then present two optimization methods to minimize the VR power loss and maximize the total energy saving. I first propose a reactive method that configures the PDN based on the sensed voltage/current level of each core.
I then present a proactive method to decide the optimal voltage/frequency level of each core with the aim of maximizing the consolidation opportunities of VRs, in order to minimize the energy consumption of the whole system. Along with the optimization methods for the PDN composed of homogeneous VRs, I also discuss the PDN with heterogeneous VRs, which is proposed to increase the benefits of the VR consolidation by equipping VRs with a larger load current driving capability. I provide a detailed discussion of the design considerations for both homogeneous and heterogeneous PDNs.

I validate the proposed methods on various applications from the PARSEC [44] and SPLASH2 [45] benchmark suites. I perform detailed multi-core processor simulation using the modified Sniper simulator [2], and SPICE circuit simulation with a commercial VR carefully selected for fair evaluation. Results demonstrate up to 36% VR energy loss reduction and 9% total energy saving.

3.2 Dynamic Reconfiguration of the VR-to-core network

A state-of-the-art VR powering a set of cores may have low conversion efficiency when there is a mismatch between the high efficiency region of the VR and the load condition of the cores, as addressed in the previous section. Furthermore, due to the introduction of a large number of VRs for per-core DVFS, a significant amount of power will be dissipated by the VRs.

In particular, the VR efficiency under the low load current condition, shown as Region I of Figure 1.4, cannot be effectively improved by sizing the switches. In addition, the power consumption of the controller in a VR, $P_{controller}$, cannot be scaled with the size of the switches. In Region I, where the PWM operating mode is inefficient, an alternative operating mode such as pulse frequency modulation (PFM) can be added to compensate for the degraded efficiency [18, 14].
Although it mitigates the radical efficiency drop in the low current region, the efficiency of the PFM mode is typically lower than that of the PWM mode in the normal current region. The design/control complexity of the VR also increases when switching between these two modes is supported.

Instead of adding more operating modes, I propose a system-level optimization technique to substantially improve the VR efficiency in per-core DVFS based CMPs. This technique dynamically configures the connection network between VRs and cores according to the load current demand of each core. The basic idea can be motivated and illustrated with a simple example: if both cores in a dual core processor require the same supply voltage level, and they have small load currents (their load currents are not necessarily the same), then their power domains can be consolidated to share a single VR. In this way, the shared VR will have higher load current and thus higher conversion efficiency (because it will subsequently operate in its high conversion efficiency region), whereas the other VR, which is not in use, can be turned off to save energy. Starting from this intuition, I propose a new technique called VR consolidation (or VRCon for short) in a reconfigurable VR-to-core distribution network (this is in analogy with the well-known technique of core consolidation, used to consolidate tasks/jobs into a minimum number of active cores in a CMP).

3.2.1 Proposed multicore platform

Figure 3.1 provides a conceptual diagram of the proposed multicore platform. The platform has a number of VRs and multiple cores. There are several groups of reconfigurable VR-to-core connection networks supported by network switches implemented with power MOSFET switches. The VR-to-core network can deliver power to each core from any VR in the same group. I will discuss these groups of connection networks in detail in Section 3.2.4.
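The dual-core intuition above can be illustrated with a minimal sketch. The loss model below (a conduction term proportional to the squared load current plus a load-independent term while the VR is on) and all of its parameter values, including the current limit, are assumptions chosen for illustration; they are not the VR model or the decision procedure used in this work.

```python
def vr_loss(i_out, r_cond=0.05, p_fixed=0.10):
    """Hypothetical VR loss in W: conduction term plus a fixed term; zero when the VR is off."""
    if i_out <= 0.0:
        return 0.0
    return r_cond * i_out ** 2 + p_fixed

def vr_efficiency(v_out, i_out):
    """Ratio of the power delivered to the core(s) to the total power drawn by core(s) and VR."""
    p_core = v_out * i_out
    return p_core / (p_core + vr_loss(i_out))

def should_consolidate(v_a, i_a, v_b, i_b, i_max=3.0):
    """Share one VR only if voltages match, the current fits, and the total loss drops."""
    if v_a != v_b or i_a + i_b > i_max:
        return False
    return vr_loss(i_a + i_b) < vr_loss(i_a) + vr_loss(i_b)

# Two lightly loaded cores at the same supply voltage (hypothetical values).
print(should_consolidate(0.75, 0.3, 0.75, 0.4))
print(vr_efficiency(0.75, 0.3), vr_efficiency(0.75, 0.7))
```

With this assumed model, sharing pays off whenever the fixed loss saved by turning one VR off exceeds the extra conduction loss caused by the larger combined current, which is why the approach targets cores with small load currents.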
This reconfigurable power distribution network thus enables arbitrary connections between the output of any VR and the input power pin of any core in the same group.

The power manager (PM) in a conventional CMP platform controls the processor's operating condition by using the DVFS technique. Compared to the conventional designs, I add a VRCon manager (called VRCM), which ultimately controls each core's frequency/voltage level, as well as the operation of the VRs and the ON/OFF states of the network switches in VRCon. The PM in the proposed platform still monitors the core status (i.e., performance) reported by the hardware performance monitor (HPM), as a conventional PM does. According to this design, the PM determines a tentative supply voltage and operating frequency for each core, and transmits this information to the VRCM as a recommendation. The new supply voltage and frequency levels of each core are finally set by the VRCM, which may choose values different from those recommended by the PM. Details are discussed in the following subsections.

Figure 3.1: Diagram of the proposed multicore platform. (Blocks shown: a multi-core processor with per-core DVFS, the hardware performance monitor, the power manager, the VRCon manager, sensing circuits, VR groups, and the VR-to-core distribution network with its switch sets.)

3.2.2 Reactive VRCon

The power saving achieved by employing DVFS strongly depends on the frequency of the decision making process, or equivalently, the duration of the decision period ($T_{DVFS}$).
If $T_{DVFS}$ is small, the outputs of the VR and PLL change more frequently, which gives better responsiveness to load changes but also higher energy loss and delay penalty due to the overhead of DVFS transitions. $T_{DVFS}$ should thus be considered a design variable set by the PM, and it needs to be (much) longer than the voltage scaling time of the VR [46]. On the other hand, because reconfiguration only turns the network switches on or off, the time to reconfigure the VR-to-core network ($T_{NS}$) is limited only by the transient response of the VR, which is in general much shorter than the voltage scaling time ($T_{NS} < T_{DVFS}$). Consequently, I treat the DVFS setting and the network reconfiguration as the global and local power management of VRCon, respectively. $T_{DVFS}$ and $T_{NS}$ are the required minimum global and local decision epoch lengths, respectively.

Figure 3.2: Example cases in which the reactive VRCon can be applied. (Panels plot voltage (V) and current (A) against time for supply levels from 0.75 V to 1.2 V. The blue box marks a valid region for VRCon; the red box marks a region that is not valid because of the high load current.)

For its local power management function, the reactive VRCon applies only to cores operating at the same supply voltage level. In Figure 3.2, the blue box shows the cases in which the reactive VRCon can be applied. The VRCM in this case performs only the network switch control to minimize the total energy consumption (that is, it does not change the voltage and frequency decisions of the PM). This total energy consumption is the sum of the energy losses of the active VRs (including network switches) and the energy consumption of the cores during the time period $T_{DVFS}$. I define $T_l$ as the time period of the $l$-th local management, satisfying $T_l \ge T_{NS}$ for all $l$ and $\sum_{l=1}^{L} T_l \le T_{DVFS}$.
The total energy consumption in $T_{DVFS}$ can then be expressed as:

\[
E_{T_{DVFS}} = \sum_{i=1}^{N} E_{core,i} + \sum_{l=1}^{L} \left( \sum_{i=1}^{N} E_{NS,i,T_l} + \sum_{j=1}^{N} E_{VR,j,T_l} \right) \tag{3.1}
\]

where minimizing the second term in (3.1) is the objective of the reactive VRCon. In the equation, $N$ is the total number of cores. The energy consumption of the $i$-th core is given by $E_{core,i} = \int I_{core,i}(t)\, V_{core,i}\, dt$, where $I_{core,i}(t)$ is the input current of the $i$-th core and $V_{core,i}$ is the input voltage of the $i$-th core. $I_{core,i}(t)$ is a function of time, but $V_{core,i}$ is constant in the period $T_{DVFS}$. I define the energy loss of the turned-on network switch connected to the $i$-th core during time period $T_l$ as $E_{NS,i,T_l}$. The energy loss of the $j$-th VR during time period $T_l$ is defined as $E_{VR,j,T_l}$. For the local power management in an arbitrary time period, I use $E_{NS,i}$ and $E_{VR,j}$ to represent the general forms of $E_{NS,i,T_l}$ and $E_{VR,j,T_l}$, respectively.

If identical power MOSFETs are used for the network switches, the power loss of the power MOSFET, $P_{NS,i}$, may be expressed as [47]:

\[
P_{NS,i}(t) = \frac{I_{on,i} + I_{off,i}}{I_g} V_D Q_g + \frac{1}{2} C_{OSS} V_D^2 + I_{core,i}^2 R_{NS}, \tag{3.2}
\]

where the first term is the switching loss during the turn-on and turn-off times; the second term is the switching loss from the output capacitance of the power MOSFET; and the third term is the conduction power loss. $I_{on}$ and $I_{off}$ are the load currents at the turn-on and turn-off times; $I_g$ is the gate drive current; $V_D$ is the bus voltage; $Q_g$ is the gate charge, which is generally provided in power MOSFET datasheets; and $C_{OSS}$ is the output capacitance of the power MOSFET, given by the gate-to-drain capacitance plus the drain-to-source capacitance of the switch. $R_{NS}$ is the on-state resistance of the power MOSFET. From (3.2), I can derive $E_{NS,i}$.

To obtain $E_{VR,j}$, I could use the VR power loss model in [9, 6], or circuit simulations with the target VR module. Either method requires the load voltage and current values.
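The switch-loss bookkeeping behind (3.2) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the first two terms of (3.2) are read here as energy per on/off transition and the third as continuous conduction power, the load current is taken as constant over the period, and the names `PowerMosfet`, `transition_energy`, `conduction_power`, and `switch_energy` are hypothetical. The 9 mΩ and 19 nC values echo the Si4840DY figures quoted in Section 3.4.1.2; the output capacitance, gate current, and bus voltage are placeholders.

```python
from dataclasses import dataclass


@dataclass
class PowerMosfet:
    r_on: float   # on-state resistance R_NS, in ohms
    q_g: float    # gate charge Q_g, in coulombs
    c_oss: float  # output capacitance C_OSS, in farads (placeholder value below)


def transition_energy(fet, i_on, i_off, i_gate, v_bus):
    """First two terms of (3.2), interpreted as joules per transition (assumption)."""
    gate_term = (i_on + i_off) / i_gate * v_bus * fet.q_g
    coss_term = 0.5 * fet.c_oss * v_bus ** 2
    return gate_term + coss_term


def conduction_power(fet, i_core):
    """Third term of (3.2): I^2 * R conduction loss, in watts."""
    return i_core ** 2 * fet.r_on


def switch_energy(fet, i_core, duration, transitions, i_gate, v_bus):
    """Energy lost in one network switch over `duration` seconds at constant current."""
    switching = transitions * transition_energy(fet, i_core, i_core, i_gate, v_bus)
    return switching + conduction_power(fet, i_core) * duration


# Illustrative numbers only: 5 A load for 1 ms with two switch transitions.
fet = PowerMosfet(r_on=9e-3, q_g=19e-9, c_oss=500e-12)
e_ns = switch_energy(fet, i_core=5.0, duration=1e-3, transitions=2,
                     i_gate=1.0, v_bus=1.2)
```

With these placeholder values the conduction term dominates, which matches the later observation that the on-state resistance matters most when switch transitions are infrequent.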
The output voltage of a turned-on VR is set to the supply voltage level of the cores connected to the VR, and the output current of the VR is the sum of the load currents of the connected cores. Note that if the local power management aims to consolidate some cores onto one VR, the combined load current must not exceed the maximum current rating of the VR. The red box in Figure 3.2 shows the cases in which the reactive VRCon cannot be applied because the combined load current exceeds this rating.

Owing to the limited number of cores in each group of the connection networks, it is manageable to find the cores to be combined so as to minimize the energy consumption of both VRs and network switches in a group. To achieve this goal, VRCM first sorts the cores in each group that have the same voltage level and an input current lower than the maximum driving capability of a VR. Then, based on the current levels, VRCM finds the two cores whose merger maximizes the VR energy saving. After consolidating those two cores, VRCM repeats this procedure until no core is available, or until the VR energy saving from consolidating the remaining cores is less than the power loss of the network switch transition.

3.2.3 Proactive VRCon

For its global power management function, the proactive VRCon exploits the DVFS technique to perform frequency (and corresponding voltage level) scaling, taking into account the energy consumption of both cores and VRs in the decision period $T_{DVFS}$. In my proposed method, there is a trade-off between the energy saving from DVFS (which is initially determined by the PM) and the reduced energy loss from adaptively turning off VRs and using fewer VRs at higher conversion efficiencies.
If the VRCM finds that the latter option is more desirable, it will not decrease the frequency/voltage levels of some cores to the minimum possible level; instead, it will adjust the frequency/voltage levels of the cores to increase the opportunities for applying the VRCon procedure. Compared to the reactive VRCon, the objective here is to find the frequency/voltage level of each core during $T_{DVFS}$ that minimizes the total energy consumption, which can be formulated as:

\[
\min \left( \sum_{t=1}^{T} E_{T_{DVFS},t}\left(V_{core,1}, V_{core,2}, \ldots, V_{core,N}\right) \right), \tag{3.3}
\]

where $E_{T_{DVFS},t}$ denotes the total energy consumption during the $t$-th time period of $T_{DVFS}$, as formulated in (3.1). $T_{DVFS,T}$ denotes the period in which all task processing finishes. Given that $V_{core,i}$ in the period $T_{DVFS}$ affects the results of the reactive VRCon, $E_{core,i}$, $E_{NS,i,T_l}$, and $E_{VR,j,T_l}$ in $E_{T_{DVFS},t}$ are functions of $V_{core,i}$.

Solving (3.3) is difficult because (i) changing $V_{core,i}$ for any $i$ in time period $T_{DVFS,t}$ affects the VRCon results in period $T_{DVFS,t+1}$, and (ii) multi-threaded applications on multi-core processors have locking and synchronization issues. Therefore, by exploiting the initial DVFS schedule of the PM, I first divide the overall problem into sub-problems, each of which only concerns how to modify the initial DVFS schedule to optimize the energy saving of the reactive VRCon in a given period $T_{DVFS}$. To guarantee that the performance (i.e., total execution time of applications) is not degraded by the modified DVFS schedule, I impose the constraint that the VRCM can only keep or increase (but not decrease) the frequency/voltage level of each core relative to the original DVFS level suggested by the PM. I then transform the problem in (3.3) into the problem of finding the network configuration and core voltage levels that minimize the total power consumption while maintaining the performance of the system.
If I define the network configuration so that $S_n$ denotes the set of cores consolidated to the $n$-th VR, I can formally describe the problem as follows:

\[
\begin{aligned}
&\text{Find } N \text{ sets } S_1, S_2, \ldots, S_N \text{ to minimize } E_{T_{DVFS},t}(V_1, V_2, \ldots, V_N) \\
&\text{subject to } V_{S_n} = \max_{m \in S_n}(V_m), \text{ and } I_{S_n} = \sum_{m \in S_n} I_{m,new} \le I_{out(max)}
\end{aligned}
\tag{3.4}
\]

where $V_m$, $1 \le m \le N$, is the voltage level of the $m$-th core suggested by the PM; $V_{S_n}$ is the maximum voltage level among the cores consolidated to the $n$-th VR (i.e., the $n$-th set); $I_{m,new}$ is the new current value of the $m$-th core under $V_{S_n}$; and $I_{S_n}$ is the sum of $I_{m,new}$ over $m \in S_n$. If the VRCM finds a solution to the above problem, it overrides the DVFS level recommended by the PM with the new voltage level.

Under the assumption that tasks during time period $T_{DVFS}$ have already been assigned to the cores according to the PM's recommendation, I focus only on the DVFS decisions of the VRCM without any task migration. Consequently, (3.4) can be divided into a set of subproblems, each of which finds DVFS levels only for the cores belonging to the same group. Furthermore, the number of cores in any group is constrained by the maximum load current $I_{out(max)}$ that a single VR can drive. Therefore, it is tractable to search all possible DVFS levels of the cores in the same group when only voltage increases are possible.

I have implemented a clustering-based heuristic solution, shown in Algorithm 2. I first sift through the cores in a group that draw a small amount of current, so that they can be combined with others. To respond to the dynamically changing current, I determine the current of each core as the average current during the (previous) decision period $T_{DVFS}$. That is, in the proactive VRCon I first determine the voltage levels of the cores and the network configuration; later, during the current decision period, the reactive VRCon changes the network configuration according to the dynamically changing core currents in real time. Next, I perform the function Find_Max_Saving in Algorithm 2 to find the two cores, and their voltage level, that achieve the maximum power saving if they are merged at the same voltage level. I then treat these two cores as one equivalent core. The procedure is repeated in the function VRCon_pro_I until no energy saving can be achieved by VR consolidation.

Algorithm 2: To find a set of the new voltage levels based on the proactive VRCon, under the homogeneous PDN
  Initialization
    define S: a set of the cores consolidated to a single VR    ▷ S is a subset solution for (3.4).
    define C = {(I_1, V_1), (I_2, V_2), ..., (I_M, V_M)}    ▷ M is the number of cores in a group of the connection network.
  function Find_Max_Saving(C)    ▷ Find the two cores that achieve the maximum power saving by consolidation.
    Find i and j such that
      i ≠ j, i, j ≤ K,    ▷ K is the number of elements in C.
      V = max(V_i, V_j)    ▷ The maximum voltage level is chosen.
      I = I_{i,new} + I_{j,new} ≤ I_out(max)    ▷ I_{i,new} is the new current of the i-th core induced by changing its voltage level. If the voltage has not been changed, I_{i,new} = I_i.
      max P_loss(I_i, V_i) + P_loss(I_j, V_j) − P_loss(I, V) + I_i V_i + I_j V_j − (I_i + I_j) V + P_NS(I_i) + P_NS(I_j) − P_NS(I)    ▷ Calculate the power saving. P_loss and P_NS are from (4.4) and (3.2), respectively.
    if i and j exist and the maximum power saving > 0 then
      update S, and return {(I_i + I_j), c_l : c_l ∈ C, l ≠ i, j}    ▷ These two cores are now treated as one equivalent core.
    else
      return {}
  function VRCon_pro_I(C)    ▷ Main function
    while C ≠ {} do
      U = {c | c ∈ C, I ∈ c, I ≤ I_out(max)}
      Map u ∈ U to s ∈ S    ▷ Match the re-arranged u to s.
      C = Find_Max_Saving(C)    ▷ A new set C is updated.
    return S
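The pairwise merging step described above can be sketched in Python as follows. This is a minimal illustration rather than the dissertation's implementation: `find_max_saving` and `consolidate` are hypothetical names, the core current is assumed not to change when its voltage is raised (so $I_{i,new} = I_i$), and the loss models `toy_vr_loss` and `toy_switch_loss` are made-up stand-ins for the VR loss model and for (3.2).

```python
from itertools import combinations


def find_max_saving(cores, p_loss, p_ns, i_out_max):
    """Return (saving, merged core list) for the best pair, or None if no pair helps.

    `cores` is a list of (current, voltage) tuples for one group.
    """
    best = None
    for a, b in combinations(range(len(cores)), 2):
        (i_a, v_a), (i_b, v_b) = cores[a], cores[b]
        v = max(v_a, v_b)            # the higher voltage level is chosen
        i = i_a + i_b                # assumption: currents unchanged by the new voltage
        if i > i_out_max:
            continue                 # the shared VR could not drive this load
        saving = (p_loss(i_a, v_a) + p_loss(i_b, v_b) - p_loss(i, v)
                  + i_a * v_a + i_b * v_b - (i_a + i_b) * v
                  + p_ns(i_a) + p_ns(i_b) - p_ns(i))
        if saving > 0 and (best is None or saving > best[0]):
            rest = [c for k, c in enumerate(cores) if k not in (a, b)]
            best = (saving, [(i, v)] + rest)
    return best


def consolidate(cores, p_loss, p_ns, i_out_max):
    """Merge pairs greedily until no merger saves power; one entry per active VR."""
    while True:
        result = find_max_saving(cores, p_loss, p_ns, i_out_max)
        if result is None:
            return cores
        cores = result[1]            # the merged pair acts as one equivalent core


def toy_vr_loss(i, v):
    return 0.5 + 0.02 * i * i        # hypothetical: fixed controller loss + I^2 term


def toy_switch_loss(i):
    return 0.009 * i * i             # hypothetical: conduction loss only


group = [(1.0, 0.75), (1.5, 0.75), (9.0, 1.2)]
plan = consolidate(group, toy_vr_loss, toy_switch_loss, i_out_max=12.4)
```

In this toy group the two lightly loaded 0.75 V cores end up sharing one VR, while the heavily loaded 1.2 V core keeps its own, because raising the light cores to 1.2 V would cost more core power than the VR loss it removes.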
Notice that if the VRCM gets involved in the task allocation to the cores, and the target platform has a large number of cores, then solving (3.4) may require a more sophisticated combinatorial optimization approach to find the best core-to-VR matches. This is, however, outside the scope of the present dissertation. Interested readers may refer to [48, 49], which consider these issues in hardware-software cosynthesis and codesign.

3.2.4 Design considerations

Compared to conventional per-core DVFS platforms, where each core has a single dedicated VR, my proposed network switches incur additional energy losses. Specifically, the switching energy loss of the $i$-th network switch, $E_{NS,switching,i}$ (the first and second terms in (3.2)), has a direct effect on the time period of the reactive VRCon, $T_{NS}$. In general, the lower bound of $T_{NS}$ can be determined by:

\[
\max\left[\alpha_i \cdot Delay_{NS,i}, \text{ for } 1 \le i \le N\right] \le T_{NS}, \tag{3.5}
\]

where $\alpha_i$ is the transition factor and $Delay_{NS,i}$ is the delay of the network switch that powers the $i$-th core. Interested readers may refer to [50], which describes in detail how to calculate $Delay_{NS}$ from the power MOSFET parameters in the datasheet. If the $i$-th core changes its network switch, $\alpha_i = 1$; otherwise, $\alpha_i = 0$. The set of $\alpha_i$ is then derived from:

\[
\sum_{i=1}^{N} \alpha_i E_{NS,switching,i} \le Gain_{VRCon}(T_{NS}), \tag{3.6}
\]

where $Gain_{VRCon}(T_{NS})$ is the total energy that can be saved by the reactive VRCon during time period $T_{NS}$.

Regarding the selection of the network switches, the following should be considered. In (3.2), $E_{NS,switching,i}$ is proportional to the charge of the switches, whereas the conduction energy loss, $E_{NS,conduction,i}$, is affected by $R_{NS}$. Therefore, if switch transitions occur frequently within a short time, selecting a power MOSFET with smaller charge values may be preferable. In contrast, if $T_{NS}$ is long enough, $E_{NS,conduction,i}$ becomes the dominant source of $E_{NS,i}$, and designers should focus on choosing a smaller $R_{NS}$.
Of course, the area overhead due to the network switches should be carefully determined at design time.

Selecting the VRs is another important concern in the proposed platform. A VR has limited capability to provide a large load current, as mentioned in Section 1.1.2. Typically, VRs with higher load current capabilities are equipped with power MOSFETs that offer smaller resistance but relatively higher charges. Therefore, these VRs reach their peak conversion efficiency at a higher load current than VRs with lower load current capabilities. If VRs with larger capabilities are selected (i.e., VRs that achieve peak conversion efficiency at a load current higher than the normal load current of each core), the potential power saving from VRCon could be much higher than when each VR is optimally chosen to power a single core (in which case the VR achieves peak efficiency at the normal operating condition of a single core). Nevertheless, I should also consider the latter case, which accords with the original setup of the VRs for per-core DVFS: each VR is dedicated to powering a single core with the best VR efficiency. In this chapter, for a fair comparison between the results with and without VRCon, I use the same platform for both cases. This platform adopts a single type of VR, either one set to achieve high efficiency in the normal operating region of the core, or one with a high load current capability. I call this setup a homogeneous PDN. Later, in Section 3.3, I will also discuss a heterogeneous PDN composed of two different types of VRs, one for VRCon and the other for the operation of a single core.

Meanwhile, the capability of the VR also affects the number of cores in a group. In other words, due to the limited capability of the VR, the number of cores that can be connected to one VR should be limited.
Therefore, designing the VR-to-core network to support all connections between all VRs and cores is redundant. In addition, the output voltage fluctuation (a.k.a. voltage droop [51]) problem should be taken into consideration. Because a rapid and large change in the load current of a VR can cause a critical output voltage swing, no more than a certain number of cores should be connected to one VR at once. I thus propose network grouping, where only the VRs and cores grouped in the same subnetwork can be connected. This is also depicted in Figure 3.1. Furthermore, because it limits the number of connections between the network switches and cores, this grouping mitigates the scalability issue whereby the power/implementation overhead of the network switches becomes more significant as the platform is equipped with more cores.

Finally, I present the design flows in Figure 3.3 to select the VRs and determine the number of cores in a group. A designer first selects the VRs after deciding where to put more weight: on the benefits of VRs optimized for the normal operating condition of each core, or on the advantage of VRCon from using VRs with high capabilities. If the designer chooses the former, the number of cores in a group may be smaller than if the designer chooses the latter. According to the design specification, which bounds the allowed power/implementation overhead of the network switches, the designer may need to retrace the flows, increasing or decreasing the number of cores in a group and even selecting the VRs again.

Figure 3.3: Design flows to determine the VRs and the number of network switches in the proposed platform. Per-core* in the figure means that a designer puts more weight on the energy saving of the VR by setting it to achieve the best efficiency in the normal operation condition of each core. (Flow shown: design the VR-to-core network; decide the weight, VRCon or Per-core*; select VRs that achieve peak efficiency in multiple-core or single-core operation, respectively; check whether the power/implementation overhead is large or small; increase or decrease the number of cores in a group, which implies that the capability of the VR may increase or decrease; finish.)

3.3 Heterogeneous PDN

In the previous section, I discussed the relationship between the effectiveness of VRCon and the current driving capability of a single VR in the homogeneous PDN. When the VR is selected to achieve its highest efficiency in the normal operating region of each core, the current driving capability of the switches in the VR may be relatively small (I call this VR a little VR), and in this case only limited power saving can be achieved from VRCon. On the other hand, selecting VRs with a higher capability (I call this VR a big VR) can increase the power savings from VRCon, at the cost of losing the benefits of little VRs when VRCon is not applied. Therefore, selecting VRs for a target homogeneous PDN requires an accurate estimate of how often VRCon will be applied and how much energy it will save. However, this information may be difficult to obtain at the design stage, and an inaccurate estimate can lead to mismatched VRs, thereby losing the benefits of both little VRs and big VRs.

To overcome this drawback of the homogeneous PDN, I present the heterogeneous PDN, composed of big and little VRs. The heterogeneous PDN represents a desirable trade-off between the two extremes of selecting only big VRs or only little VRs. In this chapter, I consider two types of VRs (big and little) rather than many types, for the following reasons: i) Using only two types of VRs reduces the control complexity.
Applying VRCon to the heterogeneous PDN requires solving a problem to find the optimal connections between cores and the VRs that are not little VRs. If there are many types of such big VRs, the complexity of the problem may increase significantly, which may impose a heavy computational load on the VRCM. ii) Because the number of cores and VRs in a group is limited (due to the power/implementation overhead), the possible range of the total load current of all cores in a group is limited. If the current range is not too wide, two types of VRs may be enough to improve the efficacy of VRCon in the heterogeneous PDN.

I first explore a heterogeneous PDN design that has the same number of little VRs as the homogeneous PDN, but is equipped with extra big VRs in each group. This design enjoys the benefits of both little VRs and big VRs, in that a little VR achieves high efficiency by powering a single core or a few cores, and a big VR takes responsibility for the consolidation of a large number of cores. However, adding extra big VRs may not yield benefits commensurate with the area/implementation overheads. For instance, if there are $M$ cores in a group, adding $R$ extra big VRs to the group requires an additional $M \times R$ network switches and additional wire connections to the VRs. The overheads are exacerbated as the number of cores in the platform increases. More precisely, if the big VR consists of the LTC3816 VR (area: 35 mm², cost: $4.8 [3]) with two Si4442DY power MOSFETs (each 27 mm² and $3.25 [52]), one 7447709100 inductor (69 mm², $3.1), and three EEE-1EA100WR capacitors (each 12 mm², $0.5), then one big VR occupies at least 194 mm² and costs $15.9 (the device prices are taken from [53]). Moreover, if there are 8 cores in a group, adding one big VR needs 8 more network switches, which may induce 216 mm² of area overhead and $26 of additional cost.
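The overhead figures above follow from simple sums, reproduced here as a short Python check. The component areas and prices are the ones quoted in the text; the assumption that each added network switch is an Si4442DY-class MOSFET (27 mm², $3.25) is inferred from the 216 mm² and $26 totals and is not stated explicitly. The helper name `bill_of_materials` is hypothetical.

```python
# (quantity, area in mm^2, unit cost in USD) for one big VR, as quoted in the text
BIG_VR_PARTS = {
    "LTC3816 VR": (1, 35.0, 4.80),
    "Si4442DY power MOSFET": (2, 27.0, 3.25),
    "7447709100 inductor": (1, 69.0, 3.10),
    "EEE-1EA100WR capacitor": (3, 12.0, 0.50),
}


def bill_of_materials(parts):
    """Return (total area in mm^2, total cost in USD) for a parts dictionary."""
    area = sum(qty * a for qty, a, _ in parts.values())
    cost = sum(qty * c for qty, _, c in parts.values())
    return area, cost


big_vr_area, big_vr_cost = bill_of_materials(BIG_VR_PARTS)

# Assumption: one extra network switch per core, each the size and price of an Si4442DY.
cores_per_group = 8
switch_area = cores_per_group * 27.0
switch_cost = cores_per_group * 3.25

print(f"big VR: {big_vr_area:.0f} mm^2, ${big_vr_cost:.2f}")
print(f"extra switches: {switch_area:.0f} mm^2, ${switch_cost:.2f}")
```

Under that assumption, the switches added for a single big VR take more area and cost more than the big VR itself, which is the scalability concern the text raises.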
Given the overheads induced by extra big VRs, one may ask whether multiple little VRs could be used instead of an extra big VR, which would result in a homogeneous PDN with more than $M$ little VRs. Such a homogeneous PDN has no benefit, for the following reasons: i) Using multiple extra little VRs causes a worse area overhead than adding one big VR. This is because the area difference between a little VR and a big VR is small (i.e., the only components that differ between the big and little VR may be the two powerFET switches). Moreover, multiple little VRs require more network switches in the configurable network design. ii) Using one big VR gives more opportunity to save power than using multiple little VRs. Although the power consumption of a big VR is higher than that of a little VR, the difference (which comes from the different powerFET switches inside the big and little VR) is typically less than the full power consumption of one little VR. Furthermore, a big VR with a large driving current capability can power more cores, which gives more opportunity to turn off little VRs to save power, whereas multiple extra little VRs gain nothing from turning off VRs.

Figure 3.4: A part of the proposed platform with the heterogeneous VRs: each group has $R$ big VRs and $M-R$ little VRs. (The figure shows Groups 1 to 3 of the heterogeneous PDN; $M$ cores belong to each group, and each group uses $M^2$ network switches.)

3.3.1 Proposed design of the heterogeneous PDN

Adding extra big VRs to the existing PDN with little VRs cannot avoid the scalability issue. Therefore, instead of adding redundant devices, I propose a heterogeneous PDN that replaces $R$ little VRs with the same number of big VRs in each group.
Consequently, the total number of VRs assigned to one group is the same as in the homogeneous PDN design. Fig. 3.4 illustrates the proposed design of the heterogeneous PDN, as a part of the proposed platform in Fig. 3.1.

To determine how many little VRs should be replaced by big VRs, and how to select the powerFETs for the big VRs, I first need to estimate the load conditions of all the cores in a group. Recall that the homogeneous PDN design carries the risk that an inaccurate estimate of the load conditions causes a mismatch between the VRs and the actual load conditions, which can lead to significant VR power losses. In contrast, using both big and little VRs simultaneously mitigates the risk from inaccurate estimation. Hence, I can use the load profiles collected by running various benchmarks on the target platforms to estimate the load conditions.

Let $R$ denote the number of big VRs in a group, and let $Cap_{big}$ and $Cap_{little}$ denote the current driving capability of the switches inside the big VR and the little VR, respectively. The objective here is to find the $R$ and $Cap_{big}$ values that maximize the power gain, which is the power saving from applying VRCon minus the power loss from VR mismatches. I present a heuristic solution that starts by replacing one little VR with a big VR in a group. I then keep increasing $Cap_{big}$ from $Cap_{little}$ and testing the big VR equipped with the corresponding power MOSFETs, until a further increase in $Cap_{big}$ no longer improves the power gain (cf. the while-loop in Algorithm 3). Next, I increase $R$ to two, and again increase $Cap_{big}$ of the two big VRs to check whether this yields a higher power gain than the value obtained previously with one big VR (cf. the for-loop in Algorithm 3). I repeat these procedures until no higher power gain can be achieved. Algorithm 3 explains the proposed procedure in detail.
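The search over $R$ and $Cap_{big}$ described above can be sketched as a nested loop in Python. This is a hedged illustration: `find_r_and_cap` is a hypothetical name, the gain function is supplied by the caller (in the text it would come from benchmark load profiles), `cap_max` is an added safeguard that the original procedure does not mention, and `toy_gain` is an invented function used only to exercise the loop.

```python
def find_r_and_cap(gain, num_vrs, cap_little, cap_step, cap_max):
    """Greedy search for the number of big VRs and their switch capability.

    gain(r, cap) returns the power gain when r little VRs in a group are
    replaced by big VRs whose switches have current capability `cap`.
    """
    best_gain = gain(0, cap_little)          # baseline: no big VR in the group
    best_r, best_cap = 0, cap_little
    for r in range(1, num_vrs + 1):
        cap = cap_little + cap_step
        improved = False
        # Keep enlarging the big VR while the gain keeps improving.
        while cap <= cap_max and gain(r, cap) > best_gain:
            best_gain, best_r, best_cap = gain(r, cap), r, cap
            cap += cap_step
            improved = True
        if not improved:
            break                            # one more big VR does not help; stop
    return best_r, best_cap


def toy_gain(r, cap):
    # Invented shape: benefit saturates at two big VRs, mismatch loss grows
    # with the distance of `cap` from a 12 A sweet spot.
    return 6 * min(r, 2) - r * abs(cap - 12)


r_best, cap_best = find_r_and_cap(toy_gain, num_vrs=4, cap_little=10,
                                  cap_step=2, cap_max=20)
```

With the toy gain function the search settles on two big VRs at a capability of 12, and stops as soon as a third big VR fails to improve on that.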
3.3.2 VRCon for the heterogeneous PDN

Applying VRCon to the proposed heterogeneous PDN so as to find the connections between VRs and cores that save the maximum amount of energy is an NP-hard problem. To show the NP-hardness of the problem, I reduce the problem to maximizing only the energy savings from VRCon, ignoring the energy consumption induced by VR-to-core mismatches. The problem is then transformed into a generalized assignment problem, which can be formulated as follows:

Given that there are $M$ (heterogeneous) VRs and $M$ cores in a group, each VR has a limited driving capability for its total load current, and each core has a required load current level. Any VR can be assigned to power a subset of cores, as long as the sum of the load currents of the assigned cores does not exceed the limit. Depending on the VR-to-core assignment, the profit (i.e., power saving) of each VR varies. The objective is to find an assignment in which the total profit is maximized.

Algorithm 3: To determine the number of the big VRs in a group and the power MOSFETs inside the big VRs
  Initialization
    define Gain(R, Cap)    ▷ Gain(R, Cap) is the energy savings of both cores and VRs for the given load condition profile, when R VRs in a group are replaced by big VRs whose switches have capability Cap.
    g = Gain(0, Cap_little)    ▷ R = 0 implies that no big VR is required.
  function Find_R_W_big(Load condition profile)
    for 1 ≤ m ≤ M do    ▷ To find i) the number of big VRs.
      Cap = Cap_little + ΔCap
      while g < Gain(m, Cap) do    ▷ ii) the capability of the big VR.
        g = Gain(m, Cap), Cap_big = Cap, and R = m
        Cap = Cap + ΔCap    ▷ ΔCap is the minimum capability increase.
      if Cap = Cap_little + ΔCap then    ▷ This is the case in which increasing R cannot bring better power saving.
        break
    return (R, Cap_big)
If this problem is further simplified so that the profit is only a function of load current and is not affected by the type of VR, the problem becomes a form of the multiple knapsack problem, which is a well-known NP-complete problem in combinatorial optimization.

I propose heuristic algorithms to apply the reactive and proactive VRCons to the heterogeneous PDN. In the proposed algorithms, I first attempt to maximize the utilization of the big VRs. In general, utilizing big VRs more heavily allows more little VRs to be turned off and mitigates the energy loss incurred by mismatches between big VRs and their assigned cores. This approach can also significantly reduce the computational overhead, because I do not need to enumerate all possible connections between all the cores and VRs. At the beginning of this step, I set one big VR as the target VR and estimate the benefit of each core if the core is connected to the target VR. I define the profit of each core as the power saving that can be obtained by assigning the core to the big VR and turning off the little VR. The profit of the $i$-th core is then calculated as follows:

\[
P_{loss,little}(I_i, V_i) - P_{NS,i} - I_i \left(V_{base} - V_i\right) \tag{3.7}
\]

where $I_i$ and $V_i$ are the load current and voltage level of the $i$-th core, respectively, $P_{loss,little}$ is the power loss of the little VR in (4.4), and $P_{NS,i}$ is the power loss during the network switch transition. Notice that, to calculate $P_{loss,little}$, I suppose that the core is currently connected to a dedicated little VR, regardless of what type of VR the core is actually connected to. This is reasonable because any core should be connected to a little VR if it is not connected to a big one. On the other hand, the current connection between the core and the VR is taken into consideration when calculating $P_{NS,i}$. If the core is already connected to a big VR, $P_{NS,i}$ is zero; otherwise the transition incurs the power dissipation $P_{NS,i}$.
The third term estimates the power loss from the potential voltage level change. $V_{base}$ is thus equal to $V_i$ when the reactive VRCon is applied. For the proactive VRCon, I set $V_{base}$ to the most common level (or the medium level) among the voltage levels of all the cores. Then I select the cores to be connected to the big VR. More precisely, the problem is to find a subset of cores such that the sum of their profits is maximized and the sum of their load currents is less than or equal to the current limit of the big VR. This problem is similar to the well-known Knapsack problem, so I can use dynamic programming to solve it in pseudo-polynomial time. After assigning cores to the target big VR, I repeat the above procedure for the next big VR, until all the big VRs have been considered or no core remains available. If some cores remain that are not connected to the big VRs, I apply the VRCon algorithms presented for the homogeneous PDN in Section 3.2. For example, VRCon_pro_I in Algorithm 2 is used again to assign the remaining cores to the little VRs. Similarly, the reactive VRCon for the heterogeneous PDN in this step is the same as the reactive VRCon for the homogeneous PDN discussed in Section 3.2.2.

3.4 Experimental work

3.4.1 Experimental setup

3.4.1.1 Per-core DVFS, multi-core processor setup

Unlike the conventional platform, the VRCM in the proposed platform performs DVFS with reference to the PM's initial recommendation. I thus treat the PM's DVFS recommendation as given a priori in this chapter, and use an offline DVFS approach as an intermediate step toward the overall aim. Similar to [19], I adopt an ILP-based algorithm. Finding the optimal frequency/voltage level of each core to minimize the energy consumption under a certain performance penalty may be formulated as:

$$\min \left( \sum_{r}^{R} \sum_{s}^{S} P_{r,s}\, x_{r,s} \right) \quad \text{s.t.} \quad \sum_{r}^{R} \sum_{s}^{S} D_{r,s}\, x_{r,s} < \text{(performance penalty)}, \quad \text{and} \quad \sum_{r}^{R} \sum_{s}^{S} x_{r,s} = R \qquad (3.8)$$

where $R$ is the total number of intervals, and $S$ is the set of the five frequency/voltage levels described in Table 3.1. $P_{r,s}$ is the power consumption when running at the $s$-th frequency/voltage level in the $r$-th interval. Following the same notation as $P_{r,s}$, $D_{r,s}$ denotes the delay incurred under that frequency/voltage condition.

Table 3.1: DVFS frequency and voltage levels.

Frequency (GHz)   Voltage (V)
2.66              1.2
2.33              1.05
2.13              0.95
1.87              0.83
1.66              0.75

To obtain $P_{r,s}$ and $D_{r,s}$, I first performed detailed multi-core simulations of various benchmarks at the five frequency/voltage levels. From the simulation at the highest frequency/voltage level, I obtained the intervals and the default instruction count of each interval. Based on the default instruction counts, $P_{r,s}$ and $D_{r,s}$ were then derived. Finally, IBM CPLEX was used to solve (3.8). Fig. 3.5 shows an example of the offline DVFS results for a performance penalty of 15%, for two applications in the 4-core simulator setup.

Note that, because the PM and VRCM in the proposed architecture exist independently and operate separately, (3.4) and (3.8) are solved independently. However, if the PM and VRCM were combined into one unit that decides both the initial and final DVFS levels of the cores, a new problem definition and solution might be needed. I leave this problem for future work.

I performed the multi-core processor simulations in the Sniper simulator. The platform configuration was based on the Intel Xeon Nehalem architecture, and the topology is shown in Fig. 3.6. I modified the code related to the McPAT module in Sniper to collect the power and timing data for per-core DVFS. The multi-threaded applications from the PARSEC and SPLASH2 benchmarks were used in the simulation.
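The structure of (3.8) can be illustrated on a toy instance. The sketch below replaces the ILP solver with exhaustive search and assumes that exactly one level is chosen per interval, which is one natural reading of the constraint on $x_{r,s}$; the power and delay tables are hypothetical numbers generated from the levels in Table 3.1, not simulation data.

```python
from itertools import product

# (frequency in GHz, voltage in V) from Table 3.1, highest level first.
LEVELS = [(2.66, 1.2), (2.33, 1.05), (2.13, 0.95), (1.87, 0.83), (1.66, 0.75)]


def solve_offline_dvfs(power, delay, delay_bound):
    """Pick one level per interval, minimizing total power subject to
    total delay < delay_bound. power[r][s] and delay[r][s] are tables
    indexed by interval r and level s. Returns (total_power, choice)
    or None if no assignment is feasible."""
    best = None
    for choice in product(range(len(LEVELS)), repeat=len(power)):
        total_delay = sum(delay[r][s] for r, s in enumerate(choice))
        if total_delay >= delay_bound:      # strict inequality, as in (3.8)
            continue
        total_power = sum(power[r][s] for r, s in enumerate(choice))
        if best is None or total_power < best[0]:
            best = (total_power, choice)
    return best


# Hypothetical per-interval activity factors and CPU-bound times (ms).
activity = [1.0, 0.6, 0.8]
cpu_ms = [4.0, 1.0, 2.5]
f_max = LEVELS[0][0]
# Toy models: power scales with f * V^2, delay with the slowdown f_max / f - 1.
power = [[a * f * v ** 2 for f, v in LEVELS] for a in activity]
delay = [[t * (f_max / f - 1.0) for f, _ in LEVELS] for t in cpu_ms]

result = solve_offline_dvfs(power, delay, delay_bound=1.0)
```

Exhaustive search grows as $5^R$ and is only meant to make the formulation concrete; for realistic interval counts a solver such as the one used in the text is needed.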
3.4.1.2 Homogeneous PDN setup

I selected the programmable VR from Linear Technology, LTC3816 [3], which satisfies the Intel VR design guideline (VRD 11.1 [54]) and can power each core in my processor setup at the five output voltage levels. Next, I selected Si4840DY for the power MOSFETs, which is an N-channel trench power MOSFET from Vishay Siliconix [55]. The on-state resistance, charge, and maximum continuous drain current of Si4840DY are 9mΩ, 19nC and 12.4A, respectively. I then performed LTspice simulation based on the circuit diagram shown in Fig. 3.7. Fig. 3.8 (a) shows the resulting VR efficiencies versus load current at the five output voltage levels. I set the input voltage level to 12V, following VRD 11.1. The load current profiles of a single core gathered from the benchmark simulations in Sniper show that the typical load current ranges from 4A to 10A and that the maximum current is less than 12.4A. The simulation results therefore show that LTC3816 with Si4840DY is well suited as the dedicated VR for a single core in my multicore setup.

Figure 3.5: A part of the per-core DVFS results of Barnes and Streamcluster from the Sniper simulation with 4-core setup. [Four panels plot the voltage (V) of each core against time (ms): Cores 1 and 2 run SPLASH2-Barnes, Cores 3 and 4 run PARSEC-Streamcluster.]
Figure 3.6: Topology of 16 cores (four 4-core processors) in Sniper simulation. [Each core has a 32KB L1-I cache, a 32KB L1-D cache and a 256KB L2 cache; each 4-core processor has an 8MB L3 cache connected to DRAM.]

I performed additional homogeneous PDN simulations with a different VR setup in order to investigate the effect of the VR mismatch. As mentioned earlier, the VR mismatch occurs when the power MOSFETs are selected to give the VR a larger current driving capability, so that the resulting best-efficiency region of the VR lies above the load current range of a single core in normal operation. In practice, when selecting VRs, designers give high priority to the driving capability, so that the VR can drive the maximum possible load current of the target core [3], even though the load current of the core in normal operation can be much lower than the maximum. Furthermore, reference [6] showed that real (smartphone) devices can be equipped with VRs that achieve their best efficiency in the vicinity of the maximum load current. In light of this, I selected Si4838DY [56], whose on-state resistance $R_{DS}$, charge $Q_g$ and maximum continuous drain current $I_D$ are 3mΩ, 40nC and 25A, respectively, to be used with LTC3816. The resulting efficiency of LTC3816 with Si4838DY from the LTspice simulations is shown in Fig. 3.8 (b). The figure shows that LTC3816 with Si4838DY is less efficient than LTC3816 with Si4840DY in the region below 12A, but can drive a higher load current. For the network switch, I selected SiR800DP, which has the lowest resistance (2.3mΩ) among the power MOSFETs from Vishay Siliconix and is also available in LTspice.
SiR800DP has 40nC on-state charge and occupies 32mm² area. Taking into account the load current driving capability of the VR and the power/area overhead of the network switches, I set the number of VRs and cores in one group of the VR-to-core network to four.

Figure 3.7: VR schematic used in the spice simulation. [The schematic shows the LTC3816 controller with a 12V input, the power FETs, and the VID1-VID6 inputs for output voltage control.]

3.4.1.3 Heterogeneous PDN setup

As I discussed in Section IV, in order to mitigate the overheads of the big VRs in the heterogeneous PDN, I chose to replace R little VRs with the same number of big VRs. I also limit one network group to support only connections between four cores and four VRs. I used LTC3816 as the big VR, as in the previous homogeneous PDN, but I changed the power MOSFET used with LTC3816 so that the big VR has a higher current driving capability $Cap_{big}$.
To determine such a power MOSFET, I first set the baseline: a homogeneous PDN that employs little VRs with the Si4840DY power MOSFET ($Cap_{little}$ = 12.4A). I ran Algorithm 2 on a testbench taken from the DVFS results in Section 3.4.1.1. Precisely, I set one network group to have four cores, and ran Barnes on two of the cores and FMM on the other two. The performance penalty of the testbench was 15%.

Figure 3.8: Efficiency and power loss vs. load current for LTC3816 with (a) Si4840DY, (b) Si4838DY and (c) Si4442DY. [Each panel plots efficiency (%) and power loss (W) against load current (A) for output voltages of 1.2V, 1.05V, 0.95V, 0.83V and 0.75V.]
Next, I investigated Vishay power MOSFETs such as Si4114DY, Si7106DN, Si4442DY and Si4838DY, as listed in Table 3.2. I ran Algorithm 2 to find the power MOSFET for the big VR and the number of big VRs that achieve the highest gain. For readers' better understanding, I define the total VR energy loss reduction $G_{VR}$ (%) and the total energy saving in the platform $G_{total}$ (%). Table 3.2 shows that replacing one little VR with a big VR that uses Si4442DY as the power MOSFET [52] results in the highest improvement. I therefore selected Si4442DY as the power MOSFET for the big VR. Si4442DY has on-state resistance $R_{DS}$ and charge $Q_g$ of 5mΩ and 36nC, respectively, and its maximum drain current $I_D$ is 22A. As mentioned earlier, because Si4442DY has a smaller resistance but a higher on-state charge and $I_D$ than Si4840DY, this big VR is less efficient than the little VR when the current is low, but achieves high efficiency in the high-current region. In other words, this big VR can drive a higher load current at high efficiency than the little VR can. Fig. 3.8 (c) shows the efficiency of the big VR, whose current driving capability is 22A. I set the number of big VRs in one group to one. Admittedly, the improvement from exploiting the big VR in Table 3.2 is modest. This is because the load current profiles of the benchmarks were well matched to the homogeneous PDN with the little VRs.

Table 3.2: Design procedure to build the heterogeneous PDN, following Algorithm 2. The baseline (homogeneous) PDN with Si4840DY ($Cap_{little}$ = 12.4A) achieves $G_{VR}$ = 22.56% and $G_{total}$ = 5.56%. Details are described in Section 3.4.1.3.

Big VR      Cap_big   R   G_VR (%)   G_total (%)
Si4114DY    15.2A     1   23.90      5.89
Si7106DN    19.5A     1   24.05      5.93
Si4442DY    22A       1   24.66      6.09
Si4838DY    25A       1   16.06      3.96
Si4114DY    15.2A     2   24.18      5.96
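A rough consistency check on the Table 3.2 numbers may help in reading the two metrics. Assuming, as a simplification that the text does not state, that the platform-level saving comes almost entirely from the reduced VR loss, the two gains are linked by the share of the VR loss in the baseline platform energy. The symbols $E_{VR}^{base}$ and $E_{total}^{base}$ are introduced here only for illustration:

```latex
G_{total} \approx G_{VR} \cdot \frac{E_{VR}^{base}}{E_{total}^{base}},
\qquad
\frac{5.56}{22.56} \approx 0.246 \;\; \text{(baseline with Si4840DY)},
\qquad
\frac{6.09}{24.66} \approx 0.247 \;\; \text{(big VR with Si4442DY)}
```

Under this reading, the ratio stays near 0.25 across the rows of the table, which would correspond to the VR loss accounting for roughly a quarter of the baseline platform energy in this testbench.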
However, if the cores run into load current conditions that were not captured by the benchmarks used, the need for the big VRs grows. For instance, when the four cores draw 1A, 1A, 1A and 12A, one big VR can power all the cores with high efficiency, whereas the homogeneous PDN has to use two little VRs, one of which carries a load current of just 3A, which corresponds to very low efficiency (cf. Fig. 3.8 (a)). I will discuss this later in Section 3.4.3.

3.4.2 Homogeneous PDN results

3.4.2.1 Homogeneous PDN composed of the VRs with Si4840DY (simply called well-matched PDN)

Following Sections 3.2.2 and 3.2.3, I performed the reactive and proactive VRCon (cf. Algorithm 1) in the homogeneous PDN. Fig. 3.9 shows the proactive VRCon result of the per-core DVFS example described in Fig. 3.5. In the figure, the voltage levels of some cores in certain decision epochs are changed from their initial levels for the VR consolidation, while other cores are consolidated without a voltage level change. Fig. 3.9 also provides a histogram that shows how often each consolidation occurs.
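To make the consolidation idea concrete, the sketch below packs core load currents onto as few VRs as a given capacity allows, using the 1A, 1A, 1A and 12A case from Section 3.4.1.3. It is a first-fit-decreasing heuristic chosen for illustration, not the VRCon algorithm, and it assumes that all cores run at the same voltage level so that any of them may share a VR. The function name is hypothetical.

```python
def consolidate(load_currents_a, vr_capacity_a):
    """Group load currents onto VRs with first-fit decreasing packing.
    Returns a list of groups, one per active VR."""
    groups = []
    for current in sorted(load_currents_a, reverse=True):
        for group in groups:
            if sum(group) + current <= vr_capacity_a:
                group.append(current)   # fits on an already active VR
                break
        else:
            groups.append([current])    # turn on another VR
    return groups


loads = [1.0, 1.0, 1.0, 12.0]
little = consolidate(loads, vr_capacity_a=12.4)   # Si4840DY-based little VR
big = consolidate(loads, vr_capacity_a=22.0)      # Si4442DY-based big VR
```

With the 12.4A little VRs the cores split into a 12A group and a 3A group, the lightly loaded case described in the text; with the 22A big VR a single group of 15A suffices.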
As mentioned earlier, I define the total VR energy loss reduction as $G_{VR}$ and the total energy saving in the platform as $G_{total}$, both relative to the baseline VR and platform energy consumption (note that these baselines result from the initial DVFS setup derived from (3.8)). The result in Fig. 3.9 achieves $G_{VR}$ = 15.45% and $G_{total}$ = 4.02%. If only the reactive VRCon were applied, $G_{VR}$ = 12.44% and $G_{total}$ = 3.24%.

Figure 3.9: VRCon result from Fig. 3.5. [The plot shows the selected consolidation case (0-14) over time (ms), together with a histogram of the proportion (%) of each case. Tuples indicate the consolidated sets of cores:]

Case  Result            Case  Result            Case  Result
0     (1,2,3,4)         5     (1,2),(3,4)       10    (1,4),(2),(3)
1     (1,2,3),(4)       6     (1,3),(2,4)       11    (2,3),(1),(4)
2     (1,2,4),(3)       7     (1,4),(2,3)       12    (2,4),(1),(3)
3     (2,3,4),(1)       8     (1,2),(3),(4)     13    (3,4),(1),(2)
4     (1,3,4),(2)       9     (1,3),(2),(4)     14    (1),(2),(3),(4)

I performed simulations on various applications under different simulator setups (different numbers of cores) and different initial DVFS recommendations (derived from three different performance penalties). Table 3.3 shows the results. The number after each application name indicates the simulation setup: (I), (II) and (III) denote the 16-core, 8-core and 4-core setups, respectively.

Table 3.3: Simulation results of the homogeneous PDN (VRs with Si4840DY). App., Penalty, Re., Pro., $G_{VR}$ (%) and $G_{total}$ (%) indicate the application, DVFS performance penalty, reactive VRCon, proactive VRCon, VR energy loss reduction and total energy saving, respectively.

                           Penalty = 5%      Penalty = 10%     Penalty = 15%
App.                VRCon  G_VR    G_total   G_VR    G_total   G_VR    G_total
Streamcluster (I)   Re.    24.68   6.23      19.01   4.88      16.19   4.16
                    Pro.   28.81   7.28      23.19   5.95      20.21   5.19
Barnes (II)         Re.    23.78   5.80      23.00   5.86      21.42   5.45
                    Pro.   32.21   7.86      31.30   7.98      29.50   7.51
Ocean (III)         Re.    15.43   4.07      16.12   4.31      16.30   4.34
                    Pro.   19.11   5.04      19.74   5.28      19.77   5.26
Cholesky (III)      Re.    12.84   3.17      15.34   4.04      15.39   4.13
                    Pro.   18.99   4.70      21.54   5.68      21.46   5.75
Swaption (I)        Re.    20.82   5.77      20.82   5.77      20.82   5.77
                    Pro.   24.34   6.75      24.34   6.75      24.34   6.75
FFT (II)            Re.    6.21    1.13      6.42    1.22      6.51    1.29
                    Pro.   6.40    1.16      6.59    1.25      6.70    1.33
Raytrace (III)      Re.    17.74   3.33      23.24   4.77      27.28   5.95
                    Pro.   18.09   3.40      22.96   4.71      27.52   6.01
FMM (III)           Re.    10.02   2.23      11.63   2.75      11.34   2.68
                    Pro.   16.04   3.57      17.73   4.20      17.21   4.07

Streamcluster, Barnes and Raytrace achieve more than 25% $G_{VR}$, and the others, except FFT, achieve around 20% $G_{VR}$. In particular, Barnes achieves a 32% VR energy loss reduction, which corresponds to an 8% total energy saving. The gains of FFT may be small because the load current values of each core in FFT are so high that (i) the sum of the load current values may exceed the capability of a single VR, or (ii) the efficiency corresponding to each load current value is already high, so that the increase in efficiency from the consolidation is hardly noticeable. In addition, Swaptions is an example of a memory-bound application, for which no performance degradation was observed despite DVFS level drops, so its initial DVFS recommendations for the three performance penalties are the same. That is why the VRCon results of Swaption for the different performance penalties show the same improvements in the table.

3.4.2.2 Homogeneous PDN composed of the VRs with Si4838DY (simply called mismatched PDN)

I then performed simulations on the same applications as in Table 3.3, but with the mismatched PDN. Table 3.4 shows the improvements for each application when the DVFS performance penalty is 15%. I define $loss_{mis}$ to indicate how much (%) the total energy increases when the well-matched PDN is replaced by the mismatched PDN. The table shows that $loss_{mis}$ can be up to 12%. Note that the gains here are derived relative to the total and VR energies of the mismatched PDN without the reconfigurable setup, not relative to the energies of the well-matched PDN setup. Except for the gains of Swaption and Ocean, which are slightly reduced, the gains of all the applications, including $G_{VR}$ = 42% for Raytrace, are higher than the corresponding results in Table 3.3. This implies that the VRCon can become even more effective in such a case, as I discussed in Section 3.2.4.

Table 3.4: Simulation results of the mismatched homogeneous PDN. $loss_{mis}$ is the total energy increase (%) compared to the total energy with the well-matched homogeneous PDN.

Application           loss_mis   Reactive            Proactive
(Penalty = 15%)       (%)        G_VR     G_total    G_VR     G_total
Streamcluster (I)     12.07      18.72    6.38       27.99    9.54
Swaption (I)          9.02       17.80    6.00       21.51    7.25
Barnes (II)           7.69       22.17    6.82       29.77    9.16
FFT (II)              7.91       7.98     2.05       8.45     2.17
Ocean (III)           10.04      16.52    5.50       18.89    6.29
Raytrace (III)        11.21      39.75    11.81      42.35    12.58
Cholesky (III)        10.21      19.06    6.41       24.82    8.34
FMM (III)             6.84       13.69    3.91       18.99    5.42

3.4.3 Heterogeneous PDN results

I finally performed the heterogeneous PDN simulations, following Section 3.3.2. I first explored the same applications used in the homogeneous PDN simulations. For a fair comparison, the gains here are calculated relative to the VR and total energies of the well-matched PDN without the reconfigurable setup. Table 3.5 shows the resulting gains: for the applications other than Streamcluster, Swaption and Ocean, the gains are higher than those from the simulations with the well-matched PDN.

Table 3.5: Simulation results of the heterogeneous PDN. $G_{VR}$ and $G_{total}$ are the gains from the proactive VRCon.

                      Penalty = 5%      Penalty = 10%     Penalty = 15%
Application           G_VR    G_total   G_VR    G_total   G_VR    G_total
Streamcluster (I)     16.95   4.28      12.09   3.10      9.45    2.43
Barnes (II)           36.46   8.89      33.21   8.47      30.34   7.73
Ocean (III)           10.97   2.89      11.69   3.13      11.90   3.17
Cholesky (III)        19.65   4.87      20.72   5.47      19.93   5.34
Swaption (I)          14.74   4.09      14.74   4.09      14.74   4.09
FFT (II)              12.67   2.30      12.00   2.29      12.13   2.41
Raytrace (III)        19.72   3.72      21.31   4.38      28.66   6.25
FMM (III)             19.94   4.44      21.88   5.19      20.41   4.83

Table 3.6: Results of both the homogeneous and heterogeneous PDNs for the three scenarios. Gains are from the proactive VRCon.

               Homogeneous PDN      Heterogeneous PDN
Application    G_VR     G_total     G_VR     G_total
Scenario 1     11.45    2.15        44.14    8.31
Scenario 2     6.45     1.24        18.89    3.64
Scenario 3     10.96    2.11        28.96    5.59
However, the applications in Table 3.5 may not cover all the operating conditions of the cores, including those that would better demonstrate the advantage of the heterogeneous PDN. In other words, as mentioned in Section 3.4.1.3, there can be certain load current conditions of the cores under which the VRCon in the heterogeneous PDN achieves prominent power savings while the VRCon in the homogeneous PDN cannot. In order to capture such conditions, I constructed three scenarios: (i) Scenario 1: one core keeps a 12A load current while the others keep drawing 1A; (ii) Scenario 2: starting from the case of Streamcluster with a performance penalty of 15%, I added 10A to the load current of only one core; (iii) Scenario 3: the same setup as Scenario 2, but using Radiosity. The simulation results are shown in Table 3.6. As seen, the VRCon gains from the heterogeneous PDN are much higher than those from the homogeneous PDN.

3.5 Summary

This chapter addressed the problem of power conversion efficiency in the multicore platform, where significant power is dissipated by the multiple VRs, and where design limitations associated with the fixed VR-to-core network undermine the opportunity for power savings from the per-core DVFS technique. This chapter proposed VR consolidation methods with a configurable VR-to-core distribution network integrated in the proposed multicore platform design. The reactive VRCon was presented to configure the network to enhance the power conversion efficiency under the pre-determined DVFS levels. The proactive VRCon was proposed to determine new DVFS levels that maximize the system-wide energy saving without performance degradation. I applied the proposed optimization methods to the PDN composed of homogeneous VRs, and demonstrated that the proposed methods accomplish up to 32% VR energy loss reduction.
Then I explored the limitation of the homogeneous PDN, and proposed the heterogeneous PDN, which can increase the benefits of the optimization methods by incorporating VRs with a larger load current driving capability. The simulation results based on realistic experimental setups demonstrated that the proposed methods achieve up to 36% VR energy loss reduction and 9% total energy saving.

Chapter 4

Optimizing the PDN in an OLED display platform

4.1 Introduction

OLED (Organic Light Emitting Diode) has emerged as a promising light source for displays. This is primarily due to its distinct features, such as higher brightness and luminance, faster response, wider viewing angle, and lower energy consumption, compared with LCD (liquid crystal display) [5, 57]. In addition, OLED has also attracted significant attention for potential applications in transparent and flexible displays [58, 59]. As a result of this continuous progress, the market for OLED displays is growing steadily. It is expected that OLED revenues will rise to about $8 billion in the year 2017 [60].

Among all OLED applications, the market for large-area OLED panels has been growing rapidly. The OLED TV is expected to be the second largest OLED application in 2017, with around $3 billion in revenue, following the mobile phone display with around $4 billion. In reality, enlarging the size of flat OLED panels has faced many technical setbacks in panel fabrication and control. For example, the occurrence of short circuits, non-uniformity of light emission, local heat generation and hot spots are well-known problems in large-area OLED panels [61]. However, extensive efforts by industry and academia have successfully suppressed those fabrication issues by exploiting new materials and patterning techniques [58, 62]. As a result, a 65" 4K (UHD or Ultra High Definition) OLED panel has already been commercialized and is available in the market.
Low power efficiency of OLED displays still remains a problem, not only for large-area OLED panels but also for small portable panels. This is because of the characteristics of OLED: OLED is a surface-emitting light source, and each pixel is composed of red, green, and blue cells. Cells with different displayed colors have different power efficiencies and different power consumptions at a given luminance level (for example, blue cells have the lowest power efficiency [57]). As a result, to show the black color, an OLED pixel (with red, green, and blue cells) consumes less than 40% of the power needed to display black in an LCD pixel, whereas displaying white consumes almost three times as much power as an LCD pixel [5].

To tackle this drawback, many power management methodologies have been proposed, which mainly focus on controlling the color composition of the pixels. For instance, a local dimming method for OLED displays was introduced in [63], and color re-mapping methods were presented in [64, 65]. Furthermore, a supply voltage scaling method for OLED displays (OLED-DVS) was proposed in [66] to reduce the power wasted in OLED pixel drivers. Given that the luminance of an OLED pixel is proportional to its driving current, the OLED-DVS method can maintain the image quality as long as the driving current of the OLED pixels is maintained regardless of the voltage scaling. The supply voltage control architecture and an on-line control scheme for an image sequence were introduced in [57], and the OLED-DVS technique has been applied to online movie streaming [67]. Recently, a more aggressive OLED-DVS approach has been investigated [5, 68], which partitions a panel into several zones (sub-panels) and applies potentially different voltage levels to the different zones. Applying
Applying the OLED-DVS to each zone can take full advantage of the power saving that the DVS method can potentially achieve: if DVS is applied to a whole panel, some regions of the panel may not need a high voltage level, but their voltage level cannot be lowered because of the other regions. I call this method the zone-specific OLED-DVS.

Among all the proposed methods, I pay attention to the zone-specific OLED-DVS, which is directly applicable to the large-area OLED display. In addition to improving the power efficiency, large-area OLED panels also benefit from fine-grained control of the zoned panel due to the reduced IR-drop and the resulting enhanced image quality. The voltage distribution on power lines in large Active Matrix OLED (AMOLED) displays has been investigated as a function of panel size in [69]. It is reported that, due to the IR-drop, the supply voltage drops significantly as the panel size increases. Indeed, depending on the location of pixels and the paths of current flowing from the power supply to each individual pixel, the amount of IR-drop may be different for different pixels. This phenomenon ultimately affects the image quality due to non-uniformity of brightness. In recent large-area OLED panels, uneven distribution of supply voltage has emerged as a critical problem [61]. To tackle this IR-drop problem, dividing a large panel into several sub-panels, each of which has a dedicated power line, has been introduced [70].

Unfortunately, to the best of my knowledge, no published study addresses the architecture by which multiple supply voltages can be delivered to the multiple sub-panels. For every sub-panel to have independent voltage control, at least one VR per sub-panel should be attached to the external power supply board or panel. But a large number of VRs incurs significant area, cost and power overheads by introducing a large number of inductors and capacitors [7, 8, 20].
For instance, as a panel is zoned with finer granularity, the required number of VRs increases, and eventually reaches the maximum allowable space. In addition, although the state-of-the-art VRs exhibit high peak efficiency, their efficiency can drop dramatically under adverse load conditions (i.e., out-of-range load current levels). Thereby, if many VRs operate at low efficiency, the power loss of the VRs becomes critical.

In this work, I present a power delivery architecture based on a reconfigurable switch network to maximize the efficacy of the DVS method in the large OLED panel with the minimum overhead of multiple VRs. The proposed reconfigurable power delivery network (PDN) uses a minimum number of VRs while exploiting their full potential. The basic concept of the proposed PDN is that grouping some sub-panels to be powered by a single VR can reduce the VR power loss significantly. For example, if the sub-panels that draw relatively small load currents are grouped together, the single VR sees a relatively high load current. Due to the characteristics of the VR efficiency, the VR may then operate at higher efficiency. Of course, when grouping the sub-panels, I should also take into account the power consumption of the sub-panels and the power losses induced by IR-drops. Therefore, I also propose an optimization algorithm that controls the proposed PDN to minimize the power consumption of the whole system. This algorithm optimally divides the sub-panels into several groups and performs group-level DVS.

I validate the proposed methods on an AMOLED panel model that I develop for the realistic experiment. I target a 65" TV platform that supports 4K UHD (4096 x 2160) resolution. I perform detailed simulations on the target platform with a commercial VR carefully selected for fair evaluation. Results demonstrate that up to 36% power savings can be achieved.
4.2 OLED-DVS with the zoned, large-area OLED display platforms

4.2.1 Preliminary: OLED-DVS

Like LCD panels, OLED display panels can be classified into two types: passive matrix OLED (PMOLED) and active matrix OLED (AMOLED). The PMOLED panels consist of simple driver structures, hence their manufacturing cost is low and the control scheme is relatively simple. However, the simple structure of the PMOLED panels inherently restricts them to low resolution and small panel size (i.e., typically up to 3") [57]. In contrast, the AMOLED panels are driven by a thin-film transistor (TFT) with a storage capacitor, which enables large-size and high-resolution displays. Therefore, in this work, I target the AMOLED panels for the large-area displays.

Figure 4.1 shows the DVS-friendly AMOLED driver structure [5]. In the figure, $I_{cell}$ and $V_{cell}$ are the driving current and cell voltage of an OLED cell, respectively, and $V_{drop}$ denotes the voltage drop on the driver transistors and OLED cell resistance. $I_{cell}$ determines the luminance of the OLED cell, and is controlled by a constant current source $I_{data}$ with a current mirror circuit. This structure is called an amplitude modulation (AM) driver. The operation of the AM driver is well described in [5]. I thus omit the detailed explanation in this chapter, but underline its key feature: it enables $V_{DD}$ to be scaled down to a certain degree while still satisfying the condition $I_{cell} = I_{data}$.

Lowering $V_{DD}$ to reduce power loss from the driver circuit is the key concept of the OLED-DVS. Unless $V_{DD}$ is reduced below a certain threshold that induces $I_{cell} < I_{data}$, this OLED-DVS method will not cause any image distortion.

Figure 4.1: AMOLED driver structure based on the DVS-friendly circuit [5].

Given that the forward
biasing voltage of the diode is $V_{cell}$, the power loss $P_{driver}$ due to the voltage drop can be expressed as follows:

$$P_{driver} = I_{cell} V_{drop} = I_{cell}\,(V_{DD} - V_{cell}), \tag{4.1}$$

where $I_{cell}$ can be given by [57]:

$$I_{cell}(V_{cell}) = \alpha\,(e^{\beta V_{cell}} - 1). \tag{4.2}$$

In (4.2), $\alpha$ and $\beta$ are design parameters of OLED cells.

Typically, OLED drivers are designed to have 50% to 100% headroom between the static $V_{DD}$ and $V_{cell}$, so as to guarantee full contrast and luminance on the panel [66]. Nevertheless, $I_{cell}$ seldom reaches its maximum value in reality, and thus the large headroom results in a low power efficiency of the OLED panel. In other words, OLED-DVS can be applied to most displayed images, and the resulting power savings may be significant [5, 57]. I cannot control $V_{DD}$ values cell by cell or even pixel by pixel due to the implementation difficulties and expense (e.g., the overhead of VRs, which I discuss in detail in the following subsections). Rather, DVS can be applied either to the overall panel [57] or to multiple sub-panels at finer granularity (the zone-specific OLED-DVS) [5, 67, 68]. This limitation may cause image quality degradation for some cells due to aggressive reduction of $V_{DD}$ in order to maximize power savings. In this chapter, I keep image distortion within the maximum level acceptable to human perception when applying OLED-DVS. Details will be discussed in Section 4.4.1.

4.2.2 Zoned OLED display panel

As the size of the OLED panel has been continuously increasing, the non-uniform light distribution has become a critical problem for the large-area OLED panel [61]. Various solutions have been proposed to tackle this problem. For instance, optimizing the ratio between the effective horizontal resistance of the anode and the vertical resistance of the OLED device has been presented in [58]. Utilizing an auxiliary metal (Cr) electrode that is deposited and patterned on the indium-tin-oxide (ITO) layer of the OLED cell has been introduced in [62].
In general, most approaches focus on the power distribution inside the panel or explore the OLED cell structure with new materials. Although the previous solutions enhance the luminance uniformity of the OLED panel, their efficacy may be limited to relatively small panel sizes (e.g., 150x150 mm²). This is because the increasing driving current in a large-area panel (e.g., a 65" TV panel with 1500x900 mm² area) induces significant IR-drop through the current conducting paths from the power supply to the OLED cells. To overcome this problem, researchers have proposed to divide the panel into multiple sub-panels/zones, whereby each sub-panel has its own voltage level that compensates for its own amount of IR-drop [70]. If the size of a sub-panel is small enough such that the supply voltage drop is insignificant within each sub-panel [58, 62], the overall IR-drop can be significantly mitigated.

Figure 4.2: An example of the IR-drop in a 4x4 zoned panel. (a) Original 4K image, (b) the structure of the panel and power supply board, and (c) IR-drop values of the sub-panels. Values shown in (c): 450, 368, 227, 412, 891, 663, 485, 716, 391, 321, 417, 844, 102, 0, 0, and 264 mV.

Figure 4.2 shows an example of IR-drops when a 65" 4K panel displays 'Balloons'. To derive the conducting wire resistance, I assume a copper wire of 0.129 mm² cross-sectional area, which is used in commercial OLED panels [71]. Its unit-length resistance is 5.97E-4 Ω/mm. If I divide the panel into 2x2 sub-panels, the resulting IR-drops of the sub-panels can reach a maximum of 6 V. Since the typical supply voltage level of the OLED panel is around 14 to 15 V, a 6 V IR-drop can cause severe image distortion. However, if I divide the panel into 4x4 sub-panels so that the driving current of each sub-panel decreases, the maximum IR-drop of a sub-panel is less than 0.9 V, as indicated in Figure 4.2 (c).
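The dependence of the IR-drop on the per-zone driving current follows directly from Ohm's law. The following is a minimal sketch, assuming a hypothetical wire length and hypothetical zone currents (neither is taken from the thesis); only the unit-length resistance is the value quoted above. It illustrates why splitting a zone so that each feed wire carries less current reduces the drop proportionally.

```python
# Unit-length resistance of the conducting wire, as quoted in the text.
R_PER_MM = 5.97e-4  # ohm per mm


def ir_drop(current_a: float, wire_length_mm: float) -> float:
    """Voltage drop (V) across a wire of the given length carrying current_a."""
    return current_a * R_PER_MM * wire_length_mm


# Hypothetical values for illustration only (not from the thesis).
LENGTH_MM = 750.0                             # assumed feed-wire length
coarse_zone_current = 8.0                     # A, one zone of a coarse partition
fine_zone_current = coarse_zone_current / 4   # A, after splitting the zone in four

if __name__ == "__main__":
    print(f"coarse zone: {ir_drop(coarse_zone_current, LENGTH_MM):.3f} V")
    print(f"fine zone:   {ir_drop(fine_zone_current, LENGTH_MM):.3f} V")
```

In this simplified view the wire length is held fixed; in a real panel the routing also changes with the partition, so the numbers only indicate the trend.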
Meanwhile, regardless of the IR-drop issue, the zone-specific OLED-DVS in [5, 67, 68] also exploits panel partitioning. With minimum image distortion, this method has proved that controlling the sub-panels at finer granularity can enhance the power efficiency of the OLED panel. However, controlling each sub-panel individually as suggested in [5, 68, 70] necessitates a dedicated VR for each sub-panel, which incurs significant area overhead and extra cost and may result in low conversion efficiency in the VRs. Next, I discuss VR characteristics in order to find an effective way to improve conversion efficiency and reduce overhead simultaneously.

Figure 4.3: Buck-boost VR. (a) The VR schematic and (b) the conversion efficiency vs. $I_{load}$ and $V_{out}$.

4.2.3 Buck-Boost VR characteristics

In Section 1.1.2.1, I discussed the characteristics of the inductive VR that is designed for buck-mode operation. In this section, I additionally introduce the buck-boost-mode VR and its characteristics.
Figure 4.3 (a) shows the schematic of a VR that regulates the output voltage $V_{out}$, supporting both buck-mode control ($V_{out} < V_{DD} = 12$ V) and boost-mode control ($V_{out} > V_{DD}$). The VR has four powerFETs (M1 to M4), one inductor (L), one capacitor (C), and two PWM controllers. $R_{component}$ and $Q_{component}$ denote the resistance and stored charge of the corresponding component (powerFET, inductor, or capacitor), respectively. Because these components have large footprints, the number of VRs should be limited for the OLED panel.

State-of-the-art VRs exhibit high peak power efficiency when the load current is within a certain desirable range, but their efficiency drops significantly when the load current is out of the range. The power conversion efficiency can be defined as:

$$\eta = \frac{P_{out}}{P_{in}} = \frac{V_{DD} I_{load} - P_{DC}}{V_{DD} I_{load}}, \tag{4.3}$$

where $I_{load}$ is the load current of the VR, and $P_{DC}$ is the power loss of the VR. $P_{DC}$ in the buck mode is [72]:

$$P_{DC} = I_{load}^2\,(R_L + D_{buck} R_{M1} + (1 - D_{buck}) R_{M2}) + \frac{(\Delta I_{buck})^2}{12}\,(R_L + D_{buck} R_{M1} + (1 - D_{buck}) R_{M2} + R_C) + V_{DD} f_{sw} (Q_{M1} + Q_{M2}) + V_{DD} I_{ctrl}, \tag{4.4}$$

and $P_{DC}$ in the boost mode is:

$$P_{DC} = \frac{I_{load}^2}{1 - D_{boost}}\,(R_L + D_{boost} R_{M3} + R_{M1} + D_{boost}(1 - D_{boost}) R_C) + \frac{(\Delta I_{boost})^2}{12}\,(R_L + D_{boost} R_{M3} + (1 - D_{boost})(R_{M4} + R_C) + R_{M1}) + V_{DD} f_{sw} (Q_{M3} + Q_{M4}) + V_{DD} I_{ctrl}. \tag{4.5}$$

In (4.4) and (4.5), $D_{buck}$ and $D_{boost}$ are the PWM duty ratios in buck and boost modes, respectively, satisfying $D_{buck} = V_{out}/V_{DD}$ and $D_{boost} = 1 - V_{DD}/V_{out}$. $f_{sw}$ and $I_{ctrl}$ are the switching frequency and the PWM controller current, respectively; $\Delta I_{buck}$ is the current ripple given by $\frac{V_{out}(1 - D_{buck})}{L f_{sw}}$, while $\Delta I_{boost}$ is given by $\frac{V_{in} D_{boost}}{L f_{sw}}$.

Figure 4.3 (b) shows the conversion efficiency as a function of $I_{load}$ and $V_{out}$. As seen in the figure, the conversion efficiency under small load current is very low due to the static power consumption of the VR.
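To make the shape of the efficiency curve concrete, the buck-mode loss model of (4.4) can be evaluated numerically. The following is a minimal sketch, not the thesis's simulation code: every component value in `BuckParams` is an assumed, illustrative number (only the 12 V input comes from the text), and the efficiency is computed in the output-referred form P_out / (P_out + P_DC) with P_out = V_out · I_load, which is an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BuckParams:
    """Illustrative component values (assumed, not taken from the thesis)."""
    v_dd: float = 12.0    # input voltage (V); 12 V as in the text
    r_l: float = 0.04     # inductor series resistance (ohm)
    r_m1: float = 0.01    # on-resistance of M1 (ohm)
    r_m2: float = 0.01    # on-resistance of M2 (ohm)
    r_c: float = 0.01     # capacitor series resistance (ohm)
    q_m1: float = 30e-9   # gate charge of M1 (C)
    q_m2: float = 30e-9   # gate charge of M2 (C)
    f_sw: float = 400e3   # switching frequency (Hz)
    i_ctrl: float = 5e-3  # PWM controller current (A)
    ind: float = 10e-6    # inductance L (H)


def buck_loss(i_load: float, v_out: float, p: BuckParams = BuckParams()) -> float:
    """Buck-mode VR power loss P_DC (W), term by term as in (4.4)."""
    if not 0.0 < v_out < p.v_dd:
        raise ValueError("buck mode requires 0 < V_out < V_DD")
    d = v_out / p.v_dd                                # duty ratio D_buck
    ripple = v_out * (1.0 - d) / (p.ind * p.f_sw)     # current ripple
    r_path = p.r_l + d * p.r_m1 + (1.0 - d) * p.r_m2  # conduction-path resistance
    conduction = i_load ** 2 * r_path
    ripple_loss = ripple ** 2 / 12.0 * (r_path + p.r_c)
    switching = p.v_dd * p.f_sw * (p.q_m1 + p.q_m2)
    controller = p.v_dd * p.i_ctrl
    return conduction + ripple_loss + switching + controller


def efficiency(i_load: float, v_out: float, p: BuckParams = BuckParams()) -> float:
    """Output-referred efficiency P_out / (P_out + P_DC); an assumed form."""
    p_out = v_out * i_load
    return p_out / (p_out + buck_loss(i_load, v_out, p))


if __name__ == "__main__":
    for i_load in (0.05, 0.5, 2.0, 10.0):
        print(f"I_load = {i_load:5.2f} A -> efficiency = {efficiency(i_load, 6.0):.3f}")
```

With these assumed values the load-independent switching and controller terms dominate at light load, while the quadratic conduction term dominates at heavy load, so the efficiency peaks at an intermediate load current.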
When the load current is high, the current-induced IR loss dominates the VR power loss. On the other hand, the output voltage affects the duty ratio of the PWM control and, in turn, degrades the efficiency as the difference between the input and output voltage increases.

4.3 Design and Dynamic Control of PDN for OLED display platforms

Due to the critical IR-drop, large OLED panels must be divided into multiple sub-panels. However, the optimal design and control of the power delivery network (PDN), which comprises multiple VRs connecting the power source to the sub-panels, becomes a critical task that requires investigation. For example, if a 65" OLED panel is divided into 8x8 sub-panels, the methods proposed in [5, 67, 68], which supply every sub-panel with a dedicated VR, will result in 64 VRs. These VRs may require more than $1000 of additional cost and 110 cm² of space, which is a significant overhead. Moreover, as discussed in Section 4.2.3, the power conversion efficiency of VRs drops significantly under adverse load current conditions, which needs to be taken into account in the PDN design and control framework.

In this section, I first explore recent works on PDN architectures and PDN-aware power management. Then I present the proposed optimal design and control framework of a reconfigurable PDN for large-area OLED panels, in order to minimize power consumption while accounting for VR characteristics, subject to the IR-drop constraint.

4.3.1 PDN architectures in Multicore platform

The problem of minimizing the power consumption of DVS-enabled OLED displays is similar to the power minimization problem using dynamic voltage and frequency scaling (DVFS) in multicore platforms (with or without consideration of the PDN). Therefore, I first explore the previous work on DVFS for multicore platforms. For multicore platforms, per-core DVFS has been proposed to achieve the full potential of power saving by DVFS [19].
This approach faces the problem of high VR overhead because one dedicated VR is required for each core, similar to the case of applying dedicated DVS to each sub-panel/zone separately in an OLED panel. To tackle this drawback, a per-cluster DVFS method has been suggested [20]. With a new structure of multicore platform that includes clustered cores and VRs dedicated to each cluster, this method uses task scheduling and migration schemes (i.e., one can select the core on which a target task runs) so that the cores in a cluster have similar DVFS levels. However, in the OLED platform, it is fundamentally impossible to perform such task assignment schemes. Thus I cannot apply this method in the OLED display platform.

In the previous chapter, I introduced a reconfigurable PDN for the multicore platform [7, 8]. With a reconfigurable switch network and a methodology for VR consolidation, this work supports per-core DVFS and simultaneously achieves high power conversion efficiency. More precisely, the main idea is to combine a certain number of cores, which require relatively small load currents, to be powered by a single VR. As a result, a single VR supplies a relatively high load current that is likely to be within the high-efficiency range, and its conversion efficiency will be higher (cf. Figure 4.3 (b)), whereas the other VRs that are not used can be turned off to save power. Motivated by [7, 8], I again propose a reconfigurable PDN, this time for the zoned, large-area OLED panel.

4.3.2 Reconfigurable PDN for OLED displays

Motivated by the PDN architectures in multicore platforms, I propose a reconfigurable PDN for a large-area OLED panel. The proposed reconfigurable PDN aims to make a single VR supply power to multiple sub-panels, thereby simultaneously reducing the number of required VRs and enhancing power conversion efficiency.
More specifically, I adaptively group a number of sub-panels, and each group of sub-panels is connected to a single VR through a switchable network. Although both the proposed reconfigurable PDN and the one for the multicore platform [7, 8] exploit the benefit of VR consolidation, they have inherent differences: i) the number of VRs equipped in the OLED display is less than the total number of sub-panels, thus the sub-panels need to be grouped, and ii) grouping the sub-panels and selecting the corresponding VR affect the IR-drop of the conducting wires besides the power conversion efficiency, a phenomenon that must be taken into account.

Figure 4.4 shows the proposed reconfigurable PDN architecture for the zoned, large-area OLED display. It consists of a power supply board and an OLED panel. The power supply board includes multiple VRs and a switch network, and the OLED panel is divided into multiple sub-panels. Note that I integrate the switch network with the power supply board, because the large footprint of the multiple powerFET switches in the switch network makes it impossible for these switches to be integrated in future flexible and transparent OLED panels.

Using this switch network incurs cost and area overhead: the multiple powerFETs in the switch network, which enable each VR to power multiple sub-panels, induce additional cost and space requirements. However, compared with incorporating additional VRs as discussed in [5, 67, 68], the switch network is more economical. Table 4.1 shows an example of component prices for a VR or switch network targeting the application of a 65" OLED panel. I choose the LT3791, a buck-boost LED driver controller, along with one inductor, three capacitors and four powerFETs to build one VR, which costs around $19.6. At the same expense, I can adopt around 24 powerFETs in the switch network. In addition, the VR occupies at least 172 mm² of area, while a single powerFET switch occupies only around 4 mm².
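The cost and area trade-off above can be checked with simple arithmetic. The sketch below uses only the per-component figures quoted in the text (a VR at about $19.6 and 172 mm², a powerFET switch at about 4 mm², and roughly 24 switches for the price of one VR). The reconfigurable configuration it evaluates (four sub-networks, each with 5 VRs, 16 sub-panels, and one switch per VR/sub-panel pair) is a hypothetical example chosen for illustration, not a design point reported in the thesis.

```python
VR_COST_USD = 19.6        # one LT3791-based VR, from the text
VR_AREA_MM2 = 172.0       # minimum VR footprint, from the text
SWITCH_AREA_MM2 = 4.0     # one powerFET switch, from the text
SWITCHES_PER_VR_PRICE = 24                       # about 24 switches cost as much as one VR
SWITCH_COST_USD = VR_COST_USD / SWITCHES_PER_VR_PRICE


def dedicated_overhead(num_subpanels: int) -> tuple[float, float]:
    """Cost (USD) and area (mm^2) when every sub-panel has its own VR."""
    return num_subpanels * VR_COST_USD, num_subpanels * VR_AREA_MM2


def reconfigurable_overhead(num_vrs: int, num_switches: int) -> tuple[float, float]:
    """Cost (USD) and area (mm^2) of VRs plus the switch network."""
    cost = num_vrs * VR_COST_USD + num_switches * SWITCH_COST_USD
    area = num_vrs * VR_AREA_MM2 + num_switches * SWITCH_AREA_MM2
    return cost, area


if __name__ == "__main__":
    # 8x8 panel with one VR per sub-panel.
    cost_d, area_d = dedicated_overhead(64)
    # Hypothetical alternative: 4 sub-networks x (5 VRs, 16 sub-panels),
    # with one switch per VR/sub-panel pair inside each sub-network.
    sub_networks, vrs_each, panels_each = 4, 5, 16
    cost_r, area_r = reconfigurable_overhead(
        sub_networks * vrs_each, sub_networks * vrs_each * panels_each
    )
    print(f"dedicated:      ${cost_d:.1f}, {area_d / 100:.1f} cm^2")
    print(f"reconfigurable: ${cost_r:.1f}, {area_r / 100:.1f} cm^2")
```

The dedicated case reproduces the order of magnitude quoted for 64 VRs (more than $1000 and about 110 cm²); the reconfigurable figures depend entirely on the assumed sub-network sizes.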
The switch network in Figure 4.4 makes it possible to minimize the power loss of the multiple VRs. For example, suppose there are 8 sub-panels, each of which requires less than 100 mA (i.e., the luminance of pixels in these sub-panels may be very low). If one VR is dedicated to each sub-panel, the conversion efficiency may be less than 50% (cf. Figure 4.3). However, if I configure the switch network to connect a single VR to the 8 sub-panels, the single VR drives 800 mA and thus achieves more than 75% efficiency. To do so, because a VR has a single output voltage level, the voltage levels of the sub-panels should be the same. This constraint implies that configuring the switch network is intimately related to determining the voltage levels of the sub-panels. I will discuss this issue in detail in the following subsection.

Although the switch network has advantages in terms of conversion efficiency and the area/cost overhead of the multiple VRs, the complexity of the switch network needs to be limited, and therefore the number of sub-panels to which one VR can be connected should be limited. For each VR, the required number of powerFET switches increases linearly with the number of sub-panels. Namely, for one VR to connect to all sub-panels of an A-by-B division, A×B switches are needed. If the A and B values are large, the area/cost overhead of switches becomes significant.

Figure 4.4: Geometrically divided OLED display panel with multiple VRs connected by the switch network. The switch network is partitioned into sub-networks. (a) Multiple DC-DC converters, (b) switch network, and (c) geometrically divided OLED panel.
Moreover, using too many switches increases the power wasted in the unused (turned-off) switches. More specifically, the power loss of a powerFET switch $P_{SW}$ is given by:

$$P_{SW} = \frac{I_{on} + I_{off}}{I_g}\,V_{DD} Q_g + \frac{1}{2} C_{OSS} V_{DD}^2 + I_{on}^2 R_{SW}, \tag{4.6}$$

where $I_{on}$ and $I_{off}$ are the load currents when the powerFET switch is turned on and turned off, respectively; $I_g$ is the gate drive current; $Q_g$, $C_{OSS}$ and $R_{SW}$ are the gate charge, output capacitance, and on-resistance of the powerFET switch, respectively. The first and second terms in (4.6) indicate the power loss when the switch is turned off.

Therefore, I propose that the switch network should be divided into sub-networks, and that each VR (and sub-panel) exclusively belongs to its own sub-network. In this case, one VR can be connected to all sub-panels belonging to the same sub-network, i.e., VRs and sub-panels that belong to the same sub-network form a complete bipartite graph (cf. Figure 4.4). Of course, when determining the sub-network size at design time, designers should consider (i) the aforesaid area/cost overhead of powerFET switches, (ii) the maximum current that a single VR will inject into the sub-network, and (iii) the power conversion efficiencies of VRs as discussed in Section 4.2.3. A sub-network should be designed to be neither too large, due to the requirement of a large number of powerFET switches, nor too small, due to the limited freedom in reconfiguration and the low conversion efficiency under low load current conditions. In this chapter, I investigate various sizes of sub-networks with 4 by 4 and 8 by 8 divisions of the 65" panel. Details are described in Section 4.4.2.

4.3.3 Dynamic Reconfiguration Algorithm of PDN for OLED Display platforms

In this section, I focus on the dynamic reconfiguration of a switch sub-network, which aims to find the group of sub-panels to be powered by each single VR. Assume that the sub-network is connected with N VRs and M sub-panels.
I aim to find N mutually exclusive groups for the N VRs, where each group corresponds to the set of sub-panels that a single VR drives. This is a partitioning problem with the objective of minimizing the overall power consumption of the sub-system, including the power consumption of sub-panels and the power losses of VRs, powerFET switches and conducting wires, while maintaining the image quality (with the minimum image distortion).

I first discuss the constraint to maintain the image quality. For the $m$-th sub-panel ($1 \le m \le M$), let $V_{DVS,m}$ denote the minimum supply voltage level on that panel that guarantees the image quality with the minimum distortion, as derived in [5, 67, 68], which focus only on minimizing the power consumption of sub-panels. I need to account for the IR-drop on the wire and powerFET switch, $V_{IR,m}$, which was neglected in [5, 67, 68], and define $V_{O,m} = V_{DVS,m} + V_{IR,m}$. If the $m$-th sub-panel is connected to the $n$-th VR through the sub-network (i.e., the $m$-th sub-panel belongs to the $n$-th group, $m \in G_n$), then the output voltage of the $n$-th VR, $V_{G,n}$, must satisfy $V_{G,n} \ge V_{O,m}$. In other words, the output voltage $V_{G,n}$ should be the maximum of the $V_{O,m}$ values of all sub-panels that are connected to the $n$-th VR, i.e., in the $n$-th group. In this way the image quality with the minimum distortion can be maintained during sub-panel grouping.

I formally describe the problem of finding the optimal network configuration that minimizes the total power consumption of the sub-system as follows:

$$\text{Find } N \text{ groups } G_1, G_2, \ldots, G_N, \text{ which are mutually exclusive.} \tag{4.7}$$

$$\text{Minimize } \sum_{n=1}^{N} \left( I_{G,n} V_{G,n} + P_{DC}(I_{G,n}, V_{G,n}) \right). \tag{4.8}$$

$$\text{Subject to } I_{G,n} \le Cap_n. \tag{4.9}$$

In (4.8) and (4.9), $I_{G,n} = \sum_{m \in G_n} I_m$ and $V_{G,n} = \max_{m \in G_n}(V_{O,m})$ are the output current and voltage of the $n$-th VR, where $I_m$ is the driving current of the $m$-th sub-panel.
In (4.8), $P_{DC}$ is the VR power loss, calculated by (4.4) and (4.5), and

$$\sum_{n=1}^{N} I_{G,n} V_{G,n} = \sum_{n=1}^{N} \sum_{m \in G_n} P_{panel,m} + P_{network},$$

where $P_{panel,m}$ is the power consumption of the $m$-th sub-panel (cf. Section 4.2.1), and $P_{network}$ is the power loss of the powerFET switches and conducting wires. $P_{network}$ is independent of the grouping result and is determined by the sub-panels in the sub-system. It can be expressed as:

$$P_{network} = \sum_{m=1}^{M} \left( P_{SW}(I_m) + I_m^2 R_{wire,m} \right), \tag{4.10}$$

where $P_{SW}$ is the switch power loss from (4.6), which is a function of $I_m$, and the second term is the conduction loss of the wire. $R_{wire,m}$ is the resistance of the wire that connects the $m$-th sub-panel and the power supply board. In (4.9), $Cap_n$ is the current driving capability of the $n$-th VR, which is generally provided by controller data sheets.

The problem is NP-hard. To show this, I simplify the problem by assuming that the power consumption resulting from assigning a specific sub-panel to each VR is pre-determined and independent of the other sub-panel assignments (notice that this is not true in the original problem). The simplified problem is a generalized assignment problem, which is a well-known NP-hard problem.

To solve the original problem, I propose a clustering-based heuristic algorithm, given in Algorithm 4. In the initialization procedure, all N VRs are used to power all M sub-panels. Namely, the M sub-panels are partitioned into N groups. Because the voltage level of the grouped sub-panels is adjusted to $\max_{m \in G_n}(V_{O,m})$, the grouping may increase the power consumption of those sub-panels whose voltage level becomes higher. In order to minimize this appreciable power loss from the grouping, I propose to use the k-means clustering method.
The grouping is performed based on the $V_O$ values, so that the total intra-group distances of the $V_O$'s within a cluster are minimized and the total inter-group distances of the $V_O$'s across different groups are maximized.

Then, I pay attention to the power loss of the VRs. The VR efficiency depends on the output voltage and load current of the VR, and it may be quite low if the output voltage or load current is low. I exploit the fact, from (4.2), that the voltage level of the OLED cell directly affects its driving current. Namely, the sub-panel groups that have low mean($V_O$) values are likely to have small driving currents. Hence, I investigate the groups in ascending order of their voltage level. To do so, I arrange the groups $G_n$ so that $V_{G,n} < V_{G,n+1}$.

Algorithm 4 The proposed PDN configuration algorithm
Data: $I_1, \ldots, I_M$ and $V_{O,1}, \ldots, V_{O,M}$.
Initialization:
    Based on $V_O$ of each sub-panel, partition the M sub-panels into N groups by using k-means clustering.
    Update $V_{G,n} = \max_{m \in G_n}(V_{O,m})$. (Because $V_{G,n} \ge V_{O,m}$, pre-specified image distortion thresholds are not exceeded.)
    Arrange $G_n$ so that $V_{G,n} < V_{G,n+1}$ for $1 \le n \le N-1$.
    Global variable $c = \sum_{n=1}^{N} (I_{G,n} V_{G,n} + P_{DC}(I_{G,n}, V_{G,n}))$.
function ISGAIN($I_{G,1}, \ldots, I_{G,N}, V_{G,1}, \ldots, V_{G,N}$)
    if $c \ge \sum_{n=1}^{N} (I_{G,n} V_{G,n} + P_{DC}(I_{G,n}, V_{G,n}))$ then
        $c \leftarrow \sum_{n=1}^{N} (I_{G,n} V_{G,n} + P_{DC}(I_{G,n}, V_{G,n}))$
        return 1
    return 0
function OPTIMIZATION (main function)
    Define $I_{n,m}$: $I_m$ if $m \in G_n$.
    for $1 \le n \le N-1$ do
        while $G_{n+1} \ne \emptyset$ and $I_{G,n} \le Cap_n$ do
            Find $m$ such that $I_m = \min(I_{n+1,m})$, $m \in G_{n+1}$.
            Move $m$ from $G_{n+1}$ to $G_n$.
            $I_{G,n} \leftarrow I_{G,n} + I_m$; $V_{G,n} \leftarrow V_{G,n+1}$; $I_{G,n+1} \leftarrow I_{G,n+1} - I_m$
            if ISGAIN($I_{G,1}, \ldots, I_{G,N}, V_{G,1}, \ldots, V_{G,N}$) = 0 then
                break
        if $G_{n+1} = \emptyset$ then
            $G_{n+1} \leftarrow G_n$ and start the investigation from $n = 1$ to $n = N-1$.

I first move the element (sub-panel) of $G_2$ that has the minimum current value, $I_{2,1}$, to $G_1$ (here $I_{n,m}$ denotes $I_m$ for $m \in G_n$, as defined in Algorithm 4). The result is $V_{G,1} \leftarrow V_{G,2}$ and $I_{G,1} \leftarrow I_{G,1} + I_{2,1}$.
Next, I check whether this move decreases the total power consumption. If it does, I keep repeating the move until there is no further power saving or the driving current exceeds the maximum capability of the VR. During the procedure, if all the elements in $G_x$ ($x > n$) are moved to $G_n$, I replace $G_x$ with $G_n$. Every such replacement turns off one VR (i.e., I can also save power by turning off VRs). In Algorithm 4, the main function Optimization performs the aforementioned procedure.

The proposed reconfigurable architecture relies on a dedicated controller to calculate the clustered voltage levels for the zones and to issue voltage-setting commands to each zone. The controller is in turn implemented as code running on a standard low-power microprocessor (an architecture similar to that presented in the previous chapter and in references [7, 8]). For video streaming applications with frame rates of 30 (or 60) fps, reference [67] reports that the time overhead of OLED-DVS is less than 10% of the frame processing time, and therefore performing DVS does not affect the image quality. Because the number of clusters ($N$, the number of converters) and the number of entities to be clustered ($M$, the number of sub-panels) per sub-network are rather small (at most 5 converters and 16 sub-panels in our experimental work), the runtime overhead of our k-means-clustering-based algorithm is quite short (its complexity is $O(MN)$ per iteration if Lloyd's algorithm is used for the k-means clustering). The delay of reconfiguring the switch network is also very short (the maximum delay of the powerFET switch, Si1470DH, is 129.65 ns [73]). Therefore, the runtime overhead of our proposed method is quite small. The same applies to the power overhead: compared with the amount of power saving that our reconfigurable network provides, the power overhead of the network itself is very small.
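The consolidation loop of Algorithm 4 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation used in the experiments: the VR loss model `vr_loss` stands in for $P_{DC}$ from (4.4) and (4.5) and is passed in by the caller, all VRs share one current capability `cap`, the initial groups are non-empty and already sorted by ascending voltage, and a move that does not pass the isGain test is undone (the pseudocode does not say what happens to a rejected move). The names `total_power` and `consolidate` are hypothetical.

```python
from typing import Callable, List, Tuple

# Hypothetical VR loss model: (load current in A, output voltage in V) -> loss in W.
Loss = Callable[[float, float], float]


def total_power(groups: List[List[int]], i_g: List[float],
                v_g: List[float], vr_loss: Loss) -> float:
    """Objective c: load power plus VR loss, summed over VRs that still drive a group."""
    return sum(i * v + vr_loss(i, v) for g, i, v in zip(groups, i_g, v_g) if g)


def consolidate(groups: List[List[int]], currents: List[float], v_out: List[float],
                cap: float, vr_loss: Loss) -> Tuple[List[List[int]], float]:
    """Greedy pass over groups sorted by ascending voltage (all groups non-empty).

    groups   -- sub-panel indices per group, e.g. the output of k-means on V_O
    currents -- I_m per sub-panel, held fixed as in the pseudocode
    v_out    -- V_O,m per sub-panel
    cap      -- current capability, assumed identical for every VR
    """
    groups = [list(g) for g in groups]
    i_g = [sum(currents[m] for m in g) for g in groups]
    v_g = [max(v_out[m] for m in g) for g in groups]
    best = total_power(groups, i_g, v_g, vr_loss)

    for n in range(len(groups) - 1):
        while groups[n + 1]:
            # Sub-panel of G_{n+1} with the smallest current.
            m = min(groups[n + 1], key=lambda k: currents[k])
            if i_g[n] + currents[m] > cap:
                break  # assumption: a move must not overload the VR
            saved = (i_g[n], v_g[n], i_g[n + 1])
            groups[n + 1].remove(m)
            groups[n].append(m)
            i_g[n] += currents[m]
            v_g[n] = v_g[n + 1]          # update rule from the pseudocode
            i_g[n + 1] -= currents[m]
            cost = total_power(groups, i_g, v_g, vr_loss)
            if best >= cost:             # isGain: keep the move
                best = cost
            else:                        # no gain: undo the move and stop
                groups[n].pop()
                groups[n + 1].append(m)
                i_g[n], v_g[n], i_g[n + 1] = saved
                break
        if not groups[n + 1]:
            # G_{n+1} <- G_n: the merged group moves up and the VR in slot n is off.
            groups[n + 1], groups[n] = groups[n], []
            i_g[n + 1], i_g[n] = i_g[n], 0.0
            v_g[n + 1] = v_g[n]
    return groups, best


if __name__ == "__main__":
    flat_loss = lambda i, v: 1.0  # toy model: 1 W per active VR
    print(consolidate([[0, 1], [2, 3]], [1.0] * 4, [10.0] * 4, 5.0, flat_loss))
```

With the toy loss model and equal group voltages, each intermediate move is cost-neutral and is accepted because the test is $c \geq$ (not $c >$); the saving only appears once a group is emptied and its VR is switched off. When the group voltages differ, the first move already raises the load power and the loop stops, which illustrates how strongly the outcome depends on the loss model.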
As an example of the power overhead, if there are 8 sub-panels and 4 converters in a sub-network and each sub-panel requires 1 A at 12 V, the power overhead of the required 32 powerFET switches (Si1470DH) is 2.1 W, which is only 2.2% of the power consumption of the 8 sub-panels. Therefore, our method is applicable to video streaming applications with very little delay or power impact.

4.4 Experimental Work

4.4.1 Simulation Framework

I use a pixel-level power model to estimate the power consumption of the OLED panel. Because opto-electric efficiency differs by color, pixels of different colors consume different power. Specifically, the blue pixels are usually implemented in a larger size, and consume almost twice the power of the other colors. Further, the power dissipated in the parasitic resistance of the cells changes when I apply a different supply voltage to the driver circuit for DVS. The pixel-level power model introduced in [57] captures these color and supply voltage dependencies of the OLED power consumption. I target the AMOLED panel for the large-area display. The pixel power consumptions are aggregated to estimate the panel power consumption.

To evaluate the image distortion, I use the CIELAB color space, which is designed to approximate human visual perception. Based on the measured current of each RGB cell, I perform a regression analysis to first translate the pixel color into the CIEXYZ color space. Then I transform XYZ to the Lab color space by using the transform function [74]:

$$L = 116 f(Y/Y_w) - 16, \quad a = 500 \left[ f(X/X_w) - f(Y/Y_w) \right], \quad b = 200 \left[ f(Y/Y_w) - f(Z/Z_w) \right], \quad \text{where } f(t) = t^{1/3}. \tag{4.11}$$

In (4.11), $L$, $a$, and $b$ represent the lightness of the color, the red-green content, and the yellow-blue content, respectively; $X_w$, $Y_w$ and $Z_w$ are the color coordinate values of the reference white color. I use the Euclidean distance in the Lab color space as a metric to measure the color difference, $e_{pixel}$, between the original color ($L_o$, $a_o$, $b_o$) and the changed color ($L_c$, $a_c$, $b_c$) of the pixel. $e_{pixel}$ is calculated by:

$$e_{pixel} = \sqrt{\frac{(L_o - L_c)^2 + (a_o - a_c)^2 + (b_o - b_c)^2}{L_o^2 + a_o^2 + b_o^2}}. \tag{4.12}$$

Finally, I define the image distortion ratio, $e_{image}$:

$$e_{image} = \frac{\text{number of pixels with } e_{pixel} > 0.05}{\text{total number of pixels}} \times 100 \ (\%). \tag{4.13}$$

Table 4.1: Components in a VR/switch network.
Component | Spec. | Product | Manufacturer | Price
Inductor | 10uH | 7447709100 | Wurth Electronics | $3.1
Capacitor | 10uF | 1EA100WR | Panasonic | $0.5
Regulator | Buck-boost | LT3791 | Linear Technology | $11.8
PowerFET | N-type | Si1470DH | Vishay Siliconix | $0.8

Figure 4.5 shows some examples of applying OLED-DVS to the 65" OLED panel. I set $e_{image} = 5\%$, and use five 4K images, namely ‘Balloons’, ‘Bridge’, ‘Leopard’, ‘Heidelberg’ and ‘Colosseum’. The target panel is divided 4 by 4, giving 16 sub-panels in total. The derived voltage level of each sub-panel ($V_{O,m} = V_{DVS,m} + V_{IR,m}$) is shown in the figure. To derive the wire resistance, I use a copper wire with a 0.129 mm$^2$ cross-section. As seen in the figure, if a sub-panel mainly consists of pixels with low brightness (e.g., indicated by the red box), its $V_O$ is relatively small, which may be because i) DVS may not be so effective, and ii) the small $I_{panel}$ may cause only a small IR-drop. In the opposite case, a sub-panel with high brightness (e.g., indicated by the blue box in the figure) has a relatively high $V_O$, for the opposite reasons.

Meanwhile, I use the analytical model in (4.4) and (4.5) to estimate the power loss of the VR. The LT3791 buck-boost VR is used with four powerFETs, a 10uH inductor, and 30uF of capacitance, all of which are listed in Table 4.1.
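The bill of materials just listed can be totaled from the prices in Table 4.1. The following Python sketch does that arithmetic; it assumes, as Section 4.4.2 states, that the 30uF is built from three 10uF capacitors and that one powerFET switch is needed per (sub-panel, VR) pair in a sub-network. The function names are hypothetical and chosen only for this illustration.

```python
# Unit prices in US dollars, taken from Table 4.1.
PRICE = {"regulator": 11.8, "inductor": 3.1, "capacitor": 0.5, "powerfet": 0.8}

# Parts in one VR: one controller, one inductor, three capacitors, four powerFETs.
VR_PARTS = {"regulator": 1, "inductor": 1, "capacitor": 3, "powerfet": 4}


def vr_cost() -> float:
    """Component cost of a single VR."""
    return sum(PRICE[part] * qty for part, qty in VR_PARTS.items())


def switch_count(sub_networks: int, panels_per_network: int, vrs_per_network: int) -> int:
    """Switches in the reconfigurable network: one per (sub-panel, VR) pair."""
    return sub_networks * panels_per_network * vrs_per_network


def additional_cost(extra_vrs: int, switches: int) -> float:
    """Cost added on top of a single-VR panel: extra VRs plus network switches."""
    return extra_vrs * vr_cost() + switches * PRICE["powerfet"]


if __name__ == "__main__":
    print(f"one VR:    ${vr_cost():.1f}")
    # Per-sub-panel DVS on a 4x4 panel: 15 extra VRs, no switch network.
    print(f"DVS 16:16: ${additional_cost(15, 0):.1f}")
    # Two sub-networks of 8 sub-panels and 4 VRs: 7 extra VRs, 64 switches.
    print(f"DVS 8:4:   ${additional_cost(7, switch_count(2, 8, 4)):.1f}")
```

Under these assumptions the sketch gives $19.6 per VR, $294 for 15 additional VRs, and $188.4 for 7 additional VRs with 64 switches, the figures quoted in Section 4.4.2.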
Figure 4.3 (b) illustrates the resulting efficiency of the VR: I vary $V_{DD}$ from 5 V to 15 V, and $I_{load}$ from 0 A to 5 A. The efficiency of the VR is calculated from the analytical model with the parameters in the data sheets. The figure shows that a small load current and a low output voltage result in low efficiency. For example, in Figure 4.5, the sub-panels in the red box, which draw 100 mA at 8 V, have less than 70% VR efficiency, while the sub-panels in the blue box have over 90% VR efficiency.

Figure 4.5: Examples of applying OLED-DVS to 4K images in a 4x4 zoned 65" OLED display panel: (a) original 4K images (Balloons, Bridge, Leopard, Heidelberg, Colosseum); (b) DVS results (voltage level ($V_O = V_{DVS} + V_{IR}$) and power consumption of each sub-panel). The red (blue) box indicates an extreme case of a sub-panel with low (high) luminance pixels.
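The distortion metric of (4.11) to (4.13) can be sketched in Python as below. This is an illustrative sketch, not the code used for the experiments: it follows (4.11) literally with $f(t) = t^{1/3}$ (the standard CIELAB definition additionally uses a linear segment for very small $t$), the reference white passed in is an assumption of the caller, and the handling of a black original pixel, for which the denominator of (4.12) is zero, is my own choice. The function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Lab = Tuple[float, float, float]


def xyz_to_lab(x: float, y: float, z: float,
               white: Tuple[float, float, float]) -> Lab:
    """XYZ -> Lab following (4.11), with f(t) = t^(1/3) only."""
    xw, yw, zw = white

    def f(t: float) -> float:
        return t ** (1.0 / 3.0)

    lightness = 116.0 * f(y / yw) - 16.0
    a = 500.0 * (f(x / xw) - f(y / yw))
    b = 200.0 * (f(y / yw) - f(z / zw))
    return lightness, a, b


def pixel_error(orig: Lab, changed: Lab) -> float:
    """Relative Euclidean distance in Lab space, as in (4.12)."""
    num = sum((o - c) ** 2 for o, c in zip(orig, changed))
    den = sum(o ** 2 for o in orig)
    if den == 0.0:
        # Assumption: a black original pixel counts as distorted only if it changed.
        return 0.0 if num == 0.0 else math.inf
    return math.sqrt(num / den)


def image_distortion(orig: Sequence[Lab], changed: Sequence[Lab],
                     threshold: float = 0.05) -> float:
    """Percentage of pixels whose error exceeds the threshold, as in (4.13)."""
    if len(orig) != len(changed) or not orig:
        raise ValueError("images must be non-empty and of equal size")
    bad = sum(1 for o, c in zip(orig, changed) if pixel_error(o, c) > threshold)
    return 100.0 * bad / len(orig)


if __name__ == "__main__":
    d65 = (0.9505, 1.0, 1.089)  # assumed reference white, not taken from the thesis
    print(xyz_to_lab(*d65, white=d65))
    original = [(100.0, 0.0, 0.0)] * 4
    scaled = [(100.0, 0.0, 0.0)] * 3 + [(90.0, 0.0, 0.0)]
    print(image_distortion(original, scaled))
```

A DVS setting would then be acceptable for an image when `image_distortion` stays at or below the chosen limit, which is 5% in the experiments of this section.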
4.4.2 Simulation results

I first investigate the case in which the panel is divided 4x4 (i.e., 16 sub-panels in total, as seen in Fig. 4.5). I then define three sub-network setups, each of which delivers power to the (upper or lower) 8 sub-panels from 4, 3 or 2 VRs. I name the proposed methods after the number of sub-panels and VRs in a sub-network: DVS 8:4, DVS 8:3 and DVS 8:2 denote 8 sub-panels with 4, 3 and 2 VRs in a sub-network, respectively. Table 4.2 shows the simulation results for five methods: i) DVS applied to the whole panel [57] (denoted by DVS 16:1), ii) DVS applied to each sub-panel [5, 68, 67] (denoted by DVS 16:16), and the three proposed methods. As a reference point, the DVS NO column lists the power consumption without any DVS. For the proposed methods, the power consumption of the powerFET switches (Si1470DH) is calculated by (4.6) and included in Table 4.2.

Meanwhile, to estimate the cost of each method, I use the price of each component in a VR, shown in Table 4.1. For example, as one VR costs $19.6 (one buck-boost LED driver controller, one inductor, three capacitors and four powerFETs, as previously mentioned in Section 4.3), DVS 16:16, which requires 15 additional VRs, results in an additional cost of $294, whereas DVS 8:4 requires only 7 additional VRs and 64 powerFET switches, for a total additional cost of $188.4. The additional costs of the other methods are calculated similarly.

Table 4.2: Simulation results of a 65" 4K OLED panel divided 4x4. P_DVS,n:m denotes the power consumption of the panel, converter(s) and switch network when DVS is applied to n sub-panels with m converters. NO means no DVS is applied, DVS 16:1 applies DVS to the whole panel [57], DVS 16:16 applies DVS to each sub-panel [5, 68, 67], and DVS 8:4, DVS 8:3 and DVS 8:2 perform DVS with reconfigurable PDNs (the sub-panels are grouped into 2 sub-networks of 8 sub-panels each). The power saving (%) of each method is given in parentheses. Additional implementation costs are also provided (the calculation is described in Section 4.4.2).

No DVS and previously presented methods:
Image | P_DVS,NO (W) | P_DVS,16:1 [57] (W) | P_DVS,16:16 [5, 68, 67] (W)
‘Balloons’ | 395.2 | 285.6 (27.7%) | 256.8 (35.0%)
‘Bridge’ | 268.4 | 184.8 (31%) | 173.6 (35.3%)
‘Leopard’ | 343.8 | 246.6 (28.2%) | 225.2 (34.5%)
‘Heidelberg’ | 985.9 | 977.9 (0.8%) | 696.3 (29.4%)
‘Colosseum’ | 641.4 | 570.5 (11.1%) | 433.1 (32.5%)
Additional cost | | | $294

Our proposed methods:
Image | P_DVS,8:4 (W) | P_DVS,8:3 (W) | P_DVS,8:2 (W)
‘Balloons’ | 255.2 (35.4%) | 258.9 (34.5%) | 261.7 (33.7%)
‘Bridge’ | 169.5 (36.8%) | 170.4 (36.5%) | 172.2 (35.8%)
‘Leopard’ | 222.6 (35.3%) | 224.3 (34.8%) | 227.5 (33.8%)
‘Heidelberg’ | 730.5 (25.9%) | 751.1 (23.8%) | 799.4 (18.9%)
‘Colosseum’ | 439.8 (31.4%) | 448.11 (30.1%) | 471.5 (26.5%)
Add. cost | $188.4 | $136.4 | $84.4

As expected, DVS 16:16 saves more power than DVS 16:1. In particular, DVS 16:16 achieves remarkable power savings with ‘Colosseum’ and ‘Heidelberg’, which consume high power because they contain the highest-luminance pixels (cf. blue box in Fig. 4.5). However, implementing DVS 16:16 costs an extra $294, which is expensive. On the other hand, the proposed methods achieve power savings similar to those of DVS 16:16 at much lower cost. Furthermore, if an image does not require many pixels to have high luminance, the proposed methods can save more power than DVS 16:16 does. For example, DVS 8:4 saves 37% power in ‘Bridge’, whereas DVS 16:16 saves 35%. This is thanks to one of the benefits of the proposed reconfigurable PDN, i.e., the smaller number of VRs, which lowers power consumption. Moreover, each VR may have higher efficiency than the VRs used in DVS 16:16. Also note that the cost of implementing DVS 8:4 is 36% lower than that of DVS 16:16.

Next, I divide the panel 8x8, which gives 64 finer-grained sub-panels. The corresponding results are shown in Table 4.3. Table 4.3 shows that, compared to our proposed methods, DVS 64:64 saves a little more power in ‘Colosseum’ and ‘Heidelberg’, owing to the benefit of fine-granularity control. However, the high number of VRs in DVS 64:64 makes it very costly. For example, our proposed DVS 16:2 costs about one fifth as much as DVS 64:64, with similar or lower power savings. In another example, DVS 8:2 (in Table 4.2; it costs less than $100) results in higher power savings than DVS 64:64 (which costs over $1000 more) for some low-luminance images.

Table 4.3: Simulation results of the 8x8 panel division. The P_method columns follow the same notation as in Table 4.2. P_DVS,64:64 is from [5, 68, 67], and P_DVS,16:5 and P_DVS,16:2 are from our proposed methods.
Image | P_DVS,64:64 (W) | P_DVS,16:5 (W) | P_DVS,16:2 (W)
‘Balloons’ | 281.4 (25%) | 252.3 (32.8%) | 249.4 (33.5%)
‘Bridge’ | 201.8 (22.5%) | 172.2 (33.91%) | 168.2 (35.5%)
‘Leopard’ | 250.6 (23.9%) | 221.8 (32.7%) | 218.6 (33.6%)
‘Heidelberg’ | 618.4 (30.9%) | 602.8 (32.7%) | 626.5 (30.1%)
‘Colosseum’ | 423.9 (28.9%) | 399.6 (33.0%) | 402.95 (32.5%)
Add. cost | $1254.4 | $648 | $259.2

The results in Tables 4.2 and 4.3 confirm that the proposed framework consistently achieves high power conversion efficiency and significant energy saving while minimizing the overheads of the VRs.

4.5 Summary

In this chapter, I introduced the reconfigurable PDN architecture and its optimization control to realize OLED-DVS on a large-area display. The large-area OLED panel was partitioned into multiple sub-panels whose supply voltage is adaptively adjusted based on the displayed content. The proposed PDN architecture consisted of a limited number of VRs and switch networks to supply the required voltage in time.
The proposed framework minimized the overhead of the multiple VRs, and consistently achieved high power conversion efficiency and significant energy saving while preserving image quality. The experimental results demonstrated that the proposed method achieves up to 36% power savings in a 65" 4K OLED display platform.

Chapter 5 Conclusion

This dissertation focuses on the power conversion efficiency of the PDN in various VLSI platforms, and proposes novel methods to improve it.

First, this dissertation has optimized the PDN in a smartphone platform. In Chapter 2, I have demonstrated that significant power loss occurs during power conversion in the PDN of a smartphone platform. To mitigate this problem, the chapter has focused on the VRs in the PDN and introduced two optimization methods for them. S3 has been presented to configure the switches in the VRs so that the optimal operating conditions of the VRs match the general load current conditions. The general load current distributions for all modules in the platform were derived from the measured loading profiles and smartphone usage patterns. DSM has also been presented to overcome the limitation of S3, which may not be optimal for dynamically varying load conditions. By exploiting the multi-switching scheme, detailed procedures to select and size the switches have been introduced. To verify the presented methods on an actual smartphone platform, a PDN characterization procedure has been performed. With the proposed equivalent VR model and grouping method, the power conversion efficiency of the PDN in the target smartphone platform could be characterized. Finally, I have applied the proposed optimization methods to the platform. The experimental results have shown that S3 achieves a 6% overall efficiency enhancement, which translates to a 19% power loss reduction for the general smartphone usage pattern. DSM has accomplished a similar improvement under the same conditions.
Furthermore, it could also achieve high efficiency enhancement under various load conditions. In the design flow, both the S3 and DSM methods can be applied only after obtaining the load current distributions for the modules. S3 is simple to implement, but may not produce the optimal transistor widths under dynamically changing load conditions, or even when the load distribution has a high variance. On the other hand, DSM has more control/area overhead than S3, but it can achieve high conversion efficiency enhancement under all load conditions. Note that if the load current distributions change, because of newly added applications or usage patterns that differ from those used for the initial optimization, the DSM method will continue to provide power efficiency enhancement because of its adaptability, whereas the S3 method will fail.

Next, this dissertation has optimized the PDN in a multicore platform in Chapter 3. The problem of power conversion efficiency in the multicore platform has been addressed, where significant power is dissipated by the multiple VRs, and design limitations associated with the fixed VR-to-core network undermine the opportunity for power savings from the per-core DVFS technique. The chapter has proposed VR consolidation methods with a configurable VR-to-core distribution network integrated in the proposed multicore platform design. The reactive VRCon has been presented to configure the network to enhance the power conversion efficiency under pre-determined DVFS levels. The proactive VRCon has been proposed to determine new DVFS levels that maximize system-wide energy saving without performance degradation. I have applied the proposed optimization methods to a PDN composed of homogeneous VRs, and demonstrated that the proposed method could accomplish up to 32% VR energy loss reduction.
Then I have explored the limitation of the homogeneous PDN, and proposed the heterogeneous PDN, which could increase the benefits of the optimization methods by incorporating VRs with a larger load-current driving capability. The simulation results, based on realistic experimental setups, have demonstrated that the proposed methods achieve up to 36% VR energy loss reduction and 9% total energy saving.

In Chapter 4, this dissertation has introduced the reconfigurable PDN architecture and its optimization control to realize OLED-DVS on a large-area display. The large-area OLED panel has been partitioned into multiple sub-panels whose supply voltage is adaptively adjusted based on the displayed content. The proposed PDN architecture consisted of a limited number of VRs and switch networks to supply the required voltage in time. The proposed framework has minimized the overhead of the multiple VRs, and has consistently achieved high power conversion efficiency and significant energy saving while preserving image quality. The experimental results have demonstrated that the proposed method can achieve up to 36% power savings in a 65" 4K OLED display platform.

References

[1] Qualcomm, “Snapdragon Mobile Development Platform (MDP) MSM8660,” Available at https://developer.qualcomm.com. (document), 1.1, 2.3.2
[2] T. E. Carlson, W. Heirman, and L. Eeckhout, “Sniper: Exploring the level of abstraction for scalable and accurate parallel multi-core simulation,” in proc. Supercomputing, 2011, available at snipersim.org. (document), 1.2, 3.1
[3] “LTC3816,” available at http://www.linear.com/product/LTC3816. (document), 1.2, 3.3, 3.4.1.2
[4] F. Hossein, M. Ratul, K. Srikanth, L. Dimitrios, G. Ramesh, and E. Deborah, “Diversity in smartphone usage,” in proc. Int’l Conf. on Mobile Systems, Applications, and Services, pp. 179-194, 2010. (document), 2.2.1, 2.3
[5] X. Chen, Y. Chen, W. Zhang, and H.
Li, “Fine-grained dynamic voltage scaling on OLED display,” in proc. ASP-DAC, pp. 807-812, Jan. 2012. (document), 4.1, 4.2.1, 4.1, 4.2.1, 4.2.2, 4.3, 4.3.2, 4.3.3, 4.4.2, 4.2, 4.3 [6] W. Lee, Y . Wang, D. Shin, N. Chang, and M. Pedram, “Optimizing power delivery network in a smartphone platform,” IEEE Trans. Computer-Aided Design of Integr. Circuits Systs., vol. 33, no. 1, pp.36-49, Jan. 2014. 1.1.1, 3.1, 3.2.2, 3.4.1.2 [7] W. Lee, Y . Wang, and M. Pedram, “VRCon: Dynamic reconfiguration of voltage regulators in a multicore platform,” in Proc. Design Automation and Test in Europe, pp. 1-6, March 2014. 1.1.1, 3.1, 4.1, 4.3.1, 4.3.2, 4.3.3 [8] W. Lee, Y . Wang, and M. Pedram, “Optimizing a reconfigurable power distribution network in a multicore platform,” IEEE Trans. Computer-Aided Design of Integr. Circuits Systs., pp. , 2015. 1.1.1, 3.1, 4.1, 4.3.1, 4.3.2, 4.3.3 [9] Y . Choi, N. Chang, and T. Kim, “DC-DC converter-aware power management for low- power embedded systems,” IEEE Trans. Computer-Aided Design of Integr. Circuits Systs., vol. 26, no. 8, Aug. 2007. 1.1.1, 1.1.2.1, 1.1.2.2, 1.1.3.1, 1.1.3.2, 2.2, 2.3.1, 3.1, 3.2.2 [10] W. Lee, Y . Wang, D. Shin, N. Chang, and M. Pedram, “Power conversion efficiency char- acterization and optimization for smartphones,” in proc. of Int’l Symp. on Low Power Elec- tronics and Design, pp. 103-108, 2012. 1.1.1, 1.1.2.1, 2.2, 3.1 116 [11] A. A. Sinkar, H. Wang, and N. Kim, “Workload-aware voltage regulator optimization for power efficient multi-core processors,” in proc. Design Automation and Test in Europe, pp. 1134-1137, March 2012. 1.1.1, 1.1.3.1, 1.1.4.1, 3.1 [12] J. Wibben and R. Harjani, “A high-efficiency DC-DC converter using 2nH integrated in- ductors,” IEEE Journal of Solid-State Circuits, vol. 43, no. 4, pp. 844-854, April. 2008. 1.1.2 [13] S. Musunuri and P. L. Chapman, “Optimization of CMOS transistors for low power DC- DC converters,” in proc. Power Electronics Specialists Conf., pp. 2151-2157, June 2005. 
1.1.2.1, 2.2.1 [14] S.Kudva and R. Harjani, “Fully-integrated on-chip DC-DC converter with a 450X output range,” IEEE Journal of Solid-State Circuits, vol. 46, no. 8, pp. 1940-1951, Aug. 2011. 1.1.2.1, 1.1.3.1, 2.2.1, 2.2.2, 2.4.3, 3.1, 3.2 [15] “PTM,” available at http://ptm.asu.edu. 1.1.2.1, 2.2.2 [16] R. Erickson and D. Maksimovic, “Fundementals of power electronics,” Book: Springer, Berlin, Germany, 2001. 1.1.2.1 [17] O. Abdel-Rahman, J. Abu-Qahouq, L. Huang, and I. Batarseh, “Analysis and design of voltage regulator with adaptive FET modulation scheme and improved efficiency,” IEEE Trans. on Power Electronics, vol. 23, no. 2, pp. 896-906, March 2008. 1.1.3.1, 2.2.2 [18] S. Bandyopadhyay, Y . K. Ramadass, and A. P. Chandrakasan, “20uA to 100mA DC-DC converter with 2.8 to 4.2V battery supply for portable applications in 45nm CMOS,” in proc. Int’l Solid-State Circuits Conf., pp. 386-387, Feb. 2011. 1.1.3.1, 3.1, 3.2 [19] W. Kim, M. Gupta, G.-Y . Wei, and D. Brooks, “System level analysis of fast, per-core DVFS using on-chip switching regulators,” in proc. Int’l Symp. on High-Performance Com- puter Architec., pp. 123-134, Feb. 2008. 1.1.3.2, 3.1, 3.4.1.1, 4.3.1 [20] T. Kolpe, A. Zhai, and S. S. Sapatnekar, “Enabling improved power management in multi- core processors through clustered DVFS,” in proc. Design Automation and Test in Europe, pp. 1-6, March 2011. 1.1.3.2, 3.1, 4.1, 4.3.1 [21] A. Grama, G. Karypis, V . Kumar, and A. Gupta, “Introduction to parallel computing,” Book: 2nd Ed. Addison-Wesley, 2003. 1.1.3.2, 3.1 [22] S. Balakrishnan, R. Rajwar, M. Upton, and K. Lai, “The impact of performance asymmetry in emerging multicore architectures,” in proc. Int’l Symp. on Computer Architec., vol. 33, no. 2, pp. 506-517, 2005. 1.1.3.2, 3.1 [23] L. Zhang, B. Tiwana, R. Dick, Z. Qian, Z. Mao, Z. Wang, and L. Yang, “Accurate on- line power estimation and automatic battery behavior based power model generation for smartphones,” in proc. Int’l Conf. 
on Hardware/Software Codesign and System Synthesis, pp.105-114, Oct. 2010. 2.1, 2.3.2, 2.4.1 117 [24] M. Dong and L. Zhong, “Self-constructive high-rate system energy modeling for battery- powered mobile systems,” in proc. Int’l Conf. on Mobile Systems, Applications, and Ser- vices, pp. 335-348, 2011. 2.1, 2.3.2, 2.4.1 [25] A. Pathak, Y . C. Hu, and M. Zhang, “Fine-grained energy accounting on smartphones with Eprof,” in proc. EuroSys, pp. 29-42, 2011. 2.1, 2.3.2, 2.4.1 [26] D. Shin, N. Chang, W. Lee, Y . Wang, Q. Xie, and M. Pedram, “Online estimation of the remaining energy capacity in mobile systems considering system-wide power consumption and battery characteristics,” in proc. Asia and South Pacific Design Automation Conf., pp. 59-64. Jan. 2013. 2.1, 2.3.2, 2.4.1 [27] L. Benini, A. Bogliolo, and G. D. Micheli, “A survey of design techniques for system-level dynamic power management,” IEEE Trans. on VLSI, vol. 8, no. 3, pp. 299-316, Jun. 2000. 2.1 [28] Q. Qiu, Q. Wu, and M. Pedram, “Dynamic power management in a mobile multimedia system with guaranteed quality-of-service,” in proc. Design Automation Conf., pp. 834- 839, 2001. 2.1 [29] J. Xiao, A. Peterchev, J. Zhang, and S. Sanders, “An ultra-low-power digitally-controlled buck converter IC for cellular phone applications,” in proc. Applied Power Electronics Conf., pp. 383-391, 2004. 2.2 [30] C. Shi, B. C. Walker, E. Zeisel, E. B. Hu, and G. H. McAllister, “A highly integrated power management IC for advanced mobile applications,” in proc. Custom Integrated Circuits Conf., pp. 85-88, 2006. 2.2, 2.3.1, 2.3.2 [31] B. Amelifard and M. Pedram, “Optimal design of the power-delivery network for multiple voltage-island system-on-chips,” IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, vol. 28, no. 6, pp. 888-900, June 2009. 2.2, 2.3.1 [32] Y . Du, M. Wang, R. T. Meitl, S. Lukic, and A. Q. 
Huang, “High-frequency high-efficiency dc-dc converter for distributed energy storage modularization,” in proc. Int’l Experts and Consultants, pp.1832-1837, Nov. 2010. 2.2 [33] A. Shye, B. Scholbrock, and G. Memik, “Into the wild: Studying real user activity patterns to guide power optimizations for mobile architectures,” in proc. Int’l Symp. on Microarchi- tecture, pp.168-178, 2009. 2.2.1 [34] T. M. T. Do, J. Blom, and D. Gatica-Perez, “Smartphone usage in the wild: a large-scale analysis of applications and context,” Proc. of Int’l Conf. on Multimodal Interaction, pp. 353-360, 2011. 2.2.1 [35] Texas Instruments, “Handset:Smartphone solutions,” Available at: http://www.ti.com/solution/handset_smartphone. 2.3.1, 2.3.2 [36] Antutu , Available at: http://www.antutu.net. 2.4.2 118 [37] Vellamo , Available at: http://www.quicinc.com/vellamo. 2.4.2 [38] Quadrant , Available at: http://www.aurorasoftworks.com. 2.4.2 [39] GLBenchmark , Available at: http://gfxbench.com. 2.4.2 [40] J. Henkel and S. Parameswaran, “Designing embedded processors - a low power perspec- tive,” Book: Springer, 2007. 3.1 [41] A. Alimonda, S. Carta, A. Acquaviva, A. Pisano, and L. Benini, “A feedback-based ap- proach to dvfs in data-flow applications,” IEEE Trans. Computer-Aided Design of Integr. Circuits Systs., vol. 28, no. 11, pp. 1691-1704, Nov. 2009. 3.1 [42] M. Wens and M. Steyaert, “An 800mW fully-integrated 130nm CMOS DC-DC step-down multi-phase converter, with on-chip spiral inductors and capacitors,” in proc. Energy Con- version Congress and Exposition, pp. 3706-3709, Sept. 2009. 3.1 [43] W. Kim, D. M. Brooks, and G.-Y . Wei, “A fully-integrated 3-level DC/DC converter for nanosecond-scale DVS with fast shunt regulation,” in proc. Int’l Solid-State Circuits Conf., pp. 268-270, Feb. 2012. 3.1 [44] C. Bienia and K. Li, “Parsec 2.0: A new benchmark suite for chip-multiprocessors,” in proc. 5th Workshop on Modeling, Benchmarking and Simulation, June, 2009. 3.1 [45] S. C. Woo, M. Ohara, E. 
Torrie, J. P. Singh, and A. Gupta, “The splash-2 programs: Charac- terization and methodological considerations,” in proc. Int’l Symp. on Computer Architec., pp. 24-36, 1995. 3.1 [46] J. Park, D. Shin, N. Chang, and M. Pedram, “Accurate modeling and calculation of delay and energy overheads of dynamic voltage scaling in modern high-performance micropro- cessors,” in proc. Int’l Symp. on Low-Power Electronics and Design, pp. 419-424, 2010. 3.2.2 [47] Z. J. Shen, Y . Xiong, X. Cheng, Y . Fu, and P. Kumar, “Power MOSFET switching loss analysis: A new insight,” in proc. Industry Application Conf., pp. 1438-1442, Oct. 2006. 3.2.2 [48] L. Shang, R. Dick, and N. K. Jha, “SLOPES: Hardware-software cosynthesis of low- power real-time distributed embedded systems with dynamically reconfigurable fpgas,” IEEE Trans. Computer-Aided Design of Integr. Circuits Systs., vol. 26, no. 3, pp. 508-526, July 2007. 3.2.3 [49] J. Teich, “Hardware/software codesign: The past, the present, and predicting the future,” in proc. IEEE, vol. 100, pp. 1411-1430, May 2012. 3.2.3 [50] L. Balogh, “Design and application guide for high speed MOSFET gate drive circuits,” Available at http://www.ti.com/lit/ml/slup169/slup169.pdf. 3.2.4 119 [51] T. Miller, R. Thomas, and R. T. X. Pan, “VRSync: Characterizing and eliminating synchronization-induced voltage emergencies in many-core processors,” in proc. Int’l Symp. on Computer Architec., pp. 249-260, June 2012. 3.2.4 [52] “Vishay siliconix Si4442DY datasheet,” Available at http://www.vishay.com/docs/71358/si4442dy.pdf. 3.3, 3.4.1.3 [53] Digi-Key, “Electronic component price,” Available at http://www.digikey.com/. 3.3 [54] “Intel VRD 11.1,” available at http://www.intel.com/content/dam/doc/design- guide/voltage-regulator-down-11-1-processor-power-delivery-guidelines.pdf. 3.4.1.2 [55] “Vishay siliconix Si4840DY datasheet,” Available at http://www.vishay.com/docs/71188/71188.pdf. 
3.4.1.2 [56] “Vishay siliconix Si4838DY datasheet,” Available at http://www.vishay.com/docs/71359/71359.pdf. 3.4.1.2 [57] D. Shin, Y . Kim, N. Chang, and M. Pedram, “Dynamic driver supply voltage scaling for organic light emitting diode displays,” IEEE T. CAD, vol. 32, no.7, pp. 1017-1030, July 2013. 4.1, 4.2.1, 4.2.1, 4.2.1, 4.4.1, 4.4.2, 4.2 [58] J. Park, J. Lee, D. Shin, and S. Park, “Luminance uniformity of large-area OLEDs with an auxiliary metal electrode,” IEEE J. Display Technology, vol.5, no. 8, pp. 306-311, Aug. 2009. 4.1, 4.2.2 [59] J. Han, J. Moon, D. Cho, J. Shin, C. Joo, J. Hwang, J. Huh, H. Chu, and J. Lee, “Trans- parent OLED lighting panel design using two-dimensional OLED circuit modeling,” ETRI Journal, vol. 35, no. 4, Aug. 2013. 4.1 [60] J. Colegrove, “OLED display and OLED lighting technology and market forecast,” 2010. 4.1 [61] A. Buckley, Organic Light-Emitting Diodes (OLEDs): Materials, Devices and Applica- tions. Elsevier Science, 2013. 4.1, 4.2.2 [62] J. Park, D. Shin, and S. Park, “Large-area OLED lightings and their applications,” Semi- conductor Science and Technology, vol. 26, no.3, pp.1-9, 2011. 4.1, 4.2.2 [63] J. Betts-LaCroix, “Selective dimming of oled displays,” 2010. U.S. Patent 0149223 A1. 4.1 [64] M. Dong, Y . Choi, and L. Zhong, “Power-saving color transformation of mobile graphical user interfaces on OLED-based displays,” in proc. ISLPED, pp. 339-342, 2009. 4.1 [65] M. Dong and L. Zhong, “Power modeling and optimization for OLED displays,” IEEE T. Mobile Computing, 2012. 4.1 [66] D. Shin, Y . Kim, N. Chang, and M. Pedram, “Dynamic voltage scaling of OLED displays,” in proc. DAC, pp. 53-58, June 2011. 4.1, 4.2.1 120 [67] M. Zhao, H. Zhang, X. Chen, Y . Chen, and C. Xue, “Online OLED dynamic voltage scaling for video streaming applications on mobile devices,” in proc. CODES+ISSS, pp. 1-10, Sept. 2013. 4.1, 4.2.1, 4.2.2, 4.3, 4.3.2, 4.3.3, 4.3.3, 4.4.2, 4.2, 4.3 [68] X. Chen, J. Zheng, Y . Chen, M. Zhao, and C. 
Xue, “Quality-retaining OLED dynamic voltage scaling for video streaming applications on mobile devices,” in proc. DAC, pp. 1000-1005, June 2012. 4.1, 4.2.1, 4.2.2, 4.3, 4.3.2, 4.3.3, 4.4.2, 4.2, 4.3 [69] M. Jung, O. Kim, and H. Chung, “V oltage distribution of power source in large AMOLED displays,” J. Korean Physical Society, vol. 48, pp. S5-S9, Jan. 2006. 4.1 [70] T. Tsai and L. Chang, “Organic light-emitting diode display,” 2014. US Patent 8,736,180. 4.1, 4.2.2, 4.2.2 [71] Philips, “Design in guide philips lumiblade OLED panel,” Available at http://www.lumiblade-experience.com, 2014. 4.2.2 [72] Y . Wang, Y . Kim, Q. Xie, N. Chang, and M. Pedram, “Charge migration efficiency opti- mization in hybrid electrical energy storage (HEES) systems,” in proc. ISLPED, pp. 103- 108, Aug. 2011. 4.2.3 [73] “Vishay siliconix,” Available at http://www.vishay.com/docs/74277/SI1470DH.pdf. 4.3.3 [74] A. Jain, “Fundamentals of digital image processing,” Engelwood Cliffs, NJ, Prentice-Hall, 1988. 4.4.1 121
Abstract
The power delivery network (PDN) is an essential part of a VLSI platform, delivering power from a power source to all devices in the platform. Because the PDN inevitably dissipates power, which can amount to a considerable power loss in a platform, minimizing the power loss of the PDN has become an important issue in VLSI platform design. This dissertation presents optimization methods that minimize the power loss of the PDN and hence improve its power efficiency. Both circuit‐level and system‐level approaches are discussed, and the proposed methods are verified on three specific VLSI platforms: smartphones, chip multi‐core processors (CMPs) and OLED display platforms.
Asset Metadata
Creator: Lee, Woojoo (author)
Core Title: Optimizing power delivery networks in VLSI platforms
School: Viterbi School of Engineering
Degree: Doctor of Philosophy
Degree Program: Electrical Engineering (VLSI Design)
Publication Date: 04/21/2015
Defense Date: 03/23/2015
Publisher: University of Southern California (original), University of Southern California. Libraries (digital)
Tag: DC-DC converter, low power design, OAI-PMH Harvest, power delivery network, VLSI, voltage regulator
Format: application/pdf (imt)
Language: English
Contributor: Electronically uploaded by the author (provenance)
Advisor: Pedram, Massoud (committee chair), Gupta, Sandeep K. (committee member), Nakano, Aiichiro (committee member)
Creator Email: woojoole@usc.edu
Permanent Link (DOI): https://doi.org/10.25549/usctheses-c3-555824
Unique identifier: UC11299487
Identifier: etd-LeeWoojoo-3355.pdf (filename), usctheses-c3-555824 (legacy record id)
Legacy Identifier: etd-LeeWoojoo-3355.pdf
Dmrecord: 555824
Document Type: Dissertation
Rights: Lee, Woojoo
Type: texts
Source: University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions: The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the a...
Repository Name: University of Southern California Digital Library
Repository Location: USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA