THERMAL MANAGEMENT IN MICROPROCESSOR CHIPS AND DYNAMIC
BACKLIGHT CONTROL IN LIQUID CRYSTAL DISPLAYS
by
Wonbok Lee
A Dissertation Presented to the
FACULTY OF THE GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(ELECTRICAL ENGINEERING)
August 2008
Copyright 2008 Wonbok Lee
DEDICATION
To my father in heaven, my mother, my wife, and all of my family members
ACKNOWLEDGEMENTS
First of all, it has been my distinct honor and pleasure to work with Professor
Massoud Pedram during my Ph.D. studies at the University of Southern California. I
recall the year 2002, when I was fortunate to join his research group; this became a
turning point in my life. I have received much valuable advice, strong support,
and continuous encouragement from him during my doctoral years.
I would also like to thank every member of my thesis defense and qualifying exam
committees: Professors Sandeep Gupta, Jeffrey Draper, Antonio Ortega, and
Aiichiro Nakano. They are keen scholars with great academic minds who gave me
valuable feedback and guidance.
Luckily, I have been surrounded by smart individuals and colleagues in our research
group, and I would like to thank them all. It is impossible to list all of them, but
there are a few whose association I would like to cherish: Kimish Patel, Ali Iranli,
Chanseok Hwang, Changwoo Kang, Behnam Amelifard, Kwanho Kim, Hojun Shim,
Weichung Cheng, Karthik Dantu, and Ramakrishna Soma. Collaboration and
discussion with them across various research areas have truly taught me a lot.
Lastly, I am eternally indebted to my parents and my wife for their love and
support, without which I would not be who I am today. Especially to my father,
who passed away in the very last year of my Ph.D. study, I dedicate this
dissertation.
Table of Contents
Dedication ..............................................................................................................i
Acknowledgements..................................................................................................... ii
List of Figures ............................................................................................................. v
List of Tables ..........................................................................................................viii
Abstract ............................................................................................................ix
Chapter 1 Introduction......................................................................................... 1
1.1 Thermal Management in Microprocessors ........................................... 2
1.2 Dynamic Backlight Control in LCDs.................................................... 4
1.2.1 Dynamic Backlight Scaling............................................................. 4
1.2.2 Dynamic Backlight Scanning.......................................................... 6
1.3 Dissertation Contributions .................................................................... 9
Chapter 2 Background....................................................................................... 11
2.1 Thermal Management in Microprocessors ......................................... 11
2.2 Dynamic Backlight Control in LCDs.................................................. 15
Chapter 3 Thermal Management in Microprocessors ....................................... 18
3.1. Temperature Model, Zone, and Gradients .......................................... 18
3.1.1 Temperature Model....................................................................... 18
3.1.2 Thermal Zone and Gradients......................................................... 19
3.2. Thermal Behavior of Application Programs....................................... 20
3.2.1. General Application Programs...................................................... 20
3.2.2. Multi-media Application Programs............................................... 22
3.3. Proposed Thermal Management I ....................................................... 24
3.3.1. Motivation..................................................................................... 25
3.3.2. Proposed Idea................................................................................ 27
3.3.3. Spatiotemporal Quality Degradation............................................. 32
3.3.4. Simulation and Results.................................................................. 34
3.4. Proposed Thermal Management II...................................................... 39
3.4.1. Motivation..................................................................................... 39
3.4.2. Proposed Idea............................................................................... 41
3.4.3. Why Bank Switching Works......................................................... 42
3.4.4. Simulation and Results.................................................................. 43
3.5. Proposed Thermal Management III .................................................... 48
3.5.1. Motivation..................................................................................... 49
3.5.2. Steady-State Temperature Calculation.......................................... 51
3.5.3. Determination of Voltage and Frequency Level........................... 52
3.5.4. Thermal Management Policy ........................................................ 54
3.5.5. Problem Formulation and Off-line Solution ................................. 55
3.5.6. On-line Solution............................................................................ 57
3.5.7. Simulation and Results.................................................................. 61
Chapter 4 Dynamic Backlight Control in LCDs................................................ 67
4.1. LCD Architecture................................................................................ 67
4.2. Characteristics of HVS........................................................................ 70
4.2.1. Spatial Characteristics of HVS...................................................... 70
4.2.2. Temporal Characteristics of HVS ................................................. 73
4.3. Light Presentation in Display and HVS Perception............................ 74
4.3.1. Light Exposure in CRT Monitors ................................................. 74
4.3.2. Light-Evoked Responses in HVS ................................................. 74
4.4. Proposed Backlight Control Technique I............................................ 77
4.4.1. Dynamic Backlight Scaling Problem............................................ 77
4.4.2. HVS Model and the Proposed Backlight Scaling ......................... 80
4.4.3. Experiment and Results................................................................. 85
4.5. Proposed Backlight Control Technique II .......................................... 91
4.5.1. Proposed Backlight Scanning ....................................................... 91
4.5.2. Proposed Backlight Local Dimming............................................. 95
4.5.3. Experiment and Results................................................................. 97
Chapter 5 Conclusion ...................................................................................... 103
References .........................................................................................................105
LIST OF FIGURES
Figure. 1 Image representation in LCDs vs. CRTs. ................................................... 7
Figure. 2 Thermal gradients and zones. ................................................................... 20
Figure. 3 T_ss and the settling times of general applications. ..................................... 21
Figure. 4 T_ss and settling times of a MPEG-2 decoding program............................. 22
Figure. 5 Measured MPEG-2 per-frame decoding time in two machines. .............. 25
Figure. 6 Thermal variations and violations of a microprocessor in the simulator.. 26
Figure. 7 Proposed DTM strategy in a glance. ........................................................ 27
Figure. 8 Piecewise decomposition of temperature variation. ................................. 28
Figure. 9 Typical MPEG-2 decoding steps.............................................................. 33
Figure. 10 Comparison of the thermal variations in different programs.................. 38
Figure. 11 Exemplary physical RF utilization. ........................................................ 40
Figure. 12 Performance penalty in half sized RF bank............................................ 41
Figure. 13 Pentium IV floor-plan used in the experiment. ...................................... 43
Figure. 14 Detailed floor-plan for the proposed RF structure.................................. 44
Figure. 15 Partial thermal behaviors in gcc. ............................................................ 46
Figure. 16 Workload variation in each consecutive B-frame. ................................. 50
Figure. 17 Decoding time in each consecutive GOPs.............................................. 51
Figure. 18 Example showing DTM c-factor’s dependence on T_ss. ........................... 54
Figure. 19 Deadline and thermal constraints in frame decoding. ............................ 56
Figure. 20 Online algorithm for computing GOP-level DTM policy. ..................... 61
Figure. 21 Relation between the workload and the quality degradation.................. 64
Figure. 22 Comparison of thermal curves in three MPEG-2 video files. ................ 65
Figure. 23 TFT-LCD architecture............................................................................ 67
Figure. 24 Pixel granularity vs. LED granularity..................................................... 69
Figure. 25 Brightness vs. luminance characteristics of HVS................................... 72
Figure. 26 Sample responses in HVS....................................................................... 75
Figure. 27 Various pixel transformation functions. ................................................. 78
Figure. 28 Temporal model of HVS. ....................................................................... 80
Figure. 29 AS of HVS for sinusoidal input with varying DC.................................. 82
Figure. 30 Temporally-Aware Backlight Scaling (TABS). ..................................... 85
Figure. 31 Experimental setup. ................................................................................ 85
Figure. 32 Power breakdown of our platform, Apollo............................................. 87
Figure. 33 Time domain variation of backlight luminance...................................... 88
Figure. 34 Fourier transform of output video sequence........................................... 89
Figure. 35 Energy savings in HEBS vs. TABS........................................................ 90
Figure. 36 1-D LED backlight scanning idea........................................................... 91
Figure. 37 Proposed 1-D LED backlight scanning idea........................................... 92
Figure. 38 Determination of the turn-on time of LEDs. .......................................... 93
Figure. 39 Test environment and circuit blocks in the panel. .................................. 97
Figure. 40 LED arrangement in LCD Panel............................................................. 98
Figure. 41 PWM duty cycle changes across different backlight levels. .................. 99
Figure. 42 Luminance changes in the ‘Original’ vs. ‘PLD’................................... 100
Figure. 43 Power consumptions in the ‘Original’ vs. ‘PLD’. ................................ 101
Figure. 44 Power consumption of the ‘Iron Man’.................................................. 102
Figure. 45 Power consumption of the ‘Indiana Jones’........................................... 102
LIST OF TABLES
TABLE 1 MPEG-2 video files used in the experiments.......................................... 35
TABLE 2 Thermal behaviors in the hottest functional unit..................................... 36
TABLE 3 Spatial/Temporal quality degradation ..................................................... 37
TABLE 4 T_ss and IPC .............................................................................................. 45
TABLE 5 RF utilization and performance............................................................... 47
TABLE 6 MPEG-2 video files used in the experiment. .......................................... 62
TABLE 7 Baseline configuration of simulated processor. ...................................... 62
TABLE 8 Temperatures and the corresponding quality degradation. ..................... 63
ABSTRACT
The desire for low power dissipation in electronic systems and for ensured thermal
safety of microprocessor chips has brought about a paradigm shift from the
traditional performance-oriented design to a power- and temperature-oriented one.
To address some of the critical problems arising from this paradigm shift, this
dissertation introduces a number of low-power design and temperature control
strategies. The first part of the dissertation presents three dynamic thermal
management (DTM) techniques for the microprocessor chips. The first technique,
which is targeted to MPEG-2 decoding, allows some degree of spatiotemporal
quality degradation in the video stream to ensure that the die temperature stays in a
safe range. The second DTM technique, which specifically targets the register file,
periodically switches the load/store operations from one register file bank to another
in order to keep the register file temperature below a critical threshold. The third
DTM technique, which is again targeted at MPEG-2 decoders, is based on (i)
accurate estimation of the workload caused by various frames in a Group Of Pictures
(GOP), (ii) slack borrowing across the GOP frames, (iii) employing Dynamic
Voltage and Frequency Scaling (DVFS) while considering the frame-rate-dependent
GOP deadline, variance of the frame decoding times within the GOP, and a
maximum chip temperature constraint.
The second part of this dissertation presents three low-power and quality-aware
backlight control techniques for LCDs. First, a temporally-aware backlight scaling
technique for video is presented, which considers spatiotemporal image/video quality
in the backlight scaling domain based upon some characteristics of the Human
Visual System (HVS). The proposed technique attempts to minimize the
spatiotemporal distortion in the perceived brightness between the original and the
backlight-scaled videos while maximizing the energy savings in a Liquid Crystal
Display (LCD) with Cold Cathode Fluorescent Lamp (CCFL) backlight. Next, a two-
D Light Emitting Diode (LED) backlight scanning and a local LED dimming
technique are presented. Considering the visual persistence of HVS and the response
time of LCs/LEDs, the proposed scanning technique controls the ON/OFF period of
the LED backlights, achieving a significant reduction in the motion blur artifact. In
addition, the proposed LED local dimming technique controls the local luminance in
the LCD screen by modulating the LED backlights, enhancing the static contrast
ratio of the image frame.
Chapter 1 INTRODUCTION
Increasing demand for low power dissipation in battery-operated mobile devices has
changed the ‘primary design driver’ of many of today’s systems from high
performance to long battery life and thermal robustness. Indeed, power dissipation
and the resulting temperature rise have become the dominant limiting factors on
system performance and constitute a significant component of its cost.
The two parameters of power and temperature are closely related in that (i) they
are in a first-order relationship, i.e., an increase in the power dissipation of the
system results in an increase in the chip temperature, and (ii) many of the power
control techniques, e.g., supply voltage scaling, may also be used as a means for
temperature control. More precisely, however, the low power and the temperature-
aware designs seek to achieve different objectives. First, the low power design aims
at minimizing the average power reduction in the time period of use, i.e. maximize
the energy efficiency of operations, while the low temperature design aims at
limiting the peak temperature in the time period of use. Mathematically speaking,
low power design solves a minsum optimization problem whereas the low
temperature design solves a minmax optimization problem. Secondly, the local die
temperatures are a function of the local power densities, hence, the issue of coping
with or avoiding “hot spots” in thermally-driven designs whereas the system power
is a simple summation of the power dissipated in both high and low power-
consuming blocks.
To underline this difference, consider a low power-consuming block with a very
small area alongside a high power-consuming block with a very large area. Clearly,
from a low power design perspective, one must develop strategies to minimize the
power dissipation of the larger block, whereas from a thermal control perspective,
the power density, and hence the local die temperature, is higher for the smaller
block. In fact, power-aware design that targets limiting the peak power dissipation
in high-activity blocks, in order to avoid power network integrity problems, has
more in common with temperature-aware design.
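As a purely hypothetical numeric illustration (the figures below are our assumptions, not measurements from this work): let block A dissipate 2 W over 1 mm² and block B dissipate 20 W over 100 mm². Then

\[ d_A = \frac{2\,\mathrm{W}}{0.01\,\mathrm{cm}^2} = 200\,\mathrm{W/cm^2}, \qquad d_B = \frac{20\,\mathrm{W}}{1\,\mathrm{cm}^2} = 20\,\mathrm{W/cm^2}, \]

so block B dominates the total power budget (20 W of 22 W), yet block A, with ten times the power density, is the one that produces the hot spot.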
In line with this understanding of power- vs. temperature-driven design, this
dissertation presents a number of thermal management techniques specifically for
microprocessor chips, as well as low power techniques specifically for display
systems.
1.1 Thermal Management in Microprocessors
Peak power dissipation and resulting temperature rise have become the dominant
limiting factors to microprocessor performance. Expensive packaging and heat
removal solutions are needed to achieve acceptable substrate and interconnect
temperatures in high-performance microprocessors. The heat flux in state-of-the-art
microprocessor chips is currently in the range of 10-20 W/cm², which already
exceeds the confines of air cooling.
Current thermal solutions are designed to limit the peak processor power
dissipation to ensure its reliable operation under worst-case scenarios. However, the
peak processor power and ensuing peak temperature are hardly ever observed.
Dynamic Thermal Management (DTM) techniques have been proposed as a class of
microarchitectural solutions and software strategies to achieve the highest processor
performance under a peak temperature limit. Furthermore, it is known that power
density across the chip is non-uniform, resulting in localized hot spots. DTM
solutions must address this phenomenon as much as they tackle system-wide
temperature violations; when a microprocessor chip approaches its thermal limit, a
DTM controller needs to initiate HW reconfiguration, slow-down, or shutdown to
lower the chip temperature.
Traditionally, these thermal issues within a microprocessor chip have been
handled at the package level. Chip manufacturers have devised sophisticated, albeit
expensive, packaging and cooling assemblies, i.e., heat sinks and micro-fluidic
conduits, for microprocessor chips so as to efficiently transfer heat generated
within a chip to the ambient environment. However, packaging and cooling systems
that lack knowledge of the resource utilization and power dissipation demands of the
software running on a processor chip have a major limitation: the thermal behavior
of a microprocessor combined with its running programs cannot be observed at the
circuit/gate level or at design time, but only at runtime. As such, micro-architecture
level solutions can respond to the dynamic temperature change of a chip so as to
avoid the worst-case power density and temperature conditions. Some examples of
DTM techniques include fetch toggling [4] (instruction fetching is stalled for the
next N cycles), instruction cache throttling [4] (throttling the instruction forwarding
from the instruction cache to the instruction buffer), activity migration [24]
(dispatching computations to different locations on the die), and Dynamic Voltage
and Frequency Scaling (DVFS) [61].
A common feature of most DTM techniques is their reactive nature, which relies
on two pre-defined thermal limits: a trigger temperature, T_trigger, and a critical
temperature, T_critical. T_trigger is the thermal limit above which DTM techniques
will be activated to lower the temperature, whereas T_critical is the thermal limit
above which the temperature could damage the microprocessor chip, and hence it
must not be exceeded. Unfortunately, thermal safety achieved by these DTM techniques
typically comes at the expense of noticeable (speed) performance degradation, which
is sometimes intolerable. A good example is the MPEG decoding task where
maintaining a fixed throughput, i.e. frame rate, is crucial. In such applications,
quality may be sacrificed in order to achieve the desired performance. For MPEG
decoding, there have been a number of research works [9] [10] [60] that address power
management, but very little has been done to address temperature management. For
this reason, this dissertation focuses on thermal management techniques for a
thermally safe state of operation of microprocessors during MPEG decoding.
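As a minimal illustration of this reactive, two-threshold behavior, consider the following Python sketch; the threshold values, hysteresis margin, and function names are our own illustrative assumptions, not an implementation from the literature cited above:

```python
# Sketch of a reactive two-threshold DTM loop (illustrative assumptions only).
T_TRIGGER = 81.8    # deg C: activate DTM above this level (value from Sec. 3.3)
T_CRITICAL = 100.0  # deg C: must never be exceeded (assumed value)
HYSTERESIS = 2.0    # deg C: cool-down margin before resuming (assumed)

def dtm_step(temp_c, throttled):
    """Return whether the processor should run throttled in the next interval."""
    if temp_c >= T_CRITICAL:
        raise RuntimeError("thermal emergency: shut down")
    if temp_c >= T_TRIGGER:
        return True               # engage slow-down (e.g., DVFS, fetch toggling)
    if throttled and temp_c <= T_TRIGGER - HYSTERESIS:
        return False              # cooled enough: resume normal operation
    return throttled              # otherwise keep the current mode
```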
1.2 Dynamic Backlight Control in LCDs
1.2.1 Dynamic Backlight Scaling
As portable electronic devices become more intertwined with people’s everyday
lives, it becomes necessary to put more functionality into these devices, increase
their performance, and limit their energy consumption. These devices, which are
getting smaller and lighter, are often powered by rechargeable batteries.
Unfortunately, battery capacities are increasing at a much slower pace than the
overall power dissipation of these devices. It is therefore critical to develop
power-aware design methodologies and techniques to bring the power dissipation
growth of such devices in line with the battery capacity increase.
The sources of power dissipation in a portable electronic device are many and
vary as a function of the device’s functionality and performance specification. In
practice, many of these devices are equipped with a Liquid Crystal Display (LCD),
which tends to account for a significant portion of the total system power. For
example, in the SmartBadge system, the display subsystem consumes 28.6% and
50% of the total power in the active/idle and standby modes, respectively [52]. The
dominant backlighting technology for LCDs is the Cold Cathode Fluorescent Lamp
(CCFL), which uses a low-voltage DC to high-voltage AC converter as the driver.
This driver consumes the largest amount of power in the display subsystem [34].
There are two main classes of techniques for lowering the power consumption of
LCD subsystem. The first class of techniques is focused on the digital/analog video
interface between the graphics controller and the LCD controller. These techniques
try to minimize the energy consumption by taking advantage of different encoding
techniques to minimize the switching activity on the electrical bus. For instance,
Cheng et al. [6] used the spatial locality of the video data to reduce the number of
transitions on the bus, reducing its energy consumption by 75%. More recently,
Salerno et al. [50] extended the previous work by using a set of limited intra-word
transition codes to achieve more than 60% energy saving on average compared to the
basic transmission protocol.
The second class of techniques is focused on the video controller and the backlight
of the display system. The key idea follows from the fact that the eye’s perception
of the light emitted from the LCD panel is a function of two parameters: i) the
intensity of the backlight and ii) the transmittance of the LCD panel. Thus, by
carefully adjusting these two parameters, one can achieve the same perception in
human eyes at different values of the backlight intensity and the LCD transmittance.
Because the variation in the power consumption of the backlight lamp across output
luminance values is orders of magnitude larger than the variation in the power
consumption of the LCD panel across pixel values [5], one can save a sizeable
amount of energy by simply dimming the backlight and increasing the LCD
transmittance to compensate for the loss of backlight luminance. As a technique in the second class,
this dissertation proposes an effective backlight control technique in LCDs with
CCFL backlights.
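To make the compensation concrete, here is a minimal Python sketch of this class of techniques under simplifying assumptions of our own (a single backlight dimming factor and a transmittance that is linear in the 8-bit pixel value; the cited works do not necessarily use either):

```python
import numpy as np

def backlight_scale(pixels, dim_factor):
    """Dim the backlight to `dim_factor` (0 < dim_factor <= 1) and boost the
    pixel values so that perceived luminance ~ backlight * transmittance is
    preserved. Pixels whose boosted value exceeds the panel maximum saturate,
    which is where the visual distortion comes from.
    """
    pixels = np.asarray(pixels, dtype=float)  # original 8-bit pixel values
    boosted = pixels / dim_factor             # compensate for the dimmer backlight
    return np.clip(boosted, 0.0, 255.0)       # clipping = loss of visual information

# Example: at 70% backlight, pixels up to 0.7 * 255 ~ 178 are fully compensated.
frame = np.array([10, 100, 178, 200, 255])
print(backlight_scale(frame, 0.7))  # ~[ 14.3 142.9 254.3 255. 255. ]
```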
1.2.2 Dynamic Backlight Scanning
LCD TVs are the mainstream products in the current Flat Panel Display (FPD)
market, and they are present literally everywhere in our lives, e.g., in living rooms,
schools/workplaces, and grocery stores. In spite of their popularity and excellent
performance (e.g. vivid image representation and high native resolution) over the
other types of TVs such as Plasma Display Panel (PDP), LCDs suffer from a number
of shortcomings, such as the motion blur artifact¹, low Contrast Ratio (CR), and low
brightness. For these reasons, the Cathode Ray Tube (CRT) ‘motion blur free’
feature and the PDP’s high CR pose a continual threat to LCDs’ dominance.
The biggest shortcoming of LCDs is the motion blur artifact, which is due to three
well-known phenomena: i) the slow response time of the Liquid Crystal (LC), ii) the
scan-and-hold feature of LCDs, and iii) the slow pursuit nature of the Human Visual
System (HVS). To overcome the slow response time of the LC, various kinds of
over-driving techniques [42] are widely employed by the LCD industry. These
techniques generally apply sufficiently high voltage levels to the LCs so as to tilt
them faster, i.e., get them to respond faster. The amount of over-driving voltage is
typically determined at design time and saved in a Look-Up Table (LUT).
Figure. 1 Image representation in LCDs vs. CRTs.
Furthermore, to effectively tackle the scan-and-hold aspect of LCDs (illustrated in
Figure. 1), together with the slow pursuit nature of the HVS, LCD architects have
devised various ‘eye-bleaching’ techniques such as Black Frame Insertion (BFI) [36],
backlight flashing/blinking [25], and backlight scanning [59]. In fact, although the
response time may approach zero in future LCDs, the motion blur artifact will
continue to exist as long as LCD panels display every image in the scan-and-hold
style. The aforementioned “fixes” attempt to re-create the impulse-type image
display of CRT monitors in the LCD panels so that the afterimages in an observer’s
eyes are eliminated on a per-frame basis.

¹ Motion blur is the apparent streaking of rapidly moving objects in a still image or a
sequence of images such as a movie.
Another shortcoming of LCDs is their relatively low CR compared to PDPs.
Contrast ratio is generally characterized in two ways: i) static (or spatial) and ii)
dynamic (or temporal). Static CR is defined as a (perceived) luminance difference
between the maximum and the minimum pixel values within an image frame
whereas dynamic CR is defined as a luminance difference between the maximum
and the minimum pixel values across image frames. As is known, the low CR of
LCDs mainly originates from the backlight leaking through the LC to the front side
of the panel, especially when the pixel values are close to the zero grayscale. One
good solution to this problem is dynamic dimming of the backlight, which may be
classified as follows: i) 0-D (frame level), ii) 1-D (line/segment level), and iii) 2-D
(grid/tile level). Relatively small LCDs, such as PC monitors, equipped with CCFL
backlight(s) allow only 0-D control, while large LCDs, such as LCD TVs, allow 1-D
control. In both cases, only dynamic CR enhancement or a limited range of static CR
enhancement is feasible. True static CR enhancement is feasible only in LCDs with
2-D backlight modulation, and the most popular backlight source in this category is
the Light Emitting Diode (LED). Not only to save power, but also to
overcome the motion blur artifact as well as to enhance the static CR, this
dissertation separately proposes two backlight control techniques in LCDs with LED
backlights.
1.3 Dissertation Contributions
The key ideas and main contributions of this dissertation are summarized next.
A. Dynamic Thermal Management for MPEG Decoding
We present a DTM technique [40] for MPEG-2 decoding, which allows some degree
of spatiotemporal image/video quality degradation. Given a target MPEG-2 decoding
time, this technique dynamically selects either an intra-frame spatial degradation or
an inter-frame temporal degradation strategy so as to make sure that the
microprocessor chip will continue to stay in a thermally safe state of operation, albeit
with a certain amount of image/video quality loss.
B. Active Bank Switching for Temperature Control in a Register File
We describe an activity migration based DTM technique [48] for register files that
explores program behavior in term of register file utilization such that we pay a
minimal performance penalty while reducing the steady-state temperature. The
proposed idea does not introduce any redundant functional block, does not incur
processor-wide performance penalty in terms of high IPC degradation.
C. GOP-Level Dynamic Thermal Management in MPEG-2 Decoding
We introduce a novel DTM technique [41] based upon (per-frame decoding) slack
time estimation and its distribution across the GOP to achieve a thermally safe state
of operation in microprocessors during MPEG-2 decoding. This idea incorporates
DVFS with consideration of the per-frame decoding deadlines and temperature
constraints in the GOP. When DVFS is predicted not to meet the performance
constraints, either the (intra-frame) spatial quality degradation or the (inter-frame)
temporal quality degradation will be employed.
D. HVS-Aware Dynamic Backlight Scaling in LCDs with CCFL Backlights
We present a temporally-aware backlight scaling technique [33] [34] for video, which
minimizes spatiotemporal distortion in perceived brightness between the original and
the backlight-scaled images in MPEG video streams, based upon the analysis of
HVS characteristics. Consequently, the perceived flickering in video is dramatically
reduced while maximizing the energy savings.
E. 2-D Scanning and Local Dimming in LCD TVs with LED Backlights
We describe two new backlight control techniques: a 1-D LED backlight scanning
technique and a 2-D LED backlight local dimming technique. The former technique
reduces the motion blur artifact by effectively breaking the scan-and-hold style
image display of LCDs, while the latter technique enhances the static contrast ratio
by locally dimming the backlights. In both cases, along with the video/image quality
enhancement, power savings are achieved as well.
Chapter 2 BACKGROUND
2.1 Thermal Management in Microprocessors
Many of the requisite performance features of microprocessors such as real-time
processing and the Mean-Time-Between-Failure (MTBF) are significantly affected
by the power density and resulting temperature [49]. In fact, the power dissipation
and heat density of microprocessors have increased super-linearly to the point that
they affect the microprocessors’ performance, aging, reliability, and the total system
cost including the heat removal solutions [21].
DTM techniques can be classified in many ways: i) local vs. global, ii) aggressive
vs. conservative, iii) Hardware (HW)-based vs. Software (SW)-based, and iv) spatial
vs. temporal. For example, spatial methods, e.g., Activity Migration (AM) [24],
distribute excess heat to different locations within a microprocessor whereas the
temporal methods, e.g., DVFS, distribute the excess heat over time.
Brooks et al. [4] explored a variety of HW and SW DTM techniques, which
include clock frequency scaling, DVFS, decode throttling, speculation control, and
instruction cache fetch toggling. Heo et al. [24] presented an AM technique which is
effective for temperature control in the register file, albeit this method was
accompanied by large area overhead. Srinivasan et al. [61] proposed instruction
window resizing and switching-off the active Functional Units (FUs), which in turn
results in both issue width reduction and register file port deactivation.
Some research efforts have interwoven existing DTM techniques, in either HW-HW
or HW-SW combinations, to offset the limitations of each side, achieving better
thermal safety with minimal performance degradation. Kumar et al. [37] proposed a
hybrid-DTM, combining the reactive HW techniques, i.e., clock gating, and
proactive SW techniques, i.e., process scheduling. Skadron et al. [53] incorporated
fetch gating during low thermal stress times and DVS during high thermal stress
times.
In practice, carrying out thermal experimentation on actual microprocessors is
considerably limited by both safety and cost. As such, some researchers have
developed Compact Thermal Modeling (CTM) [27] [28], which parameterizes the
circuit geometries together with the physical properties of the materials. The
resultant model is incorporated into thermal simulators, one of which is HotSpot
[29] [62] [71]. HotSpot is implemented based upon compact thermal RC modeling,
with a well-known duality concept in its background: heat transfer and electrical
phenomena have a similar nature, i.e., heat flow can be modeled as a current that
passes through a thermal resistance, with the temperature difference corresponding
to a voltage. The thermal capacitance models the transient behavior and, together
with the thermal resistance, forms the exponential thermal RC time constant. Huang
et al. [27] [28] proposed another CTM technique which separates out two heat-flow
paths in microprocessors: 1) Silicon bulk → Thermal interface material → Heat
spreader → Heat sink → Ambient air; 2) Silicon bulk → Interconnect layer → I/O
pad → Substrate → Lead/ball → PCB.
More practical DTM solutions use thermal sensors that generate a processor-wide
thermal profile. However, this type of approach raises some delicate issues, such as
optimal sensor deployment, the sensors’ response time, and the chip-temperature
increase caused by the sensor(s). As an alternative, some researchers have utilized
the built-in Performance Monitoring Unit (PMU) of off-the-shelf processors. In
[11] [67], the authors extracted power/energy approximation parameters from the
PMU on-the-fly and utilized the parameters for temperature control. Basically,
PMU-based DTM approaches consist of two steps: i) propose a power/energy
estimation model; ii) build an equivalent thermal RC network from the
micro-architectural floor-plan and plug in the power/energy figures so as to generate
the temperature information. Lee et al. [39] proposed an off-line regression analysis
to find the relationship between the event information and the temperature, which
basically simplified the 4th-order Runge-Kutta temperature calculation method of
HotSpot. Han et al. [22] proposed another lightweight runtime temperature
monitoring tool, called Temptor, which is based on their time-invariant linear
thermal system algorithm. They similarly tried to simplify the slow temperature
calculation method by repeatedly performing 10 msec of power calculation and
40 msec of temperature calculation.
Some research efforts [23] [51] have tried to find thermal solutions in an early
design phase, i.e., during placement/floor-planning. The rationale behind this is that
the thermal behavior of every functional block in a microprocessor depends not only
on its own power density but also on the power density of its surrounding blocks.
Sankaranarayanan et al. [51] first introduced the maximum/minimum temperature
range with and without considering lateral heat spreading, modeled by zero/infinite
lateral thermal resistance. Han et al. [23] showed that temperature-aware
floor-planning can reduce the maximum temperature of an Alpha processor by 21°C.
The advantage of this type of approach is that thermal issues are considered in an
early design phase, which may not totally eliminate thermal stress but has the
potential to manage lateral heat spreading efficiently such that the chances of a
thermal emergency are reduced. Besides, it does not need any architectural
modifications or the resultant performance overhead.
Several DTM techniques have made use of extra resources in microprocessors.
Heo et al. [24] proposed to replicate the functional units and switch the operations
between the original and duplicated units. This technique appears to be effective for
temperature reduction (9.2~12.4°C), albeit accompanied by a large area overhead
(30~100%) and performance degradation (12~16%), since the activity migration
consumes processor cycles to copy values from the active bank to the dormant bank.
Similarly, Lim et al. [43] attached a relatively simple secondary pipeline, featuring
single issue and in-order execution, to the primary pipeline, which features multiple
issue and out-of-order execution. Kursun et al. [38] proposed to use a helper engine
to alleviate the switching overhead between duplicated cores. These duplication
ideas are limited to targeting embedded/mobile microprocessors where performance
degradation can be tolerated.
2.2 Dynamic Backlight Control in LCDs
Previous works on backlight control can be categorized into three classes: i)
reducing the power dissipation within a given tolerance of image/video quality
distortion, ii) increasing the dynamic/static CR while maintaining the overall
luminance of the image(s), and iii) mitigating the motion blur artifact.
In the first class, Chang et al. [5] introduced the Dynamic backlight Luminance
Scaling (DLS) technique to reduce the energy consumption of LCDs. This initial
approach suffers from two main drawbacks: i) it manipulates every pixel on the
screen one-by-one, limiting the application of this approach to still images or
low-frame-rate video; ii) it achieves energy savings at the cost of a loss in visual
information. Cheng et al. [7] improved the aforesaid initial approach by eliminating
the pixel-by-pixel transformation of the displayed image through minor hardware
modifications to the built-in LCD reference driver.
implement any single-band grayscale spreading function to adjust the brightness and
contrast of the displayed image, extending the applicability of the initial approach to
streaming applications. Gatti et al. [17] proposed several approaches for Power
Management (PM) of display subsystems based on variable refresh rate, Liquid
Crystal (LC) orientation shift, and backlight dimming. However, their backlight
dimming technique does not use a pixel value transformation method, resulting in
a loss of image brightness. Moshnyaga et al. [46] proposed another PM technique
based on the user’s attention. In this technique, the display is turned off or put into
sleep mode whenever a camera-based image processing controller detects that the
user is not looking at the screen. Ali et al. [31] proposed an improved DLS technique
with image transformation whereby the Dynamic Range (DR) of the original image
is reduced such that the incurred image distortion is no more than a pre-specified
value. This work improved the previous techniques in two aspects: Firstly, it
presented a global Histogram Equalization (HE) algorithm which enabled
preservation of most of the visual information in spite of image transformation, and
secondly, it described a simple, yet effective, modification of the architecture of
built-in LCD reference driver so as to produce any piece-wise linear image
transformation function. Later, Ali et al. [32] presented a backlight scaling technique
which is based on a tone reproduction operator. This operator maps the original
image X to a transformed image X’ such that the perceived brightness of the image is
preserved while its dynamic range is reduced. The proposed technique was the first
to take advantage of some aspects of the HVS to maximize power savings through
dynamic backlight dimming.
In the second class, Oh et al. [47] proposed an adaptive dimming technique with
CCFL backlights, which enhanced the dynamic CR by a factor of two. Greef et al.
[19] proposed to combine 1-D dimming and boosting of the CCFL backlights to
improve the brightness and static CR while still saving power. Recently, Chen et al.
[8] presented an idea that incorporated 2-D LED local dimming and global dimming,
both of which aim at eliminating the front-side backlight leak through the LC pixels.
Though their idea is effective in minimizing the light leak in dark regions (and
hence improving the black level and the static CR), aiming at light leak
minimization even in their global dimming results in luminance degradation in the
image.
In the third class, Fisekovic et al. [14] proposed to consider the parameters of i)
proper timing, ii) the number of backlight segments, and iii) the exposure time (i.e.,
duty cycle). They asserted that at least 6-7 segments and a 25~30% duty cycle are
effective in suppressing the motion blur artifact while maintaining reasonable
brightness of the frame images. Similarly, Hung et al. [30] proposed to consider the
parameters of i) duty ratio, ii) lamp current, and iii) timing of the CCFLs. Recently,
Sluyterman et al. [59] asserted a proper timing of backlight scanning with Hot
Cathode Fluorescent Lamps (HCFLs). Though their idea is novel, the coarse
granularity and slow response of HCFL backlights make it less attractive in practice.
Greef et al. [20] combined the backlight scanning, dimming, and boosting
techniques with HCFL backlights, which has a similar limitation.
Chapter 3 THERMAL MANAGEMENT IN
MICROPROCESSORS
3.1. Temperature Model, Zone, and Gradients
3.1.1 Temperature Model
According to the thermal model developed in [58], the temperature change in the
microprocessor is represented as:

\[ \Delta T = \left( \frac{P}{C_{th}} - \frac{T_{old}}{R_{th} C_{th}} \right) \cdot \Delta t \qquad (1) \]

where Δt is a time interval, P is the average power dissipated in the interval, R_th is
the thermal resistance, C_th is the thermal capacitance, and T_old is the temperature
at the start of the interval. After the time interval, the new temperature becomes:

\[ T_{new} = T_{old} + \Delta T \qquad (2) \]

Let t_initial and t_final denote two instances of time (with their difference denoted
by Δt). Then, the rising thermal gradient is represented as:

\[ \frac{\Delta T_r}{\Delta t} = \frac{P}{C_{th}} - \frac{T_{old}}{R_{th} C_{th}} \qquad (3) \]

In contrast, when the processor is stalled (i.e., put in the standby mode) by the
invocation of a DTM technique, the chip power dissipation is negligible compared
to its active power dissipation, i.e., P = 0. In that case, while the processor is stalled,
the falling thermal gradient is calculated as:

\[ \frac{\Delta T_f}{\Delta t} = - \frac{T_{old}}{R_{th} C_{th}} \qquad (4) \]
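As a minimal sketch of how equations (1)-(2) can be stepped in discrete time (the R_th, C_th, power, and time-step values below are arbitrary assumptions chosen only for illustration):

```python
# Discrete-time stepping of the thermal RC model of Eqs. (1)-(2).
# All parameter values are illustrative assumptions, not chip data.
R_TH = 1.0   # thermal resistance (K/W), assumed
C_TH = 0.05  # thermal capacitance (J/K), assumed

def step_temperature(t_old, power, dt):
    """Apply Eq. (1): dT = (P/C_th - T_old/(R_th*C_th)) * dt, then Eq. (2)."""
    d_t = (power / C_TH - t_old / (R_TH * C_TH)) * dt
    return t_old + d_t

temp = 10.0  # temperature above ambient (K)
for _ in range(100):                                     # rising phase, Eq. (3)
    temp = step_temperature(temp, power=40.0, dt=0.001)
print(f"after active phase: {temp:.2f} K above ambient")
for _ in range(100):                                     # falling phase, Eq. (4)
    temp = step_temperature(temp, power=0.0, dt=0.001)
print(f"after stall phase:  {temp:.2f} K above ambient")
```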
3.1.2 Thermal Zone and Gradients
Based on the aforesaid thermal model, we classify the chip’s operating temperature
range into two zones (cf. Figure. 2(a)): i) the Fast Temperature Rise (FTR) zone, in
which the rising thermal gradient is higher than the falling thermal gradient, i.e., the
temperature rises faster than it falls (when the chip is allowed to cool off); and ii)
the Fast Temperature Fall (FTF) zone, in which the falling thermal gradient is equal
to or higher than the rising thermal gradient, i.e., the temperature drops faster than it
rises. Note that DTM techniques are most effective in the FTF zone.
Depending on the type of packaging and cooling solutions, T_critical may lie in one
or the other of these two zones. In the absence of DTM techniques, an application
program running on a microprocessor chip will lead the chip temperature to a
steady-state temperature, denoted T_ss, depending on the program’s CPI
characteristics. The goal of the proposed DTM technique is to ensure that T_ss
remains below T_critical while minimizing power dissipation, maximizing
performance, or meeting a combination of power-performance constraints. If the
original T_ss is below T_critical, none of the DTM techniques is needed. On the
other hand, if T_critical lies in the FTF zone (cf. Figure. 2(b)), then DTM techniques
tend to work very well and the new T_ss of the chip will be lowered below
T_critical with little performance penalty. Otherwise (cf. Figure. 2(c)), DTM
techniques are expected to be less effective, i.e., the new T_ss of the chip will be
brought below T_critical only with a significant performance penalty.
(a) Thermal gradients at different temperature levels. T_breakeven is the temperature
at which the rising and falling thermal gradients are equal.
(b) T_critical is in the FTF zone. ΔT_drop is the minimum temperature drop before
we deactivate the DTM techniques, i.e., go back to the normal mode of operation.
(c) T_critical is in the FTR zone.
Figure. 2 Thermal gradients and zones.
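For reference, T_breakeven can be derived from the model of Section 3.1.1 by equating the rising gradient (3) with the magnitude of the falling gradient (4); this short derivation is ours, under the P = 0 stall assumption, with temperatures measured relative to the ambient:

\[ \frac{P}{C_{th}} - \frac{T_{breakeven}}{R_{th} C_{th}} = \frac{T_{breakeven}}{R_{th} C_{th}} \quad\Rightarrow\quad T_{breakeven} = \frac{P \cdot R_{th}}{2} \]

Temperatures above this level fall in the FTF zone, where the falling gradient dominates; temperatures below it fall in the FTR zone.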
3.2. Thermal Behavior of Application Programs
3.2.1. General Application Programs
Figure. 3 depicts examples of the thermal behavior of a number of general
application programs with different workloads. Prior to a program run, the
microprocessor chip is at a certain initial temperature, denoted T_initial. When the
program starts to run, the temperature rises and finally reaches its T_ss. Note that
T_ss is program (workload) dependent but is not dependent on T_initial. Eventually,
every program reaches its own T_ss, albeit in a different amount of (settling) time,
e.g., t_1 ~ t_5. The T_ss values for different programs can be estimated based on the
program behavior, e.g., its CPI-dependent power value, which can be denoted by P_i.
Figure. 3 T_ss and the settling times of general applications.
The thermal gradient, ΔT/Δt, goes to zero as a program’s thermal curve approaches
T_ss. Unfortunately, T_ss is known only after the program’s thermal curve reaches
its steady-state level. If we are able to predict the T_ss of a program from its P_i,
then we can use this predicted value of T_ss to select the best-effort voltage and
frequency at any time during the program run (in this context, the best-effort
solution means the highest voltage and frequency that keeps T_ss below T_critical
while incurring minimum performance degradation). In Figure. 3, the difference
between the T_ss of the programs (in this case programs #3, #4, and #5) and
T_critical can guide the degree of DVFS that guarantees the microprocessor chip’s
thermal safety while incurring minimum performance degradation.
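A minimal sketch of this best-effort selection follows, assuming the steady-state relation T_ss = P·R_th obtained by setting ΔT = 0 in Eq. (1); the operating-point table and all numeric values are our own assumptions:

```python
R_TH = 1.0         # thermal resistance (K/W), assumed
T_CRITICAL = 80.0  # critical temperature above ambient (K), assumed

# Assumed DVFS operating points, fastest first: (voltage V, freq GHz, avg power W).
OP_POINTS = [(1.2, 2.0, 90.0), (1.1, 1.6, 65.0), (1.0, 1.2, 45.0), (0.9, 0.8, 28.0)]

def best_effort_vf(cpi_power_scale):
    """Pick the fastest (V, f) whose predicted T_ss = P * R_th stays below
    T_critical; `cpi_power_scale` models the program's CPI-dependent power P_i."""
    for volt, freq, power in OP_POINTS:
        t_ss = cpi_power_scale * power * R_TH  # steady state: dT = 0 in Eq. (1)
        if t_ss < T_CRITICAL:
            return volt, freq
    raise RuntimeError("no thermally safe operating point")

print(best_effort_vf(1.0))  # -> (1.1, 1.6): 90 W would settle at 90 K, too hot
```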
Figure. 4 T_ss and settling times of a MPEG-2 decoding program.
3.2.2. Multi-media Application Programs
The thermal curves of multi-media application programs are different from those
of general application programs. Figure. 4 shows one such example, corresponding
to an MPEG-2 decoding program [74]. The key difference is the existence of regular
fluctuations in T_ss: when the frame decoding is finished earlier than the given
per-frame decoding deadline, the microprocessor becomes idle (waiting to display
the frame) until the next frame decoding starts. This pattern gives rise to a high
temperature level (i.e., the temperature at the time that the frame decoding is
completed) and a low temperature level (i.e., the temperature at the end of the idle
period) on the thermal curve. Only if the frame decoding workload fully utilizes the
microprocessor, without any waiting (residual) time, is there no significant
fluctuation in T_ss.
To eliminate such frame-based T_ss fluctuation, we need to scale down the voltage
and frequency such that no waiting period exists in the frame decoding. In contrast
to the processor-supported maximum voltage and frequency, which we denote as
(V_MAX, f_MAX), we refer to such a voltage and frequency by (V_max^no-wait,
f_max^no-wait). In this case, the original T_ss will typically be lowered to a new
temperature level, denoted T_ss^no-wait, which is supposed to be lower than the
original T_ss.

Note that T_ss^no-wait is not always guaranteed to lie below T_critical, i.e., we
may still have to further scale down the voltage and frequency to meet the chip’s
thermal constraints. Note also that T_ss is reduced in both cases; however, the
(V_max^no-wait, f_max^no-wait) combination does not incur any performance
penalty. Hence, the performance degradation is experienced only in the latter case
of voltage and frequency scaling, when f < f_max^no-wait.
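As a minimal sketch of how (V_max^no-wait, f_max^no-wait) could be derived from measured decode times (the inverse-scaling model and the numeric values are our own simplifying assumptions):

```python
FRAME_PERIOD_MS = 33.37  # NTSC: 1000 / 29.97 frames/sec
F_MAX_GHZ = 2.8          # processor-supported maximum frequency (assumed)

def no_wait_frequency(decode_ms_at_fmax):
    """Lowest frequency that still meets the per-frame deadline, assuming the
    decode time scales inversely with clock frequency (a simplification)."""
    utilization = decode_ms_at_fmax / FRAME_PERIOD_MS
    return F_MAX_GHZ * min(1.0, utilization)  # never above f_MAX

# E.g., with the 24.01 ms average decode time measured on the Pentium IV:
print(f"{no_wait_frequency(24.01):.2f} GHz")  # ~2.01 GHz leaves no idle time
```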
3.3. Proposed Thermal Management I
To the best of our knowledge, all previously proposed DTM techniques have paid
a performance degradation price to achieve a thermally safe state in the
microprocessor. However, there are certain occasions when performance
degradation is critical and hard to tolerate. One popular example is MPEG decoding,
in which the need for a fixed speed, i.e., frame rate, is more crucial than any other
factor on a moderate-performance microprocessor typically equipped with
inexpensive packaging and cooling assemblies. In such cases, degradation priority
should be given to the non-performance factors so as to guarantee the standard
frame rate.
In this section, we present a DTM technique that resorts to either spatial or
temporal degradation for MPEG-2 decoding, based on the following observation: as
computers become faster, the absolute decoding time of a frame in an MPEG-2
video stream becomes smaller. However, the MPEG-2 standard prescribes a fixed
frame rate of 29.97 frames/sec (NTSC) or 25 frames/sec (PAL) [73]. The frame rate
is determined in consideration of the slow trace/pursuit nature of the HVS, and a
frame rate higher than this 25-30 does not effectively improve the perceived quality
of MPEG-2 video streams to human eyes. Hence, this residual time, which depends
on the frame rate and the processor speed, within the given per-frame decoding time
can be used to achieve thermal safety in the microprocessor.
3.3.1. Motivation
Figure. 5 reports the per-frame decoding time variation of an MPEG-2 video
stream running in the MediaBench MPEG-2 decoder program [72]. The video
stream has 60 frames at 704x480 resolution, and it was run in the decoder program
on two different machines: an Intel Xeon 1.7GHz and an Intel Pentium IV 2.8GHz.
Notice that the actual decoding time varies depending on the type of frame (I, P,
and B): the I-frame, which is computation intensive, takes longer than the other
frame types. We measured the average per-frame decoding times on each machine,
which are 42.01 msec and 24.01 msec, respectively. Considering that the MPEG-2
standard specifies its frame rate as 29.97 frames/sec (which corresponds to
approximately 33 msec/frame), most of the frames on the Xeon chip cannot finish
decoding at this frame rate. For this reason, the MPEG standard natively has a frame
discarding scheme [65] whereby it can drop B-frames, P-frames, and I-frames in a
stepwise manner, depending on the machine’s clock speed.
Figure. 5 Measured MPEG-2 per-frame decoding time in two machines.
In the Pentium IV, on the contrary, the measured per-frame decoding time is less
than 33 msec. For such a case, the MPEG standard natively has a frame rate control
scheme that waits for some time so as to display the frames at a regular interval. For
example, Berkeley MPEG-1 [74] uses the ‘select’ function call to slow down the
display of frames to a fixed rate of 29.97 frames/sec. Since the current
state-of-the-art microprocessors are much faster than our experimental platforms, it
is expected that less actual per-frame decoding time will be spent, and thus more
per-frame residual time (the given per-frame decoding time minus the actual
per-frame decoding time) will be available. If the target video stream has a smaller
resolution, this trend will be even more apparent.
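To make this concrete with the averages measured above: at 29.97 frames/sec the per-frame budget is 1000/29.97 ≈ 33.4 msec, so the Pentium IV’s 24.01 msec average decode time leaves roughly 33.4 − 24.0 ≈ 9.4 msec of residual time per frame, i.e., the processor sits idle for about 28% of every frame period.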
Figure. 6 Thermal variations and violations of a microprocessor in the simulator.
In Figure. 6, we simulate this ever-decreasing actual decoding time in our thermal
simulator (the simulator is explained later): we assume a fixed number of cycles
that corresponds to the given per-frame decoding deadline (33 msec). If the actual
decoding of a frame finishes earlier than this many cycles (the deadline), we stall
the microprocessor inside the simulator for the remaining cycles until the deadline
is reached, and only then do we resume decoding of the next frame. This procedure
continues until all the frames are decoded. We present some simulation results in
Figure. 6, where the X-axis plots the simulation cycles at 10K granularity and the
Y-axis plots the temperature in Celsius. As noticed, the temperature starts to
decrease whenever the actual decoding of a frame finishes earlier than the given
per-frame decoding deadline. More importantly, note that the peak temperature goes
up to 103°C, which may cause logical/timing errors in the microprocessor.
Figure. 7 Proposed DTM strategy at a glance.
3.3.2. Proposed Idea
A key observation about this phenomenon is that we may have a sufficient amount
of per-frame decoding time, yet the thermal stress during the actual frame decoding
is severe enough that it may result in erroneous frame decoding output. To avoid
thermal violations (shown in Figure. 6), we propose a DTM technique for MPEG-2
decoding. Figure. 7 shows our DTM strategy: given a deadline for per-frame
decoding, the conventional MPEG-2 decoder uses the first part of the decoding time
to finish the decoding task (shown as ‘Decoding’ in Figure. 7) while it rests in the
second part (shown as ‘Residual’ in Figure. 7). Unfortunately, the
T_trigger/T_critical of the chip may be exceeded in the first part. We propose to
interleave short periods of decoding with short periods of processor stalls within a
frame decoding period so that the chip temperature never exceeds T_trigger, yet the
decoding task is completed before the per-frame decoding deadline. The problem,
then, is deciding when to resort to quality degradation and when to resume decoding
after each rest.
Figure. 8 Piecewise decomposition of temperature variation.
3.3.2.1. Thermal Gradient
To find a solution to the aforesaid problems, we model the thermal gradient over
time as three piecewise-linear functions in Figure. 8. We denote the maximum and
minimum of the temperature variation during MPEG-2 frame decoding on a
DTM-ignorant machine as T_max and T_min, respectively. Note that T_max and
T_min are mostly invariant when a program is in a steady state but may vary
slightly when the program behavior, i.e., CPI, abruptly changes. Moreover,
obtaining T_max and T_min is not always possible in the actual system but is
always feasible in the thermal simulator, which is aimed at anticipating an
application’s thermal behavior in microprocessors. In Figure. 8, the thermal
behavior of an MPEG-2 decoder program during decoding is divided into three
regions:
• Super-linear Region (P/C_th >> T_old/(R_th·C_th)): In this region, ΔT_r/Δt
changes dramatically and the power term dominates the temperature term in
equation (1). Since the thermal gradient during the rising of the thermal curve is
higher than its falling counterpart, a longer processor stall time is needed compared
to the time it takes for the temperature to rise to the same level.
• Linear Region (P/C_th > T_old/(R_th·C_th)): In this region, ΔT_r/Δt changes
almost linearly and the power term is relatively larger than the temperature term in
equation (1). The thermal gradients during the rising and the falling of the thermal
curve are comparable, and both take almost the same amount of time.
• Constant Region (P/C_th ≈ T_old/(R_th·C_th)): In this region, ΔT_r/Δt becomes
almost zero and the power term is comparable to the temperature term in equation
(1). Since the thermal gradient during the falling of the thermal curve is higher than
its rising counterpart, a shorter processor stall time is needed compared to the time
it takes for the temperature to rise to the same level.
Conceptually, this region splitting can guide us in determining the desirable
resting period with respect to T_trigger; however, the three regions are not sharply
bounded in the thermal curve. Moreover, T_trigger is a value that depends on
material/architectural parameters, whereas T_max/T_min depend on the MPEG-2
input file; hence, T_trigger can be located at any level within the thermal curve. To
tackle this and build a fail-safe DTM framework for MPEG-2 decoding, we carried
out the following steps (a sketch of the resulting decision logic follows the list):
• Run each MPEG-2 video stream through the MPEG-2 decoder program and obtain both $T_{max}$ and $T_{min}$ on the machine with no DTM policy applied.
• Check the $T_{trigger}$ of the microprocessor. If $T_{trigger} > T_{max}$, the machine is thermally safe in itself and no DTM policy is needed.
• If $T_{trigger} < T_{min}$, the decoding workload is large and the DTM policy must pay significant quality degradation to achieve a thermally safe state.
• If $T_{min} < T_{trigger} < T_{max}$, which we show as the target trigger temperature range in Figure. 8, then: if $T_{trigger}$ lies in the constant region, a thermally safe state can be achieved with little or no quality degradation, whereas if $T_{trigger}$ lies in the linear region, some quality degradation must be accepted to achieve a thermally safe state. Finally, if $T_{trigger}$ lies in the super-linear region, which is the worst case, a thermally safe state can be achieved only at the cost of significant quality degradation.
Based on these steps, we interpret the curves in Figure. 7 as follows: during the residual time in the blue curve, the falling thermal gradient is high in its initial phase and decreases slowly afterwards. Even though a significant amount of time is spent stalling the processor, i.e., doing nothing but waiting for the frame display time to arrive, the corresponding temperature drop is expected to be small. In contrast, the proposed idea, shown by the green curve, stalls the processor only for as long as the falling thermal gradient remains steep.
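To make the interaction of the three regions concrete, the following minimal Python sketch simulates a first-order RC thermal model of the form assumed by equation (1) and interleaves decode bursts with processor stalls whenever the trigger temperature is reached. All constants (thermal resistance and capacitance, power levels, and the resume threshold) are illustrative placeholders, not values from our simulator.

# Minimal sketch of a first-order RC thermal model in the spirit of
# equation (1): dT/dt = P/C_th - T/(R_th * C_th), with T measured
# relative to the ambient. All constants are illustrative placeholders.

R_TH, C_TH = 0.8, 10.0          # thermal resistance (K/W), capacitance (J/K)
P_DECODE, P_STALL = 110.0, 5.0  # power while decoding / while stalled (W)
T_TRIGGER = 81.8                # trigger temperature
DT = 0.001                      # simulation time step (s)

def step(temp, power):
    """Advance the temperature by one time step of the RC model."""
    return temp + (power / C_TH - temp / (R_TH * C_TH)) * DT

temp, stalled = 60.0, False
trace = []
for _ in range(20000):
    # Stall as soon as T_trigger is reached; resume once the steep part of
    # the falling gradient has been exploited (here: a 0.5 degree drop).
    if temp >= T_TRIGGER:
        stalled = True
    elif stalled and temp <= T_TRIGGER - 0.5:
        stalled = False
    temp = step(temp, P_STALL if stalled else P_DECODE)
    trace.append(temp)

print(f"peak temperature: {max(trace):.2f} (capped near {T_TRIGGER})")

Because the decode-phase steady state in this toy model ($P \cdot R_{th} = 88$) exceeds the trigger, the uncontrolled curve would overshoot it; the interleaving caps the peak at roughly $T_{trigger}$ while decoding continues between the stalls.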
3.3.2.2. DTM Policy
From the analysis presented in the previous section, a high falling thermal gradient occurs only when the chip operates in either the linear or the constant region. Through extensive experiments with a set of input files and a $T_{trigger}$ of 81.8℃, we first verified that this $T_{trigger}$ is positioned in the linear region. Based on this, we select an empirical value of 1M cycles for stalling the processor every time $T_{trigger}$ is reached, which gives us comparable rising and falling thermal gradients.
Even with the above strategy, note that we may still miss frame deadlines unless we have enough residual time. To cope with such situations, we set up the following DTM policy: i) to minimize the chance of a deadline miss, we collect the slack time (the given per-frame decoding time minus the actual per-frame decoding time) for future use. This slack-time collection is accomplished by maintaining a buffer in main memory with a capacity of three frames. ii) Every time we finish decoding a frame before the per-frame decoding deadline and the buffer has space, we write the decoded frame to the buffer and save the remaining slack time for future use. When the buffer is full, we wait for space to become available. iii) If we miss, or predict that we will miss, the deadline for the frame being decoded, we resort to either spatial or temporal quality degradation to meet the deadlines. This deadline satisfaction and slack-time collection continue over the whole sequence of frames. Under this scenario, we now explain how we select the type of quality degradation; a sketch of the decision follows the list below.
• Spatial quality degradation: After a sufficient amount of frame decoding time has elapsed, the thermal behavior of the MPEG-2 decoding program becomes monotonous. Hence, when decoding subsequent frames, we can predict how many times the thermal curve will reach $T_{trigger}$ during the next frame's decoding. Since we stall for 1M cycles every time $T_{trigger}$ is reached, the total number of stall cycles can be easily predicted. If the predicted stall cycles exceed the available slack time that we have collected, a deadline miss is expected and spatial quality degradation is triggered.
• Temporal quality degradation: Spatial quality degradation does not by itself guarantee deadline satisfaction, i.e., we may run out of slack time to meet the per-frame decoding deadline. If a deadline miss occurs, we drop the next frame. The rationale is that the current frame has already missed its deadline and will be displayed at the time when the next frame was supposed to be displayed. So, instead of delaying the display of the whole sequence of frames, we drop the next frame so that the frame after it can be decoded on time.
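The per-frame decision above can be condensed into a few lines. This hedged Python sketch assumes the expected number of $T_{trigger}$ hits and the collected slack are tracked elsewhere; the function name and the 10% spatial saving (which anticipates the enhancement-layer figure reported in the next subsection) are illustrative.

# Hedged sketch of the per-frame DTM degradation decision. Inputs are
# in cycles; names and numbers are illustrative placeholders.

STALL_CYCLES = 1_000_000  # stall length per T_trigger hit (1M cycles)

def choose_degradation(predicted_trigger_hits, slack_cycles,
                       deadline_cycles, decode_cycles):
    """Return which quality degradation (if any) the policy applies."""
    predicted_stall = predicted_trigger_hits * STALL_CYCLES
    # i)+ii) slack collected from earlier frames absorbs stall time first.
    if decode_cycles + predicted_stall <= deadline_cycles + slack_cycles:
        return "none"
    # iii) expected deadline miss: first try spatial degradation, assumed
    # here to save roughly 10% of the decoding work.
    if 0.9 * decode_cycles + predicted_stall <= deadline_cycles + slack_cycles:
        return "spatial"
    return "temporal"  # drop the next B frame

print(choose_degradation(8, 2_000_000, 16_600_000, 14_000_000))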
3.3.3. Spatiotemporal Quality Degradation
In our definition, spatial quality degradation is the degree to which a modified frame differs from the original frame, measured by the Root Mean Square Error (RMSE), whereas temporal quality degradation is the degree to which the modified video stream differs from the original video stream in terms of the number of skipped/dropped frames. Clearly, spatial quality degradation is an intra-frame image distortion whereas temporal quality degradation is an inter-frame video distortion. Moreover, temporal quality degradation obviously saves more time.
3.3.3.1. Spatial Quality Degradation
To find the decoding steps that sacrifice image quality the least, we analyze the typical MPEG-2 decoding sequence shown in Figure. 9. Frame decoding in MPEG-2 has several major steps: Variable Length Decoding (VLD), Inverse Quantization (IQ), Motion Compensation (MC), Inverse Discrete Cosine Transformation (IDCT), dithering, display, etc.
Figure. 9 Typical MPEG-2 decoding steps.
Among these steps, we focus on the SNR scalability and the saturation control. MPEG-2 has two layers: a base layer and an enhancement layer. The base layer contains the coarse-level DCT coefficients and the enhancement layer contains the finer-level DCT coefficients. The saturation control, in contrast, clips the results of IQ. These two Fine Granularity Scalability (FGS) techniques in MPEG-2 decoding were originally introduced to cope with time-varying bandwidth through smooth image quality degradation; our experiments show that these two steps consume approximately 10% of the per-frame decoding time. Moreover, they are relatively easy to separate out from the MPEG-2 decoding steps.
3.3.3.2. Temporal Quality Degradation
Though our definition of temporal quality degradation is simply dropping frames, not all frames can be dropped: if a P frame is dropped, then all subsequent P frames must be dropped until the next I frame, whereas a B frame can be dropped arbitrarily since no other frame depends on it. Hence, our temporal quality degradation drops only B frames, as sketched below. If the next frame to be dropped is not a B frame, we keep decoding I and P frames until we reach the next B frame.
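A minimal sketch of this drop rule follows; the frame types are given as a simple string and the helper name is ours, for illustration only.

# Sketch of the B-frame-only drop rule: after a deadline miss, skip the
# next B frame; I and P frames are always decoded because later frames
# depend on them.

def frames_to_decode(frame_types, miss_positions):
    """Yield indices of frames actually decoded, dropping one B frame
    after each deadline miss."""
    pending_drop = False
    for i, ftype in enumerate(frame_types):
        if i in miss_positions:
            pending_drop = True
        if pending_drop and ftype == "B":
            pending_drop = False   # drop exactly one B frame
            continue
        yield i

gop = "IBBPBBPBBPBBPBB"
print(list(frames_to_decode(gop, miss_positions={3})))
# index 4 (the first B frame after the miss) is skipped; I/P frames are kept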
3.3.4. Simulation and Results
For our experiments, we modify and combine SimpleScalar [75], Wattch [3], and HotSpot. The simulated microprocessor model is based on the Alpha 21364, which has a feature size of 0.18µm, a $V_{dd}$ of 1.6V, and a clock speed of 1GHz. The power model in the simulation does not model leakage. To avoid modifying the default floor-plan in HotSpot, we use the same feature size and linearly scale $V_{dd}$ to 1.8V and the clock speed to 1.2GHz. $T_{trigger}$ and $T_{critical}$ are set to 81.8℃ and 85.0℃, respectively, whereas the ambient and initial temperatures are set to 40.0℃ and 60.0℃, respectively. The combined simulator generates thermal results for each functional unit every 10K cycles.
TABLE 1 MPEG-2 video files used in the experiments.
MPEG-2 video file | No. of frames | Frame resolution | I:P:B frame distribution
gitape | 14 | 720 x 480 | 1:4:9
mei60f | 50 | 704 x 480 | 5:13:32
hhilong | 45 | 720 x 576 | 3:8:34
time_015 | 50 | 704 x 480 | 5:12:33
soccer_015 | 51 | 640 x 480 | 4:14:33
tens_015 | 47 | 352 x 192 | 5:12:30
cact_015 | 50 | 352 x 192 | 5:12:33
For the application programs, we use the MPEG-2 decoder program from the MediaBench benchmark suite. TABLE 1 summarizes the MPEG-2 video files used in the experiments, which are mostly obtained from [76]. We additionally used a few custom-made files for better comparison. The total number of frames in each video file is limited to 51. Considering that a 0.1℃ rise/fall of temperature may take 100K cycles, as reported in [54], decoding this many frames takes enough processor cycles to reflect the large-scale thermal variation in each program. Our DTM technique is implemented in the MPEG-2 decoder program such that the MPEG-2 decoding steps interact with our thermal simulator, i.e., the start, stop, and resume of the decoder program are controlled by the thermal simulator.
TABLE 2 summarizes the experimental results. The left column shows the decoding time actually measured on the platform, while the right two columns compare the maximum/minimum temperatures with and without the DTM technique. As shown, the maximum temperatures on the DTM-ignorant machine are high enough to incur a thermal crisis in some cases, whereas the temperatures on the DTM machine are kept below $T_{trigger}$. The N/A entries mean that no DTM technique is necessary for those input files.
TABLE 2 Thermal behaviors in the hottest functional unit
MPEG-2 video file | Actual decoding time (msec) | Max/Min temperatures (℃), DTM-ignorant | Max/Min temperatures (℃), proposed DTM
gitape | 21.5 | 101.5 / 85.5 | 81.8 / 80.5
mei60f | 19.6 | 99.6 / 83.8 | 81.8 / 80.5
hhilong | 17.2 | 97.2 / 81.9 | 81.8 / 80.5
time_015 | 11.8 | 91.5 / 76.2 | 81.8 / 80.5
soccer_015 | 8.5 | 82.5 / 70.5 | 81.8 / 72.4
tens_015 | 4.0 | 73.4 / 63.2 | N/A
cact_015 | 4.0 | 73.4 / 64.1 | N/A
Note that the maximum/minimum temperatures for input files with similar resolutions are roughly equal, since they have approximately the same decoding workload. As the resolution becomes smaller, both the maximum and minimum temperatures decrease.
Note also that the minimum temperatures for 'time_015' and 'soccer_015' increase while their maximum temperatures decrease, implying that our DTM technique effectively distributes the thermal stress of the actual decoding periods across the residual periods. Moreover, the maximum/minimum temperatures for the higher-resolution files are bounded due to the periodic processor stall periods.
TABLE 3 Spatial/Temporal quality degradation
MPEG-2 video file | Scaled frames (spatial) | RMSE (spatial) | Dropped frames (temporal) | Frame drop ratio (%) (temporal)
gitape | 5 | 0.119 | 5 | 35.7
mei60f | 8 | 0.125 | 15 | 30.0
hhilong | 0 | N/A | 8 | 8.8
time_015 | 0 | N/A | 0 | 0
soccer_015 | 0 | N/A | 0 | 0
tens_015 | 0 | N/A | 0 | 0
cact_015 | 0 | N/A | 0 | 0
TABLE 3 summarizes the image/video quality degradation in the experiments. For the spatial quality degradation, the RMSE of the average luminance (Y) value of every frame is used. Note that the RMSE is calculated not for all frames but only for the spatially scaled frames. For the temporal quality degradation, we show the number of dropped frames and the frame drop ratio. As shown, when the frame resolution decreases, the number of dropped frames decreases, which implies that we have a sufficient amount of residual time. In contrast, files with high resolution have a relatively high frame drop ratio.
Figure. 10 shows the run-time thermal behavior during part of the simulation. For simplicity, we use three MPEG-2 video files, each with a different resolution: 'gitape', 'soccer_015', and 'tens_015'. Each point on the X-axis corresponds to a measurement step of 100K cycles, the Y-axis corresponds to the temperature in Celsius, and measurements are made once the programs reach a thermally steady state. We categorize the MPEG-2 video files into three groups based on their workload: files with heavy workload, e.g., 'gitape', exercise the DTM technique aggressively (top curve); files with medium workload, e.g., 'soccer_015', exercise the DTM technique non-aggressively (middle curve); and files with light workload, e.g., 'tens_015', do not need the DTM technique (bottom curve). Clearly, the actual decoding time (and, in turn, the residual time) is strongly related to the degree to which the DTM technique is needed, and our results show that we achieve a thermally safe state of operation in all three categories.
Figure. 10 Comparison of the thermal variations in different programs.
3.4. Proposed Thermal Management II
In this section, we present an activity-migration-based DTM technique motivated by the observation that the Register File (RF) is not fully utilized over a program's execution, i.e., the lifetimes of registers/operands are short enough that only a rather small number of physical registers needs to be active. Therefore, introducing two equal-sized banks in the physical RF (one active bank and one sleep bank) and using these two banks in an alternating manner can be quite effective for reducing temperature. This idea is, to a certain extent, similar to what the authors proposed in [24], where a redundant RF structure was introduced; in our technique, by contrast, we divide the existing RF structure while exploiting the program's RF utilization. The distinctions of the proposed idea are:
• Area: The proposed technique does not introduce any redundant functional block. More specifically, compared with [24], the area overhead is negligible.
• Performance: The proposed technique does not incur processor-wide performance degradation, which would mainly manifest as high IPC degradation.
3.4.1. Motivation
Many 32-bit Instruction Set Architectures (ISAs) are designed with 32 architectural registers, but modern superscalar processors have more than 32 physical registers. This discrepancy is elegantly handled by register renaming, which assigns architectural registers to physical registers while accounting for the data/control dependencies among instructions. Register renaming is necessary to resolve these dependencies; however, not all physical registers are in use at all times. Tran et al. [64] showed that physical register usage in their simulations is in the range of 40% to 60%. This low utilization stems mainly from the dependencies among instructions in the instruction window.
Figure. 11 Exemplary physical RF utilization.
Figure. 11 shows an exemplary utilization of physical registers. (The simulation methodology will be explained later.) Here the X-axis represents the number of physical registers as a percentage of the original RF size (64), whereas the Y-axis represents the percentage of total execution time. For example, in 'mcf', 25% of the physical registers (16) are actually in use during 42% of the execution time. Notice that for 90% of the execution time, on average, fewer than half of the physical registers (32) are actually allocated. For 'gzip' in particular, 50% utilization covers almost 95% of the execution time.
Figure. 12 shows the performance penalty with respect to the number of registers used, compared to the default number of registers (64). Notice that for 'djpeg', even though more than 32 registers are in use for 25% of the execution time, the respective performance penalty is merely 3%. The reason is as follows: even though a new instruction is dispatched and allocated a physical register, the instruction cannot necessarily be issued and executed right away, due to data dependencies among instructions. Hence, if this instruction is dispatched a cycle or two later, it may not result in a performance penalty.
Figure. 12 Performance penalty with a half-sized RF bank.
3.4.2. Proposed Idea
Based on the above observation, we propose to divide the RF into two equal-sized banks and use only one bank at a time; at any point during the execution of a program, only one bank is active, and a switching criterion determines when to enable/disable the sleep/active bank. We designate the active bank as the 'primary' and the other one as the 'secondary'. Registers are allocated initially from the primary bank, and if and only if the primary bank is full, the allocation is done from the secondary bank. Since only a small number of physical registers is used during most of the execution time of most programs, the duration for which the secondary bank is in use is relatively small; hence the temperature in the secondary bank rapidly decreases.
When bank switching occurs, there might still be some references to the old sleep bank. However, these pending references are few compared to the number of references to the new primary bank over the switching period of 10M cycles, since all new registers are allocated in the primary bank; hence, their contribution to the temperature of the secondary bank is negligible.
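The allocation rule can be sketched as follows, assuming 2x32 banks and a fixed switching interval; the class and its methods are illustrative placeholders, not our simulator's actual data structures.

# Sketch of the two-bank allocation rule: physical registers come from
# the primary bank first; the secondary bank is touched only when the
# primary is full. Bank roles swap every SWITCH_PERIOD cycles.

BANK_SIZE = 32
SWITCH_PERIOD = 10_000_000  # bank-switching interval in cycles

class BankedRF:
    def __init__(self):
        self.free = [set(range(BANK_SIZE)),
                     set(range(BANK_SIZE, 2 * BANK_SIZE))]
        self.primary = 0

    def maybe_switch(self, cycle):
        if cycle > 0 and cycle % SWITCH_PERIOD == 0:
            self.primary ^= 1  # swap primary/secondary roles

    def alloc(self):
        # Allocate from the primary bank; spill to the secondary bank
        # if and only if the primary bank is full.
        for bank in (self.primary, self.primary ^ 1):
            if self.free[bank]:
                return self.free[bank].pop()
        raise RuntimeError("no free physical register")

    def release(self, reg):
        self.free[reg // BANK_SIZE].add(reg)

rf = BankedRF()
print([rf.alloc() for _ in range(4)])  # all four come from the primary bank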
3.4.3. Why Bank Switching Works
When bank switching occurs, the temperature in the primary bank changes according to equation (3) while the temperature in the secondary bank changes according to equation (4). Strictly speaking, the actual falling thermal gradient in the secondary bank is smaller than equation (4) predicts, since some of the registers previously mapped to this bank remain alive and are accessed for some cycles after the bank switch, while the actual rising thermal gradient in the primary bank is smaller than equation (3) predicts. However, the overall bank-switching idea holds as long as the falling thermal gradient is larger than the rising thermal gradient at a given temperature, and it can then effectively reduce the worst-case temperature. The temperature level that determines the profitability of active bank switching is obtained by solving the following equation:

$$ \frac{T_{old}}{R_{th}C_{th}} - \left( \frac{P}{C_{th}} - \frac{T_{old}}{R_{th}C_{th}} \right) = 0 \;\Rightarrow\; T_{old} = \frac{1}{2}\, P \cdot R_{th} \qquad (5) $$
According to this equation, it is better to switch banks whenever $T_{old}$ is larger than the right-hand term at any point during the program run. Note that the thermal resistance is fixed but the power varies, implying that bank switching may be effective at many temperature levels. Furthermore, if the program is in steady state and $T_{ss}$ is above the break-even temperature $T_{be}$ (given by $T_{old}$ in equation (5)), then it is always beneficial to switch between banks at regular intervals. This lowers $T_{ss}$ to $T'_{ss}$, where $T'_{ss} < T_{ss}$. Since the $T_{ss}$ of most of our applications lies above $T_{be}$, we have opted to switch banks at a regular interval to bring down $T_{ss}$.
Figure. 13 Pentium IV floor-plan used in the experiment.
3.4.4. Simulation and Results
For the experiments, we combine SimpleScalar, Wattch, and HotSpot. The combined simulator generates temperature data every 50K cycles, and the initial/ambient temperatures are set to 60℃/45℃, respectively. For the floor-plan in the thermal simulation, we obtain a 2.6GHz Pentium IV 130nm floor-plan [80], estimate the area of each functional unit, and use this information in the simulation.
Figure. 14 Detailed floor-plan for the proposed RF structure.
Note that we need more detailed information about the RF for the banked structure. Figure. 14 (a) shows the 'integer execution core' of the annotated die photo obtained from [80]. As shown, the actual RF area in the original floor-plan is small, roughly half of the nominal RF block size. Hence, we halve the original RF area to match our floor-plan with the die photo. We position this half-sized RF in the center of the RF area while the surrounding area is kept void (cf. Figure. 14 (b)). Then, we halve this area again (cf. Figure. 14 (c)), since the original RF area corresponds to a size of 128 whereas our new RF size is 64. For the banked RF, we divide the RF area into two halves (cf. Figure. 14 (d)).
For the applications, we used SPEC2000INT benchmarks [81] with reference/train input files, MediaBench programs, and the MPEG-2 decoder program. The input files for MediaBench are custom made, and the input file for the MPEG-2 decoder program is obtained from [76]. TABLE 4 lists all the programs. Each program is compiled with the PISA compiler using the default optimization option, i.e., '-O3'.
Our simulation procedure is as follows. During the first 200K cycles of each benchmark program run, we obtain the typical power figure for the RF (along with the other FUs). Next, we use this power figure to mimic the thermal simulation without actually simulating the application, by continuously feeding this power figure to each of the FUs. In this way, thermal simulation time is saved and we reach $T_{ss}$ quickly. Once $T_{ss}$ is found, we resume the actual thermal simulation.
TABLE 4 $T_{ss}$ and IPC
Program | $T_{ss}$ (℃), monolithic RF (64) | $T_{ss}$ (℃), banked RF (2x32) | Thermal reduction (℃) | IPC
mcf | 68.0 | 66.7 | 1.3 | 0.7707
gcc | 76.5 | 73.7 | 2.8 | 1.2748
bzip | 78.2 | 75.0 | 3.2 | 1.5022
gzip | 81.7 | 77.5 | 4.2 | 2.1069
cjpeg | 83.0 | 79.0 | 4.0 | 2.2553
mpeg2dec | 82.0 | 77.7 | 4.3 | 2.2729
djpeg | 82.0 | 77.0 | 5.0 | 2.3825
First, we ran each application with a monolithic physical RF of size 64 and recorded its $T_{ss}$. Then, we ran each application with a banked RF using the bank-switching idea. In the banked RF, the total number of physical registers is still 64, but they are divided into two equal-sized banks. TABLE 4 shows the difference in $T_{ss}$ between the monolithic and the banked RF, which is the temperature reduction achieved by the bank-switching technique. On average, our bank-switching technique achieves a 3.4℃ reduction in $T_{ss}$. Note that as a program's workload increases, its $T_{ss}$ also increases, and hence the bank-switching idea becomes more effective.
Figure. 15 Partial thermal behaviors in gcc.
Figure. 15 shows part of the thermal behavior of the 'gcc' program. Each point on the X-axis corresponds to a measurement step of 10M cycles and the Y-axis corresponds to the temperature in Celsius. The upper curve corresponds to the thermal behavior of the monolithic RF, and the two lower curves correspond to the thermal behavior of each bank in the banked RF. Compared to the upper curve, note that the program's overall thermal behavior is preserved in the lower curves and that periodic bank switching is visible between the two lower curves. Note also that the two lower curves lie almost on top of each other with very small thermal differences, which implies that our bank-switching period is well chosen to minimize the temperature gap between the two banks.
TABLE 5 shows the RF utilization in terms of the percentage of total execution cycles spent using at most a quarter, a half, and three quarters of the RF, respectively. The performance penalties are reported with respect to the half-sized RF (32). As shown, the performance penalty is negligible since the utilization of the RF is low, as explained in the motivation.
TABLE 5 RF utilization and performance
Program | RF utilization (%) at 25% | at 50% | at 75% | Performance penalty (%)
mcf | 42 | 92 | 96 | 0
gcc | 43 | 86 | 98 | 0.16
bzip | 54 | 93 | 99 | 0
djpeg | 20 | 95 | 99 | 0.47
cjpeg | 32 | 75 | 90 | 1.25
mpeg2dec | 38 | 91 | 99 | 0.69
gzip | 32 | 75 | 90 | 2.68
3.5. Proposed Thermal Management III
In this section, we present a DTM technique based on the per-frame decoding deadline, slack-time estimation, and slack distribution within the GOP to achieve a thermally safe state of operation in microprocessors during MPEG-2 decoding. Basically, DVFS is applied with consideration of both the per-frame decoding deadline and the temperature constraints within the GOP; when DVFS is predicted to fail to meet the performance constraints, either (intra-frame) spatial image quality degradation or (inter-frame) temporal image quality degradation is employed to make up for the expected performance loss. The distinctions and contributions of the proposed DTM technique are:
• Preventive nature: Unlike the typical temperature-driven, reactive nature of most DTM techniques, the proposed DTM technique is preventive in that it estimates the degree of thermal crisis in the near future and correspondingly applies DVFS and allows some quality degradation progressively.
• Spatiotemporal quality degradation: Unlike the typically malignant performance degradation of most DTM techniques, the proposed DTM technique pays a non-malignant performance cost (one that does not violate the frame decoding deadline) in the form of spatiotemporal quality degradation.
• $T_{ss}$ driven/guided: Unlike typical $T_{trigger}$-driven DTM techniques, the estimation of $T_{ss}$ guides the degree of the DTM response in advance.
3.5.1. Motivation
One descriptor in MPEG is the frame type, of which MPEG has three: I, P, and B frames. An I (intra) frame, which is encoded independently of other frame types, has a nearly fixed workload and thus takes an almost constant decoding time. In contrast, P (predictive) and B (bi-directional) frames have variable workloads due to the temporal motion variation in the frame sequence, resulting in different degrees of compression; hence, they take variable decoding times.
Another descriptor in MPEG is the GOP, which is defined as a repeated pattern of I, P, and B frames. Basically, the GOP breaks the inter-dependencies of frame sequences in that a frame in a given GOP rarely depends on any frame in another GOP. Furthermore, although the I-P-B pattern of the GOPs in an encoded MPEG stream is not necessarily fixed, in practice this pattern has a fixed format [70], which is determined at encoding time [69]. In our view, the GOP is the appropriate level of granularity for the DTM task because i) in an MPEG stream, although the decoding workloads of the P and B frame types vary widely (cf. Figure. 16), the workload variation across different GOPs is rather small (cf. Figure. 17), and ii) when applying DVFS to control the microprocessor chip temperature, relying on the GOP minimizes the changes in voltage and frequency settings (which have some overhead) and at the same time allows us to perform slack-time borrowing across the frames in a GOP, so as to minimize the inter-frame jitter that would arise from greedy and hurried DVFS in response to the workload of the current frame.
Figure. 16 reports the B-frame decoding time variation in an MPEG-2 video stream decoded with the MediaBench MPEG-2 decoder program. We ran the decoder on an Intel Pentium IV 2.8GHz platform and report the percentile change in each B-frame's decoding workload with respect to the previous B-frame. For this figure, data for a sequence of six GOPs are selected from an MPEG stream with a 720x416 resolution. (The simulation methodology will be explained later.) As can be seen, the B-frame decoding time fluctuates by a large amount over the frame sequence, which in turn hinders accurate workload prediction on a per-frame basis.
Figure. 16 Workload variation across consecutive B-frames.
In contrast, Figure. 17 shows the per-frame decoding time of each frame within the same six GOPs. The X-axis, which depicts the invariant GOP structure in this MPEG stream, has the frame sequence 'IBBPBBPBBPBBPBB'. As shown, per-frame decoding times within each GOP vary; however, frames that appear at the same position across all six GOPs take nearly constant decoding times. We can thus conclude that, although per-frame decoding times within a GOP may vary significantly, the decoding time of each frame position in a GOP can be estimated once the first few GOPs have been analyzed and the workloads of their I, P, and B frames recorded; the sketch after Figure. 17 illustrates this per-position estimation. Therefore, GOP-aware workload prediction for frames outperforms conventional workload prediction (which is typically based on regression estimation of the workload for each frame type, independent of GOP boundaries). Once the DVFS policy for the frames in a GOP is determined, the same policy remains in effect for all subsequent GOPs until the workload changes significantly, at which point we estimate the workload again to derive and set a new GOP-level DVFS policy.
Figure. 17 Decoding time across consecutive GOPs.
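A hedged sketch of this per-position workload estimator follows; the function and the example workloads are illustrative, not measured values.

# Sketch of GOP-aware workload prediction: the decoding time of a frame
# is estimated from frames at the same position in previous GOPs, rather
# than from the previous frame of the same type.

def gop_position_estimates(per_frame_cycles, gop_len):
    """Average observed decoding cycles per GOP position."""
    sums = [0.0] * gop_len
    counts = [0] * gop_len
    for i, cycles in enumerate(per_frame_cycles):
        pos = i % gop_len
        sums[pos] += cycles
        counts[pos] += 1
    return [s / c for s, c in zip(sums, counts)]

# Two observed GOPs of 'IBBPBB...' workloads (in Mcycles), made-up numbers:
observed = [20, 8, 9, 14, 8, 10, 15, 9, 8, 14, 9, 9, 15, 8, 9,
            21, 9, 8, 15, 9, 9, 14, 8, 9, 15, 8, 8, 14, 9, 10]
print(gop_position_estimates(observed, gop_len=15))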
3.5.2. Steady-State Temperature Calculation
As mentioned earlier, we assume that the large-scale $P_i$'s in a program run are more or less constant. Then, the constant power term and the current temperature term (i.e., $T_{old}$) in equation (1) determine both $\Delta T$ and $T_{new}$ at the end of the time period $\Delta t$. $T_{new}$ is then used as the new $T_{old}$, and the same calculation repeats for the next time period $\Delta t$. Eventually, after a certain number of such time periods, $T_{new}$ arrives at $T_{ss}$, where the thermal gradient satisfies $\Delta T/\Delta t \approx 0$. Hence, the calculation of equation (1) repeats until this condition is satisfied, at which point we have arrived at $T_{ss}$. This flow is shown in equation (6):
$$ T_{ss} = T_{initial} + \sum_{k=1}^{n^*} \left( \frac{P}{C_{th}} - \frac{T_k}{R_{th} C_{th}} \right) \cdot \Delta t, \quad \text{where } n^* \text{ is the minimum index for which } \Delta T / \Delta t \approx 0 \qquad (6) $$
The number of iterations, $n^*$, determines $t_{settle} = n^* \cdot \Delta t$. In fact, there is a more direct way of calculating $T_{ss}$, as follows:
$$ \frac{\Delta T}{\Delta t} = 0 \;\Rightarrow\; \frac{P}{C_{th}} - \frac{T_{ss}}{R_{th} C_{th}} = 0 \;\Rightarrow\; T_{ss} = P \cdot R_{th} \qquad (7) $$
If the calculated $T_{ss}$ is below $T_{critical}$, the program is naturally thermally safe and no DTM technique is needed. If the calculated $T_{ss}$ is above $T_{critical}$, the temperature gap between $T_{ss}$ and $T_{critical}$ sets the degree of DTM, i.e., how aggressively we need to slow down the microprocessor. Notice that $R_{th}$ is constant for a given microprocessor chip, so the only parameter we can adjust to control $T_{ss}$ is the power term, $P$.
3.5.3. Determination of Voltage and Frequency Level
Once the $T_{ss}$ of a program, corresponding to the program workload and maximum power value, is obtained and found to be above $T_{critical}$, we must scale down the voltage and frequency to reduce the power and, in turn, $T_{ss}$. We start from a simple equation for the power dissipation per instruction executed on a CPU:
$$ P = \alpha \cdot C \cdot V^{2} \cdot f \qquad (8) $$
where $C$ is the total capacitance of the CPU obtained by summing all the gate input capacitances, $V$ is the operating voltage, $f$ is the clock frequency, and $\alpha$ is a switching activity factor defined as the expected number of transitions per gate per instruction. Notice that $\alpha$ depends on the CPI of the target application program. This power equation is used in the remainder of this dissertation only to show that, as long as the application program running on the CPU is unchanged, the power dissipation depends only on the product of $V^{2}$ and $f$. Clearly, there are more accurate CPU power macro-models that can capture the effects of program behavior (inter-instruction dependencies, cache access patterns, etc.), but we make this assumption for simplicity.
The previously used $P_i$ is calculated under the processor's maximum voltage and frequency, so the resulting thermal curve reaches $T_{ss}$. With DVFS, different voltage and frequency levels generate power dissipation values smaller than $P_i$. Plugging these new power terms into equation (1), we obtain a different $T_{ss}$ value for each of these power terms. Among the voltage and frequency levels generating these different $T_{ss}$ values, the highest level that satisfies $T_{ss} < T_{critical}$ is the best choice, since it results in the minimum performance degradation under the given temperature limit. Formally:
$$ \text{Available frequency levels: } f_{min} = f_1 < f_2 < \dots < f_{n-1} < f_n = f_{max} $$
$$ \text{Available voltage levels: } V_{min} = V_1 < V_2 < \dots < V_{n-1} < V_n = V_{max} $$
$$ \forall i \in \{1,\dots,n\}: \; P_i = \alpha \cdot C \cdot V_i^{2} \cdot f_i \;\Rightarrow\; T_{ss}^{f_i} = P_i \cdot R_{th} = \alpha \cdot C \cdot V_i^{2} \cdot f_i \cdot R_{th} \qquad (9) $$
Due to workload differences among application programs, the $T_{ss}$ of each program at $(V_{max}, f_{max})$ is different. Therefore, each program requires a different voltage and frequency level to bring its $T_{ss}$ below $T_{critical}$. Figure. 18 provides a pictorial example of this point, reusing the thermal curves of programs #3 and #5 from Figure. 3. As shown in bold lines, frequency levels $f_1$ and $f_5$ are the best selections for programs #5 and #3, respectively.
Notice from the plot that the higher a program's workload is, the more aggressive the DVFS has to be. We define the competitiveness of a DTM strategy, denoted by its 'c-factor', as the degree to which the DTM employs DVFS to control the temperature rise/fall of the CPU. A higher 'c-factor' means more aggressive thermal management. In particular, we set the 'c-factor' to 1 when the minimum voltage and frequency setting must be applied to bring the temperature below $T_{critical}$, and to 0 when no DVFS has to be applied to keep the CPU thermally safe.
Figure. 18 Example showing the DTM 'c-factor' dependence on $T_{ss}$.
3.5.4. Thermal Management Policy
Here we define the terms used in the proposed DTM technique.
N: Number of frames per GOP.
W: Workload of a frame, in terms of the number of CPU cycles needed to finish decoding the frame.
f: Operating frequency, chosen from a discrete set of frequencies in the range $f_{MIN}$ to $f_{MAX}$.
t: Per-frame decoding time, calculated as:

$$ t = \frac{W}{f} \qquad (10) $$

D: Per-frame decoding deadline (determined by the frame rate).
$\Delta t$: Per-frame slack time, calculated as:

$$ \Delta t = D - t \qquad (11) $$

$t_{slack}$: Per-GOP slack time, the sum of the $\Delta t$'s within a GOP, i.e.,

$$ t_{slack} = \sum_{1}^{N} \Delta t \qquad (12) $$

δ: Pre-determined time saving obtained by applying spatial quality degradation.
$T_{limit}$: User-defined temperature limit that guides the DTM policy.
3.5.5. Problem Formulation and Off-line Solution
The problem of assigning a frequency to each frame, while meeting both the decoding deadline and the temperature constraints, can be formulated as follows:

$$ \text{Minimize } \sum_{i=1}^{N} \left( D - \frac{W_i}{f_i} \right)^{2} \quad \text{such that} \quad \sum_{i=1}^{N} \frac{W_i}{f_i} \leq N \cdot D \;\text{ and }\; T_{ss}^{f_i} < T_{limit} \qquad (13) $$
This objective function represents the variance of the frame decoding times, known as 'jitter' in multimedia programs. First, we must ensure that the $T_{ss}$ under our frequency assignment is less than $T_{limit}$. By assigning different voltage and frequency settings to every frame in a GOP, we try to complete the entire frame-decoding workload of the GOP within the given deadline, $D_{GOP} = N \cdot D$.
Unfortunately, only a discrete and finite number of frequencies are available in practice. Moreover, a severe thermal constraint may prohibit any frequency from satisfying the deadline constraint of the GOP. Consequently, it is not always possible to find frequency assignments that meet both of the aforesaid constraints. In that case, we must choose frequencies that violate the per-frame deadline (performance) constraint but still meet the temperature constraint, and subsequently compensate for the performance loss by applying spatiotemporal quality degradation. Because of this limitation, and because of the high computational complexity of solving the formulated problem (13) online (e.g., with an ILP solver), we have developed an online heuristic algorithm.
Figure. 19 Deadline and thermal constraints in frame decoding.
3.5.6. On-line Solution
The DTM policy presented in this section performs energy-efficient frequency (and, correspondingly, voltage) assignment while strictly meeting the thermal constraints. If the performance constraint cannot be met by frequency scaling, we resort to quality degradation. Before explaining the proposed DTM solution, we examine the relationship between the frequency-dependent deadline and the thermal constraint for a frame in Figure. 19. Although we actually scale both voltage and frequency, only the frequency term is shown in this figure. When the available frequency range is $f_{MIN}$ to $f_{MAX}$, $f^{min}_{deadline}$ is the minimum frequency that meets the per-frame decoding deadline constraint, while $f^{max}_{thermal}$ is the maximum frequency that does not violate the thermal constraint, i.e., $T_{ss} < T_{limit}$. The dotted lines in Figure. 19 indicate that these two limits differ for every frame within the GOP. Basically, the proposed DTM policy decides the degree of thermal management based on the relative positions of these two parameters for each frame. The frame types categorized by these two parameters are:
• Frame type I (non-negative slack generator: $f^{min}_{deadline} \leq f^{max}_{thermal}$): For this type of frame, multiple frequencies meet both the temperature and performance constraints. Among them, we initially select $f^{min}_{deadline}$, but we have the freedom to pick any frequency up to (and including) $f^{max}_{thermal}$, and we use these choices when we need to compensate for performance degradation caused by the frequency assignments of other frames in the GOP. The initial selection of $f^{min}_{deadline}$ and any subsequent frequency increase is energy efficient, since we always try to use the minimum frequency (and corresponding voltage) that meets the performance constraint. In the flow chart (cf. Figure. 20), the 'AFS' (Available Frequency Set) corresponds to the set of frequencies equal to or higher than $f^{min}_{deadline}$ but less than or equal to $f^{max}_{thermal}$ for this type of frame. Since each frame has a different workload, each frame has a different 'AFS' range.
• Frame type II (negative slack generator: $f^{max}_{thermal} < f^{min}_{deadline}$): For this type of frame, no frequency meets both constraints. Per the flow chart, we initially select $f^{min}_{deadline}$ for this type of frame; however, we eventually switch to $f^{max}_{thermal}$ to meet the thermal constraint. Such a frame therefore incurs a frame-deadline violation, but not necessarily a GOP-deadline violation. If the latter occurs (for example, when a GOP has many type II frames and few type I frames), we must sacrifice video quality by applying spatial/temporal quality degradation.
These two types of frames co-exist in each GOP and together generate a $t_{slack}$ for the whole GOP, which in turn determines the aggressiveness, i.e., the 'c-factor', of the DTM strategy and the degree of spatial and temporal quality degradation in each GOP. Figure. 20 shows the proposed GOP-level DTM policy as a flow chart; we employ DVFS, spatial quality degradation, and temporal quality degradation, in that order. Note that the policy depicted in Figure. 20 takes the exact workload of each frame in a GOP as input and outputs the frequency/voltage and spatial/temporal quality-degradation assignments for each frame in the GOP. As long as the workload of the GOP remains relatively unchanged, the same voltage/frequency assignments remain in effect. However, if the GOP changes (more precisely, if the GOP workload changes by more than 5% with respect to the workload of the last GOP on which the GOP-level policy was based), we re-compute the voltage/frequency assignments by re-running the GOP-level DTM policy of Figure. 20; a condensed sketch of this policy follows the figure.
Figure. 20 Online algorithm for computing GOP-level DTM policy.
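The following Python sketch condenses the flow of Figure. 20 under simplifying assumptions: a linear placeholder model stands in for the $T_{ss}$ prediction, the spatial saving is folded into a single δ fraction, and temporal frame dropping is omitted. All names and numbers are illustrative.

# Hedged sketch of the GOP-level policy: per frame, find the minimum
# frequency meeting the deadline and the maximum frequency meeting the
# thermal constraint, classify the frame, then settle GOP-wide slack.

FREQS = [6.0e8, 7.0e8, 8.0e8, 9.0e8, 1.0e9, 1.1e9, 1.2e9]
D = 1.0 / 60.0                     # per-frame deadline (60 frames/s)
DELTA = 0.10                       # fraction saved by spatial degradation

def t_ss(freq):                    # placeholder thermal model
    return 40.0 + freq / 2.0e7

def assign(workloads, t_limit):
    f_therm = max((f for f in FREQS if t_ss(f) < t_limit), default=None)
    plan, slack = [], 0.0
    for w in workloads:
        f_dead = next((f for f in FREQS if w / f <= D), FREQS[-1])
        f = f_dead if f_therm is None or f_dead <= f_therm else f_therm
        plan.append(f)
        slack += D - w / f         # negative for type II frames
    degraded = False
    if slack < 0:                  # GOP deadline missed: degrade quality
        degraded = True
        slack += DELTA * sum(w / f for w, f in zip(workloads, plan))
    return plan, slack, degraded

work = [1.6e7, 1.2e7, 1.2e7, 1.4e7, 1.2e7, 1.2e7]  # cycles per frame, made up
print(assign(work, t_limit=90.0))

In this toy run, the first frame is type II (its deadline frequency of 1GHz is thermally disallowed, so it runs at 900MHz and generates negative slack), but the type I frames supply enough positive slack that the GOP deadline is still met without quality degradation.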
3.5.7. Simulation and Results
For the experiments, we integrated SimpleScalar, Wattch, and HotSpot. The simulated microprocessor model is based on the Alpha 21364, which has a minimum feature size of 0.18µm, a nominal supply voltage of 1.6V, and a nominal clock speed of 1GHz. The Wattch power model does not account for leakage power. The simulated supply voltages range from 0.8V to 1.8V in steps of 0.2V, and the clock frequencies range from 600MHz to 1.2GHz in steps of 100MHz. Both the ambient and initial temperatures were set to 40.0℃, while we experimented with different maximum allowed temperature limits (denoted $T_{limit}$) to show the effectiveness of the proposed DTM technique. The combined thermal simulator generates temperature results for each functional unit every 10K cycles, and our experimental results are based on the temperatures of the hottest functional unit.
For the application programs, we use the MPEG-2 decoder program of the MediaBench benchmark suite with a frame rate of 60 frames/sec. Several custom-made MPEG-2 video files are used; TABLE 6 summarizes the files used in the experiments.
TABLE 6 MPEG-2 video files used in the experiment.
Files | No. of frames | Frame resolution | GOP structure
mei60f | 48 | 704 x 480 | IBBPBBPBBPBB
hhilong | 36 | 720 x 576 | IBBBPBBBPBBBPBBB
Jurassic Park | 105 | 720 x 416 | IBBPBBPBBPBBPBB
time_015 | 36 | 704 x 480 | IBBPBBPBBPBB
Dead poet's Society | 435 | 640 x 352 | IBBPBBPBBPBBPBB
Finding Nemo | 420 | 560 x 352 | IBBPBBPBBPBBPBB
soccer_015 | 60 | 640 x 480 | IBBPBBPBBPBBPBB
Monsters Inc. | 75 | 576 x 320 | IBBPBBPBBPBBPBB
Back to the Future | 300 | 352 x 256 | IBBPBBPBBPBBPBB
cact_015 | 240 | 352 x 192 | IBBPBBPBBPBB
tens_015 | 168 | 352 x 192 | IBBPBBPBBPBB
Incredible | 390 | 352 x 240 | IBBPBBPBBPBBPBB
TABLE 7 summarizes the architectural parameters for the thermal simulation.
TABLE 7 Baseline configuration of the simulated processor.
Main memory latency | 100 cycles / 10 cycles
L1 I/D Cache | 64KB, 2-way, 32B blocks, 1-cycle hit latency
I/D TLB | Fully associative, 128 entries, 30-cycle miss latency
Branch predictor | 4K bimodal
Functional units | 4 INT ALU, 1 INT MULT/DIV, 2 FP ALU, 1 FP MULT/DIV
RUU/LSQ size | 64/32
Instruction fetch queue | 8
Issue width | 6 instructions per cycle
TABLE 8 shows the temperature reduction and the resulting spatial/temporal quality degradation under the proposed online DTM algorithm. We set different $T_{limit}$ values, namely 65℃, 60℃, 55℃, and 50℃, and show how the proposed DTM algorithm behaves for each MPEG-2 video file. Clearly, the spatial/temporal quality degrades as $T_{limit}$ decreases. Similarly, the quality degradation is higher for MPEG-2 video files with higher workloads, i.e., those with higher resolutions. Files with relatively small workloads are not affected by the proposed DTM technique.
TABLE 8 Temperatures and the corresponding quality degradation.
For each $T_{limit}$, the three values are $T_{ss}$ (℃) / spatial degradation (%) / temporal degradation (%).
MPEG-2 video files | $T_{limit}$ = 65℃ | $T_{limit}$ = 60℃ | $T_{limit}$ = 55℃ | $T_{limit}$ = 50℃
mei60f | 63.3 / 91.6 / 36.1 | 56.8 / 91.6 / 50.0 | 51.4 / 91.6 / 63.8 | 47.2 / 91.6 / 83.3
hhilong | 61.8 / 91.6 / 30.5 | 56.8 / 91.6 / 33.3 | 51.3 / 91.6 / 47.2 | 47.2 / 91.6 / 63.8
Jurassic Park | 63.4 / 93.3 / 5.7 | 56.8 / 93.3 / 17.1 | 51.3 / 93.3 / 28.6 | 47.2 / 93.3 / 45.7
time_015 | 63.3 / 16.7 / 0 | 56.9 / 61.1 / 0 | 51.4 / 61.1 / 11.1 | 47.2 / 61.1 / 22.2
Dead poet's Society | 58.5 / 0 / 0 | 55.5 / 0 / 0 | 51.3 / 20.2 / 0 | 47.2 / 93.1 / 0
Finding Nemo | 52.4 / 0 / 0 | 52.4 / 0 / 0 | 50.3 / 0 / 0 | 47.2 / 44.3 / 0
soccer_015 | 49.8 / 0 / 0 | 49.8 / 0 / 0 | 49.6 / 0 / 0 | 47.0 / 0 / 0
Monsters Inc. | 47.4 / 0 / 0 | 47.4 / 0 / 0 | 47.4 / 0 / 0 | 46.5 / 0 / 0
Back to the Future | 43.7 / 0 / 0 | 43.7 / 0 / 0 | 43.7 / 0 / 0 | 43.7 / 0 / 0
cact_015 | 43.7 / 0 / 0 | 43.7 / 0 / 0 | 43.7 / 0 / 0 | 43.7 / 0 / 0
tens_015 | 43.7 / 0 / 0 | 43.7 / 0 / 0 | 43.7 / 0 / 0 | 43.7 / 0 / 0
Incredible | 43.7 / 0 / 0 | 43.7 / 0 / 0 | 43.7 / 0 / 0 | 43.7 / 0 / 0
Notice that the spatial/temporal quality degradation reported in TABLE 8 is the number of frames that were either spatially degraded or temporally dropped, as a fraction of the total number of frames. Hence, the spatial quality-degradation percentile is always higher than the temporal one. As mentioned, spatial quality degradation is achieved by skipping the enhancement-layer steps in MPEG-2 video decoding, so it does not noticeably worsen the picture quality. This is evident from the RMSE of the spatially degraded frames, which is 0.122 on average over all spatially degraded frames.
[Figure: bar graph of each program's per-frame workload in percentile (left Y-axis, 0-140) overlaid with line graphs of the combined quality degradation in percentile (right Y-axis, 0-100), across the twelve MPEG-2 video files; one line per temperature limit (labeled Tcritical in the legend): 65℃, 60℃, 55℃, and 50℃.]
Figure. 21 Relation between the workload and the quality degradation.
Figure. 21 shows the relationship between the average per-frame decoding time as a percentage of the given frame-decoding deadline (cf. the Y-axis on the left) and the degree of average per-frame quality degradation (cf. the Y-axis on the right). The bar graph corresponds to the workload in percentile and the line graphs correspond to the combined quality degradation. The combined quality degradation is generated by applying different weighting factors to the spatial and the temporal quality degradations. This figure discloses a number of facts: i) across the programs, the quality degradation becomes more severe as the program's workload becomes higher; ii) as $T_{limit}$ is lowered for a given program run, higher degrees of quality degradation are needed to meet both the deadline and the thermal constraints; iii) for programs whose workload is above 100%, e.g., 'mei60f', the combined quality degradation never reaches 0%; and iv) for programs whose workload is below 100%, e.g., 'time_015' (which implies that they naturally meet the performance constraint), the combined quality degradation may still be above 0%, since the thermal constraints can become a bottleneck.
Figure. 22 Comparison of thermal curves in three MPEG-2 video files.
Figure. 22 shows the transient thermal behavior during part of the simulation for three MPEG-2 video files. Temperature measurements are taken every 10K cycles. For each MPEG-2 video file, we report two simulation curves: i) without and ii) with the proposed DTM technique, under a $T_{limit}$ of 65℃. The three MPEG-2 video files are chosen to represent the range of possible workloads: 'Jurassic Park' (red curve) for heavy workload, 'Dead poet's Society' (green curve) for medium workload, and 'Back to the Future' (light blue curve) for light workload. For each file, the proposed DTM technique is shown by the dark blue, orange, and violet curves (cf. the quasi-flat curves at 63℃, 58℃, and 48℃), respectively. This figure, together with the quality-degradation results in TABLE 8, reveals the following: i) files with heavy workload barely meet the thermal constraint and their quality degradation is high; ii) files with medium workload meet the thermal constraint with a good margin and exhibit little or no quality degradation; in this case, a thermal curve far below $T_{limit}$ implies that a non-maximum frequency from the 'AFS' was selected to minimize energy dissipation once the thermal constraint was met; and iii) files with light workload have no thermal concerns, yet the applied DVFS shifts the original thermal curve down.
Chapter 4 DYNAMIC BACKLIGHT CONTROL IN LCDS
4.1. LCD Architecture
Figure. 23 shows the typical architecture of a digital LCD subsystem: RGB data generated by a Video Graphics Adapter (VGA) is sent to the Timing Controller (TCON) of the LCD subsystem in LVDS (Low Voltage Differential Signaling) format. The TCON transmits the RGB pixel data to the source drivers while transmitting timing signals, e.g., HSYNC and VSYNC, to the gate drivers. With these synchronized data and timing signals, each line of the LCD panel is addressed and refreshed from the top left to the bottom right within a given frame display deadline, which is the inverse of the frame rate.
Figure. 23 TFT-LCD architecture.
From the RGB pixel data it receives, the TCON generates the proper grayscale, i.e., the transmissivity of the LCD panel, for each pixel. Note that LCDs are not self-emitting devices; all pixels on an LCD panel are illuminated from behind by the backlight(s). To the observer, a displayed pixel looks bright if its transmittance is high (i.e., it is in the ON state and passes the backlight), while a displayed pixel looks dark if its transmittance is low (i.e., it is in the OFF state and blocks the backlight). For color LCDs, color filters are used to generate shades of the three RGB colors, and a given color is produced by mixing three sub-pixels.
Each pixel of the LCD panel has an individual Liquid Crystal (LC) cell, a TFT, and a storage capacitor. The electric field of the capacitor controls the transmittance of the LC cell, and the capacitor is charged and discharged through the TFT. The gate electrode of the TFT controls the timing of the capacitor's charging/discharging when the pixel is addressed to refresh its content, whereas the (drain-)source electrode of the TFT controls the amount of charge. The gate (source) electrodes of all TFTs are driven by a set of gate (source) drivers.
Ideally, the pixel transmittance, t(x), is a linear function of the grayscale voltage v(x), which is in turn a linear function of the pixel value x. The transfer function of the source driver, which maps different pixel values x to different voltage levels v(x), is called the grayscale-voltage function. If there are 256 grayscales, the source driver must be able to supply 256 distinct grayscale voltage levels. To provide this wide range of grayscales, a number of reference voltages are required.
Note that the backlight is the single most power-consuming component of the LCD subsystem [34]. The traditional backlight source is the CCFL/HCFL, although the white LED has recently become a popular backlight source due to merits such as relatively low power consumption and a wide NTSC gamut [44]. More importantly, the relatively short RT of the LED (say, less than 0.1msec [8]) and its finer granularity (i.e., a large array of white LEDs rather than a few CCFLs) make it amenable to fine-grained backlight control.
Figure. 24 Pixel granularity vs. LED granularity.
Figure. 24 shows an example of an LCD panel with LED array backlighting. Let us denote the total number of pixels of the LCD panel as

$$ N_{pixel} = H \times V \qquad (14) $$

and the total number of LEDs as

$$ N_{led} = M \times Q \qquad (15) $$

Although, in general,

$$ N_{pixel} \geq N_{led} \qquad (16) $$

compared to typical CCFL backlighting with 1~8 CCFLs, LED backlighting provides a much finer granularity of backlight control. (The white LED array of the test platform is shown in Figure. 40.)
4.2. Characteristics of HVS
Although light is a form of electromagnetic radiation, measuring the luminous intensity of a light source requires extra information about the relative sensitivity of the human eye to different wavelengths. The luminous intensity of a 'white' light source is thus defined by multiplying the watts emitted at each wavelength by the efficiency of that wavelength in exciting the eye, relative to the efficiency at 555nm. This efficiency factor is referred to as the V-lambda curve [79]. The candela is the luminous intensity per solid angle, in a given direction, of a source that emits monochromatic radiation at a given wavelength. Humans perceive luminance, which is an approximate measure of how 'bright' a surface appears when viewed from a given direction. Luminance is thus defined as luminous intensity per square meter and is measured in candelas per square meter.
4.2.1. Spatial Characteristics of HVS
When light reaches the eye, it hits the photoreceptors on the retina, which then send electrical signals through neurons to the brain. The photoreceptors in our retina, namely rods and cones, act as the sensors of the HVS. Rods are extremely sensitive to light and provide achromatic vision at scotopic levels of illumination ($10^{-6}$ to $10$ cd/m²), which is why we cannot see colors in dark surroundings. Cones are less sensitive but provide color vision at photopic levels of illumination ($10^{-2}$ to $10^{8}$ cd/m²). Note that both rods and cones are active at light levels between $10^{-2}$ and $10$ cd/m². The problem is that the incoming light can have a vast DR of nearly $1{:}10^{14}$, whereas the neurons can transfer a signal with a dynamic range of only about $1{:}10^{3}$. As a result, some kind of adaptation mechanism is needed in our vision.
One of the most important characteristics that changes with the adaptation level is the Just Noticeable Difference (JND). The JND is the minimum amount by which a light stimulus intensity must change to produce a noticeable variation in sensory experience. Let $\Delta L$ and $L_a$ denote the JND and the adaptation luminance, respectively. In [2], Blackwell et al. showed that the ratio $\Delta L / L_a$ varies as a function of the adaptation level $L_a$, and established the relationship:

$$ \Delta L(L_a) = 0.0594 \cdot (1.219 + L_a^{0.4})^{2.5} \qquad (17) $$

Simply put, this equation states that a patch of luminance $L_a + \epsilon$ with $\epsilon \geq \Delta L$ on a background of luminance $L_a$ will be discernible, but a patch of luminance $L_a + \epsilon$ with $\epsilon < \Delta L$ will not be perceptible to the human eye.
Let us now consider brightness perception. Brightness is the magnitude of the subjective sensation produced by visible light, and it is often approximated as the logarithm of the luminance. Studies have shown that the brightness-luminance relation depends on the level of adaptation to the ambient light. In particular, Stevens et al. [63] devised the 'bril' unit to measure the subjective value of brightness. By their definition, one bril equals the sensation of brightness induced in a fully dark-adapted eye by a brief exposure to a 5-degree solid-angle white target of 1 micro-lambert luminance. (One lambert is equal to 3183 cd/m².) Let B denote the brightness in brils, L the original luminance value in lamberts, and $L_a$ the adaptation luminance of the eye. Then,

$$ B = \lambda \cdot \left( \frac{L}{L_a} \right)^{\sigma} \qquad (18\text{-}a) $$

where

$$ \sigma = 0.4 \cdot \log_{10}(L_a) + 2.92, \qquad \lambda = 10^{2.0208} \times L_a^{0.336} \qquad (18\text{-}b) $$
Figure. 25 shows typical perceived-brightness characteristic curves. The slope of each curve represents the human contrast sensitivity, i.e., the sensitivity of the brightness perception to changes in luminance. Note that as $L_a$ is lowered, the human contrast sensitivity decreases. Notice also that the HVS exhibits higher sensitivity to luminance changes in the darker regions of an image. According to this figure, two images with different luminance values can produce the same brightness values and thus appear identical to the HVS. As such, our eyes are very poor judges of absolute luminance; all we can judge is the ratio of luminances, i.e., the brightness.
Figure. 25 Brightness vs. luminance characteristics of HVS.
4.2.2. Temporal Characteristics of HVS
Studies of the dynamics of light perception can be divided into two categories based on the stimulation method used to characterize the HVS. The first category is based on aperiodic stimuli and tries to measure the impulse response of the HVS. In such an experiment, the sensitivity of the HVS is measured when a brief test light is presented before, during, or after the presentation of a much stronger background light. Although such experiments are very effective for understanding the time-domain response of the HVS, they are not suitable for quantifying distortion in the backlight scaling domain, where the distortion is perceived by the HVS during video playback: backlight scaling is applied to every video frame at a rate that is too fast for the HVS to settle to a steady-state background brightness.
The second category is based on measuring the Critical Fusion Frequency (CFF) of the HVS at various amplitude sensitivity (AS) values. This concept is closely related to the well-known notion of the temporal contrast sensitivity function [12] [26]. The CFF is the minimum frequency, $f^*$, above which the observer cannot detect any flickering when a series of light flashes at that frequency is presented; i.e., for frequencies lower than $f^*$, the HVS will notice flickering. The AS at frequency $f^*$ is the ratio of the minimum required amplitude of the flashing light at frequency $f^*$ to the average ambient luminance such that the flickering effect is perceived [18]. Generally, as the AS increases, flickering becomes easier to perceive for a series of flashes at a fixed frequency. These metrics are suitable for quantifying the distortions caused by backlight scaling, since they model the 'flickering' effect of a series of per-frame backlight scalings.
4.3. Light Presentation in Display and HVS Perception
4.3.1. Light Exposure in CRT Monitors
To understand the principle of backlight scanning, we need to understand how CRT monitors display an image. CRT monitors are known to be 'motion blur free', and this nature stems from the fact that image display in a CRT monitor is realized by a short exposure/emission of the beam light that consists of three phases [15]: i) fluorescence (light is emitted while the phosphor is being struck by electrons), ii) phosphorescence (light is emitted after the electron beam is removed), and iii) persistence (the time from the removal of excitation until phosphorescence has decayed to 10% of its initial light output). The temporal sum of these three phases is around 70µsec, which implies that a CRT monitor mostly displays nothing but a dark screen, briefly flashing impulse-type images at a refresh rate of 60-120Hz.
4.3.2. Light-Evoked Responses in HVS
In [35], Kenkre et al. described the non-linearity of the HVS response when flash-light stimuli of fixed duration but different intensities are presented to an observer's eyes under a dark-adapted condition. More precisely,

$$ r_{DA} = r_{MAX} \left( 1 - e^{-q/Q_0} \right) \qquad (19) $$
where r_DA is the response amplitude (at a certain fixed measurement time t_0 after
flash delivery) and q is the flash light intensity, while r_MAX and Q_0 are constants
representing the maximal response amplitude at t_0 and the saturating intensity,
respectively. Essentially, equation (19) shows that the light perception (i.e.,
response amplitude) of the HVS saturates exponentially with the flash light
intensity, q: each increment of q produces an exponentially smaller increment in
the response.
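For concreteness, the following minimal Python sketch evaluates equation (19); the
values chosen for r_MAX and Q_0 are purely illustrative placeholders (in [35] such
constants are fit to measured data):

    import numpy as np

    # Illustrative placeholder constants; in [35] they are fit to measured data.
    R_MAX = 1.0    # maximal response amplitude at measurement time t_0
    Q_0 = 50.0     # saturating flash intensity (arbitrary units)

    def response_amplitude(q):
        """Dark-adapted response to a flash of intensity q, per equation (19)."""
        return R_MAX * (1.0 - np.exp(-q / Q_0))

    for q in (10, 50, 100, 200, 400):
        print(f"q = {q:3d}  ->  r_DA = {response_amplitude(q):.3f}")
    # Doubling q from 200 to 400 barely changes r_DA: the response saturates.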
Figure. 26 Sample responses in HVS.
Figure. 26 shows some sample responses of the HVS [45] [66]: when flash light
stimuli with a fixed intensity but with different durations are presented under a
dark-adapted condition, the HVS responds for a certain duration (called visual
persistence) which is 10-100X longer than the millisecond-range duration of the
flash light [13] [16]. According to this phenomenon, the HVS converts the
impulse-type image display of CRTs, which lasts about 70 usec, to a maximum of
7 msec of light perception. Thereby, under the industry-standard frame rate of
30 frames/sec for CRTs, which allocates roughly 33 msec per frame, an image
frame in a video sequence will be perceived as non-overlapping with its successor
frame. Even with frame rate up-conversion [1], e.g., a 120 Hz refresh rate, this
visual non-overlapping of successive frames in CRTs mostly holds. Consequently,
motion blur does not occur in CRT monitors.
In contrast, the scan-and-hold nature of LCDs (cf. Figure. 1) always causes
overlapping of successive frames, and this overlapping will persist even in future
LCDs with close-to-zero response time. Consequently, instead of merely trying to
accelerate the response time of the LC cells, it is desirable to devise a mechanism
that mimics the impulse-type image presentation in LCDs.
4.4. Proposed Backlight Control Technique I
4.4.1. Dynamic Backlight Scaling Problem
In a transmissive TFT-LCD monitor, for a pixel with value x, the luminance L(x)
of the pixel is represented as

\[ L(x) = b \cdot t(x) \qquad (20) \]

where t(x) is the transmissivity of the TFT-LCD cell for pixel value x, and b ∈ [0,
1] is the normalized backlight illumination factor, with b = 1 representing the
maximum backlight illumination and b = 0 representing no backlight. Note that t(x)
is a linear mapping from the [0, 255] domain to the [0, 1] range. In a backlight-
scaled TFT-LCD, b is simply scaled down and t(x) is increased accordingly so as to
achieve the same perceived pixel luminance.
Let X and X' = Φ(X, β) denote the original and the transformed image data,
respectively. Moreover, let D(X, X') and P(X', β) denote the distortion between the
images X and X' and the power consumption of the LCD subsystem while displaying
image X' with a backlight scaling factor of β. Then, the dynamic backlight scaling
problem can be formulated as: Given the original image X and the maximum tolerable
image distortion D_max, find the scaling factor β and the corresponding pixel
transformation function X' = Φ(X, β) such that P(X', β) is minimized and
D(X, X') ≤ D_max.
The general form of the DBS problem is difficult to solve due to the complexity of
the distortion function, D(X, X'), and also the non-linear function minimization
step that is required to determine Φ(X, β). From now on, this dissertation will use
Φ(X, β) to denote the transformed 'image', and Φ(x, β) to denote a single
transformed 'pixel' of the image X under a backlight scaling factor of β.
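To make the formulation concrete, a brute-force solver can simply sweep a
discretized set of β values and keep the feasible one with the lowest power. The
Python sketch below is a minimal illustration only; phi, distortion, and power are
caller-supplied stand-ins for the pixel transformation Φ, the distortion D, and the
power model P (none of these function names come from this dissertation):

    import numpy as np

    def dbs_search(image, phi, distortion, power, d_max, betas=None):
        """Brute-force DBS: among feasible backlight factors, pick the lowest-power one.

        phi(image, beta)   -> transformed image X' = Phi(X, beta)
        distortion(x, xp)  -> scalar D(X, X')
        power(xp, beta)    -> LCD subsystem power P(X', beta)
        """
        if betas is None:
            betas = np.linspace(0.1, 1.0, 19)    # candidate scaling factors
        best_beta, best_img = 1.0, image         # default: full backlight
        best_power = power(image, 1.0)
        for beta in betas:
            xp = phi(image, beta)
            if distortion(image, xp) <= d_max and power(xp, beta) < best_power:
                best_beta, best_img, best_power = beta, xp, power(xp, beta)
        return best_beta, best_img

The optimized techniques discussed next replace this exhaustive sweep with
histogram-based analysis of the image.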
Figure. 27 Various pixel transformation functions.
Chang et al. [5] described two backlight luminance dimming techniques. Let x
denote the pixel value normalized from the 8-bit color depth to [0, 1]. The authors
scaled the backlight luminance by a factor of β while increasing the pixel values
from x to Φ(x, β) by two mechanisms. First, the "backlight luminance dimming with
brightness compensation" technique uses the following pixel transformation function
(cf. Figure. 27b):

\[ \Phi(x, \beta) = \min(1,\; x + 1 - \beta) \qquad (21\text{-a}) \]

whereas the "backlight luminance dimming with contrast enhancement" technique
uses the transformation function (cf. Figure. 27c):

\[ \Phi(x, \beta) = \min\!\left(1,\; \frac{x}{\beta}\right) \qquad (21\text{-b}) \]
In these techniques, an image histogram estimator is required in order to calculate
the statistics of the input image, from which the distortion is computed. Note that
the image histogram simply denotes the marginal distribution function of the image
pixel values. The authors used the number of saturated pixel values as their measure
of image distortion.
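A minimal Python sketch of the two transformations in equations (21-a) and (21-b),
together with the saturated-pixel distortion measure of [5], might look as follows
(pixel values are assumed to be pre-normalized to [0, 1]):

    import numpy as np

    def phi_brightness(x, beta):
        """Brightness compensation, equation (21-a): shift up, then clip at 1."""
        return np.minimum(1.0, x + 1.0 - beta)

    def phi_contrast(x, beta):
        """Contrast enhancement, equation (21-b): scale up, then clip at 1."""
        return np.minimum(1.0, x / beta)

    def saturated_fraction(x, beta, phi):
        """Distortion measure of [5]: fraction of pixels clipped to 1."""
        return np.mean(phi(x, beta) >= 1.0)

    # Example on a mid-gray image: saturation depends on the transform and beta.
    img = np.full((480, 640), 0.5)
    print(saturated_fraction(img, 0.6, phi_brightness))   # 0.0 (0.5 + 0.4 < 1)
    print(saturated_fraction(img, 0.4, phi_contrast))     # 1.0 (0.5 / 0.4 > 1)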
Cheng et al. [7] proposed a different approach in which the pixel values in both the
dark and the bright regions of the image are sacrificed in order to further dim the backlight. Their key
idea is to first truncate the image histogram on both ends to obtain a smaller dynamic
range for the image pixel values and then spread out the pixel values in this range so
as to enable a more aggressive backlight dimming while maintaining the contrast
fidelity of the image. Their pixel transformation function is given as:
\[ \Phi(x, \beta) = \begin{cases} 0, & 0 \le x \le g_l \\ c\,x + d, & g_l \le x \le g_u \\ 1, & g_u \le x \le 1 \end{cases} \qquad c = \frac{1}{g_u - g_l},\;\; d = \frac{-g_l}{g_u - g_l} \qquad (22) \]
where (g_l, 0) and (g_u, 1) are the points where Φ(x, β) = cx + d intersects
Φ(x, β) = 0 and Φ(x, β) = 1, respectively. In their technique, the image distortion
is measured based on the number of pixel values that are preserved after applying
the transformation function to the original image.
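For illustration, the truncate-and-stretch transformation of equation (22) can be
sketched as below; here g_l and g_u are supplied by the caller, whereas [7] selects
them from the image histogram, and preserved_fraction is one plausible reading of
their preserved-pixel distortion measure:

    import numpy as np

    def phi_truncate_stretch(x, g_l, g_u):
        """Equation (22): clip pixels outside [g_l, g_u], linearly stretch the rest."""
        c = 1.0 / (g_u - g_l)        # slope of the line through (g_l, 0), (g_u, 1)
        d = -g_l / (g_u - g_l)       # its intercept
        return np.clip(c * x + d, 0.0, 1.0)

    def preserved_fraction(x, g_l, g_u):
        """Fraction of pixel values inside [g_l, g_u], i.e., values preserved
        (up to the linear stretch) by the transformation."""
        return np.mean((x >= g_l) & (x <= g_u))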
4.4.2. HVS Model and the Proposed Backlight Scaling
In the previous (frame-sensitive) backlight scaling techniques, the distortion
between each backlight-scaled frame and the original frame, i.e., the spatial
distortion, is upper-bounded by a user-specified maximum allowed value, D_max.
However, this spatial distortion alone is not sufficient to quantify the quality of
backlight-scaled video. The other crucial component is the large-scale changes in
the luminance of the backlight-scaled video compared to the original video, i.e.,
the temporal distortion. This latter component captures the unwanted variations in
the overall luminance of frames in the backlight-scaled video sequence.
Figure. 28 Temporal model of HVS.
To model this temporal distortion, we adopt a computational temporal response
model of the HVS from Wiegand et al. [68]. Figure. 28 depicts the schematic
representation of this model. This model can be used to determine the AS threshold
of an observer when presented with varying light intensities. Note that the input to
this model is a scalar value representing the absolute luminance of the viewing area
whereas the output is the intensity of the perceived luminance. If the difference
between the DC value and the amplitude of a given frequency, f*, in the output of
the model exceeds a fixed predefined threshold, then flickering is perceived by the
HVS at frequency f*. In the case of backlight scaling, for example, the input to the
model is the average luminance of each frame of the video sequence whereas the
output of the model is the perceived intensity of the light. Basically, this model
comprises a cascade of two Low Pass Filters (LPFs) and a dynamic gain control,
followed by the non-linear characteristic of the HVS which transforms luminance
values to perceived brightness values. The parameters of the two LPFs are
controlled by a first-order LPF with a cutoff frequency near 0.1 Hz. As described
in [68], the control LPF has the following transfer function:
\[ H_{CLP}(f) = \left[\, 2\pi(1.59)\, jf + 1 \,\right]^{-1} \qquad (23\text{-a}) \]

whereas the cascade of the two LPFs has the general transfer function

\[ H_{LP}(f) = \left[ \frac{f_1^2}{f_1^2 + j\,d_1 f f_1 - f^2} \right] \cdot \left[ \frac{f_2^2}{f_2^2 + j\,d_2 f f_2 - f^2} \right] \qquad (23\text{-b}) \]
with parameters given by

\[ \begin{aligned} z_c(t) &= 1 \big/ \left[\, 1 + r_c(t)/138.839 \,\right]^{0.5} \\ f_1(t) &= -4.299\, z_c(t) + 11.65 \\ f_2(t) &= -24.36\, z_c(t) + 28.68 \\ d_1(t) &= -1.218\, z_c(t) + 0.616 \\ d_2(t) &= -1.003\, z_c(t) + 0.448 \end{aligned} \qquad (23\text{-c}) \]
Finally, the controllable gain is given by

\[ g = 2.2 \cdot \left[\, 45.899 + r_c(t) \,\right]^{-0.641} \cdot \left[\, 0.001 + r_c(t) \,\right]^{-0.5114} \qquad (23\text{-d}) \]

where r_c(t) is the control signal at the output of the LPF given in equation (23-a).
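To summarize the structure of equations (23-c) and (23-d), the following Python
sketch computes the time-varying filter parameters and gain for a given value of
the control signal r_c(t) (a direct transcription of the formulas above, not an
implementation of the full filter cascade):

    import numpy as np

    def hvs_parameters(r_c):
        """Evaluate equations (23-c) and (23-d) for a given control signal r_c."""
        z_c = 1.0 / np.sqrt(1.0 + r_c / 138.839)
        f1 = -4.299 * z_c + 11.65     # corner frequency of the first LPF
        f2 = -24.36 * z_c + 28.68     # corner frequency of the second LPF
        d1 = -1.218 * z_c + 0.616     # damping factor of the first LPF
        d2 = -1.003 * z_c + 0.448     # damping factor of the second LPF
        g = 2.2 * (45.899 + r_c) ** (-0.641) * (0.001 + r_c) ** (-0.5114)
        return f1, f2, d1, d2, g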
Figure. 29 shows a typical frequency-domain transfer function of this model when
a simple sinusoidal input with constant amplitude and varying DC value is presented
to it. Note that for a fixed DC value, i.e., a fixed background luminance, as we
increase the frequency of the (flickering) light stimulus, the amplitude required
for the flickering to be perceived is almost constant for frequencies below 10 Hz
and then increases exponentially beyond this frequency. Moreover, for a given flash
light frequency, increasing the background luminance decreases the minimum flash
light amplitude at which the flickering is perceived.
Figure. 29 AS of HVS for sinusoidal input with varying DC.
It is worth noting that this HVS model is based on subjective experiments, and
therefore, its accuracy varies from one individual to the next. The key idea of this
dissertation is to underscore the significance of considering such human perceptual
models during the optimization process for backlight scaling. A commercial
implementation of such a system could provide multiple HVS models for users to
select from.
Let X and X' denote the original and the backlight-scaled versions of a video
sequence with N frames in total. We define the distortion between the two video
sequences X and X' as:
\[ \begin{aligned} D(X, X') &= \alpha \cdot D_{spt}(X, X') + (1 - \alpha) \cdot D_{tmp}(X, X') \\ &= \alpha \cdot \max_i \left\{ D_{spt}(X_i, X'_i) \right\} + (1 - \alpha) \cdot \frac{\sum_j \left( \mathcal{F}\{V(X)\}_j - \mathcal{F}\{V(X')\}_j \right)^2}{\sum_j \left( \mathcal{F}\{V(X)\}_j \right)^2} \end{aligned} \qquad (24) \]
where D_spt(X, X') is the spatial distortion between the still images X and X',
V(·) is the perceived brightness when an input is presented to the HVS, and F{·} is
the Fourier transform operator. Finally, α is a weighting coefficient. The first
term in equation (24) captures the spatial distortion between the respective frames
of video sequences X and X', while the second term is the normalized Mean Square
Error (MSE) between the spectral power densities of the brightness of the original
and the backlight-scaled video sequences. Basically, this second term captures the
temporal distortion.
Unfortunately, the video distortion function presented in equation (24) is hard to
evaluate since the first term lies in the time domain while the second term lies in
the frequency domain. To circumvent this problem, equation (24) can be simplified
using Parseval's theorem, which states that the integral of the squared signal is
equal to the integral of its spectral power density; therefore:
\[ \begin{aligned} D_{tmp}(X, X') &= \frac{\sum_j \left( \mathcal{F}\{V(X)\}_j - \mathcal{F}\{V(X')\}_j \right)^2}{\sum_j \left( \mathcal{F}\{V(X)\}_j \right)^2} \\ &= \frac{\sum_j \left( \mathcal{F}\{V(X)\}_j \right)^2 + \sum_j \left( \mathcal{F}\{V(X')\}_j \right)^2 - 2 \sum_j \mathcal{F}\{V(X)\}_j \, \mathcal{F}\{V(X')\}_j}{\sum_j \left( \mathcal{F}\{V(X)\}_j \right)^2} \\ &= 1 + \frac{\sum_j V(X'_j)^2}{\sum_j V(X_j)^2} - 2 \cdot \frac{\left( V(X) \otimes V(X') \right)_0}{\sum_j V(X_j)^2} \\ &= 1 + \frac{\sum_j V(X'_j)^2}{\sum_j V(X_j)^2} - 2 \cdot \frac{\sum_j V(X_j)\, V(X'_j)}{\sum_j V(X_j)^2} \end{aligned} \qquad (25) \]
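The practical payoff of equation (25) is that D_tmp can be evaluated entirely in the
time domain. The following Python sketch verifies the equivalence numerically on a
synthetic per-frame brightness signal; the two forms agree up to floating-point
error:

    import numpy as np

    rng = np.random.default_rng(0)
    vx = rng.random(256)    # per-frame perceived brightness of the original, V(X)
    vy = 0.8 * vx           # the same signal under a uniform 20% dimming, V(X')

    # Frequency-domain form (first line of equation (25)).
    FX, FY = np.fft.fft(vx), np.fft.fft(vy)
    d_freq = np.sum(np.abs(FX - FY) ** 2) / np.sum(np.abs(FX) ** 2)

    # Time-domain form (last line of equation (25)), obtained via Parseval.
    d_time = 1 + np.sum(vy ** 2) / np.sum(vx ** 2) \
               - 2 * np.sum(vx * vy) / np.sum(vx ** 2)

    print(d_freq, d_time)   # both evaluate to 0.04 for this example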
Based on this video distortion model, we propose Temporally-Aware Backlight
Scaling for video (TABS). Figure. 30 shows the block diagram of this approach. The
key idea is to measure the temporal distortion of the backlight-scaled video,
D_tmp(X, X'), and then feed this information back to dynamically adjust the maximum
allowed spatial distortion of the frames, D_spt(X, X'). To measure the temporal
distortion of the video sequence, we follow the procedure below (a minimal sketch
of this feedback loop in code follows the list):
• For each original frame (X_t) and backlight-scaled frame (Y_t) at time t,
calculate the mean brightness values, $\bar{X}_t$ and $\bar{Y}_t$, over all the
pixels of the respective frames.
• Feed the signals $\bar{X}_t$ and $\bar{Y}_t$ into the temporal model of the HVS
to obtain the filtered, perceived luminance signals, $\bar{X}_t^{HVS}$ and
$\bar{Y}_t^{HVS}$.
• Substitute $\bar{X}_t^{HVS}$ and $\bar{Y}_t^{HVS}$ into equation (24) to obtain
D_tmp(X, X').
• Using D_tmp(X, X'), modify the maximum allowed spatial distortion, D_spt(X, X').
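A minimal Python sketch of one iteration of this feedback loop is given below;
hvs_filter stands in for the discrete-time HVS model derived later in equation
(26), and the D_spt re-budgeting rule, obtained by solving equation (24) for the
spatial term given the measured D_tmp, is shown in its simplest proportional form:

    import numpy as np

    def tabs_step(frame, scaled_frame, hvs_filter, hist_x, hist_y, d_max, alpha):
        """One TABS iteration: measure temporal distortion, re-budget D_spt."""
        # 1) Mean brightness of the original and backlight-scaled frames.
        hist_x.append(float(frame.mean()))
        hist_y.append(float(scaled_frame.mean()))
        # 2) Perceived luminance through the temporal model of the HVS.
        x_hvs = hvs_filter(np.asarray(hist_x))
        y_hvs = hvs_filter(np.asarray(hist_y))
        # 3) Temporal distortion, time-domain form of equation (25).
        ex = np.sum(x_hvs ** 2)
        d_tmp = 1 + np.sum(y_hvs ** 2) / ex - 2 * np.sum(x_hvs * y_hvs) / ex
        # 4) Remaining spatial-distortion budget implied by equation (24).
        return max(0.0, (d_max - (1 - alpha) * d_tmp) / alpha)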
Figure. 30 Temporally-Aware Backlight Scaling (TABS).
4.4.3. Experiment and Results
Figure. 31 shows our experimental setup, which includes the Apollo test-bed II [78].
The Apollo runs an Intel XScale 80200 processor (733 MHz), has a 6.4-inch LCD
screen [77] with 6-bit color depth at 640x480 resolution, and provides sufficient
HW/SW capabilities for changing the intensity of the backlight and for online
calculation of the image histogram. The processor is capable of decoding and
playing an MPEG-1 movie stream at a rate of 15 frames/sec.
Figure. 31 Experimental setup.
For the implementation of the TABS algorithm, we modified the MPEG-1 decoder
application [74] to incorporate the aforesaid procedure, with two observations:
i) The HVS model is a continuous-time model while the average luminance of each
frame in the video sequence is calculated on a per-frame basis, resulting in a
discrete-time signal. Therefore, we transformed the continuous-time filters given
by equations (23-a)-(23-d) into discrete-time filters. ii) Due to the limited
output luminance range of the off-the-shelf LCD, the output of the control LPF in
Figure. 28 does not significantly change. Hence, we assumed the output of the
control LPF to be constant, i.e., r_c(t) = r_0. This assumption results in fixed
quadratic filters and gain blocks in the temporal model of the HVS, and the
resulting digital filter is represented as:
\[ H_{LP}(z) = \left[ \frac{0.017}{z^2 - 1.067\,z + 2.05} \right] \cdot \left[ \frac{0.011}{z^2 + 1.05\,z - 2.03} \right] \qquad (26) \]
Next, we implemented this digital filter and then calculated the output of the HVS
model for the average luminance value of each frame in the video sequence.
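For illustration, such a fixed cascade of two second-order sections can be applied
to the per-frame mean-luminance signal with scipy.signal.lfilter; the coefficients
below are placeholders only, chosen so that the cascade is stable, and do not
reproduce the fitted values of equation (26):

    import numpy as np
    from scipy.signal import lfilter

    # Placeholder biquad coefficients standing in for equation (26).
    b1, a1 = [0.017], [1.0, -1.067, 0.205]
    b2, a2 = [0.011], [1.0, -1.05, 0.203]

    def hvs_filter(mean_luminance):
        """Fixed cascade of two biquads applied to the per-frame luminance signal."""
        return lfilter(b2, a2, lfilter(b1, a1, mean_luminance))

    # Example: 300 frames at 15 frames/sec with a 2 Hz luminance flicker.
    frames = 0.5 + 0.1 * np.sin(2 * np.pi * 2.0 * np.arange(300) / 15.0)
    perceived = hvs_filter(frames)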
For the runtime calculation of the temporal distortion, D_tmp(X, X'), we
approximated the average value calculated in equation (25) by averaging the signal
over a limited number of previous filtered average luminance values, i.e., by a
moving average. Then, we used this value and the user-specified maximum allowed
video distortion, D_max(X, X'), in equation (24) to calculate the maximum allowed
spatial distortion, D_spt(X, X'). Next, we used D_spt(X, X') as the constraint for
TABS to calculate the pixel transformation function for the backlight-scaled image.
To measure the video distortion that comes from the temporal variations of the
backlight and the pixel transformation function, we calculated the average
luminance value of each recorded frame to obtain the time-domain signal. Then, we
calculated the HVS response to this signal using the discrete-time filter given in
equation (26) and plotted the frequency-domain representation of the output. The
resulting frequency-domain plot will be shown later in Figure. 34.
Figure. 32 Power breakdown of our platform, Apollo.
To capture the temporal luminance changes of a video sequence in practice, we ran
each movie clip three times and recorded the output of the LCD using a camcorder
at a rate four times faster than the displayed frame rate: in the first run, the
original video sequence was played without any backlight scaling; in the second
run, we played the video sequence with frame-sensitive backlight scaling applied
(specifically, Histogram Equalized Backlight Scaling, HEBS [31]); and in the third
run, we played the video sequence with TABS applied. The power consumption of the
LCD components, together with the system-wide power consumption, was measured and
recorded using a USB Data Acquisition System (UDAS) [86] at a sampling rate of
4 kHz.
In Figure. 32, we compare the component-wise power consumption of our platform
under the normal condition and under TABS. For this specific experiment, we used a
movie clip ('Mermaid') and allowed 15% maximum distortion. As shown, the TABS
approach achieves a 6% system-wide power reduction, which corresponds to 1 Watt.
This reduction mainly comes from the power reduction in the backlight inverter.
Note that the power figures for the other components remain the same, implying
that the TABS approach does not incur any power overhead.
Figure. 33 Time domain variation of backlight luminance.
Figure. 33 depicts the time-domain variations of the backlight scaling factor (β)
over part of the first movie clip. Note that the strong spikes in the HEBS curve
are eliminated in the TABS curve, and that the TABS curve overall shows smoother
backlight variations than the HEBS curve. As a result, we do not notice unwanted
flickering in the video sequence with TABS.
Figure. 34 Fourier transform of output video sequence.
Figure. 34 compares the temporal behavior of three different movie clips
('Mermaid', 'Incredibles', and 'Toy Story') under different maximum allowed
distortions, D_max(X, X'). For each movie clip, note that the frequency-domain
plots for the HEBS approach contain a number of unwanted strong low-frequency
components (which cause unwanted flickering), while the frequency-domain plots for
the TABS approach show characteristics that are almost identical to those of the
original video sequence. Furthermore, as we increase the maximum allowed video
distortion, the amplitude at most frequencies significantly increases with HEBS,
whereas the amplitude with TABS remains almost unchanged. The amplitude at each
frequency indicates the degree of perceived flickering at that frequency.
Figure. 35 Energy savings in HEBS vs. TABS.
Figure. 35 shows the overall energy savings of the LCD subsystem for different
movie clips under HEBS and TABS. Because HEBS is blind to the temporal distortion,
its aggressive energy-saving nature results in higher energy savings than TABS.
However, note that HEBS suffers from high temporal distortions (cf. Figure. 34).
In contrast, the energy savings of TABS are comparably high while its overall video
quality is significantly improved (cf. Figure. 34).
4.5. Proposed Backlight Control Technique II
4.5.1. Proposed Backlight Scanning
Figure. 36 1-D LED backlight scanning idea.
Figure. 36 shows how each frame is displayed in LCDs: a typical LCD updates its
pixel values from top to bottom in a line-by-line fashion, e.g., the lines shown in
blue and red. Let us denote the time taken to refresh all lines of a given screen
by T_frame:

\[ T_{frame} = \frac{1}{\text{Refresh rate}} \qquad (27) \]
Hence, under a refresh rate of 60 Hz, it takes roughly 16 msec to address all the
lines. From the moment of being addressed, each pixel in a line takes up to
T_response^LC to finish its LC tilting in the worst case. During this time (the
gradually-shaded parallelogram areas in Figure. 36), the pixel values are not
settled and hence cannot be displayed; instead, they are in transition. Sluyterman
et al. [59] proposed a proper timing of backlight scanning in which they control
the turn-on time of the backlight such that only the pixels which have achieved
their target grayscale values are exposed to the backlight, and hence, are
displayed. The turn-off time of the backlight needs to be right at the moment when
the pixel values for the next frame are being addressed. Such a backlight control
mechanism can be achieved by generating Pulse Width Modulation (PWM) signals that
drive the LED backlights (shown at the bottom of Figure. 36). Mathematically, the
ON period of the LEDs for the addressed pixels within a frame period (denoted by
T_on) should satisfy the following equation:
\[ T_{on} \le \left( T_{frame} - T_{response}^{LC} \right) \qquad (28) \]
Notice that LCDs without backlight scanning use a T_on which is equal to T_frame.
Figure. 37 Proposed 1-D LED backlight scanning idea.
In Figure. 37, we make a pictorial comparison between the proposed scanning idea
and the idea of [59]. When a pixel value switches from 0 to 255 grayscale over a
sequence of frames (see the top drawing), their approach turns on the backlights
when the pixel values achieve their target values and turns off the backlights when
the pixel values need to be refreshed (see the second drawing from the top). In
this case, the motion blur artifact originating from the slow LC tilts can be
avoided by hiding the pixels in transition. However, this is not sufficient to
eliminate the motion blur artifact, since the visual persistence explained earlier
will cause the overlapping of frames (see the third drawing from the top). In the
next frame period, this persistence will be harmless during the OFF period of the
backlight (circled in gray) but will be detrimental during the ON period (circled
in orange) since, in the latter case, the visual persistence of the previous frame
and the actual frame currently being displayed will overlap, resulting in the
motion blur artifact.

To overcome this, we propose to shorten the ON period of the backlight (the second
drawing from the bottom) such that the overlapping of frames is avoided as much as
possible (bottom drawing) while the frame luminance stays within a certain
tolerance.
Figure. 38 Determination of the turn-on time of LEDs.
Figure. 38 shows how we shorten the turn-on time, T_on, in equation (28) by taking
two parameters into consideration: 1) the response period of the LEDs, and 2) the
pixels in a region of the LED array (a region is defined precisely in the
discussion of Figure. 40). Figure. 38 starts with the top two drawings, which
correspond to the pixel value changes in the first and last lines of the same
region. As for the first parameter, the luminance change of an LED cannot happen
instantaneously, i.e., it can only happen as a ramp. This is especially true when
the LEDs are connected in series, as shown later in Figure. 40. We illustrate this
effect in the third drawing from the top by sloped ramp lines. When we turn off the
LEDs based on equation (28), pixels in the first line are in transition during the
LED's falling response period (circled in orange), thus resulting in motion blur.
To avoid this effect, T_on should be shortened such that:

\[ T_{on} \le \left( T_{frame} - T_{response}^{LC} - T_{response}^{LED} \right) \qquad (29) \]

where T_response^LED is the falling response time of the LEDs. The waveforms with
this new T_on are shown in the fourth drawing from the top in Figure. 38.
This is still not enough, and we need to consider the second parameter, which
originates from the difference in granularity between an LED region and a pixel:
not all the pixels in an LED region switch their grayscale as shown in the top
drawing. Instead, pixels in the last line of the LED region switch grayscale as
shown in the second drawing: when we turn on the LEDs based on equation (29),
pixels in the last line are in transition during the LED's rising response period
(circled in green), thereby resulting in the motion blur artifact. To avoid this
effect, T_on should be further shortened such that:

\[ T_{on} \le \left( T_{frame} - T_{response}^{LC} - T_{response}^{LED} - T_{region} \right) \qquad (30) \]

where T_region is the time gap between the first and the last line updates in the
LED region. We show the waveforms with this T_on in the bottom drawing in
Figure. 38.
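The timing budget of equations (27)-(30) is easy to sketch numerically. In the
following Python snippet, all response-time numbers are purely illustrative
assumptions, not measurements from our platform:

    def max_on_period(refresh_hz, t_lc_response, t_led_response, t_region):
        """Upper bound on the backlight ON period per equation (30), in seconds."""
        t_frame = 1.0 / refresh_hz              # equation (27)
        return t_frame - t_lc_response - t_led_response - t_region

    # Illustrative numbers only: 60 Hz refresh, 8 ms LC response,
    # 1 ms LED fall time, 2 ms first-to-last-line gap within a region.
    t_on = max_on_period(60.0, 8e-3, 1e-3, 2e-3)
    duty = t_on * 60.0                          # fraction of the frame period
    print(f"T_on <= {t_on * 1e3:.2f} ms  ->  PWM duty <= {duty:.0%}")
    # Prints: T_on <= 5.67 ms  ->  PWM duty <= 34%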
4.5.2. Proposed Backlight Local Dimming
Ideally, the local backlight dimming technique, aided by the 2-D white LED array in
the test platform, would use the luminance (Y) values of the pixels. However, we
only have access to the 24-bit RGB signals (which exist in LVDS format in the FPGA)
for each pixel. To mitigate the computational complexity, we avoid converting the
RGB values to YUV values (which is a straightforward task, but is expensive to do
in HW). Instead, for all the pixels belonging to a region of the LED array (a
region is defined precisely in the discussion of Figure. 40), we estimate the
average grayscale value (denoted by G_avg) by taking MAX(R, G, B) for each pixel in
the region, summing these values, and dividing the sum by the total number of
pixels in the region. Next, this G_avg is used to set the LED luminance (denoted by
β) of the region using the linear function:

\[ \beta = f(G_{avg}) \quad \text{where} \quad f(G_{avg}) = \frac{G_{avg}}{255} \qquad (31) \]
Given the value of G_avg, equation (31) calculates the corresponding backlight
level, β, where 0 ≤ β ≤ 1. This β linearly maps to a PWM duty ratio of 0~100%.
Note that in equation (31), β is a continuous function. However, it is not feasible
to implement a continuous function in digital systems; therefore, in our experiment
β is discretized into 11 different levels:

\[ \beta_{discrete} = \left\lfloor \frac{G_{avg}}{255} \times 10 \right\rfloor \qquad (32) \]
As a result of this local dimming, we maintain the highest perceived luminance for
the brightest areas of the image (when G_avg = 255), since the backlight level is
set to the highest value (100% PWM duty cycle). Conversely, we decrease the
perceived luminance of the dark areas (when G_avg = 0), since the backlight is
turned off (0% PWM duty cycle), thereby increasing the static contrast ratio.
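The per-region computation of equations (31) and (32) is compact enough to sketch
directly. The following NumPy snippet illustrates the logic that is realized in the
FPGA (it is not the FPGA implementation itself):

    import numpy as np

    def region_backlight_level(rgb_region):
        """Map one region's 8-bit RGB pixels to a backlight level in {0, ..., 10}.

        rgb_region: uint8 array of shape (rows, cols, 3).
        """
        g_avg = rgb_region.max(axis=2).mean()         # mean of per-pixel MAX(R, G, B)
        level = int(np.floor(g_avg / 255.0 * 10.0))   # equation (32)
        return g_avg, level

    # A mid-gray region lands near the middle of the 11 levels.
    region = np.full((60, 80, 3), 128, dtype=np.uint8)
    print(region_backlight_level(region))             # (128.0, 5)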
Note that most backlight scaling techniques [5] [7] [31] [33] [34] attempt to
compensate for the perceived luminance loss by applying a pixel transformation
technique. When a single pixel transformation is applied to a still image, the
result is quite good. Unfortunately, when a pixel transformation technique is
applied to a sequence of images, i.e., video, an undesirable flickering effect is
often created [34], since each image may use a different pixel transformation
function, resulting in potentially dramatic changes in the values that a pixel
assumes from one frame to the next. Moreover, to apply a pixel transformation
technique to the LED array backlighting technology, one must perform histogram
analysis and determine the pixel transformation function for each of the
N_led = M x Q regions in the target frame, and subsequently apply the pixel
transformation to all pixels of each region while adjusting the LED backlight for
that region. Such computations slow down the display process (by as much as
10~15%, as reported in [34]), resulting in a degradation of the actual refresh
rate.
Besides, pixel transformation requires extra HW resources (e.g., a daughter board,
memory, FPGA) to store the aforesaid histograms, the results of the pixel
transformation, etc., for each of the LED regions for every frame. Such overhead is
costly. In addition, digital signal processing techniques (e.g., the LPFs in [34])
are needed to suppress both the color break-up across the LED regions (i.e.,
spatially) and the video flickering (i.e., temporally). However, spatial (temporal)
LPFs tend to lower the static (dynamic) contrast ratio, which is again undesirable.
Our proposed local dimming approach avoids all these complications thanks to its
simple nature and cost-efficient realization.
Figure. 39 Test environment and circuit blocks in the panel.
4.5.3. Experiment and Results
For the test platform, we used a 40-inch Samsung LCD TV (Model LN-T4081F), which
is capable of rendering images in full HD (i.e., 1080p) and supports up to a 120 Hz
refresh rate. Like other LCD TVs, this TV requires high brightness and hence uses a
direct-lit LCD panel (cf. Figure. 40). It also provides user-settable backlight
levels from 0 to 10. For the experiments, we disabled all built-in features of this
TV, implemented the proposed ideas, and measured/observed the performance in terms
of power dissipation, (static) contrast ratio, and the motion blur artifact. From
here onwards, we denote the proposed local dimming as 'PLD', the proposed backlight
scanning as 'PBS', and the original image as 'Original'.
Figure. 40 LED arrangement in LCD Panel.
Figure. 40 shows the LED backlight arrangement in the platform; it has a total of
768 white LEDs, divided into 8 segments (2x4). Each segment has 8 regions (4x2),
and every region has a string of 12 LEDs (4x3). The luminance of the LEDs in a
string is controlled as a whole, where V_dd of 120 V is applied to the LED shown in
blue while a variable voltage of 80-90 V is applied to the LED shown in red. As a
result, the peak-to-peak voltage of the PWM signal is 40 V, and Supertex HV9980
chips [82] are the major components for this PWM signal generation.
The proposed backlight scanning and local dimming algorithms were implemented in a
Xilinx FPGA (Spartan 3E), and the HW was modified such that the new PWM signals,
generated based on the proposed ideas, control the luminance of the LEDs. We also
used a Klein K-10 colorimeter [83] and an Extech power analyzer [84] (Model #
380803) to measure the color parameters and the power consumption, respectively.
Although we are able to greatly reduce the motion blur artifact with 'PBS', we
unfortunately cannot produce quantitative results for it. Hence, in the following
we only compare the experimental results of 'Original' and 'PLD'.
Figure. 41 PWM duty cycle changes across different backlight levels.
Figure. 41 shows the measured PWM duty cycle across 11 different backlight levels
for 'Original' (curve in red) and 'PLD' (curve in blue). The red curve shows the
PWM duty ratio for 'Original', which is independent of the grayscale values of the
pixels. The blue curve, on the other hand, shows the PWM duty cycles corresponding
to different grayscale values for 'PLD'. The reason for setting the grayscale
values differently is to expose the nature of the 'PLD' technique, which scales the
backlight levels based on the pixels' grayscale values, as explained earlier. To be
specific, the 256 grayscales (G) are quasi-equally distributed across the 11
backlight levels (BL), e.g., BL_0 = G_0, BL_1 = G_25, ..., BL_10 = G_255. As can be
seen for 'PLD', the duty cycle at BL_10 is 100% while it is aggressively decreased
as BL goes down, eventually reaching 0%.
Figure. 42 Luminance changes in the ‘Original’ vs. ‘PLD’.
Figure. 42 shows the measured absolute luminance in the same experimental setup.
Although the luminances of 'Original' and 'PLD' are the same at BL_10, the
luminance of 'PLD' goes down at a much faster rate than that of 'Original' as we
move toward BL_0. Note that the comparison of these two curves does not by itself
show the static contrast ratio enhancement of 'PLD' over 'Original', because the
grayscale settings for the two curves are different. However, if we set the whole
screen to G_0 in 'Original' and then draw the curves, the curve for 'Original'
looks similar to the 'PLD' curve but with some key differences, especially in the
circled area, which demonstrates the static contrast ratio enhancement of 'PLD'. In
our experiment, 'PLD' improves the static contrast ratio at G_0 by a factor of 3
compared to 'Original'.
Figure. 43 shows the total power consumption of the TV. The upper two curves
correspond to 'Original' while the bottom curve corresponds to 'PLD'. Note that the
point-to-point difference at each BL between the upper two curves shows that the
grayscale difference causes only a rather small difference in power dissipation. In
contrast, the difference between the upper curve(s) and the bottom curve at each BL
indicates the more significant power savings achieved by 'PLD'.
Figure. 43 Power consumptions in the ‘Original’ vs. ‘PLD’.
Figure. 44 shows an exemplary power consumption trace of the LCD TV for a movie
clip, 'Iron Man' [85]. As shown, the power consumption of 'Original' (upper curve)
is relatively constant at about 220 Watts, whereas the 'PLD' curve (lower curve)
varies below it. The average power saving is 11% in this case. Another power
consumption result, for 'Indiana Jones', is shown in Figure. 45. In this case, the
average power saving is 10%. In both Figure. 44 and Figure. 45, the relatively
large gaps between the two curves at some time instances indicate that the frame
luminance at those instances was relatively low, so we could locally dim the
backlight and save a sizeable amount of power.
Figure. 44 Power consumption of the ‘Iron Man’.
Figure. 45 Power consumption of the ‘Indiana Jones’.
Chapter 5 CONCLUSION
This dissertation started by presenting a dynamic thermal management (DTM)
technique for MPEG-2 decoding, which allows some degree of spatiotemporal
image/video quality degradation. Given a target MPEG-2 decoding time, this
technique dynamically selects either an intra-frame spatial degradation or an
inter-frame temporal degradation strategy so as to ensure that the microprocessor
chip continues to stay in a thermally safe state of operation, albeit with a
certain amount of image/video quality loss.
The dissertation next described an activity-migration-based DTM technique for
register files that exploits program behavior in terms of register file utilization
such that a minimal performance penalty is paid while reducing the steady-state
temperature. The proposed idea does not introduce any redundant functional block
and does not incur a processor-wide performance penalty in terms of IPC
degradation.
The dissertation then described a novel DTM technique based upon (per-frame
decoding) slack time estimation and its distribution across the GOP to achieve a
thermally safe state of operation in microprocessors during MPEG-2 decoding. This
idea incorporates DVFS while considering the per-frame decoding deadlines and
temperature constraints within the GOP.
The dissertation also described a temporally-aware backlight scaling technique for
video, which minimizes the spatiotemporal distortion in perceived brightness
between the original and the backlight-scaled images in MPEG video streams, based
upon an analysis of HVS characteristics. Consequently, the perceived flickering in
video is dramatically reduced while the energy savings are maximized.
The dissertation concluded by introducing two new backlight control techniques: a
1-D LED backlight scanning technique and a 2-D LED backlight local dimming
technique. The former reduces the motion blur artifact by effectively breaking the
scan-and-hold style of image display in LCDs, while the latter enhances the static
contrast ratio by locally dimming the backlights. In both cases, along with the
video/image quality enhancement, power savings are achieved as well.
Extensions of this dissertation include the on-the-fly use of micro-architectural
reconfiguration to control temperature rise, and dynamic backlight scaling for new
display technologies.
REFERENCES
[1] E. B. Bellers, J. G. M. Janssen, and M. Penners, “Motion Compensated Frame
Rate Conversion for Motion Blur Reduction,” SID International Symposium
Digest, pp. 1454-1457, May, 2007.
[2] H. R. Blackwell, “Contrast Thresholds of Human Eye,” Journal of the Optical
Society of America, Vol. 11, No. 36, pp. 624-643, November, 1946.
[3] D. Brooks, V. Tiwari, and M. Martonosi, “Wattch: A Framework for
Architectural-Level Power Analysis and Optimizations,” IEEE/ACM
International Symposium on Computer Architecture, pp. 83-94, June, 2000.
[4] D. Brooks and M. Martonosi, “Dynamic Thermal Management for High-
Performance Microprocessors,” IEEE International Symposium on High-
Performance Computer Architecture, pp. 171-182, January, 2001.
[5] N-H. Chang, I-S. Choi, and H-J. Shim, “DLS: Dynamic Backlight Luminance
Scaling of Liquid Crystal Display,” IEEE Trans. on Very Large Scale Integration
Systems, Vol. 12, No. 8, pp. 837-846, August, 2004.
[6] W-C. Cheng and M. Pedram, “Chromatic Encoding: A Low Power Encoding
Technique for Digital Visual Interface,” IEEE Trans. on Consumer Electronics,
Vol. 50, No. 1, pp. 320-328, February, 2004.
[7] W-C. Cheng and M. Pedram, “Transmittance Scaling for Reducing Power
Dissipation of a Backlit TFT-LCD,” Ultra Low-power Electronics and Design,
Enrico Macii, Kluwer Academic Publisher, June, 2004.
[8] H. Chen, J-H. Sung, T-H. Ha, and Y-J. Park, “Locally Pixel-Compensated
Backlight Dimming for Improving Static Contrast on LED Backlit LCDs,” SID
International Symposium Digest, pp. 1339-1342, May, 2007.
[9] K-H. Choi, K. Dantu, W-C. Cheng, and M. Pedram, “Frame Based Dynamic
Voltage and Frequency Scaling for a MPEG Decoder,” IEEE/ACM International
Conference on Computer Aided Design, pp. 732-737, November, 2002.
[10] K-H. Choi, R. Soma, and M. Pedram, “Off-chip Latency Driven Dynamic
Voltage and Frequency Scaling for an MPEG Decoding,” IEEE/ACM Design
Automation Conference, pp. 544-549, June, 2004.
[11] S-W. Chung and K. Skadron, “Using On-chip Event Counters for High
Resolution, Real-time Temperature Measurement,” IEEE Intersociety
Conference on Thermal and Thermo-mechanical Phenomena in Electronics
Systems, pp. 114-120, June, 2006.
[12] B. H. Crawford, “Visual Adaptation in Relation to Brief Conditioning Stimuli,”
Royal Society, Series B, Vol. 134, No. 875, pp. 283-302, March, 1947.
[13] T. Euler and R. H. Masland, “Light-Evoked Responses of Bipolar Cells in a
Mammalian Retina,” Journal of Neurophysiology, Vol. 83, No. 4. pp. 1817-1829,
April, 2000.
[14] N. Fisekovic and J. Bruinink, “Scanning Backlight Parameters for Achieving the
Best Picture Quality in AM LCD,” Euro Display, pp. 533-536, May, 2002.
[15] J. D. Foley, A. V. Dam, S. K. Feiner, and J. F. Hughes, “Computer Graphics:
Principles and Practice,” Addison-Wesley, June, 1990.
[16] C. Friedburg, M. M. Thomas, and T. D. Lamb, “Time Course of the Flash
Response of Dark- and Light- Adapted Human Rod Photoreceptors Derived from
the Electroretinogram,” Journal of Physiology, Vol. 534, No. 1, pp. 217-242,
July, 2001.
[17] F. Gatti, A. Acquaviva, L. Benini, and B. Ricco, “Low Power Control
Techniques for TFT LCD Displays,” ACM International Conference on
Compilers, Architecture, and Synthesis for Embedded Systems, pp. 218-224,
October, 2002.
[18] N. Graham and D. C. Hood, “Modeling the Dynamics of Light Adaptation: The
Merging of Two Traditions,” Vision Research, Vol. 32, No. 7, pp. 1373-1393,
July, 1992.
[19] P. D. Greef and H. G. Hulze, “Adaptive Dimming and Boosting Backlight for
LCD-TV Systems,” SID International Symposium Digest, pp. 1332-1335, May, 2007.
[20] P. D. Greef and H. G. Hulze, “Adaptive Scanning, 1-D Dimming, and Boosting
Backlight for LCD-TV Systems,” Journal of SID, Vol. 14, No. 12, pp. 1103-
1110, December, 2006.
[21] S. H. Gunther, F. Binns, D. M. Carmean, and J. C. Hall, “Managing the Impact
of Increasing Microprocessor Power Consumption,” Intel Technology Journal
Q1, pp. 1-9, 2001.
[22] Y. Han, I. Koren, and C. M. Krishna, “Temptor: A Lightweight Runtime
Temperature Monitoring Tool Using Performance Counters,” Workshop on
Temperature-Aware Computer Systems, June, 2006.
[23] Y. Han, I. Koren, and C. A. Moritz, “Temperature Aware Floor-planning,”
Workshop on Temperature-Aware Computer Systems, June, 2005.
[24] S-M. Heo, K. Barr, and K. Asanovic, “Reducing Power Density through Activity
Migration,” IEEE/ACM International Symposium on Low Power Electronics and
Design, pp. 217-222, August, 2003.
[25] J-I. Hirakata, A. Shingai, Y. Tanaka, K. Ono, and T. Furuhashi, “Super-TFT
LCD for Moving Picture Images with the Blink Backlight System,” SID
International Symposium Digest, pp. 990-993, May, 2001.
[26] D. C. Hood, “Lower-Level Visual Processing and Models of Light Adaptation,”
Annual Review of Psychology, Vol. 49, pp. 503-535. 1998.
[27] W. Huang, M. R. Stan, K. Skadron, and K. Sankaranarayanan, “Compact
Thermal Modeling for Temperature-Aware Design,” IEEE/ACM Design
Automation Conference, pp. 878-883, June, 2004.
[28] W. Huang, M. R. Stan, and K. Skadron, “Parameterized Physical Compact
Thermal Modeling,” IEEE Trans. on Components, Packaging, and
Manufacturing Technology, Vol. 28, No. 4, pp. 615-622, December, 2005.
[29] W. Huang, S. Ghosh, S. Velusamy, K. Sankaranarayanan, K. Skadron, and M. R.
Stan, “HotSpot: A Compact Thermal Modeling Methodology for Early-Stage
VLSI Design,” IEEE Trans. on Very Large Scale Integration Systems, Vol. 14,
No. 5, pp. 501-513, May, 2006.
[30] H-C. Hung and C-W. Shih, “Improvement in Moving Picture Quality by
Scanning Backlight System,” International Display Manufacturing Conference,
pp. 472-474, February, 2005.
[31] A. Iranli, H. Fatemi, and M. Pedram, “HEBS: Histogram Equalization for
Backlight Scaling,” IEEE/ACM Design Automation and Test in Europe, pp. 346-
351, March, 2005.
[32] A. Iranli and M. Pedram, “DTM: Dynamic Tone Mapping for Backlight
Scaling,” IEEE/ACM Design Automation Conference, pp. 612-617, June, 2005.
[33] A. Iranli, W-B. Lee, and M. Pedram, “Backlight Dimming in Power-Aware
Mobile Displays,” IEEE/ACM Design Automation Conference, pp. 604-607, July,
2006.
[34] A. Iranli, W-B. Lee, and M. Pedram, “HVS-Aware Dynamic Backlight Scaling
in TFT-LCDs,” IEEE Trans. on Very Large Scale Integration Systems, Vol. 14,
No. 10, pp. 1103-1116, October, 2006.
[35] J. S. Kenkre, N. A. Moran, T. D. Lamb, and O. A. R. Mahroo, “Extremely Rapid
Recovery of Human Cone Circulating Current at the Extinction of Bleaching
Exposures,” Journal of Physiology, 567.1. pp. 95-112, June, 2005.
[36] T-S. Kim, B-I. Park, B-H. Shin, B. H. Berkeley, and S-S. Kim, “Response Time
Compensation for Black Frame Insertion,” SID International Symposium Digest,
June, 2006.
[37] A. Kumar, L. Shang, L-S. Peh, and N. K. Jha, “HybDTM: A Coordinated
Hardware-Software Approach for Dynamic Thermal Management,” IEEE/ACM
Design Automation Conference, pp. 548-553, July, 2006.
[38] E. Kursun, G. Reinman, S. Sair, A. Shayesteh, and T. Sherwood, “Low-Overhead
Core Swapping for Thermal Management,” Lecture Notes on Computer Science,
pp. 46-60, December, 2005.
[39] K-J. Lee and K. Skadron, “Using Performance Counters for Runtime
Temperature Sensing in High-Performance Processors,” Workshop on High-
Performance, Power-Aware Computing, April, 2005.
[40] W-B. Lee, K. Patel, and M. Pedram, “Dynamic Thermal Management for
MPEG-2 Decoding,” IEEE/ACM International Symposium on Low Power
Electronics and Design, pp. 316-321, October, 2006.
[41] W-B. Lee, K. Patel, and M. Pedram, “GOP-Level Dynamic Thermal
Management in MPEG-2 Decoding,” IEEE Trans. on Very Large Scale
Integration Systems, 2008.
[42] B-W. Lee, C-W. Park, S-I. Kim, M-B. Jeon, J. Heo, D-S. Sagong, J-S. Kim, and
J-H. Souk, “Reducing Gray-Level Response to One Frame: Dynamic
Capacitance Compensation,” SID International Symposium Digest, pp. 1260-
1263, June, 2001.
[43] C. H. Lim, W. R. Daasch, and G. Cai, “A Thermal-Aware Superscalar
Microprocessor,” IEEE/ACM International Symposium on Quality Electronic
Design, pp. 517-522, March, 2002.
[44] C. T. Lie, A. Wang, H-J. Hong, Y-J. Hsieh, M-S. Lai, A. Tsai, T-M. Wang, M-J.
Jou, W-C. Chang, S-L. Sui, J-H. Liao, M-F. Tien, “Color and Image
Enhancement for Large-Size TFT-LCD TVs,” SID International Symposium
Digest, pp. 1730-1733, May, 2005.
[45] O. A. R. Mahroo and T. D. Lamb, “Recovery of the Human Photopic
Electroretinogram after Bleaching Exposures: Estimation of Pigment
Regeneration Kinetics,” Journal of Physiology, pp. 417-437, October, 2003.
[46] V. G. Moshnyaga and E. Morikawa, “LCD Display Energy Reduction by User
Monitoring,” IEEE International Conference on Computer Design, pp. 94-97,
October, 2005.
[47] E-Y. Oh, S-H. Baik, M-H. Sohn, K-D. Kim, H-J. Hong, J-Y. Bang, K-J. Kwon,
M-H. Kim, H. Jang, J-K. Yoon, and I-C. Jung, “IPS-Mode Dynamic LCD-TV
Realization with Low Black Luminance and High Contrast by Adaptive Dynamic
Image Control Technology,” Journal of SID, Vol. 13, No. 3, pp. 215-219, March,
2005.
[48] K. Patel, W-B. Lee, and M. Pedram, “Active Bank Switching for Temperature
Control of the Register File in a Microprocessor,” IEEE/ACM Great Lake
Symposium on VLSI, pp. 231-234, March, 2007.
[49] M. Pedram, and S. Nazarian, “Thermal Modeling, Analysis, and Management in
VLSI Circuits: Principles and Methods,” IEEE, Vol. 94, No. 8, August, 2006.
[50] S. Salerno, A. Bocca, E. Macii, and M. Poncino, “Limited Intra-Word Transition
Codes: An Energy-Efficient Bus Encoding for LCD Display Interfaces,”
IEEE/ACM International Symposium on Low Power Electronics and Design, pp.
206-211, August, 2004.
[51] K. Sankaranarayanan, S. Velusamy, M. R. Stan, and K. Skadron, “A Case for
Thermal-Aware Floor-planning at the Micro-architectural Level,” Journal of
Instruction Level Parallelism, Vol. 7, October, 2005.
[52] T. Simunic, L. Benini, P. Glynn, G. D. Micheli, “Event-Driven Power
Management,” IEEE Trans. on Computer-Aided Design of Integrated Circuits
and Systems, Vol. 20, No. 7, pp. 840-857, July, 2001.
[53] K. Skadron, “Hybrid Architectural Dynamic Thermal Management,” IEEE/ACM
Design Automation and Test in Europe, pp. 10-15, February, 2004.
[54] K. Skadron, M. R. Stan, W. Huang, and S. Velusamy, “Temperature-Aware
Microarchitecture,” IEEE/ACM International Symposium on Computer
Architecture, pp. 2-13, June, 2003.
[55] K. Skadron, M. R. Stan, W. Huang, S. Velusamy, K. Sankaranarayanan, and D.
Tarjan, “Temperature-Aware Computer Systems: Opportunities and Challenges,”
IEEE/ACM International Symposium on Microarchitecture, pp. 52-61,
November / December, 2003.
[56] K. Skadron, M. R. Stan, K. Sankaranarayanan, W. Huang, S. Velusamy, and D.
Tarjan, “Temperature-Aware Micro-architecture: Modeling and
Implementation,” ACM Trans. on Architecture and Code Optimization, Vol. 1,
No. 1, pp. 94-125, March, 2004.
[57] K. Skadron, M. R. Stan, W. Huang, S. Velusamy, K. Sankaranarayanan, and D.
Tarjan, “Temperature-Aware Micro-architecture: Extended Discussion and
Results,” University of Virginia Tech Report: CS-2003-08, 2003.
[58] K. Skadron, T. Abdelzaher, and M. R. Stan, “Control-Theoretic Techniques and
Thermal-RC Modeling for Accurate and Localized Dynamic Thermal
Management,” IEEE International Symposium on High-Performance Computer
Architecture, pp. 17-28, February, 2002.
[59] A. A. S. Sluyterman and H. J. G. Gielen, “Architectural Choices in the Aptura
Scanning Backlight for Large LCD TVs,” Journal of SID, Vol. 14, No. 2. pp.
169-174, February, 2006.
[60] D. Son, C. Yu, and H. Kim, “Dynamic Voltage Scaling on MPEG Decoding,”
IEEE International Conference on Parallel and Distributed Systems, pp. 633-640,
June, 2001.
[61] J. Srinivasan and S. V. Adve, “Predictive Dynamic Thermal Management for
Multimedia Applications,” AMC International Conference on Supercomputing,
pp. 109-120, June, 2003.
[62] M. R. Stan, K. Skadron, M. Barcella, W. Huang, K. Sankaranarayanan, and S.
Velusamy, “HotSpot: A Dynamic Compact Thermal Model at the Processor-
Architecture Level,” Microelectronics Journal: Circuits and Systems, Vol. 34,
No. 12, pp. 1153-1165, December, 2003.
[63] S. S. Stevens and J. C. Stevens, “Brightness Function: Effects on Adaptation,”
Journal of the Optical Society of America, Vol. 53, No. 3, pp. 375-385, March,
1963.
[64] L. Tran, N. Nelson, F. Ngai, S. Dropsho, and M. Huang, “Dynamically Reducing
Pressure on the Physical Register File through Simple Register Sharing,” IEEE
International Symposium on Performance Analysis of Systems and Software, pp.
78-87, March, 2004.
[65] M. Verderber, A. Zemva, and A. Trost, “HW/SW Codesign of the MPEG-2
Video Decoder,” IEEE International Symposium on Parallel and Distributed
Processing, pp. 1-7, April, 2003.
[66] J. Verweij, D. M. Dacey, B. B. Peterson, and S. L. Buck, “Sensitivity and
Dynamics of Rod Signals in H1 Horizontal Cells of the Macaque Monkey Retina,”
Vision Research, Vol. 39, No. 22, pp. 3662-3672, November, 1999.
[67] A. Weissel and F. Bellosa, “Dynamic Thermal Management for Distributed
Systems,” Workshop on Temperature-Aware Computer Systems, June, 2004.
[68] T. E. Wiegand, D. C. Hood, and N. Graham, “Testing a Computational Model of
Light-Adaptation Dynamics,” Vision Research, Vol. 35, No. 21, pp. 3037-3051,
November, 1995.
[69] H. Wu, M. Claypool, and R. Kinicki, “Guidelines for Selecting Practical MPEG
Group of Pictures,” IASTED International Conference on Internet and
Multimedia Systems and Application, pp. 61-66, August, 2006.
[70] Y. Yokoyama, “Adaptive GOP Structure Selection for Real-time MPEG-2 Video
Decoding,” IEEE International Conference on Image Processing, pp. 832-835,
September, 2000.
[71] HotSpot at http://lava.cs.virginia.edu/HotSpot
[72] Mediabench at: http://euler.slu.edu/~fritts/mediabench
[73] MPEG-2 Standard: International Organization for Standardization/International
Electro-technical Commission (ISO/IEC) 13818-2.
[74] Berkeley MPEG at http://bmrc.berkeley.edu/frame/research/mpeg
[75] Simplescalar tutorial at http://www.simplescalar.com
[76] MPEG-2 Streams at http://www.mpeg2.de/video/streams
[77] LG Philips, LP064V1 Liquid Crystal Display.
[78] http://apollo.usc.edu/testbed/
[79] ANSI/IES. 1986. Nomenclature and Definitions for Illuminating Engineering,
ANSI/IES RP-16-1986. New York, NY: Illuminating Engineering Society of
North America.
[80] Pentium IV floor-plan at: http://www.chip-architect.com
[81] SPEC2000INT benchmark at: http://www.spec.org/cpu
[82] HV9980 chip at http://www.supertex.com
[83] K-10 colorimeter at http://www.kleininstruments.com/k_10.html
[84] Power analyzer at http://ww.extech.com/
[85] Movie trailers at http://www.comingsoon.net/trailers/
[86] USB data acquisition system at http://www.lanpoint.com/