ADAPTIVE RESOURCE MANAGEMENT IN DISTRIBUTED SYSTEMS
by
Abhishek Bhan Sharma
A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(COMPUTER SCIENCE)
December 2010
Copyright 2010 Abhishek Bhan Sharma
Acknowledgements
The Anfield faithful sing You'll Never Walk Alone to urge their team, Liverpool F.C., to glory on the
football pitch. In football, the success of a team springs from the efforts of its players, coaches/managers,
and its supporters. Similarly, I owe a tremendous amount of gratitude to my team for the completion
of my graduate career. For the past seven years (two at Boston University, and the rest at USC), several
people have played the role of coaches, team mates, and the supporters in my life.
First, the coaches. My advisors, Dr. Ramesh Govindan and Dr. Leana Golubchik, were terrific
mentors. They taught me the art of an intellectual pursuit: from grasping the big picture to obsession with
getting the important details right. Their confidence in my abilities and their patience with my failures
enabled me to succeed beyond my own expectations. I would also like to thank Dr. Murat Alanyali, Dr.
Azer Bestavros, and Dr. Ibrahim Matta from Boston University for their guidance and encouragement.
I am also grateful to several of my team mates. The folks at the Embedded Networks Lab at USC (Marcos, Nupur, Sumit, Om, Ki-Young, Jeongyeup, Moo-Ryong, Young-jin, Ramki, Nilesh, Shuai, and Luis) were not only an encyclopedia of information, but also excellent collaborators (particularly Jeongyeup and Moo-Ryong) and great friends. I also had the opportunity to collaborate on exciting
research with Dr. Michael Neely, Dr. Hui Zhang, Dr. Ranjita Bhagwan, and Yuan Yao.
My family has been my most ardent supporter. None of the wonderful opportunities and success
that I have had would have been possible without the love and support of my parents, Devanand and
Bimla Sharma, and their tremendous sacrifices. I am also very fortunate to have ma (Girija Bhan) and
papa (M. K. Bhan) in my life. Not only is their love for me unconditional and boundless, but they are
also exceptional life coaches. My other wonderful supporters include Manisha, Pratap, Aayu, Gautam,
Shereen, Arun, and Neera. I am also thankful to have Prachi and Sandeep as my go-to people; aside from showering me with lots of love and affection, they always waved their magic wands and solved my off-the-pitch problems time and again.
Finally, I would like to thank the best twelfth (wo)man ever. Sonam has been a constant source of
love, support, and inspiration in my life ever since I first met her. She brings both love and balance to my life, and I look forward to discovering the exciting things that lie ahead for us in the future.
Table of Contents
Acknowledgements ii
List of Tables vi
List of Figures vii
Abstract ix
Chapter 1: Introduction 1
Chapter 2: Literature Review 5
2.1 Resource management for computing-as-a-service . . . . . . . . . . . . . . . . . . . . . . 5
2.2 Workload characterization in Internet Services . . . . . . . . . . . . . . . . . . . . . . . . 7
2.3 Data compression and energy savings . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.4 Sensor data fault detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
Chapter 3: An Alternative Service Model for Shared Clusters 16
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
3.2 MRM Goals and Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
3.3 Service model: Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3.4 Service model: Realization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.4.1 Estimating a job's processing time . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.4.2 Price-deadline negotiator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3.4.3 Deadline-aware scheduler . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.4.4 Post-facto price adjustment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
3.4.5 MRM's online monitor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
3.5 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.5.1 Experimental Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.5.2 MRM: Validation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
3.5.3 Comparison with Other Approaches . . . . . . . . . . . . . . . . . . . . . . . . . 40
3.6 Conclusions and Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
Chapter 4: Automatic Request Categorization in Internet Services 44
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
4.2 Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
4.3 Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
4.4 Experimental Validation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
4.4.1 Experimental methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
4.4.2 Linearity validation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
4.4.3 Request categorization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
4.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
4.5.1 Accuracy improvement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
4.5.2 Workload characterization granularity . . . . . . . . . . . . . . . . . . . . . . . . 58
4.5.3 Alternatives to ICA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
4.5.4 Request Arrivals: Assumptions . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
4.5.5 Linear model applicability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
4.5.6 Data collection issues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
4.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
Chapter 5: Dynamic Data Compression in Multi-hop Wireless Networks 62
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
5.2 System Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
5.3 Optimization Goal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
5.4 The SEEC Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
5.4.1 Design of SEEC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
5.4.2 Performance Bounds on SEEC . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
5.5 DSEEC: Distributed Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
5.6 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
5.6.1 Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
5.6.2 Energy Savings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
5.6.3 Adapting to Dynamics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
5.6.4 Sensitivity to V . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
5.7 Conclusions and Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
Chapter 6: Sensor Faults: Detection Methods and Prevalence in Real-World Datasets 100
6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
6.2 Sensor Faults . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
6.2.1 What causes sensor faults ? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
6.3 Detection Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
6.3.1 Rule-based (Heuristic) Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
6.3.2 An Estimation-Based Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
6.3.3 A Time Series analysis based Method . . . . . . . . . . . . . . . . . . . . . . . . 112
6.3.4 A Learning-based Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
6.3.5 Hybrid Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
6.4 Evaluation: Injected Faults . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
6.4.1 SHORT Faults . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
6.4.2 NOISE Faults . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
6.4.2.1 Impact of Fault Duration . . . . . . . . . . . . . . . . . . . . . . . . . 123
6.4.2.2 Impact of Fault Intensity . . . . . . . . . . . . . . . . . . . . . . . . . 125
6.4.2.3 ARIMA model: Impact of parameter L . . . . . . . . . . . . . . . . . . 126
6.4.3 Other Hybrid methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
6.5 Faults in Real-World data sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
6.5.1 SensorScope . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
6.5.2 INTEL Lab, Berkeley data set . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
6.5.3 Great Duck Island (GDI) data set . . . . . . . . . . . . . . . . . . . . . . . . . . 135
6.5.4 NAMOS data set . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
6.6 Discussion and Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
6.7 Summary and Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
Chapter 7: Conclusion 143
References 144
List of Tables
3.1 Cluster configurations used in MRM evaluation. . . . . . . . . . . . . . . . . . . . . . . . . . 34
3.2 Workload used in MRM evaluation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
3.3 Deadline violations with MRM in the no-slack scenario. . . . . . . . . . . . . . . . . . . . . . 39
3.4 MRM service differentiation: finish time (in minutes) percentile . . . . . . . . . . . . . . . . . . 39
3.5 Comparison: Delay scheduler, priority scheduler, and MRM . . . . . . . . . . . . . . . . . . . . 41
4.1 Per-request CPU demand estimates for the PetShop benchmark (in milliseconds). . . . . . 54
5.1 System Model Notation: symbols t and n denote the current time and a node, respectively. 69
5.2 Data Collection Trees . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
5.3 DSEEC, deep-tree: % of packets compressed; change in application load and link quality . 97
6.1 Fault Detection Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
6.2 Low Intensity NOISE faults, NAMOS, Percentage of faulty samples detected . . . . . . . 128
6.3 Low intensity NOISE faults, Percentage of faulty samples detected . . . . . . . . . . . . . 129
6.4 False Positives as % of total # samples (2880) . . . . . . . . . . . . . . . . . . . . . . . . 129
6.5 Real-world datasets and Detection methods . . . . . . . . . . . . . . . . . . . . . . . . . 131
6.6 SensorScope: SHORT faults . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
6.7 Intel Lab: SHORT Faults, Temperature . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
6.8 Datasets: Prevalence of faults . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
6.9 Detection methods: Performance on injected SHORT faults . . . . . . . . . . . . . . . . . 141
6.10 Detection methods: Performance on injected NOISE faults . . . . . . . . . . . . . . . . . 141
List of Figures
3.1 MRM in action . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
3.2 Price vs. slack . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.3 Premium factor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.4 Example of MRM's estimation of earliest finish time for a new job. . . . . . . . . . . . . 30
3.5 Accuracy of MRM's processing time estimates. . . . . . . . . . . . . . . . . . . . . . . . . . . 39
3.6 Job finish times . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
3.7 Job completion times with different schedulers. . . . . . . . . . . . . . . . . . . . . . . . . . . 42
4.1 IIS and SQL server CPU usage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
4.2 PetShop: Request estimate comparison . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
5.1 Cluster-tree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
5.2 Shallow-tree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
5.3 Deep-tree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
5.4 Energy consumption: never compress (first), SEEC-WT (second), DSEEC (third), always compress
(fourth) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
5.5 Duty-cycling overhead . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
6.1 Errors in sensor readings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
6.2 NOISE fault: Increase in variance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
6.3 Histogram Shape . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
6.4 LLSE on NAMOS dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
6.5 Injected SHORT faults . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
6.6 NOISE Fault: 3000 samples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
6.7 NOISE Fault: 2000 samples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
6.8 NOISE Fault: 1000 samples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
6.9 NOISE Fault: 100 samples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
6.10 NOISE, ARIMA: 60 samples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
6.11 NOISE, ARIMA: 120 samples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
6.12 NOISE, ARIMA: 720 samples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
6.13 High intensity, Long duration NOISE faults . . . . . . . . . . . . . . . . . . . . . . . . . 126
6.14 CONSTANT and NOISE faults . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
6.15 Intel data set: NOISE faults . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
6.16 Intel data set: Prevalence of NOISE faults . . . . . . . . . . . . . . . . . . . . . . . . . . 133
6.17 Intel data set: ARIMA methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
6.18 SHORT Faults in GDI data set . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
6.19 SHORT Faults: Light Sensor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
6.20 NAMOS data set: NOISE/CONSTANT faults . . . . . . . . . . . . . . . . . . . . . . . . 137
Abstract
In this dissertation, we focus on resource management in distributed systems. The essence of resource
management is to match the requirements of computing tasks with the available resources. We propose
and develop approaches to resource management in three qualitatively different systems: (1) server clus-
ters providing computing-as-a-service, (2) tiered architectures (of servers) hosting web services, and (3)
networks of wireless sensors. These systems differ from each other along multiple dimensions: available
resources, system dynamics, workload, etc. Still, a common theme in effective resource management for
these systems (as demonstrated in this dissertation) is that we must be cognizant of the system heterogeneity
(computing resources as well as workload), and adapt to system dynamics.
Our work improves upon the state-of-the-art in the three systems in the following way. For systems
providing computing-as-a-service, we design and implement a service model that provides predictability
in job finish times and prioritized service to delay-sensitive jobs. We also develop a machine-learning-based workload characterization technique for web services that categorizes users' requests based on their
resource usage. Such categorization is useful in improving the accuracy of performance models for these
systems. In the context of wireless sensor networks, we make the following two contributions: (1) we
design an online algorithm that makes joint compression and transmission decisions to save energy, and
(2) we explore techniques for detecting anomalies in data collected using these networks.
Chapter 1
Introduction
Distributed computing systems are the underlying IT infrastructure for several applications and services that we have come to rely on. Ubiquitous web-based services (search, e-mail, social networking, e-commerce, etc.) hosted by large data centers are a well-known example of the success of distributed computing. Apart from enhancing the scale, efficiency, and robustness of computing services, distributed systems have provided an innovation infrastructure for developing new classes of services. These include cloud-based services such as computing-as-a-service [1], software-as-a-service [77], location-based services (for smartphone users) [55], and participatory sensing [58].
The main attraction of distributed computing is that it enables multiple autonomous computing devices (e.g., desktops, servers, smartphones) to cooperate and accomplish a common goal. At times, such cooperation is necessary because individual devices do not have enough computing resources (processing speed, memory, etc.) to accomplish the task at hand. But other considerations, such as robustness to failures, cost, and limits on response time, can also make a distributed architecture preferable to a centralized one. For instance, the MapReduce [20] programming framework allows a cluster of low-end servers (limited resources but lower cost) to provide the same functionality as a high-end (resource-rich but expensive) database management system.
However, the flip side of this improved functionality is that distributed systems are often harder to
manage. A few of the important tasks related to managing large, distributed systems include resource
management, ensuring fault tolerance, and maintenance and upgrade. Performing these tasks in the pres-
ence of multiple components with complex dependencies and failure modes is a challenging task. In this
dissertation, we focus on resource management in distributed systems.
The essence of resource management is to match the requirements of computing tasks with the avail-
able resources. This is typically done in the context of certain goals (users' as well as operators') and
real-world constraints (monetary and technological). The main challenges in resource management can
broadly be classified into three categories.
1. Workload characterization: understanding the system workload and its resource needs.
2. Resource allocation: how to arbitrate access to resources across multiple competing tasks?
3. Adapting to system dynamics: component failures, changes in workload, etc.
In this dissertation, we propose and develop approaches to resource management in three qualitatively
different systems: (1) server clusters providing computing-as-a-service, (2) tiered architectures (of servers)
hosting web services, and (3) networks of wireless sensors. These systems differ from each other along
multiple dimensions: available resources, system dynamics, workload, etc. For example, in wireless (sensor) networks, network connectivity and the available bandwidth change much more rapidly than is the case with server clusters. Still, a common theme in effective resource management for these systems (as demonstrated in this dissertation) is that we must be cognizant of system heterogeneity (computing resources as well as workload) and adapt to system dynamics. Next, we briefly introduce the four pieces of work from which this dissertation is composed.
First, we design and implement a service model for computing-as-a-service. Such services are hosted
over large server clusters and allow users to rent computing resources on-demand [1, 30]. A service model
is the interface between the user and the operator that determines the type of service provided. Currently,
relatively simplistic models seem to be the norm, where the operator undertakes to provide resources to
complete a job, but does not provide any assurance of when the job will be completed (predictability)
or provides limited ways in which users can ask for different levels of service (service differentiation).
For instance, AWS and Azure use a rental-based service model, in which users can choose from a range of virtual machine instances (of different sizes) and pay a fixed rate for each instance; the system makes no statement about when jobs finish. Most grid-computing infrastructures, managed using the Portable
Batch System (PBS) or its variants, charge users based on resource usage (e.g., node hours), and provide
differentiation using a few discrete priority queues that are differentiated by job size and duration. We
explore a part of the design space of service models by considering a design that attempts to provide both
predictability in finish times and the capability for differentiation by allowing users to select desired finish times (for example, choosing an earlier finish time for a delay-sensitive job). Our approach, called MRM (for Map-reduce Market), achieves these goals through a novel pricing mechanism. This, in combination with deadline-based scheduling, can both ensure predictable finish times and provide users with a choice of finish times.
Next, we discuss a learning-based approach for characterizing the workload for web services [81]. Our
work addresses an important challenge related to parameterizing the performance models for multi-tiered
web services. The accuracy of these models benefits substantially from differentiating among categories of
requests based on their resource usage characteristics. However, categorizing requests and their resource
demands often requires significant monitoring infrastructure [5, 15]. Such infrastructure is certainly feasi-
ble to engineer, but often requires a deeper understanding of both the workload and the service architecture.
As a result, (invasive) monitoring-based solutions have not been widely adopted. We propose a method to
automatically differentiate and categorize requests without requiring sophisticated monitoring techniques.
Using machine learning, our method requires only aggregate measures such as total number of requests
and the total CPU and network demands, and does not assume prior knowledge of request categories or
their individual resource demands. Our evaluations with benchmark workloads show that this approach
works well while being lightweight, generic, and easily deployable.
Finally, we address two problems related to resource management in wireless sensor networks. One
deals with making adaptive decisions about when to compress data with the goal of minimizing overall
energy consumption [82]. Data compression can save energy and increase network capacity in wireless
sensor networks. However, the decision of whether and when to compress data can depend upon platform
hardware, topology, wireless channel conditions, and application data rates. Using Lyapunov optimization
theory, we design an algorithm called SEEC that makes joint compression and transmission decisions with
the goal of minimizing energy consumption. A practical distributed variant, DSEEC, is able to achieve
more than 30% energy savings and adapts seamlessly across a wide range of conditions, without explicitly
taking topology, application data rates, and link quality changes into account.
The second problem targets detecting faults in sensor systems. We explore four qualitatively different
techniques for detecting data faults: faulty sensor readings that deviate from the normal pattern exhibited by true (or non-faulty) sensor readings [84]. Our analysis of sensor data collected from real-world deployments shows that such data faults occur quite often. This work forms an important stepping stone towards developing online methods for fault detection [101]. Online fault detection is essential for isolating faulty components in the network. Once they are identified, we can save resources by not involving them until they are fixed (e.g., asking sensor nodes with faulty readings to stop sending data until they are repaired).
This dissertation is organized as follows. In Chapter 2, we review the related work. In Chapter 3,
we discuss the design and implementation of MRM, our novel service model for computing-as-a-service.
In Chapter 4, we describe our work on automatic request characterization for web services. Chapter 5
presents our work on dynamic data compression in wireless (sensor) networks, and Chapter 6 describes
our work on sensor data fault detection. We summarize our work and conclude in Chapter 7.
Chapter 2
Literature Review
This dissertation includes four distinct topics within the field of resource management in distributed systems. Accordingly, this chapter is organized into four sections. We first survey the work related to MRM,
our service model for systems providing computing-as-a-service. We then discuss related work in work-
load characterization for web services. In the third and the fourth section, we discuss work related to
data compression and energy management in wireless networks, and anomaly detection in the context of
wireless sensor networks, respectively.
2.1 Resource management for computing-as-a-service
Our work in developing MRM, a service model for computing-as-a-service, is related to prior work in several areas, namely market-based mechanisms for resource allocation, congestion pricing, job scheduling for shared (MapReduce) clusters, and deadline-aware scheduling. Next, we survey several relevant pieces of work from these areas.
Market-based mechanisms. There is a rich history in applying market-based approaches for resource
management, in particular for grid and parallel computing environments [97, 90, 11, 91]. The key idea in
these approaches is to create a market that can bring the sellers (owners of computational resources) and
the buyers (users with jobs to execute) together. Often, prices for resources are set using either repeated
auctions or trading [97, 91, 11]. This allows buyers (sellers) to adjust their bids (minimum resource price)
based on the current demand and supply. As with any market-based system, the goal of sellers is to
maximize revenue while the buyers optimize their cost-performance ratio. These auction-based systems
are best suited for federated clusters with multiple independent owners for resources.
MRM's pricing mechanism is designed for clusters owned by a single entity. With a single administra-
tive entity, the overhead of repeated auctions can be avoided by having a centralized entity set prices [90].
Here, users are assumed to be price takers, and the main challenge lies in setting resource price to achieve
desired objectives. Previous work has considered objectives such as maximizing utilization (or revenues),
user satisfaction, and fair resource allocation. In contrast to this body of work, MRM uses pricing to
achieve predictability in finish times and service differentiation for delay-sensitive jobs.
Congestion pricing. MRM's price-deadline curve is similar in spirit to prior work on pricing to alleviate
congestion in other resources such as network bandwidth [66], freeways [104], and parking spots [25].
Flat rather than per-byte pricing has emerged as the dominant pricing strategy for network access, as the former has been seen to encourage greater network access. However, flat pricing is unsuitable for shared multi-tasking clusters due to its inability to differentiate jobs based on their delay tolerance. On the other hand, while freeway tolls and parking prices are designed to maximize profit, we develop a non-profit scheduler that designs prices to provide service differentiation while ensuring performance predictability.
Also, similar to MRM's load-sensitive pricing, there have been recent initiatives to price parking based
on current congestion levels [80].
Computing processing time estimates. To our knowledge, the only other work that uses historical data to
estimate the processing time of MapReduce jobs is [41]. Like MRM, that work also uses Hadoop-specific features to characterize jobs. To estimate the processing time of job A, they first do an N-nearest-neighbor
search in the feature space for jobs similar to A executed in the past, and then use locally-weighted linear
regression to estimate A's processing time. However, unlike MRM, their goal is to detect performance
problems on the cluster using prior estimates of job processing times. As such, it is crucial for their
work to have fairly precise processing time estimates; MRM's estimation technique is more conservative
because it attempts to bound the worst case behavior in order to avoid deadline violations.
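As a rough sketch of the approach in [41] (not MRM's estimator, which Section 3.4.1 describes), the snippet below finds the past jobs nearest to a new job in a feature space and fits a distance-weighted linear model to their measured processing times. The feature set, neighborhood size, and kernel are illustrative assumptions only.

```python
import numpy as np

def estimate_processing_time(job_features, past_features, past_times, n_neighbors=10):
    """Locally weighted linear regression over the n_neighbors past jobs
    closest to job_features (e.g., input size in GB, number of tasks)."""
    x = np.asarray(job_features, dtype=float)
    X = np.asarray(past_features, dtype=float)
    y = np.asarray(past_times, dtype=float)
    dists = np.linalg.norm(X - x, axis=1)
    idx = np.argsort(dists)[:n_neighbors]
    h = dists[idx].max() + 1e-12                  # bandwidth scaled to the neighborhood radius
    w = np.sqrt(np.exp(-(dists[idx] / h) ** 2))   # Gaussian weights: closer jobs count more
    A = np.hstack([X[idx], np.ones((len(idx), 1))])
    coef, *_ = np.linalg.lstsq(A * w[:, None], y[idx] * w, rcond=None)
    return float(np.r_[x, 1.0] @ coef)

# Toy history where processing time grows linearly with input size and task count.
rng = np.random.default_rng(7)
history = rng.uniform([1, 10], [100, 500], size=(200, 2))    # [input GB, number of tasks]
times = 2.0 * history[:, 0] + 0.5 * history[:, 1] + rng.normal(0, 5, 200)
print(estimate_processing_time([40, 200], history, times))   # roughly 2*40 + 0.5*200 = 180
```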
Several other approaches have been proposed for estimating the (remaining) processing time of a
task/job once it starts executing [2, 57, 56]. While not directly applicable to MRM, which needs a priori
estimates, such approaches can help account for dynamic network and disk contention when computing
processing time estimates, as discussed in Section 3.5.2.
MapReduce schedulers. Scheduling MapReduce-style workflows has received significant attention. Quincy [37] uses a network flow formulation to achieve two goals: optimizing the data locality of jobs and enabling fair access to the cluster irrespective of job size. Zaharia et al. [105] implement delay scheduling to improve the data locality of short jobs. In both cases, the improvement in data locality and fairness that any particular job receives depends on other jobs simultaneously seeking access to the cluster, and so cannot be used to offer predictability guarantees.
Infrastructure service platforms such as Elastic MapReduce [21] provide predictability in performance
by over-provisioning the cluster so that users get compute resources when they need them. When such
services become popular, however, and demand exceeds cluster capacity, techniques like MRM will be
needed for predictability.
Deadline-aware scheduling. Scheduling taking deadlines into account has been explored for several
decades [8]. Earliest Deadline First [45] guarantees that a set of jobs can be scheduled so as to satisfy their
deadline demands, if any such schedule exists. However, prior work in deadline-aware scheduling has
focused largely on real-time scheduling of processes. We identify the utility of deadline-aware scheduling
in shared multi-tasking clusters and design and implement a framework, MRM, for doing so.
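For intuition, the EDF property mentioned above is easy to check on a single resource when all jobs are available and their processing times are known: schedule in order of deadline and verify the cumulative finish times. The job fields and numbers below are hypothetical; this is not MRM's scheduler.

```python
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    processing_time: float  # estimated processing time (e.g., minutes)
    deadline: float         # absolute deadline, measured from now

def edf_feasible(jobs):
    """Return True if running jobs in Earliest-Deadline-First order meets every
    deadline (single machine, non-preemptive, all jobs available now)."""
    finish = 0.0
    for job in sorted(jobs, key=lambda j: j.deadline):
        finish += job.processing_time
        if finish > job.deadline:
            return False  # this job, and hence the set, cannot meet its deadline
    return True

jobs = [Job("analytics", 30, 90), Job("report", 20, 25), Job("backup", 40, 160)]
print(edf_feasible(jobs))  # True: report (20<=25), analytics (50<=90), backup (90<=160)
```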
2.2 Workload characterization in Internet Services
The techniques proposed in the published literature for workload characterization and performance modeling can conceptually be classified into two categories: inference-based and instrumentation-based. The
key challenge for inference-based techniques is calibrating the parameters of an analytical model from
aggregate statistics such as overall CPU utilization. Instrumentation-based techniques provide a more de-
tailed characterization of the system as compared to the inference-based techniques, but require invasive
middleware- or kernel-level instrumentation and an extensive monitoring and logging infrastructure.
Inference-based techniques. Inference-based techniques typically construct an analytical model for the
system and perform workload characterization to estimate the parameters of the analytical model. For
example, multi-tier web applications have been modeled as a network of queues in [96, 107, 46]. The
number of request categories, the arrival rate for each request category and the resource requirements
(service times) for each request category are the key parameters needed to calibrate a queueing model. To
estimate these parameters, inference based methods use aggregate resource utilization (for example, CPU
usage) measurements and the information available in server logs.
Liu et al. assume that the number of request categories and their arrival rates at each queue/service tier are known [46]. They use measured server utilization and end-to-end request delays to estimate the service
times (resource requirements) at each tier for each request category. The estimates for the service times
are such that the weighted least square error between the measured and the estimated server utilization
(using the queueing model) and end-to-end request delay is minimized. Minimizing the least square error
requires solving a quadratic program, and the authors present an efficient algorithm for it.
Zhang et al. assume a linear relationship between the aggregate CPU utilization at servers and the
resource usage by each category [107]. This linear relationship is the same as our linear model (equa-
tion 4.4). Assuming knowledge of the number of request categories and the arrival rate for each category,
the authors estimate the service times for each category using ordinary least squares regression. Zhang et
al. use the TPC-W benchmark for experimental evaluation and treat each transaction (for example, add an
item to a cart) as a separate request category.
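Concretely, the linear model assumes that the aggregate CPU usage in a measurement interval is approximately the sum over categories of (request count times per-request CPU demand), so the demands can be recovered by ordinary least squares. The sketch below uses synthetic counts and demands; the interval count, category count, and noise level are illustrative assumptions, not the setup of [107].

```python
import numpy as np

# Hypothetical measurements: per-interval request counts for 3 known categories
# (columns) and the aggregate CPU usage in CPU-seconds per interval.
rng = np.random.default_rng(0)
counts = rng.integers(50, 500, size=(200, 3)).astype(float)    # N_k(t)
true_demand = np.array([0.004, 0.012, 0.030])                   # seconds per request
cpu_util = counts @ true_demand + rng.normal(0, 0.05, size=200)

# Ordinary least squares: recover per-category service times d_k from the
# aggregate utilization, in the spirit of the linear model of equation 4.4.
d_hat, *_ = np.linalg.lstsq(counts, cpu_util, rcond=None)
print(d_hat)  # should be close to true_demand
```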
Urgaonkar et al. develop a product-form queueing model of multi-tier Web services for estimating
request response times [96]. Using high-level system and workload parameters, such as request rates and
service times, the model can predict response times for various system configurations. Many of their results
are in terms of a single overall workload category, but they also show how to extend their queueing model
to incorporate different known request classes. The model for scalar workloads requires request rates and
residence times at all tiers in the system. The model, extended with classes, needs these parameters for all
categories.
Stewart et al. use an analytical model to estimate the request response time (end-to-end delay) based on the observed requests of each transaction type (the transaction mix) and the aggregate resource uti-
lizations [88]. The authors use a linear model similar to equation 4.4 to model the aggregate resource
utilization. In order to estimate the model parameters using benchmark data, the authors perform Least
Absolute Residuals (LAR) regression. Results based on benchmark experiments and analysis of real pro-
duction system data show that the approach works well when none of the resources are saturated.
Our blind source separation (BSS) based approach to Internet services workload characterization is an inference-based technique. However, unlike the related work described above, we do not assume that the various
request categories and their arrival rates are known. In the published literature, a common approach when
dealing with benchmark experiments with multiple request categories is to assume that each transaction
type represents a separate category [107, 88]. This approach is based on the assumption that the dif-
ferent transaction types have different resource demands (e.g., "checkout" is more CPU-intensive than "browsing"). While this approach has been shown to yield acceptable performance (within 10-20% rela-
tive prediction error) for benchmarks such as TPC-W and RUBiS, it does not work for workloads involving
transaction types with similar resource usage (refer to the evaluation with the StockOnline benchmark in
[88]). In addition, identifying the transaction types for a web service workload often requires a deeper un-
derstanding of the workload and the system architecture, as well as signicant experience with managing
a real production system.
Using resource utilization measurements and information from server logs, BSS based techniques (ICA
and CCA) can estimate both the number of request categories and their arrival rates, as well as the resource
usage at each service tier by requests of each category, provided that the assumptions required for BSS are
satisfied. These parameter estimates can then be used to calibrate the various inference models, thus eliminating the need to know the number of request categories and their arrival rates a priori. Note that our
ICA based approach requires the same amount of information as other inference based techniques, namely,
aggregate resource utilization measurements and information from server logs.
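A minimal sketch of the BSS idea on synthetic data, using FastICA from scikit-learn: treat per-interval aggregate measurements of several resources (CPU at each tier, network bytes, and so on) as linear mixtures of unobserved per-category arrival signals, and let ICA recover the signals and the mixing weights (which play the role of per-category resource demands, up to scaling and permutation). The category count, resource count, and synthetic workload are assumptions for illustration, not the evaluation setup of Chapter 4.

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(1)
T, k, m = 500, 3, 4                     # time intervals, hidden categories, aggregate counters
arrivals = rng.poisson(lam=[20, 60, 120], size=(T, k)).astype(float)   # per-category arrivals
demands = rng.uniform(0.05, 1.0, size=(k, m))                          # per-request resource demands
observed = arrivals @ demands + rng.normal(0, 0.5, size=(T, m))        # aggregate utilizations

# ICA recovers k independent source signals (arrival processes) and a mixing
# matrix whose columns act as per-category demand vectors.
ica = FastICA(n_components=k, random_state=0)
sources = ica.fit_transform(observed)   # shape (T, k): estimated arrival signals
mixing = ica.mixing_                    # shape (m, k): estimated demand vectors
print(sources.shape, mixing.shape)
```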
An alternate approach to request categorization has been proposed by Goldszmidt et al. [27]. They define a feature vector consisting of request statistics (e.g., total number of instances for each URL, mean
number of active instances, number of sessions), measured resource utilization and the response time for
each request. All this information can easily be obtained from server logs. The authors use K-means
clustering to categorize URLs according to the average system load while processing the URLs and the
response time required to process the URLs. The work in [27] also defines a metric called the Effective
e-Business Capacity (EEC). EEC is the number of e-Business jobs (e.g., client sessions) that a web service
can service without violating a pre-determined service-level agreement (e.g., 90% of the requests generated
by the client sessions should have a response time less than 1 second). Given the request categories, several
machine-learning-based techniques (e.g., a Bayes classifier and a logistic regression classifier) are used to
infer whether the system has enough EEC to satisfy the service-level agreement for a given session arrival
rate. Like our BSS based technique, the inference models in [27] are automatically and dynamically built
from measured data. However, unlike the queueing based inference models, the EEC based models in [27]
cannot capture the impact of an individual component on system performance (e.g., the impact of a CPU
or memory upgrade).
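The clustering step in [27] can be illustrated in a few lines: build a per-URL feature vector from quantities available in server logs and group URLs with K-means. The features, URLs, and cluster count below are hypothetical placeholders.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical per-URL features derived from server logs:
# [request count, mean CPU per request (ms), mean response time (ms)]
url_features = np.array([
    [5400, 2.1, 35.0],    # e.g., /home
    [1200, 18.5, 210.0],  # e.g., /checkout
    [4800, 2.4, 41.0],    # e.g., /browse
    [900, 21.0, 260.0],   # e.g., /order-status
    [300, 55.0, 700.0],   # e.g., /report
])

# Standardize features so no single scale dominates, then cluster the URLs
# into a small number of request categories.
X = StandardScaler().fit_transform(url_features)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
print(labels)  # URLs with similar resource usage share a label
```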
Instrumentation-based techniques. It is possible to obtain a detailed workload characterization (request categories, arrival rates for each category, and the per-category resource usage at each service tier) using a monitoring and logging infrastructure. Tools such as Magpie [5] and PinPoint [15] collect fine-
grained information about each request including the various servers (e.g., web, application and database
servers) visited by a request, and the resource usage at these servers during each visit. This fine-grained
information can be used to cluster requests into different categories and to determine the per-category
resource usage. While these tools have been implemented as research prototypes, the complexity of the
monitoring infrastructure and the multi-tier service architecture, and the expertise required to deploy and
use these tools, have hindered their adoption in production systems.
2.3 Data compression and energy savings
In Chapter 5, we discuss our design of control algorithms that make decisions about when a sensor node
should compress data and when it should transmit. The goal is to minimize energy consumption and adapt to the time-varying wireless channel characteristics. We design these algorithms using the Lyapunov
Optimization framework [44]. Here, we discuss the related work.
Most relevant to our work is Neely's [62] Lyapunov optimization based joint compression and trans-
mission scheduling algorithm for a single node transmitting data to a base station over a single wireless
hop. It does not consider energy consumption due to packet receptions, overhearing and duty-cycling
overhead that must be accounted for in a multi-hop scenario, as we do. Also, that paper only presents a
theoretical analysis of the algorithm. In this chapter, we provide analytical bounds on the performance of
SEEC and also evaluate a distributed version, DSEEC, using simulations.
Two other works related to ours, focusing on compression in wireless networks, are Barr and Asanovic [6] and Sadler and Martonosi [76]. Specifically, Barr and Asanovic [6] consider a single wireless hop setting
with a fixed transmission cost (equivalent to a static channel condition), and focus on estimating the com-
munication to computation energy ratios for several compression algorithms. Sadler and Martonosi [76]
consider a static multi-hop setting, where they experimentally demonstrate that compression can save a significant amount of energy when data is transmitted over multiple hops. They also develop variants of
popular compression algorithms more suited for sensor nodes with limited CPU processing speed and
memory. In contrast to both these papers, we discuss a principled technique for making on-line com-
pression and transmission decisions in a dynamic environment. Our approach, based on the Lyapunov
optimization techniques, also enables us to provide performance guarantees for the centralized version of
our algorithm.
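The essence of the on-line decision is an energy comparison of the form sketched below: compress when the CPU energy spent compressing is outweighed by the radio energy saved along the (possibly lossy, multi-hop) path. SEEC itself makes this decision inside a Lyapunov drift-plus-penalty framework that also accounts for queue backlogs and a control parameter V (Chapter 5); the energy constants and compression ratio here are made-up placeholders, not measured values.

```python
def should_compress(packet_bits, path_etx, e_tx_per_bit=1.0, e_rx_per_bit=0.5,
                    e_comp_per_bit=1.0, compression_ratio=0.6):
    """Return True if compressing a packet before forwarding is expected to
    cost less total energy than sending it raw.

    path_etx: expected transmission counts (ETX) of the links on the path to
    the sink; every traversed hop spends transmit energy at the sender and
    receive energy at the receiver, scaled by that link's ETX.
    """
    per_bit_path_cost = sum(etx * (e_tx_per_bit + e_rx_per_bit) for etx in path_etx)
    raw_energy = packet_bits * per_bit_path_cost
    compressed_energy = (packet_bits * e_comp_per_bit
                         + compression_ratio * packet_bits * per_bit_path_cost)
    return compressed_energy < raw_energy

# Over a long or lossy path compression pays off; over one clean hop it does not.
print(should_compress(1000, path_etx=[1.5, 2.0, 1.2]))  # True
print(should_compress(1000, path_etx=[1.0]))            # False
```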
Several papers have considered the complementary problem of joint routing and data compression in
wireless sensor networks (Pattem et al. [16], Pattem et al. [68], Dang et al. [18]). The main focus of these
papers is to design data correlation aware routing trees that enable nodes to achieve better compression
efficiency (via in-network aggregation and compression) compared to routing trees that are agnostic to
data correlation. A version of SEEC that incorporates compression after in-network data aggregation can
be used on top of these joint routing and data compression schemes to enable the nodes to adapt their
compression decisions to changes in the routing tree, in addition to link quality and application data rate.
Lyapunov optimization based techniques have also been used to design stable algorithms that optimize
different performance metrics (Georgiadis et al. [44]). For example, a joint transmit power allocation and
transmission scheduling algorithm (EECA) that minimizes the system energy expenditure is discussed
in Neely [47]. SEEC's transmission scheduling algorithm is similar to EECA in its use of a differential
queue backlog associated with links for making the transmission decision. However, unlike that work, we
consider the energy consumption due to data transmission, reception, and overhearing.
Finally, several recent papers have used backpressure based transmission scheduling for distributed
congestion control (Sridharan et al. [85], Warrier et al. [98]). The key contribution in these papers is
to design mechanisms that enable backpressure based scheduling for CSMA based MACs (802.11 and
802.15.4). As discussed in Section 5.5, their heuristics can be used in DSEEC.
2.4 Sensor data fault detection
Chapter 6 describes our work on sensor data fault detection. In this section, we describe the related work.
Sensor data integrity, encompassing fault detection, fault localization, identification of root causes, and correcting/recovering from data faults, is an active area of research. In this section, we discuss several pieces of work that are closely related to our data-centric fault detection approach. These techniques are
similar to one (or more) of the four classes of methods explored in this work, and we point this out when discussing the corresponding work. However, none of these methods can be applied without modification to fault detection. For example, some of these techniques are designed with a specific sensing task or type of sensor in mind (e.g., [22] focuses on improving the accuracy of aggregate queries over sensor readings). When discussing each related work, we explain why it cannot be used directly for comparison. However, it may be possible to use some of these methods for sensor fault detection under different assumptions, for example, when information about the probability distribution of the true sensor readings, the characteristics of the noise corrupting sensor data, etc., is available.
Two recent papers [42, 38] have proposed a declarative approach to erroneous data detection and clean-
ing. StreamClean [42] provides a simple declarative language for expressing integrity constraints on the
input data. Samples violating these constraints are considered faulty. StreamClean uses a probabilistic
approach based on entropy maximization to estimate the correct value of an erroneous sample. The eval-
uation in [42] is geared towards a preliminary feasibility study and does not use any real world data sets.
The Extensible Sensor stream Processing (ESP) framework [38] provides support for specifying the algorithms
used for detecting and cleaning erroneous samples using declarative queries. This approach works best
when the types of faults that can occur and the methods to correct them are known a priori. The ESP
framework is evaluated using the INTEL Lab data set [35] and a data set from an indoor RFID network
deployment. These two declarative approaches are similar to our Rule-based methods. For example, we
can think of the integrity constraints in StreamClean as (a combination of) rules. Similarly, specifying
an algorithm for detecting erroneous samples within the ESP framework is equivalent to deciding which
Rule-based method to apply, as both these decisions require knowledge of the data fault types.
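For intuition, rules of this kind take only a few lines; the sketch below gives two heuristics in the spirit of the SHORT and NOISE rules discussed in Chapter 6 (a large jump between consecutive samples, and a sliding window whose standard deviation exceeds a bound). The thresholds and window length are placeholders, not the values used in our evaluation.

```python
import numpy as np

def short_rule(readings, jump_threshold):
    """Flag sample t if it differs from the previous sample by more than
    jump_threshold (a SHORT-style rule for sudden spikes)."""
    flags = np.zeros(len(readings), dtype=bool)
    flags[1:] = np.abs(np.diff(readings)) > jump_threshold
    return flags

def noise_rule(readings, window, std_threshold):
    """Flag every sample inside a window whose standard deviation exceeds
    std_threshold (a NOISE-style rule for periods of excess variance)."""
    flags = np.zeros(len(readings), dtype=bool)
    for start in range(0, len(readings) - window + 1):
        if np.std(readings[start:start + window]) > std_threshold:
            flags[start:start + window] = True
    return flags

# Toy temperature trace with one spike and a noisy stretch (degrees Celsius).
trace = np.concatenate([np.full(50, 21.0), [35.0], np.full(49, 21.0),
                        21.0 + np.random.default_rng(2).normal(0, 3.0, 50)])
print(short_rule(trace, jump_threshold=5.0).sum())   # flags the spike (and its return)
print(noise_rule(trace, window=10, std_threshold=1.5).sum())  # flags the noisy stretch
```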
Koushanfar et al. propose a real-time fault detection procedure that exploits multi-sensor data fusion
[43]. Given measurements of the same source(s) by n sensors, the data fusion is performed (n+1) times,
once with measurements from all the sensors; in the rest of the iterations the data from exactly one sensor is
excluded. Measurements from a sensor are classified as faulty if excluding them improves the consistency of the data fusion results significantly. Simulations and data from a small indoor sensor network are
used for evaluation in [43]. Data from real world deployments are not used. This approach is similar
to our HMM model based fault detection because it requires a sensor data fusion model. However, this
approach cannot be used for fault detection in applications such as volcano monitoring [99] and monitoring
of chlorophyll concentration in lake water [61], where sensor data fusion functions are not easy to define without consulting a domain expert; the HMM-based method, however, can be.
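A toy rendering of the leave-one-out idea, with the fusion function stood in for by a per-timestep mean (the fusion model in [43] is application-specific, and the threshold here is arbitrary): a sensor is flagged if removing it noticeably shrinks the disagreement among the remaining sensors.

```python
import numpy as np

def leave_one_out_faulty(readings, improvement_threshold=0.5):
    """readings: array of shape (T, n), T timesteps from n sensors.
    Fuse by averaging across sensors; measure consistency as the mean squared
    deviation of sensors from the fused estimate. Flag sensor i if excluding
    it shrinks that disagreement by more than improvement_threshold
    (as a fraction of the all-sensor disagreement)."""
    def disagreement(cols):
        fused = readings[:, cols].mean(axis=1, keepdims=True)
        return np.mean((readings[:, cols] - fused) ** 2)

    n = readings.shape[1]
    base = disagreement(list(range(n)))
    flagged = []
    for i in range(n):
        rest = [j for j in range(n) if j != i]
        if base > 0 and (base - disagreement(rest)) / base > improvement_threshold:
            flagged.append(i)
    return flagged

# Three sensors observe the same slowly varying signal; sensor 2 has a constant offset.
t = np.linspace(0, 10, 200)
signal = 20 + np.sin(t)
readings = np.column_stack([signal + 0.05 * np.random.default_rng(3).normal(size=200),
                            signal + 0.05 * np.random.default_rng(4).normal(size=200),
                            signal + 5.0])
print(leave_one_out_faulty(readings))  # expected: [2]
```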
Elnahrawy et al. propose a Bayesian approach for cleaning and querying noisy sensors [22]. The
focus in [22] is to improve the accuracy of aggregate queries over sensor readings (for example, SUM, AVG, COUNT, etc.) rather than detecting data faults. Using a Bayesian approach requires knowledge of the
probability distribution of the true sensor readings and the characteristics of the noise process corrupting
the true readings. The evaluation in [22] does not use any real world datasets. In terms of the prior
knowledge and the models required, the Bayesian approach in [22] is similar to the HMM-based method we evaluate in this work. It is difficult to apply this method to the real-world datasets considered here
due to the lack of prior information about the probability distribution for true sensor readings and the noise
characteristics for each sensor, especially for the NAMOS dataset because the variation of chlorophyll
concentration in lake water is not a well-understood phenomenon. Even for well-understood phenomena like ambient temperature, humidity, etc. (INTEL and SensorScope datasets), in the absence of ground truth
values and contextual information related to the sensor calibration, determining the probability distribution of the true sensor readings is difficult. In some cases, prior information related to true sensor readings can be obtained
by calibrating and testing the sensors extensively in a controlled environment before a deployment, as done
by Ramanathan et al. [74] for a ground water monitoring deployment. Elnahrawy et al. also assume that
the noise corrupting the sensor readings follows a Gaussian distribution. This assumption does not hold
for the SHORT and the CONSTANT faults in general.
Tulone et al. model the temperature measurements in the INTEL Lab data set [35] using an autore-
gressive (AR) time series model [94]. They use a procedure similar to our One-step ahead forecasting
based fault detection to detect faults/outliers similar to SHORT faults. However, the AR model based fault
detection technique used in [94] is not suitable for detecting long duration faults (for example, NOISE and
CONSTANT fault types) or short duration faults that occur frequently. This is because the autoregressive
model captures only short time scale trends. For example, the AR model used in [94] is trained using
samples collected over an hour (at a rate of a sample every 30 seconds) and hence, it captures temporal
trends/correlations that are on the order of a few minutes. For such an AR model, a long-duration NOISE fault
lasting a couple of hours (or a short duration fault that occurs frequently) will be treated as a change in
underlying data distribution, and the AR model will be retrained to fit the faulty data. Thus, it will fail to
identify a majority of the faulty samples. Our seasonal ARIMA model based fault detection method can
detect both the short duration and the long duration faults, and hence, subsumes the fault detection capabil-
ities of an AR model. However, estimating the parameters for a seasonal ARIMA model is more complex
computationally than estimating the parameters for an AR model. At the very least, we need temperature measurements collected over 2 to 3 days to train our seasonal ARIMA model, whereas measurements
collected over only an hour are enough to train the AR model used in [94].
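To illustrate one-step-ahead forecasting-based detection (without the seasonal ARIMA machinery of Chapter 6), the sketch below fits a short AR model by least squares on a training window and flags samples whose forecast error exceeds a multiple of the training residual standard deviation. The model order, window length, and threshold are illustrative choices, not those of [94] or of our method.

```python
import numpy as np

def ar_forecast_faults(series, p=4, train_len=120, k=4.0):
    """Fit AR(p) with an intercept on the first train_len samples via least
    squares, then flag any later sample whose one-step-ahead forecast error
    exceeds k times the training residual standard deviation."""
    y = np.asarray(series, dtype=float)
    # Lag matrix for the training window: row t -> (y[t-1..t-p], 1).
    rows = [np.r_[y[t - p:t][::-1], 1.0] for t in range(p, train_len)]
    X, target = np.vstack(rows), y[p:train_len]
    coeffs, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid_std = np.std(target - X @ coeffs)

    flags = np.zeros(len(y), dtype=bool)
    for t in range(train_len, len(y)):
        pred = np.r_[y[t - p:t][::-1], 1.0] @ coeffs
        flags[t] = abs(y[t] - pred) > k * resid_std
    return flags

# Smooth synthetic temperature trace with two injected SHORT-style spikes.
t = np.arange(300)
temp = 22 + 2 * np.sin(2 * np.pi * t / 288) + 0.05 * np.random.default_rng(5).normal(size=300)
temp[200] += 8.0
temp[250] -= 6.0
print(np.where(ar_forecast_faults(temp))[0])  # should include 200 and 250
```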
Several papers on real-world sensor network deployments [93, 73, 99, 50] present results on meaning-
ful inferences drawn from the collected data. However, to the best of our knowledge, only [93, 99, 73] do a
detailed analysis of the collected data. The aim of [73] is to do root cause analysis using Rule-based meth-
ods for on-line detection and remediation of sensor faults for a specific type of sensor network monitoring the presence of arsenic in groundwater. The SHORT and NOISE detection rules analyzed in this work were proposed in [73]. Werner et al. compare the fidelity of data collected using a sensor network mon-
itoring volcanic activity to the data collected using traditional equipment used for monitoring volcanoes
[99]. Finally, Tolle et al. examine spatiotemporal patterns in micro-climate on a single redwood tree [93].
While these publications thoroughly analyze their respective data sets, examining fault prevalence and/or
developing a generic sensor data fault detection approach was not an explicit goal. Our work presents a
thorough analysis of four different real-world data sets. Looking at different datasets also enables us to
characterize the accuracy and robustness of four qualitatively different detection methods.
Chapter 3
An Alternative Service Model for Shared Clusters
Computing-as-a-service is here to stay, and its benefits, such as resource elasticity and efficient utilization,
have been well-documented. However, little attention has been paid to service models for computing-as-a-
service. Existing service models are relatively simplistic in that they provide little or no predictability in job
finish times and limited service differentiation. In this chapter, we propose a new service model for shared clusters, MRM, that addresses these shortcomings. MRM estimates job processing times conservatively to provide predictability in finish times, and uses pricing to incentivize users to contribute slack so that delay-sensitive jobs can be accommodated. We have instantiated MRM in the context of shared MapReduce clusters. Our evaluation results demonstrate that MRM can provide predictability of job finish times and
differentiated service under a variety of user demands (workloads).
3.1 Introduction
Computing-as-a-service has been evolving steadily. Already, private clouds offer such a service to different
enterprise organizations (e.g., Google's planet-wide clouds [91]), and several providers now offer such
services publicly (Amazon Web Services (AWS), Microsoft's Azure).
The benefits of computing-as-a-service are well established. For the consumer, the pay-as-you-go model provides resource elasticity while reducing the need to maintain their own significant computing
infrastructure. The economies of scale in large clouds or data centers, particularly in the case of large corporations, enable efficient utilization and a robust computing platform at a low cost.
Over the past several years, service providers have started providing computing abstractions at dif-
ferent levels: bare virtual machines, specialized languages and runtimes (e.g., MapReduce [20] and Dryad [36] for massively parallel data processing), web services, and so on. For example, Amazon offers both
bare virtual machine clusters as well as MapReduce clusters [21]. We believe that this trend is going to
continue as it not only provides consumers with a range of services, but it also creates opportunities for
operators of publicly available clouds to substantially increase their revenue.
Despite the specialization of computing abstractions, little attention has been paid to the service model:
the interface between the user and the operator that determines the type of service provided. Currently,
relatively simplistic models seem to be the norm, where the operator undertakes to provide resources to
complete a job, but does not provide any assurance of when the job will be completed (predictability)
or provides limited ways in which users can ask for different levels of service (service differentiation).
For instance, AWS and Azure use a rental-based service model, in which users can choose from a range of virtual machine instances (of different sizes) and pay a fixed rate for each instance; the system makes no statement about when jobs finish. Most grid-computing infrastructures, managed using the Portable
Batch System (PBS) or its variants, charge users based on resource usage (e.g., node hours), and provide
differentiation using a few discrete priority queues that are differentiated by job size and duration.
Our contributions. In this work, we explore a part of the design space of service models by considering
a design that attempts to provide both predictability in finish times and the capability for differentiation by allowing users to select desired finish times (for example, choosing an earlier finish time for a delay-sensitive job). Our approach, called MRM (for Map-reduce Market), achieves these goals through a novel pricing mechanism. This, in combination with deadline-based scheduling, can both ensure predictable finish times and provide users with a choice of finish times.
The key innovation in MRM is the design of the method by which these feasible finish times are computed and priced. To ensure predictability, the feasible finish times for a job are computed based on the current system load and the computing resources required by the job, determined using a history-based estimator of job processing times. To achieve service differentiation, MRM prices job completion deadlines to encourage users to select later finish times than the earliest possible finish time. Without this pricing, the rational choice for a user is to always select the earliest possible finish time for her job as the deadline: if all users do this, then MRM is equivalent to FCFS (and hence, it cannot accommodate a meaningful choice of finish time). This pricing, combined with an appropriately chosen wealth distribution (the initial monetary resources users have to pay for finish times), incentivizes users to offer slack in the schedule, which can be used to accommodate earlier finish times (than possible with FCFS) for later arrivals.
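The intended shape of such a price-deadline curve is easy to sketch: a job that insists on the earliest feasible finish time pays a congestion premium, and the premium decays as the user offers slack. MRM's actual curve is load-sensitive and is described in Section 3.3 (see Figures 3.2 and 3.3); the base rate, premium, and decay constant below are purely illustrative placeholders.

```python
import math

def deadline_price(processing_time, requested_finish, earliest_finish,
                   base_rate=1.0, premium=2.0, decay=0.05):
    """Illustrative price for a job deadline: a base charge proportional to the
    job's processing time, plus a premium that shrinks exponentially with the
    slack the user offers beyond the earliest feasible finish time."""
    slack = max(0.0, requested_finish - earliest_finish)
    congestion_premium = premium * math.exp(-decay * slack)
    return processing_time * base_rate * (1.0 + congestion_premium)

# A 60-minute job: demanding the earliest finish time costs the most;
# offering an hour of slack is noticeably cheaper.
for slack in (0, 15, 60, 240):
    print(slack, round(deadline_price(60, 100 + slack, 100), 1))
```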
Any service model must be co-designed in the context of a computing abstraction, since the details of prices and congestion will depend on the abstraction. We have instantiated MRM for shared MapReduce [20] clusters. Given the widespread use of such clusters, and the lack of predictability and service differentiation in these clusters, we believe that MRM addresses an important and urgent need. Experiments using our full-fledged prototype on clusters of up to 90 nodes with an adversarial workload reveal that MRM can achieve near-perfect predictability and reduce the median waiting time for delay-sensitive jobs by a factor of 5 compared to an FCFS scheduler by incentivizing users to give slack.
The rest of this chapter is organized as follows. We provide a brief overview of our system in Sec-
tion 3.2. We discuss the details of MRM's design and implementation in Section 3.3 and Section 3.4. In
Section 3.5, we present the results from our experiments to evaluate MRM's performance. We conclude
with a discussion of future work in Section 3.6.
3.2 MRM Goals and Overview
In this section, we first describe MRM's design goals, then provide an overview of our system, and finally discuss the challenges involved in its design and implementation.
We design MRM for clusters that provide computing-as-a-service, allowing users to submit jobs that are then scheduled on the cluster. We define a job more precisely later; for now, a job is simply a distributed computation on the cluster. MRM is designed with the following two goals in mind: (1) to provide users with predictability in the finish times of jobs, and (2) to enable the system to differentiate between delay-sensitive jobs and delay-tolerant ones.
Mechanisms currently used for scheduling jobs in shared clusters, such as FCFS, priority queues, and fair scheduling, can meet one, but not both, of our design goals. Under the FCFS policy, assuming that the processing times of jobs are known or can be estimated, we can easily estimate the finish time of a job from its position in the queue. However, FCFS does not provide differentiated service: small jobs can get stuck behind large jobs and experience high waiting times. Pure priority queueing can be used to implement differentiated service levels for jobs, but may result in unbounded finish times for lower-priority jobs. Mechanisms such as pricing or limiting capacity for the higher-priority queues can be used to alleviate this, as we discuss briefly in Section 6.4. Finally, fair scheduling supports job differentiation through weights, but cannot ensure tight deadlines: a job's fair share may vary with the load, resulting in unpredictable finish times.
MRM achieves predictability and differentiation through the design of a novel pricing mechanism. This, in combination with scheduling with deadlines, can ensure both predictable finish times and overall shorter wait times for delay-sensitive jobs. When a job is submitted by a user, MRM negotiates a deadline with her. The user is offered a range of feasible finish times for her job. She can select a finish time in this range as the deadline, and MRM ensures that, in the absence of any unanticipated failures (MRM can handle routine transient failures, as described in Section 3.4.5), the job's execution completes by its deadline.
Figure 3.1: MRM in action
The key innovation in MRM is the design of the method by which these feasible finish times are computed and priced: this method ensures both predictability and differentiation, as described below.

To ensure predictability, the set of feasible finish times for a job is computed based on the current system load and the computing resources required by the job. For example, as shown in Figure 3.1(a), suppose Alice is the first user who submits job A, at time t = 0, that requires p_A = 10 processing time on the cluster. Then, the set of feasible finish times for A is f_A = {t ≥ 10}. Suppose Alice selects d_A = 10 as the deadline for A. If Bob submits job B with p_B = 1 at any time t ∈ [0, 10], say t = 1, then the set of feasible deadlines for B is f_B = {t ≥ 11}, because MRM has already committed to finishing job A by time t = 10.
To ensure differentiation, MRM prices the deadlines to encourage users to select later finish times than the earliest possible finish time. Without this pricing, the rational choice for a user is to always select the earliest possible finish time for its job as the deadline: if all the users do this, then MRM would be equivalent to FCFS. To achieve differentiation, MRM offers incentives to users to accept a later (than the earliest possible) finish time as the deadline (when this is acceptable) in order to maintain some slack (free computing resources) on the cluster. This will allow MRM to offer better (than FCFS) deadline choices to a job that arrives at a time when multiple jobs are already queued and waiting to receive service. In our scenario above, if Alice had offered some slack to MRM by selecting d_A > 10, e.g., d_A = 11, then f_B = {t ≥ 2}. MRM provides users with an incentive to offer slack, as described below.
Users pay for jobs with tokens (which are defined more precisely later). Intuitively, a token represents the cost of some unit resource. MRM does not specify how users obtain tokens, leaving cluster operators considerable flexibility to select payment models, e.g., pay-as-you-go plans where users pay for the tokens expended, or prepaid plans where users can obtain a set of tokens a priori. For any given job, MRM associates a price with each of its feasible finish times, thus defining a price-deadline curve for it. The user (submitting this job) has to select a point on this curve that is commensurate with the tokens she has and the deadline she wants.
To understand how this mechanism can be used to incentivize slack, consider the example discussed above. Suppose MRM offers Alice a price-deadline curve where she must pay 30 tokens if she chooses the earliest possible deadline d_A = 10, but only 10 tokens if she offers a slack of 1 time unit by selecting d_A = 11. Alice does not want to pay more than 10 tokens for her job and settles for d_A = 11. With d_A = 11, the set of feasible finish times for Bob's job B is f_B = {t ≥ 2}. MRM offers Bob a flat price-deadline curve: he needs to pay 2 tokens for any (feasible) deadline. Naturally, Bob selects d_B = 2. At this point, MRM must preempt job A and first finish job B so that it can satisfy both deadlines (see Figure 3.1(b)). Thus, MRM has been able to offer Bob an earlier deadline than he might otherwise have received in the absence of pricing.
Our high level description of MRM highlights several challenges related to both its design and imple-
mentation. The main design challenge pertains to the design of the price-deadline curve: What, precisely,
is the unit of price? What factors impact the form of the pricing function? Which pricing function is best?
We discuss these in Section 3.3.
Moreover, any realization of MRM must address the following challenges. At the time of submission,
how can MRM a priori estimate the processing time of a job? How must it schedule jobs so that their
deadlines are met? How can it make the system robust to transient failures and errors in processing time
estimates?
In this chapter, we describe a realization of MRM for a MapReduce cluster, in which users submit MapReduce jobs; an exploration of MRM for other massively-parallel computation models such as Dryad, or alternative models such as virtualized bare machines in AWS, is left to future work. Our realization conceptually consists of four components: a price-deadline negotiator, which computes the price-deadline curve for a new job; a processing time estimator, which estimates the processing time for a newly submitted job; a deadline-aware scheduler; and a job progress monitor, which tracks task completion times. We discuss the details of these components of MRM in Section 3.4.
3.3 Service model: Design
As mentioned in Section 3.2, MRM offers each user with a new job a price-deadline curve; once the user selects a feasible finish time, MRM's scheduler ensures completion of the job by the selected deadline. MRM also incentivizes users to increase system slack, enabling it to offer service differentiation. The central challenge in the design of MRM is the choice of the price-deadline curve.
MRM's price-deadline curve must satisfy two properties. First, when users select a finish time later than the earliest feasible deadline, they increase the available slack in the system. To incentivize them, the price must decrease as a function of the offered slack. Second, the price-deadline curve should adjust to system load. At a low load, the cluster has lots of slack (due to under-utilization), and so MRM does not necessarily have to charge users a higher price for not offering any slack. However, at high loads, MRM may need to strongly incentivize users to pick later deadlines in order to maintain sufficient slack for differentiation.
On selecting a price-deadline curve. There are three key choices involved in selecting a price-deadline curve: (a) What is the shape of the price-deadline curve? (b) How should the price for the earliest feasible deadline be set? (c) How (if at all) should the price-deadline curve depend on a job's processing time? A fourth question, how to estimate the range of feasible finish times, is deferred to Section 3.4.1.
Before discussing these choices, we introduce the following notation. Let f^e_j denote the earliest feasible finish time for a job j. Suppose its owner selects d_j ≥ f^e_j as the deadline. We define the slack offered in this case as δ_j = d_j − f^e_j. Let c_j(δ_j) be the price charged for job j with slack δ_j.
Shape of the price-deadline curve. The shape of the price-deadline curve is determined by the inverse relationship between the price c_j and the slack δ_j. This price decay can take several qualitatively different forms: a convex function such as an exponential or high-order polynomial decay, a linear decay, or a concave function such as a sub-linear decay. The main difference between these classes is the discount rate, the level of price discount that will achieve a given slack δ_j. For a fixed slack, a concave function will offer lower discounts than a linear or convex function. Thus, the latter functions are more desirable from an operator's perspective when the cluster is over-subscribed: in that regime, these functions contain better incentives than the concave function for increasing slack. We discuss this issue in more detail later in this section.
Pricing the earliest feasible deadline. The second component of the pricing function is the pricing for the earliest feasible deadline, c_j(δ_j = 0). This choice can also impact how well MRM is able to maintain a certain level of slack.

Clearly, to incentivize users to offer slack, MRM must price the earliest feasible deadline at a premium. To understand this, we must first understand that, in an unloaded system, the right price that a user must pay for a job is its resource price: the cost of using the computational, network, and storage resources. In this chapter, we primarily consider computational cost. Thus, assuming 1 token buys 1 unit of processing time on the cluster, a job with a total processing time p_j will have a resource price r_j = p_j.
Then, to incentivize slack in a loaded system, the price for achieving the earliest feasible deadline for job j must be higher than r_j. If this premium is set too low (relative to users' purchasing power), then all users will select the earliest possible deadline for their jobs, thus preventing MRM from providing service differentiation. If it is set too high, then only rich users will be able to afford to select a tight deadline for delay-sensitive jobs.
There are two choices for premium pricing: a fixed premium, where the price for the earliest feasible deadline is a constant factor multiple of the resource cost (c_j(0) = f(r_j) = k r_j, k ≥ 1), and a load-aware premium, where the price varies with load (for example, c_j(0) = f(ρ, r_j), where ρ is the average load on the cluster). A load-aware price for the earliest feasible deadline allows MRM to adjust the overall price-deadline curve to the system load: at high load, the pricing may be aggressively higher so that, together with the choice of the appropriate shape of the curve, the system can incentivize slack. However, it increases the planning uncertainty for users: e.g., cost-conscious users might need to be strategic about when they submit their jobs. Operators can easily provide tools that use historical load information to estimate premiums at various times of day, to better help cost-conscious, delay-tolerant users plan their job execution. We qualitatively compare these choices later in this section.
Usage and load-sensitive decay. While we have discussed the shape of the pricing decay, our implicit assumption has been that the rate of decay is merely a function of the offered slack. However, it is possible to use other factors to influence the rate of decay, for specific ends. For example, making this decay depend on load is another way to incentivize slack: at high load, the discount rate can be made lower, an approach that has a similar effect as load-aware premium pricing of a job's resource price.

MRM can also make the rate of decay usage-sensitive, or dependent on processing time. For example, it can set prices based on the ratio of the offered slack to the processing time, δ_j / p_j. This will result in large jobs having to offer much more slack than small jobs to obtain the same discount rate: e.g., if p_A / p_B = 2 and both of these jobs paid the same price, then it must be the case that δ_A > δ_B. Alternatively, if δ_A = δ_B, the price for job A should be higher than the price for B. A usage-sensitive decay rate can result in shorter queueing delay for small (in terms of processing time) jobs compared to the large jobs. This results in an overall lower average waiting time for jobs. We present experimental results in Section 3.5.3 that demonstrate this effect.
Which pricing function is the best? With MRM's deadline-sensitive pricing mechanism, users face a cost vs. delay trade-off. They can pay a higher price (offer less slack) and select a tight deadline, or select a loose deadline (offer lots of slack) in order to pay just the resource price (or even less than that). MRM's three choices (the price-deadline curve shape, premium pricing, and usage-sensitive pricing) control this trade-off. In the rest of this section, we discuss the qualitative differences between the various choices for the price-deadline curve.
Impact of shape. To understand the impact of the shape of the price-deadline curve, let us fix two (out of three) choices: assume constant-factor premium pricing (c_j(0) = k r_j) and usage-sensitive decay (MRM sets prices based on δ_j / p_j).
Consider the following three curves with different shapes: exponential (convex), linear, and sub-linear
(concave).
c_j = k r_j exp(−δ_j / p_j)    (3.1)

c_j = k r_j − δ_j / p_j    (3.2)

c_j = k r_j − (δ_j / p_j)² / (k r_j)    (3.3)
[Figure 3.2: Price vs. slack. Price plotted against offered slack for the linear, exponential, and sub-linear price-deadline curves.]
[Figure 3.3: Premium factor. The premium factor k as a function of load for job processing times p_j = 1, 10, and 30.]
Figure 3.2 plots the three price-deadline functions for k = 3 and p_j = r_j = 10. Naturally, the slower the decay, the more slack a user needs to offer in order to pay only the resource price. In this example, with exponential decay, a user needs to offer δ_j = 11 if she wants to pay only the resource price, but with linear and sub-linear decay, she must offer a slack of 200 and 245, respectively.

In addition to its exact form, another design choice is related to minimum pricing: should a user be charged at least the resource price for its job? Not offering discounts beyond the resource price reduces the range of slack that users will offer. In the case of Figure 3.2, with exponential decay, if the price is lower-bounded by the resource price, a rational user will not offer more than 11 units of slack because doing so does not reduce the price.
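To make the cost vs. slack trade-off concrete, the following minimal sketch evaluates the three decay shapes of Equations (3.1)-(3.3) in Python. The function names, the minimum-pricing helper, and the parameter defaults are illustrative choices of ours, not part of MRM's implementation.

```python
# Sketch of the three price-deadline curves (Eqs. 3.1-3.3), assuming
# constant-factor premium pricing (c_j(0) = k * r_j) and usage-sensitive
# decay (prices depend on slack / p_j), with r_j = p_j.
import math

def price_exponential(slack, p_j, k=3.0):
    """Eq. (3.1): c_j = k * r_j * exp(-slack / p_j)."""
    r_j = p_j
    return k * r_j * math.exp(-slack / p_j)

def price_linear(slack, p_j, k=3.0):
    """Eq. (3.2): c_j = k * r_j - slack / p_j."""
    r_j = p_j
    return k * r_j - slack / p_j

def price_sublinear(slack, p_j, k=3.0):
    """Eq. (3.3): c_j = k * r_j - (slack / p_j)**2 / (k * r_j)."""
    r_j = p_j
    return k * r_j - (slack / p_j) ** 2 / (k * r_j)

def price_with_floor(curve, slack, p_j, k=3.0):
    """Minimum pricing: never charge less than the resource price r_j = p_j."""
    return max(curve(slack, p_j, k), p_j)

if __name__ == "__main__":
    # Reproduces the example in the text: k = 3, p_j = r_j = 10.
    for slack in (0, 11, 200, 245):
        print(slack,
              round(price_exponential(slack, 10), 1),
              round(price_linear(slack, 10), 1),
              round(price_sublinear(slack, 10), 1))
    # With minimum pricing, offering more than ~11 units of slack under
    # exponential decay no longer reduces the price below r_j = 10.
    print(price_with_floor(price_exponential, 50, 10))
```

Running the sketch with k = 3 and p_j = r_j = 10 reproduces the slack values of roughly 11, 200, and 245 discussed above.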
Choice of premium pricing. The main advantage of fixed premium pricing for the earliest possible deadline (c_j(0) = k r_j) is its simplicity: users always know the premium they need to pay when offering zero slack. However, it has two major drawbacks: (1) at a low load, it can overcharge, depending on the value of k, and (2) we need to be aware of the wealth distribution amongst users (and how much they are willing to pay for a job) to pick a good value of k.

From an operator's perspective, the second drawback of fixed premium pricing is more critical. At high load, with multiple jobs contending for service, MRM needs to set k in such a way that it can continue to provide differentiated service. The right value of k depends on the users' wealth (the number of tokens they possess) and their willingness to pay for tight deadlines. If k is set too low (relative to users' wealth), then all the users can afford to select the earliest possible deadline, and MRM cannot provide service differentiation (it will be forced to serve jobs in FCFS order). However, information about users' wealth and willingness to pay can be private, and it may not be possible for MRM to estimate them. (Even in systems where users buy or are assigned tokens, and hence users' wealth is known to operators, their willingness to pay will generally be private and time-varying.)
Load-aware premium pricing for the earliest possible deadline can address the two shortcomings highlighted above. Equation (3.4) shows a particular premium pricing function:

c_j(0) = f(ρ, p_j) r_j = exp(ρ μ p_j) r_j    (3.4)

where ρ is the average system load and 1/μ is the average job processing time across all jobs.¹

¹ We can compute the ρ and μ values using moving averages and/or from historical job statistics collected in system logs, as discussed later.
Figure 3.3 shows the change in premium (the ratio of the price paid when no slack is offered to the resource price) as the load increases, for three different values of job processing time. When the system load increases, jobs pay a higher premium (with large jobs experiencing a much higher increase in this case); thus, this premium pricing function adjusts the pricing to the load and avoids overcharging at low load. In the worst case, e.g., when all users have infinite tokens, load-aware premium pricing still cannot prevent all users from selecting the earliest possible deadline. However, by charging a steep premium at high load, it is likely to discourage users with delay-tolerant jobs from doing so.
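The load-aware premium of Equation (3.4) is likewise easy to sketch. The helper below is an illustration under the stated model; in practice ρ and μ would come from moving averages over system logs, and the specific values printed here are made up for the example.

```python
# Sketch of the load-aware premium of Eq. (3.4): the zero-slack price is
# r_j scaled by exp(rho * mu * p_j), where rho is the average system load
# and 1/mu the average job processing time across all jobs.
import math

def load_aware_premium_price(p_j, rho, mean_processing_time):
    """Price for the earliest feasible deadline (zero slack), Eq. (3.4)."""
    r_j = p_j                       # resource price: 1 token per unit of processing time
    mu = 1.0 / mean_processing_time
    premium_factor = math.exp(rho * mu * p_j)
    return premium_factor * r_j

# The premium factor grows with both load and job size, mirroring Figure 3.3:
# small jobs pay close to their resource price even at high load, while large
# jobs pay a steep premium for refusing to offer slack.
for rho in (0.1, 0.5, 0.9):
    print(rho, [round(load_aware_premium_price(p, rho, 10.0) / p, 2)
                for p in (1, 10, 30)])
```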
3.4 Service model: Realization
We instantiate MRM in the context of shared MapReduce clusters. To do so, we must address three major challenges: (1) how to estimate the processing time of a job a priori, (2) how to schedule jobs so that their deadlines are met, and (3) how to make MRM robust to transient failures. We implemented four modules (in Hadoop) to address these challenges: a processing time estimator, a price-deadline negotiator, a deadline-aware scheduler, and an online monitor. Next, we describe these modules in detail.
3.4.1 Estimating a job's processing time
We first describe the details of the MapReduce framework relevant to estimating a job's processing time; this discussion assumes basic familiarity with the MapReduce paradigm [20]. We then describe how we obtain these processing time estimates.

MapReduce data flow. A typical MapReduce job consists of three pipelined phases: Map, Shuffle, and Reduce. The Map and the Reduce phases can consist of multiple independent tasks that can execute in parallel. The output of the Map phase is partitioned into (disjoint) bins which are stored locally. The Shuffle phase handles this partitioning and the transport of the Map outputs to the relevant reduce tasks.
In this work, we focus on pricing and scheduling MapReduce jobs at the granularity of a job with a single Map and Reduce phase.² To estimate the processing time of such jobs, we first need to estimate the duration of the map, shuffle, and reduce phases. We take a practical and simple approach to this problem that works well in our experiments.

² See Section 4.5 for a more detailed discussion on estimating processing times for more complex data-flow graphs, such as those produced by Pig or Hive. Moreover, we also do not consider jobs with a local aggregation phase.
Online Estimation of Processing Time Statistics. The processing time of a job is defined as the time taken to complete all three phases described above. The central idea behind our approach is to use prior statistics of jobs executed on the cluster in order to conservatively estimate future processing times. Such an online estimation approach is quite feasible in modern clusters: Map/Reduce scheduling systems like Hadoop have sufficient instrumentation to permit fairly fine-grained data collection, and cluster operators use workload characterization for provisioning.
We have empirically observed that the processing time of a MapReduce job depends upon four features: (1) the number of map tasks, (2) the number of reduce tasks, (3) the map-reduction factor, and (4) the reduce-reduction factor. The map-reduction factor measures the average reduction in input size after the Map phase; for example, in the case of a grep job searching for a rare pattern in a large volume of text, the Map phase will filter out most of the original input and thus have a large data reduction factor. Analogously, the reduce-reduction factor measures the average reduction from the input of the Reduce phase to its output. Intuitively, these two reduction factors determine the contention for network and disk I/O for the job, and the number of map and reduce tasks determines the computation resources needed across the cluster.
In MRM, these features collectively form a feature space,³ and each job represents a point in this feature space. When users submit a job, they must specify the feature vector associated with this job; users can obtain this information (especially the reduction factors) from trial runs of the job.

³ Other features might also be important in determining processing time. For example, jobs that read compressed input might differ in overall processing time from jobs that read uncompressed input. We have left an exploration of the complete feature space for future work.

When each job completes, MRM obtains a sample of the processing time for jobs at the corresponding point of the feature space. For example, consider a sort job (I/O: byte sequence, no data reduction after the map and reduce phases) with 64 map and 8 reduce tasks that sorts 6 GB of data on a 9-node cluster. After its completion, MRM can obtain, from the Hadoop logs, statistics about the task completion times for each of the three MapReduce phases, e.g., mean and standard deviation of task duration, confidence intervals, etc. As more jobs at the same point in the feature space are executed, MRM keeps updating these statistics.
When a user submits a new job that lies at the same point in the feature space,⁴ MRM uses these statistics to compute a conservative processing time estimate for the job. Specifically, it very conservatively uses the upper bound of the 99% confidence interval of the task completion time of each stage in its calculations. Then, when computing the earliest finish time, it uses these estimates in a manner discussed below. When computing the pricing function, it uses the total processing time of all tasks as an estimate of the resources consumed by the job; in our example of a job with 64 map tasks and 8 reduce tasks, it computes the total processing time as 35.3 × 64 + 79 × 8 + 337.5 × 8 (conservatively assuming that the reduce tasks for a job, and hence its shuffle phase as well, are started only after all its map tasks finish executing).

⁴ If an exact match is not available, we can use weighted averaging of nearest-neighbor statistics or more sophisticated clustering to obtain an estimate.
Our approach is conservative; choosing the average or the 95th percentile of task duration and/or assuming partial overlap between the Map and Shuffle phases would provide a less conservative bound, but that increases the risk of deadline violations due to stragglers [2] or failures. Because of our focus on predictable finish times, we choose to compute a loose upper bound on processing time in MRM. We discuss the impact of this choice on MRM's pricing policy later in this section.
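The following sketch illustrates this conservative estimate under the assumptions above. The per-phase statistics are assumed to be collected from Hadoop logs for jobs at the same feature-space point, and we read the "99% confidence" bound as the upper limit of a confidence interval on the mean task duration, which is one reasonable interpretation; the numbers and names are illustrative rather than MRM's implementation.

```python
# Sketch of a conservative processing-time estimate for a Map/Shuffle/Reduce
# job, given per-phase task-duration statistics (mean, std, sample count).
import math

def ci99_upper(mean, std, n):
    """Upper end of a 99% confidence interval on the mean task duration."""
    return mean + 2.576 * std / math.sqrt(n)

def total_processing_time(stats, n_maps, n_reduces):
    """Conservative total processing time across all tasks of a job.

    Shuffle and reduce are assumed to start only after all map tasks finish,
    matching the no-overlap assumption in the text.
    """
    t_map = ci99_upper(*stats["map"])
    t_shuffle = ci99_upper(*stats["shuffle"])
    t_reduce = ci99_upper(*stats["reduce"])
    return t_map * n_maps + t_shuffle * n_reduces + t_reduce * n_reduces

# Illustrative (invented) statistics for the 64-map/8-reduce sort job above.
stats = {"map": (33.0, 6.0, 320), "shuffle": (75.0, 9.0, 40), "reduce": (330.0, 20.0, 40)}
print(total_processing_time(stats, n_maps=64, n_reduces=8))
```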
An alternative approach, used in [57], is to estimate a job's processing time using a debug run of a smaller version of the job (one with fewer map and reduce tasks). We have verified that this approach can lead to underestimates of job completion time [70]; given that predictability is an explicit design goal of MRM, we have not used this simpler method.
3.4.2 Price-deadline negotiator
To be able to offer a price-deadline curve to the user, MRM needs to compute three quantities: (1) the earliest possible finish time for the job, (2) the premium price to charge the user if she selects the earliest finish time as the deadline (i.e., offers no slack), and (3) the shape of the price-deadline curve. As discussed in Section 3.3, (2) and (3) are choices made off-line, for example by the cluster operator, through the particular choice of the price-deadline curve (e.g., exponential decay, Eq. (3.1), with load-sensitive premiums). Only the earliest possible finish time for a job needs to be estimated in real time.

[Figure 3.4: Example of MRM's estimation of the earliest finish time for a new job.]
If the entire cluster is available for a job, then, of course, its earliest possible finish time is proportional to its processing time. However, in general, not enough Map and Reduce slots (processor cores) will be available to execute the job with the highest degree of parallelism; slots will become available over time as other jobs complete their various phases.
To estimate the earliest finish time for a new job, MRM uses the deadlines committed to currently executing jobs already admitted into the system. In general, for a cluster with multiple machines, computing the earliest feasible finish time for a new (n+1)-st job, with processing time p_{n+1}, when MRM has already accepted n jobs with processing times p_1, ..., p_n and deadlines d_1, ..., d_n, is NP-hard. For the case of two machines where all jobs have the same deadline d_1 = d_2 = ... = d_n = d_0 and jobs cannot be partitioned (e.g., all jobs have 1 map and 1 reduce task), we can obtain a reduction from the PARTITION problem, which is known to be NP-hard [49].
In the current implementation of MRM, we use a simple greedy heuristic to estimate earliest finish times. Basically, given the deadlines of currently executing jobs, MRM greedily attempts to fill up any available slack in the schedule in determining the new job's earliest finish time. We illustrate this with a simple example.

In this example, for simplicity, we assume that jobs do not have reduce tasks and hence contend only for map slots (our MRM instantiation takes both map and reduce slots into account). Consider the example shown in Figure 3.4 for a cluster with 16 map slots. When job C (with 64 map tasks) arrives at
t = 0, there are two jobs, A and B, in the system. Jobs A and B have 16 and 32 map tasks left, respectively, with 10 seconds of processing time for each task. Suppose the deadlines for A and B are d_A = 70 and d_B = 110. Then, for A and B to meet their deadlines, their map phases must start by t = 60 and t = 90, respectively. A lazy schedule, where MRM schedules a job as late as possible, is shown in Figure 3.4(a). This schedule leaves a lot of slack in the system: all the map slots are free during the intervals t ∈ [0, 60], t ∈ [70, 90], and t ≥ 110.

To finish C as soon as possible, we can fill some of this slack. For example, suppose each map task of C takes 20 seconds to complete. Then, as shown in Figure 3.4(b), we can schedule 48 of them during the interval [0, 60] and the remaining 16 during [70, 90]. Hence, the earliest possible finish time for C is t = 90s. However, if C's map tasks took 30 seconds to finish, then we can only schedule 32 of them during [0, 60] and none during [70, 90], so the earliest possible finish time for C will be t = 150s (Figure 3.4(c); this requires moving B's map phase up to [70, 90], which still meets B's deadline, so that C's remaining 32 tasks can run during [90, 150]).
Figure 3.4 illustrates the key step in computing the earliest possible finish time: identify the location in time of slack in the system and then fill this slack with tasks from the new job (to the extent possible). MRM's computation of the earliest possible finish time assumes that jobs contend only for map slots and reduce slots. It does not explicitly take the contention for network bandwidth into account.
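A simplified version of this greedy slack-filling step is sketched below. It models only the map-slot case from the example, assumes the committed jobs occupy all slots in whole waves as in the lazy schedule of Figure 3.4(a), and uses hypothetical names; MRM's actual heuristic also handles reduce slots and, as the 30-second case suggests, can shift committed jobs earlier within their own deadlines.

```python
# A minimal sketch of greedy slack filling for the map-only example above.
# All map slots are treated as free or busy together (whole "waves"), and the
# committed jobs' lazy schedule is given as reserved time windows.

def earliest_finish(busy_windows, n_tasks, task_time, n_slots, horizon=10**9):
    """Earliest finish time for a new job packed into the free windows."""
    # Derive the free windows between the reserved (busy) windows.
    free, t = [], 0
    for start, end in sorted(busy_windows) + [(horizon, horizon)]:
        if start > t:
            free.append((t, start))
        t = max(t, end)

    remaining = n_tasks
    for start, end in free:
        # Place one wave of up to n_slots tasks while a full task still fits.
        while remaining > 0 and start + task_time <= end:
            remaining -= min(remaining, n_slots)
            start += task_time
            if remaining == 0:
                return start
    return None

# Job C from the text: 64 map tasks on 16 slots; A reserved [60, 70], B [90, 110].
print(earliest_finish([(60, 70), (90, 110)], 64, 20, 16))  # 90, as in the text
print(earliest_finish([(60, 70), (90, 110)], 64, 30, 16))  # 170 for this simple
# variant; the 150s in the text additionally requires shifting B's map phase
# up into [70, 90] while still meeting its deadline.
```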
After computing the earliest possible finish time, MRM offers the price-deadline curve to the user. The user selects a point on this curve as the deadline that is commensurate with the tokens she has and her willingness to wait for her job to finish. At this point, MRM hands over the job to the scheduler.
3.4.3 Deadline-aware scheduler
In MRM, it is the scheduler's responsibility to make sure that jobs finish before their deadlines. Clearly, a FCFS (the Hadoop default) or a Fair scheduler [105] cannot always achieve this. However, an earliest deadline first (EDF) scheduler will suffice. Since users are allowed to select a deadline from feasible finish times only, this implies that there exists a schedule S that satisfies all the deadlines; if S does not schedule jobs in the order of increasing deadlines, then a simple pairwise re-ordering can be used to obtain an EDF schedule from S without violating any of the deadlines [48]. We have implemented an EDF scheduler in Hadoop.
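A bare-bones illustration of earliest-deadline-first dispatch is shown below. It captures only the ordering policy (pick a task from the pending job with the earliest deadline whenever a slot frees up) and is not MRM's Hadoop scheduler code; the class and method names are our own.

```python
# A toy EDF dispatcher: jobs are kept in a heap keyed by deadline, and each
# free map/reduce slot is given a task from the earliest-deadline job.
import heapq

class EDFScheduler:
    def __init__(self):
        self._jobs = []                       # heap of (deadline, job_id, tasks)

    def submit(self, job_id, deadline, tasks):
        heapq.heappush(self._jobs, (deadline, job_id, list(tasks)))

    def next_task(self):
        """Return (job_id, task) for a newly freed slot, or None if idle."""
        while self._jobs:
            deadline, job_id, tasks = self._jobs[0]
            if not tasks:                     # all of this job's tasks dispatched
                heapq.heappop(self._jobs)
                continue
            return job_id, tasks.pop()
        return None

sched = EDFScheduler()
sched.submit("A", deadline=70, tasks=["a1", "a2"])
sched.submit("B", deadline=30, tasks=["b1"])
print(sched.next_task())                      # ('B', 'b1'): earliest deadline first
```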
3.4.4 Post-facto price adjustment
In Section 3.4.1, we noted that MRM computes pessimistic job processing time estimates to insure against transient failures as well as variability in task durations, and to avoid violating deadlines. In general, task processing variability cannot be avoided: the prevalence of stragglers in MapReduce jobs is well-documented [20, 2], and these may be unavoidable due to server heterogeneity and data skew [2]. Hence, MRM must account for task duration variability when estimating job processing times. Currently, it does this by using the 99% confidence limits inferred from historical job execution times.
An important downside of MRM's decision to over-estimate processing times is price inflation. Since the premium price and the decay rate of MRM's price-deadline curve for a job depend on its processing time (Equations 3.1-3.3), MRM may over-charge users. To avoid this, MRM recomputes the price after a job is completed. We added performance counters in Hadoop that allow MRM to keep track of the total processing time for a job as its tasks execute. It uses this ex post facto estimate to recompute the price-deadline curve: knowing the amount by which the processing time was over-estimated, MRM shifts the original price-deadline curve accordingly to compute the revised price.
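One way to realize this post-facto adjustment is sketched below: the same price-deadline curve is re-evaluated with the measured processing time in place of the conservative estimate, and the difference is returned as a rebate. This is our reading of "shifting the original curve", and the curve and the function names are illustrative rather than MRM's actual code.

```python
# Post-facto price adjustment sketch: recompute the price from the measured
# total processing time (e.g., from Hadoop counters) instead of the
# conservative estimate, keeping the slack the user originally offered.
import math

def price_curve(slack, p_j, k=3.0):
    """Eq. (3.1)-style exponential price-deadline curve with r_j = p_j."""
    return k * p_j * math.exp(-slack / p_j)

def adjusted_price(offered_slack, estimated_p, measured_p, k=3.0):
    quoted = price_curve(offered_slack, estimated_p, k)    # charged at admission
    revised = price_curve(offered_slack, measured_p, k)    # based on actual usage
    rebate = max(0.0, quoted - revised)
    return revised, rebate

# If the estimate was 12 time units but the job actually used 10, the user
# pays the revised (lower) price and receives the difference as a rebate.
print(adjusted_price(offered_slack=5.0, estimated_p=12.0, measured_p=10.0))
```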
3.4.5 MRM's online monitor
To be robust to rare or unanticipated failures, MRM needs to detect and respond to any deviations from expected behavior during a job's execution. We have implemented an online monitor in MRM that can currently detect two types of deviations: (1) a change in the cluster's processing capacity (number of map and reduce slots), and (2) a slowdown in map task durations. The number of map and reduce slots on a cluster can change due to addition, rebooting, or failure of servers. Hadoop maintains cluster status counters that track these changes. MRM uses these counters to adjust its estimate of the amount of work in the system.
This estimate is used in computing the earliest feasible finish time upon a job's arrival. As mentioned in Section 3.4.4, MRM also keeps track of task durations as tasks finish. Hence, it can detect slowdowns or speed-ups for phases with a large number of tasks. This allows MRM to adjust its estimate of the amount of remaining work and, hence, avoid computing an inaccurate feasible finish time for a new job.
Note that an unanticipated reduction in a cluster's processing capacity and/or a slowdown (e.g., due to network congestion) can cause MRM to violate the deadlines of already-accepted jobs. Currently, MRM notifies the user when these events (occasionally) happen. We plan on providing more options for users in MRM, such as re-negotiating the deadline. We also plan to integrate a more sophisticated monitor in MRM, similar to those in Mantri [2] or Parallax [57]. This will be a significant improvement over MRM's current monitoring capabilities.
3.5 Evaluation
In this section, we discuss results from extensive experiments of our MRM implementation.
3.5.1 Experimental Setup
Clusters. We evaluate MRM through experiments on two qualitatively different Hadoop clusters: a small private cluster and a shared cluster at our organization. The private cluster, of which we had dedicated use, consists of 9 servers. The shared cluster consisted of a variable number of servers (in most experiments we use 30 servers, but we also show experiments with 90 servers) obtained on-demand from a large compute cluster. Each experiment run on the shared cluster was submitted as a job through a Portable Batch System (PBS). On both clusters, we set up Hadoop and submitted MapReduce jobs to it.
Cluster    size (# servers)    server hardware                          inter-server bandwidth
Shared     30/90               cores: 2-4, mem: 4-16 GB, disk: 30-60 GB  1 Gbps
Private    9                   cores: 4-8, mem: 4-8 GB, disk: 100 GB     100 Mbps

Table 3.1: Cluster configurations used in MRM evaluation.
Table 3.1 lists the relevant hardware characteristics of the two clusters. Apart from size, there are three qualitative differences between the two clusters: (1) our private cluster has a 10x lower inter-rack bandwidth; (2) we had no control over allocated servers for the shared cluster, and hence the network topology changed across our different experiment runs; and (3) our jobs on the shared cluster shared the network fabric with other jobs submitted by other experimenters using the compute cluster. While an option was to reserve 30 servers for an extended period to finish all our experiments on the shared cluster with the same set of machines, we chose not to do so. Our choice matches the elastic resource acquisition model prevalent in today's clouds (e.g., Amazon AWS [21], Google [91]), in which users are more likely to set up MapReduce clusters on-demand (and hence get different sets of servers to work with at different times). On both clusters, we configured each server to have 2 map and 2 reduce task slots.
Workload. We created a benchmark workload comprising 100 jobs classified into 9 different bins, with the number of jobs in each bin based on previously published workload statistics [105]. Table 3.2 shows the distribution of jobs across bins, and the number of map and reduce tasks for jobs in each bin. This particular workload mix reflects a scenario where users submit lots of small experimental jobs, but there are also some large production jobs (as is the case at Facebook and Yahoo). We chose the job size (the number of maps/reduces) for each bin according to the cluster size. Hence, the largest jobs on the shared cluster have 480 maps and 30 reduces, whereas on the private cluster they have 150 maps and 8 reduces. Finally, the jobs in bins 7-9 contain many more map tasks than the available map slots: the over-subscription factor for the largest jobs is 8 (9.4) for the shared (private) cluster.
Bin    % jobs    Shared (# maps/reduces)    Private (# maps/reduces)
1      38        1/0                        1/0
2      16        2/0                        2/0
3      14        10/0                       5/0
4      8         20/2                       10/1
5      6         40/4                       16/2
6      6         60/6                       20/2
7      4         120/12                     32/4
8      4         240/16                     64/8
9      4         480/30                     150/8

Table 3.2: Workload used in MRM evaluation.
Job types. The duration of a map or reduce task of a job depends on the type of the job. Our workload emulates two canonical MapReduce jobs: grep (searching for a pattern) and sort. We generated these jobs using the loadgen MapReduce classes that come with the Hadoop distribution. The loadgen class takes two parameters, keepmap and keepreduce, which control the percentage of records retained at the end of the map and the reduce phases. For grep jobs, keepmap=0.1% and keepreduce=100%, and for sort jobs, keepmap=100% and keepreduce=100%.
All our experiments use constant-factor pricing with an exponential decay: we have experimentally verified our discussion of the qualitative differences between the various choices, and we omit these results for brevity.
3.5.2 MRM: Validation
We begin by demonstrating that MRM can provide predictability in finish times along with service differentiation. We ran a series of tests under different user wealth distributions (the initial distribution of tokens).
In the first distribution, users have enough tokens to select the earliest feasible finish time for each job. In this case, the system has no slack: this is a highly adversarial scenario for MRM and is unlikely to occur. In practice, administrators will want to carefully distribute tokens to induce users to offer slack when possible. The second wealth distribution emulates a moderate-slack scenario in which 50% of jobs from each bin are time-critical (hence, they offer no slack), while the rest offer a slack value uniformly distributed between zero and the maximum possible. Finally, we emulate a wealth distribution in which users have just enough tokens to pay the resource price for each job, a high-slack scenario which is fairly benign for MRM.⁵ These scenarios represent three qualitatively different points in a spectrum of wealth distributions that administrators can control to achieve system-wide performance goals.

⁵ In our evaluations, we emulate these wealth distributions by using the nominal processing time estimates from MRM. In practice, since these values are estimates, these wealth distributions will necessarily be approximate.
Estimating job processing times. As input for all scenarios, to estimate processing times, we executed a run of 100 jobs on each cluster. Job sizes were sampled as per the workload mix shown in Table 3.2, and each job was randomly chosen to be either a grep or a sort job. On the shared cluster, a grep map task took between 10-39s, while a sort map task took between 22-66s. The corresponding numbers for the private cluster were 21-80s for a grep map task and 39-124s for a sort map task. We believe that slower disks and contention for the network were the main reasons for larger finish times on our private cluster. Naturally, for a fixed job size, a sort reduce task took much longer than a grep reduce task because it processed many more records.
Predictability of finish times. In a practical system, perfect predictability cannot be achieved due to system vagaries. For the same reason, data center operators engineer their systems for a high level of availability, but cannot guarantee perfect availability. Recognizing this, we have two measures of predictability: what percentage of jobs missed their deadlines (were tardy), and the 95th percentile of tardiness (the magnitude of deadline misses) across all jobs. To quantify service differentiation, we compare the deadlines for time-critical jobs under the no-slack and moderate-slack scenarios. Under moderate-slack, MRM can leverage the available slack to offer tighter deadlines (compared to the no-slack scenario) to time-critical jobs.
To measure these quantities, we ran 6 experiments on each cluster with different numbers of jobs, with the job size distribution sampled from Table 3.2. The job arrival process was Poisson with an average inter-arrival time of 16s, a rate fast enough to result in contention for the cluster's resources. Further, recall that the job distributions result in an over-subscription ratio of over 8 for the largest jobs. Jobs in each bin were equally likely to be assigned to type grep or sort.
In the highly adversarial no-slack setting, as shown in Table 3.3, MRM violated at most 2.7% and 8% of the deadlines on the shared and private clusters, respectively. Furthermore, the 95th-percentile tardiness is at most on the order of a few minutes, and the maximum tardiness across all runs was 24 seconds and 6 minutes for the shared and the private cluster, respectively.
However, in the moderate-slack and high-slack cases, MRM performed much better: on our private cluster, there were no violations in any of the 6 runs in the high-slack case; with moderate-slack, the runs with 125 and 100 jobs experienced no violations, and only 1 violation occurred in the 75 and 50 job runs, with a tardiness of 1 and 2 minutes, respectively. On the shared cluster, we ran one experiment each for the moderate-slack and high-slack cases with 100 jobs. MRM did not violate any deadlines in either case, unlike the no-slack case, where it violated deadlines for 2 jobs. These results, obtained with a relatively unoptimized system, suggest that MRM can provide almost perfect predictability when users offer slack.
To further stress test MRM, we performed a subset of the 6 runs (125, 100, and 75 jobs) on a 90-server shared cluster for the no-slack case. The percentage of tardy jobs (95th percentile tardiness) was 0.08% (5.5s), 4% (52s), and 1.3% (2.75s), respectively. On this larger cluster, we also ran the 100-job mix for the moderate and high-slack scenarios, and observed no violations.
Finally, we also ran experiments on our private cluster with 100 jobs where the workload mix for the processing time estimation was different from Table 3.2 (we reversed the proportion of jobs in bins 7-9 and bins 1-3). Hence, the test run had many more large jobs, and this resulted in much higher contention for reduce slots (and network bandwidth) during the test run compared to the run used for estimating processing time. In this case, the fraction of jobs with deadline violations (95th percentile tardiness) under the no-slack scenario was 8% (12 mins.). This is not surprising because MRM's processing time estimation is based on the assumption that the training data captures the workload mix accurately. In practice, operators will track the workload mix and ensure that MRM's online estimation scheme is current (e.g., by removing obsolete points in the feature space).
These results are highly encouraging: while deadline violations can occur in the worst case, we believe that operators can, by controlling wealth distributions, operate the cluster in the regime where users offer slack, allowing the system to minimize or completely eliminate deadline violations in the average case. The analogy with availability is helpful: in the worst case, any data center can become completely unavailable after a catastrophic failure, but in the long run, on average, it can be engineered to give high (e.g., 3-nines) availability. We believe that MRM's pricing provides a framework for establishing similar predictability of performance, with improvements to some of its core mechanisms.
While MRM's relatively unoptimized processing time estimation mechanism already works reasonably well, it is the one component which could be improved to increase predictability. MRM can both overestimate and underestimate job processing times; the underestimates are more significant on the private cluster than on the shared cluster. As Figure 3.5 shows, in the no-slack case with 125 jobs on the private (shared) cluster, MRM underestimated the processing time for roughly 47% (8%) of jobs. Underestimating processing times can result in deadline violations, and this explains the higher number of tardy jobs (and tardiness) on the private cluster. In the moderate-slack and high-slack cases, even though MRM can underestimate job processing times, these underestimates don't cause deadline violations because loose deadlines, combined with overestimates for other jobs, provide enough of a buffer.

Moreover, we believe processing time estimation works better in clusters with better provisioning. Our shared cluster performed much better than the private cluster because of its higher network and disk bandwidths.
There are several possible approaches to improving processing time estimation. One approach to making MRM's estimation more robust is to combine offline estimation (MRM's current approach) with online adjustment based on the current level of resource contention; one way to realize this is to integrate systems such as Parallax [57] or Mantri [2] with MRM. Another approach is to improve the offline estimator. We plan to pursue these two directions as part of our future work.
# jobs    Shared cluster                    Private cluster
          % tardy jobs    Tardiness (95th)  % tardy jobs    Tardiness (95th)
125       0%              0                 8%              5.5 mins
100       2%              20 s              0%              0
75        2.7%            24 s              6.7%            2.5 mins
50        0%              0                 3%              4.3 mins
25        0%              0                 0%              0

Table 3.3: Deadline violations with MRM in the no-slack scenario.

[Figure 3.5: Accuracy of MRM's processing time estimates. CDF of the ratio of estimated to actual processing time for the private and shared clusters.]

Service differentiation. We demonstrate MRM's ability to provide earlier deadlines to time-critical jobs at the expense of delay-tolerant jobs by comparing the deadlines for such jobs under the no-slack and moderate-slack scenarios. Recall that only 50% of the jobs are time-critical in the moderate-slack case,
and users select the earliest feasible finish time offered by MRM as the deadline for such jobs. With no-slack, all jobs are time-critical and hence MRM is forced to perform FCFS scheduling. However, with moderate-slack, MRM can leverage the slack offered by delay-tolerant jobs to finish time-critical jobs earlier than they would finish under FCFS.
Figure 3.6 shows the distribution of the total time spent in the system (queueing delay plus processing time) for the time-critical jobs in the 100-job runs under the moderate-slack and no-slack scenarios on our private cluster (the job arrival order was the same across the two runs).
Percentile    Low-slack    Moderate-slack
25th          5.3          1.4
50th          11.3         2.3
75th          54.6         20.9
95th          70.6         54.9

Table 3.4: MRM service differentiation: finish time (in minutes) percentiles.
[Figure 3.6: Job finish times. CDF of the time spent in the system for time-critical jobs under the moderate-slack and low-slack scenarios.]
Table 3.4 shows different percentiles for the finish time distributions shown in Figure 3.6. With moderate-slack, MRM was able to leverage the available slack to significantly reduce the waiting time for time-critical jobs: e.g., the median finish time with moderate-slack is almost 5x lower compared to the low-slack case. Hence, MRM can provide differentiated service to time-critical jobs (at a higher price) when delay-tolerant jobs offer slack.
3.5.3 Comparison with Other Approaches
For shared MapReduce clusters, schedulers that prevent short jobs from getting stuck behind large jobs (due to FCFS scheduling) have recently received considerable attention. Fair allocation of map and reduce slots across jobs has been shown to be effective at reducing the waiting time for short jobs at the expense of large jobs [105, 37]. A shortest job first (SJF) priority scheduler can also achieve the same effect. However, the state-of-the-art fair and priority schedulers for MapReduce clusters do not allow users to specify or negotiate a deadline. Hence, they do not provide predictability in finish times. We demonstrate this using the following experiments.
We submitted 100 jobs (as per the workload mix in Table 3.2, and a 50-50 split between grep and sort job types) to MRM. We executed three runs of this job trace under the no-slack, moderate-slack, and max-slack scenarios. To compare MRM against fair scheduling, we ran another experiment with the same job arrival trace using the Delay scheduler [105] for Hadoop. We compared the completion time for a job under fair scheduling with its deadline under MRM to determine whether the fair scheduler was able to meet that deadline. We repeated the same exercise using an SJF priority scheduler with the following priorities: jobs from bins 1-3 had the highest priority, followed by jobs in bins 4-7, and jobs from bins 8-9 had the lowest priority.

                   no-slack       moderate-slack    max-slack
Delay Scheduler    14%            8%                1%
                   (32 mins.)     (25 mins.)        (16 mins.)
Priority (SJF)     1%             1%                0%
                   (25 mins.)     (8 mins.)         (0)
MRM                0%             0%                0%
                   (0)            (0)               (0)

Table 3.5: Comparison: Delay scheduler, priority scheduler, and MRM.
Table 3.5 shows the percentage of tardy jobs and the 95th percentile for deadline violations on the private cluster. For deadline-agnostic schedulers, high-slack (no-slack) is the most (least) favorable scenario, as in this case users are willing to wait the most (least) for their jobs to finish. The Delay (SJF priority) scheduler violates 14% (1%) of the deadlines in the no-slack case. More than the number of tardy jobs, it is the magnitude of tardiness that demonstrates the lack of predictability; in the worst case, the Delay and the SJF priority scheduler violate deadlines by 32 and 25 minutes, respectively. Not surprisingly, even for favorable scenarios (high-slack and/or moderate-slack), the Delay and priority schedulers violate deadlines by significant margins.
We obtain a similar result on our shared cluster. In the worst case, the Delay (SJF priority) scheduler violated 17% (25%) of deadlines with a 95th percentile tardiness of 13 (25) mins. In contrast, MRM violated only 2% of deadlines, and the maximum tardiness was 20 seconds.
With additional mechanisms, it may be possible to achieve predictability comparable to MRM using fair or priority scheduling. However, these additional mechanisms may be non-trivial. One approach for improving the predictability with fair (priority) scheduling can be to assign weights (priorities) based on users' deadline preferences, rather than job size. However, this requires addressing two significant challenges: (1) how does the scheduler map deadlines to weights (priorities)? and (2) what mechanism dissuades users from wanting the highest weight (priority) for all their jobs?

[Figure 3.7: Job completion times with different schedulers. CDF of job completion time (s) for MRM (no-slack, moderate-slack, high-slack) and for the Delay and Priority (SJF) schedulers.]
Finally, another way to compare the different approaches is to examine the job completion time distributions. Figure 3.7 compares the cumulative distribution of job completion times for MRM (with no-slack and moderate-slack) against fair and priority scheduling. Note that MRM with no-slack is equivalent to FCFS. With slack available, MRM is able to offer deadline choices to jobs that enable them to skip ahead of jobs that arrived earlier. Even in the moderate-slack scenario (where some large jobs do not offer any slack), MRM is able to provide smaller finish times for more than 70% of the jobs compared to the Delay scheduler. However, this comes at the expense of larger finish times for a small number of jobs (less than 5%).
3.6 Conclusions and Future Work
In this work, we have explored a novel service model for shared MapReduce clusters that provides both predictability of finish times and service differentiation. Our design, MRM, forces users to reveal their true finish-time preferences using a novel price-deadline curve formulation, incentivizing them to offer slack that enables the system to accommodate users with time-critical jobs. Our experiments show that, despite significant performance variability in clusters, it is possible to achieve near-perfect predictability of finish times when users can be incentivized to offer slack.
The basic design of MRM opens up many interesting directions for future work. For instance, for jobs with multiple MapReduce phases, such as those written in Hive [31] and Pig [69], MRM would first need to identify the critical path (the task sequence with the longest finish time) in the DAG. Second, to deal with deadline violations, MRM can offer a rebate to users when it violates their deadlines, or appropriately define service-level agreements to guarantee a certain level of predictability, thereby setting user expectations. Lastly, mechanisms to detect malicious users (e.g., those who inflate feature vectors), and to penalize misbehaving ones, would be essential in a production system.
Chapter 4
Automatic Request Categorization in Internet Services
Modeling system performance and workload characteristics has become essential for efficiently provisioning Internet services and for accurately predicting future resource requirements on anticipated workloads. The accuracy of these models benefits substantially from differentiating among categories of requests based on their resource usage characteristics. However, categorizing requests and their resource demands often requires significantly more monitoring infrastructure. In this chapter, we describe a method to automatically differentiate and categorize requests without requiring sophisticated monitoring techniques. Using machine learning, our method requires only aggregate measures such as the total number of requests and the total CPU and network demands, and does not assume prior knowledge of request categories or their individual resource demands. We explore the feasibility of our method on the .Net PetShop 4.0 benchmark application, and show that it works well while being lightweight, generic, and easily deployable.
4.1 Introduction
As Internet and enterprise hosting services have evolved from single-host platforms to large data centers, the tasks of resource provisioning and capacity planning have similarly grown both in importance and difficulty. Performance modeling is a natural approach for accomplishing such provisioning and planning tasks in multi-tier cluster-based server systems. It forms the basis for many commercial tools [3, 100], and contemporary models incorporate sophisticated techniques, including detailed resource profiling of tiered services [89], long-term forecasting of workloads in enterprise systems [26], comprehensive queueing models of multiple service tiers [96], and the modeling of nonstationarity in workload mixes [88].
An important aspect of performance modeling is characterizing application workloads and systems to
extract parameters, such as request rates and CPU service times, that serve as inputs to the models. Fre-
quently, workloads are modeled as a single class and aggregate workload and system parameters are used
as inputs. Application workloads, however, often have a variety of user interactions that affect resource
demands. For example, user requests in e-commerce applications typically fall into various categories such
as browsing and shopping. These workload categories can have different resource demands; for example,
shopping requires more CPU than browsing. Moreover, the relative mix of those categories in the overall
workload changes over time [26, 88]. As a result, models that explicitly incorporate a mix of workload
categories and differentiate between them can more accurately perform resource provisioning and capacity
planning for modern data centers [88].
Modeling workloads with category mixes, however, presents the challenge of parameterizing the different categories as inputs to performance models. Scalar (i.e., single class) models might require just the overall request rate as an input, a straightforward measure to obtain from host performance counters or server logs. However, models with multiple request categories require request rates λ_i for each category. These per-category parameters need to be either extracted or inferred from the workload. Obtaining them would then require monitoring and logging infrastructure that both differentiates among categories and correlates requests across the multiple tiers of service infrastructure. Such instrumentation is certainly feasible to engineer, but often requires a deeper understanding of both the workload and the service architecture, as well as a substantially more complex monitoring infrastructure [5, 15].
This work explores the problem of inferring workload categories without the need for sophisticated monitoring infrastructure. We present an automatic and generic request categorization method that uses only coarse-grained measurements of system resources, such as overall request rate, total CPU, and network usage as input. Using such measurements over multiple time windows, we apply a machine-learning-based computational technique called independent component analysis (ICA) to differentiate and categorize requests and their resource demands in an offered workload. These per-category parameters can then be directly used as inputs in Internet service performance models.
A learning approach to workload categorization has a number of advantages. First, it greatly sim-
plies the proling task when modeling workloads with multiple categories. Second, it treats both the
workload and system as a black box, inferring request categories and resource usage without requiring de-
tailed instrumentation. Finally, it can profile workloads at different granularities of categories. In addition
to session-level categories such as browsing and shopping, it applies equally well to categorizing work-
loads in terms of content type (e.g., HTML, images, scripts) and request operation (e.g., GET and POST
operations). Using ICA has constraints, though. It requires that certain statistical properties hold in the
workload, and it limits the number of categories that can be inferred to the number of measured aggregate
resources. With our benchmark experiments, we show that workload categorization provides promising
results despite these constraints.
This work makes two contributions. First, we describe how the problem of request categorization maps
to the framework that ICA uses. We also verify that our problem domain satisfies assumptions that the use
of ICA requires. Second, we evaluate our model using the PetShop 4.0 [53] benchmark on a Windows
Vista platform and show that, using ICA, we can efficiently derive the desired request categorization and
resource usages. The percentage errors for our request categorization estimates (relative to the ground truth)
and our resource usage estimates (relative to an alternate, more heavy-duty technique) are within 4-17%.
The rest of this chapter is organized as follows. Section 4.2 defines the problem and describes the model
that we use to represent the resource usage in the different components of an Internet service. Section 4.3
describes how we map our problem to the ICA framework, and how the assumptions that ICA requires
hold true for our data. We describe our experimental results in Section 4.4. In Section 4.5 we discuss the
applicability of our request categorization technique, identify several promising directions for improving
our technique, and discuss the ramications of using our technique in large-scale Internet services. We
conclude in Section 4.6.
4.2 Model
In this section we develop a mathematical model representing the problem of request categorization. Let
there be $m$ basic request categories ($RC_1$ to $RC_m$) in an application workload. A request category
represents a class of requests that have a similar resource consumption pattern. Let there be $n$ resources
($RS_1$ to $RS_n$) associated with the Internet service under consideration. Resources could be quantities
such as CPU usage and network usage on web servers, database servers, and other service components.
We make a linearity assumption that states that there is a positive constant $a_{ij}$ such that the amount
of resource $RS_i$ required for processing $x_j$ requests of category $RC_j$ is given by $a_{ij} x_j + c_{ij}$, where the
constant $c_{ij}$ captures any non-linear components. In general, we can model the resource usage pattern for
a given time window as a set of linear equations as shown below.
$$a_{i1} x_1 + a_{i2} x_2 + \cdots + a_{im} x_m + c_i = r_i \qquad (4.1)$$
For example, say the index $i = 2$ gives us the equation for the web server's CPU usage. The total CPU
usage on the web server is the sum of the individual CPU usages of each request made to the web server.
Since there are $m$ different request categories, and since each request of category $k$ uses $a_{2,k}$ amount of
CPU on the web server, the total CPU usage for the web server is a sum of the $m$ product terms shown on
the left hand side of Eq. 4.1, plus the constant value $c_i = \sum_{j=1}^{m} c_{ij}$.
Eq. 4.1 can be written in matrix form as
$$Ax + c = r \qquad (4.2)$$
where $A = \{a_{ij}\}$ is the matrix of unit resource consumption for each request category, $x = \{x_j\}$ is the
vector of the number of requests of each category, $c = \{c_i\}$ is the vector of constant terms, and $r = \{r_i\}$
is the vector of the aggregate resource usage. In Section 4.4.2, we show that the above model does capture
the CPU usage of a benchmark e-commerce application workload.
We further assume that the total number of requests, $\sum_{j=1}^{m} x_j$, is known to us; we address how this
number can be derived in Section 4.4. We model this as the first equation of our framework of Eq. 4.1, or
the first row of $A$. So, for $i = 1$, all $a_{1,j}$ coefficients are set to 1 and the constant term $c_1$ is set to 0, so that
$r_1$ is the total number of requests.
Note that when we gather a large number of samples over a period of time, $A$ and $c$ remain fixed, while
$x$ and $r$ vary. Therefore, for a set of $T$ samples we can define a system of linear equations as follows:
$$AX + C = R \qquad (4.3)$$
with $X = [x_1\ x_2\ \ldots\ x_T]$ and $R = [r_1\ r_2\ \ldots\ r_T]$, where $x_t$ and $r_t$ represent the request arrivals for each
category and the resource usage for the $t$-th sample. The matrix $C$ consists of $T$ repetitions of the column
vector $c$.
We model the (measurement) noise in our system as an additive zero-mean Gaussian. We incorporate
the additive noise term into Eq. 4.3 as
$$AX + \tilde{C} = R \qquad (4.4)$$
where $\tilde{C}_i = C_i + \varepsilon$, with $\varepsilon$ being the additive noise. Geometrically, Eq. 4.4 says that if we plot each
column of $R$ as a point in an $n$-dimensional space, then the $T$ points corresponding to the $T$ columns of
$R$ should (approximately) lie on an $m$-dimensional hyperplane.
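As a concrete illustration of this linear model (the example below is ours, not part of the original evaluation; the mixing matrix, arrival counts, and noise level are all synthetic), the following Python sketch generates aggregate measurements R from a made-up matrix A, per-category arrivals X, and additive Gaussian noise, exactly in the form of Eq. 4.4. Each column of R corresponds to one monitoring window; the later sketches in this chapter reuse these A, X, and R in place of real measurements.

    import numpy as np

    rng = np.random.default_rng(0)
    m, n, T = 2, 3, 270          # request categories, measured resources, time windows

    # Synthetic per-request resource costs (matrix A); the first row is all ones so
    # that r_1 equals the total number of requests, as required by the model.
    A = np.array([[1.0,  1.0],    # total request count
                  [66.9, 44.4],   # web-server CPU per request (ms)
                  [13.3,  3.3]])  # database CPU per request (ms)
    c = np.array([0.0, 50.0, 20.0])                     # constant terms, with c_1 = 0

    X = rng.integers(100, 3000, size=(m, T)).astype(float)   # requests per category
    noise = rng.normal(0.0, 5.0, size=(n, T))                 # additive Gaussian noise

    R = A @ X + c[:, None] + noise     # observed aggregate usage (Eq. 4.4)
    print(R.shape)                     # (3, 270): one column per 10-minute window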
4.3 Analysis
Eq. 4.4 describes the underlying linear model for resource usage in an Internet service. From the aggregate
analysis of the resources consumed, we know the entries of $R$. However, $A$, $\tilde{C}$ and $X$ are unknown,
and neither do we know the number of categories, $m$, in the system. Therefore, the system seems to
be extremely underspecified. Nevertheless, we can estimate $m$, $A$ and $X$ just from $R$ using a machine
learning technique called independent component analysis (ICA).
ICA is a gradient-descent based optimization technique for solving the generic problem of blind source
separation where the sources, $X$, and the mixing matrix, $A$, are unknown, but aggregate observations,
$R$, are known (see Ch. 7 of [33] for an accessible introduction). ICA can be applied to solve the matrix
factorization problem as stated in Eq. 4.4 only if (a) the columns of $X$ are statistically independent, and
(b) the distribution of the columns of $X$ is non-Gaussian.
Given that we know the entries of the $R$ matrix, we can directly apply ICA to compute $A$, eliminate
$\tilde{C}$, and thereby compute $X$. However, before we do so, we must ensure that the basic assumptions of ICA
hold for our data set and, secondly, that we know the value of $m$, the number of request categories.
Non-Gaussianity of Request Arrivals: The aggregate per second request arrivals to a web service can
be modeled as samples drawn according to a geometric Poisson distribution (also known as the Pólya-Aeppli
distribution) [39]. A geometric Poisson distribution for aggregate request arrivals results from modeling
the client session arrivals as Poisson and the number of requests generated by each client session as a
geometric random variable [39, 103]. Since the distribution of the aggregate per second request arrivals is
non-Gaussian, the distribution of the aggregate request arrivals for each category (i.e., the distribution of
the columns of X) will also be non-Gaussian. For example, if the proportion of requests of each category
in the workload is xed, then the request rate for each category is a xed fraction of the aggregate request
rate.
Even though the aggregate per second request arrival is non-Gaussian, if we sample over large time
intervals, the aggregate request arrivals can be approximately Gaussian. We can model the aggregate
request arrivals over a sampling interval (larger than 1 second) as the sum of identically distributed random
variables where each random variable models the aggregate request arrivals per second (i.e., follows a
geometric Poisson distribution). Thus, the aggregate request arrivals for a large sampling interval can be
Gaussian due to the central limit theorem. In our experiments (Section 4.4), we choose a small sampling
interval equal to 10 minutes. The choice of sampling interval is an important parameter for our technique
and is further discussed in Section 4.5.1.
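The following sketch (ours; the session rate and geometric parameter are arbitrary) illustrates this argument numerically: per-second counts drawn from a geometric Poisson process should show clearly positive excess kurtosis, while the same counts aggregated into hour-long windows have excess kurtosis near zero, i.e., they look much more Gaussian.

    import numpy as np
    from scipy.stats import kurtosis

    rng = np.random.default_rng(1)

    def geometric_poisson_counts(seconds, session_rate=5.0, p_geom=0.3):
        """Per-second request counts: Poisson session arrivals, geometric number of
        requests per session (a geometric Poisson, i.e., Polya-Aeppli, process)."""
        counts = np.empty(seconds)
        for t in range(seconds):
            sessions = rng.poisson(session_rate)
            counts[t] = rng.geometric(p_geom, size=sessions).sum() if sessions else 0
        return counts

    per_second = geometric_poisson_counts(60 * 60 * 24)     # one day of arrivals
    print("per-second excess kurtosis:", kurtosis(per_second))

    # Aggregating over long windows pushes the counts toward Gaussian (CLT).
    hourly = per_second.reshape(-1, 3600).sum(axis=1)
    print("hourly excess kurtosis:", kurtosis(hourly))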
Independence assumption: The assumption that the arrivals for different request categories (i.e., the
columns of X) are statistically independent may not always hold. In practice, ICA methods (e.g., the
FastICA package that we use in Section 4.4) look for components that are as independent as possible.
Techniques such as whitening the observed resource consumption measurements (R) and randomizing the
order of samples in the input to ICA helped reduce the error in our ICA based estimates for an e-commerce
benchmark (Section 4.4.3). Whitening involves applying a linear transformation to a vector such that the
resulting vector has uncorrelated components with their variances equal to unity. The FastICA package
performs whitening by default. We further discuss the applicability of ICA based techniques to real world
data in Section 4.5.4.
The problem of determining the number of request categories, $m$, boils down to finding the rank
of the $R$ matrix. This can be understood as follows. If the data has been generated by $m$ request cate-
gories, but there are $n$ ($> m$) resources, then as stated earlier the $r$ vectors will lie on an $m$-dimensional
hyperplane in the $n$-dimensional space. This implies that all the samples of $r$ can be expressed as a linear
combination of $m$ basis vectors. Therefore, the rank of $R$ should be $m$. Note that since the rank of $R$ can
be at most $n$, we cannot identify more request categories than the number of resource types we have.
Although there are several techniques for estimating the rank of a matrix, in practice we run ICA
assuming $m = n$ to estimate $A$ and $X$. If $m < n$, then the entries of $n - m$ rows of $X$ turn out to
be significantly smaller (even negative) than the rest, which helps us throw away those request categories,
leaving only the $m$ valid ones.
The $A$ matrix estimated by ICA is unique up to the scaling of the columns and their permutations.
However, recall that the first row of the matrix models the total number of requests and, therefore, we
require that $a_{1,j} = 1$ for all $j$. To ensure this, we can normalize the output of ICA. Suppose the matrix
returned by ICA is $A'$. To obtain $A$, we perform the normalization step by dividing the entries of the $j$-th
column of $A'$ by $a'_{1,j}$. Also, since the different permutations of $A$ correspond to different namings of the
request categories, it is sufficient to know any one of the possible permutations.
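A minimal sketch of this recovery step, using scikit-learn's FastICA implementation (version 1.1 or later for the whiten argument) rather than the FastICA package cited in Section 4.4. It reuses the synthetic A, X, and R from the sketch at the end of Section 4.2, and the thresholding heuristic for discarding spurious categories is ours, not the dissertation's.

    import numpy as np
    from sklearn.decomposition import FastICA

    # R: the n x T aggregate-measurement matrix from the Section 4.2 sketch.
    n = R.shape[0]
    ica = FastICA(n_components=n, whiten="unit-variance", random_state=0)
    X_est = ica.fit_transform(R.T).T          # candidate per-category arrivals, n x T
    A_est = ica.mixing_                        # candidate mixing matrix, n x n

    # Normalize each column by its first entry so that a_{1,j} = 1, i.e., the first
    # equation again models the total number of requests.
    scale = A_est[0, :]
    A_est, X_est = A_est / scale, X_est * scale[:, None]

    # Heuristic (ours): discard components whose estimated arrivals are negligible
    # or negative; such a component typically just absorbs noise.
    valid = X_est.mean(axis=1) > 0.01 * np.abs(X_est).sum(axis=0).mean()
    print("estimated number of categories:", int(valid.sum()))
    print("per-request resource estimates:\n", A_est[:, valid])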
4.4 Experimental Validation
In this section, we validate the linearity assumption (Section 4.2) and demonstrate the feasibility of us-
ing our ICA based method for workload characterization using .NET PetShop 4.0 [53], a benchmark
e-commerce application. We show that the linear model holds for CPU usage of the web and database
servers. We also show that our method automatically categorizes requests, and determines good estimates
of the number of requests and the resource requirements for each category.
4.4.1 Experimental methodology
We ran the ASP.NET PetShop 4.0 benchmark in two tiers: a front-end IIS 6.0 web server and a back-end
SQL Server 2005 database. Both machines ran Windows Vista as the host operating system. A third
machine running Visual Studio Team Suite (VSTS) [54] emulates multiple client sessions. Each client
session generated either a browsing or a shopping workload session. Hence, in our experiments with
session level categorization, the actual number of request categories is 2. A browsing session emulated
a client merely browsing through various items on sale, while in a shopping session the client bought
exactly one item after initial browsing. The average duration of shopping and browsing sessions was 30
and 45 seconds respectively. The different HTTP-level transactions within a session were generated using
the VSTS record/replay utility: the recorder recorded a sample browsing session and a sample shopping
session, and then replayed these sessions while varying user think time between transactions. The user
think time was normally distributed with a mean of 3 seconds.
We collected 270 data points (matrix R), each one at the end of a 10 minute-long experiment run.
Hence, the value of T is 270. Each data point consisted of 3 resource measurements: the aggregate
CPU usage of the web server, the aggregate CPU usage of the database server, and the total number of
client sessions serviced. Therefore, n is set to 3. We measured the CPU usage using the pstat utility,
available with the Microsoft Platform SDK for Windows Server 2003, and obtained the total number of
client sessions serviced from the VSTS test run logs. In practice, we expect to use existing heuristics [32]
to determine the total number of sessions using only the front-end web server logs. Before collecting the
data points, we verified that a 10 minute experiment window was sufficient to obtain reproducible results.
The number of emulated clients for each experiment run was randomly chosen between 10 and 200.
The objective of this experiment was twofold. First, we wanted to ensure that the linearity assumption
holds for CPU usage by our benchmark workload, even though previous work has shown this to hold
in large-scale Internet services [89, 107]. Second, we wanted to evaluate ICA as a method of automatic
request categorization. That is, given R as described above, we wanted to validate whether our method
automatically detects that the number of request categories is indeed 2 (i.e., $m = 2$), that it accurately estimates the
number of requests in each category in the different time slots (X), and that it determines the per-category
CPU usage on the web server and the database (A).
We obtain the ground truth values for the number of client sessions serviced for the browsing and
shopping sessions during each experiment (say $\tilde{X}$) using the VSTS logs. We use these values to validate
the linearity assumption (Section 4.4.2), and as a baseline for comparing with the results inferred by ICA
(Section 4.4.3).
4.4.2 Linearity validation
An F-test (a standard method for estimating data linearity) on the web server and the database server data,
using $R$ and $\tilde{X}$, showed that the null hypothesis of no linear relationship can be rejected with a p-value of
essentially 0. This clearly indicates the linear trend in the aggregate CPU usage on the web server and the database
server.
We now show how much deviation there is from the perfect linear representation by comparing the
measured values of resource usage to estimates that we obtain from the linear model. Figure 4.1 shows the
cumulative distribution of the percentage error between the measured CPU usage values for the web server
and the database server, and their estimates obtained using our linear model for resource consumption
(Eq. 4.4).
Figure 4.1: IIS and SQL server CPU usage (CDF of the percentage error between measured and model-estimated values for the web server and the database server).
To obtain the estimates for CPU usage from our linear model, we first used the measured values for
CPU usage ($R$) and the ground truth number of browsing and shopping sessions ($\tilde{X}$) to estimate the matrix
$A$ and the vector $\tilde{C}$ using linear least-squares regression. Then, we used the estimates for $A$ and $\tilde{C}$, and
the ground truth number of arrivals, $\tilde{X}$, to obtain the estimates for CPU usage $\tilde{R}$ using Eq. 4.4. The
percentage error between the CPU usage estimate based on our linearity assumption, $\tilde{R}$, and the measured
values, $R$, for a data point $i$ is computed as $\frac{|\tilde{R}(i) - R(i)|}{R(i)} \times 100$.
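The regression baseline described here amounts to ordinary least squares with an intercept; the sketch below (ours, reusing the synthetic R and ground-truth X from the Section 4.2 sketch in place of the measured data and VSTS logs) estimates A and the constant vector jointly and reports the per-resource mean percentage error.

    import numpy as np

    # R (n x T measurements) and X (m x T ground-truth arrivals) come from the
    # Section 4.2 sketch; X stands in for the session counts recorded by VSTS.
    design = np.vstack([X, np.ones((1, X.shape[1]))]).T    # T x (m+1): arrivals + intercept
    coef, *_ = np.linalg.lstsq(design, R.T, rcond=None)    # least-squares fit, (m+1) x n
    A_ls, c_ls = coef[:-1].T, coef[-1]                     # A_ls: n x m, c_ls: length n

    R_hat = A_ls @ X + c_ls[:, None]                       # reconstructed usage (Eq. 4.4)
    pct_err = np.abs(R_hat - R) / np.abs(R) * 100
    print("mean percentage error per resource:", pct_err.mean(axis=1))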
For the web server and the database server CPU usage, the average percentage error was 13% and
11%, respectively. Approximately 85% of the data points for the database server and 75% of the data
points for the web server had a percentage error less than 20%. Thus, for the complex workload generated
by our benchmark e-commerce application, our linear model can capture the CPU usage fairly accurately.
Similar results are reported by Zhang et al. [107] for the TPC-W benchmark workload in a two-tier setup
consisting of a web server and a database server. For least-squares regression-based estimates, in the best
(worst) case, they report a relative error of 15% for 98% (87%) of the samples for the web server and a
20% relative error for 89% (79%) of the samples for the database server.
Category      Resource    Regression    ICA         Error
Shopping      IIS CPU     66.87         69.69       4.2%
              SQL CPU     13.27         14.86       12.0%
Browsing      IIS CPU     44.44         47.58       7.1%
              SQL CPU     3.27          2.76        15.6%
Category 3    IIS CPU     -             -2053.26    -
              SQL CPU     -             -23.81      -

Table 4.1: Per-request CPU demand estimates for the PetShop benchmark (in milliseconds).
4.4.3 Request categorization
As stated in Section 4.3, with three resources (n = 3) we can group the requests into at most three
different categories. Lacking any a priori information, we start our ICA computation assuming that the
number of categories is the same as the number of resources, i.e., m = n = 3. We used the FastICA [23]
implementation of ICA to determine 3 sets of CPU usage coefficients for the web server and the database
server for these request categories. We discuss how we infer the correct number of categories using ICA
estimates later in this section.
Resource consumption estimates. Table 4.1 summarizes the estimated CPU demands of a single request
of each category (entries of matrix A) using FastICA and linear regression. The column labeled Error
shows the relative difference between FastICA and linear regression estimates.
We make this comparison for the following three reasons. First, we lack the ground truth value for
resource usage. Measuring the ground truth value for matrix $A$ would require tracking the resource usage
for each individual request within a session as it is serviced at the various tiers (web server and database
server in our case) within the system. While this might be feasible using a monitoring tool like Magpie [5],
it requires a thorough instrumentation of all the system components and may not scale to large numbers
of simultaneous client sessions (the evaluation in [5] considered only 10 simultaneous clients). We plan to
explore the feasibility of obtaining ground truth values for resource consumption using instrumentation in
the future.
Second, as shown in Section 4.4.2, our linear model captures the aggregate CPU usage for the web
server and database with small error. Hence, we expect the linear regression estimates of A to be fairly
accurate and, thus, a natural yardstick for comparison in the absence of the ground truth values. Note that
linear regression requires a fine-grained knowledge of the workload: the number of sessions of each cate-
gory for each sample. Measuring the number of sessions of each category (required for linear regression)
is not possible without the knowledge of the request categories present in a workload and, for an unknown
workload, will require an instrumentation-based tool like Magpie [5]. We could use linear regression for
our workload only because we knew the request categories and recorded the number of sessions for each
category ($\tilde{X}$) in our client emulator.
Third, comparing our matrix A estimates against linear regression provides a comparison between our
method and an alternate technique in the literature. Linear regression is used in [107] to estimate the CPU
usage for different (transaction level) request categories in the TPC-W benchmark workload.
For the two request categories, i.e., shopping and browsing, the ICA estimates are quite close to the
linear regression estimates. Our PetShop 4.0 workload consisted of two categories, browsing and shop-
ping; but the ICA based method estimates 3 different categories. Additionally, the per-request CPU usage
estimates for Category 3 are negative. We conjecture that FastICA models the noise in our system using
the third category. This conjecture is further supported by the fact that 34% of the estimates for the number
of sessions of Category 3 are also negative and that, on average, the number of sessions in Category 3
was found to be only 0.6% of the total number of sessions.
Number of sessions for each category. We compare the FastICA estimates for the number of browsing
and shopping sessions against the ground truth values collected from VSTS. Figure 4.2 shows a scatter
plot with the ground truth values plotted on the x-axis and the estimates on the y-axis. The estimates
are close to the actual values: the average percentage error is 11% and 17% for browsing and shopping
categories, respectively. We believe that these errors can be further reduced using techniques that we
discuss in Section 4.5.1.
Figure 4.2: PetShop: Request estimate comparison (estimated vs. actual number of requests for the browsing and shopping categories).
4.5 Discussion
Our results show that a lightweight, low-overhead and generic technique such as ICA is very useful in
characterizing and categorizing requests. Using only aggregate metrics such as total CPU usage over time,
we can estimate the number of request categories and per-request resource usages for an e-commerce
application.
Much work remains towards increasing the accuracy of this technique. In this section, we rst describe
some of these directions. We also provide a brief overview of another blind source separation technique
called Canonical Correlation Analysis (CCA) that we plan to compare against ICA as part of our future
work. Finally, we discuss several design issues that we expect will arise in designing a capacity planning
tool that incorporates our request categorization technique, especially for large Internet-scale systems.
4.5.1 Accuracy improvement
We plan to explore the following issues to evaluate the effectiveness of our ICA-based technique for work-
load characterization more rigorously, and to further improve its accuracy.
1) Sensitivity analysis. In general, a larger data set (larger value of T) results in more accurate estimates.
However, the minimum number of data points required to achieve accurate estimates depends on the work-
load. In general, the noisier the data, the higher the required value of T. Guidelines for selecting T for
different workloads and applications are a subject of future work.
The width of the monitoring window after which we collect a data point is another important parameter
for our method. In general, deciding the duration of the monitoring window for a workload is tricky: a
smaller time window will lead to more variability in the measurements, which is favorable to our cate-
gorization technique; but too small a time window may fail to capture a substantial number or requests
in their entirety. For example, when modeling at the session granularity, typical shopping and browsing
sessions last at least a few minutes. A smaller time window will then fail to model the system accurately.
Hence, the duration of the monitoring window should be long enough to capture the processing of re-
quests/client sessions in their entirety, but it should be short enough to prevent the aggregate request
arrival rates from following a Gaussian distribution (as discussed in Section 4.3). We plan to do more
experiments to investigate the impact of sampling duration on our ICA based technique. In particular, we
are interested in using aperiodic sampling where the duration of the monitoring time window is chosen
uniformly at random, for example, between 10 minutes and 1 hour.
2) Effect of Noise. The presence of (measurement) noise makes the problem of estimating the matrix A
more difficult (see Ch. 15 of [33]). In practice, the measurements can be pre-processed to reduce noise
before performing ICA. Even though FastICA is known to be consistent in the presence of noise with
certain characteristics, an appropriately designed pre-processing step to reduce noise can further improve
the estimate for matrix A. Another alternative would be to use techniques specifically designed for noisy
ICA [34].
3) Only partially blind. We are also exploring a more informed method of performing automatic request
categorization by augmenting our ICA-based technique with feature-based information from server logs.
For example, if we assume that, in general, HTTP GET requests are less resource-intensive than HTTP
POST requests, or that static content is less resource-intensive than dynamic content, then we can adopt a
more informed approach to request categorization. We believe that such feature-based differentiation can
provide useful hints to our factorization algorithm.
4.5.2 Workload characterization granularity
One limitation of the current technique is that the number of request categories that we can identify is
limited by the resources whose usage we can measure. Heterogeneous components within the system can
help us identify more request categories. For example, in a system with separate web, application logic and
database servers, using aggregate CPU usage measurements we can partition the requests into up to three
different categories. We can also monitor usage of other resources; for example, it is straightforward to
incorporate network usage into our model. More generally, with $s$ different components (servers) and $k$
different resources in each, we can partition the requests into at most $s \times k$ different groups.
4.5.3 Alternatives to ICA
Principal Component Analysis (PCA) and Canonical Correlation Analysis (CCA) are two other data-driven
techniques for Blind Source Separation (BSS). All three techniques (PCA, ICA and CCA) are designed
based on a common principle: they aim to find linear transformations of the original data such that the re-
sulting components (source signals) are mutually uncorrelated. The uncorrelatedness constraint follows from
the assumption that the original sources are independent. However, uncorrelatedness is not enough to
distinguish between the possibly infinite number of linear transformations producing uncorrelated compo-
nents, and hence, additional constraints are required.
PCA tries to maximize the variance of the resulting components. ICA tries to find statistically inde-
pendent components by minimizing gaussianity (based on measures such as kurtosis and negentropy, Chapter
8 of [33]). The justification for minimizing gaussianity when searching for independent components comes from
the central limit theorem. According to the central limit theorem, sums of nongaussian random variables
are closer to being gaussian than the original variables. Hence, a linear combination of the observed mix-
ture (e.g., the aggregate resource usage measurements in our case) will be maximally nongaussian if it
equals one of the independent sources. CCA maximizes the autocorrelation of the resulting components. The
autocorrelation constraint is based on the observation that several real world sources (e.g., fMRI source
signals) are autocorrelated [24]. ICA and CCA have been known to outperform PCA in terms of accuracy
and robustness [24, 33]. In practice, the observed mixture is pre-processed using PCA to reduce its di-
mension before using ICA or CCA. For a detailed comparison of the performance of the three methods on
functional MRI data, refer to [24].
We plan to compare the results from CCA based workload characterization with our ICA results as part
of our future work. The advantages of using a CCA based method over ICA are the following: a) CCA
does not require the original source components to be nongaussian, b) CCA does not require the original
source components to be independent (uncorrelatedness is enough), and c) two different runs of CCA on
the same data identify the same components but the components identied by ICA in two different runs
are not necessarily the same. This is so because packages such as FastICA use random starting points and
hence, their output is not deterministic.
4.5.4 Request Arrivals: Assumptions
The blind source separation techniques discussed in the previous section look for source signals that are
mutually uncorrelated. In addition, ICA tries to find components that are as independent as possible. If
the request arrivals are correlated, then estimating mutually uncorrelated and/or independent components
might not yield satisfactory results. The question whether BSS techniques like ICA and CCA are useful
for real world workloads is an empirical one. We plan to evaluate our BSS techniques using workload
benchmarks such as TPC-W and RUBiS as well as real production system traces as part of future work.
It is worth mentioning here that BSS for correlated sources is an active area of research within the signal
processing community [59].
4.5.5 Linear model applicability
The usefulness of techniques like ICA and CCA for workload characterization depends on the validity of
our linear model (equation 4.4). Our validation results in Section 4.4.2 and the results in [88, 107] show
that assuming a linear model for resource consumption is not an oversimplication, and that it can estimate
the resource consumption in real production systems as well as benchmark workloads with low error.
However, the linear model may not be applicable to all scenarios. As pointed out in [88], the lin-
ear model ignores interaction effects across transaction types. For example, checkout transactions might
require more CPU time during heavy browsing if the browsing activity reduces the cache hit rates for
the checkout transactions. The likelihood of these non-linear interaction effects across request/transaction
types is much higher in a heavily loaded or an overloaded system. Stewart et al. [88] specify that their
linear model based technique is most effective when the system is not heavily loaded, i.e., when peak CPU utilization
is less than 70%.
Additionally, our linear model does not capture autocorrelation in the aggregate resource utilization.
Mi et al. [52] show that, in spite of independent generation of requests of a certain category, the measured
resource consumption can show significant autocorrelation, especially when the request arrival process is bursty.
Furthermore, for a closed-loop system, this autocorrelation in demand can propagate to all the tiers in the
system. Experimental evaluation in [52] shows that the observed autocorrelation in resource consumption
is significantly stronger when the utilization of the bottleneck resource is above 80%, i.e., the
system is heavily loaded.
Based on the results in [88, 52], we conjecture that the linear model is applicable to scenarios where
the system is not heavily loaded (e.g., peak resource utilization is less than 70%). Cockcroft et al. rec-
ommended that peak resource utilization in Internet services should be below 70% [17]. Thus, the linear
model is applicable to a wide range of scenarios present in real production systems. We plan to investigate
the applicability of our linear model in more depth as part of future work. We also plan to characterize the
impact of autocorrelation and interaction across request types on the errors in our BSS based estimates.
4.5.6 Data collection issues
Internet services today consist of multiple components, each of which performs a particular task and can
consist of hundreds, if not thousands, of machines. The existence of more components helps our request
categorization technique by allowing a more fine-grained differentiation of requests, as discussed in Sec-
tion 4.3. But each component can have multiple machines performing similar functions either to provide
fault-tolerance or load-balancing. Since our method associates resource usage with a component, we would
need to aggregate resource usage measurements from all machines providing the services of the compo-
nent and calculate an average. If different groups of machines in the component perform diverse tasks, our
problem formulation considers each group as a separate resource and therefore will collect resource usage
statistics from each group as an independent measure.
4.6 Conclusion
This chapter explores the feasibility of using independent component analysis for Internet service applica-
tion workload characterization. Using measurements of aggregate system resource usage, such as CPU,
our method can infer various characteristics of the workload: the number of different request categories,
the resource demands of each category, and the distribution of request arrivals across different categories.
Inferring these category parameters has direct application to the performance modeling of tiered Internet
application hosting systems. These models can benefit substantially from the increased fidelity of model-
ing workloads in separate categories. The advantage of our approach is that it does not require detailed
instrumentation of the system to obtain category characteristics, just aggregate workload information that
is straightforward to obtain. Our evaluation results using an e-commerce benchmark are promising and
indicate that our method can potentially characterize fairly complex workloads with acceptable error.
Chapter 5
Dynamic Data Compression in Multi-hop Wireless Networks
Data compression can save energy and increase network capacity in wireless sensor networks. However,
the decision of whether and when to compress data can depend upon platform hardware, topology, wire-
less channel conditions, and application data rates. Using Lyapunov optimization theory, we design an
algorithm called SEEC that makes joint compression and transmission decisions with the goal of minimiz-
ing energy consumption. A practical distributed variant, DSEEC, is able to achieve more than 30% energy
savings and adapts seamlessly across a wide range of conditions, without explicitly taking topology, appli-
cation data rates, and link quality changes into account.
5.1 Introduction
Wireless sensor networks, in which nodes collect sensor data and transmit them over multiple hops to a
sink, have enabled unprecedented visibility in many natural and built environments. Data compression has
been considered for these networks in two contexts. First, compression can reduce transmission costs and
thereby save energy resources. Second, for applications that generate significant data, compression can
increase the effective network capacity.
Until recently, it was assumed that compression was always energy-efficient so that the decision to
compress could be made statically. However, Sadler and Martonosi [76] showed that, for many commonly
used platforms, whether compressing data at a node is energy-efficient or not depends upon the platform's
components, as well as the position of the node in the topology. Specifically, they showed that the bal-
ance between the energy required for compression and the energy required for transmission is such that
compression wins only if a node is several hops away from the sink (the actual distance depends upon
topology). One can extend the arguments in Sadler and Martonosi [76] to show that the decision to com-
press at a node can also depend upon the wireless channel conditions along the path from the node to the
sink: a noisy channel, because it consumes energy in retransmissions, can make it favorable to compress
at nodes closer to the sink. Moreover, this decision can also depend upon the application data rate: when
the network is operating close to capacity, it might be more energy-efficient to compress a fraction of the
packets, rather than all of them.
In practice, this means that the decision of whether (and when) to compress data for energy-efficient
operation must be made dynamically. It is possible, of course, to devise a heuristic decision algorithm for
this task that explicitly takes platform energy consumption, topology, channel characteristics, and applica-
tion data rates into consideration.
In this work, we take a more principled approach. Using tools from Lyapunov optimization theory
(Georgiadis et al. [44]), which explores the design of stable transmission schedulers that optimize a given
objective function, we design an algorithm called SEEC (Stable Energy-Efficient joint Compression and
transmission scheduling, Section 5.4). SEEC ensures network stability (queues at nodes are finite) and
minimizes time average energy expenditure, while dynamically deciding whether and when to compress
data in order to achieve energy-efciency. It is parsimonious, in the sense that a single parameter V governs
the performance of the algorithm.
We are able to provide a performance bound for SEEC. Specifically, we show that SEEC's energy
consumption is within an additive factor of the optimal. It can achieve an energy consumption arbitrarily
close to the optimal at the expense of increased queueing delay.
SEEC is a centralized algorithm and makes idealized assumptions about wireless medium access (Sec-
tion 5.2), so we also explore a more practical distributed variant called DSEEC (Section 5.5). This variant
runs on CSMA-based MACs (such as those used with 802.11 and 802.15.4 radios). It uses the same
compression decision algorithm as SEEC (which requires only local information), and uses queue backlog
information from its local neighborhood to implement SEEC's transmission scheduling algorithm.
DSEEC's chief advantage is that it is able to achieve energy-savings, while adapting dynamically to
platform changes, channel variability, diverse application data rates, and topology diversity. We demon-
strate (Section 6.4) its adaptivity through extensive simulations in Qualnet [71] (a widely-used high-fidelity
wireless simulator), and show that DSEEC can achieve more than 30% energy savings in some settings.
DSEEC's elegance stems from the fact that it is able to make the compression decisions without explicitly
considering topology, application data rates, or channel qualities. Instead, these characteristics manifest
themselves as changes in queue backlog at a node, which triggers compression decisions. Finally, we show
that DSEEC's performance is relatively insensitive to the choice of its parameter V: changing V by three
orders of magnitude affects overall energy consumption by less than 10%.
5.2 System Model
In this section we describe our model for a wireless (sensor) network deployed for data collection. This
model provides a framework for describing our joint compression and transmission scheduling algorithm
and analyzing its performance. For ease of exposition, we assume that time is slotted; our system model
and algorithms can be easily generalized to the continuous time case.
Data acquisition. Consider a network of $N$ nodes. Let $\mathcal{N} = \{1, 2, \ldots, N\}$ represent the set of nodes. Each
node $n \in \mathcal{N}$ has a sensor attached to it, which collects measurements periodically. Let $A_n[t]$ represent the
number of measurement samples collected by the sensor attached to node $n$ during timeslot $t$. We assume
that the size of each sample is $b$ bits. In a wide range of sensing applications, data is collected at a fixed
sampling frequency: vibration (Paek et al. [67]) and imaging (Hicks et al. [29]). For these applications,
$A_n[t] = a$ for all timeslots $t$. However, if the sensing application adapts the sampling frequency in response
to an event (Caron et al. [13]), then $A_n[t]$ may vary across time slots. We model $A_n[t]$ as a discrete random
variable with maximum value $m$. Each node has to deliver the measurement data collected by its sensor to
a sink (possibly) over multiple wireless hops. In the design and analysis of SEEC, we assume a single sink
but possibly dynamic routing over different outgoing links. However, the evaluation in Section 6.4 is done
for the special case of a fixed routing tree rooted at the sink.
Data Compression. The data collected by a node is first processed by a compression module. The com-
pression module can compress the data using one of the $K$ compression algorithms available to it. Every
timeslot $t$, the compression module picks a compression option $k_n[t]$. Option $k_n[t] = 0$ represents no
data compression. This is the central focus of the paper: the decision about whether to compress, and
which compression option to use, is made jointly with transmission scheduling (discussed below), taking
channel conditions and interference into account. The compressed data is then delivered to the queue at
the transmission module.
For each node $n$, we model the size of the data after compression, $R_n[t]$, as a random variable. In
practice, the output of the compression module will depend on the values of the measurement samples being
compressed. However, for analytical tractability, we assume that $R_n[t]$ are i.i.d. conditionally on the value
of $A_n[t]$ and $k_n[t]$. As we show in our trace-based evaluations (in Section 6.4), our results hold even for
real datasets, indicating that this assumption is not restrictive in practice.
Let $E^n_C[t]$ denote the energy consumed at node $n$ while compressing data using option $k_n[t]$, with
$E^n_C[t] = 0$ for $k_n[t] = 0$. We model $E^n_C[t]$ as a random variable whose value is dependent upon the com-
pression option $k_n[t]$. We assume that $E^n_C[t]$ are i.i.d. for all $t$ with the same compression option $k_n[t]$.
Compression Algorithms. We characterize the different compression algorithms in terms of their ex-
pected compression ratios and energy consumption. For every compression algorithm $k \in K$, we define
the expected size of the compressed output $m(a, k)$ and the energy consumption $\phi(a, k)$ for $A_n[t] = a$ as
follows:
$$m(a, k) \triangleq \mathbb{E}\{R_n[t] \mid A_n[t] = a, k_n[t] = k\} \qquad (5.1)$$
$$\phi(a, k) \triangleq \mathbb{E}\{E^n_C[t] \mid A_n[t] = a, k_n[t] = k\} \qquad (5.2)$$
Given our definition of $m(a, k)$, the expected compression ratio achieved by option $k$ is $\frac{m(a, k)}{ab}$.
We assume that $m(a, k)$ and $\phi(a, k)$ are known for all compression algorithms $k \in K$. In practice, these
quantities can be empirically determined from a large enough sample dataset prior to network operation.
If such quantities are not available prior to network operation, they can be estimated on-line during an
initial learning period during which data is always compressed to collect statistics. In our evaluations, we
use the former approach for simplicity.
Scheduling Transmissions. Let $\mathcal{L}$ denote the set of all wireless links in the network. Each link $l \in \mathcal{L}$ is
characterized by a pair of nodes $(n_1, n_2)$ where $n_1$ transmits data to $n_2$ over the link $l$. For each node $n$,
let $\mathcal{O}_n$ represent the set of all outgoing links on which node $n$ can transmit data and $\mathcal{I}_n$ represent the set
of all incoming links on which node $n$ can receive data.
The transmission rate over a wireless link depends on the transmit power and the channel state.
The channel state of a wireless link can be characterized using several different metrics: the Signal to
Interference plus Noise Ratio (SINR) for the link, the expected transmission count (ETX) metric (De
Couto et al. [19]), the LQI (Link Quality Indicator) threshold (Srinivasan et al. [86]), the RSSI (Re-
ceived Signal Strength Indicator) values, etc. In this paper, we use the ETX metric to capture link quality
in our evaluation (Section 6.4). If we model each packet transmission attempt as a Bernoulli trial, then
$ETX = 1/(p_f \cdot p_r)$, with $p_f$ and $p_r$ equal to the successful transmission probabilities for a packet and its
acknowledgment, respectively.
We represent the current channel state at time $t$ as a vector $\vec{S}[t] = \{S_l[t]\}_{l \in \mathcal{L}}$, where $S_l[t]$ denotes the
channel state at time $t$ for link $l$. For all links $l \in \mathcal{L}$, we assume that $S_l[t]$ takes values from a finite set $\tilde{\mathcal{S}}$
for all $t$. We represent the (finite) set of all possible channel state vectors by $\mathcal{S} = \tilde{\mathcal{S}}^{|\mathcal{L}|}$, and hence,
$\vec{S}[t] \in \mathcal{S}$ for all $t$.
A centralized transmission scheduler determines the set of nodes that can transmit at time $t$ based on
the current channel state $\vec{S}[t]$. (In a later section, we discuss a decentralized scheduler whose decisions
approximate this centralized scheduler.) For each link $l \in \mathcal{L}$, it determines a transmit power $P^l_{tran}[t]$. We
assume that $P^l_{tran}[t] \in \{0, P_{max}\}$ for all links $l$ and times $t$ (it is possible to extend our framework to
multiple transmit power levels). A node $n$ transmits at time $t$ if and only if at least one of its outgoing links
$l \in \mathcal{O}_n$ is assigned $P^l_{tran}[t] > 0$. We represent the transmit power allocation by the scheduler at time $t$
as a vector $\vec{P}_{tran}[t]$.
We can incorporate different scheduling constraints into the transmission scheduler. For example,
if nodes have a single radio, and they cannot transmit and receive (or transmit on more than one link)
simultaneously, we can impose the following feasibility constraint on $\vec{P}_{tran}[t]$:
$$\sum_{l \in \mathcal{O}_n} I(P^l_{tran}[t]) + \sum_{l \in \mathcal{I}_n} I(P^l_{tran}[t]) \le 1 \quad \forall \text{ nodes } n \qquad (5.3)$$
where $I(P^l_{tran}[t])$ is equal to 1 if $P^l_{tran}[t] > 0$ and zero otherwise.
Transmission capacity-power curve. Let $\mu_l[t]$ denote the number of bits that can be transmitted over link
$l$ during time slot $t$. We model $\mu_l[t]$ as a function,
$$\mu_l[t] = C_l(\vec{P}_{tran}[t], \vec{S}[t]), \qquad (5.4)$$
where $\vec{P}_{tran}[t] = \{P^l_{tran}[t]\}_{l \in \mathcal{L}}$, $\vec{S}[t]$ is the current channel state, and $C_l(\vec{P}_{tran}[t], \vec{S}[t])$ is the capacity-
power curve for link $l$ defined by the modulation and coding schemes used for transmission on link $l$. We
assume that there exists a constant $\mu_{max}$ such that $\mu_l[t] \le \mu_{max}$ for all $t$. We also assume that, for all $l$,
$C_l(\vec{P}, \vec{s})$ is piecewise continuous in $\vec{P}$, and is monotonically increasing in the transmission power for link
$l$, i.e.,
$$C_l((P_l, \{P_j\}_{j \ne l}), \vec{s}) \ge C_l((\tilde{P}_l, \{P_j\}_{j \ne l}), \vec{s}) \quad \text{for } P_l > \tilde{P}_l$$
for each channel state $\vec{s}$. Additionally, for every link $l$, $C_l((0, \{P_j\}_{j \ne l}), \vec{s}) = 0$ for every channel state $\vec{s}$,
i.e., zero transmission power always yields zero transmission rate. Thus, at time $t$, $\mu_l[t] > 0$ only for those
links $l$ that are allocated $P^l_{tran}[t] > 0$ by the transmission scheduler. Since we assume $P^l_{tran}[t] \in \{0, P_{max}\}$,
setting $P^l_{tran}[t] = 0$ is equivalent to not selecting link $l$ for transmission during slot $t$.
Example of Capacity-Power Curve: For a channel subject to additive white Gaussian noise, the capacity-
power curve for link $l$, $C_l(\vec{P}_{tran}, \vec{s})$, can be approximated using a $\log(\cdot)$ formula for channel capacity.
Hence, we can define $C_l(\vec{P}_{tran}, \vec{s})$ for link $l$ as follows:
$$C_l(\vec{P}, \vec{s}) = \log\left(1 + \frac{\alpha_l P^l_{tran}}{N_l + \sum_{\hat{l} \ne l} \alpha_{\hat{l}} P^{\hat{l}}_{tran}}\right)$$
where $N_l$ and $\alpha_l$ represent the noise and fading coefficients associated with the particular channel state $\vec{s}$.
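A small numerical illustration of this capacity-power curve (ours; the gains, noise powers, and power levels are invented, and we use log base 2 purely for readability):

    import numpy as np

    def capacity(link, P_tran, gain, noise):
        """log(1 + SINR) capacity-power curve for one link, treating all other
        active transmissions as interference (a specialization of Eq. 5.4)."""
        signal = gain[link] * P_tran[link]
        interference = sum(gain[j] * P_tran[j] for j in range(len(P_tran)) if j != link)
        return np.log2(1.0 + signal / (noise[link] + interference))

    P_max = 10.0
    P_tran = [P_max, 0.0, P_max]     # links 0 and 2 transmit during this slot
    gain = [0.8, 0.5, 0.3]           # fading coefficients (alpha_l in the text)
    noise = [1.0, 1.0, 1.0]          # noise power per link (N_l)

    for l in range(3):
        print(f"link {l}: {capacity(l, P_tran, gain, noise):.2f} bits/s/Hz")
    # Link 1 gets zero rate, consistent with zero power yielding zero rate.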
Queueing dynamics. The output of the compression module, $R_n[t]$, is held in a queue at the transmission
module awaiting transmission. Once data is passed on to the transmission module, it is not considered for
compression again.
Let $U_n[t]$ denote the number of bits in the queue at node $n$. The following equation captures the dynamics
of the queue backlog:
$$U_n[t+1] \le \max\left(U_n[t] - \sum_{l \in \mathcal{O}_n} \mu_l[t],\ 0\right) + R_n[t] + \sum_{l \in \mathcal{I}_n} \mu_l[t] \qquad (5.5)$$
where $\mu_l[t]$ is given by equation (5.4) and $\sum_{l \in \mathcal{I}_n} \mu_l[t]$ represents the exogenous arrivals at node $n$ due
to other nodes routing their data through node $n$. Equality in (5.5) is achieved only when the actual
exogenous arrivals from other nodes equal $\sum_{l \in \mathcal{I}_n} \mu_l[t]$; the actual exogenous arrivals can be less than
$\sum_{l \in \mathcal{I}_n} \mu_l[t]$ if the other nodes have little or no queue backlog.
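Written as code, one slot of this backlog recursion at a single node looks as follows (a sketch of ours with made-up rates; the incoming term is treated as the actual exogenous arrivals, so the update holds with equality):

    def queue_update(U_n, mu_out, R_n, arrivals_in):
        """One slot of the backlog dynamics in Eq. (5.5) for a single node: serve up
        to the scheduled outgoing rate, then add compressed local data and traffic
        routed in from other nodes."""
        served = min(U_n, sum(mu_out))      # cannot transmit more than is queued
        return U_n - served + R_n + sum(arrivals_in)

    U_n = 4000.0                 # bits currently queued
    mu_out = [2500.0]            # scheduled outgoing rate this slot (bits)
    R_n = 800.0                  # compressed sensor data produced this slot (bits)
    arrivals_in = [600.0, 300.0] # bits routed in from two children

    print(queue_update(U_n, mu_out, R_n, arrivals_in))   # 3200.0 bits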
Symbol               Description
$A_n[t]$             Number of measurement samples collected ($\le m$)
$b$                  Size of each sample in bits
$k_n[t]$             Compression decision
$R_n[t]$             Data size after compression
$m(\cdot, k)$        Expected size of compressed data for compression algorithm $k$ (Eq. 5.1)
$E^n_C[t]$           Energy consumed by data compression
$\phi(\cdot, k)$     Expected energy consumption for compression algorithm $k$ (Eq. 5.2)
$\mathcal{L}$        Set of all links in the network
$\mathcal{O}_n$      Set of outgoing links of node $n$
$\mathcal{I}_n$      Set of incoming links of node $n$
$\vec{S}[t]$         Current channel state; $\vec{S}[t] = \{S_l[t]\}_{l \in \mathcal{L}}$
$\vec{P}_{tran}[t]$  Transmit power allocation; $\vec{P}_{tran}[t] = \{P^l_{tran}[t]\}_{l \in \mathcal{L}}$
$\mu_l[t]$           Number of bits that can be transmitted on link $l$ during slot $t$ (Eq. 5.4)
$U_n[t]$             Queue backlog at node $n$
$E^l_T[t]$           Energy consumed at the node transmitting data on link $l$
$E^l_R[t]$           Energy consumed at the node receiving data on link $l$
$E^l_B[t]$           Energy consumed by nodes overhearing the transmission on link $l$

Table 5.1: System model notation; the symbols $t$ and $n$ denote the current time and a node, respectively.
Table 5.1 summarizes the notation used to describe our model.
Our framework can naturally incorporate several extensions to this model, which we discuss briefly as
part of future work in Section 5.7.
5.3 Optimization Goal
In this paper, our aim is to design a compression and transmission scheduling algorithm that minimizes the
total system power expenditure while maintaining network stability. Using our system model (Section 5.2),
we now formally define total system power expenditure and network stability.
The main sources of energy expenditure in our system are data compression, data transmission, and
data reception. At time $t$, the energy consumed by the compression module at node $n$ is $E^n_C[t]$, with
$E^n_C[t] = 0$ if the no-compression option is chosen ($k_n[t] = 0$).
If the transmission scheduler allocates power $P^l_{tran}[t] = P_{max}$ for link $l = (n, \tilde{n})$, then node $n$ has the
opportunity to transmit data for the duration of one time slot, $T_{slot}$. Now, at time $t$, node $n$ can transmit
$\mu_l[t]$ bits over link $l$. However, as later described in Section 5.4, in our scheme transmission schedules are
determined based on the queue backlog at different nodes. In most cases, when it is the turn of node $n$ to
transmit, its queue will have more data than it can possibly transmit during one slot, i.e., $U_n[t] \ge \mu_l[t]$.
Thus, the energy consumed (at node $n$) by the data transmission on link $l$ during slot $t$ is
$$E^l_T[t] = P^l_{tran}[t] \cdot T_{slot} = P_{max} \cdot T_{slot} \qquad (5.6)$$
When $U_n[t] < \mu_l[t]$, we assume (for analytical tractability) that the transmitter stays up for the entire slot
duration. In practice, of course, the transmitter does not have to do this.
Node $\tilde{n}$ consumes energy while receiving the data packet transmitted by node $n$. Suppose the wireless
interface at each node dissipates a constant amount of power $P_{recv}$ when in receive mode. The energy
consumed (at node $\tilde{n}$) for packet reception on link $l$ during time slot $t$, under the same assumptions as
above, is:
$$E^l_R[t] = P_{recv} \cdot T_{slot} \qquad (5.7)$$
We assume that the wireless interface at a node does not consume any energy when it is not transmitting
or receiving data. This is an idealized assumption, and requires the existence of a well-designed network
duty-cycling protocol. Such a protocol coordinates packet transmissions in a way that nodes can turn their
radios off in order to conserve energy. In general, these protocols (Razvan et al. [75], Burri et al. [10]) do
not achieve the ideal we have assumed, but incur a very small amount of overhead in determining whether
it is safe to go back to sleep. We examine this deviation from the ideal in our experiments in Section 6.4.
Finally, nodes can expend significant energy overhearing packets when omnidirectional radios are
used (as we have assumed). A standard method (Ye et al. [102]) for overhearing-avoidance is to use the
RTS/CTS message exchange to determine whether a node can go to sleep or not. Before transmitting a
data packet for node $\tilde{n}$, a node $n$ broadcasts an RTS control packet that identifies $\tilde{n}$ as the intended receiver
and the data packet's transfer duration. All the nodes that are not identied as the receiver can then safely
turn off their radios for the duration of the data packet transfer.
We account for this overhearing-avoidance in our analysis in the following manner. Node $n$ broadcast-
ing a packet is equivalent to node $n$ transmitting that packet on all its outgoing links $l \in \mathcal{O}_n$. We assume
that a fraction $\beta < 1$ of the $\mu_l[t]$ bits is consumed by a control message. The energy consumed at all the
neighbors of node $n$, except $\tilde{n}$, due to these nodes receiving a control message is
$$E^l_B[t] = \sum_{\tilde{l} \in \mathcal{O}_n:\ \tilde{l} \ne l} \beta\,(P_{recv} \cdot T_{slot}) = \beta\,(|\mathcal{O}_n| - 1)(P_{recv} \cdot T_{slot}) \qquad (5.8)$$
In our analysis, we assume that all timeslots are of the same duration. Without loss of generality, we define
$T_{slot}$ to be of unit length for the rest of the paper.
Since node $n$ transmits $\mu_l[t]$ total (control and data) bits on link $l$, the total network-wide energy
consumed due to node $n$'s compression and transmission activity during time slot $t$, $E^n_{tot}[t]$, can be written
as
$$E^n_{tot}[t] = E^n_C[t] + \sum_{l \in \mathcal{O}_n} \left( E^l_T[t] + E^l_R[t] + E^l_B[t] \right) I(P^l_{tran}[t]) \qquad (5.9)$$
where $I(P^l_{tran}[t])$ is the indicator function.
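A worked instance of this per-node energy accounting (ours; the power values and overhearing fraction are arbitrary, and T_slot is taken as 1 as in the text):

    def node_energy(E_C, active_links, P_max, P_recv, beta, out_degree):
        """Eq. (5.9) with T_slot = 1: compression energy plus, for each link the node
        transmits on, the transmit energy, the receiver's energy, and the overhearing
        cost at the remaining (out_degree - 1) neighbors."""
        per_link = P_max + P_recv + beta * (out_degree - 1) * P_recv
        return E_C + len(active_links) * per_link

    print(node_energy(E_C=2.0, active_links=["n->parent"], P_max=60.0,
                      P_recv=45.0, beta=0.05, out_degree=4))   # 113.75 energy units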
We define the total system energy expenditure during slot $t$ as the sum of the energy expenditure at
each node. The time average total system power expenditure is given by
$$\limsup_{t \to \infty} \frac{1}{t} \sum_{\tau=0}^{t-1} \sum_{n=1}^{N} \mathbb{E}\{E^n_{tot}[\tau]\} \qquad (5.10)$$
Our goal is to design an algorithm for making compression and transmission scheduling decisions
to minimize the time average total system power expenditure (5.10) while ensuring network stability. (It
may make sense, from the perspective of our application, to consider adding other constraints, for example,
constraining the average energy usage on individual nodes; we have left this to future work.) Formally, we
define a network to be stable if:
$$\limsup_{t \to \infty} \frac{1}{t} \sum_{\tau=0}^{t-1} \sum_{n=1}^{N} \mathbb{E}\{U_n(\tau)\} < \infty \qquad (5.11)$$
Note that network stability implies finite average queue backlog and, hence, finite average delay at each
node.
5.4 The SEEC Algorithm
In this section, we present our first contribution: the design and analysis of a centralized algorithm, SEEC,
that minimizes the system energy expenditure while adapting to topology, network dynamics, application
data rates, and platform differences. In subsequent sections, we explore the design and performance of a
distributed variant, DSEEC.
In designing SEEC, we impose an additional requirement: our algorithm must result in a stable net-
work (Equation 5.11). This requirement suggests a starting point, the Lyapunov optimization framework
(Neely [47], Georgiadis et al. [44]). This framework can incorporate performance metrics such as en-
ergy expenditure, fairness, etc., into the Lyapunov drift, a well-known technique for developing stabilizing
control algorithms. The key idea is to dene a non-negative, scalar function, called a Lyapunov function,
that measures the aggregate congestion of all the queues in the network during timeslot t. The Lyapunov
drift represents the expected change in the Lyapunov function from one timeslot to the next. Under the
Lyapunov optimization framework, control algorithms designed to minimize the Lyapunov drift over time
are guaranteed to stabilize the network and achieve near-optimal performance for a given optimization
objective (energy expenditure in the case of SEEC).
5.4.1 Design of SEEC
We first describe SEEC, designed using the Lyapunov drift technique. We present the details of the deriva-
tion and then, in the next subsection, analyze SEEC's optimality characteristics. SEEC decouples the choice
of the compression algorithm and the transmission power allocation into two separate algorithms. Both
algorithms involve a single parameter V > 0 that controls the trade-off between energy consumption and
delay.
Compression Algorithm. Every timeslot $t$, each node $n \in \mathcal{N}$ observes the data collected by its sensor,
$A_n[t]$, and its current queue backlog, $U_n[t]$. It then chooses a compression option $k_n[t] \in K$ as follows:
$$k_n[t] = \arg\min_{k \in K} \left( U_n[t]\, m(A_n[t], k) + V\, \phi(A_n[t], k) \right) \qquad (5.12)$$
We break ties arbitrarily if multiple compression options satisfy Equation (5.12). We describe the intuition
for this algorithm below.
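The compression rule is a one-line minimization over the available options; the sketch below (ours, with invented size and energy tables) shows how a node with a small backlog keeps its data uncompressed while a heavily backlogged node picks the most aggressive option:

    def choose_compression(U_n, V, m_table, phi_table):
        """Eq. (5.12): pick the option k minimizing U_n * m(A_n[t], k) + V * phi(A_n[t], k).
        m_table[k] and phi_table[k] hold m(., k) and phi(., k) for the current A_n[t]."""
        return min(range(len(m_table)), key=lambda k: U_n * m_table[k] + V * phi_table[k])

    # Invented numbers: option 0 = no compression; options 1 and 2 compress harder.
    m_table   = [8.0, 5.0, 3.5]   # expected output size (kbits) for this slot's samples
    phi_table = [0.0, 4.0, 9.0]   # expected compression energy (mJ)

    print(choose_compression(U_n=2,   V=10.0, m_table=m_table, phi_table=phi_table))  # 0: stay uncompressed
    print(choose_compression(U_n=200, V=10.0, m_table=m_table, phi_table=phi_table))  # 2: compress aggressively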
Transmission Algorithm. For each link $l \in \mathcal{L}$, we define the differential queue backlog $\Delta U_l[t]$ during
timeslot $t$ as $\Delta U_l[t] = U_n[t] - U_{\hat{n}}[t]$, where $l = (n, \hat{n})$ is a link from node $n$ to node $\hat{n}$.
Let $\vec{U}[t] = (U_n[t])$ be the vector of queue backlogs at all nodes. Every timeslot $t$, the transmission
scheduler observes the current queue backlogs $\vec{U}[t]$ and the channel state $\vec{S}[t]$, and allocates a power
vector $\vec{P}_{tran}[t] = (P^l_{tran}[t])_{l \in \mathcal{L}}$ that solves the following optimization problem:
$$\text{Maximize} \quad \sum_n \sum_{l \in \mathcal{O}_n} \left( \Delta U_l[t]\, \mathbb{E}\{\mu_l[t] \mid P^l[t], S_l[t]\} - V \tilde{P}[t] \right) \qquad (5.13)$$
$$\text{subject to:} \quad \tilde{P}[t] = P^l_{tran}[t] + I(P^l_{tran}[t])\left[1 + \beta(|\mathcal{O}_n| - 1)\right]P_{recv},$$
$$\vec{P}_{tran}[t] = (P^l_{tran}[t])_{l \in \mathcal{L}} \in \mathcal{P} = \{0, P_{max}\}^{|\mathcal{L}|}$$
In Section 5.2, we assumed that $P^l_{tran} \in \{0, P_{max}\}$ (hence, $\mathcal{P}$ is finite) and that the capacity-power
curves $C_l(\vec{P}, \vec{S})$ (which determine $\mu_l[t]$) are piecewise continuous. Hence, there exists a maximizing power
allocation. We break ties arbitrarily if multiple maximizing power vectors exist.
At a high level, these two algorithms work together as follows. For each time slot, the compres-
sion algorithm chooses the option $k_n[t]$ that minimizes a weight function which depends on the current
queue backlog at a node (Equation (5.12)). In order to maximize the summation in Equation (5.13),
the transmission algorithm will schedule transmissions only on links for which the link-weight
$d_l[t] = (\Delta U_l[t]\, \mu_l[t] - V \tilde{P}[t]) > 0$. If transmissions on two links with positive link-weight cannot be scheduled
simultaneously, then the transmission algorithm picks the link with the larger link-weight.
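To make this behavior concrete, the following sketch (ours; the backlogs, rates, power-cost term, and the greedy tie-breaking are illustrative choices, not SEEC's exact centralized maximization) computes the link weights and activates non-conflicting links with positive weight:

    def schedule_links(links, U, mu, P_cost, V):
        """Greedy approximation of Eq. (5.13): sort links by the weight
        d_l = (U_src - U_dst) * mu_l - V * P_cost_l and activate positive-weight
        links whose endpoints are not already busy (one-radio constraint, Eq. 5.3)."""
        weights = {l: (U[s] - U[d]) * mu[l] - V * P_cost[l] for l, (s, d) in links.items()}
        chosen, busy = [], set()
        for l in sorted(weights, key=weights.get, reverse=True):
            s, d = links[l]
            if weights[l] > 0 and s not in busy and d not in busy:
                chosen.append(l)
                busy.update((s, d))
        return chosen

    links = {"a->sink": ("a", "sink"), "b->a": ("b", "a"), "c->a": ("c", "a")}
    U = {"a": 3000, "b": 5000, "c": 1200, "sink": 0}     # queue backlogs (bits)
    mu = {"a->sink": 2000, "b->a": 1800, "c->a": 1500}   # achievable rates (bits/slot)
    P_cost = {l: 1.0 for l in links}                     # transmit + receive power term
    print(schedule_links(links, U, mu, P_cost, V=100.0)) # only a->sink is activated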
Next, we describe how we derive SEEC (which attempts to jointly make compression and scheduling
decisions in a stable fashion) using the Lyapunov drift technique (Georgiadis et al. [44]).
Lyapunov analysis. Before we embark upon our analysis, we need a closed form expression for the time
evolution of queue lengths in the system. Equation (5.5) provides this, but does not include the overhead
of RTS/CTS control messages (Section 5.3). (Accounting for such overhead may not be strictly necessary
in the case of SEEC, since it uses a centralized transmission scheduler in a time-slotted system, i.e., $\beta$ may
be 0; however, it is needed in the case of a distributed transmission scheduler (DSEEC, Section 5.5), so we
introduce it here and carry it forward in the derivations that follow.) The modified equation is as follows:
$$U_n[t+1] \le \max\left(U_n[t] - \sum_{l \in \mathcal{O}_n} \tilde{\mu}_l[t],\ 0\right) + R_n[t] + \sum_{l \in \mathcal{I}_n} \tilde{\mu}_l[t] \qquad (5.14)$$
where $\tilde{\mu}_l[t] = (1 - \beta)\,\mu_l[t]$.
We define a quadratic Lyapunov function of queue backlogs as:
$$L(\tilde U[t]) \triangleq \frac{1}{2} \sum_{n=1}^{N} (U_n[t])^2 \qquad (5.15)$$
and the one-step conditional Lyapunov drift $\Delta(\tilde U[t])$ as:
$$\Delta(\tilde U[t]) \triangleq \mathbb{E}\big\{ L(\tilde U[t+1]) - L(\tilde U[t]) \mid \tilde U[t] \big\} \qquad (5.16)$$
³ Accounting for such overhead may not be strictly necessary in the case of SEEC since it uses a centralized transmission scheduler in a time-slotted system (i.e., $\beta$ may be 0). However, it is needed in the case of a distributed transmission scheduler (DSEEC, Section 5.5), so we introduce it here and carry it forward in the derivations that follow.
The Lyapunov drift for our system is given by the following lemma. (See Georgiadis et al. [44] for further details on Lyapunov drift.)
Lemma 1. Suppose the r.v. $A_n[t]$ at each node $n$, and the channel states $\tilde S[t]$, are i.i.d. over timeslots. For the queue evolution given in Equation (5.14), and the Lyapunov function defined in Equation (5.15), the one-step Lyapunov drift for our system satisfies the following constraint for all $t$ and all $\tilde U[t]$:
$$\Delta(\tilde U[t]) \le BN - \sum_{n} \mathbb{E}\Big\{ \sum_{l \in \mathcal{O}_n} \tilde\mu_l[t]\, \Delta U_l[t] \,\Big|\, \tilde U[t] \Big\} + \sum_{n=1}^{N} U_n[t]\, \mathbb{E}\big\{ R_n[t] \mid \tilde U[t] \big\} \qquad (5.17)$$
with the constant $B$ defined as follows:
$$B \triangleq (R_{max} + \mu^{in}_{max})^2 + (\mu^{out}_{max})^2, \qquad R_{max} \triangleq \max_n \big( \mathbb{E}\{R_n[t]\} \big)$$
$$\mu^{in}_{max} \triangleq \max_{(n,\, s \in S,\, P \in \mathcal{P})} \sum_{l \in \mathcal{I}_n} \tilde\mu_l[t], \qquad \mu^{out}_{max} \triangleq \max_{(n,\, s \in S,\, P \in \mathcal{P})} \sum_{l \in \mathcal{O}_n} \tilde\mu_l[t]$$
Proof. The proof follows from a straightforward application of the Lyapunov drift analysis technique. We omit it for brevity and refer the interested reader to [83].
Incorporating the energy constraint. Recall that our goal is to design an algorithm that makes joint compression and transmission decisions while minimizing energy usage. To do this, we use the Lyapunov optimization framework (Georgiadis et al. [44], Neely [47]). We add a weighted cost (the total system energy consumed during slot $t$) to the Lyapunov drift in Equation (5.17), to get:
$$\Delta(\tilde U[t]) + V \sum_{n=1}^{N} \mathbb{E}\big\{ E^n_{tot}[t] \mid \tilde U[t] \big\} \le BN - \sum_{n} \mathbb{E}\Big\{ \sum_{l \in \mathcal{O}_n} \tilde\mu_l[t]\, \Delta U_l[t] \,\Big|\, \tilde U[t] \Big\} + \sum_{n=1}^{N} U_n[t]\, \mathbb{E}\big\{ R_n[t] \mid \tilde U[t] \big\} + V \sum_{n=1}^{N} \mathbb{E}\big\{ E^n_{tot}[t] \mid \tilde U[t] \big\} \qquad (5.18)$$
In this inequality, we can expand the term $\mathbb{E}\{E^n_{tot}[t] \mid \tilde U[t]\}$ using Equations (5.6), (5.7), and (5.8), and substitute the value of $\mathbb{E}\{R_n[t] \mid \tilde U[t]\}$ (Equation (5.1)) to get:
$$\Delta(\tilde U[t]) + V \sum_{n=1}^{N} \mathbb{E}\big\{ E^n_{tot}[t] \mid \tilde U[t] \big\} \le BN - \sum_{n} \mathbb{E}\Big\{ \sum_{l \in \mathcal{O}_n} \big( \tilde\mu_l[t]\, \Delta U_l[t] - V \tilde P_l[t] \big) \,\Big|\, \tilde U[t] \Big\} + \sum_{n=1}^{N} \mathbb{E}\big\{ U_n[t]\, m(A_n[t], k_n[t]) + V \rho(A_n[t], k_n[t]) \mid \tilde U[t] \big\} \qquad (5.19)$$
where $\tilde P_l[t] = P^{tran}_l[t] + I(P^{tran}_l[t])\,[1 + \beta(|\mathcal{N}_n| - 1)]\, P^{recv}$.
SEEC is designed to minimize the RHS of (5.19). There are three salient points to note from (5.19).
First, because the inequality incorporates the Lyapunov drift, SEEC is stable. Second, comparing the second
term on the RHS of (5.19) and Equation (5.13), SEEC's transmission algorithm contributes to minimizing
the RHS of (5.19) by maximizing this negative term. Finally, comparing the third term on the RHS of (5.19)
and Equation (5.12), we see that SEEC's compression algorithm minimizes this positive term (in order to
minimize the Lyapunov drift). Taken together, SEEC ensures stable, joint compression and transmission
scheduling, with the goal of minimizing energy consumption.
5.4.2 Performance Bounds on SEEC
Next, we provide an analytical bound on the system energy expenditure achieved by SEEC compared to an optimum value. The optimum value is characterized by the class of stationary randomized algorithms that make the compression and transmission scheduling decisions in a multi-hop network according to a fixed probability distribution. We then make the following contributions. For this restricted class of algorithms, Lemma 2 describes the minimum system power consumption, $P_{av}$, required to achieve network stability. Theorem 1 shows that any joint compression and transmission scheduling algorithm for multi-hop networks (and therefore SEEC) that stabilizes the system will require a system power expenditure of at least $P_{av}$. In Theorem 2, we show that SEEC can achieve an average system power consumption arbitrarily close to $P_{av}$.
Stationary randomized algorithms. Consider the class of stationary randomized algorithms for making compression and transmission scheduling decisions. Such algorithms choose a compression option (based only on $A_n[t]$) and the transmit power allocation (based only on $\tilde S[t]$) for each time slot $t$ according to a fixed probability distribution. For example, one policy can be to pick a compression option uniformly at random and, based on the current channel state $\tilde S[t]$, similarly choose the transmit power vector $\tilde P_{tran}[t]$ according to a fixed probability distribution. Note that these algorithms do not consider the queue backlogs $U_n[t]$ while making their decisions.
Condition for achieving stability. Suppose that the data arrival process, i.e., the sequence $A_n[t]$, $t \ge 0$, is ergodic with a steady state distribution $p_A$.⁴ For a given stationary randomized compression decision policy, let the output data rate (in bits/slot) of the compression module be $r_n = \mathbb{E}\{R_n[t]\} \le \mathbb{E}\{b A_n[t]\}$. We define a lower bound on $r_n$, denoted by $r^{min}_n$, as follows:
$$r^{min}_n \triangleq \mathbb{E}\Big\{ \min_{k \in K} m(A_n[t], k) \Big\} \quad \text{for all } n \in N \qquad (5.20)$$
where the expectation is taken over $A_n[t]$. Thus, $r^{min}_n$ is the minimum average bits/slot delivered to the output queue by the compression module at node $n$, assuming that the compression option that results in the largest expected data size reduction is used in every timeslot. Clearly, $r_n \ge r^{min}_n$.
Suppose the process representing the time varying channel state, i.e., the sequence $\tilde S[t]$, $t \ge 0$, is ergodic with a steady state probability distribution $\pi_s$. Under a given stationary randomized transmission scheduling (and transmit power allocation) policy, let $\tilde f = (f_l)_{l \in L}$ where $f_l = \mathbb{E}\{\mu_l[t]\}$. A combination of compression and transmission scheduling policies will stabilize the network if and only if $\tilde f = (f_l)_{l \in L}$ defines a network flow satisfying the following conditions:⁵
$$f_l \ge 0 \ \forall\, l \in L, \qquad \epsilon > 0 \qquad (5.21)$$
$$\sum_{l \in \mathcal{O}_n} f_l - \sum_{l \in \mathcal{I}_n} f_l = r_n + \epsilon \qquad \forall\, n \neq \text{sink} \qquad (5.22)$$
$$\sum_{n=1}^{N} r_n = \sum_{l \in \mathcal{I}_{sink}} f_l \qquad (5.23)$$
⁴ For ease of exposition, we assume that $p_{A_n} = p_A$ for all $n$.
⁵ The $\epsilon$ in Equation (5.22) is needed to produce an appropriate randomized policy.
Let $\Lambda$ denote the set of rates for which there exists an achievable network flow $\tilde f$. Clearly, if $\tilde r_{min} = (r^{min}_n)$ does not belong to $\Lambda$, then for the arrivals $A_n[t]$ it is not possible to stabilize the network, even by always compressing data. In our subsequent analysis, we assume that $\tilde r_{min} \in \Lambda$. A formal definition of $\Lambda$ can be found in Neely et al. [63], where it is defined as the Network Capacity Region.
For the class of stationary randomized policies, we define the minimum-energy compression function $h_n(r_n)$ (for all nodes $n$) and the minimum energy transmission function $g(\tilde f)$ as follows.
Definition 1. For each node $n$, for any value of $r_n$ such that $r^{min}_n \le r_n \le \mathbb{E}\{b A_n[t]\}$, the minimum-energy compression function $h_n(r_n)$ is defined as the infimum value $h$ for which there exist probabilities $(\theta_{a,k})$ for $a \in \{1, 2, \ldots, m\}$, $k \in K$, such that:
$$\mathbb{E}\{E^n_C[t]\} = \sum_{a=0}^{m} \sum_{k=1}^{K} p_A(a)\, \theta_{a,k}\, \rho(a, k) = h \qquad (5.24)$$
$$r_n = \mathbb{E}\{R_n[t]\} = \sum_{a=0}^{m} \sum_{k=1}^{K} p_A(a)\, \theta_{a,k}\, m(a, k) \qquad (5.25)$$
$$\theta_{a,k} \ge 0 \ \forall\, a, k, \qquad \sum_{k=1}^{K} \theta_{a,k} = 1 \ \forall\, a$$
Given $\tilde S[t]$ with distribution $\pi_s$ and capacity-power curves $\tilde C(\tilde P, \tilde S) = (C_l(\tilde P, \tilde S))_{l \in L}$, Neely et al. [63] define the Network Graph Family, $\Gamma$, as the set of average transmission rates $\tilde w = (w_l)$ that can be achieved. Different power allocation algorithms will lead to different $\tilde w$.
Definition 2. For any $\tilde w = (w_l)_{l \in L} \in \Gamma$, we define the minimum energy transmission function $g(\tilde w)$ as the infimum value $g$ for which there exists a stationary randomized power allocation policy that chooses the transmit power vector $\tilde P_{tran}[t]$ as a random function of the observed channel state vector $\tilde S[t]$, and independent of the current queue backlog, such that:
$$\sum_{l \in L} \mathbb{E}\{\tilde P_l[t]\} = g, \qquad \mathbb{E}\{\mu_l[t]\} = w_l \ \forall\, l \in L$$
where $\tilde P_l[t] = P^{tran}_l[t] + I(P^{tran}_l[t])\,[1 + \beta(|\mathcal{N}_n| - 1)]\, P^{recv}$ and $P^{tran}_l[t]$ is the power allocated for transmission on link $l$ during timeslot $t$.
The following lemma shows that the infimum values $h_n(r_n)$ and $g(\tilde w)$ are achievable.
Lemma 2. Consider $\tilde r = (r_n)$ with $r^{min}_n \le r_n \le \mathbb{E}\{b A_n[t]\}$ $\forall\, n$ and $\tilde f = (f_l) \in \Gamma$ satisfying conditions (5.21)-(5.23). For such a scenario, there exists a stationary randomized policy that chooses compression option $k^*_n[t]$ and transmit power allocation $\tilde P^*[t]$ for timeslot $t$ based only on $A_n[t]$ and $\tilde S[t]$ (and independent of queue backlogs) such that:
$$\mathbb{E}\{E^n_C[t]\} = \mathbb{E}\{\rho(A_n[t], k^*_n[t])\} = h_n(r_n) \ \forall\, n \qquad (5.26)$$
$$\mathbb{E}\{R_n[t]\} = \mathbb{E}\{m(A_n[t], k^*_n[t])\} = r_n \qquad (5.27)$$
$$\sum_{l \in L} \mathbb{E}\{\tilde P^*_l[t]\} = g(\tilde f), \qquad \mathbb{E}\{\mu_l[t]\} = f_l \ \forall\, l \in L$$
Proof. This lemma can be separated into two parts: the first part claims the existence of a stationary randomized algorithm that makes compression decisions based only on $A_n[t]$ and $\tilde S[t]$ (and independent of the queue backlogs), and achieves the minimum possible energy consumption for compression, $h_n(r_n)$, at each node; the second part claims that the minimum possible energy consumption due to data transmissions is also achievable by a stationary randomized algorithm that makes transmit power allocation decisions based only on $A_n[t]$ and $\tilde S[t]$. In the following proof, we first show the existence of a stationary randomized algorithm for making compression decision $k_n[t]$ that achieves energy consumption $h_n(r_n)$ at each node (Part 1), and then prove a similar result for the transmit power allocation $\tilde P[t]$ (Part 2).
Part 1. The function $h_n(r_n)$ is defined as the infimum of $\mathbb{E}\{E^n_C[t]\}$ over all stationary randomized policies for making the compression decision that yield $\mathbb{E}\{R_n[t]\} \le r_n$. This definition implies that there exists an infinite sequence of stationary randomized policies, indexed by $i \in \{1, 2, \ldots\}$, that satisfy:
$$\mathbb{E}\{R^{(i)}_n[t]\} \le r_n \ \forall\, i \in \{1, 2, \ldots\} \qquad (5.28)$$
$$\lim_{i \to \infty} \mathbb{E}\{E^{n(i)}_C[t]\} = h_n(r_n) \qquad (5.29)$$
Each stationary randomized policy $i$ is characterized by a collection of probabilities $(\theta^{(i)}_{a,k})$ where $a = A_n[t] \le m$ and $k \in K$. These probabilities can be viewed as a finite dimensional vector belonging to a compact set $\Theta$ defined by the constraints
$$\theta^{(i)}_{a,k} \ge 0 \ \forall\, a, k, \qquad \sum_{k} \theta^{(i)}_{a,k} = 1 \ \forall\, a.$$
It follows from the properties of a compact set that the infinite sequence $\{(\theta^{(i)}_{a,k})\}_{i=1}^{\infty}$ contains a convergent subsequence that converges to $(\theta^*_{a,k}) \in \Theta$. The probabilities $(\theta^*_{a,k})$ define a stationary randomized policy $k^*[t]$ with expectations $\mathbb{E}\{R^*_n[t]\}$ and $\mathbb{E}\{E^{n*}_C[t]\}$. The expectations $\mathbb{E}\{R^*_n[t]\}$ and $\mathbb{E}\{E^{n*}_C[t]\}$ are linear functions of the probabilities $(\theta^*_{a,k})$, as shown in equations (5.24)-(5.25). Hence, the properties (5.28)-(5.29) hold in the limit, yielding $\mathbb{E}\{R^*_n[t]\} \le r_n$ and $\mathbb{E}\{E^{n*}_C[t]\} = h_n(r_n)$. If $\mathbb{E}\{R^*_n[t]\} = r_n$, then the proof is complete.
If $\mathbb{E}\{R^*_n[t]\} < r_n$, we can define an alternate policy $k'_n[t]$ that chooses policy $k^*[t]$ with probability $\delta$ and with probability $(1-\delta)$ chooses not to compress data. The value $\delta$ is chosen in such a way that
$$\mathbb{E}\{R'_n[t]\} = \delta\, \mathbb{E}\{R^*_n[t]\} + (1-\delta)\, b\, \mathbb{E}\{A_n[t]\} = r_n$$
The stationary randomized policy $k'_n[t]$ cannot use more energy than the policy $k^*[t]$ (because it consumes no energy when it chooses not to compress data, with probability $(1-\delta)$), and hence $\mathbb{E}\{E^{n'}_C[t]\} \le h_n(r_n)$. But $h_n(r_n)$ is the infimum of energy consumption over all possible stationary randomized policies with compressed data output rate at most $r_n$. This implies that $h_n(r_n) \le \mathbb{E}\{E^{n'}_C[t]\}$. Hence, $\mathbb{E}\{E^{n'}_C[t]\} = h_n(r_n)$, and thus we have proved the existence of a stationary randomized policy $k'_n[t]$ satisfying the constraints (5.26)-(5.27) in Lemma 2.
Part 2. Our claim that there exists a stationary randomized algorithm that makes transmit power allocations based only on $A_n[t]$ and $\tilde S[t]$ such that:
$$\mathbb{E}\{\mu_l[t]\} = f_l \ \forall\, l \in L, \qquad \sum_{l \in L} \mathbb{E}\{\tilde P^*_l[t]\} = g(\tilde f)$$
follows from the Graph Family Achievability lemma (Lemma 8, Chapter 4.3.2) in Neely's PhD thesis (Neely [64]).
Optimum system power consumption. The minimum-energy functions $h_n$ and $g$ defined for stationary randomized algorithms can be used to characterize the minimum system energy consumption for the larger class of joint compression and transmission scheduling algorithms.
Theorem 1. At each node $n$, let the r.v. $A_n[t]$, $t \ge 0$, be i.i.d. and the corresponding data arrival process be ergodic with a steady state probability distribution $p_A$. Let the stochastic process representing the time varying channel state ($\tilde S[t]$) be ergodic with a steady state probability distribution $\pi_s$. We assume that $\tilde r_{min} = (r^{min}_n)$ is within the network capacity region $\Lambda$. Then any joint compression and transmission scheduling algorithm that stabilizes the queues $\tilde U[t] = (U_n[t])$ requires an average system power expenditure that satisfies:
$$\liminf_{t \to \infty} \frac{1}{t} \sum_{\tau=0}^{t-1} \sum_{n=1}^{N} \mathbb{E}\{E^n_{tot}[\tau]\} \ge P_{av} \qquad (5.30)$$
where $P_{av}$ is the optimal solution to the following optimization problem:
$$\text{Minimize:} \quad \sum_{n=1}^{N} h_n(r_n) + g(\tilde f) \qquad (5.31)$$
$$\text{subject to:} \quad r^{min}_n \le r_n \le b\, \mathbb{E}\{A_n[t]\} \quad \forall\, n \in N$$
$$\tilde f = (f_l)_{l \in L} \ \text{defines a valid network flow}$$
Proof. Consider a policy that stabilizes the queues at all the nodes in the network. Let $\tilde k[t] = \{k_n[t]\}_{n \in N}$ and $\tilde P_{tran}[t] = \{P^{tran}_l[t]\}_{l \in L}$ be the compression and transmission power decisions for this policy, where $k_n[t] \in K$ for all $n, t$ and $P_{tran}[t] \in \mathcal{P}$ for all $t$. Let $R_n[t]$ and $P^n_{comp}[t]$ be the compression module output process and the power expenditure, respectively, at node $n$ due to compression decision $k_n[t]$. Let $\mu_l[t] = C_l(\tilde P_{tran}[t], \tilde S[t])$ be the transmission rate process for link $l$.
We want to prove that
$$P_{tot} \triangleq \liminf_{t \to \infty} \frac{1}{t} \sum_{\tau=0}^{t-1} \sum_{n=1}^{N} \mathbb{E}\{E^n_{tot}[\tau]\} \qquad (5.32)$$
$$= \liminf_{t \to \infty} \frac{1}{t} \sum_{\tau=0}^{t-1} \mathbb{E}\Big\{ \sum_{n=1}^{N} P^n_{comp}[\tau] + \sum_{l=(n,\hat n) \in L} \tilde P_l[\tau] \Big\} \qquad (5.33)$$
$$\ge P_{av} \qquad (5.34)$$
where $\tilde P_l[\tau] = P^{tran}_l[\tau] + I(P^{tran}_l[\tau])\,[1 + \beta(|\mathcal{N}_n| - 1)]\, P^{recv}$ for each link $l = (n, \hat n)$, and $P_{av}$ is the minimum time average power consumption required to stabilize the queues (defined in Theorem 1). Equation (5.33) follows from (5.6)-(5.8) and (5.9). Next, we discuss two lemmas needed for our proof.
Lemma 3. If there exist vectors of constants $\tilde r = \{r_n\}_{n \in N}$ and $\tilde P_c = \{P^n_c\}_{n \in N}$ with an infinite sequence of timeslots $\{t_i\}_{i=1}^{\infty}$ such that:
$$\lim_{i \to \infty} \frac{1}{t_i} \sum_{\tau=0}^{t_i - 1} \mathbb{E}\{R_n[\tau]\} = r_n \qquad (5.35)$$
$$\lim_{i \to \infty} \frac{1}{t_i} \sum_{\tau=0}^{t_i - 1} \mathbb{E}\{P^n_{comp}[\tau]\} = P^n_c \qquad (5.36)$$
then $P^n_c \ge h_n(r_n)$ for all $n \in N$, and hence $\sum_{n=1}^{N} P^n_c \ge \sum_{n=1}^{N} h_n(r_n)$.
Proof. Omitted for brevity. See [83].
Lemma 4. If there exist vectors of constants $\tilde w = \{w_l\}_{l \in L}$ and $\tilde P_{tr} = \{P^l_{tr}\}_{l \in L}$ with an infinite sequence of timeslots $\{t_i\}_{i=1}^{\infty}$ such that:
$$\lim_{i \to \infty} \frac{1}{t_i} \sum_{\tau=0}^{t_i - 1} \mathbb{E}\{\mu_l[\tau]\} = w_l \qquad (5.37)$$
$$\lim_{i \to \infty} \frac{1}{t_i} \sum_{\tau=0}^{t_i - 1} \mathbb{E}\{P_l[\tau]\} = P^l_{tr} \qquad (5.38)$$
then $\sum_{l \in L} \tilde P_l \ge g(\tilde w)$, where $\tilde P_l = P^l_{tr} + P^{recv}\,(1 + \beta(|\mathcal{N}_n| - 1))\, I(P^l_{tr})$ for each link $l = (n, \hat n)$.
Proof. Omitted for brevity. See [83].
The key step in proving both Lemma 3 and Lemma 4 is to show the existence of stationary randomized policies for choosing a compression option at each node and the transmit power allocations that satisfy (5.35)-(5.36) and (5.37)-(5.38), respectively. The claims in Lemma 3 and Lemma 4 then follow from the definitions of $h_n(r_n)$ and $g(\tilde w)$, respectively.
Let the liminf total power expenditure $P_{tot}$ defined in equation (5.32) be achieved over an infinite sequence of timeslots $\{t_i\}_{i=1}^{\infty}$, i.e.,
$$\liminf_{i \to \infty} \frac{1}{t_i} \sum_{\tau=0}^{t_i - 1} \sum_{n=1}^{N} \mathbb{E}\{E^n_{tot}[\tau]\} = P_{tot} \qquad (5.39)$$
For any timeslot $t$, we define:
$$\hat R_n[t] = \frac{1}{t} \sum_{\tau=0}^{t-1} \mathbb{E}\{R_n[\tau]\}, \qquad \hat P^n_{comp}[t] = \frac{1}{t} \sum_{\tau=0}^{t-1} \mathbb{E}\{P^n_{comp}[\tau]\}$$
$$\hat\mu_l[t] = \frac{1}{t} \sum_{\tau=0}^{t-1} \mathbb{E}\{\mu_l[\tau]\}, \qquad \hat P^l_{tran}[t] = \frac{1}{t} \sum_{\tau=0}^{t-1} \mathbb{E}\{P^l_{tran}[\tau]\}$$
Note that for all timeslots $t$, we have:
$$0 \le \hat R_n[t] \le b\, \mathbb{E}\{A[t]\}, \qquad 0 \le \hat P^n_{comp}[t] \le \rho_{max} \qquad (5.40)$$
$$0 \le \hat\mu_l[t] \le \mu_{max}, \qquad 0 \le \hat P^l_{tran}[t] \le P_{max} \qquad (5.41)$$
We define a $2(N + |L|)$ dimensional vector as follows:
$$V[t_i] = \Big( \{\hat R_n[t_i],\, \hat P^n_{comp}[t_i]\}_{n \in N};\ \{\hat\mu_l[t_i],\, \hat P^l_{tran}[t_i]\}_{l \in L} \Big)$$
It follows from constraints (5.40) and (5.41) that the vectors $\{V[t_i]\}_{i=1}^{\infty}$ form an infinite sequence in a $2(N + |L|)$ dimensional compact set, and hence, this infinite sequence contains a convergent subsequence. Let $\{\tilde t_i\}_{i=1}^{\infty}$ represent this convergent subsequence of timeslots such that there exists a vector of constants
$$v^* = \Big( \{r_n,\, P^n_c\}_{n \in N};\ \{w_l,\, P^l_{tr}\}_{l \in L} \Big)$$
satisfying, for all $n \in N$ and $l \in L$:
$$\lim_{i \to \infty} \frac{1}{\tilde t_i} \sum_{\tau=0}^{\tilde t_i - 1} \mathbb{E}\{R_n[\tau]\} = r_n, \qquad \lim_{i \to \infty} \frac{1}{\tilde t_i} \sum_{\tau=0}^{\tilde t_i - 1} \mathbb{E}\{P^n_{comp}[\tau]\} = P^n_c$$
$$\lim_{i \to \infty} \frac{1}{\tilde t_i} \sum_{\tau=0}^{\tilde t_i - 1} \mathbb{E}\{\mu_l[\tau]\} = w_l, \qquad \lim_{i \to \infty} \frac{1}{\tilde t_i} \sum_{\tau=0}^{\tilde t_i - 1} \mathbb{E}\{P^l_{tran}[\tau]\} = P^l_{tr}$$
Thus, we have vectors $\tilde r = \{r_n\}_{n \in N}$ and $\tilde P_c = \{P^n_c\}_{n \in N}$, and an infinite sequence of timeslots $\{\tilde t_i\}_{i=1}^{\infty}$ satisfying constraints (5.35) and (5.36). Hence, from Lemma 3, we have
$$\sum_{n \in N} P^n_c \ge \sum_{n \in N} h_n(r_n)$$
Similarly, for vectors $\tilde w = \{w_l\}_{l \in L}$ and $\tilde P_{tr} = \{P^l_{tr}\}_{l \in L}$, from Lemma 4, we have
$$\sum_{l \in L} \tilde P_l \ge g(\tilde w)$$
where $\tilde P_l = P^l_{tr} + P^{recv}\,(1 + \beta(|\mathcal{N}_n| - 1))\, I(P^l_{tr})$.
Due to the fact that $\{\tilde t_i\}_{i=1}^{\infty}$ is an infinite subsequence of the original sequence $\{t_i\}_{i=1}^{\infty}$, from equation (5.32) we have
$$P_{tot} = \liminf_{i \to \infty} \frac{1}{\tilde t_i} \sum_{\tau=0}^{\tilde t_i - 1} \sum_{n=1}^{N} \mathbb{E}\{E^n_{tot}[\tau]\} = \sum_{n \in N} P^n_c + \sum_{l \in L} \tilde P_l \ge \sum_{n \in N} h_n(r_n) + g(\tilde w) \qquad (5.42)$$
To compare the $P_{av}$ value against the value on the right hand side of the inequality (5.42), we use the fact that the queues are stable. Since the queues are stable for rate vector $\tilde r$, it follows from the definition of the network capacity region $\Lambda$ that $\tilde r \in \Lambda$. Furthermore, from the discussion on the conditions for achieving stability in Section 5.4.2, $\tilde r \in \Lambda$ implies that there exist variables $\tilde f = (f_l)_{l \in L}$ defining a network flow (satisfying constraints (5.21)-(5.23)). Since the transmission scheduling algorithm achieves rate $w_l$ on link $l$, in order to achieve stability we must have $f_l \le w_l$ for all $l \in L$. This implies $g(\tilde f) \le g(\tilde w)$. Therefore, from constraint (5.42), we have
$$P_{tot} \ge \sum_{n \in N} h_n(r_n) + g(\tilde w) \ge \sum_{n \in N} h_n(r_n) + g(\tilde f) \qquad (5.43)$$
Using the definition of $r^{min}_n$, it is straightforward to show that
$$r^{min}_n \le r_n \le b\, \mathbb{E}\{A[t]\} \quad \forall\, n \in N \qquad (5.44)$$
Since $P_{av}$ is defined as the minimum time average power expenditure required to stabilize a network with compression module output rates $\{r_n\}_{n \in N}$ (see Theorem 1), it follows that
$$P_{tot} \ge \sum_{n \in N} h_n(r_n) + g(\tilde f) \ge P_{av} \qquad (5.45)$$
We have shown that the inequality (5.45) holds for any joint compression and transmission power control
algorithm that stabilizes all the queues in the network, and hence, proved the claim in Theorem 1.
In practice, solving the optimization problem in Equation (5.31) might not be possible, as it requires exact knowledge of the functions $h_n$ and $g$, which in turn requires complete a priori knowledge of the distributions $p_A$ and $\pi_s$. However, this formulation is useful in showing the optimality characteristics of SEEC, as discussed next.
SEEC Performance. The following theorem shows that SEEC can achieve near-optimal performance, i.e., achieve power consumption arbitrarily close to $P_{av}$ while maintaining network stability and trading off delay (as described below).
Theorem 2. Suppose the arrival process $A_n[t]$ and the channel state $\tilde S[t]$ are i.i.d. across timeslots with distributions $p_A$ and $\pi_s$, respectively. We assume that it is possible to stabilize the network, i.e., $\tilde r_{min}$ is strictly interior to the network capacity region $\Lambda$. For any control parameter $V > 0$, the compression and transmit power allocation algorithms (Equations (5.12) and (5.13)) achieve average power consumption and queue backlogs that satisfy the following constraints:
$$P_{tot} = \limsup_{t \to \infty} \frac{1}{t} \sum_{\tau=0}^{t-1} \sum_{n=1}^{N} \mathbb{E}\{E^n_{tot}[\tau]\} \le P_{av} + \frac{BN}{V} \qquad (5.46)$$
$$\sum_{n} \bar U_n \triangleq \limsup_{t \to \infty} \frac{1}{t} \sum_{\tau=0}^{t-1} \sum_{n} \mathbb{E}\{U_n[\tau]\} \le \frac{BN + VN(\rho_{max} + \tilde P_{max})}{\epsilon_{max}} \qquad (5.47)$$
where the constant $B$ is defined in Equation (5.18), and $\tilde P_{max} = P_{max} + P_{recv}\,(1 + \beta(d_{max} - 1))$ with $d_{max} \triangleq \max_n |\mathcal{N}_n|$. The constant $\rho_{max}$ denotes the maximum expected power consumption for data compression across all nodes when, for each timeslot $t$, the compression algorithm that is expected to consume maximum power is chosen. It is defined as:
$$\rho_{max} \triangleq \max_n\, \mathbb{E}\Big\{ \max_{k \in K} \rho(A_n[t], k) \Big\} \qquad (5.48)$$
where the expectation is over the random arrival process $A_n[t]$. $\epsilon_{max}$ is the largest $\epsilon$ such that $\hat r_n = (r^{min}_n + \epsilon) \in \Lambda$, i.e., if we increased the minimum possible compressed data rate at node $n$, $r^{min}_n$, by $\epsilon_{max}$, the resulting rate vector would lie on the boundary of the network capacity region $\Lambda$.
Proof. The following lemma from Georgiadis et al. [44] is needed for our proof.
Lemma 5. Let $L(\tilde U[t])$ be a non-negative function of $\tilde U[t]$ with Lyapunov drift $\Delta(\tilde U[t])$ defined in (5.16). If there are stochastic processes $\alpha[t]$ and $\gamma[t]$ such that for every timeslot $t$ and for all possible values of $\tilde U[t]$, the Lyapunov drift satisfies:
$$\Delta(\tilde U[t]) \le \mathbb{E}\big\{ \gamma[t] - \alpha[t] \mid \tilde U[t] \big\} \qquad (5.49)$$
then,
$$\limsup_{t \to \infty} \frac{1}{t} \sum_{\tau=0}^{t-1} \mathbb{E}\{\alpha[\tau]\} \le \limsup_{t \to \infty} \frac{1}{t} \sum_{\tau=0}^{t-1} \mathbb{E}\{\gamma[\tau]\}$$
SEEC minimizes the right hand side of the Lyapunov drift expression (5.19). Hence, for any other (possibly randomized) algorithm that makes compression decisions $k_n[t]$ and chooses transmit power $P_{tran}[t]$, we have:
$$\Delta(\tilde U[t]) + V \sum_{n=1}^{N} \mathbb{E}\big\{ E^n_{tot}[t] \mid \tilde U[t] \big\} \le BN - \sum_{n} \mathbb{E}\Big\{ \sum_{l \in \mathcal{O}_n} \big( \tilde\mu_l[t]\, \Delta U_l[t] - V \hat P_l[t] \big) \,\Big|\, \tilde U[t] \Big\} + \sum_{n=1}^{N} \mathbb{E}\big\{ U_n[t]\, m(A_n[t], k_n[t]) + V \rho(A_n[t], k_n[t]) \mid \tilde U[t] \big\} \qquad (5.50)$$
with $\hat P_l[t] = P^{tran}_l[t] + I(P^{tran}_l[t])\,[1 + \beta(|\mathcal{N}_n| - 1)]\, P^{recv}$.
Let $\tilde r = (r_n)$ and $\tilde f = (f_l)_{l \in L}$ represent a compressed data rate vector and a transmission rate vector defining a network flow (satisfying equations (5.21)-(5.23)). Let $k^*_n[t]$ and $\tilde P^*[t]$ represent the compression and transmit power allocations made by a stationary randomized policy that yields:
$$\mathbb{E}\{E^n_C[t]\} = \mathbb{E}\{\rho(A_n[t], k^*_n[t])\} = h_n(r_n) \ \forall\, n \qquad (5.51)$$
$$\mathbb{E}\{R_n[t]\} = \mathbb{E}\{m(A_n[t], k^*_n[t])\} = r_n \qquad (5.52)$$
$$\sum_{l \in L} \mathbb{E}\{\tilde P^*_l[t]\} = g(\tilde f), \qquad \mathbb{E}\{\mu_l[t]\} = f_l \ \forall\, l \in L \qquad (5.53)$$
Lemma 2 guarantees the existence of such a policy. The decisions $k^*_n[t]$ and $\tilde P^*[t]$ made by a stationary randomized algorithm are independent of the queue backlogs $\tilde U[t]$. Thus, the expectations in (5.51)-(5.53) are the same when conditioned on $\tilde U[t]$. Substituting (5.51)-(5.53) into the right hand side of (5.50), and re-arranging the terms, we get:
$$\Delta(\tilde U[t]) + V \sum_{n=1}^{N} \mathbb{E}\big\{ E^n_{tot}[t] \mid \tilde U[t] \big\} \le BN - \sum_{n} U_n[t] \Big( \Big( \sum_{l \in \mathcal{O}_n} f_l - \sum_{l \in \mathcal{I}_n} f_l \Big) - r_n \Big) + V \Big( \sum_{n} h_n(r_n) + g(\tilde f) \Big) \qquad (5.54)$$
The above inequality holds for all $\tilde r$ and $\tilde f$. In particular, it also holds for the $\tilde r^*$ and $\tilde f^*$ that optimize (5.30)-(5.31) in Theorem 1, such that $P_{av} = \sum_n h_n(r^*_n) + g(\tilde f^*)$ and $\sum_{l \in \mathcal{O}_n} f^*_l - \sum_{l \in \mathcal{I}_n} f^*_l = r^*_n$.
Plugging this into (5.54), we have:
$$\Delta(\tilde U[t]) + V \sum_{n=1}^{N} \mathbb{E}\big\{ E^n_{tot}[t] \mid \tilde U[t] \big\} \le BN + V P_{av} \qquad (5.55)$$
Using the drift inequality (5.55) in Lemma 5 with $\alpha[t] = V \big( \sum_n E^n_{tot}[t] \big)$ and $\gamma[t] = BN + V P_{av}$ yields $P_{tot} \le P_{av} + BN/V$.
Similarly, choosing $\tilde r = (r^{min}_n)$ and the corresponding flow vector $\tilde f$ such that $\sum_{l \in \mathcal{O}_n} f_l - \sum_{l \in \mathcal{I}_n} f_l = r^{min}_n + \epsilon$, from (5.54) we get
$$\Delta(\tilde U[t]) + V \sum_{n=1}^{N} \mathbb{E}\big\{ E^n_{tot}[t] \mid \tilde U[t] \big\} \le BN - \epsilon \sum_{n} U_n[t] + V \Big( \sum_{n} h_n(r^{min}_n) + g(\tilde f) \Big) \qquad (5.56)$$
Since $E^n_{tot} \ge 0$ for all $n$, from (5.56) we have
$$\Delta(\tilde U[t]) \le BN - \epsilon \sum_{n} U_n[t] + V \Big( \sum_{n} h_n(r^{min}_n) + g(\tilde f) \Big) \qquad (5.57)$$
From the definition of $\rho_{max}$, we have $h_n(r^{min}_n) \le \rho_{max}$ for all $n$. Also, since $P^{tran}_l[t] \le P_{max}$ for all links $l$ during any timeslot $t$, we have $g(\tilde f) \le |L|\, \tilde P_{max}$ where $\tilde P_{max} = P_{max} + P_{recv}\,(1 + \beta(d_{max} - 1))$ with $d_{max} \triangleq \max_n |\mathcal{N}_n|$. For the data collection using a routing tree scenario considered in this paper, each node has only one outgoing link, and hence $g(\tilde f) \le N \tilde P_{max}$. Substituting the upper bounds on $h_n(r^{min}_n)$ and $g(\tilde f)$ in (5.57), we get
$$\Delta(\tilde U[t]) \le BN - \epsilon \sum_{n} U_n[t] + V N (\rho_{max} + \tilde P_{max}) \qquad (5.58)$$
Using the drift inequality (5.58) in Lemma 5 with $\alpha[t] = \epsilon \sum_n U_n[t]$ and $\gamma[t] = BN + VN(\rho_{max} + \tilde P_{max})$, we get
$$\sum_{n} \bar U_n \le \frac{BN + VN(\rho_{max} + \tilde P_{max})}{\epsilon} \qquad (5.59)$$
The average total queue backlog bound in (5.59) holds for any value of $\epsilon > 0$. A particular choice of $\epsilon$ affects only the bound; it does not affect the compression and transmit power allocation decisions in SEEC or the sample path of the system dynamics. Hence, we can minimize the bound in (5.59) by choosing the largest feasible $\epsilon$ such that $r^{min}_n + \epsilon \in \Lambda$ (defined as $\epsilon_{max}$), yielding:
$$\sum_{n} \bar U_n \le \frac{BN + VN(\rho_{max} + \tilde P_{max})}{\epsilon_{max}}$$
Power consumption vs. delay trade-off. We can choose a large value for the control parameter $V$ to make $BN/V$ arbitrarily small, and hence achieve a time average power consumption $P_{tot}$ arbitrarily close to the optimal value $P_{av}$. However, the total average queue backlog $\sum_n \bar U_n$ grows linearly in $V$. Thus, reducing the average power expenditure by choosing a large value for $V$ causes larger queue backlogs, resulting in longer delay in delivering data to the base-station. This $O(1/V, V)$ power consumption vs. delay trade-off is inherent in control algorithms designed based on Lyapunov optimization techniques (Georgiadis et al. [44]).
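As a rough numerical reading of this trade-off (an illustration of the two bounds above, not an additional result): if the control parameter is scaled from $V$ to $cV$ for some $c > 1$, the bounds become
$$P_{tot} \le P_{av} + \frac{BN}{cV}, \qquad \sum_n \bar U_n \le \frac{BN + cVN(\rho_{max} + \tilde P_{max})}{\epsilon_{max}},$$
so the excess power above $P_{av}$ shrinks by a factor of $c$ while the total backlog bound (and hence the delay) grows essentially linearly in $c$.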
5.5 DSEEC: Distributed Algorithm
SEEC's performance bounds are derived for a timeslotted system. Under general SINR constraints, finding the optimal transmission schedule for a timeslot is NP-hard (Georgiadis et al. [44]). However, for certain scenarios, for example a cell partitioned network, finding the optimal transmission schedule is equivalent to finding a maximum weight matching in a graph with link-weights $d_l[t] = \Delta U_l[t]\,\mu_l[t] - V \tilde P_l[t]$ (Georgiadis et al. [44]). Given knowledge of the complete network topology and the link-weights, a maximum weight matching can be found in polynomial time.
In practice, most wireless systems (e.g., 802.11 or 802.15.4-based systems) are not time-slotted. In the rest of this paper, we consider a distributed variant of SEEC, called DSEEC, for multi-hop wireless networks based on practical MACs. DSEEC uses the same compression algorithm as SEEC, since that algorithm only requires local information. The key difference between DSEEC and SEEC is the transmission algorithm, in which a node uses only information about queue backlogs from its neighbors. Specifically, our transmission heuristic lets only nodes $n$ with positive link-weights, i.e., $d_l[t] > 0$ for some link $l = (n, \hat n)$, contend for the wireless channel. Several other heuristics have been proposed implementing such backpressure-based scheduling with CSMA-based MACs (Umut et al. [95], Warrier et al. [98]); Sridharan et al. [85] show that the positive link-weight based heuristic performs as well as others. Because DSEEC is not analytically tractable, we evaluate its performance through simulation.
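The following sketch (Python, for illustration only; DSEEC itself runs inside a packet-level simulator and the function and variable names here are our own) shows the local test a DSEEC node applies before contending for the channel: it contends only if some outgoing link has a positive weight $d_l[t]$, computed from its own backlog and the backlogs advertised by its neighbors.

# Illustrative sketch of DSEEC's local contention rule (not the simulator code).
# A node contends for the CSMA channel only if some outgoing link has a positive
# backpressure weight d_l = (U_n - U_nbr) * mu_l - V * P_l.

def should_contend(own_backlog, neighbor_backlog, link_rate, link_power_cost, V):
    # neighbor_backlog, link_rate, and link_power_cost are dicts keyed by neighbor id.
    best_nbr, best_weight = None, 0.0
    for nbr, u_nbr in neighbor_backlog.items():
        d = (own_backlog - u_nbr) * link_rate[nbr] - V * link_power_cost[nbr]
        if d > best_weight:
            best_nbr, best_weight = nbr, d
    return best_nbr  # None means: do not contend in this round

if __name__ == "__main__":
    nxt = should_contend(own_backlog=4000,
                         neighbor_backlog={"parent": 1500, "sibling": 3900},
                         link_rate={"parent": 1.6e6, "sibling": 2.0e6},
                         link_power_cost={"parent": 0.4, "sibling": 0.4},
                         V=1e6)
    print(nxt)  # picks the link with the largest positive weight ("parent" here)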
5.6 Evaluation
In this section we evaluate the performance of our algorithm in simulation, on topologies derived from
real wireless testbeds and deployments. This section first discusses our experimental methodology, then
presents our results.
5.6.1 Methodology
We implemented the DSEEC algorithm in Qualnet, a high-fidelity, packet-level wireless simulator [71]. A constant bit rate (CBR) application generates data periodically. These data bytes are passed on to the compression module, which decides whether to compress the data or not. The output of the compression module is queued in a buffer at a wireless link/interface awaiting transmission.
Data Compression and Transmission model parameters. We obtained the data compression model parameters $\rho(a, k)$ and $m(a, k)$ by compressing sensor data from a real-world deployment [67]. We partition the data collected by a node during this deployment into slices of size 600 bytes, and compress each slice separately using the compression function in zlib. We then set $m(A_n[t], 1)$ to be equal to the average data size achieved after compression across these slices (with values ranging from 430 to 470 bytes). To estimate $\rho(a, 1)$, using our traces as input, we ran our compression program on a LEAP2 node, an embedded networked sensor platform optimized for low power processing. We measured the average CPU and memory (SDRAM) energy consumption across several runs, and set $\rho(A_n[t], 1)$ at each node $n$ equal to 3.35 mJ, the average energy consumed for compressing 600 bytes of data. More details about our trace-driven measurements are available in [83].
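A minimal sketch of this parameter-estimation step (plain Python with the standard zlib module; the trace file name is a placeholder, and the energy figure comes from the hardware measurements described above, not from this script):

import zlib

SLICE_BYTES = 600  # slice size matching the deployment's 600-byte data chunks

def average_compressed_size(trace, slice_bytes=SLICE_BYTES):
    # Estimate m(a, 1): average zlib output size (bytes) over fixed-size slices.
    slices = [trace[i:i + slice_bytes] for i in range(0, len(trace), slice_bytes)]
    slices = [s for s in slices if len(s) == slice_bytes]  # drop the ragged tail
    sizes = [len(zlib.compress(s)) for s in slices]
    return sum(sizes) / len(sizes)

if __name__ == "__main__":
    with open("sensor_trace.bin", "rb") as f:   # placeholder trace file
        trace = f.read()
    print("average compressed slice size:", average_compressed_size(trace))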
We use the 802.11b MAC and physical layer implementation in Qualnet in our simulations. We set the maximum transmission rate to 2 Mbps, and the radio power consumption to 400 mW when the wireless interface is in transmit or receive state. We chose these values based on an energy measurement study for LEAP2 nodes [87]. We assume the existence of a scheme for radio duty cycling that turns off the radio whenever possible, and discuss its impact in Section 5.6.2.
We assume simple overhearing avoidance using RTS/CTS in 802.11. For our simulation scenarios, the energy consumption due to receiving RTS or CTS packets is within 10% of the energy consumed by the wireless interface for receiving a packet. Thus, with RTS/CTS enabled, we set the parameter $\beta$ in our transmission decision algorithm (Equation (5.13)) to 0.1.
Network topologies. For a fixed energy consumption by the radio and for data compression, the network topology determines whether compressing data saves energy or not. Two network topology dependent factors can have an impact on the energy savings due to data compression: (a) multi-hop routing and (b) the average neighborhood size. Compression reduces the size of the data (hence, the number of packets) that must be forwarded to the sink. With fewer data packets, the total energy cost due to packet transmission, reception, and overhearing is reduced. These energy savings can offset the energy consumed for compression, leading to a lower total energy consumption (compared to not compressing data).
We evaluate the performance of our algorithm using three qualitatively different data collecting trees:
a cluster-tree, a shallow-tree, and a deep-tree. These topologies represent different combinations of the
two factors, multi-hop routing and neighborhood size, as summarized in Table 5.2.
  Topology     | # source nodes | avg. # neighbors | Multi-hop paths
  Cluster-tree | 20             | small (4)        | short (≤ 2)
  Shallow-tree | 25             | large (11)       | short (≤ 2)
  Deep-tree    | 25             | medium (8)       | long (≥ 4)
Table 5.2: Data Collection Trees
Each node in the cluster-tree is at most two hops away from the sink and has a small number of neighbors. Nodes in the shallow-tree are again at most two hops away from the sink but have a large number of neighbors. The deep-tree (Figure 5.3) consists of several nodes that are more than two hops away from the sink.
Figure 5.1: Cluster-tree (nodes 1-21, grouped into Clusters 1-4)
Figure 5.2: Shallow-tree (nodes 1-26; Cliques 1 and 2)
5.6.2 Energy Savings
In this section, we provide evidence that illustrates the need for dynamic compression decisions. We show that realistic data collection topologies exist where statically always compressing data may not be the most energy efficient strategy.
We compare DSEEC against two baseline strategies: (1) always compress and (2) never compress. These are both static compression decisions that do not take the (energy) cost of data compression relative to the cost of transmission into account. To equalize the effect of transmission scheduling on energy consumption, we use the same transmission decision algorithm (Equation (5.13)) for all strategies.
Figure 5.3: Deep-tree (nodes 1-26; Cliques 1-3)
We use the time averaged total system power consumption as the performance metric. This is com-
puted as the total energy consumption across all nodes divided by the duration of the experiment. In our
simulations, a CBR application at each node generates 600 bytes of data every 5 seconds. In this regime,
the data rate is low enough that nodes compress data only to minimize energy, not to stabilize queues.
Incidentally, this is the rate at which data is generated at each node in the original deployment. Each
simulation run is 100 minutes long, so the CBR application at each node generates 1200 data packets.
Unless otherwise specified, the control parameter V is set to $10^6$ (we discuss DSEEC's sensitivity to V in Section 5.6.4), and the compression and data transmission parameters are set to the values given in Section 5.6.1. We assume no transmission losses, i.e., $p_f = p_r = 1$ for the ETX metric described in Section 5.2 (we consider transmission losses in Section 5.6.3). Finally, each simulation is averaged over 10 runs; the 95% confidence intervals are too small to see on the graphs, so we omit them.
Motivating Dynamic Compression. Figure 5.4 depicts the total power consumed by the different com-
pression decision strategies, where the always compress strategy dissipates more power than DSEEC for all three topologies: 34% for the cluster-tree, 17% for the shallow-tree, and 10% for the deep-tree. In these topologies,
the energy consumed for compressing data is higher than the savings resulting from having to transmit
fewer packets as a result of data compression, so with DSEEC nodes never compress data, and its system
power consumption is the same as never compress.
Figure 5.4: Energy consumption (system power consumption in mW, broken down into transmit, receive, and compress): never compress (first), SEEC-WT (second), DSEEC (third), always compress (fourth)
Figure 5.5: Duty-cycling overhead (power savings (%) vs. idle time (%))
Across topologies, two facts stand out. First, always compress consumes less power in transmitting
and receiving packets. This is expected since compressing data reduces the number of packets each node
has to transmit. Second, the power savings achieved by DSEEC for the cluster-tree is significantly higher than for the other two topologies. Nodes in the cluster-tree have both smaller neighborhood sizes and shorter multi-hop paths than in the other topologies. In the shallow-tree each node has a large number of neighbors. Thus, a significant fraction of the energy consumed for data compression by always compress is offset by the lower energy consumed for receiving data and control (RTS/CTS) packets. The deep-tree topology has nodes with a large number of neighbors as well as nodes that are more than 2 hops away from the sink, and the benefits of DSEEC are the least in this case.
To understand the performance hit for DSEEC caused by local scheduling, we implemented SEEC with-
out time slots (SEEC-WT), which makes global scheduling decisions on top of a CSMA MAC. For our
topologies, the total energy consumed by DSEEC is comparable to that of SEEC-WT.
Overall, this experiment demonstrates the effect of network topology on power consumption and the
benefits of using a dynamic algorithm for compression decisions over statically always compressing data.
Duty-cycling overhead. The results shown in Figure 5.4 assume ideal duty-cycling, i.e., that the radio
is turned off whenever it is not transmitting or receiving. In practice, due to the overhead incurred by
the duty-cycling protocols, the radio will remain in idle-state from time-to-time before it is turned off.
Figure 5.5 shows how the power savings achieved by DSEEC over always compress varies as a function
of the duty-cycling overhead. This overhead is measured by the fraction of the total experiment time that
a node could have slept but stayed awake to determine if it was appropriate to sleep, averaged over all
nodes. Duty-cycling overhead is topology-dependent but is typically about 1-1.5% for a well-designed
protocol [10]. DSEEC achieves 12-17% savings over the static decision in this regime.
5.6.3 Adapting to Dynamics
In practice, a static compression decision is undesirable not only because of the detailed information
needed for making such a decision, but also because the system needs to adapt to changes in wireless
channel conditions, application traffic demand, or platform settings. If the wireless link qualities and/or the application data rate change significantly, then a static decision would have to be recomputed. In contrast,
DSEEC can dynamically adapt compression decisions.
We use the deep-tree topology to demonstrate that DSEEC can adapt to changes in link quality and
application data rates. We first vary the link quality between node 13 and the sink node 1. In the experiment termed good link, none of node 13's transmissions encounter channel errors. In the experiment termed bad link, 20% of the transmissions by node 13 get corrupted and are not received successfully at node 1 (requiring node 13 to retransmit these packets). Hence, for the link between nodes 13 and 1, $p_f = 0.8$ and $p_r = 1$ for the ETX link quality metric. This reduces the average transmission rate from 2 Mbps to 1.6 Mbps for the link 13 → 1. In our simulations, we monitor the current level of retransmissions to continuously update the ETX metric, and our estimate of the (average) transmission rate for the link 13 → 1. In another set of experiments, we steadily increased the application data rate at the nodes in the deep-tree topology until DSEEC started compressing each packet. We do this by decreasing the packet inter-arrival time. To illustrate DSEEC's dynamic adaptation to traffic rates, we consider two data points:
(1) Light load (300 milliseconds packet inter-arrival time), and (2) Heavy load (180 milliseconds packet
inter-arrival time).
Node ID Light Heavy Good link Bad link
Clique 1
2-12,14 0 0 0 0
Clique 2
13 0 0 0 0
15 0 2.7% 0 0
16 0 0.35% 0 0
17 0 31% 0 0
18 0 39.7% 0 0
20 0 1.8% 0 0
21 0 76.8% 0 0.7%
Clique 3
19 0 46.8% 0 0.5%
22 0 99.8% 0.1% 24.9%
23 0 99.8% 0.45% 34.3%
24 0 99.7% 3.2% 40.2%
25 0 99.3% 0 9.3%
26 0 42.4% 0.87% 2.4%
Table 5.3: DSEEC, deep-tree: % of packets compressed; change in application load and link quality
Table 5.3 depicts the change in fraction of packets compressed, by DSEEC, at all the nodes in the
deep-tree when the link quality or the application data rate changes. For the bad link scenario, packet
drops at node 13 result in retransmissions at the MAC layer. Packet retransmissions decrease the effective
transmission rate on the 13 → 1 link, which increases the transmission energy cost for all nodes routing data through that link. Nodes 22-26, which are 3 or more hops away from the sink, respond to this increase in transmission energy cost by compressing a significant fraction of their data. However, the increase in
transmission energy consumption is not large enough to trigger data compression at nodes that are 1 or 2
hops away from the sink. Note that DSEEC makes these decisions without any explicit information about
the location of the bad link and the extent of packet drops on it.
With high application data rate, the level of data compression varies across different nodes due to the
following reasons. The 13 → 1 link in the deep-tree topology is a bottleneck link for all the nodes in
cliques 2 and 3 trying to send data to the sink. Nodes in clique 1 are outside the transmission range of
nodes in clique 2 and 3. Hence, their transmission rate is not affected by the increased contention amongst
the nodes in clique 2 and 3 when the application data rate is increased. As a result, the application data rate
that can be supported without compression is smaller for nodes in cliques 2 and 3 compared to the nodes
in clique 1. This is the reason why nodes in clique 1 never compress the data in both scenarios. Nodes in
clique 3 compress more data than nodes in clique 2 since they are further away from the sink, and hence,
their data is transmitted over more hops as compared to the nodes in clique 2.
Insight into DSEEC's adaptability. DSEEC's control decisions are driven by the queue backlog at differ-
ent nodes. With its compression decision algorithm, nodes with higher queue backlog are more likely to
compress data. Due to its differential queue backlog based transmission scheduling, the queue backlog at
a node is directly proportional to its hop-count distance from the sink. Hence, with DSEEC, nodes that are
multiple hops away from the sink have a higher likelihood of compressing data (due to their larger queue
sizes) compared to the nodes that are only 1 or 2 hops away.
A degradation in link quality and/or increase in application data rate can result in larger queue backlogs
at all the nodes in the network. An increase in queue backlogs (for example, when the application traffic load changes from Light to Heavy) can cause DSEEC to change its compression decision. Moreover, the impact of any change in network conditions will always be more significant at nodes that are farther away
from the sink because these nodes have a larger queue backlog compared to the nodes that are closer to the
sink.
5.6.4 Sensitivity to V
The control parameter V impacts the energy-vs-delay trade-off. As discussed in Section 5.4, increasing V
reduces the system power consumption but increases the average queue backlog in the system. We have
found that changing V even by 3 orders of magnitude results in a small impact on the total system power
consumption. Moreover, relative to a static strategy like always compress, when V is varied from $10^3$ to $3 \times 10^6$, DSEEC consumes from 8.7% to 10% less power. Intuitively, this relative insensitivity results
from using the same transmission decision strategy for all three compression strategies. A more detailed
discussion and additional results from our sensitivity evaluation are available in [83].
5.7 Conclusions and Future Work
In this paper, we have described the design of SEEC, a stable energy-efficient compression and schedul-
ing algorithm for multi-hop wireless networks. SEEC can achieve near-optimal energy performance, and
its distributed variant DSEEC adapts to topology, link dynamics, platform settings, and application data
rates without explicitly taking those factors into account. We intend to pursue several future directions,
including extending SEEC to consider spatially correlated data, choice of routing paths, compression at
intermediate nodes (not just at the source), multiple radio transmit power levels, and bounded-distortion
lossy compression. We also intend to evaluate these extensions on DSEEC, and also evaluate DSEEC more
extensively, using multiple compression levels (K > 1), different platforms, larger networks, and different
compression algorithms.
Chapter 6
Sensor Faults: Detection Methods and Prevalence in Real-World
Datasets
Various sensor network measurement studies have reported instances of transient faults in sensor readings.
In this work, we seek to answer a simple question: How often are such faults observed in real deployments?
We focus on three types of transient faults, caused by faulty sensor readings that appear abnormal. To un-
derstand the prevalence of such faults, we first explore and characterize four qualitatively different classes
of fault detection methods. Rule-based methods leverage domain knowledge to develop heuristic rules
for detecting and identifying faults. Estimation methods predict normal sensor behavior by leveraging
sensor correlations, flagging anomalous sensor readings as faults. Time series analysis based methods start
with an a priori model for sensor readings. A sensor measurement is compared against its predicted value
computed using time series forecasting to determine if it is faulty. Learning-based methods infer a model
for the normal sensor readings using training data, and then statistically detect and identify classes of
faults.
We find that these four classes of methods sit at different points on the accuracy/robustness spectrum.
Rule-based methods can be highly accurate, but their accuracy depends critically on the choice of pa-
rameters. Learning methods can be cumbersome to train, but can accurately detect and classify faults.
Estimation methods are accurate, but cannot classify faults. Time series analysis based methods are more
effective for detecting short duration faults than long duration ones, and incur more false positives than
the other methods. We apply these techniques to four real-world sensor data sets and find that the preva-
lence of faults as well as their type varies with data sets. All four methods are qualitatively consistent in
identifying sensor faults, lending credence to our observations. Our work is a first step towards automated on-line fault detection and classification.
6.1 Introduction
With the maturation of sensor network software, we are increasingly seeing longer-term deployments of
wireless sensor networks in real mode settings. As a result, research attention is now turning towards
drawing meaningful scientific inferences from the collected data [93]. Before sensor networks can become effective replacements for existing scientific instruments, it is important to ensure the quality of the
collected data. Already, several deployments have observed faulty sensor readings caused by incorrect
hardware design or improper calibration, or by low battery levels [73, 93, 99].
Given these observations, and the realization that it will be impossible to always deploy a perfectly
calibrated network of sensors, an important research direction for the future will be automated detection,
classification, and root-cause analysis of sensor faults, as well as techniques that can automatically scrub collected sensor data to ensure high quality. A first step in this direction is an understanding of the preva-
lence of faulty sensor readings in existing real-world deployments. In this paper, we take such a step.
Sensor Data Faults. We start by focusing on a small set of sensor faults that have been observed in real
deployments: single-sample spikes in sensor readings (we call these SHORT faults, following [73]), longer
duration noisy readings (NOISE faults), and anomalous constant offset readings (CONSTANT faults).
The three fault types (SHORT faults, NOISE faults, and CONSTANT faults), that we focus on in this
paper, cause the faulty sensor readings to deviate from the normal pattern exhibited by true (or non-faulty)
sensor readings, and are derived from a data centric view of sensor faults [65]. These three fault types
have been observed in several real-world deployments [73, 93], and hence, it is important to understand
their prevalence and develop automated techniques for detecting them. Given these three fault models, our
paper makes the three contributions described below.
Before describing our contributions, we note that not all sensor data faults fall within the three fault
categories considered in this paper. Ni et al. [65] provide examples of sensor data faults due to calibration
errors that can persist during the entire deployment. For example, an offset fault due to calibration errors
causes all the sensor readings to differ from the true value by a constant amount, but the sensor readings
still exhibit normal patterns (e.g., a diurnal variation in case of ambient temperature).
Detection Methods. We first explore four qualitatively different techniques for detecting such faults from a trace of sensor readings. Our decision to consider four qualitatively different fault detection techniques is motivated by the following two factors. Firstly, our goal is to explore the space of fault detection techniques that are suitable for detecting the class of data faults (SHORT, NOISE, and CONSTANT) examined in
this paper. Secondly, as one might expect, and as we shall see later in the paper, no single method is
perfect for detecting the different types of faults we consider in this paper. Intuitively, then, it makes sense
to explore the space of detection techniques to understand the trade-offs in detection accuracy versus the
robustness to parameter choices and other design considerations. This is what we have attempted to do
with respect to the methods that we have chosen, from among the existing general types of fault detection
methods, and our choice of qualitatively different approaches exposes differences in the trade-offs.
All four methods follow a common framework for fault detection: they characterize the normal
behavior of sensor readings, and identify significant deviations from this normal as faults. However, in
order to facilitate a systematic exploration of the space of detection techniques, we choose/design these
methods based on four different types/sources of information relevant for detecting the SHORT, NOISE
and CONSTANT data faults. The four different classes of methods discussed in this paper are as follows.
Rule-based methods leverage domain knowledge about sensor readings to develop heuristic rules/
constraints that the sensor readings must satisfy.
Estimation methods define normal sensor behavior by leveraging spatial correlation in measurements at different sensors.
Time series analysis based methods leverage temporal correlations in measurements collected by
the same sensor to estimate the parameters of an (a priori selected) model for these measurements. A
sensor measurement is compared against its predicted value, computed using time series forecasting,
to determine if it is faulty.
Learning-based methods infer a model for the normal and faulty sensor readings using training
data, and then statistically detect and identify classes of faults.
While all four methods are geared towards automated fault detection, our design is not fully automated.
In particular, parameter selection (e.g., selecting the fault detection thresholds for Rule-based methods), which uses a combination of domain knowledge and heuristics (as summarized in Table 6.1), requires manual intervention.
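As an illustration of such a rule, and of why its accuracy hinges on the chosen threshold, the sketch below flags a SHORT fault whenever the jump between successive samples exceeds a threshold delta; the threshold value and names are ours, chosen only for this example, not the settings used in our evaluation.

def short_rule(samples, delta):
    # Flag indices i where |x[i] - x[i-1]| > delta (a simple SHORT-fault rule).
    # Detection and false-positive rates depend entirely on delta, which is why
    # rule-based methods need careful, domain-informed parameter selection.
    return [i for i in range(1, len(samples))
            if abs(samples[i] - samples[i - 1]) > delta]

if __name__ == "__main__":
    trace = [20.1, 20.3, 20.2, 61.0, 20.4, 20.5]   # one spike at index 3
    print(short_rule(trace, delta=10.0))           # -> [3, 4]: the jump up and back down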
While our choice and our design of the four fault detection methods is targeted at SHORT, NOISE,
and CONSTANT faults, we discuss extensions to the Estimation method, the Time series analysis based
methods, and the learning-based methods that incorporate information from multiple sensors co-located
on the same node (inter-sensor relationships) and/or information from sensors attached to different nodes
(inter-node relationships) in Section 6.6. Leveraging these inter-node and inter-sensor relationships can
be useful for detecting data faults not considered in this paper, for example, those caused by calibration
errors [65].
Evaluation using injected faults. By artificially injecting faults of varying intensity into sensor data sets, we are able to study the detection performance of these methods. We find that these methods sit at
different points on the accuracy/robustness spectrum. While rule-based methods can detect and classify
faults, they can be sensitive to the choice of parameters. By contrast, the estimation method we study can
tolerate errors in parameter choices (for example, errors in estimating the correlation between readings
from different sensor nodes) but relies on spatial correlations and cannot classify faults. Our seasonal
time series model based method exploits temporal correlations in sensor measurements to detect faults.
It is more accurate and robust at detecting SHORT faults than longer duration NOISE faults, and incurs
more false positives than the other methods. However, it can detect the onset of long duration NOISE
and/or CONSTANT faults accurately; even in situations where all the sensor nodes suffer from faults
simultaneously. Finally, our learning method (based on Hidden Markov Models) is cumbersome, partly
because it requires training, but it can fairly accurately detect and classify faults. Furthermore, at low fault
intensities, these techniques perform qualitatively differently: the learning method is able to detect more
NOISE faults but with higher false positives, while the rule-based method detects more SHORT faults,
with the estimation method's performance being intermediate. The time series forecasting based method
is able to detect low intensity SHORT faults and short duration NOISE faults but incurs a high false positive
rate. It has a low detection rate for long duration NOISE faults. Motivated by the different performance of
these methods, we also propose and evaluate hybrid detection techniques, which combine these methods
in ways that can be used to reduce false positives or false negatives, whichever is more important for the
application.
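A minimal sketch of the idea behind such hybrid schemes (the combination operators shown are simple set intersection and union over the samples each method flags; the actual hybrid designs evaluated later may differ):

def hybrid_intersection(flagged_a, flagged_b):
    # Report only samples flagged by both methods (fewer false positives).
    return set(flagged_a) & set(flagged_b)

def hybrid_union(flagged_a, flagged_b):
    # Report samples flagged by either method (fewer false negatives).
    return set(flagged_a) | set(flagged_b)

if __name__ == "__main__":
    rule_hits = {10, 42, 97}         # samples flagged by, e.g., the rule-based method
    estimation_hits = {42, 97, 130}  # samples flagged by, e.g., the estimation method
    print(hybrid_intersection(rule_hits, estimation_hits))  # {42, 97}
    print(hybrid_union(rule_hits, estimation_hits))         # {10, 42, 97, 130}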
Evaluation on Real-World Datasets. Armed with this evaluation, we apply our detection methods (or, in
some cases, a subset thereof) to four real-world data sets. The longest of our data sets spans six months,
and the shortest spans one day. We examine the frequency of occurrence of faults in these real data
sets, using a very simple metric: the fraction of faulty samples in a sensor trace. We find that faults are relatively infrequent: often, SHORT faults occur once in about two days in one of the data sets that we study, and NOISE faults are even less frequent. However, if a fault is not detected promptly, it can corrupt a significant fraction of samples; for one dataset, 15-35% of samples are affected by a combination of NOISE and CONSTANT faults across different nodes. We find no spatial or temporal correlation among faults. The different data sets exhibit different levels of faults: for example, in a six months-long dataset, less than 0.01% of the samples are affected by faults, while in another 3-month long dataset, close to 20% of the samples are affected. Finally, we find that our detection methods incur false positives and false
negatives on these data sets, and hybrid methods are needed to eliminate one or the other.
Our study informs the research on ensuring data quality. Even though we find that faults are relatively
rare, they are not negligibly so, and careful attention needs to be paid to engineering the deployment
and to analyzing the data. Furthermore, our detection methods could be used as part of an online fault
diagnosis system, i.e., where corrective steps could be taken during the data collection process based on
the diagnostic system's results.
6.2 Sensor Faults
In this section, we first visually depict some faults in sensor readings observed in real datasets. These
examples are drawn from the same real-world datasets that we use to evaluate prevalence of sensor faults;
we describe details about these datasets later in the paper. These examples give the reader visual intuition
for the kinds of faults that occur in practice, and motivate the fault models we use in this paper.
Figure 6.1: Errors in sensor readings. (a) CONSTANT; (b) SHORT.
Figure 6.1(a) shows readings from a sensor reporting chlorophyll concentration measurements from
a sensor network deployment on lake water. Due to faults in the analog-to-digital converter board, the sensor starts reporting values 4-5 times greater than the actual chlorophyll concentration. Similarly, in
Figure 6.1(b), one of the samples reported by a humidity sensor is roughly 3 times the value of the rest of
the samples, resulting in a noticeable spike in the plot. Finally, Figure 6.2 shows that the variance of the
readings from an accelerometer attached to a MicaZ mote measuring ambient vibration increases when the voltage supplied to the accelerometer becomes low.
Figure 6.2: NOISE fault: Increase in variance. (a) Sufficient voltage (6 volts); (b) Low voltage (< 5 volts).
In the absence of ground truth values (as is the case
with the data shown in Figures 6.1 and 6.2), strictly speaking, the term fault refers to a deviation from the
expected value. Hence, these data faults can also be thought of as anomalies.
The faults in sensor readings shown in these figures characterize the kind of faults we observed in the
four datasets from wireless sensor network deployments that we analyze in this paper. We know of two
other sensor network deployments [93, 73] that have observed similar faults.
In this paper, we explore the following three fault models motivated by these examples:
1. CONSTANT: The sensor reports a constant value for a large number of successive samples. The
reported constant value is either very high or very low compared to the normal sensor readings
(Figure 6.1(a)) and uncorrelated to the underlying physical phenomena.
2. SHORT: A sharp change in the measured value between two successive data points (Figure 6.1(b)).
3. NOISE: The variance of the sensor readings increases. Unlike SHORT faults that affect a single
sample at a time, NOISE faults affect a number of successive samples (see Figure 6.2).
SHORT and NOISE faults were first identified and characterized in [73], but only for a single dataset. Ni et al. [65] categorize the three fault types defined above as representing the data-centric view of classifying faults, i.e., these fault types are defined in terms of the characteristics of the faulty data.
6.2.1 What causes sensor faults?
While it is not always possible to ascertain the root cause for sensor faults, several system (hardware and
software) faults have been known to result in sensor faults.
The typical hardware faults that have been observed to cause sensor faults include damaged sensors,
short-circuited connections, low battery, and calibration errors. For the sensor faults shown in
Figure 6.1(a), we were able to establish that they were caused by a fault in the analog-to-digital converter.
Ramanathan et al. [73] and Szewczyk et al. [92] identified short circuit connections as the reason behind
abnormally large or small sensor readings resembling SHORT or NOISE faults. Low battery voltage
resulted in a combination of NOISE and CONSTANT faults at temperature sensors (see Figure 6.15) during
the INTEL Lab, Berkeley deployment [35].
A well-known root cause for sensor data faults is calibration problems [74, 65]. Calibration errors can
corrupt the sensor measurements in different ways: (i) the measured value can differ from its true value
by a constant amount (Offset fault), (ii) the rate of the measured data can differ from the true/expected
rate (Gain fault), and (iii) the parameters associated with a sensor's original calibration formulas may
change during a deployment (Drift fault). Calibration errors can affect all the samples collected during
a deployment, and the faulty data may still exhibit normal patterns. For example, ambient temperature
measurements affected by an Offset fault will still exhibit a diurnal pattern. Without the availability of
ground truth values or a model for expected sensor behavior, detecting data faults due to calibration errors
remains an open problem. In Section 6.6, we discuss extensions to our fault detection methods that can
be used to automatically generate a model for expected sensor behavior by leveraging spatial correlation
across sensor nodes. Bychkovskiy et al. [12], and Balzano and Nowak [4] exploit spatial correlation across
sensor nodes to develop methods for online sensor calibration that can be used to recover from calibration
errors during a deployment, once such an error is detected.
An example of a software fault is given in [65], where Ni et al. identify instances of SHORT faults due
to software errors during communication and data logging.
In this paper, our focus is on the prevalence of the SHORT, NOISE, and CONSTANT data faults and
methods for detecting these faults. We do not attempt to precisely establish the root cause for these faults.
The reader is referred to [65] for a detailed summary of known (system-level) root causes for sensor faults.
6.3 Detection Methods
In this paper, we explore and characterize four qualitatively different methods for detecting SHORT,
NOISE and CONSTANT faults. As discussed in Section 6.1, these methods leverage different
types/sources of information for fault detection. Rule-based methods (the SHORT and NOISE rules)
leverage domain knowledge about sensor readings to develop heuristic rules/constraints that the sensor
readings must satisfy. The Linear Least-Squares Estimation (LLSE) based method defines normal sensor
behavior by leveraging spatial correlation in measurements at different sensors. The autoregressive
integrated moving average (ARIMA) model based time series analysis methods leverage temporal
correlations in measurements collected by the same sensor to estimate the parameters of an (a priori
selected) model for these measurements. A sensor measurement is compared against its predicted value,
computed using time series forecasting, to determine if it is faulty. Finally, the learning-based hidden
Markov model (HMM) method infers a model for the normal and faulty sensor readings using training
data, and then statistically detects and identifies classes of faults. These methods are described in detail in
the rest of this section.
Table 6.1 provides a summary of these methods (along with their variations and parameters).
6.3.1 Rule-based (Heuristic) Methods
Our first class of detection methods uses two intuitive heuristics for detecting and identifying the fault
types described in Section 6.2.
NOISE Rule: Compute the standard deviation of sample readings within a window N. If it is above a
certain threshold, the samples are corrupted by the NOISE fault.
To detect CONSTANT faults, we use a slightly modified NOISE rule where we classify the samples as
corrupted by CONSTANT faults if the standard deviation is zero. The window size N can be in terms of
time or number of samples. Clearly, the performance of this rule depends on the window size N and the
threshold. Determining the best value for the window size N requires domain knowledge, in particular, a
good understanding of the normal sensor readings. We discuss a heuristic for selecting the threshold value
later in this section.
SHORT Rule: Compute the rate of change of the physical phenomenon being sensed (temperature, hu-
midity etc.) between two successive samples. If the rate of change is above a threshold, it is an instance of
a SHORT fault.
For well-understood physical phenomena like temperature, humidity etc., the thresholds for the NOISE
and SHORT rules can be set based on domain knowledge. For example, [73] uses feedback from domain
scientists to set a threshold on the rate of change of chemical concentration in soil.
For automated threshold selection, [73] proposes the following technique:
Histogram based method: Divide the time series of sensor readings into groups of N samples.
Plot the histogram of the standard deviations or the rate of change observed for these groups of N
samples. Select one of the modes of the histogram as the threshold.
Clearly, if the histogram does not have a (distinct) mode, then the histogram based method will fail to
select a good threshold. For the NOISE rule, the Histogram method for automated threshold selection will
be most effective when, in the absence of faults, the histogram of standard deviations is uni-modal, and
sensor faults affect the measured values in such a way that the histogram becomes bi-modal. However,
this approach is sensitive to the choice of N: the number of modes in the histogram of standard deviations
depends on N. Figure 6.3 shows the effect of N on the number of modes in the histogram computed
for sensor measurements taken from a real-world deployment [60]. The measurements do not contain a
sensor fault, but choosing a large value for N (500 or 1000) can result in the Histogram method selecting
an incorrect threshold. For example, choosing N = 1000 gives a multi-modal histogram (Figure 6.3.c);
this would result in false positives if we select one of the two modes greater than 20 as the fault detection
threshold. As stated earlier, selecting the correct value for the parameter N requires a good understanding
of the normal sensor readings. In particular, a domain expert would have to suggest that N = 1000 in our
previous example was an unrealistic choice of parameter. In practice, one should also try a range of values
for N to ensure that the samples flagged as faulty are not just an artefact of the value selected for N.
Figure 6.3: Histogram Shape (standard deviation vs. frequency). (a) N=100 (b) N=500 (c) N=1000
6.3.2 An Estimation-Based Method
Is there a method that perhaps requires less domain knowledge in setting parameters? For physical phe-
nomena like ambient temperature, light etc. that exhibit a diurnal pattern, statistical correlations between
sensor measurements can be exploited to generate estimates for the sensed phenomenon based on the
measurements of the same phenomenon at other sensors. Regardless of the cause of the statistical corre-
lation, we can exploit the observed correlation in a reasonably dense sensor network deployment to detect
anomalous sensor readings.
More concretely, suppose the temperature values reported by sensors $s_1$ and $s_2$ are correlated. Let
$\hat{t}_1(t_2)$ be the estimate of temperature at $s_1$ based on the temperature $t_2$ reported by $s_2$. Let $t_1$ be the actual
temperature value reported by $s_1$. If $|t_1 - \hat{t}_1| > \delta$, for some threshold $\delta$, we classify the reported reading
$t_1$ as erroneous. If the estimation technique is robust, then in the absence of faults the estimation error $|t_1 - \hat{t}_1|$
would be small, whereas a fault of the type SHORT or CONSTANT would cause the reported value to
differ significantly from the estimate.
In this paper we consider the Linear Least-Squares Estimation (LLSE) method [40] as the estimation
technique of choice. In scalar form, the LLSE equation is

\hat{t}_1(t_2) = m_{t_1} + \frac{\sigma_{t_1 t_2}}{\sigma^2_{t_2}} \, (t_2 - m_{t_2})   (6.1)

where $m_{t_1}$ and $m_{t_2}$ are the average temperatures at $s_1$ and $s_2$, respectively, $\sigma_{t_1 t_2}$ is the covariance between
the measurements reported by $s_1$ and $s_2$, and $\sigma^2_{t_2}$ is the variance of the measurements reported by $s_2$.
In the real world, the value $t_2$ might itself be faulty. In such situations, we can estimate $\hat{t}_1$ based on
measurements at more than one sensor using the LLSE equations for the vector case (a straightforward
and well-known generalization of the scalar form equation).
In general, the information needed for applying the LLSE method may not be available a priori. In ap-
plying the LLSE method to a real-world dataset, we divide the dataset into a training set and a test set. We
compute the mean and variance of sensor measurements, and the covariance between sensor measurements
based on the training dataset and use them to detect faulty samples in the test dataset. This involves an
assumption that, in the absence of faults or external perturbations, the physical phenomenon being sensed
does not change dramatically between the time when the training and test samples were collected. We
found this assumption to hold for many of the datasets we analyzed.
Threshold for fault detection: We set the threshold $\delta$ used for detecting faulty samples based on the
LLSE estimation error for the training dataset. We use the following two heuristics for determining $\delta$:
Maximum Error: If the training data has no faulty samples, we can set $\delta$ to be the maximum
estimation error for the training dataset, i.e., $\delta = \max\{|t_1 - \hat{t}_1| : t_1 \in TS\}$, where $TS$ is the set of all
samples in the training dataset.
Confidence Limit: In practice, the training dataset will have faults. If we can reasonably estimate,
e.g., from historical information, the fraction of faulty samples in the training dataset, (say) $p\%$, we
can set $\delta$ to be the upper confidence limit of the $(1-p)\%$ confidence interval for the LLSE estimation
errors on the training dataset.
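As a concrete illustration of the scalar LLSE estimator in Equation (6.1) together with the Maximum Error heuristic, the sketch below assumes fault-free training windows from two correlated sensors; the names (train_a, test_a, and so on) are placeholders rather than part of the datasets analyzed here.

```python
import numpy as np

def llse_fit(train_a, train_b):
    """Estimate the statistics needed by the scalar LLSE estimator (Eq. 6.1)
    from co-located training measurements of sensors A and B."""
    m_a, m_b = np.mean(train_a), np.mean(train_b)
    cov_ab = np.cov(train_a, train_b, bias=True)[0, 1]  # covariance of A and B
    var_b = np.var(train_b)                              # variance of B
    return m_a, m_b, cov_ab, var_b

def llse_estimate(b_readings, params):
    """Estimate sensor A's readings from sensor B's readings (Eq. 6.1)."""
    m_a, m_b, cov_ab, var_b = params
    return m_a + (cov_ab / var_b) * (np.asarray(b_readings) - m_b)

def llse_detect(test_a, test_b, params, delta):
    """Flag samples of A whose deviation from the LLSE estimate exceeds delta."""
    est = llse_estimate(test_b, params)
    return np.abs(np.asarray(test_a) - est) > delta

# Maximum Error heuristic: delta is the largest estimation error on the
# (assumed fault-free) training data, e.g.
#   params = llse_fit(train_a, train_b)
#   delta = np.max(np.abs(train_a - llse_estimate(train_b, params)))
```

The Confidence Limit heuristic would instead set delta to a high percentile of the training-set estimation errors, chosen from an estimate of the fraction of faulty training samples.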
Figure 6.4: LLSE on NAMOS dataset. (a) LLSE estimate vs. actual sensor readings (b) Estimation error
Figure 6.4 is a visual demonstration of the feasibility of LLSE, derived from one of our datasets. It
compares the LLSE estimate of sensor readings at a single node A, based on the measurements reported
by a neighboring node B, with the actual readings at A. The horizontal line in Figure 6.4(b) represents
the threshold using the Maximum Error criterion. The actual sensor data had no SHORT faults, and the
LLSE method classified only one out of 11,678 samples as faulty. We return to a more detailed evaluation
of LLSE-based fault detection in later sections.
Finally, although we have described an estimation-based method that leverages spatial correlations, this
method can equally well be applied by only leveraging temporal correlations at a single node. By extracting
correlations induced by diurnal variations at a node, it might be possible to estimate readings, and thereby
detect faults, at that same node. The method described next presents one approach for exploiting these
temporal correlations for fault detection.
6.3.3 A Time Series analysis based Method
Physical phenomena such as temperature, ambient light, etc. exhibit a diurnal pattern. If these phenomena
are measured periodically for a long time interval (as is the case with several sensor network deployments),
the resulting time series of measurements by a sensor captures the diurnal pattern as well as other (shorter)
time scale temporal correlations. These temporal correlations can be exploited to construct a model for the
sensor measurements using time series analysis. Time series analysis is a popular technique for analyzing
periodically collected data. For example, it is used by businesses to model and forecast demand for elec-
tricity, sale of airline tickets, etc. that exhibit temporal correlations like a diurnal and/or a seasonal pattern
[14].
In this paper, we use a multiplicative $(0,1,1)\times(0,1,1)_s$ seasonal ARIMA time series model for fault
detection, where the parameter s captures the periodic behavior in the sensor measurement time series;
for example, temperature measurements exhibit similarities with period s = 24 hours. The multiplicative
seasonal model is widely used for modeling and forecasting of time series with periodicity [9]. It can be
written explicitly as

z_t = z_{t-1} + z_{t-s} - z_{t-s-1} + a_t - \theta a_{t-1} - \Theta a_{t-s} + \theta\Theta a_{t-s-1}   (6.2)

where $z_t$ is the sensor reading and $a_t$ is a sample drawn from a white noise process at time t. Equation
(6.2) shows how the model accounts for periodicity: $z_t$ depends not only on the measurement at time t-1
but also on measurements made s time samples in the past, namely at times t-s and t-s-1. For more
details on seasonal models, we refer the interested reader to [9].
Fault detection using forecasting: We used the implementation of the maximum likelihood (ML) computational
method (Chapter 7, [9]) in SAS [78], a commonly used software package for time series analysis,
to estimate the two parameters, $\theta$ and $\Theta$, of the model using training data. To detect faults in a sensor
measurement time series, we first forecast the sensor measurement at time t based on our model (using
standard time series forecasting tools available in SAS). We then compute the difference between the actual
sensor measurement at time t and its predicted value, and flag the measurement as faulty if this difference
is above a threshold $\delta$.
We used two different durations of forecasting for fault detection.
One-step ahead: We forecast the sensor measurement for time t+1, $\hat{z}_{t+1}$, based on the measurements
up to time t. We then compare the measurement at time t+1, $z_{t+1}$, against its predicted value
$\hat{z}_{t+1}$ to determine if it is faulty.
L-step ahead: Using measurements up to time t, we forecast the values for time t+i, $1 \le i \le L$,
with L > 1. We then compare the actual measurements for time t+i, $1 \le i \le L$, against their
forecast values. If the difference between the measured value and its forecast for any sample is
greater than $\delta$, we flag that sample as faulty.
One-step ahead forecasting is more suited for detecting SHORT faults. The idea behind L-step ahead
forecasting is to detect faults that last for long durations (for example, the fault shown in Figure 6.15).
However, the potential error in our forecast grows with L (Chapter 5, [9]). In order to control false
positives due to erroneous forecast values, we restrict $L \le s$.
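The forecasting step can be illustrated directly from Equation (6.2). The sketch below assumes that the parameters $\theta$ and $\Theta$ have already been estimated (we used SAS for that step; this sketch is not the SAS implementation) and shows one-step ahead forecasting with a fixed threshold. All names are illustrative.

```python
import numpy as np

def arima_one_step_detect(z, s, theta, Theta, delta):
    """One-step ahead forecasting with the (0,1,1)x(0,1,1)_s model of Eq. (6.2).

    z      -- array of sensor measurements
    s      -- seasonal period (samples per day)
    theta  -- non-seasonal MA parameter
    Theta  -- seasonal MA parameter
    delta  -- fault-detection threshold on the forecast error
    """
    n = len(z)
    a = np.zeros(n)                    # one-step forecast errors (innovations)
    faulty = np.zeros(n, dtype=bool)
    for t in range(s + 1, n):
        # Forecast from Eq. (6.2) with the unknown shock a_t set to zero.
        z_hat = (z[t - 1] + z[t - s] - z[t - s - 1]
                 - theta * a[t - 1] - Theta * a[t - s]
                 + theta * Theta * a[t - s - 1])
        a[t] = z[t] - z_hat
        faulty[t] = abs(a[t]) > delta
    return faulty
```

An L-step ahead variant would iterate the same recursion L times from a fixed origin, with the unknown future shocks set to zero, and compare each of the L actual measurements against its forecast.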
Threshold for fault detection: We use two heuristics to determine the threshold for fault detection.
Forecast Confidence Interval: For each sensor measurement in our test dataset, we compute both
the forecast value and the 95% confidence interval for the forecast value. If a sensor measurement
lies outside the 95% confidence interval for its forecast value, we flag that measurement as faulty.
Instead of using a fixed threshold for all the measurements, the confidence interval based heuristic
adapts the threshold value dynamically for each measurement. If the confidence interval for the
forecast value of a measurement is small (indicating that we have high confidence that the forecast
value represents the true value), then the threshold for that measurement is small. Similarly, if the
confidence interval for the forecast value of a measurement is large, then the threshold for that
measurement is large.
Forecast Error: If the sensor measurement time series is monitored continuously for a long duration
for the presence of faults, we can use the differences between forecast values and actual measurements
observed in the past to set the threshold using the Confidence Limit heuristic described for the
LLSE based method in Section 6.3.2.
Alternatives to the ARIMA model: The choice of a time series model for sensor measurements is de-
termined by the nature of the phenomenon being measured. For example, if the sensor measurements do
not exhibit periodicity, an autoregressive (AR) or a moving average (MA) (or a combination of the AR
and MA models called the autoregressive moving average (ARMA) model) would be more appropriate for
time series analysis. The model that we use in this work is one of the simplest seasonal models available.
It is possible that a more complex season model can be a better t for the sensor measurement time series
that we analyze in this paper. However, using a more complex model requires estimating more parameters,
and generally, implies a more computationally intensive training phase requiring a larger training dataset.
Our results with real-world datasets (Section 6.5) show that the model that we use in this paper is effective
at detecting faults in a time series of temperature measurements. The issue of determining the best-t time
series model for modeling phenomena such as outdoor temperature and humidity is not the focus of our
work.
6.3.4 A Learning-based Method
For phenomena that may not be spatio-temporally correlated, a learning-based method might be more
appropriate. For example, if the pattern of normal sensor readings and the effect of sensor faults on the
reported readings for a sensor are well understood, then we can use learning-based methods, for example
Hidden Markov Models (HMMs) and neural networks, to construct a model for the measurements reported
by that sensor. In this paper we chose HMMs because they are a reasonable representative of learning based
methods that can simultaneously detect and classify sensor faults. Determining the most effective learning
based method is outside the scope of this paper.
A Hidden Markov Model (HMM) is characterized by the following:
The number of states in the model, S.
The set of possible measurements, O.
For each state s ∈ S, the conditional probability of observing a measurement o ∈ O, P{o | s}.
The state transition probabilities A = {a_ij}, where a_ij represents the probability of a transition to
state j from state i.
The initial state distribution π = {π_i}, where π_i is the probability that the HMM starts in state i.
Although the states of an HMM are hidden, we can attach some physical significance to them. For example,
based on our characterization of faults in Section 6.2, for a sensor measuring ambient temperature,
we can use a 5-state HMM with the states corresponding to day, night, SHORT faults, NOISE faults, and
CONSTANT faults. Such an HMM can capture not only the diurnal pattern of temperature but also the
distinct patterns in the reported values in the presence of faults.
For the basic HMM, the set of possible measurements O is discrete. In this paper, we use the basic
HMM, and when modeling temperature, humidity, etc., we make the observation space discrete by clustering
the sensor readings into bins. For example, we bin temperature measurements into bins of width 0.1.
If the observation space is continuous (and cannot be made discrete), one can use a continuous density
HMM (CDHMM), defined for a continuous observation space. However, a CDHMM is computationally
more complex than a basic HMM. We chose a basic HMM over a CDHMM because (a) we could make the
observation space discrete without introducing significant rounding-off errors, and (b) we wanted to avoid
the additional computational complexity involved in using a CDHMM (provided the basic HMM based
method proved effective at detecting sensor faults). Our results with injected faults (Section 6.4) and real-world
datasets (Section 6.5) demonstrate that our basic HMM based method is effective at detecting the
types of sensor faults we consider in this paper.
Given values for S, O, P{o | s}, A, and π, and a sequence of sensor measurements {O_t}, the HMM
can generate the most likely state S_t that resulted in observation O_t for each observation. If the state S_t
associated with an observation O_t is a fault state (SHORT, NOISE, or CONSTANT), then we classify
observation O_t as faulty. Thus, our HMM based method can detect as well as classify faults.
In order to estimate the parameters S, O, P{o | s}, A, and π of the HMM used for fault detection,
we used a supervised learning technique. We injected faults into a (fault-free) training dataset (using the
techniques described in Section 6.4), labeled each sample as fault-free or faulty with a particular fault
type, and used this labeled data for estimating the parameters of the HMM. For details on the techniques
used for estimating the parameters of an HMM, and for generating the most likely sequence of states for a
given sequence of observations, please refer to the tutorial by Rabiner [72]. We used the implementation
of HMMs provided in MATLAB [51].
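To illustrate the detection step, the sketch below decodes the most likely state sequence with the Viterbi algorithm, given already-estimated HMM parameters, and flags observations whose decoded state is a fault state. This is a generic sketch rather than the MATLAB implementation used in our experiments, and the state layout in the usage comment is only an example.

```python
import numpy as np

def viterbi(obs, pi, A, B):
    """Most likely state sequence for a discrete-observation HMM.

    obs -- sequence of observation indices (binned sensor readings)
    pi  -- initial state distribution, shape (S,)
    A   -- transition matrix, A[i, j] = P(state j | state i), shape (S, S)
    B   -- emission matrix, B[i, o] = P(observation o | state i), shape (S, |O|)
    """
    S, T = len(pi), len(obs)
    log_delta = np.zeros((T, S))
    back = np.zeros((T, S), dtype=int)
    with np.errstate(divide='ignore'):
        log_pi, log_A, log_B = np.log(pi), np.log(A), np.log(B)
    log_delta[0] = log_pi + log_B[:, obs[0]]
    for t in range(1, T):
        scores = log_delta[t - 1][:, None] + log_A        # (predecessor, state)
        back[t] = np.argmax(scores, axis=0)
        log_delta[t] = scores[back[t], np.arange(S)] + log_B[:, obs[t]]
    states = np.zeros(T, dtype=int)
    states[-1] = np.argmax(log_delta[-1])
    for t in range(T - 2, -1, -1):
        states[t] = back[t + 1][states[t + 1]]
    return states

# Example layout: states 0 and 1 model normal day/night readings, states 2-4
# model SHORT, NOISE, and CONSTANT faults. A sample is flagged as faulty when
# its decoded state is a fault state:
#   states = viterbi(binned_readings, pi, A, B)
#   faulty = states >= 2
```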
6.3.5 Hybrid Methods
Finally, observe that we can use combinations of the Rule-based, LLSE, ARIMA and HMM methods to
eliminate/reduce the false positives and negatives. In this paper, we study two such schemes:
Hybrid(U): Over two (or more) methods, this method identifies a sample as faulty if at least one
of the methods identifies the sample as faulty. Thus, Hybrid(U) is intended for reducing false negatives
(it may not eliminate them entirely, since all methods might fail to identify a faulty sample).
However, it can suffer from false positives.
Hybrid(I): Over two (or more) methods, this method identifies a sample as faulty only if both (all)
the methods identify the sample as faulty. Essentially, we take an intersection over the set of samples
identified as faulty by different methods. Hybrid(I) is intended for reducing false positives (again, it
may not eliminate them entirely) but suffers from false negatives.
Several other hybrid methods are possible. For example, Hybrid(U) can be easily modified so that
results from different methods have different weights in determining if a measurement is faulty. This would
be advantageous in situations where a particular method or heuristic is known to be better at detecting faults
of a certain type.
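Viewed as operations on per-sample boolean detections, the two hybrids (and the weighted variant mentioned above) can be expressed in a few lines. This sketch assumes each base method returns a boolean array over the samples; the names are illustrative.

```python
import numpy as np

def hybrid_union(*detections):
    """Hybrid(U): faulty if at least one method flags the sample."""
    return np.logical_or.reduce(detections)

def hybrid_intersection(*detections):
    """Hybrid(I): faulty only if every method flags the sample."""
    return np.logical_and.reduce(detections)

def hybrid_weighted(detections, weights, vote_threshold):
    """Weighted variant: faulty if the weighted vote meets a threshold."""
    score = sum(w * d.astype(float) for w, d in zip(weights, detections))
    return score >= vote_threshold
```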
Table 6.1 summarizes the methods (and their specific variations) described in this section. It also
highlights the parameters associated with each method that must be determined using expert knowledge,
heuristics, or empirically (trial and error) in order to be able to use these methods for fault detection. For
each parameter, we also specify, within brackets, the approaches used in this paper for determining its
value.
Method                                  | Parameters
SHORT & NOISE rules                     | N: sample window size (domain knowledge); threshold for fault detection (Histogram method)
Linear Least-Squares Estimation (LLSE)  | threshold for fault detection (Maximum training error, Confidence Limit)
ARIMA model                             | s: seasonality (domain knowledge)
ARIMA model: One-step                   | threshold for fault detection (Forecast Confidence Interval, Forecast Error)
ARIMA model: L-step                     | L: look-ahead forecast parameter (domain knowledge and empirically); threshold for fault detection (Forecast Confidence Interval, Forecast Error)
HMM                                     | Number of states (domain knowledge)
Table 6.1: Fault Detection Methods
6.4 Evaluation: Injected Faults
Before we can evaluate the prevalence of faults in real-world datasets using the methods discussed in the
previous section, we need to characterize the accuracy and robustness of these methods. To do this, we
artificially injected faults of the types discussed in Section 6.2 into sensor measurements from a real-world
dataset containing measurements of chlorophyll concentration in lake water [60]. Ideally, before injecting
faults, we should ensure that the dataset does not have any faulty samples. However, since we did not have
ground truth information about faults for this dataset, we relied on a combination of visual inspection and
feedback from the members of the NAMOS project to ensure (to the extent possible) that the dataset did
not contain any faults.
This methodology has two advantages. Firstly, injecting faults into a dataset gives us an accurate
ground truth that helps us better understand the performance of a detection method. Secondly, we are
able to control the intensity of a fault and can thereby explore the limits of performance of each detection
method as well as comparatively assess different schemes at low fault intensities. Many of the faults we
have observed in existing real datasets are of relatively high intensity; even so, we believe it is important
to understand the behavior of fault detection methods across a range of fault intensities, since it is unclear
if faults in future datasets will continue to be as pronounced as those in today's datasets.
Sensor measurements for injecting faults. For evaluating the Rule-based methods, LLSE, HMM and the
two hybrid methods, we inject NOISE faults into measurements of chlorophyll concentration in lake water
collected by buoy number 106 during the NAMOS deployments in October, 2005 [60]. Buoy number 106
collected a measurement every 8 seconds, and collected 22,600 measurements in total. We use the samples
collected during the first 24 hours (the first 11,000 samples) as training data to train LLSE and the HMM.
We inject NOISE faults into the remaining samples, and we use these samples to test our methods. To train
the HMM, we inject faults in the training data as well. These faults were of the same duration and intensity
as the faults used for comparing different methods. We did not inject any faults in the training data for
LLSE.
The samples from the NAMOS deployment did not provide enough training data to estimate the parameters
of our ARIMA model, and hence we use samples from the SensorScope deployment to evaluate
the ARIMA model based method. We injected NOISE faults into temperature measurements collected
by weather station 3 during the SensorScope deployment [79]. This weather station took a temperature
measurement every 30 seconds. We estimate the parameters of the ARIMA model using samples collected
over 3 days (8640 total samples) by station 3. We inject NOISE faults into samples from another day (2880
samples collected over 24 hours), and use these samples to test our ARIMA model based methods.
Below, we discuss the detection performance of the various methods for each type of fault. We describe
how we generate faults in the corresponding subsections. We use three metrics to understand the performance
of the various methods: the number of faults detected, false negatives, and false positives. More
specifically, we use the fraction of samples with faults as our metric, to have a more uniform depiction of
results across the data sets. For the figures pertaining to this section and Section 6.5, the labels used for
different detection methods are: R: Rule-based, L: LLSE, H: HMM, OS: ARIMA model based One-step
ahead forecasting, LS: ARIMA model based L-step ahead forecasting, U: Hybrid(U), and I: Hybrid(I).
6.4.1 SHORT Faults
Figure 6.5: Injected SHORT faults (intensity = 1.5, 2, 5, 10). (a) SHORT rule, LLSE, HMM, and hybrid methods (b) ARIMA model
To inject SHORT faults, we picked a sample i and replaced the reported value $v_i$ with $\hat{v}_i = v_i + f \cdot v_i$.
The multiplicative factor f determines the intensity of the SHORT fault. We injected SHORT faults with
intensity f = {1.5, 2, 5, 10}. Injecting SHORT faults in this manner (instead of just adding a constant
value) does not require knowledge of the range of normal sensor readings.
Figures 6.5.a and 6.5.b depict the performance of our various methods for detecting SHORT faults of
different intensities. The horizontal line in both figures represents the actual fraction of samples with
injected faults. The four sets of bar plots correspond to increasing intensity of SHORT faults (left to right).
The ARIMA model based methods incur a significantly higher number of false positives compared to the
other methods. We plot their performance separately in Figure 6.5.b in order to be able to depict the
performance of all the other methods with better clarity in Figure 6.5.a. For Hybrid(U) and Hybrid(I),
we include all the methods except the ARIMA model based methods. Including the ARIMA model based
method in our hybrid methods would significantly increase the number of false positives for the Hybrid(U)
method.
The SHORT rule and LLSE do not have any false positives; hence, the Hybrid(I) method exhibits no
false positives (thus eliminating the false positives incurred by the HMM based method). However, for
faults with low intensity (f = 1.5, 2), the SHORT rule as well as LLSE have significant false negatives,
and hence the Hybrid(I) method also has a high number of false negatives for these intensities.
The HMM method has fewer false negatives compared to the SHORT rule and LLSE, but it has false
positives for the lowest intensity (f = 1.5) faults. While training the HMM for detecting SHORT faults,
we observed that if the training data had a sufficient number of SHORT faults (on the order of 15 faults in
11,000 samples), the intensity of the faults did not affect the performance of the HMM.
It is evident from Figure 6.5.a that Hybrid(U) performs like the method with more detections and
Hybrid(I) performs like the method with fewer detections (while eliminating the false positives). However,
in general this does not have to be the case; e.g., in the absence of false positives, Hybrid(U) could detect
more faults than the best of the methods, and Hybrid(I) could detect fewer faults than the worst of the
methods (as illustrated on the real data sets in Section 6.5).
Our ARIMA model based methods do not perform as well as the other methods. Even though the
One-step (OS) and the L-step (LS) ahead forecasting methods are able to detect most of the high intensity
(f = 5, 10) faults, overall they incur a significantly higher fraction of false positives than the other
methods. However, comparing the performance of the One-step ahead forecasting method against L-step
ahead forecasting with L = 120 samples shows that for all fault intensities, One-step ahead forecasting
performs better than L-step ahead forecasting, especially for faults with intensity f = 10. This is to
be expected because fault detection with L-step ahead forecasting is better suited for detecting faults that
affect more than one sample, for example NOISE and CONSTANT faults.
The choice of threshold used to detect a faulty sample governs the trade-off between false positives and
false negatives; reducing the threshold would reduce the number of false negatives but increase the number
of false positives. We select the threshold using the Histogram method (with window size N = 100) for the
Rule-based methods, the Maximum Error heuristic for LLSE, the Forecast Error heuristic for One-step
ahead forecasting, and the Forecast Confidence Interval heuristic for L-step ahead forecasting.
6.4.2 NOISE Faults
To inject NOISE faults, we pick a set of successive samples W and add a random value drawn from a
normal distribution, $N(0, \sigma^2)$, to each sample in W. We vary the intensity of NOISE faults by choosing
different values for $\sigma$. The Low, Medium and High intensities of NOISE faults correspond to a 0.5x, 1.5x,
and 3x increase in the standard deviation of the samples in W. Apart from varying the intensity of NOISE
faults, we also vary their duration by considering different numbers of samples in W.
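A corresponding sketch for NOISE fault injection, with the intensity expressed as a multiple of the window's own standard deviation to match the Low, Medium, and High settings above (all names are illustrative):

```python
import numpy as np

def inject_noise_fault(readings, start, duration, intensity, rng=None):
    """Add zero-mean Gaussian noise to a window W of successive samples.

    intensity -- multiple of the window's standard deviation used as the
                 noise standard deviation (e.g., 0.5, 1.5, or 3).
    """
    rng = rng or np.random.default_rng()
    corrupted = np.array(readings, dtype=float)
    window = slice(start, start + duration)
    sigma = intensity * np.std(corrupted[window])
    corrupted[window] += rng.normal(0.0, sigma, corrupted[window].size)
    return corrupted
```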
Duration of NOISE faults. We inject NOISE faults of duration (number of samples in W) 3000 samples,
2000 samples, 1000 samples, and 100 samples for evaluating the NOISE rule, LLSE, HMM, and the two
hybrid methods. Since these samples (from the NAMOS deployment) were collected at an interval of 8
seconds, in terms of time these fault durations range from longer than 6 hours (for 3000 samples) to less
than 15 minutes (for 100 samples).
For our ARIMA model based method, as in the case of injected SHORT faults, we use One-step ahead
forecasting and L-step ahead (L = 120) forecasting for detecting the (injected) NOISE faults. In order to
understand the impact of the parameter L on the performance of the L-step ahead forecasting based NOISE
fault detection, we vary the duration of NOISE faults relative to L = 120. Hence, we inject NOISE faults
of duration 720 samples, 120 samples, and 60 samples. Note that a fault affecting 720 samples lasts for 6
hours because the SensorScope deployment used a sampling interval of 30 seconds.
Figure 6.6: NOISE Fault: 3000 samples (intensity = Low, Medium, High)
Figure 6.7: NOISE Fault: 2000 samples (intensity = Low, Medium, High)
Figure 6.8: NOISE Fault: 1000 samples (intensity = Low, Medium, High)
Figure 6.9: NOISE Fault: 100 samples (intensity = Low, Medium, High)
Figures 6.6 (|W| = 3000), 6.7 (|W| = 2000), 6.8 (|W| = 1000) and 6.9 (|W| = 100) show the
performance of the NOISE rule, LLSE, HMM and the hybrid methods for NOISE faults with varying
intensity and duration. For the ARIMA model, Figures 6.10 (|W| = 60), 6.11 (|W| = 120), and 6.12
(|W| = 720) show the performance of One-step and L-step (L = 120) ahead forecasting based fault
detection. The horizontal line in each figure corresponds to the fraction of samples with faults.
6.4.2.1 Impact of Fault Duration
The impact of NOISE fault duration is most dramatic for the HMM method. For |W| = 100, regardless of
the fault intensity, the number of faulty samples was not enough to train the HMM model. Hence, Figure
6.9 does not show results for the HMM. For |W| = 1000 and low fault intensity, we again failed to train the
HMM model.
Figure 6.10: NOISE, ARIMA: 60 samples (intensity = Low, Medium, High)
Figure 6.11: NOISE, ARIMA: 120 samples (intensity = Low, Medium, High)
Figure 6.12: NOISE, ARIMA: 720 samples (intensity = Low, Medium, High)
This is not very surprising because for short duration (e.g., |W| = 100) or low intensity
faults, the data with injected faults is very similar to data without injected faults. For faults with medium
and high intensity, or faults with a sufficiently long duration, e.g., |W| ≥ 1000, the performance of the HMM
method is comparable to the NOISE rule and LLSE.
The NOISE rule and the LLSE method are more robust to fault duration than the HMM, in the sense that we
were able to derive model parameters for those cases. However, for |W| = 100 and low fault intensity,
both methods fail to detect any of the samples with faults. The LLSE method also has a significant number of
false positives for |W| = 100 and fault intensity 0.5x. These false positives were eliminated by the Hybrid(I)
method.
For the ARIMA model based method, the One-step ahead forecasting based method detects fewer faults
as the fault duration increases. For example, for |W| = 60 samples and high fault intensity, the One-step
forecasting based method detects 50% of the faults, but for |W| = 720 samples and high fault intensity, it
detects only 9% of the faults. The L-step ahead forecasting based method is more robust to an increase in fault
duration. It detects 41% and 33% of the high intensity faults when |W| = 60 samples and |W| = 720
samples, respectively, and hence the degradation in its performance is not as severe as in the case of the One-step
ahead forecasting based method. It is also worth noting that for NOISE faults affecting |W| = 60
samples and |W| = 120 samples, regardless of the fault intensity, One-step ahead forecasting detects more
faults than L-step ahead forecasting. However, for NOISE faults affecting |W| = 720 samples, L-step
ahead forecasting detects more faults than One-step ahead forecasting. Hence, overall, the One-step ahead
forecasting based method is more suited for detecting short-to-medium duration faults, whereas the L-step
ahead forecasting based method is needed when the faults last for a long duration.
6.4.2.2 Impact of Fault Intensity
For medium and high intensity faults, there are no false negatives for the three methods: the NOISE rule,
LLSE, and the HMM. For low intensity faults, these three methods have significant false negatives. For the fault
durations and intensities for which the HMM training algorithm converged, the HMM method gives fewer
false negatives compared to the NOISE rule and LLSE. However, most of the time the HMM method gave
more false positives. The hybrid methods are able to reduce the number of false positives and negatives, as
intended. Like the other methods, the ARIMA model based One-step ahead and L-step ahead forecasting
methods perform better (detect more faults and incur fewer false positives) as the fault intensity increases.
High false negatives for low fault intensity arise because the measurements with injected faults are very
similar to the measurements without faults.
Overall, however, the ARIMA model based method does not perform as well as the other three methods.
For example, for NOISE faults lasting 6 hours (720 samples in the case of the ARIMA model and 3000
samples for the other methods), even with high fault intensity, the ARIMA model based method detects
33% of the faults, whereas the other three methods can detect all the faults. The ARIMA model based
method also incurs a higher rate of false positives compared to the other methods.
6.4.2.3 ARIMA model: Impact of parameter L
The performance results for the ARIMA model indicate that, for detecting very long duration faults, L-step
ahead forecasting is better suited than One-step ahead forecasting. However, for high intensity NOISE
faults lasting for 6 hours (affecting 720 samples), by estimating the measurements 1 hour in advance
(using L-step ahead forecasting with L = 120), we could detect only 33% of the faults. Will increasing
the forecasting interval, L, help us detect more faults?
Figure 6.13: High intensity, Long duration NOISE faults (OS, LS 1 hr, LS 2 hrs, LS 6 hrs)
Figure 6.14: CONSTANT and NOISE faults (OS, LS 1 hr, LS 2 hrs, LS 6 hrs)
Figure 6.13 shows the performance of the L-step ahead forecasting technique for detecting high intensity
NOISE faults with duration |W| = 720 samples. Contrary to our intuition, setting L equal to the fault
duration (720 samples or 6 hours) detects fewer faults than L = 120 samples (1 hour). As we increase L,
the uncertainty in the forecast value grows. Due to this increased uncertainty, we are forced to set the
fault detection threshold to a large value (using the Forecast Confidence Interval heuristic defined in
Section 6.3.3) to prevent a large number of false positives. Even though the variance of the NOISE fault is
high, for most of the samples the change in the sensor measurement (due to the additive noise) is small
compared to the threshold. Hence, fewer faults are detected for a large value of L. For L = 120 samples,
the uncertainty in the forecast value is smaller, and hence we can set a lower threshold value. By setting a
lower threshold value, we are able to detect more faults because now the change in sensor measurements
due to the additive noise is larger than the threshold for a larger fraction of faulty samples. Thus, forecasting
too far ahead into the future (a large value of L) is not always beneficial. The benefits of using a large value
of L to detect long duration faults can be outweighed by the increased error in the forecast value for large L.
Does there exist a scenario for which increasing the value of L improves the performance of the
ARIMA model based L-step ahead forecasting? In the same temperature measurement time series as
the one used to obtain the results in Figure 6.13 (obtained from the SensorScope deployment), we inject a
combination of high intensity NOISE and CONSTANT faults into 720 samples. The CONSTANT faults
add a value of 20 to each sample. Figure 6.14 shows the impact of increasing L on the number of faults
detected. As we increase the value of L, we detect more faults, and for L equal to the duration of the faults,
we detect all the faults. In this scenario, the faults are of such a high intensity that the increase in threshold
(due to a larger value of L) is not large enough to cause false negatives. A similar combination of
CONSTANT and NOISE faults occurred in the INTEL lab dataset [35], and for this dataset, increasing the
value of L enabled us to detect more faults (refer to Figure 6.17).
The results shown in Figures 6.13 and 6.14 demonstrate that the best value of L for detecting long
duration faults depends not only on the duration of the faults but also on the difference in magnitude
between the faulty and normal sensor readings. For the real world datasets, contextual information about
the phenomenon being sensed and the nature of faults are useful in determining the best value for L.
For slowly varying measurements like ambient temperature, humidity, etc., if the long duration faults
increase only the variance of the sensor measurements, limiting the forecast interval to a short duration
(for example, an hour) works best. However, if the faults change both the mean and the variance of the
sensor measurements, matching the forecast interval to the fault duration (to the extent possible) gives
better performance. Apart from using this contextual information, we also relied on trial and error to
determine the value of L that gave the best performance.
6.4.3 Other Hybrid methods
We noted earlier that in our evaluation with injected faults, Hybrid(U) performs like the method with
more detections and Hybrid(I) performs like the method with fewer detections (while eliminating the false
positives). As a result, for low intensity NOISE faults (see Figures 6.6 and 6.7), Hybrid(U) performs like
the HMM, and Hybrid(I) performs like the NOISE rule.
Note that the LLSE method, apart from detecting faulty measurements, also provides an estimate of
their correct value. We can leverage this fact to design a hybrid method that uses two different methods in
sequence, as follows (a code sketch appears after the evaluation question below):
1. Use the LLSE method to identify (some of the) faulty measurements.
2. Replace these measurements with their estimates from LLSE.
3. Use this modified time series of measurements as input to another method.
We next evaluate whether the hybrid approach of using two different methods in sequence can detect more
low intensity faults.
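A generic sketch of this in-sequence combination is given below; the callables and names are placeholders. In the experiments described here, the first stage is LLSE and the second stage is the HMM or an ARIMA method.

```python
import numpy as np

def sequential_hybrid(readings, first_detect, first_estimate, second_detect):
    """Run two detectors in sequence.

    first_detect   -- returns a boolean mask of samples flagged by stage one
    first_estimate -- returns estimated (corrected) values for the readings
    second_detect  -- run on the cleaned series; its detections are combined
                      with those of stage one
    """
    readings = np.asarray(readings, dtype=float)
    stage_one = first_detect(readings)
    cleaned = readings.copy()
    cleaned[stage_one] = first_estimate(readings)[stage_one]
    stage_two = second_detect(cleaned)
    return stage_one | stage_two
```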
LLSE and HMM: We use a combination of LLSE and HMM to detect low intensity NOISE faults with
duration |W| = 3000 and |W| = 2000 samples injected into readings from the NAMOS, October 2005
deployment [60]. These measurements are also used for the evaluation results shown in Figures 6.6 and 6.7.
Table 6.2 shows a comparison between the LLSE, the HMM, and the combined LLSE→HMM method.
Note that using the LLSE and the HMM methods in sequence helps us detect more faults than either of the
two methods.
Duration (# samples) | LLSE | HMM | LLSE→HMM
2000                 | 40   | 68  | 70
3000                 | 23.3 | 71  | 74
Table 6.2: Low intensity NOISE faults, NAMOS: percentage of faulty samples detected
LLSE and ARIMA: Low intensity faults can also be detected using a combination of the LLSE and the
ARIMA model based methods. Table 6.3 compares the performance of two in-sequence hybrid methods
(LLSE→ARIMA (One-step), and LLSE→ARIMA (L-step), L=120) against several other methods for
detecting low intensity NOISE faults. The results in Table 6.3 are for the same sensor measurements as
the ones used for Figures 6.10, 6.11, and 6.12.
We make three interesting observations based on the results shown in Table 6.3. First, the combined
LLSE→ARIMA (L-step) method outperforms the ARIMA (L-step) method, but its performance is comparable
to the LLSE method. Hence, combining LLSE and ARIMA (L-step) does not provide a significant
benefit over using the LLSE method. Second, for fault durations of 120 and 720 samples, the
LLSE→ARIMA (One-step) hybrid method outperforms both the LLSE and the ARIMA (One-step) methods,
with the hybrid method detecting 4.3% more faults compared to the best performing method (LLSE)
for the fault duration equal to 720 samples. Third, for fault duration equal to 60 samples, the hybrid
LLSE→ARIMA (One-step) method detects 11.9% fewer faults than the ARIMA (One-step) method, but
it outperforms LLSE by a small margin. Thus, using two different methods in sequence may not always
perform better than either of the two methods.
Duration (# samples, time) | LLSE | ARIMA (One-step) | LLSE→ARIMA (One-step) | ARIMA (L-step) | LLSE→ARIMA (L-step)
60 (0.5 hr)                | 23.7 | 37.3             | 25.4                  | 1.7            | 23.7
120 (1 hr)                 | 23.5 | 25.2             | 26.9                  | 0              | 23.5
720 (6 hrs)                | 35.9 | 7.4              | 40.2                  | 6.7            | 37.4
Table 6.3: Low intensity NOISE faults: percentage of faulty samples detected
Duration (# samples, time) | LLSE | ARIMA (One-step) | LLSE→ARIMA (One-step) | ARIMA (L-step) | LLSE→ARIMA (L-step)
60 (0.5 hr)                | 9.6  | 1                | 2                     | 0.87           | 0.97
120 (1 hr)                 | 9.6  | 0.76             | 1.8                   | 0.87           | 0.97
720 (6 hrs)                | 4.5  | 0                | 0.3                   | 0.2            | 0.2
Table 6.4: False positives as % of total # samples (2880)
False positives: Using two (or more) methods in sequence to detect faults can increase the number of false
positives. Consider the case of using the LLSE and ARIMA methods in sequence. In the first step, we replace
the value of all the samples identified as faulty by their estimate given by the LLSE method. If the LLSE
method has false positives, then we alter the values of these normal (not faulty) samples. In addition, if
the estimates from LLSE are not good (differ significantly from normal sensor readings), these samples
might be identified as faulty by the ARIMA method in the next step.
Table 6.4 compares the false positive rate for the hybrid methods LLSE→ARIMA (One-step and L-step)
against the ARIMA methods. For reference, we also show the false positive rate for the LLSE method.
The increase in the false positive rate is more significant for LLSE→ARIMA (One-step) compared to
LLSE→ARIMA (L-step). For both the One-step and the L-step methods, we used the Forecast Confidence
Interval heuristic (Section 6.3.3) to determine the threshold for fault detection. As discussed in Section
6.3.3, the confidence interval is smaller for One-step ahead forecasting (resulting in a smaller threshold) compared
to L-step ahead forecasting. Hence, when used in sequence with the LLSE method, the ARIMA (One-step)
method is more vulnerable to false positives (due to the false positives from LLSE in the first step)
compared to the ARIMA (L-step) method.
In summary, the evaluation presented in Tables 6.2, 6.3, and 6.4 shows that if it is possible to obtain a
good estimate of the correct value of an erroneous measurement, then using two methods in sequence can
(possibly) detect more faults at the expense of a (slightly) higher false positive rate.
6.5 Faults in Real-World data sets
We analyze four datasets from real-world deployments (SensorScope [79], Great Duck Island (GDI) [50],
INTEL Berkeley Lab [35], and NAMOS [61]) for the prevalence of faults in sensor traces. The sensor traces
contain measurements of temperature, humidity, light, pressure, and chlorophyll concentration. All of
these phenomena exhibit a diurnal pattern in the absence of outside perturbation or sensor faults.
Out of the four datasets, we were able to apply all four fault detection methods only to the SensorScope
dataset. We could not apply one or more methods to the other datasets due to a variety of factors; for
example, the NAMOS dataset did not have enough data for training. We discuss these factors in detail
below. Table 6.5 provides a summary of the methods applied to each of the four datasets for fault detection.
Dataset          | Rule-based | LLSE | HMM | ARIMA
SensorScope [79] | X          | X    | X   | X
INTEL Lab [35]   | X          |      | X   | X
GDI [50]         | X          | X    |     |
NAMOS [61]       | X          |      |     |
Table 6.5: Real-world datasets and detection methods
6.5.1 SensorScope
The SensorScope project is an ongoing outdoor sensor network deployment consisting of weather-stations
with sensors for sensing several environmental quantities such as temperature, humidity, solar radiation,
soil moisture, and so on [79]. We analyzed the temperature measurements collected every 30 seconds over
six months at 64 weather stations.
We did not have the ground truth regarding faulty samples for this dataset. Since this dataset is very
large (more than 500,000 samples per weather station), we used a combination of visual inspection and
Rule-based methods to identify samples with (very likely) faulty temperature values. The samples identified
as faulty not only provide a ballpark estimate for the prevalence of faults, but also serve as a benchmark
against which we compare the performance of the other fault detection methods.
Using visual inspection and the SHORT rule, we identified approximately 0.01% of the total samples
as affected by SHORT faults. There was significant variation in the prevalence of SHORT faults across
individual weather stations: some stations did not have any faulty samples, whereas one weather station
(ID=39) had more than 0.07% of its samples affected by SHORT faults. We did not find any instance of
NOISE and CONSTANT faults. Table 6.6 shows the performance of the various methods.
Method                | Detected (% of total # faulty samples) | False Positive (% of total # samples)
HMM                   | 35.3                                   | < 0.01
LLSE                  | 69.8                                   | 0.01
ARIMA (One-step)      | 96.7                                   | 0.02
ARIMA (L-step), L=120 | 76.7                                   | 3
Hybrid(I)             | 25                                     | < 0.01
Hybrid(U)             | 98.9                                   | 3
Table 6.6: SensorScope: SHORT faults
The Detected and False Positive percentages are computed by aggregating the number of faulty samples and the false
positives, respectively, over all the weather stations.
Based on the results shown in Table 6.6, we make the following three observations. First, the ARIMA
(One-step) method performs the best; it detected 96.9% of the faulty samples and incurred few false
positives. Second, our evaluation with injected SHORT faults (Section 6.4.1) showed that the ARIMA
(One-step) method is better suited than the ARIMA (L-step) method for detecting SHORT faults. This
observation is confirmed by the relative performance of the One-step and the L-step ahead forecasting
methods for detecting the SHORT faults in the SensorScope dataset. Third, the HMM and the LLSE
methods detect fewer faults than the two ARIMA methods, but they incur fewer false positives.
6.5.2 INTEL Lab, Berkeley data set
54 Mica2Dot motes with temperature, humidity and light sensors were deployed in the Intel Berkeley
Research Lab between February 28th and April 5th, 2004 [35]. In this paper, we present the results on the
prevalence of faults in the temperature readings (sampled on average once every 30 seconds).
This dataset exhibited a combination of NOISE and CONSTANT faults. Each sensor also reported the
voltage values along with the samples. Inspection of these voltage values showed that the faulty samples
were well correlated with the last few days of the deployment when the lithium ion cells supplying power
to the motes were unable to supply the voltage required by the sensors for correct operation.
The faulty samples were contiguous in time (Figure 6.15). We applied the NOISE rule, the HMM
method (using a simple 2-state HMM model), and the ARIMA One-step and L-step methods to detect the
faulty samples.
Figure 6.15: Intel data set: NOISE faults (temperature readings over time)
Interestingly, for this dataset, we could not apply the LLSE method. NOISE faults across
the various nodes were correlated, since all the nodes ran out of battery power at approximately the same time.
This breaks an important assumption underlying the LLSE technique: that faults at different sensors are
uncorrelated.
Figure 6.16 shows the fraction of the total temperature samples (collected by all the motes) with faults,
and the performance of the NOISE rule, the HMM and the hybrid methods at detecting these faults. We
present the performance results for the ARIMA model separately in Figure 6.17 for clarity. Both the
NOISE rule and HMM have some false negatives while the HMM also has some false positives. For this
data set, we could eliminate all the false positives using Hybrid(I) with NOISE rule and HMM. However,
combining the NOISE rule and HMM for Hybrid(I) incurred more false negatives.
Figure 6.16: Intel data set: Prevalence of NOISE faults
Figure 6.17: Intel data set: ARIMA methods (One-step, L-step 1 hr, L-step 24 hrs)
The fact that this dataset contained sensor measurements collected over two months enabled us to
apply our time series method for fault detection. We set the periodicity parameter s = 2880 because
samples were collected every 30 seconds and hence 2880 samples were collected per day. For each
sensor mote, we used measurements collected over the first 10 days to estimate the parameters of the
time series model. The size of the training dataset was influenced by two factors: (a) the periodicity
parameter s, and (b) missing samples. Since our model differences the time series twice (the differences
are $(z_t - z_{t-1}) - (z_{t-s} - z_{t-s-1})$, refer to Equation (6.2)), we discard s + 1 = 2881 samples (i.e.,
measurements collected over a day). The average yield for the INTEL data set was 50%, i.e., only 50% of the
samples collected every 30 seconds at a sensor were delivered at the base-station. Due to the large number of
missing samples per day, we had to include measurements from more days in the training dataset in order
to have sufficient data for training the ARIMA model.
Figure 6.17 shows the performance of the time series based method with three different forecasting
scenarios: One-step ahead, L-step ahead with L = 120 samples (i.e., 1 hour), and L-step ahead with
L = 2880 samples (i.e., 24 hours). The maximum number of faults is detected with L-step ahead forecasting
with L = 2880. This is because the duration of the NOISE and CONSTANT faults in the INTEL dataset
(Figure 6.15) was on the order of days, and hence L-step ahead forecasting with a large L is needed to
detect these long duration faults. Even for L = 2880 samples, we could not detect two-thirds of the faults.
However, our method was able to detect 90% of the faulty samples reported during the first day on which
the faults started. This shows that our time series based method can detect long duration faults provided
that their duration is shorter than the time series periodicity.
Finally, in this data set, there were surprisingly few instances of SHORT faults. A total of 6 faults were
observed for the entire duration of the experiment (Table 6.7). All of these faults were detected by the
HMM method, LLSE, the SHORT rule, and the ARIMA model based method.
ID | # Faults | Total # Samples
2  | 1        | 46915
4  | 1        | 43793
14 | 1        | 31804
16 | 2        | 34600
17 | 1        | 33786
Table 6.7: Intel Lab: SHORT faults, temperature
6.5.3 Great Duck Island (GDI) data set
We looked at data collected using 30 weather motes on the Great Duck Island over a period of 3 months [50].
Attached to each mote were temperature, light, and pressure sensors, and these were sampled once every 5
mins. Of the 30 motes, the data set contained sampled readings from the entire duration of the deployment
for only 15 motes. In this section, we present our findings on the prevalence of faults in the readings for
these 15 motes.
The predominant fault in the readings was of the type SHORT. We applied the SHORT rule, the
LLSE method and Hybrid(I) to detect SHORT faults in light, humidity, and pressure sensor readings.
We could not apply the ARIMA and the HMM methods to this dataset due to a large number of missing
(not recorded) observations. Each of the 15 motes that we considered had more than 60% of the samples
missing for at least 18 days, and more than 10% of the samples missing for at least 56 days. For an ARIMA
model with periodicity s, in order to predict the sensor reading at time t, we need sensor readings at times t-1, t-s, and t-s-1 (refer to Equation (6.2), Section 6.3.3). If any of these three readings is missing, then the ARIMA model cannot predict the reading at time t. For the ARIMA methods, one way to tackle the problem of one (or more) missing readings needed to predict the value at time t is to use estimates of these missing readings (obtained from the ARIMA or LLSE method). This works if we can be confident
that these estimates are not erroneous. This approach worked for the INTEL dataset, but did not for the
GDI dataset, mainly because we could not obtain estimates for all the missing samples. Similarly, the large
fraction of missing samples also prevented the training phase of the HMM method from converging, and
hence, we could not use the HMM method on this dataset.
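The following Python sketch illustrates this substitution strategy; the fitted model and the fallback estimator are placeholders (assumptions for illustration), and prediction is abandoned when a needed reading can neither be read nor estimated, which is what happened for the GDI dataset:

```python
import numpy as np

def predict_with_substitution(z, t, s, arima_predict, estimate_missing):
    # z : 1-D array of readings with missing samples stored as np.nan.
    # Predicting z[t] needs z[t-1], z[t-s], and z[t-s-1]; missing readings
    # are replaced by estimates (e.g., from LLSE or an earlier forecast).
    # Returns None when a needed reading can neither be read nor estimated.
    needed = [t - 1, t - s, t - s - 1]
    values = []
    for idx in needed:
        v = z[idx]
        if np.isnan(v):
            v = estimate_missing(idx)  # placeholder fallback estimator
        if v is None or np.isnan(v):
            return None
        values.append(v)
    return arima_predict(*values)  # placeholder for the fitted model
```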
Figure 6.18 shows the overall prevalence (computed by aggregating results from all the 15 nodes) of
SHORT faults for different sensors in the GDI data set. The Hybrid(I) technique eliminates any false
positives reported by the SHORT rule or the LLSE method. The intensity of SHORT faults was high
enough to detect them by visual inspection of the entire sensor readings timeseries. This ground-truth is
included for reference in the figure under the label V.
It is evident from Figure 6.18 that SHORT faults are relatively infrequent. They are most prevalent in
the light sensor (approximately 1 fault every 2000 samples). Figure 6.19 shows the distribution of SHORT
faults in light sensor readings across various nodes. We did not observe any discernible pattern in the
prevalence of these faults across different sensor nodes.
In this data set, NOISE faults were infrequent. Only two nodes had NOISE faults, with a duration of about 100 samples. The NOISE rule detected these faults, but the LLSE method failed to do so, primarily because its parameters had been optimized for SHORT faults.
Figure 6.18: SHORT Faults in GDI data set (light, humidity, and pressure sensors; fraction of samples with faults: detected, false negatives, and false positives)
Figure 6.19: SHORT Faults: Light Sensor (fraction of samples with SHORT faults per node)
6.5.4 NAMOS data set
Nine buoys with temperature and chlorophyll concentration sensors (fluorimeters) were deployed in Lake
Fulmor, James Reserve for over 24 hours in August, 2006 [61]. Each sensor was sampled every 10 seconds.
We analyzed the measurements from chlorophyll sensors for the prevalence of faults.
Figure 6.20: NAMOS data set: NOISE/CONSTANT faults (fraction of samples with faults per node: detected, false negatives, and false positives)
The predominant fault was a combination of NOISE and CONSTANT caused by hardware faults in
the ADC (Analog-to-Digital Converter) board. Figure 6.1.a shows the measurements reported by buoy
103. We applied the NOISE Rule to detect samples with errors. Figure 6.20 shows the fraction of samples
corrupted by faults. The sensors at 4 buoys were affected by the ADC board fault and in the worst case, at
buoy 103, 35% of the reported values were erroneous. We could not apply the LLSE, HMM, and ARIMA model based methods because there was not enough data to train the models (data was collected for 24 hours only).¹
6.6 Discussion and Future Work
In this section, we briefly discuss two issues that are relevant to data fault detection (outliers and event detection, as well as the utility of faulty samples); however, these are not a focus of our work. We also discuss
interesting extensions to the fault detection methods evaluated in this paper that can be useful for detecting
other types of faults.
¹ The NAMOS dataset used in Section 6.4 is from a different deployment done in October 2005 [60]. This deployment consisted of only 4 buoys collecting data over 48 hours. We did not find any instances of faults in the dataset from the October 2005 deployment and hence do not discuss it here.
Outliers and Events. Sensor measurements can deviate from their expected values due to an unexpected
event or without any known causes (outliers), especially in the context of environmental monitoring. The
fault detection methods discussed in this paper are likely to flag these measurements as faulty. However, discarding these samples as faulty may result in loss of important information related to an unknown event or outlier [28]. Before discarding/filtering out the faulty samples, we can try to extract some information from these samples. For example, if the samples flagged as faulty do not match any of the known fault models, it is possible that they are due to an unknown event or are simply outliers. In such a situation, contextual information about the sensors and the phenomenon being monitored can help us decide whether these samples are due to an unknown event. If so, using the flagged samples, we can try to generate a
signature for the event using statistical and learning techniques other than those we have presented in this
paper. Having a signature can help us detect occurrences of the same event in the future.
Utility of faulty samples. Applications using a sensor network typically combine data from several sen-
sors with different sensing modalities and spatiotemporal scales (commonly referred to as data fusion) in
order to extract the relevant information. Depending on the fault type and intensity, a faulty sample may
still provide useful information. During the data fusion process, we can assign a lower weight/importance
to faulty samples compared to the clean/non-faulty samples [106]. In the case of data faults due to calibra-
tion errors, we can correct faulty samples if we can determine the proper calibration formula. Thus, it is
not advisable to always discard/filter out the faulty samples.
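As a simple illustration of such down-weighting during fusion, the following minimal sketch computes a weighted estimate from several co-located readings; the weighting scheme and the weight value are our own illustrative choices, not taken from [106]:

```python
import numpy as np

def fused_estimate(values, fault_flags, faulty_weight=0.2):
    # Readings flagged as faulty are down-weighted rather than discarded;
    # the weight of 0.2 is purely illustrative.
    values = np.asarray(values, dtype=float)
    weights = np.where(np.asarray(fault_flags, dtype=bool), faulty_weight, 1.0)
    return float(np.sum(weights * values) / np.sum(weights))

# Example: three co-located readings, one of which is flagged as faulty.
print(fused_estimate([21.3, 21.5, 35.0], [False, False, True]))
```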
Next we discuss several enhancements to the methods presented in this paper. These enhancements
may not only improve the accuracy of these methods, but also extend their applicability to other
fault types. We plan to evaluate these enhanced methods as part of our future work.
Enhancements to the proposed methods. As mentioned in Section 6.3.2, the vector version of the LLSE
method can incorporate spatial correlation across sensors attached to different nodes in computing the
estimates for sensor values. Spatial correlations can provide a global view of the data collected by the
network. In the absence of ground truth values, using the spatial correlation information from multiple
sensors can result in a higher fidelity model or better estimates for sensor data, and hence, more accurate
and robust fault detection. We expect the vector version of the LLSE method to perform better than the
scalar version for fault detection that relies on leveraging inter-node relationships, for example, detecting faults due to calibration errors.
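For reference, a minimal sketch of the vector LLSE estimate of one node's reading from the readings y at other nodes, x_hat = mu_x + C_xy C_yy^{-1} (y - mu_y), is given below; the means and covariances are assumed to have been estimated from training data:

```python
import numpy as np

def vector_llse(y, mu_x, mu_y, cov_xy, cov_yy):
    # Linear least-squares estimate of one node's reading x from the
    # vector y of readings at other nodes:
    #   x_hat = mu_x + C_xy C_yy^{-1} (y - mu_y)
    y = np.asarray(y, dtype=float)
    gain = np.asarray(cov_xy, dtype=float) @ np.linalg.pinv(np.asarray(cov_yy, dtype=float))
    return mu_x + gain @ (y - np.asarray(mu_y, dtype=float))

# A sample can then be flagged as faulty when |observed - x_hat| exceeds a
# threshold derived from the estimation error variance.
```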
The ARIMA model based methods can be expanded to take measurements from different sensor(s) attached to the same and/or different nodes as input while forecasting the expected sensor reading (refer to
Chapter 11, Section 5 in [9]). Such enhancements can enable the ARIMA model based methods to
leverage spatial correlations (or inter-node relationships) as well as inter-sensor relationships, in addition
to the temporal correlations. For example, in applying the ARIMA model based methods to a time series of
temperature measurements, we can use measurements from the humidity sensor as an input. Temperature
and humidity variations are known to be strongly correlated, and an ARIMA model that combines the
two can perform better in situations where inter-sensor relationships (across different sensing modalities)
are needed for fault detection. However, this enhanced ARIMA model is more complex and we need
to estimate more model parameters. This increased complexity may result in a more computationally
intensive training phase requiring a larger training dataset. Similarly, we can enhance our HMM based
method to take measurements from other sensors as input [7].
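Returning to the temperature/humidity example above, the following sketch suggests what such an exogenous-input extension might look like; it is a simplification of the exogenous-input ARIMA models in [9], not our implementation, and the orders and lag choices are assumptions:

```python
import numpy as np

def fit_differenced_ar_with_exog(temp, hum, p=2, s=2880):
    # Illustrative exogenous-input fit: regress the doubly differenced
    # temperature series on its own p most recent differences and on the
    # similarly differenced humidity series, via ordinary least squares.
    def ddiff(z):
        z = np.asarray(z, dtype=float)
        t = np.arange(s + 1, len(z))
        return (z[t] - z[t - 1]) - (z[t - s] - z[t - s - 1])

    wt, wh = ddiff(temp), ddiff(hum)
    X = np.array([np.concatenate((wt[i - p:i], [wh[i]])) for i in range(p, len(wt))])
    y = wt[p:]
    coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coeffs  # p autoregressive weights followed by the humidity weight
```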
6.7 Summary and Conclusions
In this paper, we focused on a simple question: How often are the sensor data fault types SHORT, NOISE,
and CONSTANT observed in real deployments? To answer this question, we first explored and char-
acterized four qualitatively different classes of fault detection methods (Rule-based, LLSE, time series
forecasting, and HMMs) and then applied them to real world datasets. Several other methods based on
Bayesian filters, neural networks, etc., can be used for sensor fault detection. However, the four methods
discussed in this paper are representative of this larger class of alternate techniques. Hence, the analysis of the four methods with injected faults presented in Section 6.4 not only demonstrates the differences, in terms of accuracy and robustness, between these methods, but can also help form an informed opinion about the efficacy of several other methods for sensor fault detection.
Fault type      | SensorScope                                                      | INTEL Lab                                                          | GDI                        | NAMOS
SHORT           | Infrequent (less than 0.01% of samples affected), high intensity | Infrequent (only 5 nodes affected)                                 | Infrequent, high intensity | None
NOISE, CONSTANT | None                                                             | Frequent (20-25% of samples affected), spatiotemporally correlated | Infrequent                 | Frequent (15-35% of samples affected)
Table 6.8: Datasets: Prevalence of faults
We now summarize our main findings. Table 6.8 summarizes the fault prevalence for the different datasets. The prevalence of faults was lowest in the SensorScope dataset (less than 0.01% of samples were affected by faults). However, in the INTEL Lab and NAMOS datasets a significant percentage (between 15-35%) of samples were affected by a combination of NOISE and CONSTANT faults. Such a high percentage of erroneous samples highlights the importance of automated, on-line sensor fault detection. In the GDI data set, SHORT faults occurred once in two days, but the faulty sensor values were often orders
of magnitude higher than the correct value. Except for the INTEL Lab dataset, we found no spatial or
temporal correlations among faults. In that dataset, the faults across various nodes were correlated because
all the nodes ran out of battery power at approximately the same time.
Tables 6.9 and 6.10 summarize the evaluation results with injected faults. As discussed in Section 6.4, most of the methods work well for high and medium intensity SHORT faults, and high intensity and long duration NOISE faults. However, their performance is severely degraded in the case of low intensity and/or
short duration faults. In particular, the HMM and the ARIMA (L-step) methods are not suited for detect-
ing short duration NOISE faults, while the LLSE and the ARIMA methods perform poorly at detecting
low intensity SHORT faults. The observation that low intensity faults are harder to detect is not entirely
Method           | High/Medium fault intensity | Low fault intensity                          | False positives
SHORT rule       | works well                  | detects at least 50% of faults               | No
LLSE             | works well                  | performs poorly                              | No
HMM              | works well                  | detects more faults than SHORT rule and LLSE | Yes (for low fault intensity)
ARIMA (One-step) | works well                  | detects at least 50% of faults               | Yes (high rate of false positives)
ARIMA (L-step)   | works well                  | performs poorly                              | Yes (high rate of false positives)
Table 6.9: Detection methods: Performance on injected SHORT faults
Method           | High/Medium fault intensity          | Low fault intensity                          | Fault duration                                                                         | False positives
NOISE rule       | works well                           | performs poorly                              | robust to changes in fault duration                                                    | No
LLSE             | works well                           | performs poorly                              | more suited for long duration faults                                                   | Yes (for low intensity and/or short duration faults)
HMM              | works well                           | detects more faults than NOISE rule and LLSE | not suited for low intensity and short duration faults                                 | Yes (high false positive rate)
ARIMA (One-step) | no significant impact on performance | no significant impact on performance         | avg. performance for short duration faults; not suited for long duration faults        | Yes
ARIMA (L-step)   | reasonable performance               | performs poorly                              | reasonable performance for long duration faults; not suited for short duration faults  | Yes
Table 6.10: Detection methods: Performance on injected NOISE faults
unexpected. Our data-centric approach to fault detection is most effective when the faulty samples dif-
fer significantly from normal sensor readings. This is not always the case for low intensity faults. Two
important observations can be made based on the evaluation of the fault detection methods with injected
faults: (i) the four classes of methods sit at different points on the accuracy/robustness spectrum, and no
single method is perfect for detecting the different types of faults; (ii) hybrid methods can help eliminate
false positives or false negatives, and using two methods in sequence can detect more low intensity faults
at the expense of slightly higher false positives.
The fault detection methods performed well on the real world datasets except for the ARIMA model
based methods when used on the INTEL Lab, Berkeley dataset. This is because most of the datasets
experienced high intensity faults.
Even though we analyzed most of the publicly available real world sensor datasets for faults, it is
hard to make general statements about sensor faults in real world deployments based on just four datasets.
However, our results raise awareness of the prevalence and severity of the problem of data corruption and
can inform future deployments. Overall, we believe that our work opens up new research directions in
automated high-confidence fault detection, fault classification, data rectification, and so on. More sophisti-
cated statistical and learning techniques than those we have presented can be brought to bear on this crucial
area.
Chapter 7
Conclusion
In this dissertation, we explored resource management in three different distributed systems: (1) server
clusters providing computing-as-a-service, (2) tiered architectures hosting web services, and (3) networks
of wireless sensors.
For compute clouds providing computing-as-a-service, we designed and implemented a service model, called MRM, that provides predictability in job finish times and prioritized service to delay-sensitive jobs. Experiments using our full-fledged prototype reveal that MRM can achieve near-perfect predictability and offer earlier-than-FCFS deadlines to delay-sensitive jobs by incentivizing users to give slack.
We also developed a machine learning based workload characterization technique for web services that categorizes users' requests based on their resource usage. Such categorization is useful in improving the
accuracy of performance models for these systems.
In the context of wireless sensor networks, we made the following two contributions: (1) we designed an online algorithm, called SEEC, that makes joint compression and transmission decisions to save energy, and (2) we explored techniques for detecting anomalies in data collected using these networks. Our evaluation of SEEC shows that it is able to achieve more than 30% energy savings and adapts seamlessly across a wide range of system dynamics. Finally, we evaluated four different methods using real-world data sets to characterize both their efficacy at detecting anomalies and the prevalence of anomalies in sensor datasets.
References
[1] Amazon's web services. http://aws.amazon.com/.
[2] G. Ananthanarayanan, S. Kandula, A. Greenberg, I. Stoica, Y. Lu, B. Saha, and Ed Harris. Reining
in the Outliers in Map-Reduce Clusters using Mantri. In OSDI, 2010.
[3] Sugato Bagchi, Eugene Hung, Arun Iyengar, Norbert Vogl, and Noshir Wadia. Capacity planning tools for web and grid environments. In Proceedings of the 1st International Conference on Performance Evaluation Methodologies and Tools, October 1996.
[4] L. Balzano and R. Nowak. Blind Calibration in Sensor Networks. In Proceedings of the International Conference on Information Processing in Sensor Networks (IPSN), 2007.
[5] Paul Barham, Austin Donnelly, Rebecca Isaacs, and Richard Mortier. Using Magpie for request
extraction and workload modelling. In Proceedings of the OSDI'04, December 2004.
[6] K. Barr and K. Asanović. Energy Aware Lossless Data Compression. In Proceedings of MobiSys,
2003.
[7] Y. Bengio and P. Frasconi. An Input Output HMM Architecture. In Proceedings of the Neural
Information Processing Systems Conference (NIPS), 1995.
[8] J. Blazewicz. Deadline scheduling of tasks: a survey. Foundations of Control Engineering, 1977.
[9] G. E. P. Box, G. M. Jenkins, and G. C. Reinsel. Time Series Analysis: Forecasting and Control,
3rd Edition. Prentice Hall, 1994.
[10] Nicolas Burri, Pascal von Rickenbach, and Roger Wattenhofer. Dozer: ultra-low power data gath-
ering in sensor networks. In Proceedings of IPSN, pages 450-459. ACM, 2007.
[11] R. Buyya, J. Giddy, and D. Abramson. An evaluation of economy-based resource trading and
scheduling on computational power grids for parameter sweep applications. In Active Middleware
Services, 2000.
[12] Vladimir Bychkovskiy, Seapahn Megerian, Deborah Estrin, and Miodrag Potkonjak. A Collabora-
tive Approach to In-Place Sensor Calibration. In Proceedings of the 2nd International Workshop on
Information Processing in Sensor Networks (IPSN), 2003.
[13] D. Caron, A. Das, A. Dhariwal, L. Golubchik, R. Govindan, D. Kempe, C. Oberg, A. B. Sharma,
B. Stauffer, G. Sukhatme, and B. Zhang. AMBROSia: An Autonomous Model-Based Reactive
Observing System. In Proceedings of ICCS, Invited paper, 2007.
[14] Chris Chatfield. Time Series Forecasting. Chapman and Hall/CRC Press, 2000.
[15] Mike Y. Chen, Anthony Accardi, Emre Kiciman, Armando Fox, Dave Patterson, and Eric Brewer.
Path-Based Failure and Evolution Management. In Proceedings of the NSDI'04, March 2004.
[16] Alexandre Ciancio, Sundeep Pattem, Antonio Ortega, and Bhaskar Krishnamachari. Energy Efficient Data-Representation and Routing for Wireless Sensor Networks Based on a Distributed
Wavelet Compression Algorithm. In Proceedings of the IPSN, 2006.
[17] A. Cockcroft and B. Walker. Capacity Planning for Internet Services. Sun Press, 2001.
[18] Thanh Dang, Nirupama Bulusu, and Wu-chi Feng. RIDA: A Robust Information-Driven Data
Compression Architecture for Irregular Wireless Sensor Networks. In Proceedings of the EWSN,
2007.
[19] Douglas S. J. De Couto, Daniel Aguayo, John Bicket, and Robert Morris. A High-Throughput Path
Metric for Multi-Hop Wireless Routing. In Proceedings of Mobicom, 2003.
[20] J. Dean and S. Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. In OSDI,
2004.
[21] ElasticMapReduce. http://aws.amazon.com/elasticmapreduce/.
[22] E. Elnahrawy and B. Nath. Cleaning and Querying Noisy Sensors. In Proceedings of the ACM
International Workshop on Wireless Sensor Networks and Applications (WSNA), 2003.
[23] fastica. The FastICA package for Matlab and R. http://www.cis.hut.fi/projects/
ica/fastica/.
[24] Ola Friman, M. Borga, P. Lundberg, and H. Knutsson. Exploratory fMRI Analysis by Autocorrela-
tion Maximization. NeuroImage, 16(2):454-464, 2002.
[25] R. Garcia and A. Marín. Parking capacity and pricing in park'n ride trips: A continuous equilibrium
network design problem. Annals of Operations Research, 2002.
[26] D. Gmach, J. Rolia, L. Cherkasova, and A. Kemper. Workload Analysis and Demand Prediction of
Enterprise Data Center Applications. In Proceedings of the IISWC'07, September 2007.
[27] Moises Goldszmidt, Derek Palma, and Bikash Sabata. On the Quantication of e-Business Capac-
ity. In Proceedings of the Electronic Commerce, 2001.
[28] J. Gupchup, A. Sharma, A. Terzis, R. Burns, and A. Szalay. The Perils of Detecting Measurement
Faults in Environmental Monitoring Networks. In Proceedings of the ProSense Special Session
and International Workshop on Wireless Sensor Network Deployments (WiDeploy), held at DCOSS,
2008.
[29] John Hicks, Jeongyeup Paek, Sharon Coe, Ramesh Govindan, and Deborah Estrin. An Easily
Deployable Wireless Imaging System. In Proceedings of ImageSense Workshop, 2008.
[30] High Performance Computing and Communication, University of Southern California. http://www.usc.edu/hpcc/.
[31] HIVE. http://hadoop.apache.org/hive/.
[32] Xiangchi Huang, Fuchung Peng, Aijun An, and Dale Schuurmans. Dynamic Web Log Session
Identication With Statistical Language Models. Journal of the American Society for Information
Science and Technology, 55(14):1290-1303, 2004.
[33] A. Hyvarinen, J. Karhunen, and E. Oja. Independent Component Analysis. Wiley-Interscience,
2001.
[34] Aapo Hyvarinen. Gaussian moments for noisy independent component analysis. IEEE Signal
Processing Letters, 6(6), June 1999.
[35] INTEL. The Intel Lab Data. 2004. Data set available at:
http://berkeley.intel-research.net/labdata/.
[36] M. Isard, M. Budiu, Y . Yu, A. Birrell, and D. Fetterly. Dryad: Distributed Data-Parallel Programs
from Sequential Building Blocks. In EuroSys, 2007.
[37] M. Isard, V. Prabhakaran, J. Currey, U. Wieder, Kunal Talwar, and Andrew Goldberg. Quincy: Fair
scheduling for distributed computing clusters. In SOSP, 2009.
[38] S. R. Jeffery, G. Alonso, M. J. Franklin, W. Hong, and J. Widom. Declarative Support for Sensor
Data Cleaning. In Proceedings of the International Conference on Pervasive Computing, 2006.
[39] John Judge. A Model for the Marginal Distribution of Aggregate Per Second HTTP Request Rate.
In Proceedings of the 10th IEEE Workshop on Local and Metropolitan Area Networks, 1999.
[40] Thomas Kailath, editor. Linear Least-Squares Estimation. Hutchison & Ross, Stroudsburg, PA,
1977.
[41] S. Kavulya, J. Tan, R. Gandhi, and Priya Narasimhan. An Analysis of Traces from a Production
MapReduce Cluster. In Proceedings of Cluster, Cloud and Grid Computing, 2010.
[42] N. Khoussainova, M. Balazinska, and D. Suciu. Towards Correcting Input Data Errors Probabilis-
tically Using Integrity Constraints. In Proceedings of the ACM Workshop on Data Engineering and
Mobile Access (MobiDE), 2006.
[43] F. Koushanfar, M. Potkonjak, and A. Sangiovanni-Vincentelli. On-line Fault Detection of Sensor
Measurements. In IEEE Sensors, 2003.
[44] L. Georgiadis and M. J. Neely and L. Tassiulas. Resource Allocation and Cross-Layer Control in
Wireless Networks. Foundations and Trends in Networking, 2006.
[45] C. L. Liu and J. W. Layland. Scheduling algorithms for multiprogramming in a hard-real-time
environment. Journal of the ACM, 1973.
[46] Z. Liu, L. Wynter, C. X. Xia, and F. Zhang. Performance inference of queueing models for IT
systems using end-to-end measurements. Performance Evaluation, 63:36-60, 2006.
[47] M. J. Neely. Energy Optimal Control for time varying wireless networks. IEEE Transactions on
Information Theory, 52(7):2915-2934, 2006.
[48] M. L. Pinedo. Scheduling: Theory, Algorithms, and Systems. Springer, 2008.
[49] M. R. Garey and D. S. Johnson. Computers and Intractability: A Guide to the Theory of NP-
Completeness. Freeman, 1979.
[50] Alan Mainwaring, Joseph Polastre, Robert Szewczyk, David Culler, and John Anderson. Wireless Sensor Networks for Habitat Monitoring. In the ACM International Workshop on Wireless
Sensor Networks and Applications (WSNA), 2002.
[51] MATLAB. http://www.mathworks.com/.
[52] Ningfang Mi, Qi Zhang, Alma Riska, Evgenia Smirni, and Eric Riedel. Performance Impacts of
Autocorrelated Flows in Multi-tiered Systems. In Proceedings of the Performance'07, October
2007.
[53] Microsoft. .NET Pet Shop 4.0. http://msdn2.microsoft.com/.
[54] Microsoft. Visual Studio 2005 Team Suite. http://msdn2.microsoft.com/.
[55] P. Mohan, V. N. Padmanabhan, and R. Ramjee. Nericell: using mobile smartphones for rich monitoring of road and traffic conditions. In Proceedings of ACM Sensys, 2008.
[56] K. Morton, M. Balazinska, and D. Grossman. ParaTimer: A Progress Indicator for MapReduce
DAGs. In SIGMOD, 2010.
[57] K. Morton, A. Friesen, M. Balazinska, and D. Grossman. Estimating the Progress of MapReduce
Pipelines. In ICDE, 2010.
[58] M. Mun, S. Reddy, K. Shilton, N. Yau, J. Burke, D. Estrin, M. Hansen, E. Howard, R. West, and
P. Boda. PEIR: the personal environmental impact report, as a platform for participatory sensing
systems research. In Proceedings of MobiSys, 2009.
[59] Wady Naanaa and Jean-Marc Nuzillard. Blind source separation of positive and partially correlated
data. Signal Processing, 85(9):17111722, 2005.
[60] NAMOS. NAMOS: Networked Aquatic Microbial Observing System. October 2005. Data set
available at: http://robotics.usc.edu/namos/data/jr_oct/web/.
[61] NAMOS. NAMOS: Networked Aquatic Microbial Observing System. August 2006. Data set
available at: http://robotics.usc.edu/namos/data/jr_aug_06/.
[62] M. J. Neely. Dynamic Data Compression for Wireless Transmission over a Fading Channel. In
Proceedings of the Conference on Information Sciences and Systems, 2008.
[63] M. J. Neely, E. Modiano, and C. E. Rohrs. Dynamic Power Allocation and Routing for Time
Varying Wireless Networks. In Proceedings of the INFOCOM, 2003.
[64] Michael J. Neely. Dynamic Power Allocation and Routing for Satellite and Wireless Networks with
Time Varying Channels. PhD thesis, Massachusetts Institute of Technology, November 2003.
[65] K. Ni, N. Ramanathan, M. Chehade, L. Balzano, S. Nair, S. Zahedi, G. Pottie, M. Hansen, and
M. Srivastava. Sensor Network Data Fault Types. Transactions on Sensor Networks, 2008. to
appear.
[66] A. Odlyzko. Internet pricing and the history of communications. Computer Networks, 2001.
[67] Jeongyeup Paek, Omprakash Gnawali, Ki-Young Jang, Daniel Nishimura, Ramesh Govindan, John
Caffrey, Mazen Wahbeh, and Sami Masri. A Programmable Wireless Sensing System for Struc-
tural Monitoring. In Proceedings of the 4th World Conference on Structural Control and Monitor-
ing (4WCSCM), 2006.
[68] S. Pattem, B. Krishnamachari, and R. Govindan. The Impact of Spatial Correlation on Routing with
Compression in Wireless Sensor Networks. In Proceedings of the IPSN, 2004.
[69] PIG. http://hadoop.apache.org/pig/.
[70] J. Polo, D. de Nadal, D. Carrera, Y. Becerra, V. Beltran, J. Torres, and E. Ayguade. Adaptive task
scheduling for multijob mapreduce environments. XX Jornadas de Paralelismo, 2009.
[71] Qualnet. http://www.scalable-networks.com/products, 2008.
[72] Lawrence Rabiner. A tutorial on Hidden Markov Models and selected applications in speech recog-
nition. Proceedings of the IEEE, 77(2):257-286, 1989.
[73] N. Ramanathan, L. Balzano, M. Burt, D. Estrin, E. Kohler, T. Harmon, C. Harvey, J. Jay, S. Rothen-
berg, and M. Srivastava. Rapid Deployment with Confidence: Calibration and Fault Detection in
Environmental Sensor Networks. Technical Report 62, CENS, April 2006.
[74] N. Ramanathan, T. Schoellhammer, D. Estrin, M. Hansen, T. Harmon, E. Kohler, and M. Srivas-
tava. The Final Frontier: Embedding Networked Sensors in the Soil. Technical Report 68, CENS,
November 2006.
[75] Razvan Musaloiu-E., Chieh-Jan Liang, and Andreas Terzis. Koala: Ultra-Low Power Data Retrieval
in Wireless Sensor Networks. In Proceedings of IPSN, 2008.
[76] C. Sadler and M. Martonosi. Data Compression Algorithms for Energy-constrained devices in
Delay Tolerant Networks. In Proceedings of the ACM Sensys, 2006.
[77] Salesforce. http://www.salesforce.com/.
[78] SAS. The SAS Forecasting Software.
http://www.sas.com/technologies/analytics/forecasting/index.html.
[79] SensorScope. The SensorScope Lausanne Urban Canopy Experiment (LUCE) Project. 2006. Data
set available at: http://sensorscope.epfl.ch/index.php/LUCE.
[80] SFpark. http://sfpark.org/how-it-works/.
[81] A. Sharma, R. Bhagwan, M. Choudhury, L. Golubchik, R. Govindan, and G. M. Voelker. Automatic
Request Categorization in Internet Services. In Proceedings of the First Workshop on Hot Topics in
Measurement and Modeling of Computer Systems (HotMetrics), 2008.
[82] A. B. Sharma, L. Golubchik, R. Govindan, and M. J. Neely. Dynamic Data Compression in Multi-
hop Wireless Networks. In Proceedings of the SIGMETRICS/ Performance, 2009.
[83] A. B. Sharma, L. Golubchik, R. Govindan, and M. J. Neely. Dynamic Data Compression in Multi-
hop Wireless Networks. Technical Report 09-905, Computer Science, University of Southern Cali-
fornia, April 2009.
[84] Abhishek B. Sharma, Leana Golubchik, and Ramesh Govindan. On the Prevalence of Sensor Faults
in Real-World Deployments. In Proceedings of the IEEE Conference on Sensor, Mesh and Ad Hoc
Communications and Networks (SECON), 2007.
[85] A. Sridharan, S. Moeller, and B. Krishnamachari. Investigating Backpressure based Rate Control
Protocols for Wireless Sensor Networks. Technical Report CENG-2008-7, University of Southern
California, July 2008.
[86] K. Srinivasan and P. Levis. RSSI is Under Appreciated. In Proceedings of EmNets Workshop, 2006.
[87] T. Stathopoulos, D. McIntire, and W. J. Kaiser. The Energy Endoscope: Real-Time Detailed Energy
Accounting for Wireless Sensor Nodes. In Proceedings of the IPSN, 2008.
[88] Christopher Stewart, Terence Kelly, and Alex Zhang. Exploiting Nonstationarity for Performance
Prediction. In Proceedings of the EuroSys'07, March 2007.
[89] Christopher Stewart and Kai Shen. Performance Modeling and System Management for Multi-
component Online Services. In Proceedings of the NSDI'05, May 2005.
[90] I. Stoica, H. Abdel-Wahad, and A. Pothen. A microeconomic scheduler for parallel computers. In
IPPS, 1995.
[91] M. Stokely, J. Winget, E. Keyes, C. Grimes, and B. Yolken. Using a market economy to provision
compute resources across planet-wide clusters. In IPDPS, 2009.
[92] Robert Szewczyk, Joseph Polastre, Alan Mainwaring, and David Culler. Lessons from a sensor
network expedition. In Proceedings of the First European Workshop on Sensor Networks (EWSN),
2004.
[93] Gilman Tolle, Joseph Polastre, Robert Szewczyk, David Culler, Neil Turner, Kevin Tu, Stephen
Burgess, Todd Dawson, Phil Buonadonna, David Gay, and Wei Hong. A Macroscope in the Red-
woods. In Proceedings of the 2nd international conference on Embedded networked sensor systems
(SenSys), pages 51-63, New York, NY, USA, 2005. ACM Press.
[94] Daniela Tulone and Sam Madden. PAQ: Time series forecasting for approximate query answering in
sensor networks. In Proceedings of the European Conference on Wireless Sensor Networks (EWSN),
2006.
[95] U. Akyol, M. Andrews, P. Gupta, J. Hobby, I. Saniee, and A. Stolyar. Joint scheduling and conges-
tion control in mobile ad-hoc networks. In Proceedings of INFOCOM, 2008.
[96] Bhuvan Urgaonkar, Giovanni Pacifici, Prashant Shenoy, Mike Spreitzer, and Asser Tantawi. An
Analytical Model for Multi-tier Internet Services and its Applications. In Proceedings of the SIG-
METRICS'05, June 2005.
[97] C. A. Waldspurger, T. Hogg, B. A. Huberman, J. O. Kephart, and W. S. Stornetta. Spawn: A
distributed computational economy. IEEE Trans. on Software Engineering, 1992.
[98] A. Warrier, L. Le, and I. Rhee. Cross-layer optimization made practical. In Proceedings of Broad-
nets, Invited paper, 2007.
[99] Geoff Werner-Allen, Konrad Lorincz, Jeff Johnson, Jonathan Lees, and Matt Welsh. Fidelity and
Yield in a Volcano Monitoring Sensor Network. In Proceedings of the 7th USENIX Symposium on
Operating Systems Design and Implementation (OSDI), 2006.
[100] TeamQuest Model: Capacity Planning Software with Modeling.
http://www.teamquest.com/.
[101] Y . Yao, A. Sharma, L. Golubchik, and R. Govindan. Online Anomaly Detection for Sensor Systems:
a Simple and Efficient Approach. In Proceedings of the International Symposium on Computer
Performance, Modeling, Measurements and Evaluation (Performance), 2010.
[102] Wei Ye, Fabio Silva, and John Heidemann. Ultra-Low Duty Cycle MAC with Scheduled Channel
Polling. In Proceedings of the ACM Sensys, 2006.
[103] K. H. Yeung and C. W. Szeto. On the Modeling of WWW Request Arrivals. In Proceedings of the
International Conference on Parallel Processing Workshops, 1999.
[104] M. B. Yildirim and D. W. Hearn. A first best toll pricing framework for variable demand traffic
assignment problems. Transportation Research Part B: Methodological, 2005.
[105] M. Zaharia, D. Borthakur, J. Sen Sarma, K. Elmeleegy, S. Shenker, and I. Stoica. Delay scheduling:
A simple technique for achieving locality and fairness in cluster scheduling. In EuroSys, 2010.
[106] Sadaf Zahedi, Marcin Szczodrak, Ping Ji, Dinkar Mylaraswamy, Mani B Srivastava, and Robert
Young. Tiered Architecture for On-Line Detection, Isolation and Repair of Faults in Wireless Sen-
sor Networks. In Proceedings of the MILCOM, 2008.
[107] Qi Zhang, Lucy Cherkasova, and Evgenia Smirni. A Regression-Based Analytic Model for Dy-
namic Resource Provisioning of Multi-Tier Applications. In Proceedings of the ICAC'07, June
2007.