ADAPTIVE TASK ALLOCATION AND DATA GATHERING IN DYNAMIC DISTRIBUTED SYSTEMS

by

Bo Hong

A Dissertation Presented to the
FACULTY OF THE GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(ELECTRICAL ENGINEERING)

August 2005

Copyright 2005 Bo Hong

Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.

UMI Number: 3219874

INFORMATION TO USERS

The quality of this reproduction is dependent upon the quality of the copy submitted. Broken or indistinct print, colored or poor quality illustrations and photographs, print bleed-through, substandard margins, and improper alignment can adversely affect reproduction. In the unlikely event that the author did not send a complete manuscript and there are missing pages, these will be noted. Also, if unauthorized copyright material had to be removed, a note will indicate the deletion.

UMI Microform 3219874
Copyright 2006 by ProQuest Information and Learning Company. All rights reserved. This microform edition is protected against unauthorized copying under Title 17, United States Code.

ProQuest Information and Learning Company
300 North Zeeb Road
P.O. Box 1346
Ann Arbor, MI 48106-1346

Acknowledgments

I would like to thank my advisor Dr. Viktor K. Prasanna for his guidance, encouragement and support throughout my PhD study. I would like to thank my wife Hongwei Wu. Her support and love have been a source of power for me to complete this dissertation.

Contents

Acknowledgments
List Of Tables
List Of Figures
Abstract
1 Introduction
2 Related Work
2.1 Dynamic Distributed Systems
2.1.1 Computational Grid and Peer-to-peer Systems
2.1.2 Networked Sensor Systems
2.2 Adaptivity in Distributed Systems
2.3 Task Allocation and Scheduling in Distributed Systems
3 Adaptive Allocation of Independent Tasks to Maximize Throughput
3.1 Introduction
3.2 System Model
3.3 Extended Network Flow Representation for the Task Allocation Problem
3.4 Decentralized Adaptive Task Allocation Algorithm
3.4.1 Relaxed Flow Maximization Problem
3.4.2 The Algorithm
3.5 On-line Task Allocation Protocol
3.6 Experimental Results
4 Maximum Data Gathering in Networked Sensor Systems
4.1 Introduction
4.2 Data Gathering with Energy Constraints
4.2.1 System Model
4.2.2 Store-and-Gather Problems
4.2.3 Continuous Sensing and Gathering Problems
4.3 Flow Maximization with Constraint on Vertices
4.3.1 Problem Reductions
4.3.2 Relationship to Sensor Network Scenarios
4.4 Distributed Algorithm and Protocol To Maximize Data Gathering
4.5 Experimental Results
5 In-network Processing in Networked Sensor Systems
5.1 Introduction
5.2 Problem Statement and Reduction to Flow Optimization Problem
5.3 On-line Protocol for In-network Processing
5.4 Experimental Results
5.4.1 Simulation Setup
5.4.2 Summary of Results
5.5 Performance Comparison
5.6 Discussion
6 Maximum Lifetime Data Sensing and Extraction in Energy Constrained Networked Sensor Systems
6.1 Introduction
6.2 System Model and Problem Statement
6.2.1 Model of Networked Sensor System
6.2.2 Problem Statement
6.3 Reduction to a Restricted Flow Problem
6.4 Algorithm for the Restricted Flow Problem
6.5 Reconstructing the Data Flow for Each Round
6.6 Performance Comparison
6.7 A Distributed Heuristic for the MDG Problem
7 Conclusion
Appendix A
A.1 Algorithms for Network Flow Maximization Problems
A.2 Proof of the Correctness and Complexity Bound of the RIPR Algorithm

List Of Tables

4.1 Normalized steady-state throughput. r is the communication radius; n is the number of nodes.
5.1 Normalized raw throughput.
5.2 Normalized steady-state throughput.
5.3 Start-up time.
5.4 Adaptation time.
5.5 Normalized raw throughput of the greedy heuristic.

List Of Figures

3.1 Transforming a base problem to a network flow representation. Left: a base problem with three nodes. Right: the corresponding network flow representation.
3.2 An example of the relaxed flow maximization problem.
3.3 Performance of the proposed task allocation protocol with uniformly distributed system topologies.
The x-axis represents the 900 experiments.
3.4 Performance of the proposed task allocation protocol with power-law distributed system topologies. The x-axis represents the 900 experiments.
3.5 Histogram of maximum length of consumed task buffer. The y-axis represents the frequency.
3.6 Impact of buffer size on the throughput of the system.
3.7 Adaptation to changes in the system.
3.8 Impact of control message transfer cost.
4.1 Adaptation performed in batch mode.
4.2 The maximum and mean cost per node for executing the RIPR algorithm.
4.3 Execution time of the RIPR algorithm.
4.4 Start-up time of the proposed protocol.
4.5 Illustration of the start-up and the adaptation of the proposed protocol. Framed block (a) is zoomed in figure 4.6(a); framed block (b) is zoomed in figure 4.6(b).
4.6 Detailed illustration of the start-up and the adaptation of the proposed protocol.
5.1 Reduction of TMIP to a network flow problem. Sensor nodes are denoted by circles. The square in (a) denotes the event of interest. Dotted lines in (a) represent the collection of data from the environment. The upper square in (b) denotes the newly added pseudo source s'. The lower square in (b) denotes the pseudo sink t'.
Weights of the nodes and links are omitted in this figure.
5.2 Impact of wmax on system throughput.
5.3 An example illustrating the poor performance of the path-based greedy heuristic.
6.1 An example of the MDG problem.
6.2 An illustrative example of the RFEC algorithm.
6.3 Flow graph generated by the RFEC algorithm.
6.4 Reconstructing the flow for each round. Step 1: split the edges out of s.
6.5 Reconstructing the flow for each round. Step 2: find the paths.
6.6 Example for worst-case performance comparison.
6.7 Example demonstrating the non-optimality of the shortest path heuristic.
6.8 The histogram of q over all the simulations. The value of q for each simulation is calculated as the ratio between the achieved system lifetime and the optimal lifetime. The y-axis has been normalized to the total number of simulations.
6.9 The impact of the number of nodes on system lifetime. The y-axis has been normalized to the optimal system lifetime.
6.10 The impact of communication radius on system lifetime. The y-axis has been normalized to the optimal system lifetime.
6.11 The impact of the number of source nodes (normalized to |Vc|/|V|) on system lifetime. The y-axis has been normalized to the optimal system lifetime.
6.12 The impact of emax on system lifetime.
The y-axis has been normalized to the optimal system lifetime.
6.13 The impact of nmax on system lifetime. The y-axis has been normalized to the optimal system lifetime.

Abstract

Distributed heterogeneous systems have emerged as attractive platforms for high-performance computing. In systems such as computational grids or peer-to-peer systems, geographically distributed resources are connected through local and/or wide area networks and utilized in a coordinated manner, thereby fulfilling the computational needs of many complex applications. However, efficient utilization of such resources is a challenging task due to several specific features of such resources. Typically, the computers (network links) are installed with different hardware and software systems and operate at different speeds. Additionally, the available compute (communication) capabilities of the resources may vary during run-time due to the sharing of resources among multiple users. Consequently, the execution of the applications must be adaptive to such dynamic run-time performance variations. In this dissertation, we study the adaptive computation of a large set of independent tasks. By modeling computation as a special type of data flow, we show that the system throughput can be represented as the network flow in a corresponding graph. Using this flow representation, we have developed the Relaxed Incremental Push-Relabel (RIPR) algorithm to allocate the tasks. The RIPR algorithm is executed in a distributed fashion and is proven to maximize the system throughput with a polynomial number of local operations.
More importantly, the allocation of tasks is adapted on-line in this algorithm by responding to the run-time performance changes in the computers and network links. The algorithm is then approximated as a distributed protocol to coordinate the computers.

Resource heterogeneities and dynamic run-time performance variations are not specific to distributed high-performance computing systems. Other systems, especially the emerging networked sensor systems, share the same properties at the application layer. Such a similarity leads to the investigation of run-time adaptive executions of applications in networked sensor systems. In this dissertation, we also establish the connection between distributed computing systems and networked sensor systems. We show that several classes of networked sensor system applications (data gathering and in-network processing problems), either with or without energy constraints, reduce to the same flow problem as the task allocation problem for distributed computing systems. This leads to distributed and adaptive solutions for these applications by using the proposed RIPR algorithm. The performance of these solutions has been verified through simulations.

Another problem addressed in this dissertation is the maximization of the lifetime for data gathering in networked sensor systems. A strongly polynomial algorithm is developed to maximize the lifetime. We then study the performance of a distributed shortest path heuristic for the problem. This heuristic is based on self-stabilizing spanning tree construction and shortest path routing methods. Although the heuristic cannot guarantee optimality, simulations show that it has good average-case performance over randomly generated deployments of sensors. We also derive bounds for the worst-case performance of the heuristic.
Chapter 1

Introduction

With the vast development of computer hardware and networking technologies, high-performance computing platforms have evolved from Cray-style vector processors to massively parallel processing systems and to clusters of stand-alone computers. Such a trend has been witnessed by the Top-500 supercomputer list [63] since 1993. Despite the tremendous changes in the platforms, resource sharing and aggregation are the two properties that have been well kept and consistently enhanced.

Following the trend of resource sharing and aggregation, distributed heterogeneous systems have emerged as attractive platforms for high-performance computing. In such systems as computational grids or peer-to-peer systems, distributed resources such as workstations and supercomputers are connected through local and/or wide area networks. By utilizing these distributed resources in a coordinated manner, such systems can meet the computational demands of many complex applications. Research in computational grids is gaining momentum worldwide.

These distributed systems have quite a few important features. Typically, the software and hardware characteristics of the individual computers are different. This is largely due to the fact that the resources belong to different administrative domains, or the computers have been purchased over a relatively long time period. Besides the hardware, the computers may be installed with different operating systems, compilers, or kernel libraries. Such heterogeneities in the resources have brought various challenges to application design, which will be discussed in detail in Section 2.3.
Generally speaking, to achieve the optimal system performance, the application needs to maximize the efficiency of its software routines for each individual computer, as well as schedule the distributed computers to minimize the overall execution time.

Another important property of such distributed systems is the lack of a centralized controller. Although some studies have proposed to maintain a (global) resource directory, bookkeeping the status of the individual resources, such a directory only serves as an observer of the whole system. The distributed resources belong to different administrative domains, each having its own local users as well as its own local job submission and management policy. Consequently, when such resources are shared across the system, they inevitably exhibit dynamic load characteristics if we look at the compute power and communication bandwidth available to a specific application. This leads to another set of challenges to application design: run-time performance variations of the resources need to be considered when optimizing the system performance. This issue has been studied from various aspects. The most studied method is the migration framework (discussed in more detail in Section 2.2), which checkpoints, stops, moves, and restarts an application if a better set of resources is found for the application.

Despite these challenges, many applications in which large processing problems can easily be divided and solved independently have already been taking great advantage of distributed computing systems. These include Monte Carlo simulations and parameter sweep applications such as the study of neuromuscular transmitter release, the modeling of photochemical pollution, high energy physics, fluid dynamics, etc. Internet-based computing applications fall into the same category, and are perhaps even better known.
For example, the SETI@home project harnesses the computing power of over 500,000 PCs that are connected to the Internet. Other similar projects include Folding@home, drug design optimization, human protein folding, etc.

This dissertation studies the computation of a large set of independent tasks on distributed heterogeneous computing systems, which models the computation paradigm of the above applications that are divided into subproblems and then solved independently. The optimization of such a computation paradigm, in its mathematical form, is similar to the scheduling of independent tasks on heterogeneous platforms. Various studies have addressed such a scheduling problem with the objective of minimizing the make-span (the overall execution time of all the tasks). This scheduling problem, in its general form, is known to be NP-complete. Hence the focus of research along this direction is to design efficient scheduling heuristics. The solution technique presented in this dissertation differs from the heuristic-based methods in two aspects. First of all, the objective is to maximize the system throughput rather than to minimize the make-span. Because the system may not work at full speed when the application starts or gets close to completion, maximizing the throughput is not strictly equivalent to minimizing the make-span. However, if there is a large set of tasks to compute (which is true for almost all the applications that seek help from distributed computing platforms), or when the application is of streaming style (the application virtually never ends, as with SETI@home), then throughput becomes a meaningful, and sometimes the only feasible, performance metric.
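The dissertation's central idea is that steady-state throughput of this kind can be cast as a maximum network flow. As a generic illustration of that reduction (this is the textbook Edmonds-Karp method, not the dissertation's RIPR algorithm, and the graph, node names, and capacities are hypothetical), the sketch below computes the maximum flow of a source dispatching tasks to two workers:

```python
from collections import deque

def max_flow(capacity, s, t):
    """Edmonds-Karp: repeatedly augment along shortest residual paths.
    `capacity` maps u -> {v: residual capacity}; it is mutated in place."""
    flow = 0
    while True:
        # BFS for an augmenting s-t path in the residual graph.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, cap in capacity.get(u, {}).items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow  # no augmenting path left: flow is maximum
        # Collect the path edges and the bottleneck capacity.
        edges, v = [], t
        while parent[v] is not None:
            edges.append((parent[v], v))
            v = parent[v]
        delta = min(capacity[u][v] for u, v in edges)
        for u, v in edges:
            capacity[u][v] -= delta
            capacity.setdefault(v, {})
            capacity[v][u] = capacity[v].get(u, 0) + delta
        flow += delta

# Hypothetical example: source S feeds workers W1 and W2, which forward
# finished tasks to sink T; the achievable throughput is the max flow.
graph = {'S': {'W1': 10, 'W2': 15}, 'W1': {'T': 20}, 'W2': {'T': 15}}
print(max_flow(graph, 'S', 'T'))  # 25
```

The point of the flow view is that the bottleneck (here, the 10+15 capacity out of S) bounds sustainable throughput regardless of how tasks are scheduled in time.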
The second difference, compared with the heuristic-based methods whose performance is often verified through experiments, is that the proposed solution technique is distributed and adaptive. Each node makes local decisions and takes actions based only on its own information and the information from its direct neighbors. Additionally, the proposed solution technique is able to adapt to changes in the computation and communication capabilities of the resources.

By modeling the computation at the nodes as a special type of data flow, the throughput maximization problem is reduced to a flow optimization problem, which is then utilized to develop a distributed algorithm to coordinate the nodes. In this algorithm, every node determines its own activities (send, receive, and compute) based on the information about itself and its neighbors. Such a distributed cooperation among the nodes is proven to maximize the system throughput. Additionally, the algorithm is able to adapt to changes in the computation and communication capabilities of the nodes. The optimality and adaptivity of the algorithm are verified through extensive simulations on both uniformly distributed and power-law distributed systems. The results show that the proposed algorithm, when approximated as a distributed protocol, achieves close to optimal system throughput and outperforms the first-come-first-serve greedy heuristic. The start-up time and adaptation time of the distributed protocol are experimentally shown to be of the same order as the system diameter.

Resource heterogeneity and run-time performance changes are not unique properties of distributed computing systems. Other systems, especially the emerging networked sensor systems, share the same properties at the application level.
Recent advances in micro-electro-mechanical systems technology and wireless communications have enabled the development of low-cost and low-power-consuming sensor nodes. Compared with traditional sensors, whose only responsibility is to collect information about the phenomenon and send the information to a central processor/station for further processing, a networked sensor system typically consists of a large number of sensors, each of which is able to sense and process data, and communicate with other sensors. The sensors can perform some computations and transmit only required and partially processed data. The communication capability allows such data to be relayed through the network, and eventually routed to a powerful base station for further processing and decision making. The placement of sensors does not need to be carefully engineered, which allows random deployment on possibly harsh or hazardous terrains. These features make networked sensor systems suitable for a wide range of applications such as environmental monitoring [65], intrusion detection [15], target tracking and identification [70], etc.

From a networking point of view, a networked sensor system is very similar to an ad hoc network. However, networked sensor systems have some significantly different features. Typically, sensor nodes are powered by batteries. Due to the large scale and possibly harsh working terrains of such networks, replenishing energy by replacing batteries for the sensor nodes is infeasible. Various techniques have been proposed to improve the energy efficiency of WSNs. For example, the voltage scaling technique explores the trade-off between supply voltage and processing power. The rate adaptation technique explores the trade-off between radio transmission power and energy consumption.
In general, all such techniques achieve energy efficiency at the cost of reduced capabilities of the sensor nodes. More likely, these techniques will be applied on the fly while the nodes are sensing, processing, or communicating. This has two implications. First, the nodes may have different amounts of power supply and, accordingly, different capabilities. Second, this leads to run-time performance changes of the nodes. Additionally, sensor nodes may fail during run time, which leads to frequently changing network topologies.

A fundamental operation of networked sensor systems is to sense the environment and eventually transmit the sensed (or processed) data to the base station. In this dissertation, two classes of sensing and processing problems are solved.

The first class of problems considers data gathering, which includes store-and-gather problems, where data are locally stored at the sensors before the data gathering starts, and continuous sensing and gathering problems, which model time-critical applications. The focus is to maximize the throughput or volume of data received by the base station. By modeling the energy consumption associated with each send and receive operation, we formulate the data gathering problems as a constrained network flow optimization problem where each node u is associated with a capacity constraint w_u, so that the total amount of flow going through u (incoming plus outgoing flow) does not exceed w_u. This constrained flow problem in turn reduces to a standard network flow problem, which leads to an adaptive and distributed solution to the data gathering problems, using the algorithm developed for task allocation in distributed computing systems.

The second class of problems considers the processing of the data collected by the sensors.
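The reduction of vertex capacities to edge capacities mentioned above is usually done with the classical node-splitting construction. A minimal sketch follows; the function and node names are illustrative, not from the dissertation. Note one caveat: the dissertation's constraint bounds incoming plus outgoing flow at a node, so for a pure relay node the split edge would carry capacity w_u/2; the generic construction below bounds the flow passing through the node.

```python
def split_nodes(edges, node_cap):
    """Classical node-splitting: turn vertex capacities into edge capacities.

    edges:    list of (u, v, cap) directed edges with edge capacities
    node_cap: dict u -> w_u, the capacity of node u
    Each node u becomes (u, 'in') -> (u, 'out') with capacity w_u, and
    each original edge (u, v) becomes (u, 'out') -> (v, 'in'), so any
    flow through u must cross the capacity-w_u internal edge.
    """
    internal = [((u, 'in'), (u, 'out'), w) for u, w in node_cap.items()]
    external = [((u, 'out'), (v, 'in'), c) for u, v, c in edges]
    return internal + external

# Hypothetical relay chain s -> u -> t where node u can handle at most
# 5 units: the links allow 8, but the (u,in)->(u,out) edge caps flow at 5.
for e in split_nodes([('s', 'u', 8), ('u', 't', 8)],
                     {'s': 100, 'u': 5, 't': 100}):
    print(e)
```

Running any standard max-flow algorithm on the transformed graph then respects the node budgets automatically.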
The trade-offs between communication and computation energy [51, 60] have shown that in-network processing of the sensed data is more energy efficient than transferring the raw data to a powerful base station for processing. In-network processing leads to a prolonged lifetime of the system, which is one of the most critical factors in the design and deployment of networked sensor systems. The focus in this dissertation is to study the performance of those applications that require in-network processing of raw data blocks. By modeling the processing of data blocks as a special type of data flow, we reduce the in-network processing problem to a network flow optimization problem, which again leads to an adaptive and distributed solution using the algorithm developed for computing systems.

A third data gathering problem is also studied in this dissertation. The system is assumed to operate in rounds, where a subset of the sensors generate a certain number of data packets during each round. All the data packets need to be transferred to the base station. The goal is to maximize the system lifetime in terms of the number of rounds the system can operate. We show that the above problem reduces to a restricted flow problem with a quota constraint, a flow conservation requirement, and an edge capacity constraint. We further develop a strongly polynomial time algorithm for this problem, which is guaranteed to find an optimal solution. We then study the performance of a distributed shortest path heuristic for the problem. This heuristic is based on self-stabilizing spanning tree construction and shortest path routing methods. In this heuristic, every node determines its sensing activities and data transfers based on locally available information. No global synchronization is needed.
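A centralized caricature of this round-based shortest-path operation can be sketched as follows. The dissertation's heuristic is distributed and built on self-stabilizing spanning trees; here the topology, node names, unit per-hop energy cost, and the rule that the system ends when a required path hits an exhausted node are all simplifying assumptions for illustration:

```python
from collections import deque

def bfs_path(adj, src, dst):
    """Hop-count shortest path from src to dst (assumes dst is reachable)."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            path = []
            while u is not None:
                path.append(u)
                u = parent[u]
            return path[::-1]
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                queue.append(v)
    raise ValueError("sink unreachable")

def lifetime(adj, sources, sink, energy, cost=1):
    """Rounds completed before some node on a required path runs out of energy.
    Every node on a packet's path except the sink pays `cost` per packet."""
    rounds = 0
    while True:
        for s in sources:
            path = bfs_path(adj, s, sink)
            if any(energy[u] < cost for u in path[:-1]):
                return rounds
            for u in path[:-1]:
                energy[u] -= cost
        rounds += 1

# Hypothetical line network a -- b -- base; both a and b sense one
# packet per round, so relay b drains twice as fast as a.
adj = {'a': ['b'], 'b': ['a', 'base'], 'base': ['b']}
print(lifetime(adj, ['a', 'b'], 'base', {'a': 4, 'b': 6, 'base': 0}))  # 3
```

The example already hints at why shortest-path routing alone is not optimal: it keeps loading the same relay node even when longer, less-used paths could extend the lifetime.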
Although the heuristic cannot guarantee optimality, simulations show that it has good average-case performance over randomly generated deployments of sensors. We also derive bounds for the worst-case performance of the heuristic.

The rest of the dissertation is arranged as follows. Some related work and background knowledge are reviewed in Chapter 2. Chapter 3 discusses the task allocation problem in distributed computing systems and presents a distributed and adaptive solution. Chapters 4 and 5 address data gathering and in-network processing problems in networked sensor systems. Chapter 6 presents a strongly polynomial algorithm to maximize the lifetime for data gathering in networked sensor systems. Concluding remarks are presented in Chapter 7.

Chapter 2

Related Work

Related work is reviewed in this chapter. In Section 2.1, two classes of dynamic distributed systems are briefly reviewed: the emerging computational grid and peer-to-peer systems, and the networked sensor systems. In this dissertation, we study task allocation in the former and data gathering in the latter. These two classes of seemingly different systems share some intrinsic characteristics in that both are dynamic and distributed. Various techniques have been proposed to address adaptivity in distributed computing systems; these are reviewed in Section 2.2. General scheduling techniques for distributed heterogeneous systems are discussed in Section 2.3. A brief survey of networked sensor systems and energy-efficient data gathering and processing is provided in Section 2.1.2.
2.1 Dynamic Distributed Systems

The computational grid and peer-to-peer systems are briefly reviewed in Section 2.1.1, with emphasis on the properties and major enabling techniques of these newly emerging systems. In Section 2.1.2, we discuss current developments in networked sensor systems, with emphasis on the energy efficiency of such systems.

2.1.1 Computational Grid and Peer-to-peer Systems

Grids integrate networking, communication, computation and information to provide a virtual platform for computation and data management, in the same way that the Internet integrates resources to form a virtual platform for information [21]. The Grid infrastructure facilitates the unification of geographically distributed resources (usually under different administrative domains) to support the execution of many large-scale data-intensive or computation-intensive applications.

To enable the utilization of the resources in such complicated systems, tremendous research efforts have been dedicated to the construction of software systems that can provide secure and user-friendly access, resource allocation and arbitration, and coordination among various administrative domains. A layered model of the Grid has been widely accepted by the Grid community after more than a decade of research.

The bottom layer of the model consists of the hardware resources that form the Grid physically. The resources include computers, networks, data storage, instruments, visualization devices, etc. With the vast development of networking and computation techniques, the capabilities of the resources have been growing at an accelerated speed. For example, the current Grid testbed in the US utilizes the Internet2 Abilene network, which has a backbone performance of about 10 Gbps, and the speed is expected to continue increasing.
Moreover, many individual nodes in a Grid are themselves high-performance parallel computers. Resources in Grids are intrinsically distributed, heterogeneous and dynamic. They may have different software and hardware characteristics (in terms of operating systems, compilers, processor architectures, networking technologies, etc.). Due to their heterogeneity, the resources inevitably operate at different speeds (in terms of processing power measured in MFlops, or communication capability measured in bps and end-to-end latency). Additionally, the availability and capability of the resources are highly dynamic, since (1) new resources may be added to the resource pool, (2) current resources may be preempted or removed from the pool, and (3) the resources are shared in a multi-user environment.

The second layer is the infrastructure that provides a uniform view of (and access to) the heterogeneous resources. The core operations at this layer are to create, manage, discover, and destroy Grid Services, which precisely define the interfaces between applications, databases, resources, and any other Grid artifacts. While Grid Services enable interoperability for resource sharing, other components at this layer provide security, data management, resource management, and directory services (bookkeeping information about the available resources on the Grid and their status). Research efforts such as the NSF Middleware Initiative [48], OGSA [49], and the Globus toolkit have been addressing the issues at this layer.

The next layer consists of software packages on top of the infrastructure layer. This is the middleware that abstracts away some of the complex operations and hand-shaking protocols (e.g. authentication, accounting, file transfer) in the infrastructure.
This middleware layer enables end users and applications to efficiently program and utilize the full-fledged infrastructure layer. Key components at this layer include kernel libraries, scheduling software, etc.

The top-most layer consists of the applications, which treat the Grid as a single, powerful system. The actual execution of the applications may be highly dynamic: various resources may be used at different stages of the execution. However, these internal activities are transparent to the end users.

Although such a layered model has not yet been standardized, an increasing number of Grid systems have already been deployed for testing purposes as well as for many newly emerging applications. For example, the TeraGrid project builds a comprehensive distributed infrastructure for open scientific research, which has been providing 20 teraflops of computing power distributed at multiple sites and connected through a 40 gigabits per second network. Other Grid systems have been built for data analysis in high energy and nuclear physics experiments, earthquake engineering, and other sciences. As scientific computation in these research areas is continuously driven by improvements in data collection techniques, there will continue to be a need for ever more powerful computing systems. This is exactly where Grid technology fits in.

2.1.2 Networked Sensor Systems

Networked sensor system technology is enabled by recent advances in micro-electro-mechanical systems and wireless communications. It provides many advantages over traditional sensing technologies, which either perform remote sensing or require precise deployment of the sensors. In such a system, a large number of low-cost, low-power sensors are deployed in the area of interest.
The sensors not only perform sensing, but also cooperatively transfer the raw or processed data to a powerful base station for further processing. This unique operation mode has enabled a wide range of applications, from health care and environment monitoring to military and other commercial applications. As the processing and communication capabilities of the sensors continue to improve, implementing resource-intensive applications such as audio/video streaming has already started to receive attention.

Networked sensor systems have many unique features, at both the system and the node levels. One of the most important features of the sensors is the limited energy supply, as the sensor nodes are usually powered by irreplaceable batteries. Compared with sensing and computation, communication is much more expensive in terms of energy consumption [2]. Generally, data transfers are performed via multi-hop communications where each hop is a short-range communication. This is due to the well-known fact that long-distance wireless communication is expensive in terms of both implementation complexity and energy dissipation, especially when using the low-lying antennae and near-ground channels typically found in networked sensor systems. Short-range communication also enables efficient spatial frequency re-use.

Various hardware design techniques have been proposed to give the end user the opportunity to prolong the system lifetime at the cost of lower throughput or higher transmission delay. For example, the voltage scaling technique explores the trade-off between the supply voltage and the processing capability. The rate adaptation technique explores the trade-off between radio transmission power and energy consumption.
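The preference for multi-hop, short-range communication can be illustrated with the first-order radio model commonly used in the sensor network literature, in which transmission cost grows with distance raised to the path-loss exponent. The sketch below is illustrative only: the constants and function names are assumptions, not values from this dissertation.

```python
# A minimal sketch of the first-order radio energy model:
# transmitting k bits over distance d costs E_elec*k + eps_amp*k*d**ALPHA,
# and receiving k bits costs E_elec*k. The constants below are
# illustrative assumptions, not measured values.

E_ELEC = 50e-9      # J/bit: electronics energy per bit (assumed)
EPS_AMP = 100e-12   # J/bit/m^ALPHA: amplifier energy (assumed)
ALPHA = 2           # path-loss exponent (2 for free-space propagation)

def tx_energy(k_bits, d):
    """Energy to transmit k_bits over distance d."""
    return E_ELEC * k_bits + EPS_AMP * k_bits * d**ALPHA

def rx_energy(k_bits):
    """Energy to receive k_bits."""
    return E_ELEC * k_bits

def direct(k_bits, d):
    """One long-range hop to the destination."""
    return tx_energy(k_bits, d)

def multihop(k_bits, d, hops):
    """`hops` equal short hops; each intermediate node also receives."""
    per_hop = d / hops
    return hops * tx_energy(k_bits, per_hop) + (hops - 1) * rx_energy(k_bits)
```

With these assumed constants, relaying over several short hops costs less than one long hop when the distance is large, while for short distances the per-hop electronics overhead makes the direct transmission cheaper, which is consistent with the trade-offs discussed above.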
Other research efforts have proposed switching the nodes to a power-saving standby mode when no activity is needed, and waking them up only when sensing or data transfer is required. Due to the application of such energy-saving techniques, the available computation and communication capabilities of the nodes may be continuously adjusted during the lifetime of the system.

Besides these hardware techniques, energy efficiency can be achieved at the MAC (media access control), network and application layers, or through a combined optimization across multiple layers.

For example, Sensor-MAC is a MAC protocol specifically designed for networked sensor systems with energy efficiency as the major concern. While other MAC protocols for wireless communications focus on collision avoidance, Sensor-MAC reduces energy consumption at the cost of reduced per-hop fairness and possibly larger per-hop latency. Sensor-MAC reduces the listen time of the nodes by letting them go into a periodic sleep mode. Neighboring nodes are synchronized so that they go to sleep and wake up at the same time. Sensor-MAC uses carrier sense and RTS/CTS exchanges to avoid collisions. Each transmitted data packet has a duration field indicating how much time the transmission will take. If a node receives a packet destined for another node, it remains silent (according to the duration field) until the transmission completes. Sensor-MAC also reduces the time period during which a node listens to transmissions not destined for itself. This is achieved by letting interfering nodes go to sleep after they hear an RTS or CTS packet.

At the network layer, the key player is the energy-efficient routing protocol. Extensive research has been conducted in this area. An illustrative protocol is LEACH (Low Energy Adaptive Clustering Hierarchy) [30].
LEACH is a clustering-based communication protocol. It uses localized coordination to maintain the cluster structure and randomly rotates the cluster head within each cluster to evenly distribute the energy load on the nodes. It also applies data fusion at the cluster heads to reduce the amount of data that needs to be transmitted to the base station. Another example is the energy-aware protocol proposed in [56]. This protocol maintains a set of paths instead of a single optimal path. These paths are selected based on a probability that depends on the energy consumption of each path. By choosing different paths for routing, the energy of any single path does not deplete quickly.

An important operation in most applications of networked sensor systems is to sense the environment and transmit the sensed or processed data to a base station. A number of techniques have been proposed to improve energy efficiency for such a data gathering operation. In [35], data gathering is assumed to be performed in rounds, and each sensor can communicate (in a single hop) with the base station and all other sensors. The total number of rounds is then maximized under a given energy constraint on the sensors. In [50], a non-linear programming formulation is proposed to explore the trade-offs between energy consumption and transmission rate. It models the radio transmission energy according to Shannon's theorem. In [54], the data gathering problem is formulated as a linear programming problem and a (1 + ε)-approximation algorithm is proposed. This algorithm further leads to a distributed heuristic. Other proposed applications may have different requirements. For example, the balanced data transfer problem [24] is formulated as a linear programming problem where a 'minimum achieved sense rate' is set for every individual node.
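The probabilistic multi-path selection idea attributed to [56] above can be sketched as follows. The inverse-cost weighting used here is an illustrative choice, not necessarily the exact rule from that work, and the function name is mine.

```python
import random

def select_path(paths, costs, rng=random):
    """Pick a path with probability inversely proportional to its energy
    cost, so that cheap paths are favored but no single path is used
    exclusively and its nodes do not deplete first. `paths` and `costs`
    are parallel lists; `rng` can be seeded for reproducibility."""
    weights = [1.0 / c for c in costs]
    total = sum(weights)
    r = rng.uniform(0, total)       # roulette-wheel selection
    acc = 0.0
    for path, w in zip(paths, weights):
        acc += w
        if r <= acc:
            return path
    return paths[-1]                # guard against floating-point drift
```

For two candidate paths with costs 1 and 4, this scheme would route roughly 80% of the traffic over the cheaper path while still exercising the other, which matches the stated goal of spreading energy dissipation across paths.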
In [23], data gathering is considered in the context of energy balance. A distributed protocol is designed to ensure that the average energy dissipation per node is the same throughout the execution of the protocol.

2.2 Adaptivity in Distributed Systems

The adaptivity of software packages (in terms of performance) in distributed systems has been an active research area. There are two major issues under this general topic. The first is to optimize software routines for each individual computation platform; this objective can be classified as static adaptation. The second is to optimize the performance of the whole system in the face of the dynamic load characteristics of the resources.

There have been a number of efforts in designing and developing adaptive software packages. These packages differ in their functionality, target platforms, and in when adaptivity is performed. Some examples are listed below.

ATLAS (Automatically Tuned Linear Algebra Software) improves the performance of dense linear algebra kernels over the well-known BLAS (Basic Linear Algebra Subroutine) packages. ATLAS captures the hardware characteristics of the platform at installation time by collecting such information as the number of floating point units in the processor, the number of pipeline stages, and the cache size. It then chooses the block size, the instruction set, the loop structures, etc., to exploit instruction-level parallelism (ILP) as well as cache locality, all of which is performed automatically during the installation of ATLAS. Optimal or close-to-optimal performance (comparable to or exceeding vendor-provided linear algebra packages) has been observed across most known computer architectures.

SPIRAL (Signal Processing Implementation Research for Adaptable Libraries) is a code generation system for DSP (Digital Signal Processing) transforms.
Due to their recursive nature, many DSP transforms admit multiple algorithms and implementations. SPIRAL uses a formal framework to generate many alternative algorithms for a given transform and translate them to code. SPIRAL then searches over and prunes these alternative implementations to find the best one for the target platform. Similar to ATLAS, SPIRAL also exploits ILP and cache locality and achieves optimal or close-to-optimal performance across various computer architectures.

GrADS (Grid Application Development Software) is an ongoing research project to automate the execution of applications in the Grid environment. In the GrADS framework, a program is augmented to include not only the code, but also the strategy for mapping the program onto distributed computing resources and an estimation model to predict the performance of alternative mappings. The Application Manager initiates the resource selection, then launches and monitors the execution of the program. In case the observed performance falls below some pre-set threshold, the contract monitor invokes the rescheduler, which carries out appropriate actions such as replacing the resources, redistributing the tasks, or selecting an alternative program/resource mapping. The goal of GrADS is to provide good resource allocation for the applications and to support adaptive reallocation if performance degrades because of changes in the availability of Grid resources.

Several other studies have proposed to improve the efficiency of rescheduling by considering not only the potential improvement from rescheduling, but also the cost of rescheduling itself. Nevertheless, these approaches perform adaptivity during the execution of the applications.
Their effectiveness depends on correct prediction of resource capabilities and estimation of the execution time.

2.3 Task Allocation and Scheduling in Distributed Systems

Given a set of distributed computers interconnected with network links that may operate at different speeds, the matching of tasks onto computers and the scheduling of the execution order of the tasks have a huge impact on the execution time. The minimization of the overall execution time of all tasks, namely the make-span, is known to be NP-complete in its general form. Hence developing efficient heuristics has become the major concern. For example, when tasks are independent, the Opportunistic Load Balancing heuristic assigns each task to the next computer that is expected to be available, with the expectation of keeping all the computers busy. The Minimum Execution Time heuristic assigns each task (in an arbitrary order) to the computer with the minimum execution time. The Minimum Completion Time heuristic assigns each task (in an arbitrary order) to the computer with the minimum expected completion time for the task.

Other heuristics are more elaborate. For example, the Min-min heuristic searches all the unassigned tasks, picks the task that has the minimum completion time, and assigns that task to the corresponding computer. This effectively removes one task from the set of unassigned tasks. The procedure is repeated until all the tasks are assigned. The Max-min heuristic is similar to the Min-min heuristic, except that the task with the maximum completion time is selected each time. Other heuristics use search techniques such as genetic algorithms, simulated annealing, etc.
When tasks have inter-dependencies, the applications are usually represented as a DAG (directed acyclic graph) where the nodes represent the tasks and the links represent the dependencies among the tasks. The allocation and scheduling of task DAGs onto distributed systems have also been extensively studied. Due to the intrinsic complexity of the problem, again, heuristics are the major concern. These heuristics differ in various aspects. For example, some heuristics assume a limited number of processors while others do not. Some studies assume a general communication model where contention may occur. For a detailed description of these heuristics, please refer to [14]. The performance of many heuristics has been benchmarked, using both randomly generated task graphs and graphs abstracted from actual applications. According to these benchmarking studies, heuristics utilizing the critical path (the longest path in the task DAG) are in general better than other heuristics. Yet, it should be pointed out that this conclusion is drawn from the benchmarking experiments and does not necessarily predict the performance of a particular application deployed in a particular system.

For computation in heterogeneous systems, maximizing the system throughput is not a new idea. A well-known example is the Condor project [62]. It develops a software infrastructure so that distributed systems under different ownership can be utilized in a uniform manner to provide high-throughput computation. The throughput maximization problem has also been studied from various algorithmic aspects. The work in [61] considers heterogeneous computing systems that are connected via a general graph topology. The application is of streaming style: a task continuously receives data from certain preceding tasks, processes the received data, and sends the processed data to some other tasks.
Finding the optimal mapping from tasks to computers (such that the throughput of data processing is maximized) is shown to be NP-complete, and a mapping heuristic is developed in [61]. A different scenario is considered in [8], where the system topology is also graph-structured but the application consists of a set of independent, identical problems, each of which in turn consists of a set of inter-dependent tasks. The result shows that maximizing the throughput (defined as the number of problems executed in one unit of time) for this general scenario is also NP-complete.

Although these general scenarios of throughput maximization are NP-complete, better complexity results and algorithms have been obtained for some specific (and possibly more practical) application settings and system topologies. Throughput maximization for single-level master-slave computation in a Grid has been studied in [57], where the compute resources are assumed to communicate with the root node only. When the application consists of a large set of equal-sized independent tasks, throughput maximization in general graph-structured systems is studied in [5]. The problem is uniformly formulated as a linear programming problem, which is well studied and for which many algorithms are available. However, these algorithms are centralized and not suitable for distributed execution. In [9], a bandwidth-centric method has been obtained for the computation of equal-sized independent tasks when the computers are connected via a tree topology, which further leads to a localized, autonomous task allocation strategy. When the system is connected via a general graph topology, the problem of extracting the spanning tree with the highest throughput among all possible spanning trees is studied in [6].
The result in [6] shows that the achievable throughput of the optimal spanning tree can, in the worst case, be arbitrarily bad compared with that of the original graph-structured system.

Similar to some of the previous studies, we also study throughput maximization for the computation of equal-sized independent tasks. Our study differs from the previous ones in that we develop a distributed and adaptive algorithm for systems with a graph-structured topology, which constitutes the major contribution of Chapter 3. In this dissertation, we consider the system model in which (1) a computer can send and receive data to/from multiple neighboring computers concurrently, and (2) computation and communication can be overlapped at the computers. This model, as well as several other more restrictive models, has been considered in previous studies such as [5]. In the more restrictive models, for example, a computer may not send and receive at the same time; a computer may not send/receive data to/from multiple computers concurrently; or computation and communication may not be overlapped. The study in [9] develops a unified distributed allocation algorithm for tree-structured systems under all these system models. We show that, for the model we consider, we can develop a distributed and adaptive task allocation algorithm for general graph-structured systems. Additionally, the model we consider represents the capabilities of a typical modern computer.

Chapter 3 Adaptive Allocation of Independent Tasks to Maximize Throughput

In this chapter, we consider the task allocation problem for computing a large set of equal-sized independent tasks on heterogeneous computing systems.
This problem represents the computation paradigm for a wide range of applications such as SETI@home and Monte Carlo simulations. We consider a general problem in which the interconnection between the nodes is modeled using a graph. We show that the maximization of system throughput reduces to a standard network flow problem. We then develop a decentralized adaptive algorithm that solves a relaxed form of the standard network flow problem and maximizes the system throughput. This algorithm leads to a simple decentralized protocol that coordinates the resources adaptively. Simulations have been conducted to verify the effectiveness of the proposed approach, for both uniformly and power-law distributed systems. Performance improvement over a first-come-first-serve greedy heuristic has been observed. The adaptivity of the proposed approach is also verified through simulations.

3.1 Introduction

With recent advances in networking and computation techniques, distributed heterogeneous computing systems have become attractive platforms for high performance computing. In such systems, distributed resources such as workstations and supercomputers are connected through local and/or wide area networks. By utilizing these distributed resources in a coordinated manner, a heterogeneous computing system can meet the computational demands of complex applications [27].

In this chapter, we consider the computation of a large set of equal-sized independent tasks in a heterogeneous computing system. In particular, we consider the scenario where each task processes a fixed amount of source data. The source data of all the tasks initially resides on a single node in the system, which we call the root node.
Other nodes in the system need to receive the source data of a task, either directly from the root node or indirectly through other compute nodes, before they can compute the task. This computation paradigm models a variety of research and commercial activities. Internet-based distributed computations are among the most well-known examples, including SETI@home [37], Folding@home [40], data encryption/decryption [20], etc. The paradigm also models various applications that are typically executed on tightly coupled systems (e.g. Monte Carlo simulations, computational phylogeny [46], etc.).

The computation paradigm reduces to the general problem of allocating or scheduling independent tasks in heterogeneous computing systems. One possible objective when optimizing system performance is to minimize the overall execution time (the make-span) of all the tasks. Although some specific scenarios can be solved in polynomial time, the make-span minimization problem is known to be NP-complete in its general form. In this chapter, we consider an alternative optimization objective: maximizing the system throughput. This objective may be less meaningful when there are only a few tasks to compute; in such a scenario, the application may have terminated before the system can reach its achievable throughput. However, if an application has a large number of tasks, then system throughput becomes a suitable optimization objective. For such applications, it is the achievable throughput that determines the maximum number of tasks that the system can execute during a given time period.

Because computation and communication resources in distributed heterogeneous computing systems are typically shared among multiple users and applications, the network performance and the effective compute power of each node may vary at run time.
This is particularly true in the case of Internet-based computation, peer-to-peer computation, and the Grid [33]. Optimizing the performance of the system based on a snapshot of the current system status may not lead to optimal performance. Consequently, the task allocation needs to adapt to changes in the system [10, 53].

We show that the throughput maximization problem can be efficiently solved in an adaptive fashion. We model the computation as a special type of data flow. This leads to our extended network flow (ENF) representation of the throughput maximization problem. Based on the ENF representation, we show that the system throughput can be transformed to the network flow in a corresponding graph. More importantly, for the network flow maximization problem, we develop a decentralized task allocation algorithm that adapts to changes in the system. This task allocation algorithm can then be implemented as a decentralized task allocation protocol that coordinates the compute nodes in the system. Simulations were conducted to verify the effectiveness of the ENF representation based task allocation approach. Simulation results show that the overhead of task allocation (transferring control messages among the compute nodes) is negligible when the number of tasks becomes large. We have also observed performance improvement over the heuristic that allocates tasks in a first-come-first-serve fashion.

3.2 System Model

The compute nodes are assumed to be connected via an arbitrary topology, and the system is represented by a directed graph G(V, E). Each node u ∈ V in the graph represents a compute node. u has weight w_u, representing the processing power of u, i.e. u can perform one unit of computation in 1/w_u time.
An edge (u, v) ∈ E in the graph represents a network link from u to v. Edge (u, v) has capacity c_uv, representing the communication bandwidth of (u, v), i.e. (u, v) can transfer one unit of data from u to v in 1/c_uv time. To model non-symmetric communication links, the edges are uni-directional, so G is directed and (u, v) ≠ (v, u). In the rest of this chapter, we use 'edge' and 'link' interchangeably. The set of successors of u in G is denoted σ_u = {w ∈ V | (u, w) ∈ E}, and the set of predecessors of u in G is denoted ψ_u = {w ∈ V | (w, u) ∈ E}.

Although the physical media in modern networking technologies may be simplex (such as most fiber optic communications, which use one strand to send data in each direction) or half-duplex (such as un-switched Ethernet, which allows at most one device to transmit at a time), full-duplex network interfaces are widely supported and implemented as the standard practice. Consequently, we only consider full-duplex network interfaces, which means that the compute nodes can send and receive data concurrently.

We also assume that the network interfaces can communicate with multiple adjacent nodes concurrently. This reflects modern networking techniques (e.g. packet switching) that support concurrent communications. However, the rate at which a network interface sends and receives data cannot increase indefinitely as the number of concurrent communications increases: the data transfer rate cannot exceed the hardware limitation of the network interface. Furthermore, there are typically a send buffer and a receive buffer associated with each network interface. Implemented in either hardware or software, the buffers are used to control the data flow rate as specified by the network protocol. To reflect this limitation, for each node u, we introduce two additional parameters: c_u^in and c_u^out.
These two parameters indicate the capability of u to receive and send data: within one unit of time, at most c_u^in (c_u^out) units of data can flow into (out of) u.

We further assume that the compute nodes can perform computation and communication concurrently. The overlapping of computation and communication is made possible by various techniques (e.g. direct memory access, multi-threading, etc.) and is supported by software libraries such as PVM and MPI. Performance improvement obtained by overlapping communication and computation has been observed in various studies such as [7]. Some researchers have also pointed out that a certain cost is associated with the overlapping of computation and communication [38], i.e. the computation capability of a computer may be reduced if the computer is involved intensively in communications. For the discussion in this chapter, we do not consider the cost of overlapping.

Without loss of generality, we assume that each task performs one unit of computation on one unit of source data. The tasks are independent of each other and do not share source data. A compute node can compute a task only after receiving the source data of the task. Initially, node s holds the source data for all the tasks; s is called the root node. Except for s, every compute node in the system needs to answer the same questions: (1) where to get tasks? (2) how many tasks to compute locally? (3) where to send the remaining tasks? The purpose of this study is to answer these three questions for all nodes in the system such that the system throughput is maximized. The throughput of a system is defined as the number of tasks computed by the system in one unit of time under steady-state conditions. For convenience, we say that a task is transferred from u to v when the source data of the task is transferred from u to v.
Let f(u, v) denote the number of tasks transferred from u to v in one unit of time. We have the following formal problem statement:

Base Problem: Given a directed graph G(V, E). Node u ∈ V has weight w_u > 0, input constraint c_u^in > 0, and output constraint c_u^out > 0. Edge (u, v) has capacity constraint c_uv > 0. s is the distinguished root node.

Maximize: w_s + Σ_{u ∈ V − {s}} ( Σ_{w ∈ ψ_u} f(w, u) − Σ_{w ∈ σ_u} f(u, w) )

Subject to:
1. 0 ≤ f(u, v) ≤ c_uv for (u, v) ∈ E
2. Σ_{w ∈ ψ_u} f(w, u) ≤ c_u^in for u ∈ V
3. Σ_{w ∈ σ_u} f(u, w) ≤ c_u^out for u ∈ V
4. 0 ≤ Σ_{w ∈ ψ_u} f(w, u) − Σ_{w ∈ σ_u} f(u, w) ≤ w_u for u ∈ V − {s}.

The following is a detailed explanation of the problem statement. In the optimization objective, Σ_{w ∈ ψ_u} f(w, u) − Σ_{w ∈ σ_u} f(u, w) is the net number of tasks received (and processed) by node u in one unit of time. Because s does not need to receive tasks from other nodes, the computation capability of s is just an additive factor in the optimization objective. The objective is therefore to maximize the total number of tasks processed by all the nodes in the system. Condition 1 reflects the capacity constraints of the edges. In Condition 2, Σ_{w ∈ ψ_u} f(w, u) is the total number of tasks transferred to u; Condition 2 means that no node can receive tasks at a rate higher than what is allowed by its network interface. Similarly, Condition 3 limits the rate at which a node can send out tasks. In Condition 4, Σ_{w ∈ ψ_u} f(w, u) − Σ_{w ∈ σ_u} f(u, w) is the net number of tasks that u keeps locally; Condition 4 means that any node (except s, which has the source data for all the tasks) cannot keep more tasks than it can compute, since otherwise the number of un-computed tasks on this node would increase monotonically as time advances.

The Base Problem has a linear programming formulation. Because we are considering steady-state throughput, the Base Problem only needs to be solved over the rationals.
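To make Conditions 1 through 4 concrete, consider the smallest nontrivial instance: a root s and a single worker a joined by one edge (s, a). The linear program then collapses to choosing a single flow value f(s, a) bounded by four constraints, so the optimum has a closed form. The sketch below is a direct specialization of the constraints, not a general LP solver, and the function name is mine.

```python
def two_node_throughput(w_s, w_a, c_sa, c_out_s, c_in_a):
    """Optimal Base Problem value for root s and one worker a joined by
    the single edge (s, a). The flow f(s, a) is bounded by the edge
    capacity (Condition 1), s's output constraint (Condition 3), a's
    input constraint (Condition 2), and a's compute power (Condition 4);
    s's own compute power w_s is an additive term in the objective."""
    return w_s + min(c_sa, c_out_s, c_in_a, w_a)
```

For instance, with w_s = 1, w_a = 2 and an edge of capacity 1 (interfaces unconstrained), the worker is starved by the link and the system throughput is 1 + 1 = 2 tasks per unit time.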
We are not looking for integer-valued solutions. Various algorithms have been proposed to solve linear programming problems. These include the Simplex algorithm, which has excellent average-case performance, and the interior point algorithms, which guarantee a polynomial execution time. However, these linear programming algorithms need to be executed by a central coordinator that knows all the parameters of the problem. Although parallel implementations of these algorithms do exist, the parallelism comes from the parallelization of the linear system solver, matrix inversion, matrix-vector multiplication, etc. The central coordinator still needs to know all the parameters of the problem before distributing the computations. A centralized algorithm is not desirable for the distributed computing system we consider. If an instance of the Base Problem has G as the input graph and s as the root node, we denote it as Base Problem (G, s).

3.3 Extended Network Flow Representation for the Task Allocation Problem

Base Problem (G, s) can be transformed to a network flow representation using the following procedure. (In the next section, a distributed and adaptive algorithm will be developed based on this representation.)

Graph Transformation:
1. Create an empty graph G'(V', E').
2. Insert a node t into V'.
3. For each node u in Base Problem (G, s),
(a) insert three nodes u1, u2, and u3 into V'; u1, u2, and u3 all have zero weight;
(b) insert edges (u2, u1) and (u1, u3) into E'; set the capacity of (u2, u1) to c_u^in and the capacity of (u1, u3) to c_u^out;
(c) insert edge (u1, t) into E' and set the capacity of (u1, t) to w_u.
4. For each edge (u, v) in Base Problem (G, s), insert edge (u3, v2) into E' and set the capacity of (u3, v2) to c_uv.
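As a concrete illustration, the transformation fits in a few lines of Python (the node encoding (u, 1), (u, 2), (u, 3), the sink label 't', and the function name are my own conventions, not the dissertation's):

```python
def transform(nodes, edges, w, c_in, c_out):
    """nodes: iterable of node ids; edges: dict (u, v) -> c_uv.
    Returns the capacity map of the new graph G' with hypothetical sink 't'."""
    cap = {}
    for u in nodes:
        u1, u2, u3 = (u, 1), (u, 2), (u, 3)   # processor, input, output
        cap[(u2, u1)] = c_in[u]               # receiving interface of u
        cap[(u1, u3)] = c_out[u]              # sending interface of u
        cap[(u1, 't')] = w[u]                 # task execution at the processor
    for (u, v), c in edges.items():
        cap[((u, 3), (v, 2))] = c             # original link u -> v
    return cap
```

Each original node contributes three edges, and each original edge contributes one, so a graph with n nodes and m edges yields 3n + m capacitated edges in G'.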
Figure 3.1: Transform a base problem to a network flow representation. Left: a base problem with three nodes. Right: the corresponding network flow representation.

The transformation results in a new graph G'. A hypothetical node t is first added to G'. Each node u in Base Problem (G, s) is then split into three nodes u1, u2, and u3, representing the processor, the input interface, and the output interface of u. Hypothetical edges from the u1's to t represent task executions at the processors. s1 is the root node in G'. The transformation procedure is illustrated in Figure 3.1. To simplify the figure, node weights and edge capacities are not marked. After transforming the graph, we have the following flow maximization problem:

Problem 1: Given a directed graph G(V, E) with root node s and sink node t, s ≠ t. Edge (u, v) has capacity c_uv > 0.

Maximize: Σ_{u∈σ_s} f(s,u) − Σ_{u∈ψ_s} f(u,s)

Subject to:
1. 0 ≤ f(u,v) ≤ c_uv for (u,v) ∈ E
2. Σ_{w∈ψ_u} f(w,u) = Σ_{w∈σ_u} f(u,w) for u ∈ V − {s, t}

Problem 1 is the well-studied network flow problem. The objective is to maximize the amount of flow out of the root node without violating the edge capacity constraints. Meanwhile, all the nodes except root s and sink t must have the same amount of incoming and outgoing flow. Similarly, if an instance of Problem 1 has G as the input graph, s as the root node, and t as the sink node, we denote it as Problem 1 (G, s, t). We further use T_B(G, s) to denote the maximum throughput for Base Problem instance (G, s), and T_1(G, s, t) to denote the maximum flow for Problem 1 instance (G, s, t). The next theorem shows that the Base Problem is a special case of Problem 1 after applying the graph transformation.

Theorem 1.
Suppose Base Problem (G, s) is converted to Problem 1 (G', s1, t) by applying the above graph transformation; then

T_B(G, s) = T_1(G', s1, t)

Proof: We use the notation of the graph transformation procedure to denote the nodes and edges in G and the corresponding nodes and edges in G'. Suppose f(u,v), (u,v) ∈ E, is a feasible solution for Base Problem (G, s). We map it to a feasible solution f'(u',v'), (u',v') ∈ E', for Problem 1 (G', s1, t) as follows:
1. f'(u3, v2) ← f(u, v)
2. f'(u2, u1) ← Σ_{w∈ψ_u} f(w, u)
3. f'(u1, u3) ← Σ_{w∈σ_u} f(u, w)
4. f'(u1, t) ← f'(u2, u1) − f'(u1, u3)

It is easy to verify that such an f' is a feasible solution for Problem 1 (G', s1, t) and that f' results in the same throughput as f. Conversely, suppose f'(u',v'), (u',v') ∈ E', is a feasible solution for Problem 1 (G', s1, t). We map it to a feasible solution f(u,v), (u,v) ∈ E, for Base Problem (G, s) as follows:

f(u, v) = f'(u3, v2)

It is also easy to verify that such an f is a feasible solution for Base Problem (G, s) and that it has the same throughput as f'. □

Problem 1 is the well-studied network flow problem. Several algorithms [17] can be used to solve it, e.g. the Edmonds-Karp algorithm, which has O(|V|·|E|²) complexity; the Push-Relabel algorithm, which has O(|V|²·|E|) complexity; and the Relabel-to-Front algorithm, which has O(|V|³) complexity. However, in terms of decentralization and adaptivity, these well-known flow maximization algorithms are not suitable for distributed computing environments. Both the Edmonds-Karp and the Relabel-to-Front algorithms are centralized. The Push-Relabel algorithm has a decentralized implementation where every node only needs to exchange messages with its immediate neighbors and makes decisions locally.
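For reference, a minimal sequential Edmonds-Karp sketch (BFS over shortest augmenting paths) is shown below. This is my own illustrative code for the classical centralized algorithm discussed here, not the algorithm developed later in this chapter:

```python
from collections import deque

def max_flow(cap, s, t):
    """cap: dict (u, v) -> capacity. Returns the maximum s-t flow value."""
    residual = dict(cap)                       # residual capacities
    adj = {}
    for u, v in cap:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
        residual.setdefault((v, u), 0)         # reverse edges start at zero
    total = 0
    while True:
        parent = {s: None}                     # BFS for a shortest augmenting path
        q = deque([s])
        while q and t not in parent:
            u = q.popleft()
            for v in adj.get(u, ()):
                if v not in parent and residual[(u, v)] > 0:
                    parent[v] = u
                    q.append(v)
        if t not in parent:
            return total                       # no augmenting path remains
        path, v = [], t                        # recover the path s -> t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        d = min(residual[e] for e in path)     # bottleneck capacity
        for u, v in path:                      # augment along the path
            residual[(u, v)] -= d
            residual[(v, u)] += d
        total += d
```

On a small graph with edges s→a (3), s→b (2), a→b (1), a→t (2), b→t (3), the maximum flow is 5, matching the capacity of the cut around s.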
But in order to adapt to changes in the system, this algorithm has to be re-initialized and re-run from scratch each time some parameters (edge capacities) of the flow maximization problem change. For each of the re-runs, none of the nodes can start executing the push and relabel operations until all the nodes have finished the initialization process (setting h(u) to 0, etc.). In this case, there has to be a global controller that monitors the initialization of all the nodes and gives the nodes an 'ok-to-start' signal. This again compromises the desired property of decentralization.

3.4 Decentralized Adaptive Task Allocation Algorithm

In this section, we first show that the maximum flow remains the same even if Condition 2 in Problem 1 is relaxed. We then develop a distributed and adaptive algorithm for the relaxed problem.

3.4.1 Relaxed Flow Maximization Problem

Consider the example in Figure 3.2. The flow and capacity of each edge are marked in the form "flow/capacity". Given the capacities of the edges, the maximum achievable flow is 18. In this example, node e has 12 units of incoming flow and 15 units of outgoing flow. Such a flow is not a feasible solution to Problem 1 since Condition 2 is violated at node e. Suppose the nodes form an actual system and 12 tasks have reached e; then e can send out no more than 12 tasks even if it is allowed to send out 15 tasks. The above observation can be generalized as follows: when a system is actually deployed, the number of tasks that a node can send out is limited not only by the capacities of the edges, but also by the number of tasks that have been received.

Figure 3.2: An example of the relaxed flow maximization problem
Intuitively, 'allowing' a node to send out more tasks than what it has received will not affect the system throughput adversely. This leads to the following relaxed network flow problem:

Relaxed Flow Problem: Given a directed graph G(V, E), source node s ∈ V, and sink node t ∈ V. Edge (u, v) has capacity c_uv > 0.

Maximize: Σ_{v∈V} f(s,v)

Subject to:
1. f(u,v) ≤ c_uv for u ∈ V, v ∈ V
2. f(v,u) = −f(u,v) for u, v ∈ V
3. Σ_{v∈V} f(v,u) ≥ 0 for u ∈ V − {s, t}

In the problem statement, we have adopted the widely used notation for network flow problems: when the actual data transfer is from u to v, we define f(v,u) = −f(u,v). Additionally, when edge (u,v) ∉ E, we define c_uv = 0 and still enforce the capacity constraint on (u,v). In this way, we can define f(u,v) over V × V, rather than being restricted to E. If neither (u,v) nor (v,u) belongs to E, then c_uv = c_vu = 0, which implies that f(u,v) = f(v,u) = 0 after the capacity constraints are enforced. f(v,u) = −f(u,v) also allows us to compute the total amount of flow into u as Σ_{w∈V} f(w,u) (which is equal to Σ_{w∈ψ_u∪σ_u} f(w,u)). Note that this expanded definition of f(u,v) is for notational convenience only; it does not change the essence of the flow problem. The Relaxed Flow Problem differs from Problem 1 in that the total flow out of a node can be equal to or larger than the total flow into the node (Condition 3). The objective is to maximize the total amount of flow out of root s. A feasible function f for the Relaxed Flow Problem is called a relaxed flow in graph G. We use T_R(G, s, t) to denote the maximum throughput that flows out of node s in a relaxed flow problem with graph G, root s, and sink t. The following theorem shows the relation between the Relaxed Flow Problem and Problem 1.

Theorem 2. Given graph G(V, E), source s and sink t.
If f* is an optimal solution to the Relaxed Flow Problem, then there exists an optimal solution f to Problem 1 such that 0 ≤ f(u,v) ≤ f*(u,v) for each f*(u,v) ≥ 0. Additionally, T_R(G, s, t) = T_1(G, s, t).

The proof of the theorem is not difficult and is omitted here due to space limitations.

3.4.2 The Algorithm

The algorithm is an augmentation of the Push-Relabel algorithm and is denoted as the Relaxed Incremental Push-Relabel (RIPR) algorithm. To explain the RIPR algorithm we need two additional notations. For each node u ∈ V, e(u) is defined as e(u) = Σ_{w∈V} f(w,u), which is the total amount of flow into node u. An integer-valued auxiliary function h(u) is also defined for u ∈ V, which will be explained in the algorithm. The algorithm is described below:

1. Initialization phase: h(u) and f(u,v) are initialized as follows:
h(u) ← 0 for u ∈ V − {s}
h(s) ← |V|
f(u,v) ← 0 for u ≠ s and v ≠ s
f(s,u) ← c_su for u ∈ V
f(u,s) ← −f(s,u) for u ∈ V
e(u) ← Σ_{w∈V} f(w,u) for u ∈ V

2. The initialization phase is executed only once (when the algorithm starts). After all the nodes complete the initialization phase, every node u in the system except s and t executes the following two operations as long as e(u) > 0:
(a) Push(u, v): applies when e(u) > 0 and ∃v ∈ V s.t. c_uv − f(u,v) > 0 and h(u) > h(v):
d ← min(e(u), c_uv − f(u,v))
f(u,v) ← f(u,v) + d
f(v,u) ← −f(u,v)
e(u) ← Σ_{w∈V} f(w,u)
e(v) ← Σ_{w∈V} f(w,v)
(b) Relabel(u): applies when e(u) > 0 and h(u) ≤ h(v) for all v ∈ {v | c_uv − f(u,v) > 0}:
h(u) ← min_{v∈{v | c_uv−f(u,v)>0}} h(v) + 1

3. Whenever the capacity of some edge, say (u,v), changes from c_uv to c'_uv, the following Adaptation(u, v) operation is executed:
(a) if c'_uv > c_uv and f(u,v) < c_uv, do nothing.
(b) if c'_uv > c_uv and f(u,v) = c_uv, then
h(s) ← h(s) + 2|V|
f(s,u) ← c_su for u ∈ V
f(u,s) ← −f(s,u) for u ∈ V
e(u) ← Σ_{v∈V} f(v,u) for u ∈ V
(c) if c'_uv < c_uv and f(u,v) ≤ c'_uv, do nothing.
(d) if c'_uv < c_uv and f(u,v) > c'_uv, then
h(s) ← h(s) + 2|V|
f(s,u) ← c_su for u ∈ V
f(u,s) ← −f(s,u) for u ∈ V
f(u,v) ← c'_uv
f(v,u) ← −f(u,v)
e(u) ← Σ_{v∈V} f(v,u) for u ∈ V

In the algorithm, e(u) and h(u) are local variables maintained by u. Only u's immediate neighbor nodes will query the value of h(u) (to determine whether a 'push' or 'relabel' can be executed, u's neighbor nodes need to know h(u)). The 'push' and 'relabel' operations only change the local variables maintained by u (e(u), h(u), and f(u,v), where v is a neighbor of u). Note that f(u,v) is actually shared between u and v. Maintaining a consistent image of a shared variable is a separate research topic with quite a few results available (e.g. certain consistency protocols have been designed in []). In summary, both the 'push' and 'relabel' operations are distributed. When the 'adaptation' operation is performed due to the capacity change of edge (u,v), the algorithm changes f(u,v), which is local to u and v. The algorithm also increases h(s) by 2|V| and sets the flow out of s to the edge capacities, regardless of the new capacity of (u,v). Notifying s about the capacity change is indeed not a local operation. However, since all the tasks initially reside on the root node in our task allocation problem, it is reasonable to assume that every node can send a message to the root node. The RIPR algorithm assumes that s knows |V|, the total number of nodes in the system, which is the only global information that the RIPR algorithm needs.
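The 'push' and 'relabel' rules above can be exercised in a small sequential simulation. The sketch below is my own illustrative code: it emulates the distributed operations by repeatedly picking any node with positive excess, omits the 'adaptation' step, and updates e(u) incrementally (equivalent to the recomputation e(u) ← Σ_{w∈V} f(w,u) in the algorithm):

```python
def push_relabel_static(nodes, cap, s, t):
    """Run the push/relabel rules on a static graph; return the flow out of s."""
    c = {(u, v): cap.get((u, v), 0) for u in nodes for v in nodes}
    f = {(u, v): 0 for u in nodes for v in nodes}
    h = {u: 0 for u in nodes}
    h[s] = len(nodes)                        # initialization phase
    for u in nodes:
        f[(s, u)] = c[(s, u)]
        f[(u, s)] = -f[(s, u)]
    e = {u: sum(f[(w, u)] for w in nodes) for u in nodes}
    while True:
        # any node (other than s and t) with positive excess keeps working
        act = [u for u in nodes if u not in (s, t) and e[u] > 0]
        if not act:
            break
        u = act[0]
        for v in nodes:                      # Push(u, v) on an admissible edge
            if c[(u, v)] - f[(u, v)] > 0 and h[u] > h[v]:
                d = min(e[u], c[(u, v)] - f[(u, v)])
                f[(u, v)] += d
                f[(v, u)] = -f[(u, v)]
                e[u] -= d                    # incremental update of e(u), e(v)
                e[v] += d
                break
        else:                                # Relabel(u): no admissible edge
            h[u] = 1 + min(h[v] for v in nodes
                           if c[(u, v)] - f[(u, v)] > 0)
    return sum(f[(s, v)] for v in nodes)     # flow leaving the root
```

On the same example graph as before (s→a 3, s→b 2, a→b 1, a→t 2, b→t 3), the loop terminates with 5 units of flow leaving s, the maximum.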
Although nodes in the system execute the 'push' and 'relabel' operations in an asynchronous fashion, we assume that each individual operation is atomic, meaning that a node u cannot read intermediate values of h(v) of its neighbor v while v is updating h(v). The same rule applies to the other local variables maintained by v. The 'adaptation(u, v)' operation actually consists of two parts: (1) update the flow on edge (u,v) if necessary; (2) s increases h(s) and the flow going out of s. We assume that both part 1 and part 2 are atomic operations. Note that the execution of 'push' and 'relabel' operations starts immediately after the initialization phase and continues at each node u as long as e(u) > 0. 'Push' and 'relabel' at one node may enable 'push' and 'relabel' to be executed at other nodes. The 'adaptation' operation increases the value of h(s) and may increase some e(u) to a positive value, which will also enable new 'push' and 'relabel' operations to be executed. The algorithm will execute continuously as long as edge capacities keep changing. Assuming that edge capacities eventually stop changing, the following theorem shows that eventually no push or relabel operations will need to be executed, and the RIPR algorithm then finds the maximum flow.

Theorem 3. Given graph G(V, E) with root s and sink t, and assuming that no capacity changes occur after the n-th adaptation operation, the number of 'push' and 'relabel' operations executed by the RIPR algorithm is bounded from above by O(n² · |V|² · |E|), where |V| is the number of nodes and |E| is the number of edges in the graph. Additionally, the RIPR algorithm finds the optimal solution to the relaxed flow problem when no 'push' or 'relabel' operation can be performed.

The proof of the theorem is presented in the Appendix.
The execution of the RIPR algorithm follows a 'stimulation-response-stabilization' pattern when the graph changes dynamically. After initialization, the nodes execute the 'push' and 'relabel' operations (distributively) until no further 'push' or 'relabel' can be applied. At this point, the f(u,v) determined by the nodes constitutes an optimal solution to the relaxed flow problem for the current status of the system. When some edge capacity changes, 'adaptation' is invoked, which triggers new 'push' and 'relabel' operations. Again, when no more 'push' and 'relabel' operations can be applied, the f(u,v) determined by the nodes constitutes a new optimal solution to the relaxed flow problem for the then-current status of the system. Edge capacities can change before the RIPR algorithm finds an optimal solution. However, the RIPR algorithm is guaranteed to find an optimal solution for the current status of the system as long as edge capacities do not change any more. The RIPR algorithm differs from the original Push-Relabel algorithm in two aspects. First, it solves the relaxed flow maximization problem instead of the standard flow maximization problem. Second, when some edge capacities have changed, the RIPR algorithm starts from the current values of f(u,v) and searches for the new optimal solution. Such incremental optimization means that the algorithm does not need to be re-initialized when adapting to edge capacity changes. Consequently, no global controller is needed to monitor the initialization process of all the nodes.

3.5 On-line Task Allocation Protocol

The RIPR algorithm can be approximated as a simple protocol to actually allocate the tasks. As discussed in Section 3.3, a compute node maps to three nodes in the network flow representation.
Hence in the protocol each compute node needs to execute 'push', 'relabel', and 'adaptation' for the three 'hypothetical' nodes. For the discussion in the rest of this section, u is used to refer to either a compute node or a node in the network flow representation; the intended meaning is easily clarified by the context. In this protocol, a task buffer is maintained by each compute node. By limiting the size of the task buffers we can prevent any compute node from accumulating more tasks than it can compute. The buffers contain the source data of the tasks. Initially, the task buffer at the root node contains all the source data and all other task buffers are empty. Let b(u) denote the length of the buffer at u. At any time instance, each compute node u ∈ V operates as follows:

1. Contact the adjacent compute node(s) and perform the 'push', 'relabel', and 'adaptation' operations, if necessary. By performing the operations, u can find the optimal rate f(u,v) at which to transfer data to/from each neighbor v.

2. If b(u) > 0 and u is not computing any task, then remove one task from the task buffer, set b(u) ← b(u) − 1, and compute the task.

3. While b(u) > 0 and u is computing a task, send a 'request to send' message to each node v s.t. f(u,v) > 0. If 'clear to send' is received from v, then send a task to v at rate f(u,v) and set b(u) ← b(u) − 1. To maximize the system throughput, u sends tasks at the rate determined by the RIPR algorithm, rather than utilizing the full capacity of its outgoing edges.

4. Upon receiving 'request to send', u acknowledges 'clear to send' if b(u) < U; u acknowledges a denial if b(u) ≥ U. Here U is a pre-set threshold that limits the maximum number of tasks a task buffer can hold.
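The buffer-threshold handshake of steps 3 and 4 can be sketched as follows (the Node class and method names are illustrative assumptions, not from the dissertation):

```python
class Node:
    def __init__(self, threshold):
        self.buffer = 0          # b(u): tasks currently queued at this node
        self.U = threshold       # pre-set maximum queue length

    def handle_request_to_send(self):
        """Step 4: grant only while the buffer is below the threshold U."""
        return self.buffer < self.U

    def receive_task(self):
        self.buffer += 1

    def try_send(self, receiver):
        """Step 3: transfer one task if the receiver grants the request."""
        if self.buffer > 0 and receiver.handle_request_to_send():
            self.buffer -= 1
            receiver.receive_task()
            return True
        return False
```

With U = 1 at the receiver, a sender holding several tasks succeeds once and is then denied until the receiver drains its buffer, which is exactly the back-pressure that keeps slow nodes from accumulating tasks.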
In the above protocol, two types of data are transferred in the system: the control messages that are used by the RIPR algorithm, and the tasks themselves (or, more precisely, the source data of the tasks). The control messages are exchanged among the compute nodes to query and update the values of h(u) and f(u,v). The 'request to send' and 'clear to send' messages are also control messages. It should be pointed out that the flow rate control used in step 3 of the above protocol (data is sent at a specific rate) is implementable. First of all, the RIPR algorithm guarantees that f(u,v) never exceeds the bandwidth constraint c_uv or the capability constraints c_u^out and c_u^in, even before the optimization is completed. (In case of a capacity change, f(u,v) will be updated according to the new capacities as in the 'adaptation' step of the RIPR algorithm.) Furthermore, rate control is supported by current network techniques and software (e.g. many FTP programs can cap the rate at which data is transferred).

3.6 Experimental Results

Simulations were performed to evaluate the performance of the proposed task allocation protocol. Often the topology of a distributed heterogeneous system exhibits a hierarchical structure, rather than an arbitrary graph. At the back-bone level, various geographically distributed administrative organizations are connected via the Internet, resulting in a graph topology, if each administrative organization is abstracted as a single super-node. Yet within each administrative organization, the interconnection of the compute resources is often a tree.
For example, when a computer cluster is connected through a local-area high-speed network such as Myrinet, usually one of the computers is chosen as the interaction node, which accepts jobs from the user, distributes the computation to the other nodes, and serves as the gateway to the outside network. This forms a master/slave relation and therefore a single-level tree. When multiple such clusters are connected to form a typical campus-wide distributed system, the tree topology is further expanded to multiple levels. The throughput maximization problem is trivial for a tree-structured system because the routing is fixed: a parent node simply pushes data to its children at the highest possible rate. More importantly, a tree-structured system can be easily reduced to a single super-node, which represents the overall processing power of the whole tree. The procedure for such a reduction is fairly easy to derive and is hence omitted here. The hierarchical system topology suggests that we only need to model the system at the back-bone level when evaluating its performance, not only because task allocation within one administrative organization is straightforward, but also because each administrative organization can be easily reduced to a single super-node. To reflect this consideration, instead of simulating hundreds or thousands of nodes, we limit the number of nodes in a system to 80 when conducting the simulations, although our protocol can scale to much larger systems. In the simulations, a graph is represented by its adjacency matrix A, where each non-zero entry represents the bandwidth c_uv of the corresponding link. Initially, all entries in A are set to 0. Then a randomly selected set of a_uv's are assigned values that are uniformly distributed between 0 and 1. c_u^in and c_u^out are also uniformly distributed between 0 and 1.
w_u is uniformly distributed between 0 and w_max. Note that 1/w_max represents the average computation/communication ratio of a task. w_max > 1 represents a trivial scenario because the direct neighbors of the root node can consume, statistically, all the tasks flowing out of the root node, and hence there is no need for other nodes to join the computation. The actual value of w_max depends on the application. For example, in SETI@home, it takes about 5 minutes to receive a task through a modem and about 10 hours to compute a task on a current-model home computer [37]. In our simulations, we used w_max = 0.1 and w_max = 0.05, which represent an average computation/communication ratio of 10 and 20, respectively. The network latencies were ignored when simulating the transfer of tasks. This is a reasonable approximation because the system operates in a pipelined fashion after it starts up, and the latencies are well hidden by such a pipeline unless the volume of data per task is small. In order to evaluate the performance of the proposed task allocation protocol, we compare it against a greedy protocol, in which node u sends a task request to a randomly chosen neighbor when the task buffer at u becomes empty. When multiple requests are received simultaneously, the request from the neighbor with the highest compute power is processed first. This protocol represents the first-come-first-serve approach, where a compute node gets a task whenever it becomes free. Hence the more powerful a compute node is, the larger the number of tasks assigned to it. For this set of simulations, the parameters c_uv, w_u, c_u^in, and c_u^out were assumed to be constant; threshold U, whose impact will be studied later, was temporarily set to 5. 1800 systems were simulated, 900 with w_max = 0.05 and 900 with w_max = 0.1; there were 20 nodes in each of the systems.
Initially, there were 2500 tasks on node s. The throughput of a system is calculated as 2500/t_all, where t_all is the overall computation time of all tasks. The results in Figure 3.3 compare the performance of our task allocation protocol and the greedy protocol.

Figure 3.3: Performance of the proposed task allocation protocol with uniformly distributed system topologies. (a) w_max = 0.05; (b) w_max = 0.1. The x-axis represents the 900 experiments.

In Figure 3.3, the throughput of the two protocols has been normalized to the optimal throughput (calculated offline). The results are shown in increasing order of the normalized throughput achieved by our task allocation protocol. As can be seen from these results, our task allocation protocol achieves a close-to-optimal system throughput. In most of the experiments, our protocol outperforms the greedy protocol. Throughput improvement by a factor of up to 2.4 was observed. For those experiments in which the greedy protocol outperforms the proposed protocol, the performances of both protocols were close to optimal, and the difference between the two was negligible. Figure 3.3 demonstrates the performance of the proposed method when the system topology is represented by a uniformly distributed random matrix. We also simulate more practical scenarios where compute resources are connected via the Internet. Empirical studies [25] have shown that Internet topologies exhibit power laws (e.g. out-degree vs. rank, number of nodes vs. out-degree, number of node pairs within a neighborhood vs. neighborhood size in hops, etc.). In this set of simulations, the graph representations of the systems were generated using Brite, a tool developed in [44] that generates networks with power-law characteristics. The values of c_uv, c_u^in, and c_u^out were uniformly distributed between 0 and 1. w_u was uniformly distributed between 0 and w_max. U was set to 5. s had 2500 tasks initially. We simulated 900 systems with w_max = 0.05 and another 900 systems with w_max = 0.1.

Figure 3.4: Performance of the proposed task allocation protocol with power law distributed system topologies. (a) w_max = 0.05; (b) w_max = 0.1. The x-axis represents the 900 experiments.

Figure 3.4 compares the performance of our protocol against the greedy protocol. For these power-law distributed systems, our protocol achieves close-to-optimal system throughput and outperforms the greedy protocol in most of the experiments. An interesting observation from Figure 3.4 is that the greedy protocol works better for power-law distributed systems than for uniformly distributed systems. The task buffers are used to discretize the real-valued optimal task allocation generated by the Incremental Push-Relabel algorithm. In terms of storage requirement on the compute nodes, small buffers are preferred. However, larger buffer sizes enable a higher utilization of the network resources since task transfers do not have to be suspended while waiting for the receiver nodes to clear space in their task buffers. In the second set of simulations, we study the impact of the buffer size on the performance of the system.
To simplify the discussion, we assume that the task buffers at all the nodes have the same size U. If the size of the task buffers is set to U = ∞, then a good indicator of storage requirement is the maximum length of task buffer that is actually consumed. We simulated 200 systems each having 20 nodes and another 200 systems each having 40 nodes. w_max was set to 0.05. No changes occurred to the systems during the simulations, hence no adaptations were activated. For each system, there were 2500 tasks on root node s initially and all the nodes in the system had infinite task buffers. We monitored the maximum buffer consumed by each individual compute node, and the results are summarized in Figure 3.5 in the form of a histogram. The histogram in Figure 3.5 (a) is computed over all the 20-node systems (200 such systems in total), and the histogram in Figure 3.5 (b) is computed over all the 40-node systems (200 such systems in total). The results in Figure 3.5 show that task buffers of size 4 cover more than 99% of the cases. More interestingly, task buffers of size 1 are large enough for more than 80% of the cases. While the results in Figure 3.5 examine the maximum length of the task buffer consumed, the following simulation results illustrate the impact on system performance when the sizes of the task buffers are limited. We conducted simulations on the same 400 systems that were used to generate the results in Figure 3.5, but U was no longer set to infinity. For each system, we conducted two separate simulations, one with U = 5 and another with U = 1. Again, no changes to the systems occurred during the simulations. The results are shown in Figure 3.6, in increasing order of the normalized throughput of the scenarios with U = 1. We can see that U = 5 leads to a higher, and closer to optimal, throughput than U = 1.
We have observed in the simulations that further increasing the value of U did increase the system throughput. The benefit, however, becomes marginal as U gets larger. As claimed in Sections 3.4 and 3.5, adaptivity is an important property of our task allocation protocol. This is illustrated in the next set of simulations. The system consists of 20 nodes. The adjacency matrix was initialized as discussed above, with w_max = 0.1. However, during the course of the simulation, the network condition and the effective compute power of the nodes were altered at two time instances. At time instance t = 4000, the compute power of a selected set of compute nodes was reduced by 80%. Then at t = 8000, these compute nodes recovered their compute power, while at the same time the compute power of another set of nodes was increased by 40% and the bandwidth of a selected set of links was increased by 30%. In Figure 3.7, the optimal throughput was calculated offline. It indicates the maximum throughput that can be achieved by the system. The actual throughput for time instance t was calculated as (N(t + 75) − N(t − 75))/150, where N(τ) is the number of tasks computed by the system from time 0 to time τ. The size of this moving window, 150, was selected experimentally, as a trade-off between preserving the details and describing the trend. When t < 75, the actual throughput was calculated as N(t)/t. As illustrated in Figure 3.7, our task allocation adapts to the changes in the system and approaches the optimal throughput during the course of the computation. We simulated U = 1 and U = 5 for the threshold U on the buffer size. Our task allocation exhibits similar adaptivity for both values of U. When the system parameters changed at t = 4000 and t = 8000, the adaptation procedure was activated and the task allocation was adapted.
As can be seen, the system operates at (close to) the new optimal throughput after the adaptation is completed. In Figure 3.7, at some time instances, the actual system throughput exceeds the optimum. This is because the size of the moving window is not wide enough. The impact of U is similar to the static scenario: U = 5 leads to a higher, and closer to optimal, throughput than U = 1. We have also observed that the benefit of further increasing the value of U, not surprisingly, becomes marginal as U gets larger. These results show that our task allocation does not need a large task buffer to adapt.

We further simulated the impact of the control message transfer cost. (As previously explained, the costs of updating f(u, v), h(u), and e(u) on each node can be ignored since each update requires only a few simple arithmetic operations.) The results are shown in Figure 3.8. The system consists of 20 nodes. The adjacency matrix was initialized as before, with w_max = 0.1. At time t = 4000, the compute power of a selected set of nodes was increased by 25%. When calculating the actual throughput, the width of the moving window was experimentally set at 250. Let CPCM denote the cost (transfer time) per control message. We compared two scenarios: CPCM = 0.01 (units of time) and CPCM = 0.2. We use CPCM = 0.01 to simulate a fast network and CPCM = 0.2 to simulate a network with high latencies. The two values of CPCM were selected by considering the fact that the c_uv's are uniformly distributed between 0 and 1, hence the average cost to transfer a task between two nodes is 2 units of time. In this experiment, the longest simple path leaving the root node consists of 3 hops. When CPCM = 0.01, the round-trip time to transfer a control message along this path is 0.06 units of time. When CPCM = 0.2, the round-trip time is 1.2 units of time.
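The moving-window throughput estimate used above is simple to state in code. A minimal sketch, assuming N is sampled at integer time steps (the half-width of 75 and the early-phase fallback N(t)/t follow the text):

```python
# Windowed throughput estimate: (N(t + 75) - N(t - 75)) / 150, with the
# early-phase fallback N(t) / t described in the text.
def actual_throughput(N, t, half=75):
    """Estimate throughput at time t from cumulative task counts N."""
    if t < half:                       # window would extend before time 0
        return N[t] / t if t > 0 else 0.0
    return (N[t + half] - N[t - half]) / (2 * half)
```

For a system completing tasks at a constant rate the estimate is exact; around rate changes it smooths the curve, trading detail for trend as noted above.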
There are two different optimal allocations for the system, one before t = 4000, the other after t = 4000. In the simulations, we observed that when CPCM = 0.01, it took 19 units of time for the system to find the first optimal allocation, and 31 units of time to adapt to the new optimal allocation when the parameters changed at t = 4000. If CPCM = 0.2, it took the system 87 and 124 units of time to find the two optimal task allocations, respectively. We also observed that the search times (to find the optimal allocation) are independent of the value of U. Although the cost per control message does affect the search time, the search time is insignificant considering the fact that the computation lasts much longer. Hence, the impact of transferring the control messages on the system performance is negligible, as can be seen from Figure 3.8.

Another observation from Figure 3.8 is that the transition time, starting at t = 4000, for the system to reach the new optimal throughput is much larger than the search time for the new optimal task allocation. This is because the task buffers need to be filled and/or emptied for the system to operate on the new optimal allocation.

Figure 3.5: Histogram of maximum length of consumed task buffer. The y-axis represents the frequency. (a) 20-node systems, 200 experiments; (b) 40-node systems, 200 experiments.
Figure 3.6: Impact of buffer size on the throughput of the system. (a) 20-node systems, 200 experiments; (b) 40-node systems, 200 experiments.

Figure 3.7: Adaptation to changes in the system. (a) U = 1; (b) U = 5.

Figure 3.8: Impact of control message transfer cost. (a) U = 1; (b) U = 5.

Chapter 4

Maximum Data Gathering in Networked Sensor Systems

In this chapter, we study data gathering problems in energy-constrained networked sensor systems. We study store-and-gather problems, where data are locally stored at the sensors before the data gathering starts, and continuous sensing and gathering problems, which model time-critical applications. We show that these problems reduce to the maximization of network flow under vertex capacity constraints. This flow problem in turn reduces to a standard network flow problem, which means that the algorithm developed in Chapter 3 can be applied. This algorithm leads to a simple protocol that coordinates the sensor nodes in the system. This approach provides a unified framework to study a variety of data gathering problems in networked sensor systems. The performance of the proposed method is illustrated through simulations.
4.1 Introduction

State-of-the-art sensors (e.g. Smart Dust [66]) are powered by batteries. Replenishing energy by replacing the batteries is infeasible since the sensors are typically deployed in harsh terrains. The sensors, which are usually unattended, need to operate over a long period of time after deployment. Energy efficiency is thus critical. Techniques ranging from low power hardware design [3, 52] and energy aware routing [31, 59] to application level optimizations [58, 69] have been proposed to improve the energy efficiency of networked sensor systems.

An important application of networked sensor systems is to monitor the environment. Examples of such applications include vehicle tracking and classification on the battlefield, patient health monitoring, pollution detection, etc. In these applications, a fundamental operation is to sense the environment and transmit the sensed data to the base station for further processing. In this chapter, we study energy efficient data gathering in networked sensor systems from an algorithmic perspective.

Compared with sensing and computation, communication is the most expensive operation (in terms of energy consumption). Generally, data transfers are performed via multi-hop communications where each hop is a short-range communication. This is due to the well-known fact that long-distance wireless communication is expensive in terms of both implementation complexity and energy dissipation, especially when using the low-lying antennae and near-ground channels typically found in networked sensor systems. Short-range communication also enables efficient spatial frequency re-use. A challenging problem with multi-hop communications is the efficient transfer of data through the system when the sensors have energy constraints.
Some variations of the problem have been studied recently. In [35], data gathering is assumed to be performed in rounds and each sensor can communicate (in a single hop) with the base station and all other sensors. The total number of rounds is then maximized under a given energy constraint on the sensors. In [50], a non-linear programming formulation is proposed to explore the trade-offs between the energy consumed and the transmission rate. It models the radio transmission energy according to Shannon's theorem. In [54], the data gathering problem is formulated as a linear programming problem and a 1 + ω approximation algorithm is proposed. This algorithm further leads to a distributed heuristic.

This study departs from the above with respect to the problem definition as well as the solution technique. For short-range communications, the difference in the energy consumption between sending and receiving a data packet is almost negligible. We adopt the reasonable approximation that sending a data packet consumes the same amount of energy as receiving a data packet [2]. The studies in [50] and [54] differentiate the energy dissipated for sending and receiving data. Although the resulting problem formulations are indeed more accurate than ours, the improvement in accuracy is marginal for short-range communications.

In [35], each sensor generates exactly one data packet per round (a round corresponds to the occurrence of an event in the environment) to be transmitted to the base station. The system is assumed to be fully connected. The study in [35] also considers a very simple model of data aggregation where any sensor can aggregate all the received data packets into a single output data packet.
In our system model, each sensor communicates with a limited number of neighbors due to the short range of the communications, resulting in a general graph topology for the system. We study store-and-gather problems, where data are locally stored on the sensors before the data gathering starts, and continuous sensing and gathering problems, which model time-critical applications. A unified flow optimization formulation is developed for the two classes of problems.

The focus in this chapter is to maximize the throughput or volume of data received by the base station. Such an optimization objective is abstracted from a wide range of applications in which the base station needs to gather as much information as possible. Some applications proposed for networked sensor systems may have different optimization objectives. For example, the balanced data transfer problem [24] is formulated as a linear programming problem where a 'minimum achieved sense rate' is set for every individual node. In [23], data gathering is considered in the context of energy balance. A distributed protocol is designed to ensure that the average energy dissipation per node is the same throughout the execution of the protocol. However, these issues are not the focus of this dissertation.

By modeling the energy consumption associated with each send and receive operation, we formulate the data gathering problem as a constrained network flow optimization problem where each node u is associated with a capacity constraint w_u, so that the total amount of flow going through u (incoming plus outgoing flow) does not exceed w_u. We show that such a formulation models a variety of data gathering problems (with energy constraints on the sensor nodes). The constrained flow problem reduces to the standard network flow problem, which means that the algorithm developed in Chapter 3 can be applied.
The algorithm can be used to solve both store-and-gather problems and continuous sensing and gathering problems. For the continuous sensing and gathering problems, we develop a simple distributed protocol based on the algorithm. The performance of this protocol is studied through simulations. Because the store-and-gather problems are by nature off-line problems, we do not develop a distributed protocol for this class of problems.

This chapter is organized as follows. The data gathering problems are discussed in Section 4.2. We show that these problems reduce to a network flow problem with constraints on the vertices. In Section 4.3, we develop a mathematical formulation of the constrained network flow problem and show that it reduces to a standard network flow problem. A simple protocol based on the RIPR algorithm is presented in Section 4.4. Experimental results are presented in Section 4.5.

4.2 Data Gathering with Energy Constraints

4.2.1 System Model

Suppose a network of sensors is deployed over a region. The locations of the sensors are fixed and known a priori. The system is represented by a graph G(V, E), where V is the set of sensor nodes. (u, v) ∈ E if u ∈ V, v ∈ V, and u is within the communication range of v. The set of successors of u is denoted as σ_u = {v ∈ V | (u, v) ∈ E}. Similarly, the set of predecessors of u is denoted as ψ_u = {v ∈ V | (v, u) ∈ E}. The event is sensed by a subset of sensors V_c ⊆ V. r is the base station to which the sensed data are transmitted. Sensors V − V_c − {r} in the network do not sense the event but can relay the data sensed by V_c.

Among the three categories (sensing, communication, and data processing) of power consumption, a sensor node typically spends most of its energy in data communication. This includes both data transmission and reception.
Our energy model for the sensors is based on the first-order radio model described in [32]. The energy consumed by sensor u to transmit a k-bit data packet to sensor v is T_uv = ε_elec × k + ε_amp × d_uv² × k, where ε_elec is the energy required for the transceiver circuitry to process one bit of data, ε_amp is the energy required per bit of data for the transmitter amplifier, and d_uv is the distance between u and v. The transmitter amplifier is not needed by u to receive data, and the energy consumed by u to receive a k-bit data packet is R_u = ε_elec × k. Typically, ε_elec = 50 nJ/bit and ε_amp = 0.1 nJ/bit/m². This effectively translates to ε_amp × d_uv² ≪ ε_elec, especially when short transmission ranges (~ 1 m) are considered. For the discussion in the rest of this chapter, we adopt the approximation that T_uv = R_u for (u, v) ∈ E. We further assume that no data aggregation is performed during the transmission of the data.

Communication link (u, v) has transmission bandwidth c_uv. We do not require the communication links to be identical. Two communication links may have different transmission latencies and/or bandwidths. Symmetry is not required either. It may be the case that c_uv ≠ c_vu. If (u, v) ∉ E, then we define c_uv = 0.

An energy budget B_u is imposed on each sensor node u. We assume that there is no energy constraint on base station r. To simplify our discussions, we ignore the energy consumption of the sensors when sensing the environment. However, the rate at which sensor u ∈ V_c can collect data from the environment is limited by the maximum sensing capability g_u. We consider both store-and-gather problems and continuous sensing and gathering problems. For the store-and-gather problems, B_u represents the total number of data packets that u can send and receive.
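As a concrete illustration of the first-order radio model, the sketch below evaluates T_uv and R_u with the constants quoted above (the function names are ours, for illustration only):

```python
# First-order radio model: energy in nJ, distance in meters, k in bits.
E_ELEC = 50.0   # nJ/bit, transceiver circuitry
E_AMP = 0.1     # nJ/bit/m^2, transmitter amplifier

def tx_energy(k, d_uv):
    """T_uv: energy for a node to transmit a k-bit packet over distance d_uv."""
    return E_ELEC * k + E_AMP * d_uv ** 2 * k

def rx_energy(k):
    """R_u: energy to receive a k-bit packet (no amplifier term)."""
    return E_ELEC * k
```

At d_uv = 1 m the amplifier term contributes only 0.1 nJ/bit against 50 nJ/bit for the circuitry, which is the basis for the approximation T_uv = R_u.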
For the continuous sensing and gathering problems, B_u represents the total number of data packets that u can send and receive in one unit of time.

4.2.2 Store-and-Gather Problems

In store-and-gather problems, the information from the environment is sensed (possibly over a long time period) and stored locally at the sensors. The data is then transferred to the base station during the data gathering stage. This represents those data-oriented applications (e.g. counting the occurrences of endangered birds in a particular region) where the environment changes slowly. There is typically no deadline (or the deadline is loose enough to be ignored) on the duration of data gathering for such problems, and we are not interested in the speed at which the data is gathered. But due to the energy constraint, not all the stored data can be gathered by the base station, and we want to maximize the amount of data gathered. For each u ∈ V_c, we assume that u has stored d_u data packets before the data gathering starts. Let f(u, v) represent the number of data packets sent from u to v.

For the simplified scenario where V_c contains a single node s, we have the following problem formulation:

Single Source Maximum Data Volume (SMaxDV) Problem:

Given: A graph G(V, E). Source s ∈ V and sink r ∈ V. Each node u ∈ V − {r} has energy budget B_u.

Find: A real-valued function f : E → R

Maximize: Σ_{v ∈ σ_s} f(s, v)

Subject to:
(1) f(u, v) ≥ 0 for all (u, v) ∈ E
(2) Σ_{v ∈ σ_u} f(u, v) + Σ_{v ∈ ψ_u} f(v, u) ≤ B_u for u ∈ V − {r}
(3) Σ_{v ∈ σ_u} f(u, v) = Σ_{v ∈ ψ_u} f(v, u) for u ∈ V − {s, r}

B_u is the energy budget of u. Since we have normalized both T_uv and R_u to 1, the total number of data packets that can be sent and received by u is bounded from above by B_u.
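Since SMaxDV is a linear program, it can be handed to any off-the-shelf LP solver. A minimal sketch on a hypothetical 3-node path s → a → r with assumed budgets B_s = 4 and B_a = 6 (SciPy is used here purely as a generic solver, not as part of the proposed method):

```python
from scipy.optimize import linprog

# Variables: x0 = f(s, a), x1 = f(a, r).
c = [-1.0, 0.0]           # maximize f(s, a)  <=>  minimize -f(s, a)
A_ub = [[1.0, 0.0],       # condition (2) at s: f(s, a) <= B_s = 4
        [1.0, 1.0]]       # condition (2) at a: f(s, a) + f(a, r) <= B_a = 6
b_ub = [4.0, 6.0]
A_eq = [[1.0, -1.0]]      # condition (3) at a: flow conservation
b_eq = [0.0]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, None)] * 2)
# Node a's budget must cover both receiving and sending,
# so the optimum is f(s, a) = f(a, r) = 3, not 4.
```

The example makes the role of condition (2) concrete: relay node a "pays" twice per packet, once on receive and once on send.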
Condition 2 above represents the energy constraint of the sensors. Sensors in V − {s, r} do not generate sensed data, nor should they possess any data packets upon the completion of the data gathering. This is reflected in Condition 3 above. We do not model d_s, the number of data packets stored at s before the data gathering starts. This is because d_s is an obvious upper bound for the SMaxDV problem, and can be handled trivially.

|V_c| > 1 represents the general scenario where the event is sensed by multiple sensors. This multi-source data gathering problem is formulated as follows:

Multiple Source Maximum Data Volume (MMaxDV) Problem:

Given: A graph G(V, E). The set of source nodes V_c ⊆ V and sink r ∈ V. Each node u ∈ V − {r} has energy budget B_u. Each node v ∈ V_c has d_v data packets that are locally stored before the data gathering starts.

Find: A real-valued function f : E → R

Maximize: Σ_{v ∈ ψ_r} f(v, r)

Subject to:
(1) f(u, v) ≥ 0 for all (u, v) ∈ E
(2) Σ_{v ∈ σ_u} f(u, v) + Σ_{v ∈ ψ_u} f(v, u) ≤ B_u for u ∈ V − {r}
(3) Σ_{v ∈ σ_u} f(u, v) = Σ_{v ∈ ψ_u} f(v, u) for u ∈ V − V_c − {r}
(4) Σ_{v ∈ σ_u} f(u, v) ≤ Σ_{v ∈ ψ_u} f(v, u) + d_u for u ∈ V_c

Similar to the SMaxDV problem, the net flow out of the intermediate nodes (V − V_c − {r}) is 0 in the MMaxDV problem, as specified in Condition 3. For each source node u ∈ V_c, the net flow out of u cannot exceed the number of data packets previously stored at u. This is specified in Condition 4.

4.2.3 Continuous Sensing and Gathering Problems

The continuous sensing and gathering problems model those time-critical applications that need to gather as much information as possible from the environment while the nodes are sensing. Examples of such applications include battlefield surveillance, target tracking, etc. We want to maximize the total number of data packets that can be gathered by the base station r in one unit of time.
We assume that the communications are scheduled by time/frequency division multiplexing or channel assignment techniques. We consider the scenario in which B_u is the maximum power consumption rate allowed by u. Let f(u, v) denote the number of data packets sent from u to v in one unit of time.

Similar to the store-and-gather problem, we have the following mathematical formulation when V_c contains a single node s.

Single Source Maximum Data Throughput (SMaxDT) Problem:

Given: A graph G(V, E). Source s ∈ V and sink r ∈ V. Each node u ∈ V − {r} has energy budget B_u. Each edge (u, v) ∈ E has capacity c_uv.

Find: A real-valued function f : E → R

Maximize: Σ_{v ∈ σ_s} f(s, v)

Subject to:
(1) 0 ≤ f(u, v) ≤ c_uv for all (u, v) ∈ E
(2) Σ_{v ∈ σ_u} f(u, v) + Σ_{v ∈ ψ_u} f(v, u) ≤ B_u for u ∈ V − {r}
(3) Σ_{v ∈ σ_u} f(u, v) = Σ_{v ∈ ψ_u} f(v, u) for u ∈ V − {s, r}

The major difference between the SMaxDV and the SMaxDT problems is the consideration of link capacities. In the SMaxDV problem, since there is no deadline for the data gathering, the primary factor that affects the maximum amount of gathered data is the energy budgets of the sensors. But for the SMaxDT problem, the number of data packets that can be transferred over a link in one unit of time is not only affected by the energy budget, but also bounded from above by the capacity of that link, as specified in Condition 1 above. For the SMaxDT problem, we do not model the impact of g_u because g_u is an obvious upper bound on the throughput and can be handled trivially.

Similarly, we can formulate the multiple source maximum data throughput problem as follows.

Multiple Source Maximum Data Throughput (MMaxDT) Problem:

Given: A graph G(V, E). The set of source nodes V_c ⊆ V and sink r ∈ V. Each node u ∈ V − {r} has energy budget B_u.
Each edge (u, v) ∈ E has capacity c_uv.

Find: A real-valued function f : E → R

Maximize: Σ_{v ∈ ψ_r} f(v, r)

Subject to:
(1) 0 ≤ f(u, v) ≤ c_uv for all (u, v) ∈ E
(2) Σ_{v ∈ σ_u} f(u, v) + Σ_{v ∈ ψ_u} f(v, u) ≤ B_u for u ∈ V − {r}
(3) Σ_{v ∈ σ_u} f(u, v) = Σ_{v ∈ ψ_u} f(v, u) for u ∈ V − V_c − {r}
(4) Σ_{v ∈ σ_u} f(u, v) ≤ Σ_{v ∈ ψ_u} f(v, u) + g_u for u ∈ V_c

Condition 4 in the above problem formulation takes into account the sensing capabilities of the sensors.

4.3 Flow Maximization with Constraint on Vertices

4.3.1 Problem Reductions

In this section, we present the formulation of the constrained flow maximization problem where the vertices have limited capacities (the CFM problem). The CFM problem is an abstraction of the four problems discussed in Section 4.2.

In the CFM problem, we are given a directed graph G(V, E) with vertex set V and edge set E. Vertex u has capacity constraint w_u ≥ 0. Edge (u, v) starts from vertex u, ends at vertex v, and has capacity constraint c_uv ≥ 0. If (u, v) ∉ E, we define c_uv = 0. We distinguish two vertices in G, source s and sink r. A flow in G is a real-valued function f : E → R that satisfies the following constraints:
In the CFM problem, we are given a graph with vertex and edge constraint, a source s, and a sink r, and we wish to find a flow with the maximum value. It is straight forward to show that the SMaxDV and the SMaxDT problems reduce to the CFM problem. By adding a hypothetical super source node, the MMaxDV and the MMaxDT problems can also be reduced to SMaxDV and SMaxDT, respectively. It can be shown that the CFM problem reduces to a standard network flow problem. Due to the existence of condition 1, condition 3 is equivalent to ^2v€l7u f(u, v) < wu/2 for u e V — {.s, r}. This means that the total amount of flow out of vertex u cannot exceed wu/2. Suppose we split u (u G V — {s, r}) into two nodes v,\ and u2, re-direct 72 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. all incoming links to u to arrive at u\ and all the outgoing links from u to leave from u2, and add a link from ui to u2 with capacity wu/2, then the vertex constraint wu is fully represented by the capacity of link (ui,u2). Actually, such a split transforms all the vertex constraints to the corresponding link capacities, and effectively reduces the CFM problem to a standard network flow problem. The CFM problem has been studied in [41] where a similar reduction can be found. The standard network flow problem is stated below: Given: graph G(V, E ), source node s G V, and sink node r G V. Link (u, v) has capacity cuv. Maximize: f(s, v) Subject to: 0 < f ( u ,v ) < c uv for V (u,v) E E (1) = Y /V ^ u f ( v’u) for u G F - {s,r} (2) 4.3.2 Relationship to Sensor Network Scenarios The vertex capacity wu in the CFM problem models the energy budget Bu of the sensor nodes. Bu does not have to be the total remaining energy of u. 
For example, when the remaining battery power of a sensor is lower than a particular level, the sensor may limit its contribution to the data gathering operation by setting a small value for B_u (so that the sensor still has enough energy for future operations). For another example, if a sensor is deployed in a critical location so that it is utilized as a gateway to relay data packets to a group of sensors, then it may limit its energy budget for a particular data gathering operation, thereby conserving energy for future operations. These considerations can be captured by the vertex capacity w_u in the CFM problem.

The edge capacity in the CFM problem models the communication rate (meaningful for continuous sensing and gathering problems) between adjacent sensor nodes. The edge capacity captures the available communication bandwidth between two nodes, which may be less than the maximum available rate. For example, a node may reduce its radio transmission power to save energy, resulting in a less than maximum communication rate. This capacity can also vary over time based on environmental conditions. Our decentralized protocol results in an on-line algorithm for this scenario.

Because energy efficiency is a key consideration, various techniques have been proposed to explore the trade-offs between processing/communication speed and energy consumption. This results in the continuous variation of the performance of the nodes. For example, the processing capabilities may change as a result of dynamic voltage scaling [45]. The data communication rate may change as a result of modulation scaling [55]. As proposed by various studies on energy efficiency, it is necessary for sensors to maintain a power management scheme, which continuously monitors and adjusts the energy consumption and hence changes the computation and communication performance of the sensors.
In data gathering problems, these energy related adjustments translate to changes of parameters (node/link capacities) in the problem formulations. Determining the exact reasons and mechanisms behind such changes is beyond the scope of this chapter. Instead, we focus on the development of data gathering algorithms that can adapt to such changes.

4.4 Distributed Algorithm and Protocol To Maximize Data Gathering

Because the four classes of data gathering problems all reduce to the maximum flow problem, the RIPR algorithm developed in Chapter 3 can be applied. In this section, we present a simple on-line protocol for the SMaxDT problem based on the RIPR algorithm.

In this protocol, each node maintains a data buffer. Initially, all the data buffers are empty. The source node s senses the environment and fills its buffer continuously. At any time instance, let β_u denote the amount of buffer used by node u. Each node u ∈ V operates as follows:

1. Contact the adjacent node(s) and execute the RIPR algorithm.

2. While β_u > 0, send the message 'request to send' to all successors v of u such that f(u, v) > 0. If 'clear to send' is received from v, then set β_u ← β_u − 1 and send a data packet to v at rate f(u, v). (Recall that f(u, v) is the flow rate at which data should be sent from u to v according to the RIPR algorithm.)

3. Upon receiving 'request to send', u acknowledges 'clear to send' if β_u < U. Here U is a pre-set threshold that limits the maximum number of data packets a buffer can hold. For node s, it stops sensing if d_s > U.

The nodes execute the RIPR algorithm and find the rates f(u, v) for sending the data. Meanwhile, the nodes transfer the data according to the values of f(u, v), without waiting for the RIPR algorithm to terminate.
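Steps 2 and 3 of the protocol amount to a small back-pressure handshake. A minimal sketch (the class and method names are ours, for illustration; timing and the RIPR rates are omitted):

```python
class Node:
    """One protocol participant; beta tracks buffer occupancy."""
    def __init__(self, name, U):
        self.name, self.U, self.beta = name, U, 0

    def clear_to_send(self):
        # Step 3: grant clearance only while the buffer is below U.
        return self.beta < self.U

    def receive_packet(self):
        self.beta += 1

    def send_packet(self, v):
        # Step 2: forward one packet if the successor grants clearance.
        if self.beta > 0 and v.clear_to_send():
            self.beta -= 1
            v.receive_packet()
            return True
        return False
```

With U = 2, a downstream node stops granting clearance once it holds two packets, which is how back-pressure propagates hop by hop toward the source.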
Two types of data are transferred in the system: the control messages that are used by the RIPR algorithm, and the sensed data themselves. The control messages are exchanged among the nodes to query and update the values of f(u, v) and h_u when executing the RIPR algorithm. The control messages and the sensed data are transmitted over the same links, and higher priority is given to the control messages in case of a conflict.

For the MMaxDT problem, the situation is a bit more complicated. Since the MMaxDT problem is reduced to the SMaxDT problem by adding a hypothetical super source node s', the RIPR algorithm needs to maintain the flow out of s' as well as the value of the function h(s'). Additionally, the values of f(s', v) (v ∈ V_c) and h(s') are needed by all nodes u ∈ V_c during the execution of the algorithm. Because s' is not an actual sensor, the sensors in V_c therefore need to maintain a consistent image of s'. This requires some regional coordination among the sensors in V_c and may require some extra cost to actually implement such a consistent image.

The SMaxDV and MMaxDV problems are by nature off-line problems, and we do not develop on-line protocols for these two problems.

4.5 Experimental Results

Simulations were conducted to illustrate the effectiveness of the RIPR algorithm and the data gathering protocol. For the sake of illustration, we present simulation results for the SMaxDT problem. The systems were generated by randomly scattering the sensor nodes in a unit square. The base station was located at the lower-left corner of the square. The source node was randomly chosen from the sensor nodes. The B_u's are uniformly distributed between 0 and B_max. B_max was set to 100. We assume a signal decaying factor of r⁻².
The flow capacity between sensor nodes u and v is determined by Shannon's theorem as c_uv = W log(1 + P_uv · r_uv⁻² / η), where W is the bandwidth of the link, r_uv is the distance between u and v, P_uv is the transmission power on link (u, v), and η is the noise in the communication channel. In all the simulations, W was set to 1 KHz, P_uv was set to 10⁻³ mW, and η was set to 10⁻⁶ mW. U was set to 2. Each data packet was assumed to contain 32 bytes. Each control message was assumed to be transferred in 1 ms.

The RIPR algorithm described in Section 4.4 adapts to every single change that occurs in the system. The adaptation is initiated by source node s, which increases h(s) by 2|V| and pushes flow to every node in σ_s. However, the adaptation can be performed in batch mode, i.e. source node s initiates the adaptation after multiple changes have occurred in the system. Since the proof of Theorem 3 does not utilize any information about the number of changes that occurred, the correctness and complexity of the RIPR algorithm still hold even if the adaptation is performed in batch mode. We have observed that the RIPR algorithm always finds the optimal solution, regardless of the number of changes that occurred before the adaptation is performed.
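The link-capacity computation with the constants above can be sketched as follows. The logarithm base is not fixed in the text; base 2 (capacity in bits per second) is assumed here.

```python
import math

W = 1e3       # link bandwidth, Hz
P_UV = 1e-3   # transmission power, mW
ETA = 1e-6    # channel noise, mW

def link_capacity(r_uv):
    """c_uv = W log(1 + P_uv * r_uv**-2 / eta), with r^-2 signal decay."""
    return W * math.log2(1 + (P_UV * r_uv ** -2) / ETA)
```

At r_uv = 1 this gives roughly 10 Kbit/s; halving the distance quadruples the signal-to-noise ratio and raises the capacity accordingly.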
Each data point in Figure 4.1 is averaged over 50 experiments. We can see that the required number of basic operations increases as the number of changes (per adaptation) increases.

Figure 4.1: Adaptation performed in batch mode (number of basic operations vs. the number of changes in the system).

So far the performance of the RIPR algorithm has been evaluated in terms of the total number of basic operations. We do not expect the individual nodes to execute the same number of operations, since the RIPR algorithm is not designed for load balancing. But interestingly, the following simulation results show that the RIPR algorithm is quite well-balanced in terms of the number of basic operations executed by different nodes. For each experiment, a randomly generated system is initialized and the number of basic operations executed by the system to stabilize was recorded. The basic operations were re-classified into two categories: local updates and control message exchanges. Each push(u, v) operation consists of one local update at u, one message transfer (send) at u, and one message transfer (receive) at v. Each relabel(u) operation consists of one local update at u, one message transfer (broadcast h(u) to σ_u) at u, and one message transfer (receive h(u)) at each v ∈ σ_u. Figure 4.2 shows the number of local updates and control messages executed/transferred by the nodes. We report the maximum and the mean number of local updates, and the maximum and mean number of control message exchanges. Each data point is averaged over 100 experiments. Figure 4.2 shows that the maximum number of local updates is only about 2 times the mean number of local updates. The maximum number of control message exchanges is also about 2 times the mean number of control message exchanges.
This result shows that the RIPR algorithm is quite well-balanced in terms of per-node cost.

Figure 4.2: The maximum and mean cost per node for executing the RIPR algorithm (maximum and mean numbers of control messages and local updates vs. the number of nodes).

The second set of simulation results illustrates the convergence and adaptivity of the proposed protocol. In each experiment, a certain number (between 40 and 100) of nodes were randomly deployed in the unit square. Communication radii ranging from 0.2 to 0.5 units were tested. For each experiment, the data gathering process lasted 30 seconds. The steady-state throughput is calculated as the average throughput during the last 10 seconds of data gathering. Table 4.1 shows the steady-state throughput of the protocol. The results have been normalized to the optimal throughput, which was calculated off-line. Each data point in Table 4.1 is averaged over 50 systems. The results show that the steady-state throughput of the proposed protocol approaches the optimal throughput, regardless of the number of nodes and the communication radius.

Table 4.1: Normalized steady-state throughput. r is the communication radius, n is the number of nodes.

          r = 0.2   r = 0.3   r = 0.4   r = 0.5
n = 40    0.9641    0.9557    0.9446    0.9423
n = 60    0.9317    0.9322    0.9208    0.9075
n = 80    0.9239    0.9262    0.9186    0.9315
n = 100   0.9264    0.9184    0.9247    0.9080

In the protocol, data is transferred while the RIPR algorithm is being executed. Hence the start-up time of the system needs to be evaluated from two aspects: the execution time of the RIPR algorithm (i.e.
how fast the RIPR algorithm terminates), and the time for the data transfer to reach the steady-state throughput. For each experiment in the second set of simulations, we monitored the activities of each individual node. The termination of the RIPR algorithm was detected when none of the nodes needed to execute any of the basic operations. Note that such global monitoring is made available in the simulations for performance analysis only. It may be very costly to implement this monitoring function in an actual deployment.

Let N(t) denote the number of data packets received by the base station from time 0 to time t. The instantaneous throughput at time instance t is defined as (N(t + 0.1) − N(t − 0.1)) / 0.2. The start-up time of the protocol is defined as the time period for the instantaneous throughput to reach 85% of the steady-state throughput.

The impact of the number of nodes and the communication radius on the execution time of the RIPR algorithm is shown in Figure 4.3. The execution time increases as the number of nodes increases. The execution time also increases as the communication radius increases, which leads to an increase in the number of links in the system. Such a trend is expected from Theorem 3.

Figure 4.3: Execution time of the RIPR algorithm (vs. the number of nodes and the communication radius).

Figure 4.4: Start-up time of the proposed protocol (vs. the number of nodes and the communication radius).

The start-up time of the protocol is shown in Figure 4.4. The result shows that for a given communication radius, the start-up time of the protocol increases as the number of nodes increases; and interestingly, for a given number of nodes, the start-up time decreases as the communication radius increases.
Such a behavior is due to the fact that a larger communication radius leads to a smaller diameter of the graph. The diameter of a graph is defined as the largest distance (in terms of the number of hops) between any two nodes in the graph. In systems with a small diameter, the base station is closer to the source node. Hence the data can be transferred to the base station sooner during the start-up period.

We have also observed that in some experiments, the system throughput reached steady state even before the RIPR algorithm terminated. This is not a contradiction. Actually, when such scenarios occurred, the RIPR algorithm was pushing excessive flow (node u is said to have excessive flow when e(u) > 0, i.e., when u has more incoming flow than outgoing flow) back to the source node. During this time period, the RIPR algorithm was still executing, but the net flow from the source to the sink did not increase. In other words, the RIPR algorithm had already found the maximum flow once the excessive flow had been eliminated. Meanwhile, data was transferred while the RIPR algorithm was still executing. Because each node maintained a data buffer that prevented it from accumulating excessive data, the excessive flow did not cause the nodes to accumulate data. Consequently, the protocol was able to drive the system throughput to steady state before the RIPR algorithm terminated.

The above results illustrate the behavior of the protocol and the RIPR algorithm. Awareness of such behaviors is useful for system synthesis. For example, in order to reduce the start-up time of the protocol, we can deploy the nodes so that they can reach the sink in a small number of hops. To reduce the cost (both time and energy) of executing the RIPR algorithm, we can restrict the communication of each node to a subset of its neighbors (thereby reducing |E|).
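The throughput metrics defined above can be computed directly from the packet-arrival trace recorded at the base station. The following sketch is illustrative only (the function names and the sampling step are our own); it assumes a sorted list of packet arrival times and applies the ±0.1 s window and the 85% threshold defined earlier.

```python
import bisect

def instantaneous_throughput(arrivals, t, half_window=0.1):
    """(N(t + 0.1) - N(t - 0.1)) / 0.2, where N(t) is the number of
    (sorted) arrival timestamps no larger than t."""
    n_hi = bisect.bisect_right(arrivals, t + half_window)
    n_lo = bisect.bisect_right(arrivals, t - half_window)
    return (n_hi - n_lo) / (2 * half_window)

def startup_time(arrivals, steady_state, threshold=0.85, step=0.01):
    """First time at which the instantaneous throughput reaches the
    given fraction (85% by default) of the steady-state throughput."""
    t = 0.0
    while t <= arrivals[-1]:
        if instantaneous_throughput(arrivals, t) >= threshold * steady_state:
            return t
        t += step
    return None

# Example: one packet every 0.01 s gives roughly 100 packets/sec.
arrivals = [0.01 * i for i in range(1, 3001)]  # 30 s of arrivals
rate = instantaneous_throughput(arrivals, 15.0)
```

The same sliding-window estimate can be reused for the adaptation experiments later in this section, simply by evaluating it after the time instance at which the link bandwidths change.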
Figure 4.5: Illustration of the start-up and the adaptation of the proposed protocol (number of data packets received by the base station vs. time). Framed block (a) is zoomed in Figure 4.6(a); framed block (b) is zoomed in Figure 4.6(b).

Note that the observed execution time of the RIPR algorithm (less than 1.3 seconds) and the start-up time of the protocol (less than 4.3 seconds) depend on the bandwidth settings of the links. In our simulations, the bandwidth of the links is around 10 kbps, which is around 40 data packets per second because each data packet is 32 bytes. The shortest path (in terms of the transfer time of one data packet) from the source node to the base station ranges from 0.05 seconds to 0.13 seconds. The execution and start-up times will be much shorter if the links have higher bandwidth. For example, if the system is built with Telos [18] wireless sensors that can communicate at 250 kbps, we can expect about a 20-times speedup in both the execution time of the RIPR algorithm and the start-up time of the protocol.

Adaptivity of the proposed protocol is shown in Figure 4.5. The system consisted of 40 nodes randomly deployed in the unit square. The communication radius was set to 0.4. The system activities during the first 40 seconds are shown. At time t = 20 sec, we changed the bandwidth of a randomly selected set of links, each of which was increased by 100%. Consequently, the optimal throughput (calculated off-line) changed from 314 to 492 (data packets/sec). As such changes occurred, the adaptation procedure was activated and the system operated at a new steady-state throughput after the adaptation was completed. Figure 4.5 shows the number of data packets received by the base station as time advances.
The throughput actually achieved by the protocol is reflected by the slope of the curve, which is 293 (93% of the optimal) before t = 20 and 452 (92% of the optimal) thereafter. For this experiment, we define the start-up time as the time period for the instantaneous throughput to reach 85% of the first steady-state throughput 293, starting from t = 0; and the adaptation time as the time period for the instantaneous throughput to reach 85% of the second steady-state throughput 452, starting from t = 20. In this experiment, the shortest path (in terms of overall transfer time) to send a data packet from the source node to the base station consists of 3 hops and requires 0.06 sec. By using our protocol, the first data packet was received by the base station 0.12 seconds after the system started; the start-up time is 1.13 seconds; and the adaptation time is 1.4 seconds. The system activities during the start-up and adaptation periods are shown in more detail in Figure 4.6. An important observation from Figure 4.6 is that the system started (at t = 0) and continued (at t = 20) to gather data while the RIPR algorithm was still executing. The system did not wait until the optimal solution was found. Actually, because the protocol was executed in a distributed fashion, none of the nodes would know of the completion of the RIPR algorithm unless a global synchronization was performed.
Figure 4.6: System activities during the start-up and adaptation periods: (a) 0 < t < 6, with the framed block zoomed in (c); (b) 17.5 < t < 23.5, with the framed block zoomed in (d); (c) 0 < t < 0.5.

Chapter 5

In-network Processing in Networked Sensor Systems

As mentioned in Chapter 4, many applications envisioned for networked sensor systems are to detect and monitor events in the environment. Typically, these applications require processing of the raw data collected by the sensors. This can be achieved by gathering all the sensed data at a powerful base station for further processing. Data gathering has been studied intensively from various aspects such as data aggregation [36] and energy efficiency [16,42]. A subset of the data gathering problems has been studied in Chapter 4.

Alternatively, the sensors can process the raw data and transmit the processing results. The trade-offs between communication and computation energy [51,60] have shown that in-network processing of the sensed data is more energy efficient than transferring the raw data to a powerful base station for processing. In-network processing leads to a prolonged lifetime of the system, which is one of the most critical factors in the design and deployment of networked sensor systems.

In-network processing has been studied extensively from various perspectives. In [39], a hierarchical architecture is proposed to organize the heterogeneous nodes according to their computation capabilities. This hierarchy facilitates the partitioning of tasks into different sub-tasks so that they can be mapped onto heterogeneous nodes.
The study in [19] focuses on the development of security models for in-network processing. The major concern of this study is to set up trust between aggregators and sensors. Some other works consider the systems as distributed databases, where the data is stored/collected by the sensors and in-network processing is used to retrieve data from, as well as disseminate commands to, the sensors. For example, the placement of aggregators and filters (which execute data queries) is studied in [12] to minimize the overall communication cost. The TinyDB [43] and Cougar [13] projects offer powerful database tools that support efficient in-network query processing.

This chapter is organized as follows. Section 5.2 presents the system model and the formal problem statement. We also show that the throughput of in-network processing again reduces to network flow problems, which means that the RIPR algorithm developed in Chapter 3 can be applied. A simple in-network processing protocol is developed in Section 5.3. Experimental results are shown in Section 5.4. Section 5.5 studies the performance of path-based greedy heuristics.

5.1 Introduction

In this chapter, we study the performance of those applications that require in-network processing of raw data blocks. The event is assumed to be detected and sensed by a subset of the nodes, which we call the source nodes. Other nodes can receive and relay the data blocks. At the same time, all the nodes can process the data blocks. Compared with previous studies that focus on the development of infrastructures for in-network processing, we study this problem from an algorithmic perspective. The goal is to maximize the throughput of the system, i.e., the total number of data blocks that can be processed by the system in one unit of time. Such an in-network processing problem models a wide range of practical applications.
For example, a basic function in many environment monitoring applications is to classify an event after sensing it. The sensed data is often divided into blocks (e.g., a block may consist of the acoustic data collected by a node within one second, while the overall data collection process may last several minutes) and each individual block is processed independently. Data fusion is then applied at the base station to combine these individual classification results [67]. Studies have shown that the overall accuracy of the final result increases monotonically with the number of classification-making individuals (sensors in our case) as well as the number of data blocks. If the base station is required to perform a classification with a certain accuracy and within a given time period, then the networked sensor system needs to collect enough data blocks, make decisions for these data blocks, and transfer these decisions to the base station, all within the given time period.

By modeling the processing of data blocks as a special type of data flow, we can reduce the in-network processing problem to a network flow optimization problem, for which the algorithm described in Chapter 3 can be used to coordinate the sensors in a distributed and adaptive fashion. We further develop a simple distributed protocol for in-network processing. The performance of this protocol is studied through simulations, and system throughput up to 95% of the optimal was observed. Note that the proposed work does not directly model energy consumption. Performance of the sensors is characterized by their processing capabilities and communication rates. These parameters are assumed to be continuously adjusted by some application-level power management scheme.
Instead of controlling the energy consumption directly, our objective is to develop an algorithm that can adapt to such energy-related changes.

Path-based heuristics can be used as alternative methods for in-network processing. In these heuristics, the nodes determine some paths to transfer the data, based on locally available information. Examples of such heuristics include the shortest path, the minimum latency path, etc. These heuristics are easy to implement. However, we show that they can have very poor performance.

5.2 Problem Statement and Reduction to Flow Optimization Problem

The sensor nodes are assumed to be connected via an arbitrary topology and the network is represented by a graph G(V, E). Each node u ∈ V represents a sensor. The weight of u is denoted by w_u. w_u represents the processing power of node u, i.e., u can perform one unit of computation in 1/w_u time. Each edge E_uv ∈ E in the graph represents a network link. The capacity of E_uv is denoted by c_uv. Link E_uv can therefore transfer one unit of data from u to v in 1/c_uv time. The links are uni-directional, so G is directed and in general E_uv ≠ E_vu. In the rest of the chapter, 'edge' and 'link' are used interchangeably. We assume that the communications are scheduled by time/frequency division multiplexing or channel assignment techniques such as the one in [11]. The successors of u in G are defined as σ_u = {v ∈ V | E_uv ∈ E} and the predecessors of u in G are defined as π_u = {v ∈ V | E_vu ∈ E}. S_c is the set of source nodes that collect data by sensing the environment. Node u ∈ S_c can collect data from the environment at a rate no more than d_u. d_u is assumed to be larger than w_u, because otherwise u can process all the data it collects and the problem can be solved trivially.
t is the base station that eventually receives all the results of processing. t is called the sink node in G.

Without loss of generality, we assume that each data block consists of one unit of data and requires one unit of computation to process. A data block is an atomic logical unit for processing. It may consist of multiple data packets, but the complete data block must be received by a node before it can be processed by the node. The processing of different data blocks is independent of each other. Let f(u, v) denote the number of data blocks transferred from u to v in one unit of time. For notational convenience, if edge E_uv ∉ E, we define c_uv = 0; if the actual data transfer is from u to v, we define f(v, u) = −f(u, v). With these two definitions, if neither E_uv nor E_vu belongs to E, then c_uv = c_vu = 0, which implies that f(u, v) = f(v, u) = 0. In this way, we can define f(u, v) over V × V, rather than being restricted to E. f(v, u) = −f(u, v) also allows us to compute the total number of data blocks transferred to u as Σ_{v∈π_u} f(v, u), which equals Σ_{v∈V} f(v, u) since f(u, v) = f(v, u) = 0 if E_uv ∉ E and E_vu ∉ E.

The maximization of the system throughput is mathematically formulated as follows:

Throughput Maximization for In-network Processing (TMIP)

Given: Graph G(V, E) with the set of source nodes S_c. Node u ∈ V has processing capability w_u and data collection capability d_u. E_uv ∈ E has capacity c_uv. c_uv = 0 if E_uv ∉ E.

Maximize: Σ_{u∈S_c} w_u + Σ_{u∈S_c} Σ_{v∈V} f(u, v)

Subject to: Σ_{v∈V} f(v, u) ≤ w_u for u ∈ V − S_c (1)
Σ_{v∈V} f(u, v) ≤ d_u − w_u for ∀ u ∈ S_c (2)

f(u, v) ≤ c_uv for ∀ u ∈ V, v ∈ V (3)

f(u, v) = −f(v, u) for ∀ u ∈ V, v ∈ V (4)

The TMIP problem maximizes the overall number of data blocks that can be processed in one unit of time by the source nodes (Σ_{u∈S_c} w_u) and the other nodes in the system (Σ_{u∈S_c} Σ_{v∈V} f(u, v)). Since Σ_{u∈S_c} w_u is just an additive constant in the optimization objective, it can be omitted without affecting the optimal solution. Constraint (1) requires that no intermediate node receive more data blocks than it can process; source node u ∈ S_c can collect data at maximum rate d_u and process the data at rate w_u, hence the rate at which data can flow out of u cannot exceed d_u − w_u, as is specified in constraint (2); constraint (3) represents the capacity constraints of the links. A feasible solution f to the above problem represents a valid steady-state flow of data blocks from S_c to the other nodes (where the data blocks are processed). Because the processing results of each data block consist of a very small number of bits (e.g., 1 bit in binary classification problems), we ignore the cost of transferring the processing results to sink node t in the problem formulation.

In the TMIP problem, data blocks are initially generated by the source nodes. All the other sensors face the same questions upon receipt of the data blocks: should the data blocks be relayed to other sensors or processed locally? If the data blocks are processed locally, they will be discarded after the processing, as they will be replaced by the processing results (of very small sizes and hence not considered in our problem formulation). Then the question becomes: how many data blocks should be processed locally? If the data blocks are to be relayed to other nodes for processing, then to which nodes should the data be relayed?
The objective of the TMIP problem is to answer these questions for each sensor such that the overall system throughput is maximized.

We can see that the system throughput in TMIP is the sum of S_c's processing capabilities and the rate with which data blocks flow out of S_c. After the data blocks flow out of S_c, they will be transferred in the system and finally be consumed (processed) by some nodes. If we model these data consumptions as a special type of data flow to a hypothetical node, then the throughput of the system is solely determined by the rate with which data blocks flow out of S_c. Given a TMIP problem with G(V, E) as the input graph and S_c ⊂ V as the source nodes, it is transformed to a standard network flow maximization problem in a new graph G' using the following procedure:

Procedure 1:

1. For each node u ∈ V, create a node u' in G'. Add a pseudo source s' and a pseudo sink t' to G'.

2. For each link (u, v) ∈ E, add a link (u', v') to G' with c_{u'v'} = c_uv.

3. For each node u ∈ V − S_c, add a link (u', t') to G' with c_{u't'} = w_u.

4. For each node u ∈ S_c, add a link (s', u') to G' with c_{s'u'} = d_u − w_u.

We have the following flow optimization problem based on the above procedure. (To simplify the notation, we have omitted the superscripts of the vertices.)

Problem 1:

Given: Graph G(V, E), source s ∈ V and sink t ∈ V. Edge (u, v) ∈ E has capacity c_uv.

Maximize: Σ_{u∈V} f(s, u)

Subject to: Σ_{v∈V} f(u, v) = 0 for u ∈ V − {s, t} (1)

f(u, v) ≤ c_uv for ∀ u ∈ V, v ∈ V (2)

f(u, v) = −f(v, u) for ∀ u ∈ V, v ∈ V (3)

If an instance of the TMIP problem has G as the input graph and S_c as the source nodes, we denote it as TMIP(G, S_c). If an instance of Problem 1 has G as the input graph, s as the source, and t as the sink, we denote it as Problem 1(G, s, t). We use W_T(G, S_c) to represent the maximum throughput for TMIP(G, S_c).
We use W_1(G, s, t) to represent the maximum throughput for Problem 1(G, s, t). The next proposition shows that the TMIP problem is a special case of Problem 1.

Proposition 1: Suppose TMIP(G, S_c) is converted to Problem 1(G', s', t') using Procedure 1; then W_T(G, S_c) = W_1(G', s', t').

Figure 5.1: Reduction of TMIP to a network flow problem: (a) a networked sensor system; (b) the corresponding network flow representation. Sensor nodes are denoted by circles. The square in (a) denotes the event of interest. Dotted lines in (a) represent the collection of data from the environment. The upper square in (b) denotes the newly added pseudo source s'. The lower square in (b) denotes the pseudo sink t'. Weights of the nodes and links are omitted in this figure.

Proof: We use the notation in Procedure 1 to denote the nodes and edges in G and their corresponding nodes and edges in G'. Suppose f : V × V → R is a feasible solution for TMIP(G, S_c). We map it to a feasible solution f' : V' × V' → R for Problem 1(G', s', t') as follows:

1. Initialize f'(u', v') = 0 for ∀ u', v' ∈ V'.

2. If f(u, v) > 0, then set f'(u', v') = f(u, v).

3. For u ∈ V − S_c, if Σ_{v∈V} f(v, u) > 0, then set f'(u', t') = Σ_{v∈V} f(v, u).

4. For u ∈ S_c, if Σ_{v∈V} f(u, v) > 0, then set f'(s', u') = Σ_{v∈V} f(u, v).

It is easy to verify that such an f' is a feasible solution for Problem 1(G', s', t') and that f' leads to the same throughput as f. Suppose f' : V' × V' → R is a feasible solution for Problem 1(G', s', t'). We map it to a feasible solution f : V × V → R for TMIP(G, S_c) simply as follows: for ∀ u, v ∈ V, set f(u, v) = f'(u', v'). It is also easy to verify that such an f is a feasible solution for TMIP(G, S_c) and that it has the same throughput as f'. □
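To make the reduction concrete, the sketch below builds G' exactly as in Procedure 1 and then computes W_1(G', s', t') with a textbook Edmonds-Karp max-flow. This is a centralized, offline illustration only; it is not the distributed RIPR algorithm, and all function and variable names are ours.

```python
from collections import defaultdict, deque

def tmip_max_throughput(V, E, sources, w, d):
    """Return W_T(G, Sc): sum of w_u over Sc plus the max flow from the
    pseudo source s' to the pseudo sink t' in G' (Procedure 1).

    V: iterable of nodes; E: dict (u, v) -> capacity c_uv;
    sources: set Sc; w: processing rates; d: collection rates (d_u > w_u).
    """
    cap = defaultdict(float)
    s, t = "s'", "t'"
    for (u, v), c in E.items():          # step 2: copy the original links
        cap[(u, v)] += c
    for u in V:
        if u in sources:
            cap[(s, u)] += d[u] - w[u]   # step 4: pseudo-source edges
        else:
            cap[(u, t)] += w[u]          # step 3: processing capacity drains to t'
    flow = 0.0
    while True:
        # BFS for a shortest augmenting path in the residual graph
        parent = {s: None}
        q = deque([s])
        while q and t not in parent:
            x = q.popleft()
            for (a, b), c in list(cap.items()):
                if a == x and c > 1e-12 and b not in parent:
                    parent[b] = a
                    q.append(b)
        if t not in parent:
            break
        path, x = [], t
        while parent[x] is not None:
            path.append((parent[x], x))
            x = parent[x]
        bottleneck = min(cap[e] for e in path)
        for a, b in path:                # update residual capacities
            cap[(a, b)] -= bottleneck
            cap[(b, a)] += bottleneck
        flow += bottleneck
    return sum(w[u] for u in sources) + flow
```

For a two-node example where source a (w_a = 1, d_a = 10) feeds b (w_b = 2) over a link of capacity 5, the throughput is w_a plus the s'-t' max flow min(9, 5, 2), i.e., 3 blocks per unit time.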
Figure 5.1 illustrates a networked sensor system and the corresponding network flow representation after applying Procedure 1.

Note that w_u and c_uv in the TMIP problem represent the processing capability and communication bandwidth of the sensors. The actual value of w_u is determined by various factors such as the clock frequency, the supply voltage, the specific design of the circuitry, and the complexity of the algorithms for processing. The actual value of c_uv is also determined by multiple factors such as the radio transmission power, the rate of signal decay, the distance between the sender and the receiver, etc. Since energy efficiency is a key consideration of networked sensor systems, trade-offs between computation/communication speed and energy have been explored extensively. For example, dynamic voltage scaling and frequency scaling techniques save energy by reducing the supply voltage and clock frequency of the sensor nodes, at the cost of slower processing speeds. Modulation scaling reduces the radio transmission power, however, at the cost of a lower data communication rate. Additionally, these scaling techniques can be activated on-the-fly based on the workload and remaining energy of the sensors. The fact that w_u and c_uv are under continuous real-time adjustment translates to the run-time variations of the link capacities in Problem 1.

5.3 On-line Protocol for In-network Processing

The RIPR algorithm developed in Chapter 3 can be used to implement an on-line protocol for in-network processing. In this protocol, each node maintains a data buffer. Initially, all the data buffers are empty. The buffer at source node u ∈ S_c is filled at rate d_u. Let β_u denote the length of the used buffer at u. At any time instance, each node u ∈ V operates as follows:

1. Contact the adjacent node(s) and execute the RIPR algorithm.
(The reduction from the TMIP problem to the relaxed flow maximization problem is straightforward and omitted here.)

2. If β_u > 0 and u is not processing any data blocks, remove one data block from the data buffer and process it.

3. While β_u > 0 and u is processing a data block, send the message 'request to send' to ∀ v ∈ σ_u with f(u, v) > 0. If 'clear to send' is received from v, then set β_u ← β_u − 1 and send a data block to v.

4. Upon receiving 'request to send', u acknowledges 'clear to send' if β_u < U. u acknowledges a denial if β_u ≥ U. Here U is a pre-set threshold that limits the maximum number of data blocks a buffer can hold.

5. If u ∈ S_c and β_u ≥ U, stop sensing the environment until β_u < U.

Because the pseudo source node s is not an actual sensor, the nodes in S_c need to maintain a consistent image of s. This can be implemented by first electing a leader from S_c. Observe that h(s) is the only variable that all nodes in S_c share. The leader then maintains h(s) and broadcasts h(s) to the nodes in S_c whenever h(s) changes. Such regional cooperation will cause some extra cost. However, since leader election can be implemented efficiently [47], this extra cost is minimal when compared with the push and relabel operations executed by the protocol.

5.4 Experimental Results

A simulation study of the proposed protocol is conducted using the PARSEC [4] software package.

5.4.1 Simulation Setup

The simulated networked sensor system was generated by randomly scattering 20 - 80 sensor nodes in a unit square. The base station was located at the lower-left corner of the square. The event of interest was randomly dropped into the square. Nodes within 0.2 units of distance from the event are assumed to sense the event. Each data block is 32 bytes.
Radio transmission range of the nodes was set to 0.2, i.e., nodes within 0.2 units of distance can communicate with each other. Assuming a signal decaying factor of r^-2, the flow capacity between sensor nodes u and v is determined by Shannon's theorem as c_uv = W log(1 + P_uv / (r_uv^2 η)), where W is the bandwidth of the link, r_uv is the distance between u and v, P_uv is the transmission power on link (u, v), and η is the noise in the communication channel. In all the simulations, W was set to 1 kHz and η was set to 10^-6 mW. The w_u are uniformly distributed between 0 and w_max. d_u is distributed between 0 and d_max. Transmission time of a control message is assumed to be 1 ms. Because we consider the scenario where the processing results of each data block consist of a very small number of bits, we ignore the cost of transferring the processing results to the sink node in the simulations.

5.4.2 Summary of Results

Our first set of simulations examines the convergence of our in-network processing protocol. For this set of simulations, the transmission power of all links was set to a constant value P_uv = 10^-3 mW. The in-network processing lasted 30 seconds for each simulation. Let N(t) denote the total number of data blocks processed by the system from time 0 to time t. The raw throughput is calculated as ρ_a = N(30)/30. The steady-state throughput ρ_s is calculated as the average throughput over the last 10 seconds, i.e., ρ_s = (N(30) − N(20))/10. The instantaneous
throughput at time t, ρ_r(t), is approximated by the number of data blocks processed in a small window around t, divided by the window length. The start-up time is defined as τ = argmin_t (ρ_r(t) > 0.85 ρ_s), which indicates the convergence speed of our in-network processing protocol. w_max was set to 10. d_max was set to 100. Buffer sizes of U = 1, 5, 10 were used. The simulation results are listed in Tables 5.1 - 5.3, where each data point is an average over 200 experiments. The values of ρ_a and ρ_s have been normalized to the optimal system throughput, which was calculated off-line.

Table 5.1: Normalized Raw Throughput
         n=20     n=40     n=60     n=80
U = 1    0.9452   0.9421   0.9472   0.9516
U = 5    0.9591   0.9366   0.9440   0.9535
U = 10   0.9452   0.9372   0.9349   0.9531

Table 5.2: Normalized Steady-state Throughput
         n=20     n=40     n=60     n=80
U = 1    0.9493   0.9456   0.9529   0.9634
U = 5    0.9639   0.9395   0.9482   0.9628
U = 10   0.9489   0.9412   0.9374   0.9640

Table 5.3: Start-up time
         n=20      n=40      n=60      n=80
U = 1    0.2965s   0.2800s   0.3127s   0.3671s
U = 5    0.3067s   0.2790s   0.3432s   0.3797s
U = 10   0.2902s   0.2793s   0.3138s   0.3692s

These simulation results show that our in-network processing protocol achieves around 95% of the optimal throughput. Our protocol is insensitive to the buffer size limit: reducing the buffer size from 10 to 1 does not cause noticeable degradation in the system throughput. As can be seen from the results, the number of sensor nodes does not have a noticeable impact on the system throughput.

We define the time distance between any two nodes as the length (in terms of the transfer time of a data block) of the shortest path between the two nodes, and the diameter of a system as the longest time distance between any two nodes. The average system diameter in the simulations is 0.105s. Using our protocol, the average system start-up time is around 0.3s.
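As an illustration of the link-capacity model described in the simulation setup (Section 5.4.1), the per-link Shannon capacity can be computed as follows. This is a sketch under stated assumptions: the base-2 logarithm follows Shannon's theorem, the default parameter values follow the setup (W = 1 KHz, η = 10⁻⁶ mW, P_uv = 10⁻³ mW), and the function name is illustrative.

```python
import math

def link_capacity(r_uv, p_uv=1e-3, w=1e3, eta=1e-6):
    """Shannon capacity of link (u, v), assuming a signal decay factor
    of r**-2:  c_uv = W * log2(1 + P_uv / (eta * r_uv**2)).
    Defaults follow the simulation setup: W = 1 kHz, eta = 1e-6 mW,
    P_uv = 1e-3 mW (transmission power fixed for all links)."""
    snr = p_uv / (eta * r_uv ** 2)
    return w * math.log2(1.0 + snr)

# A link at the maximum radio range of 0.2 units:
print(link_capacity(0.2))
```

Shorter links see a higher signal-to-noise ratio and therefore a higher capacity, which is why edge capacities in the flow representation vary with node placement.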
The adaptivity of our protocol has been verified by modifying the sensing, processing, and communication capabilities of the sensors while in-network processing is being performed. The simulation settings are the same as before, except that we randomly chose 20% of the communication links and increased their bandwidth by 50% during the simulations. When such changes occurred in the system, the adaptation procedure was activated and the in-network processing was adapted. We observed that the system operated close to (about 95% of) the new optimal throughput after the adaptation was completed. The results in Table 5.4 show the adaptation time τ_a of our protocol, which is defined as follows: suppose a set of changes occurs at time instance t₀ and the steady-state throughput after the adaptation is ρ′_s; then

τ_a = argmin_{t > t₀} (0.85 ≤ ρ_r(t)/ρ′_s ≤ 1.15) − t₀

Intuitively, τ_a is the time for the system to reach 85% (when the new steady-state throughput is higher than the original throughput) to 115% (when the new steady-state throughput is lower than the original throughput) of the new steady state.

Table 5.4: Adaptation time
         n=20      n=40      n=60      n=80
U = 1    0.3078s   0.2907s   0.2656s   0.2186s
U = 5    0.2678s   0.2826s   0.2442s   0.2102s
U = 10   0.2679s   0.3506s   0.2998s   0.2508s

The results in Table 5.4 show that the adaptation time is around 0.3s, roughly the same as the start-up time shown in Table 5.3. However, the adaptation time does not increase as the number of nodes increases. This is possibly caused by the following fact: when the system has a large number of nodes, a subset of the nodes is already capable of processing all the data generated by the source nodes. If the performance of those not-in-use nodes is changed, then our algorithm is not activated and the system simply continues to operate as if no changes have occurred.
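The adaptation-time measurement just defined can be expressed as a small helper over a sampled throughput trace. This is an illustrative sketch: the trace format (a time-ordered list of (t, ρ_r(t)) samples) and the sample values below are hypothetical, not data from the simulations.

```python
def adaptation_time(trace, t0, rho_s_new, lo=0.85, hi=1.15):
    """Earliest sample time after the change at t0 at which the
    instantaneous throughput falls within [lo, hi] of the new
    steady-state throughput rho_s_new, minus t0.  `trace` is a list of
    (t, rho_r(t)) samples in increasing time order (format assumed)."""
    for t, rho in trace:
        if t > t0 and lo <= rho / rho_s_new <= hi:
            return t - t0
    return None  # the system never settled within the band

# Throughput rises after links speed up at t0 = 10 s (synthetic samples):
trace = [(9.0, 1.00), (10.2, 1.05), (10.4, 1.35), (10.6, 1.48), (10.8, 1.51)]
print(adaptation_time(trace, t0=10.0, rho_s_new=1.5))
```

With these synthetic samples, the throughput first enters the 85%-115% band of the new steady state at t = 10.4s, giving an adaptation time of 0.4s.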
This suggests that sensing capabilities may become the performance bottleneck as the number of nodes increases. The results in Table 5.4 also show that our protocol is insensitive to buffer size.

The next set of simulations studies the impact of w_max. With W, η, and P_uv fixed, w_max represents the relative compute power of the nodes and hence the average communication/computation ratio of the data blocks. We simulated systems with 20 and 80 nodes. For each system size, we evaluated the performance of our protocol with w_max ranging from 5 to 50. The results are shown in Figure 5.2, where each data point is an average over 500 simulations. The results have been normalized to the optimal system throughput, which was calculated off-line. We can see that the number of nodes does not have any noticeable impact on the system throughput. When w_max becomes larger, our protocol achieves a throughput closer to optimal. However, the improvement in throughput is marginal as w_max increases. This suggests that communication bandwidth is a more important factor for improving system throughput than the processing capabilities of the nodes.

[Figure 5.2: Impact of w_max on system throughput (20-node vs. 80-node systems)]

5.5 Performance Comparison

An alternative method for the in-network processing problem is to transfer data blocks to neighbors that can process the data. This heuristic attempts to maximize system throughput by pushing data blocks from the source (where the data is sensed) towards the sink (where the data is processed) along some paths. Such path-based greedy heuristics are widely used for many data routing problems since they are easy to implement [2]. The actual choice of the path (shortest path, minimum latency path, etc.) is application specific.
But a common property of such heuristics is that the path is determined by the nodes based on some locally available information. An example of path-based greedy heuristics is directed diffusion [34], which offers a solution to a class of data gathering problems. In [34], the sink node announces its interests in the data. While the interest is propagated throughout the system, each node locally determines a gradient that specifies which neighbor the data should be sent to. This gradient is then used to establish a path from the source nodes to the sink.

Generally, path-based greedy heuristics consist of the following four steps in the context of the TMIP problem: (1) Transform the TMIP problem to its network flow representation by applying the procedure specified in Section 5.2. (2) Find a certain path p from the source s to the sink t. We define the capacity c_p of the path as the minimum capacity of all the edges on this path. (3) Push c_p units of flow along path p and reduce the capacity of all the edges on path p by c_p. (4) Repeat steps 2 and 3 until there does not exist any path from s to t.

Note that the heuristic is applied to the network flow representation of the TMIP problem. Sink t is a pseudo node representing the processing of the data. If a certain amount of flow is pushed along a path to t, it actually means that the data is transferred along this path and then processed by the last node of this path. There can be a wide range of choices for finding a path in step 2 above. It can be the shortest path, the minimum latency path, or just a randomly chosen path. For the sake of illustration, suppose step 2 of the above heuristic uses a randomly chosen path; the heuristic can then be approximated by the following simple distributed protocol: (1) Every node maintains a data buffer, which has a predetermined size limit.
(2) The source nodes keep sensing the environment and load their data buffers until the buffer becomes full. (3) At each node, as long as the data buffer is not empty, the node removes a data block from its buffer and processes it. (4) At each node, as long as the data buffer is not empty, the node sends a data block to a neighbor if the neighbor has fewer data blocks in its buffer.

To determine whether a node u has more data blocks than its neighbor v, u can send a query 'request the number of data blocks' to v and wait for the response. Alternatively, v can broadcast the number of data blocks in its buffer whenever it changes. It is possible that two nodes u1 and u2 query a common neighbor v at the same time. u1 then sends a data block to v, increasing the number of data blocks at v by 1. At this time instance, the knowledge u2 has about v is stale. If u2 makes any decision based on this stale knowledge, then u2 may not be following the above protocol precisely. Such consistency issues can be solved via some low-level handshaking mechanism. For example, we can enforce that 'query the number of data blocks' and 'send the data block' be executed together as a single atomic operation. Or, v can delay its response to u2 in the first place and wait until u1 has finished its operations. Design details for this protocol are beyond the scope of this chapter. But it is clear that the spirit of the above protocol is to simply move data from the source to the sensors (where the data is processed, i.e., the pseudo sink) along paths based on local information.

Although easy to implement, the greedy heuristic cannot guarantee the optimality of the solution when applied to the TMIP problem. Actually, the performance of the
greedy heuristic can be arbitrarily bad in the worst case. This is illustrated using the example in Figure 5.3. For the sake of illustration, the problem reduction procedure is skipped and the TMIP problem is shown in its network flow representation in Figure 5.3. Edges (n5, t), (n10, t), and (n15, t) represent data processing in the TMIP problem. Capacities of the edges are marked on the edges.

[Figure 5.3: An example illustrating the poor performance of the path-based greedy heuristic]

Suppose the path that the greedy heuristic first chooses is p = s → n1 → n7 → n8 → n15 → t. 10 units of flow can be pushed via path p. Then no more flow can be pushed from s to t, since there does not exist any available path from s to t. However, the system can actually achieve 30 units of flow, because 10 units of flow can be pushed along each of the following three paths: p1 = s → n1 → n2 → n3 → n4 → n5 → t, p2 = s → n6 → n7 → n8 → n9 → n10 → t, and p3 = s → n11 → n12 → n13 → n14 → n15 → t. In this example, the greedy heuristic achieves 1/3 of the optimal solution. Note that we can insert multiple copies of path p2 into the system. This will lead to an arbitrarily bad worst-case performance of the greedy heuristic.

We have shown that choosing paths randomly can lead to very poor performance. It can be shown that for other path-based greedy heuristics, there also exist instances in which the system performance is arbitrarily bad.

Table 5.5: Normalized raw throughput of the greedy heuristic
         n=20     n=40     n=60     n=80
U = 1    0.9612   0.8423   0.8041   0.7377
U = 5    0.9604   0.8348   0.8021   0.7342
U = 10   0.9589   0.8303   0.7976   0.7275

The above example can be generalized to show the following:

Theorem 4.
Given any real number ε > 0, there exist instances of the TMIP problem in which the throughput of the path-based greedy heuristic is less than ε times the optimal throughput.

To illustrate the non-optimality of path-based heuristics, simulations were conducted using the same settings as in Section 5.4.1. The heuristic uses a randomly chosen path and is approximated as described above. The transmission power of all links was set to a constant value P_uv = 10⁻³ mW. The in-network processing lasted 30 seconds for each simulation. w_max was set to 10. d_max was set to 100. We tested buffer sizes of U = 1, U = 5, and U = 10. The raw throughput of the greedy heuristic (normalized to the optimal throughput) is listed in Table 5.5, where each data point is averaged over 200 experiments. The results show that the performance of the path-based greedy heuristic is insensitive to the buffer size. Compared with the results in Table 5.1, the path-based greedy heuristic performs well (around 96%) when the network has only 20 nodes. However, as the number of nodes increases, the performance of the greedy heuristic decreases (down to 72% when the system has 80 nodes). In summary, the results in Table 5.5 show that the scalability of the greedy heuristic is limited.

5.6 Discussion

In this chapter, we considered the problem of in-network processing in networked sensor systems. After reducing the problem to its network flow representation, we developed a decentralized adaptive algorithm to maximize the system throughput. This algorithm was further implemented as an on-line protocol. System throughput of up to 95% of the optimal was observed in the simulations. Adaptivity of the protocol (w.r.t. the performance changes of the sensors) was illustrated through simulations.
In the TMIP problem, we have modeled the processing capability, communication rate, and sensing rate of the sensors. Power consumption of the nodes was not directly modeled, but was assumed to be controlled by some application-level power management scheme. This leads to continuous changes in the computation and communication capabilities of the nodes (due to power management). In addition, environmental factors can also affect the performance of the nodes. We addressed the issue of adaptation to such performance changes.

To address power consumption directly, we can introduce a fourth characteristic for the sensors: the power budget. It represents the maximum amount of energy that a sensor can consume in one unit of time. The power budget may be determined by various factors. For example, if a node is low on battery, it may impose a low power budget on its activities so as to extend its life. If a node is on the critical path connecting two groups of sensors, it may also impose a low power budget so that it can operate over a long period of time. It is reasonable to assume that the power budget is also controlled by some application-level power management scheme. The goal is then to maximize the throughput under the given power budgets of the sensors.

The TMIP problem studies the class of in-network processing problems that maximize the throughput. An equally important problem is to maximize the total number of data blocks processed. Because every sensor has a certain energy budget, the number of data blocks that a sensor can sense, send, receive, and process is limited. Therefore we need to determine the routing of data blocks through the system without violating the energy budgets of the sensors. The energy constraints, again, can be represented as constraints on the vertices.
An interesting observation is that there are no edge capacity constraints: a communication link can be used to transfer an arbitrary number of data blocks (over an arbitrarily long time period), since we are not maximizing the number of data blocks processed in one unit of time. This effectively reduces the maximization of the number of data blocks to a special case of throughput maximization with edge capacities set to infinity. For this class of in-network processing problems, we are exploring the possibility of developing a distributed algorithm that can be executed by the nodes while the system continues to collect and process data.

In this chapter, we have modeled networked sensor systems as a general graph where all the nodes have the same functionality. However, various studies have proposed hierarchical infrastructures for networked sensor systems (e.g., [64]). In these infrastructures, one node is elected as the cluster head that coordinates the operations of the nodes in a cluster. Additionally, the role of the cluster head is often rotated among the nodes in a cluster [68]. This results in a dynamic tree-structured system topology. Because routing is greatly simplified for tree topologies, many data gathering/processing problems can be solved efficiently. For example, if the root of the tree (often the base station) needs to disseminate data to the complete system, then the greedy algorithm in [9], which was originally developed for tree-structured distributed computer systems, can be applied. When data processing capabilities and energy constraints are considered, non-trivial algorithms need to be designed to optimize system performance.
Associated with the performance optimization problems is the problem of system synthesis: given a system connected via a general graph, what is the optimal tree-structured sub-graph that can collect the maximum number of data packets? Or, what is the optimal tree that can operate over the longest time period? Many networked sensor system applications can be abstracted as the coordination of communication and computation in tree-structured systems. Exploration in this direction, which focuses on studies at the infrastructure level, will greatly aid application design for networked sensor systems.

Chapter 6

Maximum Lifetime Data Sensing and Extraction in Energy Constrained Networked Sensor Systems

In this chapter, we focus on data gathering problems where the networked sensor systems operate in rounds. A subset of the sensors generates a certain number of data packets during each round. All the data packets need to be transferred to the base station. The goal is to maximize the system lifetime in terms of the number of rounds the system can operate. We show that the above problem reduces to a restricted flow problem with a quota constraint, a flow conservation requirement, and an edge capacity constraint. We further develop a strongly polynomial time algorithm for this problem, which is guaranteed to find an optimal solution. We then study the performance of a distributed shortest path heuristic for the problem. This heuristic is based on self-stabilizing spanning tree construction and shortest path routing methods. In this heuristic, every node determines its sensing activities and data transfers based on locally available information. No global synchronization is needed.
Although the heuristic cannot guarantee optimality, simulations show that the heuristic has good average-case performance over randomly generated deployments of sensors. We also derive bounds for the worst-case performance of the heuristic.

6.1 Introduction

In this chapter, we consider the class of networked sensor system applications where the system works in rounds, as in [35]. The event is sensed by a subset of the sensors, each of which generates a certain number of data packets during each round, and all the sensed data needs to be collected by the base station. The goal is to maximize the number of rounds that the system can operate under the energy constraints of the sensors.

By modeling the energy consumption associated with each send and receive operation, we formulate the data gathering problem as a restricted flow optimization problem. We show that the maximization of the number of rounds reduces to a modification of the standard network flow problem that has a quota constraint (details can be found in Section 6.3) on the nodes. We develop an O(|V_c| · |V| · |E|² · log(min{B_u/(S_u + T_u) | u ∈ V_c})) algorithm for this problem, where |V| is the total number of sensors, |E| is the number of communication links, V_c is the set of sensors that collect data packets from the environment and |V_c| is the number of such sensors, B_u is the energy budget of sensor u, S_u is the energy cost for u to sense one data packet, and T_u is the energy cost for u to send out one data packet.

While the above algorithm finds the optimal solution, it is centralized. We then develop a distributed heuristic for the data gathering problem where every node determines its sensing activities and data transfers based on locally available information.
This heuristic is based on self-stabilizing spanning tree construction algorithms and the shortest path routing method. Although the heuristic cannot guarantee optimality, simulations show that it has good average-case performance over randomly generated deployments of sensors.

The rest of the chapter is organized as follows. The system model and problem formulation are discussed in Section 6.2. In Section 6.3, we reduce the maximization of rounds in data gathering to a restricted flow problem. In Section 6.4, we present our algorithm for the restricted flow problem and prove its optimality. In Section 6.5, we reconstruct the data flow for each round from the solution found in Section 6.4. Section 6.6 compares the worst-case performance of our algorithm against that in [35]. Section 6.7 discusses the distributed heuristic, examines its worst-case performance, and presents simulation results on its average-case performance.

6.2 System Model and Problem Statement

6.2.1 Model of Networked Sensor System

Suppose a network of sensors is deployed over a region. The locations of the sensors are fixed and known a priori. The networked sensor system is represented by a graph
Data transfers are assumed to be performed via multi-hop communications where each hop is a short-range communication. This is due to the well known fact that long-distance wireless communication is expensive in terms of both implementation complexity and energy dissipation, especially when using the low-lying antennae and near-ground channels typically found in networked sensor systems [2]. Additionally, short-range communication enables efficient spatial frequency re-use in sensor net works [2 ]. Each sensor it G V has an energy budget Bu. Base station t is assumed to have unlimited energy supply. Our energy model for radio transmissions of the sensors is based on the first order radio model described in [32], If v is within the communication radius of u, the energy consumed by sensor it to transmit a A :— bit data packet to v is Tu = eeiec x k+ea r n p x d 2 axk, where £e j ec is the energy required for transceiver circuitry to process one bit of data, eam p is the energy required per bit of data for transmitter amplifier, and du is the communication radius of sensor it. Transmitter amplifier is 117 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. not needed by u to receive data and the energy consumed by u to receive a A ;— bit data packet is Ru = £P j ec x k. The system is considered to be heterogeneous, i.e. TU (RU ) can be different from Tv(Ry). The energy consumed by sensor u to sense a A ;— byte data packet from the environment is Su = £sen x k where esen is the energy required for sensing circuitry to collect one bit of data from the environment. 6.2.2 Problem Statement We consider the class of applications where the system operates in rounds. Sensor u 6 Vc generates nu data packets during each round. These data packets need to be gathered by the base station. 
The total number of rounds that the system can operate is limited by the energy budgets of the sensors as well as by the routing of data packets through the system. Our goal is to maximize the number of rounds.

Let f(u, v) denote the number of data packets transferred from u to v. The maximal data gathering (MDG) problem is formulated as follows:

Given: a networked sensor system represented by a graph G(V, E) where each sensor u ∈ V has energy budget B_u, cost of sending a data packet T_u, and cost of receiving a data packet R_u. V_c ⊂ V is the set of sensors that collect data from the environment. v ∈ V_c generates n_v data packets during each round. The cost for v ∈ V_c to sense a data packet is S_v. Edge (u, v) ∈ E if v is within the communication radius of u. t ∈ V − V_c is the base station.

Maximize: N

Subject to:
(1) Σ_{v∈σ_u} f(u, v) − Σ_{v∈ψ_u} f(v, u) = N · n_u for u ∈ V_c
(2) Σ_{v∈σ_u} f(u, v) = Σ_{v∈ψ_u} f(v, u) for u ∈ V − V_c − {t}
(3) Σ_{v∈σ_u} f(u, v) · T_u + Σ_{v∈ψ_u} f(v, u) · R_u + N · n_u · S_u ≤ B_u for u ∈ V_c
(4) Σ_{v∈σ_u} f(u, v) · T_u + Σ_{v∈ψ_u} f(v, u) · R_u ≤ B_u for u ∈ V − V_c − {t}

The function f in the MDG problem is called a data flow in G. The variable N in the problem statement represents the number of rounds the system can operate. Condition 1 requires that each sensor u ∈ V_c generate n_u data packets for each round. Condition 2 says that the intermediate sensors do not generate or drop data packets. Conditions 3 and 4 describe the energy constraints of the sensors. Nodes in V_c can sense the environment as well as relay data packets for other nodes; hence they may receive some data packets from their neighbors, as shown in Condition 3.

B_u in the MDG problem models the energy constraint of sensor node u. It does not have to be the total remaining energy of u. For example, when the remaining battery power of a sensor is lower than a particular level, the sensor may limit its contribution to the data gathering operation by setting a small value for B_u (so that this sensor still has enough energy for future operations). For another example, if a sensor
For example, when the remaining battery power of a sensor is lower than a particular level, the sensor may limit its contribution to the data gathering operation by setting a small value for Bu (so that this sensor still has enough energy for future operations). For another example, if a sensor 119 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. Figure 6.1: An example of the MDG problem is deployed in a critical location so that it is utilized as a gateway to relay data packets to a group of sensors, then it may limit its energy budget for a particular data gathering operation, thereby conserving energy for future operations. These considerations can be captured by energy budget Bu in the MDG problem. Figure 6.1 shows an example of the MDG problem. The system consists of 9 nodes. Vc = {a, b, c}. Node t is the base station. Energy budgets are marked on the nodes. For simplicity, we consider Ru = Tu = 1 for all u € V, Sa = Sb = Sc = 1, and na = rib = nc = 1. This example system can operate over a maximum of two rounds. The paths to route the data packets are illustrated in Figure 6.1 as the three dotted lines. After two rounds, nodes a through h will have remaining energy 2.2, 0.3, 1.7,0.6, 1.2, 0.8, 1.2, 4.3, respectively. Nodes a and c can still sense data and some of the sensed data can be transferred to t. But node b cannot sense any more data packets since it has only 0.3 unit of energy left, which is less than Sb. Even if b can sense, it does not have enough energy to transmit the data packets since Tb > 0.3. Since a round is defined as collection of all data packets sensed by all the nodes in Vc, it means that 120 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. the loss of node b prevents the system from successfully operating any more rounds. 
Another scenario that the system fails to operate a round occurs when the intermediate nodes fail (due to lack of energy) to transfer all the sensed data to t. This scenario is not shown in this example. 6.3 Reduction to a Restricted Flow Problem We can see that the energy budgets (condition 3) in the MDG problem is imposed on the nodes. In this section, we show that energy budgets of the sensors can be transformed to edge capacities. The MDG problem is an optimization problem of finding the maximum number of rounds N that can be achieved in a given graph. We consider the corresponding decision problem: given a graph G ( V , E ) and a number N , can we achieve N rounds in graph G? With the existence of condition 1, condition 3 can be re-written as Y f i u >v ) ■ Tu + i Y f ( u ’ v ) - N ■ n u) ■ R u + N -nu - S u < B u v £ a u v£ < tu for u G Vc 121 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. which is equivalent to Y f(u, v) < (Bu + N ■ nu ■ (Ru - Su))/ (Tu + Ru) for u G Vc veiTu With the existence of condition 2, condition 4 can be re-written as Y /(it, v) ■ Tu 4 - ^ /(u , v)- RU< B U for u £ V — Vc — {f} ^6o-u uGCu which is equivalent to Y / K u) < Bu/(Tu + R u) for u G V - Vc - {£} ve<ru Then condition 3 and 4 can be represented uniformly as Y / ( u > v) - & for u E.V — {t} (6 .1) v£cru where fiu = {Bu + N • nu ■ (Ru - SU ))/(TU + Ru) for u G Vc Pu = BU /(TU + Ry) for u G V - Vc - {t} By introducing a pseudo source node s that connects to each node u G Vc, we can state the decision version of the MDG problem as the following restricted flow problem with vertex capacity constraints (RFVC): 122 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. Given: a graph G(V, E) with source s and sink t, a number N. Node u € V — {s,t} has capacity constraint 8U . Vc C V — {s, t}. Node v £ Vc generates nv data packets per round. 
Determine: whether there exists a data flow f : E → R that satisfies the following conditions:
(1) f(s, u) = N · n_u for u ∈ V_c
(2) Σ_{v∈ψ_u} f(v, u) = Σ_{v∈σ_u} f(u, v) for u ∈ V − {s, t}
(3) Σ_{v∈σ_u} f(u, v) ≤ β_u for u ∈ V − {s, t}

For the RFVC problem, suppose we split u ∈ V − {s, t} into two nodes u1 and u2, re-direct all incoming links to u to arrive at u1 and all outgoing links from u to leave from u2, and add a link from u1 to u2 with capacity β_u; then the vertex constraint β_u is fully represented by the capacity of link (u1, u2). Such a split transforms all the vertex constraints into the corresponding link capacities and leads to the restricted flow problem with edge capacities (RFEC). The RFEC problem is stated as follows:

Given: a graph G(V, E) with source s and sink t, a number N, and V_c ⊂ V − {s, t}. Edge (u, v) has capacity constraint c(u, v). Node v ∈ V_c generates n_v data packets per round.

Determine: whether there exists a data flow f : E → R that satisfies the following conditions:
(1) f(s, u) = N · n_u for u ∈ V_c
The RFEC problem is very similar to the standard network flow problem [17], which enforces flow conservation and edge capacity constraints. The RFEC problem differs from the standard network flow problem in the additional quota constraint. We call both the RFVC and RFEC problems restricted flow problems due to this quota constraint.

6.4 Algorithm for the Restricted Flow Problem

Given an RFEC problem with graph G(V, E) and number N, we define the source capped graph induced by N as the graph G^N(V, E^N), where E^N = {(u, v) | (u, v) ∈ E} and the capacity c^N(u, v) of edge (u, v) in E^N is defined as c^N(s, u) = N · n_u for u ∈ σ_s, and c^N(u, v) = c(u, v) otherwise. Obviously, a feasible solution to an RFEC problem with graph G(V, E) and number N must also be an optimal solution to the standard network flow problem in G^N(V, E^N).

Before introducing the algorithm for the RFEC problem, let us review some notation and concepts for the standard network flow problem. The standard network flow problem is to find a maximum flow from s to t in graph G(V, E), subject to the flow conservation and edge capacity constraints. For notational convenience, c(u, v) = 0 if (u, v) ∉ E. If the actual data flow is from u to v, we define f(v, u) = −f(u, v). With the definitions of f(u, v) and c(u, v) thus expanded, if neither (u, v) nor (v, u) belongs to E, then c(u, v) = c(v, u) = 0, which implies that f(u, v) = f(v, u) = 0. In this way, we can define f(u, v) over V × V, rather than restricting it to E. The convention f(u, v) = −f(v, u) also allows us to represent the flow conservation constraint as Σ_{v∈V} f(u, v) = 0, which is equivalent to Σ_{v∈σ_u} f(u, v) = Σ_{v∈σ_u} f(v, u). Given a graph G(V, E) and a flow f, the residual graph induced by f is the graph G_f(V, E_f), where E_f = {(u, v) | u ∈ V, v ∈ V, c(u, v) − f(u, v) > 0}.
Edge (u, v) in E_f has residual capacity c_f(u, v) = c(u, v) − f(u, v). An augmenting path p is a simple path from s to t in the residual graph G_f. The residual capacity of augmenting path p is defined as c_f(p) = min{c_f(u, v) : (u, v) is on p}. A cut of G(V, E) is a binary partition (S, T) of V such that s ∈ S, t ∈ T, and S ∪ T = V. The capacity of a cut (S, T) is defined as c(S, T) = Σ_{u∈S, v∈T} c(u, v).

Our algorithm (denoted as the RFEC algorithm) for the RFEC problem is as follows:

1.  f(u, v) = 0 for all u, v ∈ V
2.  for each u ∈ σ_s
3.      while f(s, u) < N · n_u
4.          find the shortest augmenting path p that has (s, u) as the first hop
5.          if such a path p does not exist
6.              return FAIL and exit the algorithm
7.          else
8.              d = min{c_f(p), N · n_u − f(s, u)}
9.              for each edge (u, v) in p
10.                 f(u, v) ← f(u, v) + d
11.                 c_f(u, v) ← c_f(u, v) − d
12.             end for
13.         end if
14.     end while
15. end for
16. return SUCCESS

Theorem 5. Given an RFEC problem with graph G(V, E) and number N, the RFEC algorithm returns SUCCESS iff there exists a feasible data flow for the RFEC problem.

Proof:

⇒: If the RFEC algorithm returns SUCCESS, the values of f(u, v) upon termination of the algorithm constitute a feasible flow for the RFEC problem.

⇐: Suppose there exists a feasible data flow f for the given graph G(V, E) and number N. It is easy to verify that f is also a maximum flow from s to t in G^N(V, E^N). According to the maximum-flow minimum-cut theorem [17], this maximum flow implies a minimum cut (s, V − {s}) of G^N(V, E^N) whose capacity is N · Σ_{u∈σ_s} n_u.

Assume for the sake of contradiction that the RFEC algorithm returns FAIL. Without loss of generality, assume that the algorithm returns FAIL when checking u* ∈ σ_s, i.e.
when the algorithm cannot find any augmenting path that has (s, u*) as the first hop while f(s, u*) is still less than N · n_u*.

Now we consider the source capped graph G^N(V, E^N). Upon termination of the algorithm, the data flow f in G(V, E) constitutes a flow (not necessarily a maximum flow) in G^N(V, E^N). We construct a cut (S, T) of G^N(V, E^N) as follows:

    S = {s} ∪ {v | there exists a path p from s to v in G^N_f, and the first hop of p is (s, u*)}
    T = V − S

G^N_f above denotes the residual graph of G^N induced by f.

We claim that t ∉ S. Otherwise there would exist an augmenting path from s to t in G^N_f with (s, u*) as the first hop, so the test at line 5 of the algorithm would succeed and the algorithm would not fail. The fact that t ∉ S implies that t ∈ T, which means that (S, T) is indeed a cut of G^N.

Let S' = S − {s}. We claim that f(u, v) = c^N(u, v) for all u ∈ S', v ∈ T. Otherwise f(u, v) < c^N(u, v) would imply an edge (u, v) in G^N_f, which further implies the existence of a path from s to v (s → ... → u → v) in G^N_f. But this contradicts the assumption that v ∈ T.

Because flow conservation is always satisfied as we push data packets along the augmenting paths,

    0 = Σ_{v∈V} Σ_{u∈S'} f(v, u)
      = Σ_{u∈S'} f(s, u) + Σ_{v∈T} Σ_{u∈S'} f(v, u) + Σ_{v∈S'} Σ_{u∈S'} f(v, u)
      = Σ_{u∈S'} f(s, u) + Σ_{v∈T} Σ_{u∈S'} f(v, u)
      = Σ_{u∈S'} f(s, u) − Σ_{v∈T} Σ_{u∈S'} f(u, v)

which means that

    Σ_{u∈S'} f(s, u) = Σ_{u∈S'} Σ_{v∈T} f(u, v) = Σ_{u∈S'} Σ_{v∈T} c^N(u, v)

The capacity of cut (S, T) is calculated as follows:

    c^N(S, T) = Σ_{u∈S, v∈T} c^N(u, v)
      = Σ_{v∈T} c^N(s, v) + Σ_{u∈S', v∈T} c^N(u, v)
      = Σ_{v∈T} c^N(s, v) + Σ_{u∈S'} f(s, u)
      = Σ_{v∈σ_s∩T} c^N(s, v) + Σ_{u∈S'} f(s, u)
      = Σ_{v∈σ_s∩T} c^N(s, v) + Σ_{u∈σ_s∩S'} f(s, u)

(the restriction to σ_s holds because c^N(s, v) = 0 for v ∉ σ_s and f(s, u) = 0 for u ∉ σ_s). According to the edge capacity constraint,

    f(s, u) ≤ c^N(s, u) = N · n_u    for u ∈ σ_s

In particular, for node u*, which is by definition in σ_s ∩ S', we have

    f(s, u*) < c^N(s, u*) = N · n_u*

Therefore,

    c^N(S, T) = Σ_{v∈σ_s∩T} N · n_v + Σ_{u∈σ_s∩S'} f(s, u)
      < Σ_{v∈σ_s∩T} N · n_v + Σ_{u∈σ_s∩S'} c^N(s, u)
      = Σ_{v∈σ_s∩T} N · n_v + Σ_{u∈σ_s∩S'} N · n_u
      = Σ_{u∈σ_s∩(S'∪T)} N · n_u
      ≤ N · Σ_{u∈σ_s} n_u

(the first inequality is strict because u* ∈ σ_s ∩ S' and f(s, u*) < c^N(s, u*)). This is impossible, since we have shown that the minimum cut of G^N has capacity N · Σ_{u∈σ_s} n_u, and the capacity of cut (S, T) cannot be smaller than that of the minimum cut. □

It can be seen that the while loop in the RFEC algorithm, consisting of lines 3-14, pushes data packets along the shortest augmenting path whose first hop is (s, u). Similar to the complexity analysis of the Edmonds-Karp algorithm [17], it can easily be shown that the complexity of the while loop is O(|V| · |E|²). The complexity of the RFEC algorithm is therefore O(|σ_s| · |V| · |E|²).

Note that σ_s in the RFEC problem is mapped from V_c in the MDG problem. The mapping procedure can be completed in O(|V| + |E|) time. Hence the decision version of the MDG problem can be solved in O(|V_c| · |V| · |E|²) time. We can apply binary search to find the maximum value of N for the original MDG problem. Because min{B_u/(S_u + T_u) | u ∈ V_c} is an obvious upper bound on N, the maximum number of rounds in the MDG problem can be found in O(|V_c| · |V| · |E|² · log(min{B_u/(S_u + T_u) | u ∈ V_c})) time.

In the RFVC problem, the capacity β_u of vertex u restricts the maximum number of data packets that u can transfer. There can only be an integer number of data packets. Hence β_u is effectively equivalent to ⌊β_u⌋, where ⌊β_u⌋ represents the largest integer smaller than or equal to β_u.
Consequently, when mapping such an RFVC problem to an RFEC problem, a real-valued c(u, v) is effectively equivalent to ⌊c(u, v)⌋. It is well known that flow maximization using augmenting paths, which includes the scenario of our RFEC algorithm, generates integer-valued solutions when the edge capacities are integers. Therefore, if the data packets are atomic and cannot be further divided, our algorithm is guaranteed to find an integer-valued optimal solution for the RFEC, and hence the RFVC and MDG, problems.

We illustrate the execution of the RFEC algorithm using the example system in Figure 6.1, where V_c = {a, b, c}. Remember that we consider R_u = T_u = 1 for u ∈ {a, b, c, d, e, f, g, h}, and S_u = 1 and n_u = 1 for u ∈ {a, b, c}. min{B_u/(S_u + T_u) | u ∈ V_c} = 2.15, which is the upper bound on the number of rounds. Hence we start the binary search with N = 2.

Figure 6.2: An illustrative example of the RFEC algorithm

The first step is to transform the MDG problem formulation in Figure 6.1 to the RFEC formulation. The result is shown in Figure 6.2, where each node u ∈ {a, b, c, d, e, f, g, h} in the MDG formulation is split into two nodes u_1 and u_2, and a pseudo source node s is added. The weight of each newly added edge (u_1, u_2) is calculated according to Equation 6.1. The value of n_u (u ∈ {a, b, c}) in the MDG formulation is inherited by n_u (u ∈ {a_1, b_1, c_1}) in the RFEC formulation, i.e. n_u = 1 for u ∈ {a_1, b_1, c_1}.

In order to check whether the system can successfully operate 2 rounds, the RFEC algorithm is executed. After initialization, suppose that in line 2 the procedure chooses to check node a_1 first. The algorithm then attempts to push N × n_{a_1} = 2 units of flow from s to t along augmenting paths that have (s, a_1) as the first hop.
One of the possible augmenting paths is s → a_1 → a_2 → e_1 → e_2 → h_1 → h_2 → t. Suppose this path is chosen and 2 units of flow are pushed along it. This fulfills the request of node a_1. After pushing this flow, edges (a_1, a_2), (e_1, e_2), and (h_1, h_2) have remaining capacities 1.1, 0.6, and 2.15, respectively. Then, suppose the algorithm chooses to check the second neighbor, b_1, of source s. N × n_{b_1} = 2 units of flow need to be pushed from s to t via b_1. s → b_1 → b_2 → e_1 → a_2 → d_1 → d_2 → g_1 → g_2 → t is the only possible augmenting path from s to t with (s, b_1) as the first hop. Since the capacity of this augmenting path is 2, the algorithm can successfully push 2 units of flow along it. Similarly, for the third neighbor c_1 of source s, N × n_{c_1} = 2 units of flow can be pushed from s to t along augmenting path s → c_1 → c_2 → f_1 → f_2 → h_1 → h_2 → t, which has (s, c_1) as the first hop. This completes the check for all three neighbors of s, and the algorithm returns SUCCESS, which means that the system can indeed operate over a maximum of 2 rounds. Hence we stop the binary search at N = 2. Note that during the execution of the RFEC algorithm, we do not enforce a specific order in which the neighbors of the source node are checked.

6.5 Reconstructing the Data Flow for Each Round

The RFEC algorithm (when used together with binary search) finds the maximum number of rounds N that the system can operate. Besides the total number of rounds N, the RFEC algorithm also finds the flow f(u, v) for each edge (u, v). The value of f(u, v) specifies the total number of data packets that are transferred along edge (u, v) over the N rounds, but it does not specify how many data packets should be transferred along (u, v) during the ith round, 1 ≤ i ≤ N.
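Before turning to the reconstruction, the feasibility check of Section 6.4, which produces the values N and f(u, v) consumed below, can be sketched as follows. This is a hedched illustration under stated assumptions, not the thesis implementation: the dictionary-based representation and the names `rfec_feasible`, `cap`, `adj`, and `quota` are mine, `cap[(s, u)]` is taken to already hold the source-capped capacity N · n_u, and `adj` must list neighbours in both directions so that BFS can traverse residual back edges.

```python
from collections import deque

def rfec_feasible(cap, adj, s, t, quota):
    """Check RFEC feasibility for one value of N (the RFEC algorithm).

    cap:   dict (u, v) -> capacity, with cap[(s, u)] = N * n_u
    adj:   dict u -> neighbours of u in both directions
    quota: dict u -> N * n_u for each u in sigma_s
    Returns (True, flow) on SUCCESS, (False, None) on FAIL.
    """
    res = dict(cap)                                  # residual capacities
    for u, need in quota.items():
        while cap[(s, u)] - res[(s, u)] < need:      # f(s, u) < N * n_u
            if res[(s, u)] <= 0:                     # first hop saturated
                return False, None
            # BFS for a shortest residual path u -> ... -> t
            parent, queue = {u: s}, deque([u])
            while queue and t not in parent:
                x = queue.popleft()
                for y in adj.get(x, ()):
                    if y not in parent and y != s and res.get((x, y), 0) > 0:
                        parent[y] = x
                        queue.append(y)
            if t not in parent:                      # lines 5-6: FAIL
                return False, None
            # collect the path edges from t back to s, then push flow
            edges, y = [], t
            while y != s:
                edges.append((parent[y], y))
                y = parent[y]
            push = min(need - (cap[(s, u)] - res[(s, u)]),
                       min(res[e] for e in edges))
            for x, y in edges:
                res[(x, y)] -= push
                res[(y, x)] = res.get((y, x), 0) + push
    flow = {e: cap[e] - res[e] for e in cap}
    return True, flow
```

A binary search over N in [0, min{B_u/(S_u + T_u)}], calling this check once per candidate N, then recovers the maximum number of rounds as described in the text.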
In this section, we address the problem of reconstructing the data flow for each round, given the result of the RFEC algorithm, f(u, v). The reconstruction is based on the flow decomposition technique [1], which was originally developed for the standard flow maximization problem.

Let us first briefly review the standard flow decomposition problem. Given a graph G(V, E) with source s and sink t, and any (not necessarily maximum) flow f(u, v) in G, the flow graph induced by f is defined as the graph G_[f](V, E'), where E' = {(u, v) ∈ E | f(u, v) > 0}. A flow decomposition of G_[f] is a decomposition of G_[f] into a certain number of 'primitives'. Each primitive is either a simple path p from s to t where the flow along each edge of p is the same, or a simple cycle γ where the flow along each edge of γ is the same. It is well known that there can be at most |V| + |E| primitives when decomposing any flow graph G_[f] [1].

Given an MDG problem, we have transformed its decision version to the RFEC problem, which enforces the quota constraint that differentiates the RFEC problem from the standard flow problem. The following discussion considers the reconstruction of the data flow for each round, given the solution (N and f(u, v), (u, v) ∈ E) to the RFEC problem. The result of the reconstruction can then be easily transformed and applied to the original MDG problem.

Let ψ(p) denote the destination of the first hop of path p. Our objective is to decompose the solution of the RFEC algorithm (in the form of f(u, v), where (u, v) ∈ E) into N sets of paths. The ith set (i = 1, 2, ..., N) corresponds to the ith round of data gathering. It specifies a set of paths Π_i = {p_i1, p_i2, ...}, where each path is from s to t, together with the data flow (denoted δ(p)) that should be transferred along each path p ∈ Π_i. We require that Π_1 ∪ ··· ∪ Π_N is a decomposition of f(u, v) (where (u, v) ∈ E).
To construct a valid data flow for the ith round, we require that Σ_{p∈Π_i, ψ(p)=u} δ(p) = n_u for every u ∈ σ_s. The various paths are not necessarily edge disjoint, i.e. an edge can be a member of multiple paths.

Given G(V, E) and f(u, v), we first construct the flow graph G_[f]. Then we perform depth first search (DFS) on G_[f] and find all the cycles. Suppose a cycle γ is found, and let δ(γ) denote the minimum flow along all the edges of γ. For each edge (u, v) on γ, we reduce f(u, v) by δ(γ). In this way, we eliminate one edge from γ and hence break the cycle γ. Repeating the above procedure, we can eliminate all the cycles in G_[f]. In the following discussion, for the sake of simplicity, we assume that G_[f] is acyclic.

To identify the paths, for each u ∈ σ_s, we split edge (s, u) into N edges (s, u)_1, (s, u)_2, ..., (s, u)_N, each going from s to u and having flow f(s, u)/N = n_u (recall that f(s, u) = N × n_u upon completion of the RFEC algorithm). Flow n_u along edge (s, u)_i corresponds to the quota constraint on u in the ith round of data gathering, i.e. node u needs to sense and send out n_u data packets in the ith round. Then, as long as there are edges leaving s in G_[f], we follow a path p out of s until we reach t (we must reach t because G_[f] is acyclic). We add the path p thus found to Π_k if the first hop of p is (s, u)_k. Let the value of the path, δ(p), be the minimum flow along all the edges in p. We then reduce the flow along every edge in p by δ(p). This eliminates one edge from G_[f]. We repeat the above procedure until there is no edge out of s in G_[f]. Each time a path is added to some Π_k (k = 1, 2, ..., N), an edge is eliminated from G_[f]. Since there can be at most |E| edges in G_[f], the above procedure terminates in at most |E| steps.

Next we show that the above procedure, upon termination, finds a reconstruction of the data flow for each round.
Upon termination, there is no edge out of s in G_[f]. Because all the nodes conserve flow and G_[f] is acyclic, all the edges must have been eliminated from G_[f]. Therefore Π_1 ∪ ··· ∪ Π_N is indeed a decomposition of f(u, v), (u, v) ∈ E.

Now we consider any individual set Π_i. Π_i consists of paths whose first hop is (s, u)_i, where u ∈ σ_s. Upon completion of the above procedure, we must have Σ_{p∈Π_i, ψ(p)=u} δ(p) = n_u for each u ∈ σ_s, i.e. the sum of the flows on all paths in Π_i whose first hop is u must equal n_u for each u ∈ σ_s. Otherwise the sum would be less than n_u, which would mean that edge (s, u)_i is not eliminated from G_[f], contradicting the assumption that the procedure terminated. {p | p ∈ Π_i, ψ(p) = u} is what we are looking for: the set of simple paths that can transfer n_u data packets from s to t. Therefore, we use Π_i to transfer the data packets in the ith round, i = 1, 2, ..., N.

It is interesting to point out that the reconstruction is not unique. For example, the order in which the paths are found and added to Π_1 ∪ ··· ∪ Π_N can be arbitrary; Π_i and Π_j can also be exchanged (when i ≠ j).

We illustrate the above reconstruction procedure using the example system previously shown in Figure 6.1. Given the MDG problem in Figure 6.1, we have illustrated the corresponding RFEC problem formulation in Figure 6.2 and demonstrated the execution of the RFEC algorithm in Section 6.4. The solution generated by the RFEC algorithm is shown in Figure 6.3 in the form of a flow graph, where the flow is marked on each edge. Note that the flow along any edge (u, v) indicates the total number of data packets going through (u, v) over all N (= 2) rounds. Edge (a_2, e_1) does not carry any flow, so it does not appear in the flow graph.

Figure 6.3: Flow graph generated by the RFEC algorithm
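The splitting-and-peeling procedure of this section can be sketched in code as follows. The representation and the name `decompose_per_round` are illustrative, and the sketch assumes integer flows on a flow graph from which cycles have already been removed.

```python
def decompose_per_round(flow, s, t, n_rounds, n_pkts):
    """Peel s-to-t paths off an acyclic flow graph and assign each path
    to a round, following the splitting procedure described above.

    flow:    dict (u, v) -> total flow on edge (u, v) over all rounds
    n_pkts:  dict u -> n_u for each first-hop neighbour u of s
    Edge (s, u) is treated as n_rounds copies carrying n_pkts[u] each;
    a path whose first hop uses copy i is placed in rounds[i].
    """
    f = {(x, y): v for (x, y), v in flow.items() if v > 0 and x != s}
    remaining = {u: [n_pkts[u]] * n_rounds for u in n_pkts}
    rounds = [[] for _ in range(n_rounds)]
    while any(any(r) for r in remaining.values()):
        # pick an unfilled copy (s, u)_i of a source edge
        u = next(v for v, r in remaining.items() if any(r))
        i = next(k for k, left in enumerate(remaining[u]) if left > 0)
        # walk positive-flow edges from u until t (acyclicity ends the walk)
        path, x = [(s, u)], u
        while x != t:
            x2, y = next(e for e in f if e[0] == x)
            path.append((x2, y))
            x = y
        delta = min(remaining[u][i], min(f[e] for e in path[1:]))
        rounds[i].append((path, delta))
        remaining[u][i] -= delta
        for e in path[1:]:
            f[e] -= delta
            if f[e] == 0:
                del f[e]
    return rounds
```

Each iteration either exhausts one quota copy or removes at least one edge from the flow graph, mirroring the termination argument above.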
Figure 6.4: Reconstructing the flow for each round. Step 1: split the edges out of s.

It is a coincidence that Figure 6.3 is already acyclic, so the removal of cycles is skipped here. (Actually, we still need to perform a DFS to verify that the flow graph is indeed acyclic.) Note that the flow graph may not always be acyclic, in which case the removal of cycles becomes necessary.

The next step is to split (s, u) into N edges for each u ∈ σ_s. In this example, σ_s = {a_1, b_1, c_1} and N = 2. Therefore, (s, a_1) is split into N (= 2) edges, (s, a_1)_1 and (s, a_1)_2. f(s, a_1) = 2 in Figure 6.3 indicates that a total of 2 data packets should be transferred from s to a_1 over the 2 rounds. Therefore, both edges (s, a_1)_1 and (s, a_1)_2 in Figure 6.4 carry a flow of value f(s, a_1)/N = n_a = 1, representing that n_a = 1 data packet should be transferred from s to a_1 in each of the first and second rounds. Edges (s, b_1) and (s, c_1) are split in the same way. The result of the split is shown in Figure 6.4, with the names of the newly split edges marked on the edges.

The paths to transfer the data packets are identified as follows. As long as there are edges leaving s, we follow an arbitrary path until we reach t. Suppose the first path we choose is s → c_1 → c_2 → f_1 → f_2 → h_1 → h_2 → t, and the first hop is along edge (s, c_1)_2. We add this path to Π_2 since the first hop is (s, c_1)_2. The minimum flow along the edges in this path is 1 (edge (s, c_1)_2). Therefore, the flow along this path is set to 1, and we reduce the flow along all edges in this path, each by 1. Since (s, c_1)_2 does not have any flow after the reduction, it is removed from the flow graph. The result at this stage is shown in Figure 6.5.

Figure 6.5: Reconstructing the flow for each round. Step 2: find the paths.
We repeatedly follow paths from s to t and add them to Π_i according to their first hops, until there are no more edges out of s. It is easy to verify that upon termination of the above process, we have the following result:

    p_11 = s → a_1 → a_2 → d_1 → d_2 → g_1 → g_2 → t
    p_12 = s → b_1 → b_2 → e_1 → e_2 → h_1 → h_2 → t
    p_13 = s → c_1 → c_2 → f_1 → f_2 → h_1 → h_2 → t
    Π_1 = {p_11 | δ(p_11) = 1, p_12 | δ(p_12) = 1, p_13 | δ(p_13) = 1}

    p_21 = s → a_1 → a_2 → d_1 → d_2 → g_1 → g_2 → t
    p_22 = s → b_1 → b_2 → e_1 → e_2 → h_1 → h_2 → t
    p_23 = s → c_1 → c_2 → f_1 → f_2 → h_1 → h_2 → t
    Π_2 = {p_21 | δ(p_21) = 1, p_22 | δ(p_22) = 1, p_23 | δ(p_23) = 1}

For the sake of simplicity, we set n_a = n_b = n_c = 1 in this example, which means that nodes a, b, and c sense only one data packet in each round. Consequently, a single path is capable of transferring all the data packets sensed by a, b, and c in each round. If n_a > 1 (n_b > 1, or n_c > 1), we may need multiple paths to transfer the data packets sensed by a (b, or c) in each round. Another observation from this example is that Π_1 and Π_2 contain the same paths. This is also a coincidence in this particular example; we chose this simple example for illustration purposes only. If the algorithm is applied to more complicated systems, Π_i and Π_j (i ≠ j) may contain different paths, and the flows along the paths may also differ.

6.6 Performance Comparison

In this section, we compare the performance of our algorithm against the method proposed in [35]. The study in [35] considers a data gathering problem similar to the one studied here. In [35], it is assumed that each sensor generates exactly one data packet in each round, and all the data packets need to be transferred to the base station.
The goal is to maximize the number of rounds the system can operate. Such a scenario reduces to the MDG problem. The system model used in [35] is generally the same as our model except for three aspects: the energy for sending a data packet is assumed to depend on the distance between the sender and the receiving node, the energy cost of sensing the environment is ignored, and each sensor generates exactly one data packet per round. While it can be debated which system model more accurately represents reality, in order to obtain a fair comparison, we consider the scenario in which every sensor generates one data packet in each round, a sensor spends the same amount of energy when sending a data packet to any of its neighbors, and the energy cost of sensing the environment is zero. In this scenario, the solution technique in [35] as well as the proposed technique can be applied.

A two-stage method is used in [35] to obtain an integer-valued solution. In the first stage, a relaxed linear programming formulation of the data gathering problem is solved (using some linear programming algorithm). We refer to this approach as the relaxed flow approach. The solution specifies how many data packets f(u, v) should be transferred between sensors u and v (when communication link (u, v) exists). Since the result f(u, v) may not be an integer, the solution is floor-rounded to f'(u, v). Note that this floor rounding may compromise the property of flow conservation, so f'(u, v) is only used as an edge capacity constraint, which defines a new flow optimization problem. Then, in the second stage, an integer-valued solution is found for the new flow optimization problem using the augmenting path method. It is claimed that the solution produced by such an approach is near-optimal.
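The loss introduced by the floor rounding in stage one is easy to see numerically. The snippet below uses hypothetical stage-one values (not taken from [35]); it only illustrates that flooring a fractional flow can destroy capacity that was feasible before rounding.

```python
import math

def floor_round(flow):
    """Floor-round a fractional stage-one flow, as in the two-stage
    relaxed flow approach: the rounded values then act as edge
    capacities for the stage-two integer flow problem."""
    return {e: math.floor(v) for e, v in flow.items()}

# Hypothetical stage-one LP output (illustrative numbers only): a node
# must ship 2 packets per round, and the LP happens to split them
# 1.5 / 0.5 across its two outgoing edges.
lp = {('a', 'b'): 1.5, ('a', 'd'): 0.5}
capped = floor_round(lp)
# After rounding, only floor(1.5) + floor(0.5) = 1 packet can leave the
# node in stage two, although 2 units were feasible before rounding.
```

This is exactly the mechanism behind the counterexamples discussed next.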
Although the experimental results in [35] illustrate that the two-stage method can achieve close to optimal performance, in the following we demonstrate the non-optimality of the relaxed flow approach.

Consider the simple example shown in Figure 6.6. The system consists of four sensors (a, b, c, d) and the base station t. For a, b, c, and d, the cost of sending a data packet is equal to that of receiving a data packet. The energy budgets of the sensors, in terms of the number of data packets that can be received and transferred, are marked by the nodes. Obviously, the sample system can operate for a maximum of two rounds (one of the optimal solutions is f(a, b) = 2, f(c, d) = 2, f(b, t) = 4, f(d, t) = 4, f(a, d) = 0, f(c, b) = 0). It is easy to verify that the proposed approach generates this solution. However, when we apply the relaxed flow approach, the first stage may generate the following real-valued solution: f(a, b) = 1.5, f(c, d) = 1.5, f(b, t) = 4, f(d, t) = 4, f(a, d) = 0.5, f(c, b) = 0.5, which rounds to f'(a, b) = 1, f'(c, d) = 1, f'(b, t) = 4, f'(d, t) = 4, f'(a, d) = 0, f'(c, b) = 0. Such real-valued solutions are obtained if, for example, the linear programming toolbox in Matlab is used. If we define a new flow problem using these rounded values and solve it as in the second stage of [35], the maximum number of rounds that the system can operate is only 1, which is only 50% of the optimal.

In the worst case, the behavior of the relaxed flow approach can be much worse than the above. Suppose that in Figure 6.6, sensors a and c have energy budget 1, and b and d have energy budget 2. It is easy to verify that the proposed approach finds the optimal solution. The number of rounds in the optimal solution is one.

Figure 6.6: Example for worst case performance comparison
The relaxed flow approach, however, may produce the following real-valued solution: f(a, b) = 0.5, f(c, d) = 0.5, f(b, t) = 2, f(d, t) = 2, f(a, d) = 0.5, f(c, b) = 0.5, which, after rounding, leads to an integer solution with 0 rounds. We say that a solution technique has failed if it produces a solution with zero rounds (i.e. the solution does not gather all the packets in the first round) while there exists a solution with at least one round. Let Ω represent the number of sensors in the system. The above scenario can be generalized to show:

Theorem 6. For all Ω ≥ 4, there exist instances in which the relaxed technique fails to produce a solution.

6.7 A Distributed Heuristic for the MDG Problem

The proposed RFEC algorithm and the relaxed flow approach are centralized: both can only be executed in a centralized fashion. Next we study the performance of a distributed heuristic for the MDG problem. In this heuristic, every node determines its activities (sensing and transferring) based on its own local information and the information available at its neighbors. The heuristic is based on self-stabilizing spanning tree construction algorithms [29] and the widely used shortest path routing method. We denote the heuristic as the shortest path heuristic.

In this heuristic, each node u ∈ V maintains a variable d(u), which is used to record the distance from u to the base station t. Node u also maintains a data buffer, which is used to store sensed or received data. We assume that every node knows an upper bound U on the total number of nodes. We also assume that a new round is triggered by some external mechanism; the failure of a round is also determined by some external mechanism. The time interval between two rounds is large enough that the system can complete one round (if possible) before starting a new one.
Let e(u) denote the remaining energy of u, and let b(u) denote the number of data packets in the data buffer of u. The heuristic is described as follows:

1. Initially, d(u) = 0 for all nodes u ∈ V.
2. Each u ∈ V − {t} executes the following two operations:
   (a) distance update
       i. if min_{v∈σ_u, e(v)≥R_v} {d(v)} < d(u) ≤ U, then set d(u) ← min_{v∈σ_u, e(v)≥R_v} d(v) + 1
       ii. if e(u) < T_u, then set d(u) ← U
   (b) data transfer
       if b(u) > 0, d(u) < U, e(u) ≥ T_u, and ∃v ∈ σ_u s.t. d(u) = d(v) + 1 and b(v) < b(u), then u sends one data packet to v
3. Each node u ∈ V_c has one additional operation to execute: whenever a new round starts, u senses and puts n_u data packets into its data buffer.
4. Since the base station t is assumed to have an unlimited energy supply, t simply receives any data packets intended for it.

In the shortest path heuristic, the 'distance update' step attempts to establish for each node a shortest path to the base station. Base station t is always at distance 0 (i.e. d(t) = 0). Every other node u ∈ V − {t} chooses the neighbor v with the smallest distance d(v) as its successor and sets its own distance to d(u) = d(v) + 1. It can be shown that, starting with d(u) = 0 for all u ∈ V, the distance update step will eventually establish a breadth first search tree. A node simply chooses a neighbor which is one hop closer to the base station as its parent in the tree. This constructs a shortest path to the base station for each node.

As the data gathering proceeds, some nodes may deplete their energy by sending and receiving data packets. Consequently, for any node, its shortest path to the base station may change. This has also been considered in the distance update step of the heuristic.
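One plausible reading of the distance update rule (step 2a) can be sketched as follows. The function names, the explicit `stabilize` driver, and the starting state used for illustration are assumptions of mine; the heuristic itself only specifies the per-node rule, and a self-stabilizing rule must converge from any starting state.

```python
def distance_update(u, d, sigma, e, R, T, U):
    """Apply the distance update rule (step 2a) once at node u.

    d:     dict of current distance estimates (d[t] stays 0)
    sigma: dict u -> list of u's neighbours
    A neighbour v is a candidate parent only if it can still afford to
    receive a packet, i.e. e[v] >= R[v].
    """
    cand = [d[v] for v in sigma[u] if e[v] >= R[v]]
    if cand and min(cand) < d[u] <= U:
        d[u] = min(cand) + 1
    if e[u] < T[u]:                  # u can no longer send: mark unreachable
        d[u] = U

def stabilize(nodes, d, sigma, e, R, T, U):
    """Drive the rule to a fixed point (illustrating self-stabilization)."""
    while True:
        old = dict(d)
        for u in nodes:
            distance_update(u, d, sigma, e, R, T, U)
        if d == old:
            return d
```

On a chain t - a - b with ample energy, repeated sweeps settle at d(a) = 1, d(b) = 2, i.e. the breadth first distances to the base station.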
If a node u does not have enough energy to send a data packet, it sets d(u) to the upper bound U, indicating that u can no longer reach t. If a node u does not have enough energy to receive a data packet, it will be excluded from being the parent of any other node. In effect, the execution of the distance update step continuously updates the breadth first tree rooted at the base station.

After the breadth first tree is constructed, the nodes start transferring the data packets. Node u will send one data packet to its neighbor v if the following conditions are met: (1) d(u) is less than the upper bound U, (2) u has enough energy to send one more data packet, (3) v is one hop closer to the base station, and (4) v has fewer data packets in its buffer than u. Conditions 1 and 2 ensure that a node sends out a data packet only if a path to the base station exists. Condition 3 tells a node to send data packets only to its parent in the breadth first tree. Condition 4 prevents a node from receiving more data packets than it can later deliver to the base station.

Because the shortest path heuristic continuously updates the breadth first tree according to the remaining energy of the nodes, it can be executed while the data transfer is performed. However, it cannot guarantee optimality. This is illustrated using the example system in Figure 6.7. The system is shown in the MDG formulation. To simplify the notation and figures, we omit the transformation to the RFEC representation and present the following discussion based on the MDG formulation directly.

Figure 6.7: Example demonstrating the non-optimality of the shortest path heuristic

The system consists of 17 nodes, V = {a_1, a_2, ..., a_16, t}. V_c = {a_1}. T_u = R_u = 1 for u ∈ V. S_{a_1} = 1 and n_{a_1} = 1. The energy budgets of the nodes are marked on the nodes.
The system can operate over a maximum of 15 rounds. This optimal solution is obtained when f(a_1, a_5) = f(a_5, a_6) = f(a_6, a_7) = f(a_7, a_8) = f(a_8, t) = 5, f(a_1, a_2) = f(a_2, a_3) = f(a_3, a_4) = f(a_4, a_9) = f(a_9, a_10) = f(a_10, a_11) = f(a_11, a_12) = f(a_12, t) = 5, and f(a_1, a_13) = f(a_13, a_14) = f(a_14, a_15) = f(a_15, a_16) = f(a_16, t) = 5. It can be easily verified that the RFEC algorithm finds this solution.

However, using the shortest path heuristic, the following scenario will occur: (1) the distance update step finds a_1 → a_5 → a_9 → a_16 → t as the shortest path connecting a_1 and t; (2) 5 units of data packets are transferred along this path; (3) no more data packets can be transferred, so the data gathering terminates.

In the above solution, step (2) utilizes the currently available shortest path to transfer the data packets, using up the energy of nodes a_5, a_9, and a_16. Then no more data can be transferred from a_1 to t, because there no longer exists any path between a_1 and t. The shortest path heuristic thus achieves a total of 5 rounds, which is only one third of the optimal. If the shortest path from a_1 to t contains n nodes instead of 3, we can construct a similar example system where the shortest path heuristic achieves only 1/n of the optimal. As before, let Ω denote the total number of nodes. The above scenario can be generalized to show:

Theorem 7. For any real value ε ∈ (0, 1), there exists an integer Ω_0 > 0 such that for all Ω ≥ Ω_0, there exist problem instances for which the shortest path heuristic produces a solution with system lifetime less than ε times the optimal system lifetime.

However, this heuristic has very good average case performance, which is illustrated through the following simulations. In the simulations, nodes are randomly deployed in a unit square with uniform distribution.
The communication radii of all the nodes are assumed to be equal. The base station is located at the lower left corner of the unit square. Su, Ru, and Tu are uniformly distributed in [0, 1]. A certain number of nodes are randomly chosen as the source nodes. The source nodes are not necessarily direct neighbors of each other. For each source node u, the number of data packets to be gathered per round, nu, is uniformly distributed in [1, nmax], where nmax is a parameter in the simulations. The initial energy at the nodes is uniformly distributed in [0, emax], where emax is another parameter.

The following five sets of parameters were studied: (1) |V|, the total number of nodes, (2) the communication radius of the nodes, (3) |Vc|, the number of source nodes, (4) nmax, the maximum number of data packets to be gathered per source node per round, and (5) emax, the maximum amount of energy initially available at the nodes. In the simulations, |V| was selected from 40, 80, 120, and 160. The communication radius was selected from 0.2, 0.3, 0.4, and 0.5. |Vc|/|V| was selected from 0.1, 0.2, 0.3, and 0.4. nmax was selected from 5, 10, 15, and 20. And emax was selected from 1000, 2000, 3000, and 4000. In summary, there were 1024 combinations of the parameters.

For each of the 1024 combinations, we simulated 200 randomly generated systems. For each system simulated, we calculated the ratio between the lifetime achieved by the shortest path heuristic and the maximum lifetime obtained through the RFEC algorithm. Let q denote this ratio. Summarizing over all the simulation results, we observed that q has mean value 0.68 with standard deviation 0.36. Figure 6.8 shows the histogram of q over the 204800 experiments. In 47% of the experiments, the heuristic achieved the maximum system lifetime.
For the remaining 53% of the experiments, the quality of solution, represented by q, was roughly uniformly distributed between 0 and 1. To identify possible directions to further improve the performance of the heuristic, we studied the impact of each individual parameter. The results are represented in Figures 6.9-6.13.

Figure 6.8: The histogram of q over all the simulations. The value of q for each simulation is calculated as the ratio between the achieved system lifetime and the optimal lifetime. The y-axis has been normalized to the total number of simulations.

Figure 6.9: The impact of the number of nodes on system lifetime. The y-axis has been normalized to the optimal system lifetime.

Figure 6.10: The impact of communication radius on system lifetime. The y-axis has been normalized to the optimal system lifetime.

Figure 6.11: The impact of the number of source nodes (normalized to |Vc|/|V|) on system lifetime. The y-axis has been normalized to the optimal system lifetime.

Figure 6.12: The impact of emax on system lifetime. The y-axis has been normalized to the optimal system lifetime.

Figure 6.13: The impact of nmax on system lifetime. The y-axis has been normalized to the optimal system lifetime.
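The random systems used in the simulations above can be generated along the following lines. This is a hypothetical reconstruction of the setup; the field names and internal representation are assumptions, not the simulator actually used.

```python
import random

def random_instance(n_nodes=40, radius=0.2, src_frac=0.1,
                    n_max=5, e_max=1000, seed=0):
    """Generate one random system per the simulation setup: nodes
    uniform in the unit square, base station at the lower left corner,
    equal communication radii, Su/Ru/Tu uniform in [0, 1], per-round
    packet counts uniform integers in [1, n_max], and initial energy
    uniform in [0, e_max]."""
    rng = random.Random(seed)
    nodes = [{'pos': (rng.random(), rng.random()),
              'S': rng.random(), 'R': rng.random(), 'T': rng.random(),
              'energy': rng.uniform(0, e_max)}
             for _ in range(n_nodes)]
    base_station = (0.0, 0.0)  # lower left corner of the unit square
    sources = rng.sample(range(n_nodes), max(1, int(src_frac * n_nodes)))
    for i in sources:
        nodes[i]['n'] = rng.randint(1, n_max)  # packets gathered per round
    # two nodes are neighbors iff they lie within the common radius
    edges = [(i, j)
             for i in range(n_nodes) for j in range(i + 1, n_nodes)
             if (nodes[i]['pos'][0] - nodes[j]['pos'][0]) ** 2 +
                (nodes[i]['pos'][1] - nodes[j]['pos'][1]) ** 2 <= radius ** 2]
    return nodes, edges, sources, base_station
```

Sweeping n_nodes over {40, 80, 120, 160}, radius over {0.2, ..., 0.5}, src_frac over {0.1, ..., 0.4}, n_max over {5, ..., 20}, and e_max over {1000, ..., 4000} reproduces the 1024 parameter combinations described above.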
The results show that the performance of the heuristic is more sensitive to the number of nodes, the number of source nodes, and the communication radius than to emax and nmax. Figures 6.9 and 6.11 show that the system lifetime decreases as the number of nodes or the number of source nodes increases. The system lifetime improves when the communication radius increases, as shown in Figure 6.10. On the other hand, the system lifetime did not change much when we varied emax and nmax, as shown in Figures 6.12 and 6.13. This indicates that the topology of the system has more impact on the system lifetime than the properties of the individual nodes do.

This observation has two implications for future studies. First, a better heuristic may be designed by taking into account more knowledge about the system topology. Second, given a set of nodes with pre-determined properties (for example, available energy), the system performance can be maximized by carefully designing the topology of the deployment.

Chapter 7

Conclusion

This dissertation studied adaptive application execution in two classes of seemingly different yet intrinsically related systems: distributed computer systems and networked sensor systems. For distributed computer systems, the dissertation studied the execution of a large set of independent tasks where each task is to process a certain amount of input data. The objective is to maximize the throughput of the whole system. For networked sensor systems, the dissertation studied data gathering problems where the data collected by the sensors needs to be transferred to the base station. The dissertation also studied in-network processing problems for networked sensor systems, where the data collected by the sensors needs to be processed by the sensors before being sent to the base station.
At an abstract level, both systems can be modeled as a graph whose vertices represent either the computers or the sensors and whose edges represent the communication links among them. The networked sensor systems have one additional capability constraint on the nodes, namely the energy budget. Despite this difference, we show that the problems of interest in the two classes of systems reduce to the same network flow representation. Consequently, both the task allocation problem and the data gathering/processing problems are solved in a distributed fashion by using the proposed Relaxed Incremental Push-Relabel (RIPR) algorithm, which provides a distributed and adaptive solution to the network flow problem. This dissertation also studies the maximization of the system lifetime for a class of data gathering problems. This problem reduces to a variation of the network flow problem to which, unfortunately, the Relaxed Incremental Push-Relabel algorithm cannot be applied. A strongly polynomial algorithm is designed instead.

Although the effectiveness of the proposed adaptive solution has been theoretically proved and experimentally verified, it should be pointed out that the solution has its limitations. The adaptive solution assumes that the available capabilities of the nodes and links stop changing after a certain period of fluctuation, and that the system then stays in such a non-changing state for a time period of τ. τ must be long enough for the RIPR algorithm to find the optimal solution. Otherwise, changes in the capabilities of the nodes and links will keep occurring before the RIPR algorithm can find the optimal solution; hence the algorithm always lags behind, and the system can never really adapt to the dynamic behaviors of the resources.
Unfortunately, rapid changes in the capabilities of the resources are not uncommon, especially when the system is shared by a relatively large number of users. In fact, when the resource characteristics change so rapidly, defining the optimal performance of the system becomes a big challenge, because adaptation no longer makes sense. One possible solution is to model the capabilities of the resources as random variables whose statistical properties can be extracted from actual experiments. The 'optimal' performance of the system can then be defined statistically. Of course, the statistical meaning of many optimization constraints (e.g., 'in-coming flow equals out-going flow') must be re-defined too.

This statistical optimization can make for some interesting and practically important future work. Some studies have already shown that the load (and hence the available computing capability) of networked computers can be predicted based on previous load characteristics. The prediction may not match the actual load precisely, but the statistical properties (mean, standard deviation, etc.) of the prediction have been shown to be very close to those of the actual loads. We believe the study of load characteristics provides a first step towards the statistical optimization of dynamic systems. Exploration along this direction will lead to a brand new, and hopefully better, way of utilizing the emerging geographically distributed and shared computer systems and other systems that share similar features.

Reference List

[1] R. K. Ahuja, T. L. Magnanti, and J. B. Orlin. Network Flows: Theory, Algorithms, and Applications. Prentice-Hall Inc., 1993.

[2] I. F. Akyildiz, W. Su, Y. Sankarasubramaniam, and E. Cayirci. Wireless Sensor Networks: A Survey. Computer Networks, 38(4):393-422, 2002.

[3] G. Asada, T. Dong, F. Lin, G. Pottie, W. Kaiser, and H. Marcy.
Wireless Integrated Network Sensors: Low Power Systems on a Chip. In Proceedings of the European Solid State Circuits Conference, 1998.

[4] R. Bagrodia, R. Meyer, M. Takai, Y. Chen, X. Zeng, J. Martin, and H. Song. PARSEC: A Parallel Simulation Environment for Complex Systems. IEEE Computer, 31(10):77-85, 1998.

[5] C. Banino, O. Beaumont, L. Carter, J. Ferrante, A. Legrand, and Y. Robert. Scheduling strategies for master-slave tasking on heterogeneous processor platforms. IEEE Trans. Parallel and Distributed Systems, 15(4):319-330, 2004.

[6] C. Banino, O. Beaumont, A. Legrand, and Y. Robert. Scheduling Strategies for Master-slave Tasking on Heterogeneous Processor Grids. In PARA'02: International Conference on Applied Parallel Computing, LNCS 2367, pages 423-432. Springer Verlag, 2002.

[7] F. Baude, D. Caromel, N. Furmento, and D. Sagnol. Optimizing Metacomputing with Communication-Computation Overlap. Parallel Computing Technologies, 6th International Conference, PACT 2001, Russia, pages 190-204, 2001.

[8] O. Beaumont, A. Legrand, L. Marchal, and Y. Robert. Assessing the impact and limits of steady-state scheduling for mixed task and data parallelism on heterogeneous platforms. In HeteroPar'2004: International Conference on Heterogeneous Computing, jointly published with ISPDC'2004: International Symposium on Parallel and Distributed Computing. IEEE Computer Society Press, 2004.

[9] O. Beaumont, A. Legrand, Y. Robert, L. Carter, and J. Ferrante. Bandwidth-Centric Allocation of Independent Tasks on Heterogeneous Platforms. International Parallel and Distributed Processing Symposium (IPDPS), April 2002.

[10] F. Berman, A. Chien, K. Cooper, J. Dongarra, I. Foster, D. Gannon, L. Johnsson, K. Kennedy, C. Kesselman, J. Mellor-Crummey, D. Reed, L. Torczon, and R. Wolski. The GrADS Project: Software Support for High-Level Grid Application Development.
International Journal of Supercomputer Applications, 15(4), 2001.

[11] A. A. Bertossi, C. M. Pinotti, and R. B. Tan. Channel Assignment with Separation for Interference Avoidance in Wireless Networks. IEEE Transactions on Parallel and Distributed Systems, 14(3):222-235, March 2003.

[12] B. J. Bonfils and P. Bonnet. Adaptive and Decentralized Operator Placement for In-Network Query Processing. Second International Workshop on Information Processing in Sensor Networks (IPSN 2003), pages 47-62, April 2003.

[13] P. Bonnet, J. E. Gehrke, and P. Seshadri. Towards Sensor Database Systems. Second International Conference on Mobile Data Management, January 2001.

[14] T. D. Braun, H. J. Siegel, and N. Beck. A Comparison of Eleven Static Heuristics for Mapping a Class of Independent Tasks onto Heterogeneous Distributed Computing Systems. Journal of Parallel and Distributed Computing, 61:810-837, 2001.

[15] J. C. Chen, Y. Kung, and R. E. Hudson. Source Localization and Beamforming. IEEE Signal Processing Magazine, 19(2), March 2002.

[16] W. Choi, P. Shah, and S. K. Das. A Framework for Energy-Saving Data Gathering Using Two-Phase Clustering in Wireless Sensor Networks. First Annual International Conference on Mobile and Ubiquitous Systems: Networking and Services (MobiQuitous'04), August 2004.

[17] T. H. Cormen, C. E. Leiserson, and R. L. Rivest. Introduction to Algorithms. MIT Press, 1992.

[18] Moteiv Corporation. Telos Wireless Sensor Module. http://www.moteiv.com.

[19] J. Deng, R. Han, and S. Mishra. Security Support for In-Network Processing in Wireless Sensor Networks. 2003 ACM Workshop on Security of Ad Hoc and Sensor Networks (SASN '03), October 2003.

[20] Distributed.net. http://www.distributed.net.

[21] F. Berman (Editor), G. Fox (Editor), and A. J. G. Hey (Editor). Grid Computing: Making the Global Infrastructure a Reality. Wiley, 2003.

[22] J.
Edmonds and R. M. Karp. Theoretical Improvements in Algorithmic Efficiency for Network Flow Problems. Journal of the ACM, 19:248-264, 1972.

[23] C. Efthymiou, S. Nikoletseas, and J. Rolim. Energy Balanced Data Propagation in Wireless Sensor Networks. 4th International Workshop on Algorithms for Wireless, Mobile, Ad Hoc and Sensor Networks (WMAN '04), held in conjunction with IPDPS 2004, April 2004.

[24] E. Falck, P. Floreen, P. Kaski, J. Kohonen, and P. Orponen. Balanced Data Gathering in Energy-Constrained Sensor Networks. In Sotiris Nikoletseas and Jose D. P. Rolim, editors, Algorithmic Aspects of Wireless Sensor Networks (First International Workshop, ALGOSENSORS 2004), July 2004.

[25] M. Faloutsos, P. Faloutsos, and C. Faloutsos. On power-law relationships of the internet topology. In ACM SIGCOMM, pages 251-262, 1999.

[26] L. R. Ford and D. R. Fulkerson. Maximal Flow Through a Network. Canadian Journal of Math, 8:399-404, 1956.

[27] R. F. Freund and H. J. Siegel. Heterogeneous Processing. IEEE Computer, 26(6):13-17, 1993.

[28] H. N. Gabow. Scaling Algorithms for Network Problems. Journal of Computer and System Sciences, 31:148-168, 1985.

[29] F. Gaertner. A Survey of Self-Stabilizing Spanning-Tree Construction Algorithms. Swiss Federal Institute of Technology (EPFL), School of Computer and Communication Sciences, Technical Report IC/2003/38, 2003.

[30] W. Heinzelman, A. Chandrakasan, and H. Balakrishnan. Energy-Efficient Communication Protocol for Wireless Microsensor Networks. In Hawaii Conference on System Sciences, 2000.

[31] W. B. Heinzelman. An Application-Specific Protocol Architecture for Wireless Microsensor Networks. IEEE Transactions on Wireless Communications, 1(3), 2002.

[32] W. R. Heinzelman, A. Chandrakasan, and H. Balakrishnan. Energy Efficient Communication Protocol for Wireless Micro-sensor Networks. In Proceedings of the IEEE Hawaii International Conference on System Sciences, 2000.

[33] I. Foster and C. Kesselman (editors).
The Grid: Blueprint for a New Computing Infrastructure. Morgan Kaufmann, 1999.

[34] C. Intanagonwiwat, R. Govindan, and D. Estrin. Directed diffusion: A scalable and robust communication paradigm for sensor networks. The Sixth Annual International Conference on Mobile Computing and Networking (MobiCOM '00), August 2000.

[35] K. Kalpakis, K. Dasgupta, and P. Namjoshi. Maximum Lifetime Data Gathering and Aggregation in Wireless Sensor Networks. IEEE Networks '02 Conference, 2002.

[36] K. Kalpakis, K. Dasgupta, and P. Namjoshi. Efficient algorithms for maximum lifetime data gathering and aggregation in wireless sensor networks. Computer Networks: The International Journal of Computer and Telecommunications Networking, 42(6):697-716, 2003.

[37] E. Korpela, D. Werthimer, D. Anderson, J. Cobb, and M. Lebofsky. SETI@home: Massively Distributed Computing for SETI. Computing in Science and Engineering, January 2001.

[38] B. Kreaseck, L. Carter, H. Casanova, and J. Ferrante. On the interference of communication on computation in Java. In Proceedings of the 3rd International Workshop on Performance Modeling, Evaluation and Optimization on Parallel and Distributed Systems (PMEO-PDS'04), Santa Fe, New Mexico, April 2004.

[39] R. Kumar, V. Tsiatsis, and M. B. Srivastava. Computation Hierarchy for In-network Processing. The 2nd ACM International Conference on Wireless Sensor Networks and Applications, pages 68-77, 2003.

[40] S. M. Larson, C. D. Snow, M. Shirts, and V. S. Pande. Folding@Home and Genome@Home: Using Distributed Computing to Tackle Previously Intractable Problems in Computational Biology. In Computational Genomics, Richard Grant, editor. Horizon Press, 2002.

[41] E. L. Lawler. Combinatorial Optimization: Networks and Matroids. Holt, Rinehart and Winston, 1976.

[42] S. Lindsey, C. Raghavendra, and K. M. Sivalingam.
Data Gathering Algorithms in Sensor Networks Using Energy Metrics. IEEE Transactions on Parallel and Distributed Systems, 13(9):924-935, 2002.

[43] S. R. Madden, M. J. Franklin, J. M. Hellerstein, and W. Hong. TAG: a Tiny AGgregation Service for Ad-Hoc Sensor Networks. 5th Symposium on Operating Systems Design and Implementation (OSDI '02), December 2002.

[44] A. Medina, I. Matta, and J. Byers. On the Origin of Power Laws in Internet Topologies. ACM Computer Communication Review, 30(2):18-28, April 2000.

[45] R. Min, T. Furrer, and A. Chandrakasan. Dynamic Voltage Scaling Techniques for Distributed Microsensor Networks. Workshop on VLSI (WVLSI '00), April 2000.

[46] B. M. E. Moret, D. A. Bader, and T. Warnow. High-Performance Algorithm Engineering for Computational Phylogeny. Journal of Supercomputing, 22(1):99-111, 2002.

[47] K. Nakano and S. Olariu. Randomized Leader Election Protocols in Radio Networks with No Collision Detection. In International Symposium on Algorithms and Computation, pages 362-373, 2000.

[48] NSF Middleware Initiative. http://www.nsf-middleware.org.

[49] Open Grid Services Architecture (OGSA). http://www.globus.org/ogsa/.

[50] F. Ordonez and B. Krishnamachari. Optimal Information Extraction in Energy-Limited Wireless Sensor Networks. IEEE Journal on Selected Areas in Communications, special issue on Fundamental Performance Limits of Wireless Sensor Networks, 22(6), August 2004.

[51] G. J. Pottie and W. J. Kaiser. Wireless Integrated Network Sensors. Communications of the ACM, 43(5), May 2000.

[52] J. Rabaey, J. Ammer, T. Karalar, S. Li, B. Otis, M. Sheets, and T. Tuan. PicoRadios for Wireless Sensor Networks: The Next Challenge in Ultra-Low-Power Design. In Proceedings of the International Solid-State Circuits Conference, 2002.

[53] U. Rencuzogullari and S. Dwarkadas.
Dynamic Adaptation to Available Resources for Parallel Computing in an Autonomous Network of Workstations. ACM SIGPLAN Notices, 36(7):72-81, 2001.

[54] N. Sadagopan and B. Krishnamachari. Maximizing Data Extraction in Energy-Limited Sensor Networks. IEEE Infocom 2004, 2004.

[55] C. Schurgers, O. Aberthorne, and M. Srivastava. Modulation Scaling for Energy Aware Communication Systems. In International Symposium on Low Power Electronics and Design, August 2001.

[56] R. C. Shah and J. Rabaey. Energy Aware Routing for Low Energy Ad Hoc Sensor Networks. IEEE Wireless Communications and Networking Conference (WCNC), March 2002.

[57] G. Shao, F. Berman, and R. Wolski. Master/Slave Computing on the Grid. 9th Heterogeneous Computing Workshop, May 2000.

[58] M. Singh and V. K. Prasanna. Optimal Energy Balanced Algorithm for Selection in Single Hop Sensor Network. IEEE International Workshop on Sensor Network Protocols and Applications (SNPA), ICC, 2003.

[59] S. Singh and C. Raghavendra. PAMAS: Power Aware Multi-Access protocol with Signalling for Ad Hoc Networks. ACM Computer Communications Review, 1998.

[60] K. Sohrabi, J. Gao, V. Ailawadhi, and G. Pottie. Protocols for Self-Organization of a Wireless Sensor Network. IEEE Personal Communications Magazine, 7(5):16-27, October 2000.

[61] K. Taura and A. A. Chien. A heuristic algorithm for mapping communicating tasks on heterogeneous resources. In Heterogeneous Computing Workshop, pages 102-115. IEEE Computer Society Press, 2000.

[62] D. Thain, T. Tannenbaum, and M. Livny. Condor and the Grid. In F. Berman, A. J. G. Hey, and G. Fox, editors, Grid Computing: Making The Global Infrastructure a Reality. John Wiley, 2003.

[63] Top-500 Supercomputer Sites. http://www.top500.org.

[64] A. Wadaa, S. Olariu, L. Wilson, K. Jones, and Q. Xu. On Training a Sensor Network.
International Parallel and Distributed Processing Symposium (IPDPS'03), April 2003.

[65] H. Wang, D. Estrin, and L. Girod. Preprocessing in a Tiered Sensor Network for Habitat Monitoring. EURASIP JASP, special issue on sensor networks, 2003(4):392-401, March 2003.

[66] B. Warneke, M. Last, B. Liebowitz, and K. S. J. Pister. Smart Dust: Communicating with a Cubic-Millimeter Computer. Computer, 34(1):44-51, 2001.

[67] H. Wu and J. M. Mendel. Quantitative Analysis of Spatio-Temporal Decision Fusion Based on the Majority Voting Technique. Proc. of SPIE, vol. 5434, SPIE Defense and Security Symposium 2004, Orlando, FL, USA, 2004.

[68] J. Wu, B. Wu, and I. Stojmenovic. Power-Aware Broadcasting and Activity Scheduling in Ad Hoc Wireless Networks Using Connected Dominating Sets. Wireless Communications and Mobile Computing, special issue on Research in Ad Hoc Networking, Smart Sensing, and Pervasive Computing, 3(4):425-438, June 2003.

[69] Y. Yu and V. K. Prasanna. Energy-Balanced Task Allocation for Collaborative Processing in Networked Embedded Systems. ACM Conference on Languages, Compilers, and Tools for Embedded Systems (LCTES), 2003.

[70] F. Zhao, J. Shin, and J. Reich. Information-Driven Dynamic Sensor Collaboration for Tracking Applications. IEEE Signal Processing Magazine, March 2002.

Appendix A

We first review some currently available algorithms for network flow maximization problems. Then we prove the correctness and complexity bound of the proposed RIPR algorithm.

A.1 Algorithms for Network Flow Maximization Problems

The task allocation and data gathering problems that are studied in this dissertation reduce to the flow maximization problem. Network flow maximization is a classical optimization problem with many applications; see [1].
The network flow maximization problem is to find a flow of maximum value, given a graph G(V, E) with edge capacities, a source s, and a sink t. A flow is a function of the edges that satisfies the edge capacity constraint and the conservation constraint (which says that whatever amount of flow comes into any node except s and t, the same amount goes out). The value of a flow is the total amount of flow out of s.

Given a graph G(V, E) and a flow f, the residual capacity of an edge (u, v) is defined as c_f(u, v) = c_uv − f(u, v), where c_uv is the capacity of edge (u, v) and f(u, v) is the flow over edge (u, v). The residual graph induced by f is a graph G_f(V, E_f) where E_f = {(u, v) | c_f(u, v) > 0}.

The first maximum flow algorithm was the augmenting path method [26]. An augmenting path is a path in G_f from s to t. The augmenting path method says that if we keep pushing flow from s to t along augmenting paths, the maximum flow will be obtained when no augmenting paths exist any more. This is called a method instead of an algorithm because there are multiple implementations of the method, resulting in different complexities. The method itself, if choosing an arbitrary augmenting path to push the flow, is pseudo-polynomial. When choosing the shortest augmenting path, it becomes the Edmonds-Karp algorithm and runs in O(|V| · |E|^2) time. When edge capacities are integers and the path search is restricted to edges with capacity larger than a scaling factor Δ (Δ is halved when there are no augmenting paths left with capacity Δ), it becomes the Capacity Scaling algorithm [22], which runs in O(|E|^2 log U) time, where U is the maximum edge capacity. Other implementations of the augmenting path method include the blocking flow method [28], the combination of blocking flows and capacity scaling, etc.
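As a concrete illustration of the shortest-augmenting-path variant, the following is a minimal Edmonds-Karp sketch. It is not the dissertation's code; the dict-of-dicts graph representation and integer capacities are assumptions made for brevity.

```python
from collections import deque

def edmonds_karp(capacity, s, t):
    """Augmenting path method with shortest augmenting paths found by
    BFS (Edmonds-Karp). `capacity` maps each node to a dict of
    neighbor -> integer edge capacity."""
    # residual capacities, with explicit zero-capacity reverse edges
    res = {u: dict(nbrs) for u, nbrs in capacity.items()}
    for u in capacity:
        for v in capacity[u]:
            res.setdefault(v, {}).setdefault(u, 0)
    flow_value = 0
    while True:
        # BFS for the shortest augmenting path in the residual graph
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in res[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow_value  # no augmenting path left: flow is maximum
        # walk back to collect the path and its bottleneck capacity
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        delta = min(res[u][v] for u, v in path)
        for u, v in path:   # augment along the path
            res[u][v] -= delta
            res[v][u] += delta
        flow_value += delta
```

On a toy graph with capacities s→a = 3, s→b = 2, a→t = 2, b→t = 3, the maximum flow value is 4.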
The admissible path method is a variant of the augmenting path method; it maintains a distance function on the residual graph and uses the distance function to find the augmenting paths. Given a graph G(V, E) and a flow f, a distance function is defined on the vertices such that d(t) = 0 and d(u) ≤ d(v) + 1 for any edge (u, v) ∈ G_f. The basic admissible path algorithm runs in O(|V| · |E|^2) time. With the help of an advanced data structure, dynamic trees, the execution time can be reduced to O(|V| · |E| log |E|).

The pre-flow push algorithm also maintains a distance function (sometimes also denoted as a height function) for each vertex. A pre-flow is a flow function in which the conservation constraint can be violated. The pre-flow push method starts with a pre-flow and converts it to a flow function with a maximum value. Different from the admissible path method, the pre-flow push algorithm does not search for augmenting paths. If a node has excessive incoming flow, it finds a neighbor to send the flow to according to the distance function. Flow is pushed in an asynchronous fashion. Upon termination (when none of the nodes has excessive flow), a maximum flow is found. The basic pre-flow push algorithm terminates in O(|V| · |E|^2) time. Because multiple nodes may have excessive flow simultaneously, the execution time of the algorithm can be further reduced by specifying the rules for selecting the unbalanced nodes. For example, the running time of the lift-to-the-front algorithm is O(|V|^3).

A.2 Proof of the Correctness and Complexity Bound of the RIPR Algorithm

In the following, we prove Theorem 3. Before presenting the proof, we first briefly re-state some notations widely accepted for network flow problems.
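The pre-flow push scheme described in Section A.1 can be illustrated with a minimal sketch of the generic (unoptimized) variant. This is not the RIPR algorithm itself; the data layout and the order in which unbalanced nodes are selected are arbitrary choices made for the sketch.

```python
def push_relabel(capacity, s, t):
    """Generic pre-flow push: maintain a height function h and an
    excess e per node; push excess along admissible residual edges
    (h[u] == h[v] + 1) and relabel nodes with no admissible edge."""
    nodes = set(capacity)
    for u in capacity:
        nodes.update(capacity[u])
    # residual capacities, with explicit zero-capacity reverse edges
    res = {u: {} for u in nodes}
    for u in capacity:
        for v, c in capacity[u].items():
            res[u][v] = res[u].get(v, 0) + c
            res[v].setdefault(u, 0)
    n = len(nodes)
    h = {u: 0 for u in nodes}   # height (distance) function
    e = {u: 0 for u in nodes}   # excess: inflow minus outflow
    h[s] = n
    for v in res[s]:            # initial pre-flow: saturate edges out of s
        delta = res[s][v]
        if delta > 0:
            res[s][v] -= delta
            res[v][s] += delta
            e[v] += delta
            e[s] -= delta
    active = [u for u in nodes if u not in (s, t) and e[u] > 0]
    while active:
        u = active.pop()
        while e[u] > 0:         # discharge node u completely
            pushed = False
            for v in res[u]:
                if res[u][v] > 0 and h[u] == h[v] + 1:  # admissible: push
                    delta = min(e[u], res[u][v])
                    res[u][v] -= delta
                    res[v][u] += delta
                    e[u] -= delta
                    e[v] += delta
                    if v not in (s, t) and v not in active:
                        active.append(v)
                    pushed = True
                    if e[u] == 0:
                        break
            if not pushed:      # no admissible edge left: relabel u
                h[u] = 1 + min(h[v] for v in res[u] if res[u][v] > 0)
    return e[t]  # the excess accumulated at t equals the maximum flow
```

On the same toy graph as before (s→a = 3, s→b = 2, a→t = 2, b→t = 3), this sketch also yields a maximum flow value of 4.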
Given a directed graph G(V, E), a function f is called a flow if it satisfies the three constraints in the statement of Problem 1. Given G(V, E) and a flow f, the residual capacity c_f(u, v) is given by c_uv − f(u, v), and the residual network of G induced by f is G_f(V, E_f), where E_f = {(u, v) | u, v ∈ V, c_f(u, v) > 0}. If a Push(u, v) operation removes (u, v) from E_f (i.e., c_f(u, v) = 0 after the operation), it is a saturated push; otherwise it is a non-saturated push. Given a directed graph G(V, E), a feasible function f for the Relaxed Flow Problem is called a relaxed flow in graph G.

Lemma 1. During the execution of the RIPR algorithm, for any u ∈ V, h(u) never decreases.

Proof: h(s) is only changed by the Adaptation operation, during which h(s) is increased by 2|V|. When u ≠ s, h(u) is only changed by Relabel(u). Relabel(u) is applied when e(u) > 0 and h(u) ≤ h(w) for every w ∈ {w | c_uw − f(u, w) > 0}, and h(u) = min{h(w) | c_uw − f(u, w) > 0} + 1 after Relabel(u). Hence h(u) is increased by at least 1. □

Lemma 2. During the execution of the RIPR algorithm, for each u ∈ V s.t. e(u) > 0, there exists a simple path in E_f from u to a node v s.t. e(v) < 0.

Proof: Suppose e(u) > 0. Define V' = {w | w ∈ V and there exists a simple path in E_f from u to w}. Note that u ∈ V'. We also define V'' = V − V'.

For the sake of contradiction, suppose e(w) ≥ 0 for every w ∈ V'. We claim that if x ∈ V'' and y ∈ V', then f(x, y) ≤ 0. Otherwise, if f(x, y) > 0, then c_f(y, x) = c_yx − f(y, x) = c_yx + f(x, y) > 0, which means x can be reached from y in E_f, hence there exists a path from u to x in E_f. But this contradicts the choice that x ∈ V''. It is fairly easy to show that Σ_{w ∈ V'} e(w) = Σ_{x ∈ V'', y ∈ V'} f(x, y). Hence Σ_{w ∈ V'} e(w) ≤ 0. But this contradicts the assumption that e(w) ≥ 0 for each w ∈ V' together with e(u) > 0. □

Lemma 3. During the execution of the RIPR algorithm, for u ∈ V, if e(u) > 0, then either Relabel(u) can be applied, or there exists a node v ∈ V s.t. Push(u, v) can be applied.

Proof: e(u) > 0 means that node u has more incoming flow than outgoing flow, which further means that there exists at least one node w s.t. f(w, u) > 0. Obviously, for this node w, we have c_uw − f(u, w) = c_uw + f(w, u) > 0, which means that {w | c_uw − f(u, w) > 0} ≠ ∅. If there exists v ∈ V s.t. c_uv − f(u, v) > 0 and h(u) > h(v), then Push(u, v) can be applied; otherwise, h(u) ≤ h(v) for each v ∈ {v | c_uv − f(u, v) > 0}, which means Relabel(u) can be applied. Of course, when there does not exist any node v s.t. c_uv − f(u, v) > 0 and h(u) > h(v), we must also have c_uv − f(u, v) ≤ 0 for each v s.t. h(u) > h(v). But we are not interested in this scenario. □

Lemma 4. During the execution of the Incremental Push-Relabel algorithm, after the initialization phase and if no adaptation can be applied, h(u) ≤ max(h(v) + 1, h(s) − |V| − 1) for each (u, v) ∈ E_f, and h(u) ≤ h(s) + |V| − 1 for each u ∈ V.

Proof: We prove by induction on the number of adaptation operations.

• Base case: Before any changes occur in the system, no adaptation operation can be applied. At this stage, the RIPR algorithm performs exactly the same operations as the Push-Relabel algorithm (this can be seen easily by comparing the details of the push and relabel operations executed by the two algorithms). For the Push-Relabel algorithm, it has been proven that h(u) ≤ h(v) + 1 for each (u, v) ∈ E_f, and h(u) ≤ 2|V| − 1 for each u ∈ V. Consequently, we have h(u) ≤ max(h(v) + 1, h(s) − |V| − 1) for each (u, v) ∈ E_f, and h(u) ≤ h(s) + |V| − 1 for each u ∈ V, for the RIPR algorithm before any adaptation operation is applied.

• Induction step: Suppose the adaptation has been applied n − 1 times and we have h(u) ≤ max(h(v) + 1, h(s) − |V| − 1) for each (u, v) ∈ E_f, and h(u) ≤ h(s) + |V| − 1 for each u ∈ V, and then a new capacity change occurs on edge (u, v) and the n-th adaptation, Adaptation(u, v), is applied.

1. We first show that h(u) ≤ max(h(v) + 1, h(s) − |V| − 1) for each (u, v) ∈ E_f after Adaptation(u, v). Considering Adaptation(u, v): if either scenario (a) or (c) occurs, no residual edge is added or removed, and no node w ∈ V has its h(w) changed either. If scenario (d) occurs, the change in the system removes (u, v) from E_f and hence the corresponding constraint on h(u) and h(v). If scenario (b) occurs, (u, v) is added to E_f. By the induction assumption, h(u) ≤ h(s) + |V| − 1 before the adaptation operation. Because h(u) does not change and h(s) increases by 2|V| after the operation, h(u) ≤ h(s) − |V| − 1 after the adaptation operation. In summary, after the adaptation operation, h(u) ≤ max(h(v) + 1, h(s) − |V| − 1) for each (u, v) ∈ E_f.

Adaptation(u, v) changes the values of some e(w), allowing new Push and Relabel operations to be applied. Yet these operations preserve the property that h(u) ≤ max(h(v) + 1, h(s) − |V| − 1) for each (u, v) ∈ E_f. This is shown below:

(a) Suppose Push(u, v) is applied. This may add edge (v, u) to E_f or remove the edge (u, v) from E_f. In the former case, we have h(v) < h(u), because otherwise the push would not be applied. In the latter case, the removal of (u, v) from E_f removes the corresponding constraint on h(u) and h(v). In both cases, we still have h(u) ≤ max(h(v) + 1, h(s) − |V| − 1) for any (u, v) ∈ E_f.

(b) Suppose Relabel(u) is applied. For a residual edge (u, v) that leaves u, we have h(u) = min{h(w) | (u, w) ∈ E_f} + 1 after the Relabel operation, which means h(u) ≤ h(v) + 1. For a residual edge (w, u) that enters u, h(w) ≤ max(h(u) + 1, h(s) − |V| − 1) before the relabel operation; according to Lemma 1, h(w) ≤ max(h(u) + 1, h(s) − |V| − 1) still holds after it. Therefore, after a relabel operation, we have h(u) ≤ max(h(v) + 1, h(s) − |V| − 1) for any (u, v) ∈ E_f.

2. Now we need to show that h(u) ≤ h(s) + |V| − 1 for each u ∈ V. Let V' denote the set of u ∈ V s.t. there exists a simple path from u to s in E_f, and let V'' = V − V'.

(a) For any node u ∈ V', suppose the simple path to s in E_f is {u, u_1, ..., u_k}, where u_k = s and k ≤ |V| − 1. We have
For a residual edge (w, u) that enters u, h(w) ≤ max(h(u) + 1, h(s) − |V| − 1) before the relabel operation. According to Lemma 1, h(w) ≤ max(h(u) + 1, h(s) − |V| − 1) still holds after the relabel operation. Therefore, after a relabel operation, we have h(u) ≤ max(h(v) + 1, h(s) − |V| − 1) for any (u, v) ∈ E_f.

2. Now we need to show that h(u) ≤ h(s) + |V| − 1 for each u ∈ V. Let V′ denote the set of nodes u ∈ V such that there exists a simple path from u to s in E_f, and let V″ = V − V′.

(a) For any node u ∈ V′, suppose the simple path to s in E_f is {u, u_1, ..., u_k}, where u_k = s and k ≤ |V| − 1. We have

    h(u) ≤ max(h(u_1) + 1, h(s) − |V| − 1)
    h(u_1) ≤ max(h(u_2) + 1, h(s) − |V| − 1)
    ...
    h(u_{k−1}) ≤ max(h(u_k) + 1, h(s) − |V| − 1)

Combining these inequalities, we have

    h(u) ≤ max(h(u_k) + k, h(s) − |V| − 1 + k − 1)
         = max(h(s) + k, h(s) − |V| + k − 2)
         ≤ max(h(s) + |V| − 1, h(s) − 3)
         = h(s) + |V| − 1

(b) For any node u ∈ V″, according to Lemma 2, there exists a simple path in E_f from u to a node w such that e(w) < 0 and w ≠ s. Suppose the simple path is {u, u_1, ..., u_k}, where u_k = w and k ≤ |V| − 1. We have

    h(u) ≤ max(h(u_1) + 1, h(s) − |V| − 1)
    h(u_1) ≤ max(h(u_2) + 1, h(s) − |V| − 1)
    ...
    h(u_{k−1}) ≤ max(h(u_k) + 1, h(s) − |V| − 1)

Note that e(u) ≥ 0 immediately after the initialization for each u ∈ V. The only operation that can bring e(u) below 0 is the adaptation operation (when scenario (d) occurs). Suppose e(u) becomes negative as the result of the mth adaptation operation (m ≤ n). Since the Relabel operation (which is the only operation that can increase the value of h(u)) is applied only if e(u) > 0, e(u) < 0 means that h(u) has not been increased after the mth, and hence after the nth, adaptation operation.
Therefore, h(u) ≤ h(s) + |V| − 1 before the nth adaptation implies that h(u) ≤ h(s) − |V| − 1 thereafter, since h(s) is increased by 2|V| by the adaptation. Combining these inequalities (and applying this bound to h(u_k) = h(w)), we have

    h(u) ≤ max(h(u_k) + k, h(s) − |V| − 1 + k − 1)
         ≤ max(h(s) − |V| − 1 + k, h(s) − |V| − 2 + k)
         ≤ max(h(s) − |V| − 1 + |V| − 1, h(s) − |V| − 2 + |V| − 1)
         ≤ h(s) + |V| − 1

Since V′ ∪ V″ = V, we conclude that h(u) ≤ h(s) + |V| − 1 for any u ∈ V. □

Corollary 1. During the execution of the RIPR algorithm, for any node u ∈ V, if e(u) < 0, then h(u) ≤ h(s) − |V| − 1.

The proof of Corollary 1 is contained in the proof of Lemma 4.

Lemma 5. During the execution of the Incremental Push-Relabel algorithm, after the initialization phase, and if no adaptation can be applied, there is no path from s to t in E_f.

Proof: For the sake of contradiction, suppose there exists a path {s, u_1, ..., u_k, t} in E_f. Without loss of generality, this is a simple path and k ≤ |V| − 2. According to Lemma 4,

    h(s) ≤ max(h(u_1) + 1, h(s) − |V| − 1)
    h(u_1) ≤ max(h(u_2) + 1, h(s) − |V| − 1)
    ...
    h(u_{k−1}) ≤ max(h(u_k) + 1, h(s) − |V| − 1)
    h(u_k) ≤ max(h(t) + 1, h(s) − |V| − 1)

Hence,

    h(s) ≤ max(h(t) + k + 1, h(s) − |V| − 1 + k)

Since node t is never relabeled, h(t) remains 0. Therefore,

    h(s) ≤ max(|V| − 1, h(s) − 3)

This is impossible, because h(s) > |V| − 1 and h(s) > h(s) − 3. □

Lemma 6. During the execution of the RIPR algorithm, after the initialization phase, and if no adaptation can be applied, then for each u ∈ V − {s, t} such that there exists a simple path from s to u in E_f, e(u) ≥ 0.

Proof: Immediately after the initialization phase, e(u) ≥ 0 for each u ∈ V − {s, t}, so the lemma holds trivially.
During the execution of the algorithm, when no adaptation can be applied, one of the following two conditions must hold: (1) no edge capacities have ever changed; or (2) some edge capacities changed, the corresponding adaptations have been performed, and no further edge capacity changes have occurred yet. In both conditions, f(s, u) is first set to c_su. (In condition 1, the value is set in the initialization phase; in condition 2, the value is set by the adaptation operations.) The value of f(s, u) is then possibly modified by push operations, either after the initialization phase or after the adaptation operations.

Suppose for the sake of contradiction that there exists a node u ∈ V − {s, t} such that e(u) < 0 and there exists a simple path {s, u_1, ..., u_k, u} in E_f. Without loss of generality, this is a simple path and k ≤ |V| − 2. According to Lemma 4,

    h(u_1) ≤ max(h(u_2) + 1, h(s) − |V| − 1)
    ...
    h(u_{k−1}) ≤ max(h(u_k) + 1, h(s) − |V| − 1)
    h(u_k) ≤ max(h(u) + 1, h(s) − |V| − 1)

According to Corollary 1, h(u) ≤ h(s) − |V| − 1 since e(u) < 0. Combining these inequalities, we can see that

    h(u_1) ≤ max(h(u) + k, h(s) − |V| − 2 + k)
           ≤ max(h(s) − |V| − 1 + k, h(s) − |V| − 2 + k)
           ≤ h(s) − |V| − 1 + |V| − 2
           < h(s)

On the other hand, consider the first hop (s, u_1) along this path. (s, u_1) ∈ E_f implies that f(s, u_1) < c_{su_1}. Recall that the value of f(s, u_1) is set to c_{su_1} immediately after the initialization phase and after each adaptation operation. The only operation that can reduce the value of f(s, u_1) is a push from u_1 to s. However, Push(u_1, s) is applied only when h(u_1) > h(s). This contradicts the claim h(u_1) < h(s) that we just derived. □

Similar to the standard flow problem, a cut for the relaxed flow problem is defined as a binary partition (L, R) of V such that L ∪ R = V, L ∩ R = ∅, s ∈ L and t ∈ R.
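This cut definition is easy to make concrete. The short sketch below (the names `is_cut` and `cut_capacity` and the dictionary-based data layout are our own illustration, not the dissertation's implementation) checks the partition conditions and computes the cut capacity c_LR = Σ_{u∈L, v∈R} c_uv that the next lemma bounds the flow value by:

```python
# Sketch: represent a cut (L, R) of V as two sets. Naming and data layout
# are illustrative only; capacities are a dict-of-dicts c[u][v].

def is_cut(V, L, R, s, t):
    """(L, R) is a cut iff it partitions V with s in L and t in R."""
    return L | R == set(V) and not (L & R) and s in L and t in R

def cut_capacity(c, L, R):
    """Capacity c_LR: sum of forward capacities from L into R."""
    return sum(c.get(u, {}).get(v, 0) for u in L for v in R)
```

Note that only forward edges (from L into R) contribute to the capacity; edges from R back into L are deliberately ignored, matching the definition in the text.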
The capacity c_LR of a cut (L, R) is defined as c_LR = Σ_{u∈L, v∈R} c_uv. The next lemma shows that the value of a relaxed flow cannot exceed the capacity of any cut.

Lemma 7. Given graph G(V, E) with source s and sink t, a relaxed flow f, and an arbitrary cut (L, R) of G, Σ_{v∈V} f(s, v) ≤ c_LR.

Proof: We have e(u) ≤ 0 for u ∈ V − {s, t}. Therefore

    Σ_{u∈L−{s}} e(u) = Σ_{v∈V, u∈L−{s}} f(v, u) ≤ 0
    ⟹ Σ_{v∈L, u∈L−{s}} f(v, u) + Σ_{v∈R, u∈L−{s}} f(v, u) ≤ 0
    ⟹ Σ_{u∈L−{s}} f(s, u) + Σ_{v∈L−{s}, u∈L−{s}} f(v, u) + Σ_{v∈R, u∈L−{s}} f(v, u) ≤ 0
    ⟹ Σ_{u∈L−{s}} f(s, u) + Σ_{v∈L−{s}, u∈L−{s}} f(v, u) ≤ Σ_{v∈R, u∈L−{s}} f(u, v)
    ⟹ Σ_{u∈L−{s}} f(s, u) ≤ Σ_{v∈R, u∈L−{s}} f(u, v)
    ⟹ Σ_{u∈L−{s}} f(s, u) + Σ_{u∈R} f(s, u) ≤ Σ_{v∈R, u∈L−{s}} f(u, v) + Σ_{u∈R} f(s, u)
    ⟹ Σ_{u∈V−{s}} f(s, u) ≤ Σ_{u∈L, v∈R} f(u, v)

(The fourth step uses skew symmetry, −f(v, u) = f(u, v); the fifth holds because Σ_{v∈L−{s}, u∈L−{s}} f(v, u) = 0, also by skew symmetry.) Since f(u, v) ≤ c_uv for each u, v ∈ V, and additionally because f(s, s) = 0, we have

    Σ_{v∈V} f(s, v) = Σ_{u∈V−{s}} f(s, u) ≤ Σ_{u∈L, v∈R} c_uv = c_LR

□

Lemma 7 states a property of the relaxed flow (not a property of the RIPR algorithm). This property is used to prove Lemma 8, which shows that the RIPR algorithm finds the maximum relaxed flow if it terminates. After proving Lemma 8, we will show that the RIPR algorithm indeed terminates.

Lemma 8. If the RIPR algorithm terminates, it finds the maximum relaxed flow.

Proof: According to Lemma 3, if the algorithm terminates, then e(u) ≤ 0 for each u ∈ V − {s, t}. Hence f is a relaxed flow upon termination. Given such an f, we construct a cut of G as follows:

    L = {u ∈ V | there exists a simple path from s to u in E_f}
    R = V − L

(By Lemma 5, t ∈ R, so (L, R) is indeed a cut.) According to Lemma 6, e(u) ≥ 0 for u ∈ L − {s}. Note that e(u) ≤ 0 for each u ∈ V − {s, t} upon termination of the algorithm. Hence e(u) = 0 (i.e., Σ_{v∈V} f(v, u) = 0) for each u ∈ L − {s}. Then it is easy to show that Σ_{u∈L} f(s, u) = Σ_{u∈L−{s}} f(s, u) = Σ_{u∈L−{s}, v∈R} f(u, v). Therefore,

    Σ_{v∈V} f(s, v) = Σ_{v∈L} f(s, v) + Σ_{v∈R} f(s, v)
                    = Σ_{u∈L−{s}, v∈R} f(u, v) + Σ_{v∈R} f(s, v)
                    = Σ_{u∈L, v∈R} f(u, v)

We claim that f(u, v) = c_uv for each u ∈ L and v ∈ R, because otherwise f(u, v) < c_uv would imply that edge (u, v) ∈ E_f, hence v could be reached from s in E_f; this contradicts the definition of R. Therefore,

    Σ_{v∈V} f(s, v) = Σ_{u∈L, v∈R} c_uv = c_LR

According to Lemma 7, such a relaxed flow f is a maximum relaxed flow. □

Now we show that the RIPR algorithm indeed terminates.

Lemma 9. If the adaptation is applied n (n ≥ 0) times, then the number of Relabel operations that can be performed is less than (2n + 2)|V|².

Proof: If the adaptation is applied n times, then h(s) = (2n + 1)|V|. According to Lemma 4, h(u) ≤ h(s) + |V| − 1 = (2n + 2)|V| − 1 for each u ∈ V. Each time Relabel(u) is applied, h(u) is increased by at least 1. Since h(u) = 0 initially, Relabel(u) is applied at most (2n + 2)|V| − 1 times. There are |V| nodes in the system, hence the total number of Relabel operations that can be performed is at most ((2n + 2)|V| − 1) · |V|, which is less than (2n + 2)|V|². □

Lemma 10. If the adaptation is applied n (n ≥ 0) times, then the number of saturated push operations that can be performed is less than (n + 1)|V| · |E|.

Proof: Consider edge (u, v) ∈ E. Suppose a saturated push Push(u, v) is first applied. For a second saturated push to be applied over (u, v), Push(v, u) must be applied before it. Because h(u) > h(v) at the time of the first push (otherwise the first push would not be applied), h(v) must be increased by at least 2 before Push(v, u) can be applied, since Push(v, u) requires h(v) > h(u). Similarly, h(u) must be increased by at least 2 for the second saturated push Push(u, v) to occur, and so forth. Because h(u) ≤ h(s) + |V| − 1 = (2n + 2)|V| − 1, h(u) and h(v) cannot increase indefinitely. It is easy to show that a saturated push can occur at most ((2n + 2)|V| − 1)/2 times for edge (u, v). There are |E| edges in the graph. The total number of saturated push operations is therefore less than ((2n + 2)|V| − 1)/2 · |E|, which is less than (n + 1)|V| · |E|. □

Lemma 11. If the adaptation is applied n (n ≥ 0) times, then the number of non-saturated push operations that can be performed is less than (n² + 2n + 1) · (4|V|³ + 2|V|²|E|).

Proof: Define a potential function Φ = Σ_{u : e(u) > 0} h(u). Φ = 0 initially, and obviously Φ ≥ 0. According to Lemma 4, h(u) ≤ h(s) + |V| − 1 = (2n + 2)|V| − 1, hence a relabel operation increases Φ by at most (2n + 2)|V| − 1. According to Lemma 9, there can be at most (2n + 2)|V|² Relabel operations. The increase in Φ induced by all relabel operations is at most ((2n + 2)|V| − 1) · (2n + 2)|V|².

A saturated push Push(u, v) increases Φ by at most (2n + 2)|V| − 1, since e(v) may become positive after the push, and (2n + 2)|V| − 1 is the highest value that h(v) can take. According to Lemma 10, the increase in Φ induced by all saturated pushes is at most ((2n + 2)|V| − 1) · ((n + 1)|V| · |E|).

For a non-saturated push Push(u, v), e(u) > 0 before the push and e(u) = 0 after the push, hence h(u) is excluded from Φ after the push. If e(v) > 0 after the push, Φ decreases by at least 1 because h(u) − h(v) ≥ 1. If e(v) ≤ 0 after the push, then Φ decreases by h(u) ≥ 1. Therefore, the total increase in Φ is at most ((2n + 2)|V| − 1) · (2n + 2)|V|² + ((2n + 2)|V| − 1) · ((n + 1)|V| · |E|) < (n² + 2n + 1) · (4|V|³ + 2|V|²|E|), while each non-saturated push decreases Φ by at least 1.
Therefore, the total number of non-saturated push operations that can be performed is at most (n² + 2n + 1) · (4|V|³ + 2|V|²|E|). □

Theorem 3 is restated below.

Theorem 3. Given graph G(V, E) with source s and sink t, and assuming that no capacity changes occur after the nth adaptation operation, the number of push and relabel operations executed by the RIPR algorithm is bounded from above by O(n² · |V|² · |E|), where |V| is the number of nodes and |E| is the number of edges in the graph. Additionally, the RIPR algorithm finds the optimal solution to the relaxed flow problem when no push or relabel operation can be performed.

Proof: Immediate from Lemmas 8, 9, 10, and 11. □
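The optimality argument of Lemmas 7 and 8 can be sanity-checked on small instances. The sketch below implements the classical Push-Relabel algorithm (the non-incremental algorithm that RIPR reduces to when no adaptation occurs) rather than RIPR itself; the function names and matrix-based data layout are our own illustrative choices. After termination, the flow value equals the capacity of the cut (L, R) induced by residual reachability from s, mirroring the cut construction in the proof of Lemma 8:

```python
# Sketch: classical Push-Relabel max flow on a small directed graph, used to
# illustrate Lemmas 7 and 8. This is the standard algorithm, not RIPR.

def push_relabel(n, cap, s, t):
    """cap: n x n capacity matrix. Returns (flow_value, flow_matrix)."""
    f = [[0] * n for _ in range(n)]   # skew-symmetric flow: f[v][u] == -f[u][v]
    h = [0] * n                       # heights
    e = [0] * n                       # excesses
    h[s] = n
    for v in range(n):                # initialization: saturate edges out of s
        f[s][v] = cap[s][v]
        f[v][s] = -cap[s][v]
        e[v] += cap[s][v]
        e[s] -= cap[s][v]
    active = [u for u in range(n) if u not in (s, t) and e[u] > 0]
    while active:
        u = active[0]
        pushed = False
        for v in range(n):            # try an admissible push out of u
            if cap[u][v] - f[u][v] > 0 and h[u] > h[v]:
                d = min(e[u], cap[u][v] - f[u][v])
                f[u][v] += d
                f[v][u] -= d
                e[u] -= d
                e[v] += d
                if v not in (s, t) and e[v] > 0 and v not in active:
                    active.append(v)
                pushed = True
                if e[u] == 0:
                    break
        if not pushed:                # no admissible edge: relabel u
            h[u] = 1 + min(h[v] for v in range(n) if cap[u][v] - f[u][v] > 0)
        active = [w for w in active if e[w] > 0]
    return sum(f[s][v] for v in range(n)), f

def residual_cut(n, cap, f, s):
    """Cut (L, R): L = nodes reachable from s in the residual graph."""
    L, stack = {s}, [s]
    while stack:
        u = stack.pop()
        for v in range(n):
            if v not in L and cap[u][v] - f[u][v] > 0:
                L.add(v)
                stack.append(v)
    return L, set(range(n)) - L
```

For instance, on a 4-node graph with capacities c(0,1)=3, c(0,2)=2, c(1,2)=5, c(1,3)=2 and c(2,3)=3, the computed flow value is 5 and equals the capacity of the induced cut, as the cut-based optimality argument predicts for the termination state.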