ADAPTIVE TASK ALLOCATION AND DATA GATHERING IN DYNAMIC DISTRIBUTED SYSTEMS

by Bo Hong

A Dissertation Presented to the FACULTY OF THE GRADUATE SCHOOL, UNIVERSITY OF SOUTHERN CALIFORNIA, In Partial Fulfillment of the Requirements for the Degree DOCTOR OF PHILOSOPHY (ELECTRICAL ENGINEERING)

August 2005

Copyright 2005 Bo Hong

Acknowledgments

I would like to thank my advisor Dr. Viktor K. Prasanna for his guidance, encouragement and support throughout my PhD study. I would like to thank my wife Hongwei Wu. Her support and love have been a source of power for me to complete this dissertation.

Right: the corresponding network flow representation.
3.2 An example of the relaxed flow maximization problem
3.3 Performance of the proposed task allocation protocol with uniformly distributed system topologies. The x-axis represents the 900 experiments
3.4 Performance of the proposed task allocation protocol with power law distributed system topologies. The x-axis represents the 900 experiments
3.5 Histogram of maximum length of consumed task buffer. The y-axis represents the frequency
3.6 Impact of buffer size on the throughput of the system
3.7 Adaptation to changes in the system
3.8 Impact of control message transfer cost
4.1 Adaptation performed in batch mode
4.2 The maximum and mean cost per node for executing the RIPR algorithm
4.3 Execution time of the RIPR algorithm
4.4 Start-up time of the proposed protocol
4.5 Illustration of the start-up and the adaptation of the proposed protocol. Framed block (a) is zoomed in figure 4.6(a), framed block (b) is zoomed in figure 4.6(b)

Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.

4.6 Detailed illustration of the start-up and the adaptation of the proposed protocol
5.1 Reduction of TMIP to a network flow problem. Sensor nodes are denoted by circles. The square in (a) denotes the event of interest. Dotted lines in (a) represent the collection of data from the environment. The upper square in (b) denotes the newly added pseudo source s'. The lower square in (b) denotes the pseudo sink t'. Weights of the nodes and links are omitted in this figure
5.2 Impact of wmax on system throughput
5.3 An example illustrating the poor performance of path based greedy heuristic
6.1 An example of the MDG problem
6.2 An illustrative example of the RFEC algorithm
6.3 Flow graph generated by the RFEC algorithm
6.4 Reconstructing the flow for each round. Step 1: split the edges out of s
6.5 Reconstructing the flow for each round. Step 2: find the paths
6.6 Example for worst case performance comparison
6.7 Example demonstrating the non-optimality of the shortest path heuristic
6.8 The histogram of q over all the simulations. The value of q for each simulation is calculated as the ratio between the achieved system lifetime and the optimal lifetime. The y-axis has been normalized to the total number of simulations
6.9 The impact of the number of nodes on system lifetime. The y-axis has been normalized to the optimal system lifetime
6.10 The impact of communication radius on system lifetime. The y-axis has been normalized to the optimal system lifetime
6.11 The impact of the number of source nodes (normalized to |Vc|/|V|) on system lifetime. The y-axis has been normalized to the optimal system lifetime
6.12 The impact of emax on system lifetime. The y-axis has been normalized to the optimal system lifetime
6.13 The impact of nmax on system lifetime. The y-axis has been normalized to the optimal system lifetime

Abstract

Distributed heterogeneous systems have emerged as attractive platforms for high performance computing. In systems such as computational grids or peer-to-peer systems, geographically distributed resources are connected through local and/or wide area networks and utilized in a coordinated manner, thereby fulfilling the computational needs of many complex applications. However, efficient utilization of such resources is a challenging task due to several specific features of such resources.
Typically, the computers (network links) are installed with different hardware and software systems and operate at different speeds. Additionally, the available compute (communication) capabilities of the resources may vary during run-time due to the sharing of resources among multiple users. Consequently, the execution of the applications must be adaptive to such dynamic run-time performance variations.

In this dissertation, we study the adaptive computation of a large set of independent tasks. By modeling computation as a special type of data flow, we show that the system throughput can be represented as the network flow in a corresponding graph. Using this flow representation, we have developed the Relaxed Incremental Push-Relabel (RIPR) algorithm to allocate the tasks. The RIPR algorithm is executed in a distributed fashion and is proven to maximize the system throughput with a polynomial number of local operations. More importantly, the allocation of tasks is adapted on-line in this algorithm by responding to the run-time performance changes in the computers and network links. The algorithm is then approximated as a distributed protocol to coordinate the computers.

Resource heterogeneities and dynamic run-time performance variations are not specific to distributed high-performance computing systems. Other systems, especially the emerging networked sensor systems, share the same properties at the application layer. Such a similarity leads to the investigation of run-time adaptive executions of applications in networked sensor systems. In this dissertation, we also establish the connection between the distributed computing systems and networked sensor systems.
We show that several classes of networked sensor system applications (data gathering and in-network processing problems), either with or without energy constraints, reduce to the same flow problem as the task allocation problem for distributed computing systems. This leads to distributed and adaptive solutions for these applications by using the proposed RIPR algorithm. The performance of these solutions has been verified through simulations.

Another problem addressed in this dissertation is the maximization of the lifetime for data gathering in networked sensor systems. A strongly polynomial algorithm is developed to maximize the lifetime. We then study the performance of a distributed shortest path heuristic for the problem. This heuristic is based on self-stabilizing spanning tree construction and shortest path routing methods. Although the heuristic cannot guarantee optimality, simulations show that the heuristic has good average case performance over randomly generated deployments of sensors. We also derive bounds for the worst case performance of the heuristic.

Chapter 1

Introduction

With the vast development of computer hardware and networking technologies, high performance computing platforms have evolved from Cray-style vector processors to massively parallel processing systems and to clusters of stand-alone computers. Such a trend has been witnessed by the Top-500 supercomputer list [63] since 1993. Despite the tremendous changes in the platforms, resource sharing and aggregation are the two properties that have been well kept and consistently enhanced.

Following the trend of resource sharing and aggregation, distributed heterogeneous systems have emerged as attractive platforms for high performance computing.
In systems such as computational grids or peer-to-peer systems, distributed resources such as workstations and supercomputers are connected through local and/or wide area networks. By utilizing these distributed resources in a coordinated manner, such systems can meet the computational demands of many complex applications. Research in computational grids is gaining momentum worldwide.

These distributed systems have quite a few important features. Typically, the software and hardware characteristics of the individual computers are different. This is largely due to the fact that the resources belong to different administrative domains, or the computers have been purchased over a relatively long time period. Besides the hardware, the computers may be installed with different operating systems, compilers, or kernel libraries. Such heterogeneities in the resources have brought various challenges to application design, which will be discussed in detail in Section 2.3. Generally speaking, to achieve the optimal system performance, the application needs to maximize the efficiency of its software routines for each individual computer, as well as schedule the distributed computers to minimize the overall execution time.

Another important property of such distributed systems is the lack of a centralized controller. Although some studies have proposed to maintain a (global) resource directory, bookkeeping the status of the individual resources, such a directory only serves as an observer of the whole system. The distributed resources belong to different administrative domains, each having its own local users as well as local job submission and management policy.
Consequently, when such resources are shared across the system, they inevitably exhibit dynamic load characteristics if we look at the compute power and communication bandwidth available to a specific application. This leads to another set of challenges to application design: run-time performance variations of the resources need to be considered when optimizing the system performance. This issue has been studied from various aspects. The most studied method is the migration framework (discussed in more detail in Section 2.2), which checkpoints, stops, moves, and restarts an application if a better set of resources is found for the application.

Despite these challenges, many applications in which large processing problems can easily be divided and solved independently have already taken great advantage of distributed computing systems. These include Monte Carlo simulations and parameter sweep applications such as the study of neuromuscular transmitter release, the modeling of photochemical pollution, high energy physics, fluid dynamics, etc. Internet based computing applications fall into the same category and are perhaps even better known. For example, the SETI@home project harnesses the computing power of over 500,000 PCs that are connected to the Internet. Other similar projects include Folding@home, drug design optimization, human protein folding, etc.

This dissertation studies the computation of a large set of independent tasks on distributed heterogeneous computing systems, which models the computation paradigm of the above applications that are divided into subproblems and then solved independently. The optimization of such a computation paradigm, in its mathematical form, is similar to the scheduling of independent tasks to heterogeneous platforms.
Various studies have addressed such a scheduling problem with the objective of minimizing the make-span (the overall execution time of all the tasks). This scheduling problem, in its general form, is known to be NP-complete. Hence the focus of research along this direction is to design efficient scheduling heuristics.

The solution technique presented in this dissertation differs from the heuristic based methods in two aspects. First of all, the objective is to maximize the system throughput rather than to minimize the make-span. Because the system may not work at full speed when the application starts or gets close to completion, maximizing the throughput is not strictly equivalent to minimizing the make-span. However, if there is a large set of tasks to compute (which is true for almost all the applications that seek the help of distributed computing platforms), or when the application is of streaming style (the application virtually never ends, such as SETI@home), then throughput becomes a meaningful, and sometimes the only feasible, performance metric. Compared with the heuristic based methods, whose performance is often verified through experiments, the proposed technique is proven to maximize the system throughput. The second difference is that the proposed solution technique is distributed and adaptive. Each node makes local decisions and actions only based on its own information and the information from its direct neighbors. Additionally, the proposed solution technique is able to adapt to changes in the computation and communication capabilities of the resources.

By modeling the computation at the nodes as a special type of data flow, the throughput maximization problem is reduced to a flow optimization problem, which is then utilized to develop a distributed algorithm to coordinate the nodes.
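As a rough illustration of this reduction, the sketch below treats each node's compute rate as the capacity of an edge into a hypothetical virtual sink, so that a maximum flow from the root node equals the best achievable throughput. The graph construction, the names `max_throughput` and `_virtual_sink`, and the example rates are assumptions made for illustration only; the exact graph is not spelled out in this chapter. The sketch also uses a centralized Edmonds-Karp search, not the distributed algorithm developed in the dissertation.

```python
from collections import defaultdict, deque


def max_flow(capacity, source, sink):
    """Edmonds-Karp maximum flow on a dict-of-dicts capacity map."""
    # Residual capacities, with zero-capacity reverse edges added.
    residual = defaultdict(lambda: defaultdict(int))
    for u, edges in capacity.items():
        for v, cap in edges.items():
            residual[u][v] += cap
            residual[v][u] += 0
    total = 0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return total
        # Find the bottleneck along the path, then push that much flow.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        total += bottleneck


def max_throughput(compute_rate, bandwidth, root):
    """Assumed model: computing a task is flow sent to a virtual sink."""
    sink = "_virtual_sink"  # hypothetical node, not part of the real system
    capacity = {u: dict(edges) for u, edges in bandwidth.items()}
    for node, rate in compute_rate.items():
        capacity.setdefault(node, {})[sink] = rate
    return max_flow(capacity, root, sink)


# Toy system: tasks start at root "r"; rates are tasks per unit time.
compute_rate = {"r": 1, "a": 4, "b": 1}
bandwidth = {"r": {"a": 2, "b": 3}}
print(max_throughput(compute_rate, bandwidth, "r"))
```

In this toy system the root computes 1 task per unit time, node a is limited to 2 by its incoming link, and node b is limited to 1 by its own compute rate, so the maximum throughput is 4.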
In this algorithm, every node determines its own activities (send, receive, and compute) based on the information about itself and its neighbors. Such a distributed cooperation among the nodes is proven to maximize the system throughput. Additionally, the algorithm is able to adapt to changes in the computation and communication capabilities of the nodes. The optimality and adaptivity of the algorithm are verified through extensive simulations on both uniformly distributed and power-law distributed systems. The results show that the proposed algorithm, when approximated as a distributed protocol, achieves close to optimal system throughput and outperforms the first-come-first-serve greedy heuristic. Start-up time and adaptation time of the distributed protocol are experimentally shown to be of the same order as the system diameter.

Resource heterogeneity and run-time performance changes are not unique to distributed computing systems. Other systems, especially the emerging networked sensor systems, share the same properties at the application level. Recent advances in micro-electro-mechanical systems technology and wireless communications have enabled the development of low cost and low power-consuming sensor nodes. Compared with traditional sensors whose only responsibility is to collect information about the phenomenon and send the information to a central processor/station for further processing, a networked sensor system typically consists of a large number of sensors, each of which is able to sense and process data, and communicate with other sensors. The sensors can perform some computations and transmit only required and partially processed data. The communication capability allows such data to be relayed in the network, and eventually routed to a powerful base station for further processing and decision making.
The placement of sensors does not need to be carefully engineered, which allows random deployment on possibly harsh or hazardous terrain. These features make networked sensor systems suitable for a wide range of applications such as environmental monitoring [65], intrusion detection [15], target tracking and identification [70], etc.

From a networking point of view, a networked sensor system is very similar to an ad hoc network. However, networked sensor systems have some significantly different features. Typically, sensor nodes are powered by batteries. Due to the large scale and possibly harsh working terrains of such networks, replenishing energy by replacing batteries for the sensor nodes is infeasible. Various techniques have been proposed to improve the energy efficiency of wireless sensor networks (WSNs). For example, the voltage scaling technique explores the trade-off between supply voltage and processing power. The rate adaptation technique explores the trade-off between radio transmission power and energy consumption. In general, all such techniques achieve energy efficiency at the cost of reduced capabilities of the sensor nodes. More likely, these techniques will be applied on the fly while the nodes are sensing, processing, or communicating. This has two implications. First, the nodes may have different amounts of power supply and, accordingly, different capabilities. Second, this leads to run-time performance changes of the nodes. Additionally, sensor nodes may fail during run time, which leads to frequently changing network topologies.

A fundamental operation of networked sensor systems is to sense the environment and eventually transmit the sensed (or processed) data to the base station. In this dissertation, two classes of sensing and processing problems are solved.
The first class of problems considers data gathering, which includes store-and-gather problems where data are locally stored at the sensors before the data gathering starts, and continuous sensing and gathering problems that model time critical applications. The focus is to maximize the throughput or volume of data received by the base station. By modeling the energy consumption associated with each send and receive operation, we formulate the data gathering problems as a constrained network flow optimization problem where each node u is associated with a capacity constraint w_u, so that the total amount of flow going through u (incoming plus outgoing flow) does not exceed w_u. This constrained flow problem in turn reduces to a standard network flow problem, which leads to an adaptive and distributed solution to the data gathering problems, using the algorithm developed for task allocations in distributed computing systems.

The second class of problems considers the processing of the data collected by the sensors. The trade-offs between communication and computation energy [51, 60] have shown that in-network processing of the sensed data is more energy efficient than transferring the raw data to a powerful base station for processing. In-network processing leads to prolonged lifetime of the system, which is one of the most critical factors in the design and deployment of networked sensor systems. The focus in this dissertation is to study the performance of those applications that require in-network processing of raw data blocks. By modeling the processing of data blocks as a special type of data flow, we reduce the in-network processing problem to a network flow optimization problem, which again leads to an adaptive and distributed solution using the algorithm developed for computer systems.

A third data gathering problem is also studied in the dissertation.
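The per-node constraint in the data gathering formulation above (incoming plus outgoing flow at u bounded by w_u) can be turned into ordinary edge capacities by the textbook node-splitting trick. The sketch below is only an illustration under stated assumptions: it handles pure relay nodes, where flow conservation makes incoming equal outgoing flow, and the function name `split_relay_nodes` and the `(u, "in")` / `(u, "out")` naming are hypothetical. The reduction actually used in the dissertation may differ, and nodes that generate data themselves would need separate handling.

```python
def split_relay_nodes(edge_cap, node_budget, endpoints):
    """Turn per-node flow budgets into edge capacities by node splitting.

    edge_cap:    {(u, v): capacity} for directed links
    node_budget: {u: w_u}, bound on incoming plus outgoing flow at u
    endpoints:   nodes left unsplit (for example the source and the sink)
    """
    def tail(u):  # where edges leaving u start in the new graph
        return u if u in endpoints else (u, "out")

    def head(u):  # where edges entering u end in the new graph
        return u if u in endpoints else (u, "in")

    new_cap = {}
    relays = set()
    for (u, v), cap in edge_cap.items():
        new_cap[(tail(u), head(v))] = cap
        relays.update(n for n in (u, v) if n not in endpoints)

    for u in relays:
        # At a relay, incoming == outgoing, so in + out <= w_u is the
        # same as through-flow <= w_u / 2. Nodes without a budget are
        # treated as unconstrained.
        budget = node_budget.get(u, float("inf"))
        new_cap[((u, "in"), (u, "out"))] = budget / 2
    return new_cap


# Relay "a" with budget w_a = 6 between source "s" and sink "t".
graph = split_relay_nodes(
    {("s", "a"): 5, ("a", "t"): 5}, {"a": 6}, endpoints={"s", "t"}
)
for edge, cap in graph.items():
    print(edge, cap)
```

After the transformation, any standard maximum flow routine can be run on the returned edge capacities; in the example, the internal edge of node a caps the flow through it at 3 even though both links could carry 5.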
In this third problem, the system is assumed to operate in rounds where a subset of the sensors generate a certain number of data packets during each round. All the data packets need to be transferred to the base station. The goal is to maximize the system lifetime in terms of the number of rounds the system can operate. We show that the above problem reduces to a restricted flow problem with quota constraint, flow conservation requirement, and edge capacity constraint. We further develop a strongly polynomial time algorithm for this problem, which is guaranteed to find an optimal solution. We then study the performance of a distributed shortest path heuristic for the problem. This heuristic is based on self-stabilizing spanning tree construction and shortest path routing methods. In this heuristic, every node determines its sensing activities and data transfers based on locally available information. No global synchronization is needed.

To enable the utilization of the resources in such complicated systems, tremendous research efforts have been dedicated to the construction of software systems that can provide secure and user-friendly access, resource allocation and arbitration, and coordination among various administration domains. A layered model of the Grid has been widely accepted by the Grid community after more than a decade of research.

The bottom layer of the model consists of the hardware resources that form the Grid physically. The resources include computers, networks, data storage, instruments, visualization devices, etc. With the vast development of the networking and computation techniques, capabilities of the resources have been growing at an accelerated speed. For example, the current Grid testbed in the US utilizes the Internet2 Abilene network, which has a backbone performance of about 10 Gbps, and the speed is expected to continue increasing.
Moreover, many individual nodes in a Grid are themselves high-performance parallel computers. Resources in the Grids are intrinsically distributed, heterogeneous and dynamic. They may have different software and hardware characteristics (in terms of operating systems, compilers, processor architectures, networking technologies, etc.). Due to their heterogeneities, the resources inevitably operate at different speeds (in terms of processing power measured in MFlops or in terms of communication power measured in bps and end-to-end latencies). Additionally, availabilities and capabilities of the resources are highly dynamic since (1) new resources may be added to the resource pool, (2) current resources may be preempted or removed from the pool, and (3) the resources are shared in a multi-user environment.

The second layer is the infrastructure that provides a uniform view of (and access to) the heterogeneous resources. The core operations at this layer are to create, manage, discover, and destroy Grid Services, which precisely define the interfaces between applications, databases, resources, and any other Grid artifacts. While the Grid Services enable interoperability for resource sharing, other components at this layer provide security, data management, resource management, and directory services (bookkeeping information about the available resources on the Grid and their status). Research efforts such as NSF's Middle-ware Initiative [48], OGSA [49], and the Globus toolkit have been addressing the issues at this layer.

The next layer consists of software packages on top of the infrastructure layer. It is the middle-ware that abstracts away some complex operations and hand-shaking protocols (e.g. authentication, accounting, file transfer, etc.) in the infrastructure.
This middle-ware layer enables the end users and applications to efficiently program and utilize the full-fledged infrastructure layer. Key components at this layer include kernel libraries, scheduling software, etc.

The top-most layer consists of the applications, which treat the Grid as a single and powerful system. The actual execution of the applications may be highly dynamic. Various resources may be used at different stages of the execution. However, these internal activities are transparent to the end users.

Although such a layered model has not been standardized yet, an increasing number of Grid systems have already been deployed for testing purposes as well as for many newly emerging applications. For example, the TeraGrid project builds a comprehensive distributed infrastructure for open scientific research, which has been providing 20 teraflops of computing power that is distributed at multiple sites and connected through a 40 gigabits per second network. Other Grid systems have been built for data analysis in high energy and nuclear physics experiments, science, earthquake engineering, etc. As scientific computation in these research areas is continuously driven by the improvement in data collection techniques, there will continue to be a need for even more powerful computing systems. This is exactly where the Grid technology fits in.

2.1.2 Networked Sensor Systems

Networked sensor system technology is enabled by current advances in micro-electro-mechanical systems and wireless communications. It provides many advantages over traditional sensing technologies which either perform remote sensing or require precise deployment of the sensors. In such a system, a large number of low-cost low-power sensors are deployed in the area of interest.
The sensors not only perform sensing, but also cooperatively transfer the raw or processed data to a powerful base station for further processing. Such a unique operation mode has enabled a wide range of applications, ranging from health care and environment monitoring to military and other commercial applications. As the processing and communication capabilities of the sensors continue to improve, implementing resource intensive applications such as audio/video streaming has already started to receive attention.

Networked sensor systems have many unique features, at both the system and the node levels. One of the most important features of the sensors is the limited energy supply, as the sensor nodes are usually powered by irreplaceable batteries. Compared with sensing and computation, communication is much more expensive in terms of energy consumption [2]. Generally, data transfers are performed via multi-hop communications where each hop is a short-range communication. This is due to the well known fact that long-distance wireless communication is expensive in terms of both implementation complexity and energy dissipation, especially when using the low-lying antennae and near-ground channels typically found in networked sensor systems. Short-range communication also enables efficient spatial frequency re-use.

Various hardware design techniques have been proposed to give the end user the opportunity to prolong the system lifetime at the cost of lower throughput or higher transmission delay. For example, the voltage scaling technique explores the trade-off between the supply voltage and the processing capability. The rate adaptation technique explores the trade-off between radio transmission power and energy consumption.
Some other research works have proposed to switch the nodes to a power-saving stand-by mode when no activity is needed and wake up the nodes only when sensing or data transfer is required. Due to the application of such energy saving techniques, the available computation and communication capabilities of the nodes may be continuously adjusted during the lifetime of the system.

Besides these hardware techniques, energy efficiency can be achieved at the MAC (media access control), network and application layers, or through a combined optimization crossing multiple layers.

For example, Sensor-MAC is a MAC protocol specifically designed for networked sensor systems with energy efficiency being the major concern. While other MAC protocols for wireless communications focus on collision avoidance, Sensor-MAC reduces energy consumption at the cost of reduced per-hop fairness and possibly larger per-hop latency. Sensor-MAC reduces the listen time of the nodes by letting them go into periodic sleep mode. Neighboring nodes are synchronized so that they go to sleep at the same time and wake up at the same time. Sensor-MAC uses carrier sense and RTS/CTS exchanges to avoid collision. Each transmitted data packet has a duration field, indicating how much time the transmission will take. If a node receives a packet destined to other nodes, it remains silent (according to the duration field) until the transmission completes. Sensor-MAC also reduces the time period that a node listens to transmissions not destined to itself. This is achieved by letting interfering nodes go to sleep after they hear an RTS or CTS packet.

At the network layer, the key player is the energy efficient routing protocol. Extensive research has been conducted in this area. An illustrative protocol is LEACH (Low Energy Adaptive Clustering Hierarchy) [30].
LEACH is a clustering based communication protocol. It uses localized coordination to maintain the cluster structure and randomly rotates the cluster head within each cluster to evenly distribute the energy load on the nodes. It also applies data fusion at the cluster heads to reduce the amount of data that needs to be transmitted to the base station. Another example is the energy-aware protocol proposed in [56]. This protocol maintains a set of paths instead of a single optimal path. These paths are selected based on a certain probability that depends on the energy consumption of each path. By choosing different paths for routing, the energy of any single path does not deplete quickly.

An important operation in most applications of networked sensor systems is to sense the environment and transmit the sensed or processed data to a base station. A number of techniques have been proposed to explore energy efficiency for such a data gathering operation. In [35], data gathering is assumed to be performed in rounds and each sensor can communicate (in a single hop) with the base station and all other sensors. The total number of rounds is then maximized under a given energy constraint on the sensors. In [50], a non-linear programming formulation is proposed to explore the trade-offs between energy consumed and the transmission rate. It models the radio transmission energy according to Shannon's theorem. In [54], the data gathering problem is formulated as a linear programming problem and a 1+u approximation algorithm is proposed. This algorithm further leads to a distributed heuristic.

Some other applications may have different requirements. For example, the balanced data transfer problem [24] is formulated as a linear programming problem where a 'minimum achieved sense rate' is set for every individual node.
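To make the probabilistic path selection of the energy-aware protocol described earlier in this section more concrete, the following sketch picks among several candidate paths with probability inversely proportional to each path's energy cost. The inverse-proportional rule, the function names, and the example costs are assumptions for illustration; the text above only states that the probability depends on the energy consumption of each path.

```python
import random


def path_probabilities(path_energy_cost):
    """Assumed rule: probability inversely proportional to energy cost."""
    inverse = {path: 1.0 / cost for path, cost in path_energy_cost.items()}
    total = sum(inverse.values())
    return {path: weight / total for path, weight in inverse.items()}


def choose_path(path_energy_cost, rng=random):
    """Draw one path at random according to path_probabilities."""
    probs = path_probabilities(path_energy_cost)
    paths = list(probs)
    weights = [probs[p] for p in paths]
    return rng.choices(paths, weights=weights, k=1)[0]


# Hypothetical per-packet energy costs of three candidate paths.
costs = {"A": 1.0, "B": 2.0, "C": 4.0}
print(path_probabilities(costs))
print(choose_path(costs))
```

Because cheaper paths are chosen more often but expensive ones are still used occasionally, traffic is spread over several paths, which matches the stated goal that the energy of any single path is not depleted quickly.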
In [23], data gathering is considered in the context of energy balance. A distributed protocol is designed to ensure that the average energy dissipation per node is the same throughout the execution of the protocol.

2.2 Adaptivity in Distributed Systems

Adaptivity of software packages (in terms of performance) in distributed systems has been an active research area. There are two major issues under this general topic. The first is to optimize software routines for each individual computation platform. This objective can be classified as static adaptation. The second is to optimize the performance of the whole system in the face of the dynamic load characteristics of the resources.

There have been a number of efforts in designing and developing adaptive software packages. These packages differ in their functionalities, their platforms, and the time at which adaptation is performed. Some examples are listed below.

ATLAS (Automatically Tuned Linear Algebra Software) improves the performance of dense linear algebra kernels over the well-known BLAS (Basic Linear Algebra Subprograms) packages. ATLAS captures the hardware characteristics of the platform at installation time by collecting such information as the number of floating point units in the processor, the number of pipeline stages, and the cache size. It then chooses the block size, the instruction set, the loop structures, etc., to exploit instruction level parallelism (ILP) as well as cache locality. All of this is performed automatically during the installation of ATLAS. Optimal or close-to-optimal performance (comparable to or exceeding vendor-provided linear algebra packages) has been observed across most known computer architectures.

SPIRAL (Signal Processing Implementation Research for Adaptable Libraries) is a code generation system for DSP (Digital Signal Processing) transforms.
Due to their recursive nature, many DSP transforms can have multiple algorithms and implementations. SPIRAL uses a formal framework to generate many alternative algorithms for a given transform and translate them to code. SPIRAL then searches over and prunes these alternative implementations and finds the best one for the targeted platform. Similar to ATLAS, SPIRAL also exploits ILP and cache locality and achieves optimal or close-to-optimal performance across various computer architectures.

GrADS (Grid Application Development Software) is an ongoing research project to automate the execution of applications in the Grid environment. In the GrADS framework, a program is augmented to include not only the code, but also the strategy for mapping the program onto distributed computing resources and an estimation model to predict the performance of alternative mappings. The Application Manager initiates the resource selection, then launches and monitors the execution of the program. If the observed performance falls below some pre-set threshold, the contract monitor invokes the rescheduler, which carries out appropriate actions such as replacing the resources, redistributing the tasks, or selecting an alternative program/resource mapping. The goal of GrADS is to provide good resource allocation for the applications and to support adaptive reallocation if performance degrades because of changes in the availability of Grid resources.

Several other studies have proposed to improve the efficiency of rescheduling by considering not only the potential improvement of rescheduling, but also the cost of rescheduling itself. Nevertheless, these approaches perform adaptivity during the execution of the applications.
Their effectiveness depends on the correct prediction of resource capabilities and estimation of the execution time.

2.3 Task Allocation and Scheduling in Distributed Systems

Given a set of distributed computers interconnected with network links that may operate at different speeds, the matching of tasks onto computers and the scheduling of the execution order of the tasks have a huge impact on the execution time. The minimization of the overall execution time of all tasks, namely the make-span, is known to be NP-complete in its general form. Hence developing efficient heuristics has become the major concern.

For example, when tasks are independent, the Opportunistic Load Balancing heuristic assigns each task to the next computer that is expected to be available, with the expectation of keeping all the computers busy. The Minimum Execution Time heuristic assigns each task (in an arbitrary order) to the computer with the minimum execution time. The Minimum Completion Time heuristic assigns each task (in an arbitrary order) to the computer with the minimum expected completion time for the task.

Some other heuristics are more complicated. For example, the Min-min heuristic searches all the un-assigned tasks, picks the task that has the minimum completion time, and assigns the task to the corresponding computer. This effectively removes one task from the set of un-assigned tasks. This procedure is repeated until all the tasks are assigned. The Max-min heuristic is similar to the Min-min heuristic, except that the task with the maximum completion time is selected each time. Other heuristics use various search techniques such as genetic algorithms, simulated annealing, etc.
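The Min-min and Max-min heuristics described above can be illustrated with a minimal sketch. The function name `greedy_map`, the matrix `etc` (expected time to compute task i on computer j), and the tie-breaking by task index are assumptions made for this illustration and are not taken from the cited works.

```python
def greedy_map(etc, largest_first=False):
    """Min-min (default) or Max-min mapping of independent tasks.

    etc[i][j] is the assumed execution time of task i on computer j.
    Returns (schedule, makespan), where schedule maps task -> computer.
    """
    n_machines = len(etc[0])
    ready = [0] * n_machines          # time at which each computer becomes free
    unassigned = set(range(len(etc)))
    schedule = {}
    while unassigned:
        candidates = []
        for i in sorted(unassigned):
            # Best computer for task i: the one with the earliest completion time.
            j = min(range(n_machines), key=lambda k: ready[k] + etc[i][k])
            candidates.append((ready[j] + etc[i][j], i, j))
        # Min-min picks the task with the smallest best completion time,
        # Max-min the task with the largest one.
        pick = max if largest_first else min
        completion, i, j = pick(candidates)
        schedule[i] = j
        ready[j] = completion
        unassigned.remove(i)
    return schedule, max(ready)


etc = [[3, 5],   # task 0 on computers 0 and 1
       [2, 4],   # task 1
       [6, 1]]   # task 2
print(greedy_map(etc))                      # Min-min
print(greedy_map(etc, largest_first=True))  # Max-min
```

On this small instance both variants reach the same make-span but with different task-to-computer mappings, which shows that the selection rule changes the schedule even when the final completion time happens to coincide.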
When tasks have inter-dependencies, the applications are usually represented as a DAG (directed acyclic graph) where the nodes represent the tasks and the links represent the dependencies among the tasks. The allocation and scheduling of task DAGs onto distributed systems have also been extensively studied. Due to the intrinsic complexity of the problem, heuristics are again the major concern. These heuristics differ in various aspects. For example, some heuristics assume a limited number of processors while others do not. Some studies assume a general communication model where contention may occur. For a detailed description of these heuristics, please refer to [14].

The performance of many heuristics has been benchmarked, using both randomly generated task graphs and graphs abstracted from actual applications. According to these benchmarking works, heuristics utilizing the critical path (the longest path in the task DAG) are in general better than other heuristics. Yet, it should be pointed out that this conclusion is drawn from benchmarking experiments and does not necessarily predict the performance of a particular application deployed in a particular system.

For computation in heterogeneous systems, maximizing the system throughput is not a new idea. A well-known example is the Condor project [62]. It develops a software infrastructure so that distributed systems with different ownerships can be utilized in a uniform manner to provide high throughput computation. The throughput maximization problem has also been studied from various algorithmic aspects. The work in [61] considers heterogeneous computing systems that are connected via a general graph topology. The application is of streaming style: a task continuously receives data from certain preceding tasks, processes the received data, and sends the processed data to some other tasks.
Finding the optimal mapping from tasks to computers (such that the throughput of data processing is maximized) is shown to be NP-complete and a mapping heuristic is developed in [61]. A different scenario is considered in [8], where the system topology is also graph structured but the application consists of a set of independent identical problems, and each problem in turn consists of a set of inter-dependent tasks. The result shows that maximizing the throughput (defined as the number of problems executed in one unit of time) for this general scenario is also NP-complete.

Although these general scenarios of throughput maximization are NP-complete, better complexity results and algorithms have been obtained for some specific (and possibly more practical) scenarios of the application settings and system topologies. Throughput maximization for single-level master-slave computation in a Grid has been studied in [57], where the compute resources are considered to communicate with the root node only. When the application consists of a large set of equal-sized independent tasks, throughput maximization in general graph-structured systems is studied in [5]. The problem is uniformly formulated as a linear programming problem, which is a well-studied problem with many algorithms available. However, these algorithms are centralized and not suitable for distributed execution. In [9], a bandwidth-centric method has been obtained for the computation of equal-sized independent tasks when the computers are connected via a tree topology, which further leads to a localized autonomous task allocation strategy. When the system is connected via a general graph topology, the problem of extracting a best spanning tree that has the highest throughput among all possible spanning trees is studied in [6].
The result in [6] shows that the achievable throughput of the optimal spanning tree, in the worst case, can be arbitrarily bad compared with that of the original graph-structured system.

Similar to some of the previous studies, we also study throughput maximization for the computation of equal-sized independent tasks. Our study differs from the previous studies in that we develop a distributed and adaptive algorithm for systems with a graph-structured topology, which constitutes the major contribution of Chapter 3. In this dissertation, we consider the system model in which (1) a computer can send and receive data to/from multiple neighboring computers concurrently; (2) computation and communication can be overlapped at the computers.

Chapter 3

Adaptive Allocation of Independent Tasks to Maximize Throughput

In this chapter, we consider the task allocation problem for computing a large set of equal-sized independent tasks on heterogeneous computing systems. This problem represents the computation paradigm for a wide range of applications such as SETI@home and Monte Carlo simulations. We consider a general problem in which the interconnection between the nodes is modeled using a graph. We show that the maximization of system throughput reduces to a standard network flow problem. We then develop a decentralized adaptive algorithm that solves a relaxed form of the standard network flow problem and maximizes the system throughput. This algorithm leads to a simple decentralized protocol that coordinates the resources adaptively. Simulations have been conducted to verify the effectiveness of the proposed approach, for both uniformly and power law distributed systems. Performance improvement over a first come first serve greedy heuristic has been observed. The adaptivity of the proposed approach is also verified through simulations.
3.1 Introduction

With recent advances in networking and computation techniques, distributed heterogeneous computing systems have become attractive platforms for high performance computing. In such systems, distributed resources such as workstations and supercomputers are connected through local and/or wide area networks. By utilizing these distributed resources in a coordinated manner, a heterogeneous computing system can meet the computational demands of complex applications [27].

In this chapter, we consider the computation of a large set of equal-sized independent tasks in a heterogeneous computing system. In particular, we consider the scenario where each task is to process a fixed amount of source data. The source data of all the tasks initially resides on a single node in the system, which we call the root node. Other nodes in the system need to receive the source data of a task, either directly from the root node or indirectly through other compute nodes, before they can compute the task. This computation paradigm models a variety of research and commercial activities. Internet-based distributed computations are among the most well-known examples, which include SETI@home [37], Folding@home [40], data encryption/decryption [20], etc. The computation paradigm also models various applications that are typically executed on tightly coupled systems (e.g. Monte Carlo simulations, Computational Phylogeny [46], etc).

The computation paradigm reduces to the general problem of allocating or scheduling independent tasks in heterogeneous computing systems. One possible objective of optimizing the system performance is to minimize the overall execution time (the make-span) of all the tasks. Although some specific scenarios can be solved in polynomial time, the make-span minimization problem is known to be NP-complete in its general form.
In this chapter, we consider an alternative optimization objective: maximizing the system throughput. This objective may be less meaningful when there are only a few tasks to compute. In such a scenario, the application may have already terminated before the system can reach its achievable throughput. However, if the applications have a large number of tasks, then system throughput becomes a suitable optimization objective. For such applications, it is the achievable throughput that determines the maximum number of tasks that the system can execute during a given time period.

Because computation and communication resources in distributed heterogeneous computing systems are typically shared among multiple users and applications, the network performance and the effective compute power of each node may vary at run time. This is particularly true in the case of Internet-based computation, Peer to Peer computation, and the Grid [33]. Optimizing the performance of the system based on a snapshot of the current system status may not lead to optimal performance. Consequently, the task allocation needs to be adaptive to the changes in the system [10, 53].

We show that the throughput maximization problem can be efficiently solved in an adaptive fashion. We model the computation as a special type of data flow. This leads to our extended network flow (ENF) representation for the throughput maximization problem. Based on the ENF representation, we show that the system throughput can be transformed to the network flow in a corresponding graph. More importantly, for the network flow maximization problem, we develop a decentralized task allocation algorithm that adapts to the changes in the system. This task allocation algorithm can then be implemented as a decentralized task allocation protocol to coordinate the compute nodes in the system.
Simulations were conducted to verify the effectiveness of the ENF representation based task allocation approach. Simulation results show that the overhead of task allocation (transferring control messages among the compute nodes) is negligible when the number of tasks becomes large. We have also observed performance improvement over the heuristic that allocates tasks in a first come first serve fashion.

3.2 System Model

The compute nodes are assumed to be connected via an arbitrary topology and the system is represented by a directed graph $G(V, E)$. Each node $u \in V$ in the graph represents a compute node. Node $u$ has weight $w_u$, representing the processing power of $u$, i.e. $u$ can perform one unit of computation in $1/w_u$ time. The edge $(u, v) \in E$ in the graph represents a network link from $u$ to $v$. Edge $(u, v)$ has capacity $c_{uv}$, representing the communication bandwidth of $(u, v)$, i.e. $(u, v)$ can transfer one unit of data from $u$ to $v$ in $1/c_{uv}$ time. To model non-symmetric communication links, the edges are uni-directional, so $G$ is directed and $(u, v) \neq (v, u)$. In the rest of this chapter, we use 'edge' and 'link' interchangeably. The set of successors of $u$ in $G$ is denoted as $\sigma_u = \{w \in V \mid (u, w) \in E\}$ and the set of predecessors of $u$ in $G$ is denoted as $\psi_u = \{w \in V \mid (w, u) \in E\}$.

Although the physical media in modern networking techniques may be simplex (such as most fiber optic communications, which use one strand to send data in each direction) or half-duplex (such as un-switched Ethernet, which allows at most one device to transmit at a time), full-duplex network interfaces have been widely supported and implemented as the standard practice. Consequently, we only consider full-duplex network interfaces, which means that the compute nodes can send and receive data concurrently.
We also assume that the network interfaces can communicate with multiple adjacent nodes concurrently. This represents the modern network techniques (e.g. the packet switching technique) that support concurrent communications. However, the rate at which a network interface sends and receives data cannot increase indefinitely as the number of concurrent communications increases. The data transfer rate cannot exceed the hardware limitation of the network interface. Furthermore, there are typically a send buffer and a receive buffer associated with each network interface. Implemented in either hardware or software, the buffers are used to control the data flow rate as specified by the network protocol. To reflect this limitation, for each node $u$, we introduce another two parameters: $c_u^{in}$ and $c_u^{out}$. These two parameters indicate the capability of $u$ to receive and send data: within one unit of time, at most $c_u^{in}$ ($c_u^{out}$) units of data can flow into (out of) $u$.

We further assume that the compute nodes can perform computation and communication concurrently. The overlapping of computation and communication is made possible by various techniques (e.g. direct memory access, multi-threading, etc) and supported by software libraries such as PVM and MPI. Performance improvement obtained by overlapping communication and computation has been observed in various studies such as [7]. Some researchers have also pointed out that a certain cost is associated with the overlapping of computation and communication [38], i.e. the computation capability of a computer may be reduced if the computer is intensively involved in communications. For the discussion in this chapter, we do not consider the cost of overlapping.

Without loss of generality, we assume that each task is to perform one unit of computation on one unit of source data.
The tasks are independent of each other and do not share the source data. A compute node can compute a task only after receiving the source data of the task. Initially, node $s$ holds the source data for all the tasks; $s$ is called the root node. Apart from $s$, every compute node in the system needs to answer the same three questions: (1) where to get the tasks? (2) how many tasks to compute locally? (3) where to send the remaining tasks?

The purpose of this study is to answer the above three questions for all nodes in the system such that the system throughput is maximized. The throughput of a system is defined as the number of tasks computed by the system in one unit of time under steady state conditions. For convenience, we say that a task is transferred from $u$ to $v$ when the source data of the task is transferred from $u$ to $v$. Let $f(u, v)$ denote the number of tasks transferred from $u$ to $v$ in one unit of time. We have the following formal problem statement:

Base Problem: Given a directed graph $G(V, E)$. Node $u \in V$ has weight $w_u > 0$, input constraint $c_u^{in} > 0$, and output constraint $c_u^{out} > 0$. Edge $(u, v)$ has capacity constraint $c_{uv} > 0$. $s$ is the distinguished root node.

Maximize: $w_s + \sum_{u \in V - \{s\}} \left( \sum_{v \in \psi_u} f(v, u) - \sum_{v \in \sigma_u} f(u, v) \right)$

Subject to:
1. $0 \le f(u, v) \le c_{uv}$ for $(u, v) \in E$
2. $\sum_{w \in \psi_u} f(w, u) \le c_u^{in}$ for $u \in V$
3. $\sum_{w \in \sigma_u} f(u, w) \le c_u^{out}$ for $u \in V$
4. $0 \le \sum_{w \in \psi_u} f(w, u) - \sum_{w \in \sigma_u} f(u, w) \le w_u$ for $u \in V - \{s\}$

The following is the detailed explanation of the problem statement:

In the optimization objective, $\sum_{v \in \psi_u} f(v, u) - \sum_{v \in \sigma_u} f(u, v)$ is the net number of tasks received (and processed) by node $u$ in one unit of time. Because $s$ does not need to receive tasks from other nodes, the computation capability of $s$ is just an additive factor to the optimization objective.
The optimization objective is therefore to maximize the total number of tasks processed by all the nodes in the system.

Condition 1 reflects the capacity constraints of the edges.

In Condition 2, $\sum_{w \in \psi_u} f(w, u)$ is the total number of tasks transferred to $u$. Condition 2 means that no node can receive tasks at a rate higher than what is allowed by its network interface. Similarly, Condition 3 limits the rate at which a node can send out tasks.

In Condition 4, $\sum_{w \in \psi_u} f(w, u) - \sum_{w \in \sigma_u} f(u, w)$ is the net number of tasks that $u$ has kept locally. Condition 4 means that any node (except $s$, which has the source data for all the tasks) cannot keep more tasks than it can compute; otherwise the number of un-computed tasks on this node would increase monotonically as time advances.

Base Problem has a linear programming formulation. Because we are considering steady-state throughput, Base Problem only needs to be solved over the rationals; we are not looking for integer-valued solutions. Various algorithms have been proposed to solve linear programming problems. These include the Simplex algorithm, which has excellent average case performance, and the interior point algorithms, which guarantee a polynomial execution time. (In the next section, a distributed and adaptive algorithm will be developed based on this representation.)

Graph Transformation:
1. Create an empty graph $G'(V', E')$.
2. Insert a node $t$ into $V'$.
3. For each node $u$ in Base Problem $(G, s)$,
(a) insert three nodes $u_1$, $u_2$, and $u_3$ into $V'$. $u_1$, $u_2$, and $u_3$ all have zero weight;
(b) insert edges $(u_2, u_1)$ and $(u_1, u_3)$ into $E'$. Set the capacity of $(u_2, u_1)$ to $c_u^{in}$ and the capacity of $(u_1, u_3)$ to $c_u^{out}$;
(c) insert edge $(u_1, t)$ into $E'$ and set the capacity of $(u_1, t)$ to $w_u$.
4. For each edge $(u, v)$ in Base Problem $(G, s)$, insert edge $(u_3, v_2)$ into $E'$ and set the capacity of $(u_3, v_2)$ to $c_{uv}$.
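The graph transformation can be sketched in a few lines of code. The following is a minimal illustration, not the dissertation's implementation: the function names `transform` and `max_flow`, the encoding of the split nodes as `(u, 1)`, `(u, 2)`, `(u, 3)`, and the three-node instance are all assumptions chosen for this example. The maximum flow is computed with a plain breadth-first augmenting-path search so that the value can be compared with the throughput of the original instance.

```python
from collections import defaultdict, deque


def transform(w, c_in, c_out, cap, s):
    """Build the edge capacities of G' from a Base Problem instance.

    w, c_in, c_out: dicts keyed by node; cap: dict keyed by edge (u, v).
    Node u is split into (u, 1) processor, (u, 2) input, (u, 3) output.
    Returns (capacities of G', root, sink).
    """
    cp = {}
    for u in w:
        cp[((u, 2), (u, 1))] = c_in[u]    # input interface -> processor
        cp[((u, 1), (u, 3))] = c_out[u]   # processor -> output interface
        cp[((u, 1), "t")] = w[u]          # task execution at u
    for (u, v), c in cap.items():
        cp[((u, 3), (v, 2))] = c          # network link from u to v
    return cp, (s, 1), "t"


def max_flow(cap, s, t):
    """Maximum flow by shortest augmenting paths (integer capacities assumed)."""
    res = defaultdict(int)                # residual capacities
    adj = defaultdict(set)
    for (u, v), c in cap.items():
        res[(u, v)] += c
        adj[u].add(v)
        adj[v].add(u)
    total = 0
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:  # breadth-first search in the residual graph
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent and res[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return total
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        d = min(res[e] for e in path)     # bottleneck of the augmenting path
        for (u, v) in path:
            res[(u, v)] -= d
            res[(v, u)] += d
        total += d


# Hypothetical three-node instance: root s, compute nodes a and b.
w = {"s": 2, "a": 3, "b": 4}
c_in = {"s": 1, "a": 5, "b": 2}
c_out = {"s": 6, "a": 4, "b": 1}
cap = {("s", "a"): 5, ("s", "b"): 3, ("a", "b"): 2}
cap_prime, root, sink = transform(w, c_in, c_out, cap, "s")
print(max_flow(cap_prime, root, sink))
```

In this instance the root computes 2 tasks per unit of time, node a is limited by its processing power (3), and node b is limited by its input interface (2), so the best achievable throughput is 7, which is also the value of the maximum flow from the root to the sink in the transformed graph.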
Figure 3.1: Transform a base problem to a network flow representation. Left: a base problem with three nodes. Right: the corresponding network flow representation.

The transformation results in a new graph $G'$. A hypothetical node $t$ is first added to $G'$. Each node $u$ in Base Problem $(G, s)$ is then split into three nodes $u_1$, $u_2$, and $u_3$, representing the processor, the input interface, and the output interface of $u$. The hypothetical edge from $u_1$ to $t$ represents task execution at the processor of $u$. $s_1$ is the root node in $G'$. The transformation procedure is illustrated in Figure 3.1. To simplify the figure, node weights and edge capacities are not marked.

After transforming the graph, we have the following flow maximization problem:

Problem 1: Given a directed graph $G(V, E)$ with root node $s$ and sink node $t$, $s \neq t$. Edge $(u, v)$ has weight $c_{uv} > 0$.

Maximize: $\sum_{u \in \sigma_s} f(s, u) - \sum_{u \in \psi_s} f(u, s)$

Subject to:
1. $0 \le f(u, v) \le c_{uv}$ for $(u, v) \in E$
2. $\sum_{v \in \sigma_u} f(u, v) = \sum_{v \in \psi_u} f(v, u)$ for $u \in V - \{s, t\}$

Problem 1 is the well-studied network flow problem. The objective is to maximize the amount of flow out of the root node without violating the edge capacity constraints. At the same time, all the nodes except root $s$ and sink $t$ must have the same amount of incoming and outgoing flow.

Similarly, if an instance of Problem 1 has $G$ as the input graph, $s$ as the root node, and $t$ as the sink node, we denote it as Problem 1 $(G, s, t)$. We further use $T_B(G, s)$ to denote the maximum throughput for Base Problem instance $(G, s)$, and $T_1(G, s, t)$ to denote the maximum flow for Problem 1 instance $(G, s, t)$. The next theorem shows that the Base Problem is a special case of Problem 1 after applying the graph transformation.

Theorem 1.
Suppose Base Problem $(G, s)$ is converted to Problem 1 $(G', s_1, t)$ by applying the above graph transformation, then

$T_B(G, s) = T_1(G', s_1, t)$

Proof: We use the notation in Procedure 1 to denote the nodes and edges in $G$ and the corresponding nodes and edges in $G'$.

Suppose $f(u, v)$, $(u, v) \in E$, is a feasible solution for Base Problem $(G, s)$. We map it to a feasible solution $f'(u', v')$, $(u', v') \in E'$, for Problem 1 $(G', s_1, t)$ as follows:
1. $f'(u_3, v_2) \leftarrow f(u, v)$
2. $f'(u_2, u_1) \leftarrow \sum_{w \in \psi_u} f(w, u)$
3. $f'(u_1, u_3) \leftarrow \sum_{w \in \sigma_u} f(u, w)$
4. $f'(u_1, t) \leftarrow f'(u_2, u_1) - f'(u_1, u_3)$

It is easy to verify that such an $f'$ is a feasible solution for Problem 1 $(G', s_1, t)$ and that $f'$ results in the same throughput as $f$.

Suppose $f'(u', v')$, $(u', v') \in E'$, is a feasible solution for Problem 1 $(G', s_1, t)$. We map it to a feasible solution $f(u, v)$, $(u, v) \in E$, for Base Problem $(G, s)$ as follows:

$f(u, v) = f'(u_3, v_2)$

It is also easy to verify that such an $f$ is a feasible solution for Base Problem $(G, s)$ and that it has the same throughput as $f'$. □

Problem 1 is the well-studied network flow problem. Several algorithms [17] can be used to solve this problem, e.g. the Edmonds-Karp algorithm, which has $O(|V| \cdot |E|^2)$ complexity, the Push-Relabel algorithm, which has $O(|V|^2 \cdot |E|)$ complexity, and the Relabel-to-Front algorithm, which has $O(|V|^3)$ complexity.

However, in terms of decentralization and adaptivity, these well-known flow maximization algorithms are not suitable for distributed computing environments. Both the Edmonds-Karp and the Relabel-to-Front algorithms are centralized. The Push-Relabel algorithm has a decentralized implementation where every node only needs to exchange messages with its immediate neighbors and makes decisions locally.
But in order to be adaptive to the changes in the system, this algorithm has to be re-initialized and re-run from scratch each time some parameters (edge capacities) of the flow maximization problem change. For each of the re-runs, none of the nodes can start executing the push and relabel operations until all the nodes have finished the initialization process (setting $h(u)$ to 0, etc). In this case, there has to be a global controller that monitors the initialization of all the nodes and gives the nodes an 'ok-to-start' signal. This again compromises the desired property of decentralization.

3.4 Decentralized Adaptive Task Allocation Algorithm

In this section, we first show that the maximum flow remains the same even if condition 2 in Problem 1 is relaxed. Then we develop a distributed and adaptive algorithm for the relaxed problem.

3.4.1 Relaxed Flow Maximization Problem

Figure 3.2: An example of the relaxed flow maximization problem

Consider the example in Figure 3.2. The flow and capacity of each edge are marked in the form of "flow/capacity". Given the capacities of the edges, the maximum achievable flow is 18. In this example, node $e$ has 12 units of incoming flow and 15 units of outgoing flow. Such a flow is not a feasible solution to Problem 1 since condition 2 is violated at node $e$. Suppose the nodes form an actual system and 12 tasks have reached $e$; then $e$ can send out no more than 12 tasks even if it is allowed to send out 15 tasks. The above observation can be generalized as follows: when a system is actually deployed, the number of tasks that a node can send out is limited not only by the capacities of the edges, but also by the number of tasks that have been received.
Intuitively, 'allowing' a node to send out more tasks than what has been received will not affect the system throughput adversely. This leads to the following relaxed network flow problem:

Relaxed Flow Problem: Given directed graph $G(V, E)$, source node $s \in V$, and sink node $t \in V$. Edge $(u, v)$ has weight $c_{uv} > 0$.

Maximize: $\sum_{u \in V} f(s, u)$

Subject to:
1. $f(u, v) \le c_{uv}$ for $u \in V$, $v \in V$
2. $f(v, u) = -f(u, v)$ for $u, v \in V$
3. $\sum_{v \in V} f(v, u) \le 0$ for $u \in V - \{s, t\}$

In the problem statement, we have adopted the widely used notation for network flow problems: when the actual data transfer is from $u$ to $v$, we define $f(v, u) = -f(u, v)$. Additionally, when edge $(u, v) \notin E$, we define $c_{uv} = 0$ and still enforce the capacity constraint on $(u, v)$. In this way, we can define $f(u, v)$ over $V \times V$, rather than being restricted to $E$. If neither $(u, v)$ nor $(v, u)$ belongs to $E$, then $c_{uv} = c_{vu} = 0$, which implies that $f(u, v) = f(v, u) = 0$ after enforcing the edge capacity constraints. $f(v, u) = -f(u, v)$ also allows us to compute the total amount of flow into $u$ as $\sum_{w \in V} f(w, u)$ (which is equal to $\sum_{w \in \psi_u \cup \sigma_u} f(w, u)$). Note that this expanded definition of $f(u, v)$ is for notational convenience only. It does not change the essence of the flow problem.

The Relaxed Flow Problem differs from Problem 1 in that the total flow out of a node can be equal to or larger than the total flow into the node (condition 3). The objective is to maximize the total amount of flow out of root $s$. A feasible function $f$ for the Relaxed Flow Problem is called a relaxed flow in graph $G$. We use $T_R(G, s, t)$ to denote the maximum throughput that flows out of node $s$ in a relaxed flow problem with graph $G$, root $s$, and sink $t$. The following theorem shows the relation between the Relaxed Flow Problem and Problem 1.

Theorem 2. Given graph $G(V, E)$, source $s$ and sink $t$.
If f* is an optimal solution to the Relaxed Flow problem, then there exists an optimal solution f to Problem 1 such that 0 ^ f (u, v) ^ f {n, uj fo r each f (u, u) ^ 0. Additionally, Tr(G, s , t ) — Ti(G,s,t). Proof of the theorem is not difficult and omitted here due to space limitations. 39 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. 3.4.2 The Algorithm The algorithm is an augmentation to the Push-Relabel algorithm and is denoted as the Relaxed Incremental Push-Relabel (RIPR) algorithm. To explain the RIPR algorithm we need two additional notations. For each node u € G, e(u) is defined as e(u) = J2weV f { w , u), which is the total amount of flow into node u. An integer valued auxiliary function h(u) is also defined for u G G, which will be explained in the algorithm. The algorithm is described as below: 1. Initialization phase: h(u), and f(u, v) are initialized as follows: h(u) - < —0 fo r u G V — {s} f(u, v) 4— 0 fo r u ^ s an d v ^ s h(s) <- \V\ f{s,u) 4 - csu fo r u e v f(u,s) < f(s,u ) for u ^ V e(u) 4 - T lwev f ( w ^u) f o r u e V 2. The initialization phase is executed only once (when the algorithm starts). After all the nodes complete the initialization phase, every node in the system except s and t execute the following two operations as long as e{u) > 0: (a) P ush(u, v): applies when e(u) > 0 and 3v 6 V s.t. cuv — f(u,v) > 0 and h(u) > h(v), d = m in ( e ( it ) , cuv - f(u , v)) 40 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. f(u, v) < — f (u, v) + d f(v,u) < f(u,v ) eiu) ^ Y , w€v f ( w’u) e(v) ^ Y , wevf(w’v) (b) Relabel(u): applies when e(u) > 0 and h(u) < h(v) for all v G {v\cuv — f(u,v) > 0}, h(u) G- min„e{t)|C iti)_/ (u^)>0} h(v) + 1 3. Whenever the capacity of some edge edge, say (u, v), changes from changes to c^ , the following Adaptation (u , v) operation is executed: (a) if > c, b and /(u, u) < c ^ , do nothing. 
(b) if $c'_{uv} > c_{uv}$ and $f(u, v) = c_{uv}$, then
$h(s) \leftarrow h(s) + 2|V|$
$f(s, w) \leftarrow c_{sw}$ for $w \in V$
$f(w, s) \leftarrow -f(s, w)$ for $w \in V$
$e(w) \leftarrow \sum_{x \in V} f(x, w)$ for $w \in V$
(c) if $c'_{uv} < c_{uv}$ and $f(u, v) \le c'_{uv}$, do nothing.
(d) if $c'_{uv} < c_{uv}$ and $f(u, v) > c'_{uv}$, then
$h(s) \leftarrow h(s) + 2|V|$
$f(s, w) \leftarrow c_{sw}$ for $w \in V$
$f(w, s) \leftarrow -f(s, w)$ for $w \in V$
$f(u, v) \leftarrow c'_{uv}$
$f(v, u) \leftarrow -f(u, v)$
$e(w) \leftarrow \sum_{x \in V} f(x, w)$ for $w \in V$

In the algorithm, $e(u)$ and $h(u)$ are the local variables maintained by $u$. Only its immediate neighbor nodes will query the value of $h(u)$ (to determine if a ‘push’ or ‘relabel’ can be executed, $u$'s neighbor nodes need to know $h(u)$). The ‘push’ and ‘relabel’ operations only change the local variables maintained by $u$ ($e(u)$, $h(u)$, and $f(u, v)$ where $v$ is a neighbor of $u$). Note that $f(u, v)$ is actually shared between both $u$ and $v$. Maintaining a consistent image of a shared variable is another research topic with quite a few research results available (e.g. certain consistency protocols have been designed in []). In summary, both ‘push’ and ‘relabel’ operations are distributed.

When the ‘adaptation’ operation is performed due to the capacity change of edge $(u, v)$, the algorithm changes $f(u, v)$, which is local to $u$ and $v$. The algorithm also increases $h(s)$ by $2|V|$ and sets the flow out of $s$ to the edge capacities, regardless of the new capacity of $(u, v)$. Notifying $s$ about the capacity change is indeed not a local operation. However, since all the tasks initially reside on the root node in our task allocation problem, it is reasonable to assume that every node can send a message to the root node. The RIPR algorithm assumes that $s$ knows $|V|$, the total number of nodes in the system, which is the only global information that the RIPR algorithm needs to know.
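The three phases above (initialization, push/relabel, adaptation) can be sketched in a few lines of Python. This is only a sequential, single-process illustration and not the distributed execution described in the text: it assumes a dense capacity matrix `c` with `c[u][v] = 0` for non-edges, integer capacities, and a changed edge that does not end at `s`. The names `ripr_init`, `ripr_run`, `ripr_adapt`, `excess` and `throughput` are hypothetical helpers introduced for this sketch.

```python
def excess(state, u):
    """e(u): total flow into node u (f is skew-symmetric, so this is net inflow)."""
    f = state["f"]
    return sum(f[w][u] for w in range(len(f)))


def ripr_init(c, s):
    """Initialization phase: h(s) = |V| and every edge leaving s is saturated."""
    n = len(c)
    f = [[0] * n for _ in range(n)]
    h = [0] * n
    h[s] = n
    for u in range(n):
        if u != s:
            f[s][u] = c[s][u]
            f[u][s] = -f[s][u]
    return {"c": [row[:] for row in c], "f": f, "h": h, "s": s}


def ripr_run(state, t):
    """Apply Push/Relabel until no node other than s and t has e(u) > 0."""
    c, f, h, s = state["c"], state["f"], state["h"], state["s"]
    n = len(c)
    active = True
    while active:
        active = False
        for u in range(n):
            e_u = excess(state, u)
            if u == s or u == t or e_u <= 0:
                continue
            active = True
            residual = [v for v in range(n) if c[u][v] - f[u][v] > 0]
            lower = [v for v in residual if h[u] > h[v]]
            if lower:  # Push(u, v)
                v = lower[0]
                d = min(e_u, c[u][v] - f[u][v])
                f[u][v] += d
                f[v][u] = -f[u][v]
            else:  # Relabel(u): no residual neighbor is lower than u
                h[u] = min(h[v] for v in residual) + 1


def ripr_adapt(state, u, v, new_cap):
    """Adaptation(u, v): cases (a)-(d) for a capacity change on edge (u, v)."""
    c, f, h, s = state["c"], state["f"], state["h"], state["s"]
    n = len(c)
    old_cap = c[u][v]
    grew_while_saturated = new_cap > old_cap and f[u][v] == old_cap  # case (b)
    flow_exceeds_new_cap = new_cap < old_cap and f[u][v] > new_cap   # case (d)
    c[u][v] = new_cap
    if not (grew_while_saturated or flow_exceeds_new_cap):
        return  # cases (a) and (c): do nothing
    h[s] += 2 * n
    for w in range(n):
        if w != s:
            f[s][w] = c[s][w]
            f[w][s] = -f[s][w]
    if flow_exceeds_new_cap:
        f[u][v] = new_cap
        f[v][u] = -new_cap


def throughput(state):
    """Total flow out of the root s."""
    s = state["s"]
    return sum(state["f"][s])


# Path s -> a -> t (nodes 0, 1, 2) with capacities 3 and 2.
state = ripr_init([[0, 3, 0], [0, 0, 2], [0, 0, 0]], s=0)
ripr_run(state, t=2)
print(throughput(state))  # 2
ripr_adapt(state, 1, 2, 5)  # capacity of (a, t) grows from 2 to 5
ripr_run(state, t=2)
print(throughput(state))  # 3
```

Note that the second run starts from the flow left by the first run rather than from a fresh initialization, which is the incremental behavior the algorithm is designed for.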
Although nodes in the system execute the ‘push’ and ‘relabel’ operations in an asynchronous fashion, we assume that each individual operation is atomic, meaning that a node $u$ cannot read the intermediate values of $h(v)$ of its neighbor $v$ while $v$ is updating $h(v)$. The same rule applies to other local variables maintained by $v$. The ‘adaptation$(u, v)$’ operation actually consists of two parts: (1) update the flow on edge $(u, v)$ if necessary; (2) $s$ increases $h(s)$ and the flow going out of $s$. We assume that both part 1 and part 2 are atomic operations.

Note that the execution of ‘push’ and ‘relabel’ operations starts immediately after the initialization phase and continues at each node $u$ as long as $e(u) > 0$. ‘Push’ and ‘relabel’ at one node may enable ‘push’ and ‘relabel’ to be executed at other nodes. The ‘adaptation’ operation increases the value of $h(s)$ and may increase some $e(u)$ to a positive value, which will also enable new ‘push’ and ‘relabel’ operations to be executed. The algorithm will execute continuously as long as edge capacities keep changing. Assuming that edge capacities eventually stop changing, the following theorem shows that eventually no push and relabel operations will need to be executed, at which point the RIPR algorithm has found the maximum flow.

Theorem 3. Given graph $G(V, E)$ with root $s$ and sink $t$, and assuming that no capacity changes occur after the $n$th adaptation operation, the number of ‘push’ and ‘relabel’ operations executed by the RIPR algorithm is bounded from above by $O(n^2 \cdot |V|^2 \cdot |E|)$, where $|V|$ is the number of nodes and $|E|$ is the number of edges in the graph. Additionally, the RIPR algorithm finds the optimal solution to the relaxed flow problem when no ‘push’ or ‘relabel’ operation can be performed.

The proof of the theorem is presented in the Appendix.
The execution of the RIPR algorithm follows a ‘stimulation-response-stabilization’ pattern when the graph changes dynamically. After initialization, the nodes execute the ‘push’ and ‘relabel’ operations (distributively) until no further ‘push’ or ‘relabel’ can be applied. At this point, $f(u, v)$ determined by the nodes constitutes an optimal solution to the relaxed flow problem for the current status of the system. When some edge capacity then changes, ‘adaptation’ is invoked, which triggers new ‘push’ and ‘relabel’ operations. Again, when no more ‘push’ and ‘relabel’ operations can be applied, $f(u, v)$ determined by the nodes constitutes a new optimal solution to the relaxed flow problem for the then current status of the system. Edge capacity can change before the RIPR algorithm finds an optimal solution. However, the RIPR algorithm is guaranteed to find an optimal solution for the current status of the system as long as edge capacities do not change any more.

The RIPR algorithm differs from the original Push-Relabel algorithm in two aspects. First, it solves the relaxed flow maximization problem instead of the standard flow maximization problem. Second, when some edge capacities have changed, the RIPR algorithm starts from the current values of $f(u, v)$ and searches for the new optimal solution. Such an incremental optimization means that the algorithm does not need to be re-initialized when adapting to edge capacity changes. Consequently, no global controller is needed to monitor the initialization process of all the nodes.

3.5 On-line Task Allocation Protocol

The RIPR algorithm can be approximated as a simple protocol to actually allocate the tasks. As discussed in Section 3.3, a compute node maps to three nodes in the network flow representation.
Hence in the protocol each compute node needs to execute ‘push’, ‘relabel’, and ‘adaptation’ for the three ‘hypothetical’ nodes. For the discussion in the rest of this section, $u$ is used to refer to either a compute node or a node in the network flow representation; which one is meant is clear from the context.

In this protocol, a task buffer is maintained by each compute node. By limiting the size of the task buffers we can prevent any compute node from accumulating more tasks than it can compute. The buffers contain the source data of the tasks. Initially, the task buffer at the root node contains all the source data and all other task buffers are empty. Let $b(u)$ denote the length of the buffer at $u$. At any time instance, each compute node $u \in V$ operates as follows:

1. Contact the adjacent compute node(s) and perform the ‘push’, ‘relabel’, and ‘adaptation’ operations, if necessary. By performing the operations, $u$ can find the optimal rate $f(u, v)$ to transfer data to/from each neighbor $v$.

2. If $b(u) > 0$ and $u$ is not computing any task, then remove one task from the task buffer, set $b(u) \leftarrow b(u) - 1$, and compute the task.

3. While $b(u) > 0$ and $u$ is computing a task, send message ‘request to send’ to each node $v$ s.t. $f(u, v) > 0$. If ‘clean to send’ is received from $v$, then send a task to $v$ at rate $f(u, v)$ and set $b(u) \leftarrow b(u) - 1$. To maximize the system throughput, $u$ sends tasks at the rate determined by the RIPR algorithm, rather than utilizing the full capacity of its outgoing edges.

4. Upon receiving ‘request to send’, $u$ acknowledges ‘clean to send’ if $b(u) < U$. $u$ acknowledges a denial if $b(u) \ge U$. Here $U$ is a pre-set threshold that limits the maximum number of tasks a task buffer can hold.
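The buffer handling in steps 2 to 4 can be illustrated with a small Python sketch. It is a simplification under stated assumptions: time is ignored, so the rate $f(u, v)$ only decides whether a neighbor is offered tasks at all, and step 1 (running the RIPR operations) is replaced by a precomputed `flow` dictionary. The class `ComputeNode` and its method names are hypothetical and not part of the protocol description.

```python
from dataclasses import dataclass


@dataclass
class ComputeNode:
    name: str
    U: int              # threshold on the task buffer length
    b: int = 0          # current buffer length b(u)
    busy: bool = False  # True while the node is computing a task

    def on_request_to_send(self) -> bool:
        """Step 4: answer 'clean to send' (True) only while b(u) < U."""
        return self.b < self.U

    def start_task_if_idle(self) -> None:
        """Step 2: take one task from the buffer when not computing."""
        if self.b > 0 and not self.busy:
            self.b -= 1
            self.busy = True

    def offer_tasks(self, neighbors, flow):
        """Step 3: send one task to each neighbor v with f(u, v) > 0 that accepts."""
        sent = []
        for v in neighbors:
            if self.b == 0 or not self.busy:
                break
            if flow.get((self.name, v.name), 0) > 0 and v.on_request_to_send():
                self.b -= 1
                v.b += 1
                sent.append(v.name)
        return sent


root = ComputeNode("root", U=5, b=5)
child = ComputeNode("child", U=1)
other = ComputeNode("other", U=1)
flow = {("root", "child"): 0.5, ("root", "other"): 0.0}

root.start_task_if_idle()
print(root.offer_tasks([child, other], flow))  # ['child']
print(root.offer_tasks([child, other], flow))  # [] because child's buffer is full
```

With `U = 1` the child refuses further tasks until it has moved its buffered task into computation, which is how the threshold keeps a node from accumulating more work than it can process.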
In the above protocol, two types of data are transferred in the system: the control messages that are used by the RIPR algorithm, and the tasks themselves (or more precisely, the source data of the tasks). The control messages are exchanged among the compute nodes to query and update the values of $h(u)$ and $f(u, v)$. The ‘request to send’ and ‘clean to send’ messages are also control messages.

It should be pointed out that the flow rate control used in step 3 of the above protocol (data is sent at a specific rate) is applicable. First of all, the RIPR algorithm guarantees that $f(u, v)$ never exceeds the bandwidth constraints of $c_{uv}$ and the capability constraints of $c_u^{out}$ and $c_u^{in}$, even before the optimization is completed. (In case of a capacity change, $f(u, v)$ will be updated according to the new capacities as in the ‘adaptation’ step of the RIPR algorithm.) Furthermore, rate control is supported by

For example, when a computer cluster is connected through a local area high-speed network such as the Myrinet, usually one of the computers is chosen to be the interaction node, which accepts jobs from the user, distributes the computation to the other nodes, and serves as the gateway to the outside network. This forms a master/slave relation and therefore a single level tree. When multiple such clusters are connected to form a typical campus-wide distributed system, the tree topology is further expanded to multiple levels.

The throughput maximization problem is trivial for a tree structured system because the routing is fixed. A parent node simply pushes data to its children at the highest possible rate. More importantly, a tree structured system can be easily reduced to a single super-node, which represents the overall processing power of the whole tree.
The procedure for such a reduction is fairly easy to derive and hence omitted here.

The hierarchical system topology suggests that we only need to model the system at the back-bone level when evaluating its performance, not only because task allocation within one administrative organization is straightforward, but also because each administrative organization can be easily reduced to a single super-node. To reflect this consideration, instead of simulating hundreds or thousands of nodes, we limit the number of nodes in a system to 80 when conducting the simulations, although our protocol can be scaled to much larger systems.

In the simulations, a graph is represented by its adjacency matrix $A$ where each non-zero entry represents the bandwidth of the corresponding link $c_{uv}$. Initially, all entries in $A$ are set to 0. Then a randomly selected set of $a_{uv}$’s are assigned values that are uniformly distributed between 0 and 1. $c_u^{in}$ and $c_u^{out}$ are also uniformly distributed between 0 and 1. $w_u$ is uniformly distributed between 0 and $w_{max}$. Note that $1/w_{max}$ represents the average computation/communication ratio of a task. $w_{max} > 1$ represents a trivial scenario because the direct neighbors of the root node can consume, statistically, all the tasks flowing out of the root node, and hence there is no need for other nodes to join the computation. The actual value of $w_{max}$ depends on the application. For example, in SETI@home, it takes about 5 minutes to receive a task through a modem and about 10 hours to compute a task on a current model home computer [37]. In our simulations, we used $w_{max} = 0.1$ and $w_{max} = 0.05$, which represent an average computation/communication ratio of 10 and 20, respectively. The network latencies were ignored when simulating the transfer of tasks.
This is a reasonable approximation because the system operates in a pipelined fashion after it starts up, and the latencies are well hidden by such a pipeline unless the volume of data per task is small.

In order to evaluate the performance of the proposed task allocation protocol, we compare it against a greedy protocol, in which node $u$ sends a task request to a randomly chosen neighbor when the task buffer at $u$ becomes empty. When multiple requests are received simultaneously, the request from the neighbor with the highest compute power is processed first. This protocol represents the first-come-first-serve approach, where a compute node gets a task whenever it becomes free. Hence the more powerful a compute node is, the more tasks are assigned to it.

For this set of simulations, the parameters $c_{uv}$, $w_u$, $c_u^{in}$, and $c_u^{out}$ were assumed to be constant; threshold $U$, whose impact will be studied later, was temporarily set to 5. 1800 systems were simulated, 900 with $w_{max} = 0.05$ and 900 with $w_{max} = 0.1$; and there were 20 nodes in each of the systems. Initially, there were 2500 tasks on node $s$. The throughput of a system is calculated as $2500 / t_{all}$, where $t_{all}$ is the overall computation time of all tasks. The results in Figure 3.3 compare the performance of our task allocation protocol and the greedy protocol. In Figure 3.3, throughput of

In the second set of simulations, we study the impact of the buffer size on the performance of the system. To simplify the discussions, we assume that the task buffers at all the nodes have the same size $U$.

If the size of the task buffers is set to $U = \infty$, then a good indicator of storage requirement is the maximum length of task buffer that is actually consumed. We simulated 200 systems each having 20 nodes and another 200 systems each having 40 nodes. $w_{max}$ was set to 0.05. No changes occurred to the system during the simulations, hence no adaptations were activated.
For each system, there were 2500 tasks on root node $s$ initially and all the nodes in the system had infinite task buffers. We monitored the maximum buffer consumed by each individual compute node and the results are summarized in Figure 3.5 in the form of histograms. The histogram in Figure 3.5 (a) is computed over all the 20-node systems (200 such systems in total), and the histogram in Figure 3.5 (b) is computed over all the 40-node systems (200 such systems in total). The results in Figure 3.5 show that task buffers of size 4 can cover more than 99% of the cases. More interestingly, task buffers of size 1 are large enough for more than 80% of the cases.

While the results in Figure 3.5 examine the maximum length of the task buffer consumed, the following simulation results illustrate the impact on the system performance when the sizes of the task buffers are limited. We conducted simulations on the same 400 systems that were used to generate the results in Figure 3.5, but $U$ was no longer set to infinity. For each system, we conducted two separate simulations, one with $U = 5$ and another with $U = 1$. Again no changes to the systems occurred during the simulations. The results are shown in Figure 3.6, in the increasing order of the normalized throughput of scenarios with $U = 1$. We can see that $U = 5$ leads to a higher, and closer to optimal, throughput than $U = 1$. We have observed in the simulations that further increasing the value of $U$ did increase the system throughput. The benefit, however, becomes marginal as $U$ gets larger.

As claimed in Sections 3.4 and 3.5, adaptivity is an important property of our task allocation protocol. This is illustrated in the next set of simulations. The system consists of 20 nodes. The adjacency matrix was initialized as discussed above. $w_{max} = 0.1$.
However, during the course of the simulation, the network condition and the effective compute power of the nodes were altered at two time instances. At time instance $t = 4000$, the compute power of a selected set of compute nodes was reduced by 80%. Then at $t = 8000$, these compute nodes recovered their compute power, while at the same time, the compute power of another set of nodes was increased by 40% and the bandwidth of a selected set of links was increased by 30%.

In Figure 3.7, the optimal throughput was calculated offline. It indicates the maximum throughput that can be achieved by the system. The actual throughput for time instance $t$ was calculated as $(N(t + 75) - N(t - 75))/150$, where $N(\tau)$ is the number of tasks computed by the system from time 0 to time $\tau$. The size of this moving window, 150, was selected experimentally, as a trade-off between preserving the details and describing the trend. When $t < 75$, the actual throughput was calculated as $N(t)/t$.

As illustrated in Figure 3.7, our task allocation adapts to the changes in the system and approaches the optimal throughput during the course of the computation. We simulated $U = 1$ and $U = 5$ for the threshold $U$ on the buffer size. Our task allocation exhibits similar adaptivity for both values of $U$. When the system parameters changed at $t = 4000$ and $t = 8000$, the adaptation procedure was activated and the task allocation was adapted. As can be seen, the system operates at (close to) the new optimal throughput after the adaptation is completed. In Figure 3.7, at some time instances, the actual system throughput exceeds the optimum. This is because the size of the moving window is not wide enough. The impact of $U$ is similar to the static scenario: $U = 5$ leads to a higher, and closer to optimal, throughput than $U = 1$.
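The moving-window throughput estimate described above can be written as a short Python function. This is a sketch under the assumption that the simulator records a sorted list of task completion times; the name `windowed_throughput` and its parameters are hypothetical, and only the formula itself comes from the text.

```python
from bisect import bisect_right


def windowed_throughput(completion_times, t, window=150):
    """Throughput at time t > 0 over a centered moving window.

    completion_times must be sorted in increasing order; N(tau) is the
    number of tasks completed from time 0 up to and including tau.
    """
    half = window / 2

    def N(tau):
        return bisect_right(completion_times, tau)

    if t < half:
        # Too close to the start for a full window: use N(t) / t.
        return N(t) / t
    return (N(t + half) - N(t - half)) / window


# One task completed per unit of time, at times 1, 2, ..., 1000.
times = list(range(1, 1001))
print(windowed_throughput(times, 500))  # 1.0
print(windowed_throughput(times, 50))   # 1.0
```

Because the estimate averages over a finite window, a short burst of completions inside the window can push the value above the long-run optimum, which matches the overshoot noted for Figure 3.7.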
We have also observed that the benefit of further increasing the value of $U$, not surprisingly, becomes marginal as $U$ gets larger. These results show that our task allocation does not need a large task buffer to adapt.

We further simulated the impact of control message transfer cost. (As previously explained, the costs of updating $f(u, v)$, $h(u)$, and $e(u)$ on each node can be ignored since each update requires only a few simple arithmetic operations.) The results are shown in Figure 3.8. The system consists of 20 nodes. The adjacency matrix was initialized as before, with $w_{max} = 0.1$. At time $t = 4000$, the compute power of a selected set of nodes was increased by 25%. When calculating the actual throughput, the width of the moving window was experimentally set at 250. Let CPCM denote the cost (transfer time) per control message. We compared two scenarios: CPCM = 0.01 (units of time) and CPCM = 0.2. We use CPCM = 0.01 to simulate a scenario depicting a fast network and CPCM = 0.2 to simulate a scenario depicting a network with high latencies. The two values of CPCM were selected by considering the fact that the $c_{uv}$’s are uniformly distributed between 0 and 1, hence the average cost to transfer a task between two nodes is 2 units of time. In this experiment, the longest simple path leaving the root node consists of 3 hops. When CPCM = 0.01, the round trip time to transfer a control message along this path is 0.06 units of time. When CPCM = 0.2, the round trip time is 1.2 units of time.

There are two different optimal allocations for the system, one before $t = 4000$, the other after $t = 4000$. In the simulations, we observed that when CPCM = 0.01, it took 19 units of time for the system to find the first optimal allocation, and 31 units of time to adapt to the new optimal allocation when the parameter changed at $t = 4000$.
If CPCM = 0.2, it took the system 87 and 124 units of time to find the two optimal task allocations, respectively. We also observed that the search times (to find the optimal allocation) are independent of the value of $U$. Although the cost per control message does affect the search time, the search time is insignificant considering the fact that the computation lasts much longer. Hence, the impact of transferring the control message on the system performance is negligible, as can be seen from Figure 3.8.

Another observation from Figure 3.8 is that the transition time, starting at $t = 4000$, for the system to reach the new optimal throughput is much larger than the searching time for the new optimal task allocation. This is because the task buffers need to be filled and/or emptied for the system to operate on the new optimal allocation.

We study store-and-gather problems where data are locally stored at the sensors before the data gathering starts, and continuous sensing and gathering problems that model time critical applications. We show that these problems reduce to maximization of network flow under vertex capacity constraint. This flow problem in turn reduces to a standard network flow problem, which means that the algorithm developed in Chapter 3 can be applied. This algorithm leads to a simple protocol that coordinates the sensor nodes in the system. This approach provides a unified framework to study a variety of data gathering problems in networked sensor systems. The performance of the proposed method is illustrated through simulations.

4.1 Introduction

State-of-the-art sensors (e.g. Smart Dust [66]) are powered by batteries.
Replenishing energy by replacing the batteries is infeasible since the sensors are typically deployed in harsh terrains. The sensors, which are usually unattended, need to operate over a long period of time after deployment. Energy efficiency is thus critical. Techniques ranging from low power hardware design [3, 52] and energy aware routing [31, 59] to application level optimizations [58, 69] have been proposed to improve energy efficiency of networked sensor systems.

An important application of networked sensor systems is to monitor the environment. Examples of such applications include vehicle tracking and classification in the battle field, patient health monitoring, pollution detection, etc. In these applications, a fundamental operation is to sense the environment and transmit the sensed data to the base station for further processing. In this chapter, we study energy efficient data gathering in networked sensor systems from an algorithmic perspective.

Compared with sensing and computation, communication is the most expensive operation (in terms of energy consumption). Generally, data transfers are performed via multi-hop communications where each hop is a short-range communication. This is due to the well known fact that long-distance wireless communication is expensive in terms of both implementation complexity and energy dissipation, especially when using the low-lying antennae and near-ground channels typically found in networked sensor systems. Short-range communication also enables efficient spatial frequency re-use. A challenging problem with multi-hop communications is the efficient transfer of data through the system when the sensors have energy constraints.

Some variations of the problem have been studied recently.
In [35], data gathering is assumed to be performed in rounds and each sensor can communicate (in a single hop) with the base station and all other sensors. The total number of rounds is then maximized under a given energy constraint on the sensors. In [50], a non-linear programming formulation is proposed to explore the trade-offs between energy consumed and the transmission rate. It models the radio transmission energy according to Shannon’s theorem. In [54], the data gathering problem is formulated as a linear programming problem and a $1 + \omega$ approximation algorithm is proposed. This algorithm further leads to a distributed heuristic.

This study departs from the above with respect to the problem definition as well as the solution technique. For short-range communications, the difference in the energy consumption between sending and receiving a data packet is almost negligible. We adopt the reasonable approximation that sending a data packet consumes the same amount of energy as receiving a data packet [2]. The studies in [50] and [54] differentiate the energy dissipated for sending and receiving data. Although the resulting problem formulations are indeed more accurate than ours, the improvement in accuracy is marginal for short-range communications.

In [35], each sensor generates exactly one data packet per round (a round corresponds to the occurrence of an event in the environment) to be transmitted to the base station. The system is assumed to be fully connected. The study in [35] also considers a very simple model of data aggregation where any sensor can aggregate all the received data packets into a single output data packet. In our system model, each sensor communicates with a limited number of neighbors due to the short range of the communications, resulting in a general graph topology for the system.
We study store-and-gather problems where data are locally stored on the sensors before the data gathering starts, and continuous sensing and gathering problems that model time critical applications. A unified flow optimization formulation is developed for the two classes of problems.

The focus in this chapter is to maximize the throughput or volume of data received by the base station. Such an optimization objective is abstracted from a wide range of applications in which the base station needs to gather as much information as possible. Some applications proposed for the networked sensor systems may have different optimization objectives. For example, the balanced data transfer problem [24] is formulated as a linear programming problem where a ‘minimum achieved sense rate’ is set for every individual node. In [23], data gathering is considered in the context of energy balance. A distributed protocol is designed to ensure that the average energy dissipation per node is the same throughout the execution of the protocol. However, these issues are not the focus of this dissertation.

By modeling the energy consumption associated with each send and receive operation, we formulate the data gathering problem as a constrained network flow optimization problem where each node $u$ is associated with a capacity constraint $w_u$, so that the total amount of flow going through $u$ (incoming plus outgoing flow) does not exceed $w_u$. We show that such a formulation models a variety of data gathering problems (with energy constraint on the sensor nodes).

The constrained flow problem reduces to the standard network flow problem, which means that the algorithm developed in Chapter 3 can be applied. The algorithm can be used to solve both store-and-gather problems and continuous sensing and gathering problems.
For the continuous sensing and gathering problems, we develop a simple distributed protocol based on the algorithm. The performance of this protocol is studied through simulations. Because the store-and-gather problems are by nature off-line problems, we do not develop a distributed protocol for this class of problems.

This chapter is organized as follows. The data gathering problems are discussed in Section 4.2. We show that these problems reduce to a network flow problem with constraints on the vertices. In Section 4.3, we develop a mathematical formulation of the constrained network flow problem and show that it reduces to a standard network flow problem. A simple protocol based on the RIPR algorithm is presented in Section 4.4. Experimental results are presented in Section 4.5.

4.2 Data Gathering with Energy Constraints

4.2.1 System Model

Suppose a network of sensors is deployed over a region. The locations of the sensors are fixed and known a priori. The system is represented by a graph $G(V, E)$, where $V$ is the set of sensor nodes. $(u, v) \in E$ if $u \in V$, $v \in V$ and $u$ is within the communication range of $v$. The set of successors of $u$ is denoted as $\sigma_u = \{v \in V \mid (u, v) \in E\}$. Similarly, the set of predecessors of $u$ is denoted as $\psi_u = \{v \in V \mid (v, u) \in E\}$. The event is sensed by a subset of sensors $V_c \subseteq V$. $r$ is the base station to which the sensed data are transmitted. Sensors in $V - V_c - \{r\}$ do not sense the event but can relay the data sensed by $V_c$.

Among the three categories (sensing, communication, and data processing) of power consumption, a sensor node typically spends most of its energy in data communication. This includes both data transmission and reception. Our energy model for the sensors is based on the first order radio model described in [32].
The energy consumed by sensor $u$ to transmit a $k$-bit data packet to sensor $v$ is $T_{uv} = \varepsilon_{elec} \times k + \varepsilon_{amp} \times d_{uv}^2 \times k$, where $\varepsilon_{elec}$ is the energy required for transceiver circuitry to process one bit of data, $\varepsilon_{amp}$ is the energy required per bit of data for the transmitter amplifier, and $d_{uv}$ is the distance between $u$ and $v$. The transmitter amplifier is not needed by $u$ to receive data and the energy consumed by $u$ to receive a $k$-bit data packet is $R_u = \varepsilon_{elec} \times k$. Typically, $\varepsilon_{elec} = 50$ nJ/bit and $\varepsilon_{amp} = 0.1$ nJ/bit/m$^2$. This effectively translates to $\varepsilon_{amp} \times d_{uv}^2 \ll \varepsilon_{elec}$, especially when short transmission ranges (~1 m) are considered. For the discussion in the rest of this chapter, we adopt the approximation that $T_{uv} = R_u$ for $(u, v) \in E$. We further assume that no data aggregation is performed during the transmission of the data.

Communication link $(u, v)$ has transmission bandwidth $c_{uv}$. We do not require the communication links to be identical. Two communication links may have different transmission latencies and/or bandwidth. Symmetry is not required either. It may be the case that $c_{uv} \ne c_{vu}$. If $(u, v) \notin E$, then we define $c_{uv} = 0$.

An energy budget $B_u$ is imposed on each sensor node $u$. We assume that there is no energy constraint on base station $r$. To simplify our discussions, we ignore the energy consumption of the sensors when sensing the environment. However, the rate at which sensor $u \in V_c$ can collect data from the environment is limited by the maximum sensing capability $g_u$.

We consider both store-and-gather problems and continuous sensing and gathering problems. For the store-and-gather problems, $B_u$ represents the total number of data packets that $u$ can send and receive. For the continuous sensing and gathering problems, $B_u$ represents the total number of data packets that $u$ can send and receive in one unit of time.
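To see how close $T_{uv}$ and $R_u$ are under the first order radio model, the two formulas above can be evaluated directly. The following Python sketch uses the typical constants quoted in the text; the function names and the 1000-bit packet size are assumptions made for illustration only.

```python
E_ELEC_NJ_PER_BIT = 50.0    # transceiver electronics energy, nJ/bit
E_AMP_NJ_PER_BIT_M2 = 0.1   # transmitter amplifier energy, nJ/bit/m^2


def tx_energy_nj(k_bits, d_m):
    """T_uv: energy in nJ to transmit a k-bit packet over distance d (meters)."""
    return E_ELEC_NJ_PER_BIT * k_bits + E_AMP_NJ_PER_BIT_M2 * d_m ** 2 * k_bits


def rx_energy_nj(k_bits):
    """R_u: energy in nJ to receive a k-bit packet."""
    return E_ELEC_NJ_PER_BIT * k_bits


k = 1000  # hypothetical packet size in bits
for d in (1.0, 10.0):
    ratio = tx_energy_nj(k, d) / rx_energy_nj(k)
    print(f"d = {d:4.1f} m: T/R = {ratio:.3f}")
```

At 1 m the amplifier term adds about 0.2% to the transmit energy, which supports the approximation $T_{uv} = R_u$; at 10 m the same model gives a 20% difference, so the approximation should be read as specific to short-range links.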
4.2.2 Store-and-Gather Problems

In store-and-gather problems, the information from the environment is sensed (possibly over a long time period) and stored locally at the sensors. The data is then transferred to the base station during the data gathering stage.

This is because $d_s$ is an obvious upper bound for the SMaxDV problem, and can be handled trivially.

$|V_c| > 1$ represents the general scenario where the event is sensed by multiple sensors. This multi-source data gathering problem is formulated as follows:

Multiple Source Maximum Data Volume (MMaxDV) Problem:
Given: A graph $G(V, E)$. The set of source nodes $V_c \subseteq V$ and sink $r \in V$. Each node $u \in V - \{r\}$ has energy budget $B_u$. Each node $v \in V_c$ has $d_v$ data packets that are locally stored before the data gathering starts.
Find: A real valued function $f : E \to R$
Maximize: $\sum_{v \in \psi_r} f(v, r)$
Subject to:
$f(u, v) \ge 0$ for $\forall (u, v) \in E$ (1)
$\sum_{v \in \sigma_u} f(u, v) + \sum_{v \in \psi_u} f(v, u) \le B_u$ for $u \in V - \{r\}$ (2)
$\sum_{v \in \sigma_u} f(u, v) = \sum_{v \in \psi_u} f(v, u)$ for $u \in V - V_c - \{r\}$ (3)
$\sum_{v \in \sigma_u} f(u, v) \le \sum_{v \in \psi_u} f(v, u) + d_u$ for $u \in V_c$ (4)

Similar to the SMaxDV problem, the net flow out of the intermediate nodes ($V - V_c - \{r\}$) is 0 in the MMaxDV problem, as is specified in Condition 3. For each source node $u \in V_c$, the net flow out of $u$ cannot exceed the number of data packets previously stored at $u$. This is specified in Condition 4.

4.2.3 Continuous Sensing and Gathering Problems

The continuous sensing and gathering problems model those time critical applications that need to gather as much information as possible from the environment while the nodes are sensing. Examples of such applications include battle field surveillance, target tracking, etc. We want to maximize the total number of data packets that can be gathered by the base station $r$ in one unit of time.
We assume that the communications are scheduled by time/frequency division multiplexing or channel assignment techniques. We consider the scenario in which $B_u$ is the maximum power consumption rate allowed by $u$. Let $f(u, v)$ denote the number of data packets sent from $u$ to $v$ in one unit of time.
Similar to the store-and-gather problem, we have the following mathematical formulation when $V_c$ contains a single node $s$.
Single Source Maximum Data Throughput (SMaxDT) Problem:
Given: A graph $G(V, E)$. Source $s \in V$ and sink $r \in V$. Each node $u \in V - \{r\}$ has energy budget $B_u$. Each edge $(u, v) \in E$ has capacity $c_{uv}$.
Find: A real valued function $f : E \to \mathbb{R}$
Maximize: $\sum_{v \in \sigma_s} f(s, v)$
Subject to:
(1) $0 \le f(u, v) \le c_{uv}$ for $\forall (u, v) \in E$
(2) $\sum_{v \in \sigma_u} f(u, v) + \sum_{v \in \psi_u} f(v, u) \le B_u$ for $u \in V - \{r\}$
(3) $\sum_{v \in \sigma_u} f(u, v) = \sum_{v \in \psi_u} f(v, u)$ for $u \in V - \{s, r\}$
The major difference between the SMaxDV and the SMaxDT problem is the consideration of link capacities. In the SMaxDV problem, since there is no deadline for the data gathering, the primary factor that affects the maximum number of gathered data is the energy budgets of the sensors. But for the SMaxDT problem, the number of data packets that can be transferred over a link in one unit of time is not only affected by the energy budget, but also bounded from above by the capacity of that link, as is specified in Condition 1 above. For the SMaxDT problem, we do not model the impact of $g_u$ because $g_u$ is an obvious upper bound of the throughput and can be handled trivially.
Similarly, we can formulate the multiple source maximum data throughput problem as follows.
Multiple Source Maximum Data Throughput (MMaxDT) Problem:
Given: A graph $G(V, E)$. The set of source nodes $V_c \subset V$ and sink $r \in V$. Each node $u \in V - \{r\}$ has energy budget $B_u$.
Each edge $(u, v) \in E$ has capacity $c_{uv}$.
Find: A real valued function $f : E \to \mathbb{R}$
Maximize: $\sum_{v \in \psi_r} f(v, r)$
Subject to:
(1) $0 \le f(u, v) \le c_{uv}$ for $\forall (u, v) \in E$
(2) $\sum_{v \in \sigma_u} f(u, v) + \sum_{v \in \psi_u} f(v, u) \le B_u$ for $u \in V - \{r\}$
(3) $\sum_{v \in \sigma_u} f(u, v) = \sum_{v \in \psi_u} f(v, u)$ for $u \in V - V_c - \{r\}$
(4) $\sum_{v \in \sigma_u} f(u, v) \le \sum_{v \in \psi_u} f(v, u) + g_u$ for $u \in V_c$
Condition 4 in the above problem formulation takes into account the sensing capabilities of the sensors.
4.3 Flow Maximization with Constraint on Vertices
4.3.1 Problem Reductions
In this section, we present the formulation of the constrained flow maximization problem where the vertices have limited capacities (CFM problem). The CFM problem is an abstraction of the four problems discussed in Section 4.2.
In the CFM problem, we are given a directed graph $G(V, E)$ with vertex set $V$ and edge set $E$. Vertex $u$ has capacity constraint $w_u > 0$. Edge $(u, v)$ starts from vertex $u$, ends at vertex $v$, and has capacity constraint $c_{uv} > 0$. If $(u, v) \notin E$, we define $c_{uv} = 0$. We distinguish two vertices in $G$, source $s$ and sink $r$. A flow in $G$ is a real valued function $f : E \to \mathbb{R}$ that satisfies the following constraints:
1. $0 \le f(u, v) \le c_{uv}$ for $\forall (u, v) \in E$. This is the capacity constraint on edge $(u, v)$.
2. $\sum_{v \in \sigma_u} f(u, v) = \sum_{v \in \psi_u} f(v, u)$ for $\forall u \in V - \{s, r\}$. This represents the flow conservation. The net amount of flow that goes through any of the vertices, except $s$ and $r$, is zero.
3. $\sum_{v \in \sigma_u} f(u, v) + \sum_{v \in \psi_u} f(v, u) \le w_u$ for $\forall u \in V$. This is the capacity constraint of vertex $u$. The total amount of flow going through $u$ cannot exceed $w_u$. This condition differentiates the CFM problem from the standard network flow problem.
The value of a flow $f$, denoted as $|f|$, is defined as $|f| = \sum_{v \in \sigma_s} f(s, v)$, which is the net flow that leaves $s$.
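The three CFM constraints can be checked mechanically for a candidate flow. The sketch below is illustrative only: the dictionary-based representation and the names `is_feasible_cfm_flow` and `flow_value` are assumptions made for this example, not part of the original formulation. It expects `weight` to contain an entry for every vertex that appears in `flow`.

```python
def is_feasible_cfm_flow(flow, cap, weight, s, r, tol=1e-9):
    """Check a candidate flow against the three CFM constraints.

    flow, cap: dicts mapping (u, v) -> value; weight: dict mapping u -> w_u.
    """
    out_sum = {u: 0.0 for u in weight}
    in_sum = {u: 0.0 for u in weight}
    for (u, v), f in flow.items():
        # Constraint 1: 0 <= f(u, v) <= c_uv (edges absent from cap have capacity 0).
        if f < -tol or f > cap.get((u, v), 0.0) + tol:
            return False
        out_sum[u] += f
        in_sum[v] += f
    for u in weight:
        # Constraint 2: flow conservation at every vertex except s and r.
        if u not in (s, r) and abs(out_sum[u] - in_sum[u]) > tol:
            return False
        # Constraint 3: outgoing plus incoming flow must not exceed w_u.
        if out_sum[u] + in_sum[u] > weight[u] + tol:
            return False
    return True


def flow_value(flow, s):
    """|f|: total flow on the edges leaving the source s."""
    return sum(f for (u, _), f in flow.items() if u == s)


if __name__ == "__main__":
    cap = {("s", "a"): 5, ("a", "r"): 5}
    weight = {"s": 10, "a": 6, "r": 10}
    ok = {("s", "a"): 3, ("a", "r"): 3}
    too_much = {("s", "a"): 4, ("a", "r"): 4}  # within edge capacity, but 4 + 4 > w_a
    print(is_feasible_cfm_flow(ok, cap, weight, "s", "r"), flow_value(ok, "s"))
    print(is_feasible_cfm_flow(too_much, cap, weight, "s", "r"))
```

The second flow in the usage example respects every edge capacity and conservation, yet is rejected by the vertex constraint, which is exactly the condition that separates CFM from the standard network flow problem.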
In the CFM problem, we are given a graph with vertex and edge constraints, a source $s$, and a sink $r$, and we wish to find a flow with the maximum value.
Then the question becomes: how many data blocks should be processed locally? If the data blocks are to be relayed to other nodes for processing, then to which nodes should the data be relayed? The objective of the TMIP problem is to answer these questions for each sensor such that the overall system throughput is maximized.
We can see that the system throughput in TMIP is the sum of $S_c$'s processing capabilities and the rate with which data blocks flow out of $S_c$. After the data blocks flow out of $S_c$, they will be transferred in the system and finally be consumed (processed) by some nodes. If we model these data consumptions as a special type of data flow to a hypothetical node, then the throughput of the system is solely defined by the rate with which data blocks flow out of $S_c$. Given a TMIP problem with $G(V, E)$ as the input graph and $S_c \subset V$ as the source nodes, it is transformed to a standard network flow maximization problem in a new graph $G'$ using the following procedure:
Procedure 1:
1. For each node $u \in V$, create a node $u'$ in $G'$. Add a pseudo source $s'$ and a pseudo sink $t'$ to $G'$.
2. For each link $(u, v) \in E$, add a link $(u', v')$ to $G'$ with $c_{u'v'} = c_{uv}$.
3. For each node $u \in V - S_c$, add a link $(u', t')$ to $G'$ with $c_{u't'} = w_u$.
4. For each node $u \in S_c$, add a link $(s', u')$ to $G'$ with $c_{s'u'} = d_u - w_u$.
We have the following flow optimization problem based on the above procedure. (To simplify the notations, we have omitted the superscripts of the vertices.)
Problem 1:
Given: Graph $G(V, E)$, source $s \in V$ and sink $t \in V$. Edge $(u, v) \in E$ has capacity $c_{uv}$.
Maximize: $\sum_{v \in V} f(s, v)$
Subject to:
(1) $\sum_{v \in V} f(u, v) = 0$ for $u \in V - \{s, t\}$
(2) $f(u, v) \le c_{uv}$ for $\forall u \in V, v \in V$
(3) $f(u, v) = -f(v, u)$ for $\forall u \in V, v \in V$
If an instance of the TMIP Problem has $G$ as the input graph and $S_c$ as the source nodes, we denote it as TMIP($G$, $S_c$). If an instance of Problem 1 has $G$ as the input graph, $s$ as the source, and $t$ as the sink, we denote it as Problem 1 ($G$, $s$, $t$). We use $W_T(G, S_c)$ to represent the maximum throughput for TMIP($G$, $S_c$). We use $W_1(G, s, t)$ to represent the maximum throughput for Problem 1 ($G$, $s$, $t$). The next proposition shows that the TMIP Problem is a special case of Problem 1.
Proposition 1: Suppose TMIP($G$, $S_c$) is converted to Problem 1 ($G'$, $s'$, $t'$) using Procedure 1, then
$W_T(G, S_c) = W_1(G', s', t')$
Figure 5.1: Reduction of TMIP to a network flow problem. (a) A networked sensor system. (b) The corresponding network flow representation. Sensor nodes are denoted by circles. The square in (a) denotes the event of interest. Dotted lines in (a) represent the collection of data from the environment. The upper square in (b) denotes the newly added pseudo source $s'$. The lower square in (b) denotes the pseudo sink $t'$. Weights of the nodes and links are omitted in this figure.
Proof: We use the notation in Procedure 1 to denote the nodes and edges in $G$ and their corresponding nodes and edges in $G'$. Suppose $f : V \times V \to \mathbb{R}$ is a feasible solution for TMIP($G$, $S_c$). We map it to a feasible solution $f' : V' \times V' \to \mathbb{R}$ for Problem 1 ($G'$, $s'$, $t'$) as follows:
1. Initialize $f'(u', v') = 0$ for $\forall u', v' \in V'$.
2. If $f(u, v) > 0$, then set $f'(u', v') = f(u, v)$.
3. For $u \in V$, if $\sum_{v \in V} f(v, u) > 0$, then set $f'(u', t') = \sum_{v \in V} f(v, u)$.
4. For $u \in S_c$, if $\sum_{v \in V} f(u, v) > 0$, then set $f'(s', u') = \sum_{v \in V} f(u, v)$.
It is easy to verify that such an $f'$ is a feasible solution for Problem 1 ($G'$, $s'$, $t'$) and that $f'$ leads to the same throughput as $f$.
Suppose $f' : V' \times V' \to \mathbb{R}$ is a feasible solution for Problem 1 ($G'$, $s'$, $t'$). We map it to a feasible solution $f : V \times V \to \mathbb{R}$ for TMIP($G$, $S_c$) simply as follows: for $\forall u, v \in V$, set $f(u, v) = f'(u', v')$. It is also easy to verify that such an $f$ is a feasible solution for TMIP($G$, $S_c$) and that it has the same throughput as $f'$. □
Figure 5.1 illustrates a networked sensor system and the corresponding network flow representation after applying Procedure 1.
Note that $w_u$ and $c_{uv}$ in the TMIP problem represent the processing capability and communication bandwidth of the sensors. The actual value of $w_u$ is determined by various factors such as the clock frequency, the supply voltage, the specific design of the circuitry, and the complexity of the algorithms for processing. The actual value of $c_{uv}$ is also determined by multiple factors such as the radio transmission power, the rate of signal decay, the distance between the sender and the receiver, etc. Since energy efficiency is a key consideration of networked sensor systems, trade-offs between computation/communication speed and energy have been explored extensively. For example, dynamic voltage scaling and frequency scaling techniques save energy by reducing the supply voltage and clock frequency of the sensor nodes, at the cost of slower processing speeds. Modulation scaling reduces the radio transmission power, at the cost of a lower data communication rate. Additionally, these scaling techniques can be activated on-the-fly based on the workload and remaining energy of the sensors. The fact that $w_u$ and $c_{uv}$ are under continuous real-time adjustment translates to run-time variations of the link capacities in Problem 1.
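The reduction in Procedure 1 can be sketched in a few lines. The example below builds $G'$ from a small TMIP instance and then computes a maximum $s'$-$t'$ flow. Note the substitution: the maximum flow is computed here with a textbook Edmonds-Karp routine, a centralized algorithm chosen only for brevity, and not with the distributed RIPR algorithm of this dissertation. The helper names `build_flow_graph` and `max_flow`, the node encoding, and the example instance are all hypothetical, and the sketch assumes $d_u \ge w_u$ at every source.

```python
from collections import defaultdict, deque


def build_flow_graph(nodes, cap, w, d, sources):
    """Procedure 1: map a TMIP instance to edge capacities of G'."""
    g = defaultdict(float)
    for (u, v), c in cap.items():
        g[(("n", u), ("n", v))] += c            # step 2: copy each link
    for u in nodes:
        if u in sources:
            g[("s'", ("n", u))] += d[u] - w[u]  # step 4: pseudo source link
        else:
            g[(("n", u), "t'")] += w[u]         # step 3: pseudo sink link
    return dict(g)


def max_flow(cap, s, t):
    """Edmonds-Karp maximum flow (stand-in for RIPR, see lead-in)."""
    residual = defaultdict(float)
    adj = defaultdict(set)
    for (u, v), c in cap.items():
        residual[(u, v)] += c
        adj[u].add(v)
        adj[v].add(u)  # reverse direction is needed for residual edges
    total = 0.0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent and residual[(u, v)] > 1e-12:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return total
        # Find the bottleneck capacity along the path, then augment.
        bottleneck = float("inf")
        v = t
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[(parent[v], v)])
            v = parent[v]
        v = t
        while parent[v] is not None:
            u = parent[v]
            residual[(u, v)] -= bottleneck
            residual[(v, u)] += bottleneck
            v = u
        total += bottleneck


if __name__ == "__main__":
    nodes = ["a", "b", "c"]
    cap = {("a", "b"): 4, ("a", "c"): 3, ("b", "c"): 2}
    w = {"a": 1, "b": 3, "c": 5}   # processing capabilities
    d = {"a": 10}                  # sensing rate of the single source
    g_prime = build_flow_graph(nodes, cap, w, d, sources={"a"})
    print(max_flow(g_prime, "s'", "t'"))
```

In this instance the links leaving node `a` form the bottleneck (4 + 3 = 7), so the maximum flow in $G'$ is 7 even though the pseudo source link could carry 9.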
5.3 On-line Protocol for In-network Processing
The RIPR algorithm developed in Chapter 3 can be used to implement an on-line protocol for in-network processing. In this protocol, each node maintains a data buffer. Initially, all the data buffers are empty. Buffers at the source nodes $u \in S_c$ are filled at rate $d_u$. Let $\beta_u$ denote the length of the used buffer at $u$. At any time instance, each node $u \in V$ operates as follows:
1. Contact the adjacent node(s) and execute the RIPR algorithm. (Reduction from the TMIP problem to the relaxed flow maximization problem is straightforward and omitted here.)
2. If $\beta_u > 0$ and $u$ is not processing any data blocks, remove one data block from the data buffer and process it.
3. While $\beta_u > 0$ and $u$ is processing a data block, send message 'request to send' to $\forall v \in \sigma_u$ if $f(u, v) > 0$. If 'clear to send' is received from $v$, then set $\beta_u \leftarrow \beta_u - 1$ and send a data block to $v$.
4. Upon receiving 'request to send', $u$ acknowledges 'clear to send' if $\beta_u < U$. $u$ acknowledges a denial if $\beta_u \ge U$. Here $U$ is a pre-set threshold that limits the maximum number of data blocks a buffer can hold.
5. If $u \in S_c$ and $\beta_u \ge U$, stop sensing the environment until $\beta_u < U$.
Because the pseudo source node $s$ is not an actual sensor, the nodes in $S_c$ need to maintain a consistent image for $s$. This can be implemented by first electing a leader from $S_c$. Observe that $h(s)$ is the only variable that all nodes in $S_c$ share. The leader then maintains $h(s)$ and broadcasts $h(s)$ to the nodes in $S_c$ whenever $h(s)$ changes. Such regional cooperation incurs some extra cost. However, since leader election can be implemented efficiently [47], this extra cost is minimal compared with the push and relabel operations executed by the protocol.
5.4 Experimental Results
A simulation study of the proposed protocol was conducted using the PARSEC [4] software package.
5.4.1 Simulation Setup
The simulated networked sensor system was generated by randomly scattering 20 to 80 sensor nodes in a unit square. The base station was located at the lower-left corner of the square. The event of interest was randomly dropped into the square. Nodes within 0.2 units of distance from the event are assumed to sense the event. Each data block is 32 bytes. The radio transmission range of the nodes was set to 0.2, i.e. nodes within 0.2 units of distance can communicate with each other. Assuming a signal decaying factor of $r^{-2}$, the flow capacity between sensor nodes $u$ and $v$ is determined by Shannon's theorem as
$c_{uv} = W \log\left(1 + \frac{P_{uv}\, r_{uv}^{-2}}{\eta}\right)$
where $W$ is the bandwidth of the link, $r_{uv}$ is the distance between $u$ and $v$, $P_{uv}$ is the transmission power on link $(u, v)$, and $\eta$ is the noise in the communication channel. In all the simulations, $W$ was set to 1 kHz and $\eta$ was set to $10^{-6}$ mW. The $w_u$ are uniformly distributed between 0 and $w_{max}$. $d_u$ is distributed between 0 and $d_{max}$. The transmission time of a control message is assumed to be 1 ms. Because we consider the scenario where the processing results of each data block consist of a very small number of bits, we ignore the cost of transferring the processing results to the sink node in the simulations.
5.4.2 Summary of Results
Our first set of simulations examines the convergence of our in-network processing protocol. For this set of simulations, the transmission power of all links was set to a constant value $P_{uv} = 10^{-3}$ mW. The in-network processing lasted 30 seconds for each simulation. Let $N(t)$ denote the total number of data blocks processed by the system from time 0 to time $t$. The raw throughput is calculated as $p_a = N(30)/30$. The steady state throughput is calculated as $p_s =$ [expression garbled in the source]. The instantaneous
throughput at time $t$ is approximated as $p_r(t) =$ [expression garbled in the source]. The start-up time is defined as $\tau = \arg\min_t \,(p_r(t) \ge 0.85\, p_s)$, which indicates the convergence speed of our in-network processing protocol.
$w_{max}$ was set to 10. $d_{max}$ was set to 100. Buffer sizes of $U = 1, 5, 10$ were used. The simulation results are listed in Tables 5.1 - 5.3, where each data point is an average over 200 experiments. The values of $p_a$ and $p_s$ have been normalized to the optimal system throughput, which was calculated off-line.

Table 5.1: Normalized Raw Throughput
          n=20     n=40     n=60     n=80
U = 1     0.9452   0.9421   0.9472   0.9516
U = 5     0.9591   0.9366   0.9440   0.9535
U = 10    0.9452   0.9372   0.9349   0.9531

Table 5.2: Normalized Steady-state Throughput
          n=20     n=40     n=60     n=80
U = 1     0.9493   0.9456   0.9529   0.9634
U = 5     0.9639   0.9395   0.9482   0.9628
U = 10    0.9489   0.9412   0.9374   0.9640

Table 5.3: Start-up time
          n=20     n=40     n=60     n=80
U = 1     0.2965s  0.2800s  0.3127s  0.3671s
U = 5     0.3067s  0.2790s  0.3432s  0.3797s
U = 10    0.2902s  0.2793s  0.3138s  0.3692s

These simulation results show that our in-network processing protocol achieves around 95% of the optimal throughput. Our protocol is insensitive to the buffer size limit: reducing the buffer size from 10 to 1 does not cause noticeable degradation in the system throughput. As can be seen from the results, the number of sensor nodes does not have a noticeable impact on the system throughput.
We define the time distance between any two nodes as the length (in terms of transfer time of a data block) of the shortest path between the two nodes, and the diameter of a system as the longest time distance between any two nodes. The average system diameter in the simulations is 0.105s. By using our protocol, the average system start-up time is around 0.3s.
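The start-up time definition can be made concrete with a small helper. This is a minimal sketch under stated assumptions: the instantaneous throughput $p_r(t)$ is taken as already sampled at discrete times, the threshold comparison is read as "at least 85% of $p_s$", and the function name `startup_time` and the sample values are illustrative rather than taken from the simulations.

```python
def startup_time(samples, steady_state, fraction=0.85):
    """Return the earliest sample time t with p_r(t) >= fraction * p_s.

    samples: iterable of (t, p_r(t)) pairs; returns None if never reached.
    """
    for t, rate in sorted(samples):
        if rate >= fraction * steady_state:
            return t
    return None


if __name__ == "__main__":
    # Hypothetical throughput trace (time in seconds, data blocks per second).
    trace = [(0.0, 0), (0.1, 40), (0.2, 70), (0.3, 88), (0.4, 97)]
    print(startup_time(trace, steady_state=100))  # first sample at or above 85
```

With the hypothetical trace above, the threshold of 85 blocks per second is first met by the sample at $t = 0.3$, so that sample time is reported as the start-up time.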
The adaptivity of our protocol has been verified by modifying the sensing, processing and communication capabilities of the sensors while in-network processing is being performed. The simulation settings are the same as before except that we randomly chose 20% of the communication links and increased the bandwidth of each by 50% during the simulations. When such changes occurred in the system, the adaptation procedure was activated and in-network processing was adapted. We have observed that the system operated at close to (about 95% of) the new optimal throughput after the adaptation was completed. The results in Table 5.4 show the adaptation time $\tau_a$ of our protocol, which is defined as follows: suppose a set of changes occurs at time instance $t_0$ and the steady state throughput after the adaptation is $p'_s$, then
$\tau_a = \arg\min_{t > t_0} \,(0.85 \le p_r(t)/p'_s \le 1.15) - t_0$
Intuitively, $\tau_a$ is the time for the system to achieve 85% (when the new steady state throughput is higher than the original throughput) to 115% (when the new steady state throughput is lower than the original throughput) of the new steady state.

Table 5.4: Adaptation time
          n=20     n=40     n=60     n=80
U = 1     0.3078s  0.2907s  0.2656s  0.2186s
U = 5     0.2678s  0.2826s  0.2442s  0.2102s
U = 10    0.2679s  0.3506s  0.2998s  0.2508s

The results in Table 5.4 show that the adaptation time is around 0.3s, roughly the same as the start-up time shown in Table 5.3. However, the adaptation time does not increase as the number of nodes increases. A possible explanation is the following: when the system has a large number of nodes, a subset of the nodes is already capable of processing all the data generated by the source nodes. If the performance of the nodes that are not in use changes, then our algorithm is not activated and the system simply continues to operate as if no changes have occurred.
This suggests that sensing capabilities may be the performance bottleneck as the number of nodes increases. The results in Table 5.4 also show that our protocol is insensitive to buffer size.
The next set of simulations studies the impact of $w_{max}$. With $W$, $\eta$, and $P_{uv}$ fixed, $w_{max}$ represents the relative compute power of the nodes and hence the average communication/computation ratio of the data blocks. We simulated systems with 20 and 80 nodes. For each system size, we evaluated the performance of our protocol with $w_{max}$ ranging from 5 to 50. The results are shown in Figure 5.2, where each data point is an average over 500 simulations. The results have been normalized to the optimal system throughput. This optimal throughput was calculated off-line. We can see that the number of nodes does not have any noticeable impact on the system throughput. When $w_{max}$ becomes larger, our protocol achieves a closer to optimal throughput. However, the improvement in throughput is marginal as $w_{max}$ increases. This suggests that communication bandwidth is a more important factor for improving system throughput than processing capabilities of the nodes.
Figure 5.2: Impact of $w_{max}$ on system throughput (curves for the 20-node and 80-node systems; x-axis: $w_{max}$)
5.5 Performance Comparison
An alternative method for the in-network processing problem is to transfer data blocks to neighbors that can process the data. This heuristic attempts to maximize system throughput by pushing data blocks from the source (where the data is sensed) towards the sink (where the data is processed) along some paths. Such path based greedy heuristics are widely used for many data routing problems since they are easy to implement [2]. The actual choice of the path (shortest path, minimum latency path, etc.) is application specific.
But a common property of such heuristics is that the path is determined by the nodes based on some locally available information. An example of path based greedy heuristics is directed diffusion [34], which offers a solution to a class of data gathering problems. In [34], the sink node announces its interest in the data. While the interest is propagated throughout the system, each node locally determines a gradient that specifies which neighbor the data should be sent to. This gradient is then used to establish a path from the source nodes to the sink.
Generally, path based greedy heuristics consist of the following four steps in the context of the TMIP problem:
(1) Transform the TMIP problem to its network flow representation by applying the procedure specified in Section 5.2.
(2) Find a certain path $p$ from the source $s$ to the sink $t$. We define the capacity of the path $c_p$ as the minimum capacity of all the edges on this path.
(3) Push $c_p$ units of flow along path $p$ and reduce the capacity of all the edges on path $p$ by $c_p$.
(4) Repeat steps 2 and 3 until there does not exist any path from $s$ to $t$.
Note that the heuristic is applied to the network flow representation of the TMIP problem. Sink $t$ is a pseudo node representing the processing of the data. If a certain amount of flow is pushed along a path to $t$, it actually means that the data is transferred along this path and then processed by the last node of this path.
There can be a wide range of choices to find a path in step 2 above. It can be the shortest path, the minimum latency path, or just a randomly chosen path. For the sake of illustration, suppose step 2 of the above heuristic uses a randomly chosen path; the heuristic can then be approximated by the following simple distributed protocol:
(1) Every node maintains a data buffer, which has a predetermined size limit.
(2) The source nodes keep sensing the environment and load their data buffers until the buffer becomes full.
(3) At each node, as long as the data buffer is not empty, the node removes a data block from its buffer and processes the data block.
(4) At each node, as long as the data buffer is not empty, the node sends a data block to a neighbor if the neighbor has fewer data blocks in its buffer.
To determine if a node $u$ has more data blocks than its neighbor $v$, $u$ can send a query 'request the number of data blocks' to $v$ and wait for the response. Or, $v$ can broadcast the number of data blocks in its buffer whenever it changes. It is possible that two nodes $u_1$ and $u_2$ query a common neighbor $v$ at the same time. $u_1$ then sends a data block to $v$, increasing the number of data blocks at $v$ by 1. At this time instance, the knowledge $u_2$ has about $v$ is stale. If $u_2$ makes any decision based on this stale knowledge, then $u_2$ may not be following the above protocol precisely. Such consistency issues can be solved via some low level handshaking mechanism. For example, we can enforce that 'query the number of data blocks' and 'send the data block' be executed together as a single atomic operation. Or, $v$ can delay its response to $u_2$ in the first place, and wait until $u_1$ has finished its operations. The detailed design of this protocol is beyond the scope of this chapter. But it is clear that the spirit of the above protocol is to simply move data from the source to the sensors (where the data is processed, i.e. the pseudo sink) along paths based on local information.
Although easy to implement, the greedy heuristic cannot guarantee the optimality of the solution when applied to the TMIP problem. Actually, the performance of the
greedy heuristic can be arbitrarily bad in the worst case. This is illustrated using the example in Figure 5.3. For the sake of illustration, the problem reduction procedure is skipped and the TMIP problem is shown in its network flow representation in Figure 5.3. Edges $(n_5, t)$, $(n_{10}, t)$, and $(n_{15}, t)$ represent data processing in the TMIP problem. Capacities of the edges are marked on the edges.
Figure 5.3: An example illustrating the poor performance of the path based greedy heuristic
Suppose the path that the greedy heuristic first chooses is $p = s \to n_1 \to n_7 \to n_8 \to n_{15} \to t$. 10 units of flow can be pushed via path $p$. Then no more flow can be pushed from $s$ to $t$ since there does not exist any available path from $s$ to $t$. However, the system can actually achieve 30 units of flow because 10 units of flow can be pushed along each of the following three paths: $p_1 = s \to n_1 \to n_2 \to n_3 \to n_4 \to n_5 \to t$, $p_2 = s \to n_6 \to n_7 \to n_8 \to n_9 \to n_{10} \to t$, $p_3 = s \to n_{11} \to n_{12} \to n_{13} \to n_{14} \to n_{15} \to t$. In this example, the greedy heuristic achieves 1/3 of the optimal solution. Note that we can insert multiple copies of path $p_2$ into the system. This will lead to an arbitrarily bad worst case performance of the greedy heuristic.

Table 5.5: Normalized raw throughput of the greedy heuristic
          n=20     n=40     n=60     n=80
U = 1     0.9612   0.8423   0.8041   0.7377
U = 5     0.9604   0.8348   0.8021   0.7342
U = 10    0.9589   0.8303   0.7976   0.7275

We have shown that choosing paths randomly can lead to very poor performance. It can be shown that for other path based greedy heuristics, there also exist instances in which the system performance is arbitrarily bad. The above example can be generalized to show the following:
Theorem 4.
Given any real number $\epsilon > 0$, there exist instances of the TMIP problem in which the throughput of the path based greedy heuristic is less than $\epsilon$ times the optimal throughput.
To illustrate the non-optimality of path based heuristics, simulations were conducted using the same settings as in Section 5.4.1. The heuristic uses a randomly chosen path and is approximated as described above. The transmission power of all links was set to a constant value $P_{uv} = 10^{-3}$ mW. The in-network processing lasted 30 seconds for each simulation. $w_{max}$ was set to 10. $d_{max}$ was set to 100. We tested buffer sizes of $U = 1$, $U = 5$, and $U = 10$. The raw throughput of the greedy heuristic (normalized to the optimal throughput) is listed in Table 5.5, where each data point is averaged over 200 experiments. The results show that the performance of the path based greedy heuristic is insensitive to the buffer size. Compared with the results in Table 5.1, the path based greedy heuristic has good performance (around 96%) when the network has only 20 nodes. However, as the number of nodes increases, the performance of the greedy heuristic decreases (down to about 73% when the system has 80 nodes). In summary, the results in Table 5.5 show that the scalability of the greedy heuristic is limited.
5.6 Discussion
In this chapter, we considered the problem of in-network processing in networked sensor systems. After reducing the problem to its network flow representation, we developed a decentralized adaptive algorithm to maximize the system throughput. This algorithm was further implemented as an on-line protocol. System throughput of up to 95% of the optimal was observed in the simulations. Adaptivity of the protocol (with respect to the performance changes of the sensors) was illustrated through simulations.
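The worst-case behaviour of the path based greedy heuristic discussed in Section 5.5 can be reproduced on a small instance. The sketch below is an illustration under stated assumptions: the graph mimics the three-chain example described for Figure 5.3, but the cross edges $(n_1, n_7)$ and $(n_8, n_{15})$ and the uniform capacity of 10 on every edge are inferred from the text, since the figure itself is not reproduced here. The function names are hypothetical. Setting `use_residual=True` turns the same loop into the standard Edmonds-Karp maximum flow method, which can undo earlier choices through reverse edges; with `use_residual=False` it is the greedy heuristic of steps (2) to (4).

```python
from collections import deque


def bfs_path(res, s, t):
    """Shortest s-t path over edges with positive remaining capacity."""
    parent = {s: None}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            break
        for (a, b), c in res.items():
            if a == u and c > 0 and b not in parent:
                parent[b] = u
                queue.append(b)
    if t not in parent:
        return None
    path = [t]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path[::-1]


def push_along_paths(cap, s, t, use_residual, first_path=None):
    """Repeatedly push flow along s-t paths; returns the total flow pushed."""
    res = dict(cap)
    if use_residual:
        for (u, v) in cap:
            res.setdefault((v, u), 0)  # reverse edges allow flow to be rerouted
    total = 0
    path = first_path
    while True:
        if path is None:
            path = bfs_path(res, s, t)
        if path is None:
            return total
        edges = list(zip(path, path[1:]))
        bottleneck = min(res[e] for e in edges)
        for (u, v) in edges:
            res[(u, v)] -= bottleneck
            if use_residual:
                res[(v, u)] += bottleneck
        total += bottleneck
        path = None


def example_instance():
    """Three chains of five nodes plus two cross edges, capacity 10 each (assumed)."""
    edges = []
    for start in (1, 6, 11):
        chain = ["s"] + [f"n{i}" for i in range(start, start + 5)] + ["t"]
        edges += list(zip(chain, chain[1:]))
    edges += [("n1", "n7"), ("n8", "n15")]
    return {e: 10 for e in edges}


if __name__ == "__main__":
    cap = example_instance()
    bad_first_path = ["s", "n1", "n7", "n8", "n15", "t"]
    print("greedy :", push_along_paths(cap, "s", "t", False, bad_first_path))
    print("optimal:", push_along_paths(cap, "s", "t", True))
```

Once the greedy variant commits to the cross path it saturates one edge on each of the three chains, so no further path exists, while the residual variant recovers the full flow of the three disjoint chains.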
In the TMIP problem, we have modeled the processing capability, communication rate, and sensing rate of the sensors. Power consumption of the nodes was not directly modeled, but assumed to be controlled by some application level power management scheme. This leads to the continuous changes of the computation and communication capabilities of the nodes (due to power management). In addition, environmental factors can also affect the performance of the nodes. We addressed the issue of adaptation to such performance changes.
To address power consumption directly, we can introduce a fourth characteristic for the sensors: power budget. It represents the maximum amount of energy that a sensor can consume in one unit of time. The power budget may be determined by various factors. For example, if a node is low on battery, it may impose a low power budget on its activities so as to extend its life. If a node is on the critical path of connecting two groups of sensors, it may also impose a low power budget so that it can operate over a long period of time. It is reasonable to assume that power budget is also controlled by some application level power management scheme. The goal is to maximize the throughput under the given power budgets of the sensors.
The TMIP problem studies the class of in-network processing problems that maximize the throughput. An equally important problem is to maximize the total number of data blocks processed. Because every sensor has a certain energy budget, the number of data blocks that a sensor can sense, send, receive, and process is limited. Therefore we need to determine the routing of data blocks through the system without violating the energy budgets of the sensors. The energy constraints, again, can be represented as constraints of the vertices.
An interesting observation is that there are no edge capacity constraints: a communication link can be used to transfer an arbitrary number of data blocks (over an arbitrarily long time period) since we are not maximizing the number of data blocks processed in one unit of time. This effectively reduces the maximization of the number of data blocks to a special case of throughput maximization with edge capacities being set to infinity. For this class of in-network processing problems, we are exploring the possibility of developing a distributed algorithm that can be executed by nodes while the system continues to collect and process data.
In this chapter, we have modeled networked sensor systems as a general graph where all the nodes have the same functionality. However, various studies have proposed hierarchical infrastructures for networked sensor systems (e.g. [64]). In these infrastructures, one node is elected as the cluster head that coordinates the operations of the nodes in a cluster. Additionally, the role of the cluster head is often rotated among the nodes in a cluster [68]. This results in a dynamic tree structured system topology. Because routing is greatly simplified for tree topologies, many data gathering/processing problems can be solved efficiently. For example, if the root of the tree (often the base station) needs to disseminate data to the complete system, then the greedy algorithm in [9], which was originally developed for tree structured distributed computer systems, can be applied. When data processing capabilities and energy constraints are considered, non-trivial algorithms need to be designed to optimize system performance.
Associated with the performance optimization problems is the problem of system synthesis: given a system connected via a general graph, what is the optimal tree structured sub-graph that can collect the maximum number of data packets? Or, what is the optimal tree that can operate over the longest time period? Many networked sensor system applications can be abstracted as the coordination of communication and computation in tree structured systems. Exploration in this direction, which focuses on studies at the infrastructure level, will greatly aid application design for networked sensor systems.

Chapter 6
Maximum Lifetime Data Sensing and Extraction in Energy Constrained Networked Sensor Systems
In this chapter, we focus on data gathering problems where the networked sensor systems operate in rounds. A subset of the sensors generate a certain number of data packets during each round. All the data packets need to be transferred to the base station. The goal is to maximize the system lifetime in terms of the number of rounds the system can operate. We show that the above problem reduces to a restricted flow problem with quota constraint, flow conservation requirement, and edge capacity constraint. We further develop a strongly polynomial time algorithm for this problem, which is guaranteed to find an optimal solution. We then study the performance of a distributed shortest path heuristic for the problem. This heuristic is based on self-stabilizing spanning tree construction and shortest path routing methods. In this heuristic, every node determines its sensing activities and data transfers based on locally available information. No global synchronization is needed.
Although the heuristic cannot guarantee optimality, simulations show that the heuristic has good average case performance over randomly generated deployments of sensors. We also derive bounds for the worst case performance of the heuristic.

6.1 Introduction

In this chapter, we consider the class of networked sensor system applications where the system works in rounds as in [35]. The event is sensed by a subset of the sensors, each of which generates a certain number of data packets during each round, and all the sensed data needs to be collected by the base station. The goal is to maximize the number of rounds that the system can operate under the energy constraints of the sensors.

By modeling the energy consumption associated with each send and receive operation, we formulate the data gathering problem as a restricted flow optimization problem. We show that the maximization of the number of rounds reduces to a modification of the standard network flow problem that has a quota constraint (details can be found in Section 6.3) on the nodes. We develop an $O(|V_c| \cdot |V| \cdot |E|^2 \cdot \log(\min\{B_u/(S_u+T_u) \mid u \in V_c\}))$ algorithm for this problem, where $|V|$ is the total number of sensors, $|E|$ is the number of communication links, $V_c$ is the set of sensors that collect data packets from the environment and $|V_c|$ is the number of such sensors, $B_u$ is the energy budget of sensor $u$, $S_u$ is the energy cost for $u$ to sense one data packet, and $T_u$ is the energy cost for $u$ to send out one data packet.

While the above algorithm finds the optimal solution, it is centralized. We then develop a distributed heuristic for the data gathering problem where every node determines its sensing activities and data transfers based on locally available information.
This heuristic is based on self-stabilizing spanning tree construction algorithms and the shortest path routing method. Although the heuristic cannot guarantee optimality, simulations show that the heuristic has good average case performance over randomly generated deployments of sensors.

The rest of the chapter is organized as follows. The system model and problem formulation are discussed in Section 6.2. In Section 6.3, we reduce the maximization of rounds in data gathering to a restricted flow problem. In Section 6.4, we present our algorithm for the restricted flow problem and prove its optimality. In Section 6.5, we reconstruct the data flow for each round from the solution we find in Section 6.4. Section 6.6 compares the worst case performance of our algorithm against that in [35]. Section 6.7 discusses the distributed heuristic, examines its worst case performance, and presents simulation results on its average case performance.

6.2 System Model and Problem Statement

6.2.1 Model of Networked Sensor System

Suppose a network of sensors is deployed over a region. The locations of the sensors are fixed and known a priori. The networked sensor system is represented by a graph $G(V, E)$, where $V$ is the set of sensor nodes, and $(u, v) \in E$ if $u \in V$, $v \in V$ and $v$ is within the communication radius of $u$. $\sigma_u$, the set of successors of $u$, is defined as $\sigma_u = \{v \in V \mid (u,v) \in E\}$. Similarly, $\psi_u$, the set of predecessors of $u$, is defined as $\psi_u = \{v \in V \mid (v,u) \in E\}$. The event is sensed by a subset of sensors $V_c \subset V$. $t$ is the base station to which the sensed data are transmitted. Sensors in $V - V_c - \{t\}$ do not sense the event but can relay the data sensed by $V_c$. We further assume that no data aggregation is performed during the transmission of the data.
Data transfers are assumed to be performed via multi-hop communications where each hop is a short-range communication. This is due to the well known fact that long-distance wireless communication is expensive in terms of both implementation complexity and energy dissipation, especially when using the low-lying antennae and near-ground channels typically found in networked sensor systems [2]. Additionally, short-range communication enables efficient spatial frequency re-use in sensor networks [2].

Each sensor $u \in V$ has an energy budget $B_u$. Base station $t$ is assumed to have unlimited energy supply. Our energy model for radio transmissions of the sensors is based on the first order radio model described in [32]. If $v$ is within the communication radius of $u$, the energy consumed by sensor $u$ to transmit a $k$-bit data packet to $v$ is $T_u = \varepsilon_{elec} \times k + \varepsilon_{amp} \times d_u^2 \times k$, where $\varepsilon_{elec}$ is the energy required for the transceiver circuitry to process one bit of data, $\varepsilon_{amp}$ is the energy required per bit of data for the transmitter amplifier, and $d_u$ is the communication radius of sensor $u$. The transmitter amplifier is not needed by $u$ to receive data, and the energy consumed by $u$ to receive a $k$-bit data packet is $R_u = \varepsilon_{elec} \times k$. The system is considered to be heterogeneous, i.e. $T_u$ ($R_u$) can be different from $T_v$ ($R_v$). The energy consumed by sensor $u$ to sense a $k$-bit data packet from the environment is $S_u = \varepsilon_{sen} \times k$, where $\varepsilon_{sen}$ is the energy required for the sensing circuitry to collect one bit of data from the environment.

6.2.2 Problem Statement

We consider the class of applications where the system operates in rounds. Sensor $u \in V_c$ generates $n_u$ data packets during each round. These data packets need to be gathered by the base station.
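The per-packet costs $T_u$, $R_u$ and $S_u$ of Section 6.2.1 can be evaluated with a few lines of Python. The following is a minimal sketch, not part of the original model description: the class name `RadioModel`, its field names, and the numeric values in the usage example are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RadioModel:
    """First order radio model; all parameters are per-bit energies (hypothetical units)."""
    e_elec: float  # energy for the transceiver circuitry to process one bit
    e_amp: float   # amplifier energy per bit, scaled by the squared radius
    e_sen: float   # energy for the sensing circuitry to collect one bit

    def transmit_cost(self, k_bits: int, radius: float) -> float:
        # T_u = e_elec * k + e_amp * d_u^2 * k
        return self.e_elec * k_bits + self.e_amp * radius ** 2 * k_bits

    def receive_cost(self, k_bits: int) -> float:
        # R_u = e_elec * k (the amplifier is not used when receiving)
        return self.e_elec * k_bits

    def sense_cost(self, k_bits: int) -> float:
        # S_u = e_sen * k
        return self.e_sen * k_bits


# Usage with made-up parameter values:
model = RadioModel(e_elec=2.0, e_amp=0.5, e_sen=1.0)
t_u = model.transmit_cost(k_bits=4, radius=2.0)
r_u = model.receive_cost(k_bits=4)
s_u = model.sense_cost(k_bits=4)
```

Because the communication radius $d_u$ is a per-sensor quantity, two sensors with the same circuitry can still have different $T_u$, which is one way the heterogeneity mentioned above arises.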
The total number of rounds that the system can operate is limited by the energy budgets of the sensors as well as the routing of data packets through the system. Our goal is to maximize the number of rounds. Let $f(u,v)$ denote the number of data packets transferred from $u$ to $v$. The maximal data gathering (MDG) problem is formulated as follows:

Given: a networked sensor system represented by a graph $G(V,E)$ where each sensor $u \in V$ has energy budget $B_u$, cost of sending a data packet $T_u$, and cost of receiving a data packet $R_u$. $V_c \subset V$ is the set of sensors that collect data from the environment. $v \in V_c$ generates $n_v$ data packets during each round. The cost for $v \in V_c$ to sense a data packet is $S_v$. Edge $(u, v) \in E$ if $v$ is within the communication radius of $u$. $t \in V - V_c$ is the base station.

Maximize: $N$

Subject to:

(1) $\sum_{v \in \sigma_u} f(u,v) - \sum_{v \in \psi_u} f(v,u) = N \cdot n_u$ for $u \in V_c$

(2) $\sum_{v \in \sigma_u} f(u,v) = \sum_{v \in \psi_u} f(v,u)$ for $u \in V - V_c - \{t\}$

(3) $\sum_{v \in \sigma_u} f(u,v) \cdot T_u + \sum_{v \in \psi_u} f(v,u) \cdot R_u + N \cdot n_u \cdot S_u \le B_u$ for $u \in V_c$

(4) $\sum_{v \in \sigma_u} f(u,v) \cdot T_u + \sum_{v \in \psi_u} f(v,u) \cdot R_u \le B_u$ for $u \in V - V_c - \{t\}$

Function $f$ in the MDG problem is called a data flow in $G$. Variable $N$ in the problem statement represents the number of rounds the system can operate. Condition 1 requires that each sensor $u \in V_c$ generates $n_u$ data packets in each round. Condition 2 says that the intermediate sensors do not generate or drop data packets. Conditions 3 and 4 describe the energy constraints of the sensors. Nodes in $V_c$ can sense the environment as well as relay data packets for other nodes. Hence they may receive some data packets from their neighbors, as is shown in condition 3.

$B_u$ in the MDG problem models the energy constraints of the sensor nodes. It does not have to be the total remaining energy of $u$.
For example, when the remaining battery power of a sensor is lower than a particular level, the sensor may limit its contribution to the data gathering operation by setting a small value for $B_u$ (so that this sensor still has enough energy for future operations). As another example, if a sensor is deployed in a critical location so that it is utilized as a gateway to relay data packets to a group of sensors, then it may limit its energy budget for a particular data gathering operation, thereby conserving energy for future operations. These considerations can be captured by energy budget $B_u$ in the MDG problem.

Figure 6.1: An example of the MDG problem

Figure 6.1 shows an example of the MDG problem. The system consists of 9 nodes. $V_c = \{a, b, c\}$. Node $t$ is the base station. Energy budgets are marked on the nodes. For simplicity, we consider $R_u = T_u = 1$ for all $u \in V$, $S_a = S_b = S_c = 1$, and $n_a = n_b = n_c = 1$. This example system can operate over a maximum of two rounds. The paths to route the data packets are illustrated in Figure 6.1 as the three dotted lines. After two rounds, nodes $a$ through $h$ will have remaining energy 2.2, 0.3, 1.7, 0.6, 1.2, 0.8, 1.2, 4.3, respectively. Nodes $a$ and $c$ can still sense data and some of the sensed data can be transferred to $t$. But node $b$ cannot sense any more data packets since it has only 0.3 unit of energy left, which is less than $S_b$. Even if $b$ could sense, it would not have enough energy to transmit the data packets since $T_b > 0.3$. Since a round is defined as the collection of all data packets sensed by all the nodes in $V_c$, the loss of node $b$ prevents the system from successfully operating any more rounds.
Another scenario in which the system fails to complete a round occurs when the intermediate nodes fail (due to lack of energy) to transfer all the sensed data to $t$. This scenario is not shown in this example.

6.3 Reduction to a Restricted Flow Problem

We can see that the energy budgets (conditions 3 and 4) in the MDG problem are imposed on the nodes. In this section, we show that the energy budgets of the sensors can be transformed to edge capacities.

The MDG problem is an optimization problem of finding the maximum number of rounds $N$ that can be achieved in a given graph. We consider the corresponding decision problem: given a graph $G(V, E)$ and a number $N$, can we achieve $N$ rounds in graph $G$?

With the existence of condition 1, condition 3 can be re-written as

$$\sum_{v \in \sigma_u} f(u,v) \cdot T_u + \Big(\sum_{v \in \sigma_u} f(u,v) - N \cdot n_u\Big) \cdot R_u + N \cdot n_u \cdot S_u \le B_u \quad \text{for } u \in V_c$$

which is equivalent to

$$\sum_{v \in \sigma_u} f(u,v) \le \big(B_u + N \cdot n_u \cdot (R_u - S_u)\big) / (T_u + R_u) \quad \text{for } u \in V_c$$

With the existence of condition 2, condition 4 can be re-written as

$$\sum_{v \in \sigma_u} f(u,v) \cdot T_u + \sum_{v \in \sigma_u} f(u,v) \cdot R_u \le B_u \quad \text{for } u \in V - V_c - \{t\}$$

which is equivalent to

$$\sum_{v \in \sigma_u} f(u,v) \le B_u / (T_u + R_u) \quad \text{for } u \in V - V_c - \{t\}$$

Then conditions 3 and 4 can be represented uniformly as

$$\sum_{v \in \sigma_u} f(u,v) \le \beta_u \quad \text{for } u \in V - \{t\} \tag{6.1}$$

where

$\beta_u = \big(B_u + N \cdot n_u \cdot (R_u - S_u)\big) / (T_u + R_u)$ for $u \in V_c$

$\beta_u = B_u / (T_u + R_u)$ for $u \in V - V_c - \{t\}$

By introducing a pseudo source node $s$ that connects to each node $u \in V_c$, we can state the decision version of the MDG problem as the following restricted flow problem with vertex capacity constraints (RFVC):

Given: a graph $G(V, E)$ with source $s$ and sink $t$, and a number $N$. Node $u \in V - \{s,t\}$ has capacity constraint $\beta_u$. $V_c \subset V - \{s, t\}$. Node $v \in V_c$ generates $n_v$ data packets per round.
Determine: whether there exists a data flow $f : E \to \mathbb{R}$ that satisfies the following conditions:

(1) $f(s,u) = N \cdot n_u$ for $u \in V_c$

(2) $\sum_{v \in \sigma_u} f(u,v) = \sum_{v \in \psi_u} f(v,u)$ for $u \in V - \{s, t\}$

(3) $\sum_{v \in \sigma_u} f(u,v) \le \beta_u$ for $u \in V - \{s, t\}$

For the RFVC problem, suppose we split each $u \in V - \{s, t\}$ into two nodes $u_1$ and $u_2$, re-direct all incoming links of $u$ to arrive at $u_1$ and all the outgoing links of $u$ to leave from $u_2$, and add a link from $u_1$ to $u_2$ with capacity $\beta_u$. Then the vertex constraint $\beta_u$ is fully represented by the capacity of link $(u_1, u_2)$. Such a split transforms all the vertex constraints to the corresponding link capacities and leads to the restricted flow problem with edge capacities (RFEC). The RFEC problem is stated as follows:

Given: a graph $G(V, E)$ with source $s$ and sink $t$, a number $N$, and $V_c \subset V - \{s, t\}$. Edge $(u, v)$ has capacity constraint $c(u, v)$. Node $v \in V_c$ generates $n_v$ data packets per round.

Determine: whether there exists a data flow $f : E \to \mathbb{R}$ that satisfies the following conditions:

(1) $f(s,u) = N \cdot n_u$ for $u \in V_c$

(2) $\sum_{v \in \sigma_u} f(u,v) = \sum_{v \in \psi_u} f(v,u)$ for $u \in V - \{s, t\}$

(3) $f(u,v) \le c(u,v)$ for $u \in V - \{s, t\}$

A formal proof of the correctness of the reduction from the RFVC problem to the RFEC problem is straightforward and hence omitted here.

Condition 1 in the RFEC problem is the quota constraint, which requires node $u \in V_c$ to generate $n_u$ data packets per round. Condition 2 is the flow conservation constraint. It says that nodes do not generate or consume data packets, except source $s$ and sink $t$. (Node $u \in V_c$ does generate data packets in the MDG formulation. But with the introduction of pseudo source $s$, flow conservation is enforced on $u$ in the RFVC and RFEC problems.) Condition 3 is the edge capacity constraint.
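The reduction from vertex capacities to edge capacities described above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the author's implementation: the function names `vertex_capacity` and `split_nodes`, the representation of a split node $u_1$, $u_2$ as the tuples `(u, 1)` and `(u, 2)`, and the use of an infinite capacity for the original communication links (which carry no capacity of their own in the RFVC problem) are all choices made here for illustration.

```python
def vertex_capacity(budget, t_cost, r_cost, s_cost=0.0, n_packets=0, rounds=0):
    """beta_u as in Equation 6.1.

    For a relay node (not in V_c) leave n_packets at 0, which gives
    B_u / (T_u + R_u). For a sensing node pass n_u and the number of rounds N.
    """
    return (budget + rounds * n_packets * (r_cost - s_cost)) / (t_cost + r_cost)


def split_nodes(edges, beta, source, sink):
    """Turn vertex capacities into edge capacities (RFVC -> RFEC).

    edges: iterable of directed links (u, v) of the original graph.
    beta:  dict mapping every node other than source/sink to beta_u.
    Returns a dict mapping each edge of the split graph to its capacity.
    """
    inf = float("inf")  # assumption: original links are not capacity limited

    def tail(u):  # outgoing links leave from u_2
        return u if u in (source, sink) else (u, 2)

    def head(v):  # incoming links arrive at u_1
        return v if v in (source, sink) else (v, 1)

    capacity = {((u, 1), (u, 2)): b for u, b in beta.items()}
    for u, v in edges:
        capacity[(tail(u), head(v))] = inf
    return capacity


# Usage with made-up numbers: a senses one packet per round, b only relays.
beta = {
    "a": vertex_capacity(6.2, 1.0, 1.0, s_cost=1.0, n_packets=1, rounds=2),
    "b": vertex_capacity(4.0, 1.0, 1.0),
}
capacity = split_nodes([("s", "a"), ("a", "b"), ("b", "t")], beta, "s", "t")
```

Note that $\beta_u$ for a sensing node depends on $N$ whenever $R_u \ne S_u$, so in general the split graph has to be rebuilt for each candidate number of rounds.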
The RFEC problem is very similar to the standard network flow problem [17], which enforces flow conservation and edge capacity constraints. The RFEC problem differs from the standard network flow problem in the additional quota constraint. We call both the RFVC and RFEC problems restricted flow problems due to this quota constraint.

6.4 Algorithm for the Restricted Flow Problem

Given an RFEC problem with graph $G(V, E)$ and number $N$, we define the source capped graph induced by $N$ as a graph $G^N(V, E^N)$ where $E^N = \{(u, v) \mid (u, v) \in E\}$ and the capacity $c^N(u, v)$ of the edge $(u, v)$ in $E^N$ is defined as follows: $c^N(s, u) = N \cdot n_u$ for $u \in \sigma_s$, and $c^N(u, v) = c(u, v)$ otherwise. Obviously, a feasible solution to an RFEC problem with graph $G(V,E)$ and number $N$ must also be an optimal solution to the standard network flow problem in $G^N(V, E^N)$.

Before introducing the algorithm for the RFEC problem, let us review some notations and concepts for the standard network flow problem. The standard network flow problem is to find a maximum flow from $s$ to $t$ in graph $G(V, E)$, subject to the flow conservation and edge capacity constraints. For notational convenience, we define $c(u, v) = 0$ if $(u, v) \notin E$. If the actual data flow is from $u$ to $v$, we define $f(v, u) = -f(u, v)$. With the definitions of $f(u, v)$ and $c(u, v)$ thus expanded, if neither $(u, v)$ nor $(v, u)$ belongs to $E$, then $c(u, v) = c(v, u) = 0$, which implies that $f(u, v) = f(v, u) = 0$. In this way, we can define $f(u,v)$ over $V \times V$, rather than being restricted to $E$. $f(u,v) = -f(v, u)$ also allows us to represent the flow conservation constraint as $\sum_{v \in V} f(u, v) = 0$, which is equivalent to $\sum_{v \in \sigma_u} f(u, v) = \sum_{v \in \psi_u} f(v, u)$. Given a graph $G(V, E)$ and a flow $f$, the residual graph induced by $f$ is a graph $G_f(V, E_f)$, where $E_f = \{(u, v) \mid u \in V, v \in V, c(u, v) - f(u, v) > 0\}$.
Edge $(u, v)$ in $E_f$ has residual capacity $c_f(u, v) = c(u, v) - f(u, v)$. An augmenting path $p$ is a simple path from $s$ to $t$ in the residual graph $G_f$. The residual capacity of augmenting path $p$ is defined as $c_f(p) = \min\{c_f(u, v) : (u, v) \text{ is on } p\}$. A cut of $G(V, E)$ is a binary partition $(S,T)$ of $V$ such that $s \in S$, $t \in T$, and $S \cup T = V$. The capacity of a cut $(S, T)$ is defined as $c(S, T) = \sum_{u \in S, v \in T} c(u, v)$.

Our algorithm (denoted as the RFEC algorithm) for the RFEC problem is as follows:

1. $f(u, v) = 0$ for all $u, v \in V$
2. for each $u \in \sigma_s$
3.   while $f(s, u) < N \cdot n_u$
4.     find the shortest augmenting path $p$ that has $(s, u)$ as the first hop
5.     if such a path $p$ does not exist
6.       return FAIL and exit the algorithm
7.     else
8.       $d = \min\{c_f(p), N \cdot n_u - f(s,u)\}$
9.       for each edge $(u, v)$ in $p$
10.        $f(u,v) \leftarrow f(u,v) + d$
11.        $c_f(u,v) \leftarrow c_f(u,v) - d$
12.      end for
13.    end if
14.  end while
15. end for
16. return SUCCESS

Theorem 5. Given an RFEC problem with graph $G(V, E)$ and number $N$, the RFEC algorithm returns SUCCESS iff there exists a feasible data flow for the RFEC problem.

Proof:

$\Rightarrow$: If the RFEC algorithm returns SUCCESS, the values of $f(u, v)$ upon termination of the algorithm construct a feasible flow for the RFEC problem.

$\Leftarrow$: Suppose there exists a feasible data flow $f$ for the given graph $G(V, E)$ and number $N$. It is easy to verify that $f$ is also a maximum flow from $s$ to $t$ in $G^N(V, E^N)$. According to the maximum-flow minimum-cut theorem [17], this maximum flow implies a minimum cut $(\{s\}, V - \{s\})$ of $G^N(V, E^N)$ whose capacity is $N \cdot \sum_{u \in \sigma_s} n_u$.

Assume for the sake of contradiction that the RFEC algorithm returns FAIL. Without loss of generality, assume that the algorithm returns FAIL when checking $u^* \in \sigma_s$, i.e.
when the algorithm cannot find any augmenting path that has $(s, u^*)$ as the first hop while $f(s, u^*)$ is still less than $N \cdot n_{u^*}$.

Now we consider the source capped graph $G^N(V, E^N)$. Upon termination of the algorithm, data flow $f$ in $G(V, E)$ constructs a flow (not necessarily a maximum flow) in $G^N(V, E^N)$. We construct a cut $(S, T)$ of $G^N(V, E^N)$ as follows:

$S = \{s\} \cup \{v \mid \text{there exists a path } p \text{ from } s \text{ to } v \text{ in } G^N_f \text{, and the first hop of } p \text{ is } (s, u^*)\}$

$T = V - S$

$G^N_f$ in the above equation denotes the residual graph of $G^N$ induced by $f$.

We claim that $t \notin S$. Otherwise there exists an augmenting path from $s$ to $t$ in $G^N_f$ (and also in $G_f$) with $(s, u^*)$ as the first hop, so the condition tested at line 5 of the algorithm would not hold and the algorithm would not fail. The fact that $t \notin S$ implies that $t \in T$, which means that $(S, T)$ is indeed a cut of $G^N$.

Let $S' = S - \{s\}$. We claim that $f(u, v) = c(u, v)$ for all $u \in S'$, $v \in T$. Otherwise $f(u, v) < c(u, v)$ implies an edge $(u, v)$ in $G^N_f$, which further implies the existence of a path from $s$ to $v$ ($s \to \dots \to u \to v$) in $G^N_f$. But this contradicts the assumption that $v \in T$.

Because flow conservation is always satisfied as we push data packets along the augmenting paths,

$$\begin{aligned} 0 &= \sum_{v \in V} \sum_{u \in S'} f(v, u) \\ &= \sum_{u \in S'} f(s, u) + \sum_{v \in T} \sum_{u \in S'} f(v, u) + \sum_{v \in S'} \sum_{u \in S'} f(v, u) \\ &= \sum_{u \in S'} f(s, u) + \sum_{v \in T} \sum_{u \in S'} f(v, u) \\ &= \sum_{u \in S'} f(s, u) - \sum_{v \in T} \sum_{u \in S'} f(u, v) \end{aligned}$$

which means that

$$\sum_{u \in S'} f(s, u) = \sum_{u \in S'} \sum_{v \in T} f(u, v) = \sum_{u \in S'} \sum_{v \in T} c(u, v)$$

The capacity of cut $(S, T)$ in $G^N$ is calculated as follows:

$$\begin{aligned} c^N(S, T) &= \sum_{u \in S, v \in T} c^N(u, v) \\ &= \sum_{v \in T} c^N(s, v) + \sum_{u \in S', v \in T} c(u, v) \\ &= \sum_{v \in T} c^N(s, v) + \sum_{u \in S'} f(s, u) \\ &= \sum_{v \in \sigma_s \cap T} c^N(s, v) + \sum_{u \in \sigma_s \cap S'} f(s, u) \end{aligned}$$

According to the edge capacity constraint in the source capped graph,

$$f(s, u) \le c^N(s, u) = N \cdot n_u \quad \text{for } u \in \sigma_s$$

In particular, for node $u^*$, which is by definition in $\sigma_s \cap S'$, we have

$$f(s, u^*) < c^N(s, u^*) = N \cdot n_{u^*}$$

Therefore,

$$\begin{aligned} c^N(S, T) &= \sum_{v \in \sigma_s \cap T} N \cdot n_v + \sum_{u \in \sigma_s \cap S'} f(s, u) \\ &< \sum_{v \in \sigma_s \cap T} N \cdot n_v + \sum_{u \in \sigma_s \cap S'} c^N(s, u) \\ &= \sum_{v \in \sigma_s \cap T} N \cdot n_v + \sum_{u \in \sigma_s \cap S'} N \cdot n_u \\ &= N \cdot \sum_{u \in \sigma_s \cap (S' \cup T)} n_u = N \cdot \sum_{u \in \sigma_s} n_u \end{aligned}$$

where the last step holds because $S' \cup T = V - \{s\}$ contains every node of $\sigma_s$. This is impossible since we have shown that the minimum cut of $G^N$ has capacity $N \cdot \sum_{u \in \sigma_s} n_u$. The capacity of cut $(S, T)$ cannot be smaller than that of the minimum cut. $\square$

It can be seen that the while loop in the RFEC algorithm (lines 3-14) pushes data packets along the shortest augmenting path whose first hop is $(s, u)$. Similar to the complexity analysis of the Edmonds-Karp algorithm [17], it can be shown easily that the complexity of the while loop is $O(|V| \cdot |E|^2)$. The complexity of the RFEC algorithm is therefore $O(|\sigma_s| \cdot |V| \cdot |E|^2)$. Note that $\sigma_s$ in the RFEC problem is mapped from $V_c$ in the MDG problem. The mapping procedure can be completed in $O(|V| + |E|)$ time. Hence the decision version of the MDG problem can be solved in $O(|V_c| \cdot |V| \cdot |E|^2)$ time.

We can apply binary search to find the maximum value of $N$ for the original MDG problem. Because $\min\{B_u/(S_u + T_u) \mid u \in V_c\}$ is an obvious upper bound for $N$, the maximum number of rounds in the MDG problem can be found in $O(|V_c| \cdot |V| \cdot |E|^2 \cdot \log(\min\{B_u/(S_u + T_u) \mid u \in V_c\}))$ time.

In the RFVC problem, the capacity $\beta_u$ of vertex $u$ restricts the maximum number of data packets that $u$ can transfer. There can only be an integer number of data packets. Hence $\beta_u$ is effectively equivalent to $\lfloor \beta_u \rfloor$, where $\lfloor \beta_u \rfloor$ represents the largest integer smaller than or equal to $\beta_u$.
Consequently, when mapping such an RFVC problem to an RFEC problem, a real-valued $c(u, v)$ is effectively equivalent to $\lfloor c(u, v) \rfloor$. It is well known that flow maximization using augmenting paths, which includes the scenario of our RFEC algorithm, generates integer valued solutions when the edge capacities are integers. Therefore, if the data packets are atomic and cannot be further divided, our algorithm is guaranteed to find an integer valued optimal solution for the RFEC problem, and hence for the RFVC and MDG problems.

We illustrate the execution of the RFEC algorithm using the example system in Figure 6.1 where $V_c = \{a, b, c\}$. Remember that we consider $R_u = T_u = 1$ for $u \in \{a, b, c, d, e, f, g, h\}$, and $S_u = 1$ and $n_u = 1$ for $u \in \{a, b, c\}$. $\min\{B_u/(S_u+T_u) \mid u \in V_c\} = 2.15$, which is the upper bound on the number of rounds. Hence we start the binary search with $N = 2$.

The first step is to transform the MDG problem formulation in Figure 6.1 to the RFEC formulation. The result is shown in Figure 6.2, where each node $u \in \{a, b, c, d, e, f, g, h\}$ in the MDG formulation is split into two nodes $u_1$ and $u_2$, and a pseudo source node $s$ is added. The weight of the newly added edge $(u_1, u_2)$ is calculated according to Equation 6.1. The value of $n_u$ ($u \in \{a, b, c\}$) in the MDG formulation is inherited by $n_u$ ($u \in \{a_1, b_1, c_1\}$) in the RFEC formulation, i.e. $n_u = 1$ for $u \in \{a_1, b_1, c_1\}$.

Figure 6.2: An illustrative example of the RFEC algorithm

In order to check if the system can successfully operate 2 rounds, the RFEC algorithm is executed. After initialization, suppose in line 2, the procedure chooses to check node $a_1$ first. The algorithm then attempts to push $N \times n_{a_1} = 2$ units of flow
One of the possi ble augmenting paths is s — > ai -* a2 — > ex — > e 2 -> hx — > • h2 — > t. Suppose this path is chosen and 2 units of flow is pushed along this path. This fulfills the request of node a\. After pushing this flow, edges (oi, a2), (ei, e2), and (hi, h2) have remaining capac ities 1.1, 0.6, and 2.15, respectively. Then, suppose the algorithm chooses to check the second neighbor, b\, of source s. N x nbl = 2 units of flow need to be pushed from s to t via bi. s — > b\ — » b2 — > e\ — > a2 — > d\ — > d2 — > gi — > ■ t is the only possible augmenting path from s to t with (s, bi) being the first hop. Since the capacity of this augmenting path is 2 , the algorithm can successfully push 2 units of flow along this path. Similarly, for the third neighbor c\ of source s, N x nC l = 2 units of flow can be pushed from s to t along augmenting path s — > ■ c\ — > c2 — > ■ /i — > f 2 — > hi — > h2 t , which has (s, c\) as the first hop. This completes the check for all the three neighbors of s and the algorithm returns ‘SUCCESS’, which means that the system indeed can operate over a maximum of 2 rounds. Hence we stop the binary search at N — 2. Note that during the execution of the RFEC algorithm, we do not enforce a specific order by which the neighbors of the source node are checked. 6.5 Reconstructing the Data Flow for Each Round The RFEC algorithm (when used together with binary search) finds the maximum number of rounds N that the system can operate. Besides the total number of rounds N, the RFEC algorithm also finds the flow f(u, v) for each edge (u, v). The value of 132 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. f(u,v) specifies the total number of data packets that are transferred along edge (u,v) over the N rounds, but it does not specify how many data packets should be transferred along (u, v) during the ith round, 1 < i < N. 
In this section, we address the problem of reconstructing the data flow for each round, given the result of the RFEC algorithm, $f(u,v)$. The reconstruction is based on the flow decomposition [1] technique, which was originally developed for the standard flow maximization problem.

Let us first briefly review the standard flow decomposition problem. Given graph $G(V, E)$ with source $s$ and sink $t$, and any (not necessarily maximum) flow $f(u, v)$ in $G$, the flow graph induced by $f$ is defined as graph $G_{[f]}(V, E')$ where $E' = \{(u, v) \in E \mid f(u, v) > 0\}$. A flow decomposition of $G_{[f]}$ is a decomposition of $G_{[f]}$ into a certain number of 'primitives'. Each primitive is either a simple path $p$ from $s$ to $t$ where the flow along each edge of $p$ is the same, or a simple cycle $\gamma$ where the flow along each edge of $\gamma$ is the same. It is well known that there can be at most $|V| + |E|$ primitives when decomposing any flow graph $G_{[f]}$ [1].

Given an MDG problem, we have transformed its decision version to the RFEC problem, which enforces the quota constraint that differentiates the RFEC problem from the standard flow problem. The following discussion in this section considers the reconstruction of the data flow for each round, given the solution ($N$ and $f(u,v)$, $(u, v) \in E$) to the RFEC problem. The result of the reconstruction can then be easily transformed and applied to the original MDG problem.

Let $\varphi(p)$ denote the destination of the first hop of path $p$. Our objective is to decompose the solution of the RFEC algorithm (in the form of $f(u, v)$ where $(u, v) \in E$) into $N$ sets of paths. The $i$th set ($i = 1, 2, \dots, N$) corresponds to the $i$th round of data gathering. It specifies a set of paths $\Pi_i = \{p_{i1}, p_{i2}, \dots\}$ where each path is from $s$ to $t$, and the data flow (denoted as $\delta(p)$) that should be transferred along each path $p \in \Pi_i$. We require that $\Pi_1 \cup \dots \cup \Pi_N$ is a decomposition of $f(u, v)$ (where $(u, v) \in E$).
To construct a valid data flow for the $i$th round, we require that $\sum_{p \in \Pi_i, \varphi(p) = u} \delta(p) = n_u$ for all $u \in \sigma_s$. The various paths are not necessarily edge disjoint, i.e. an edge can be a member of multiple paths.

Given $G(V, E)$ and $f(u, v)$, we first construct the flow graph $G_{[f]}$. Then we perform depth first search (DFS) on $G_{[f]}$ and find all the cycles. Suppose a cycle $\gamma$ is found, and let $\delta(\gamma)$ denote the minimum flow along all the edges of $\gamma$. Then for each edge $(u,v)$ on $\gamma$, we reduce $f(u,v)$ by $\delta(\gamma)$. In this way, we eliminate one edge from $\gamma$ and hence break the cycle $\gamma$. Repeating the above procedure, we can eliminate all the cycles in $G_{[f]}$. In the following discussion, for the sake of simplicity, we assume that $G_{[f]}$ is acyclic.

To identify the paths, for each $u \in \sigma_s$, we split edge $(s,u)$ into $N$ edges $(s,u)_1, (s,u)_2, \dots, (s,u)_N$, each going from $s$ to $u$ and having flow $f(s,u)/N = n_u$ (recall that $f(s, u) = N \times n_u$ upon completion of the RFEC algorithm). Flow $n_u$ along edge $(s, u)_i$ corresponds to the quota constraint on $u$ in the $i$th round of data gathering, i.e. node $u$ needs to sense and send out $n_u$ data packets in the $i$th round. Then, as long as there are edges leaving $s$ in $G_{[f]}$, we follow a path $p$ out of $s$ until we reach $t$ (we must reach $t$ because $G_{[f]}$ is acyclic). We add the path $p$ thus found to $\Pi_k$ if the first hop of $p$ is $(s, u)_k$. Let the value of the path, $\delta(p)$, be the minimum flow along all the edges in $p$. Then we reduce the flow along every edge in $p$ by $\delta(p)$. This eliminates one edge from $G_{[f]}$. We repeat the above procedure until there is no edge out of $s$ in $G_{[f]}$.

Each time a path is added to $\Pi_k$ ($k = 1, 2, \dots, N$), an edge is eliminated from $G_{[f]}$. Since there can be at most $|E|$ edges in $G_{[f]}$, the above procedure terminates in at most $|E|$ steps.

Next we show that the above procedure, upon termination, finds a reconstruction of the data flow for each round.
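The path-finding procedure just described can be sketched in Python as follows. This is an illustrative sketch rather than the author's implementation: the function name `reconstruct_rounds` and the dictionary representation of the flow are hypothetical, the input is assumed to be an integer valued, acyclic flow that satisfies flow conservation with $f(s,u) = N \cdot n_u$, and the split edges $(s,u)_k$ are represented implicitly by a per-round counter `need` instead of explicit parallel edges.

```python
def reconstruct_rounds(flow, quota, rounds, s="s", t="t"):
    """Split an acyclic flow into per-round path sets Pi_1, ..., Pi_N.

    flow:  dict (u, v) -> total number of packets over all rounds.
    quota: dict u -> n_u for every neighbor u of s.
    Returns a list with one entry per round; each entry is a list of
    (path, delta) pairs, where path is a list of nodes from s to t.
    """
    # Remaining flow on every edge that does not leave s.
    left = {e: x for e, x in flow.items() if x > 0 and e[0] != s}
    successors = {}
    for u, v in left:
        successors.setdefault(u, []).append(v)

    plans = []
    for _ in range(rounds):
        plan = []
        for u, n_u in quota.items():
            need = n_u                      # flow on the split edge (s, u)_k
            while need > 0:
                # Follow edges that still carry flow until t is reached.
                path, x = [s, u], u
                while x != t:
                    nxt = next(v for v in successors[x] if left[(x, v)] > 0)
                    path.append(nxt)
                    x = nxt
                hops = list(zip(path[1:], path[2:]))
                delta = min([need] + [left[e] for e in hops])
                for e in hops:
                    left[e] -= delta
                need -= delta
                plan.append((path, delta))
        plans.append(plan)
    return plans


# Usage on a made-up flow for N = 2 rounds (b relays through a).
example_flow = {("s", "a1"): 2, ("s", "b1"): 2, ("a1", "a2"): 4,
                ("a2", "t"): 4, ("b1", "b2"): 2, ("b2", "a1"): 2}
plans = reconstruct_rounds(example_flow, {"a1": 1, "b1": 1}, rounds=2)
```

Each entry of `plans` plays the role of one set $\Pi_k$: the values $\delta(p)$ of the paths in a round that start with the same first hop $u$ add up to $n_u$.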
Upon termination, there is no edge out of $s$ in $G_{[f]}$. Because all the nodes conserve flow and $G_{[f]}$ is acyclic, all the edges must have been eliminated from $G_{[f]}$. Therefore $\Pi_1 \cup \dots \cup \Pi_N$ is indeed a decomposition of $f(u,v)$ where $(u, v) \in E$. Now we consider any individual set $\Pi_i$. $\Pi_i$ consists of paths whose first hop is $(s, u)_i$ where $u \in \sigma_s$. Then, upon completion of the above procedure, we must have $\sum_{p \in \Pi_i, \varphi(p) = u} \delta(p) = n_u$ for each $u \in \sigma_s$, i.e. the sum of flows on all paths in $\Pi_i$ whose first hop leads to $u$ must be equal to $n_u$ for each $u \in \sigma_s$. Otherwise the sum would be less than $n_u$, which means that edge $(s, u)_i$ is not eliminated from $G_{[f]}$, contradicting the assumption that the procedure terminated. $\{p \mid p \in \Pi_i, \varphi(p) = u\}$ is what we are looking for: the set of simple paths that can transfer $n_u$ data packets from $s$ to $t$. Therefore, we use $\Pi_i$ to transfer the data packets in the $i$th round, $i = 1, 2, \dots, N$.

It is interesting to point out that the reconstruction is not unique. For example, the order in which the paths are found and added to $\Pi_1 \cup \dots \cup \Pi_N$ can be arbitrary; $\Pi_i$ and $\Pi_j$ can also be exchanged (when $i \ne j$).

We illustrate the above reconstruction procedure using the example system previously shown in Figure 6.1.

Given the MDG problem shown in Figure 6.1, we have illustrated the corresponding RFEC problem formulation in Figure 6.2 and demonstrated the execution of the RFEC algorithm in Section 6.4. The solution generated by the RFEC algorithm is shown in Figure 6.3 in the form of a flow graph, where the flow is marked on each edge. Note that the flow along any edge $(u, v)$ indicates the total number of data packets going through $(u, v)$ for all the $N$ (= 2) rounds. Edge $(a_2, e_1)$ does not carry any flow so it does not appear in the flow graph.

Figure 6.3: Flow graph generated by the RFEC algorithm

It is a coincidence that Figure 6.3 is already acyclic. So the removal of cycles is skipped here. (Actually, we still need to perform a DFS to verify that the flow graph is indeed acyclic.) Note that the flow graph may not always be acyclic and the removal of cycles then becomes necessary.

The next step is to split $(s, u)$ into $N$ edges for each $u \in \sigma_s$. In this example, $\sigma_s = \{a_1, b_1, c_1\}$ and $N = 2$. Therefore, $(s, a_1)$ is split into $N$ (= 2) edges, $(s, a_1)_1$ and $(s, a_1)_2$. $f(s, a_1) = 2$ in Figure 6.3 indicates that a total of 2 data packets should be transferred from $s$ to $a_1$ in 2 rounds. Therefore, both edges $(s, a_1)_1$ and $(s, a_1)_2$ in Figure 6.4 have a flow of value $f(s, a_1)/N = n_a = 1$, representing that $n_a = 1$ data packet should be transferred from $s$ to $a_1$ in the first and in the second round. Edges $(s, b_1)$ and $(s, c_1)$ are split in the same way. The result of the split is shown in Figure 6.4. The names of the newly split edges are marked on the edges.

Figure 6.4: Reconstructing the flow for each round. Step 1: split the edges out of s.

The paths to transfer the data packets are identified as follows: As long as there are edges leaving $s$, we follow an arbitrary path until we reach $t$. Suppose the first path we choose is $s \to c_1 \to c_2 \to f_1 \to f_2 \to h_1 \to h_2 \to t$ and the first hop is along edge $(s, c_1)_2$. We add this path to $\Pi_2$ since the first hop is $(s, c_1)_2$. The minimum flow along the edges in this path is 1 (edge $(s, c_1)_2$). Therefore, the flow along this path is set to 1 and we reduce the flow along all edges in this path, each by 1. Since $(s, c_1)_2$ does not have any flow after the reduction, it is removed from the flow graph. The result at this stage is shown in Figure 6.5.

Figure 6.5: Reconstructing the flow for each round. Step 2: find the paths.
We repeatedly follow the paths from $s$ to $t$ and add the paths to $\Pi_i$ according to their first hop, until there are no more edges out of $s$. It is easy to verify that upon termination of the above process, we have the following result:

$p_{11} = s \to a_1 \to a_2 \to d_1 \to d_2 \to g_1 \to g_2 \to t$
$p_{12} = s \to b_1 \to b_2 \to e_1 \to e_2 \to h_1 \to h_2 \to t$
$p_{13} = s \to c_1 \to c_2 \to f_1 \to f_2 \to h_1 \to h_2 \to t$
$\Pi_1 = \{p_{11} \mid \delta(p_{11}) = 1,\; p_{12} \mid \delta(p_{12}) = 1,\; p_{13} \mid \delta(p_{13}) = 1\}$

$p_{21} = s \to a_1 \to a_2 \to d_1 \to d_2 \to g_1 \to g_2 \to t$
$p_{22} = s \to b_1 \to b_2 \to e_1 \to e_2 \to h_1 \to h_2 \to t$
$p_{23} = s \to c_1 \to c_2 \to f_1 \to f_2 \to h_1 \to h_2 \to t$
$\Pi_2 = \{p_{21} \mid \delta(p_{21}) = 1,\; p_{22} \mid \delta(p_{22}) = 1,\; p_{23} \mid \delta(p_{23}) = 1\}$

For the sake of simplicity, we set $n_a = n_b = n_c = 1$ in this example, which means that nodes $a$, $b$, and $c$ each sense only one data packet in each round. Consequently, a single path is capable of transferring all the data packets sensed by $a$, $b$, or $c$ in each round. If $n_a > 1$ ($n_b > 1$, or $n_c > 1$), we may need multiple paths to transfer the data packets sensed by $a$ ($b$, or $c$) in each round.

Another observation from this example is that $\Pi_1$ and $\Pi_2$ contain the same paths. This is also a coincidence in this particular example. We choose this simple example for illustration purposes only. If the algorithm is applied to more complicated systems, $\Pi_i$ and $\Pi_j$ ($i \neq j$) may contain different paths, and the flow along the paths may also be different.

6.6 Performance Comparison

In this section, we compare the performance of our algorithm against the method proposed in [35]. The study in [35] considers a data gathering problem similar to the one studied here. In [35], it is assumed that each sensor generates exactly one data packet in each round, and all the data packets need to be transferred to the base station.
The goal is to maximize the number of rounds the system can operate. Such a scenario reduces to the MDG problem. The system model used in [35] is generally the same as our model except for three aspects: the energy for sending a data packet is assumed to depend on the distance of the receiving node from the sender, the energy cost of sensing the environment is ignored, and each sensor generates exactly one data packet per round. While it can be debated which system model represents reality more accurately, in order to obtain a fair comparison we consider the scenario in which every sensor generates one data packet in each round, a sensor spends the same amount of energy when sending a data packet to any of its neighbors, and the energy cost of sensing the environment is zero. For this scenario, both the solution technique in [35] and the proposed technique can be applied.

A two stage method is used in [35] to obtain an integer valued solution. In the first stage, a relaxed linear programming formulation of the data gathering problem is solved (using some linear programming algorithm). We refer to this approach as the relaxed flow approach. The solution specifies how many data packets $f(u, v)$ should be transferred between sensors $u$ and $v$ (when communication link $(u, v)$ exists). Since the result $f(u, v)$ may not be an integer, the solution is floor rounded to $f'(u, v)$. Note that this floor rounding may compromise the property of flow conservation, so $f'(u, v)$ is only used as the edge capacity constraint, which defines a new flow optimization problem. Then in the second stage, an integer valued solution is found for the new flow optimization problem, using the augmenting path method. It is claimed that the solution produced by such an approach is near-optimal.
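To see concretely how floor rounding can break flow conservation, the following Python sketch checks the balance at each sensor before and after rounding. It uses the fractional values that the text quotes for the Figure 6.6 example that follows; the helper `imbalance` and the dictionary layout are our own, and the per-sensor generation of 2 packets assumes one packet per round over 2 rounds, as in the comparison scenario.

```python
import math

# Fractional per-edge flows over N = 2 rounds, as quoted for Figure 6.6.
relaxed = {
    ("a", "b"): 1.5, ("a", "d"): 0.5,
    ("c", "d"): 1.5, ("c", "b"): 0.5,
    ("b", "t"): 4.0, ("d", "t"): 4.0,
}
# Assumption: each sensor generates one packet per round, for 2 rounds.
generated = {"a": 2, "b": 2, "c": 2, "d": 2}


def imbalance(flow, node):
    """Packets received plus generated, minus packets sent, at `node`."""
    inflow = sum(f for (u, v), f in flow.items() if v == node)
    outflow = sum(f for (u, v), f in flow.items() if u == node)
    return inflow + generated[node] - outflow


rounded = {edge: math.floor(f) for edge, f in relaxed.items()}

before = {n: imbalance(relaxed, n) for n in generated}   # all zero
after = {n: imbalance(rounded, n) for n in generated}    # no longer zero

# With the rounded values used as capacities, sensor a can send at most
# this many packets in total, which bounds the number of complete rounds.
capacity_out_of_a = rounded[("a", "b")] + rounded[("a", "d")]
```

In this sketch every sensor is balanced before rounding, while after rounding sensors a and c can no longer send everything they generate, and b and d are asked to send more than they hold. This is why the rounded values can only serve as capacities in a second flow problem.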
Although experimental results in [35] illustrate that the two stage method can achieve close to optimal performance, in the following we demonstrate the non-optimality of the relaxed flow approach. Consider the simple example shown in Figure 6.6. The system consists of four sensors ($a$, $b$, $c$, $d$) and the base station $t$. For $a$, $b$, $c$, and $d$, the cost of sending a data packet is equal to that of receiving a data packet. The energy budgets of the sensors, in terms of the number of data packets that can be received and transferred, are marked beside the nodes. Obviously, the sample system can operate for a maximum of two rounds (one of the optimal solutions is $f(a, b) = 2$, $f(c, d) = 2$, $f(b, t) = 4$, $f(d, t) = 4$, $f(a, d) = 0$, $f(c, b) = 0$). It is easy to verify that the proposed approach generates this solution. However, when we apply the relaxed flow approach, the first stage may generate the following real-valued solution: $f(a, b) = 1.5$, $f(c, d) = 1.5$, $f(b, t) = 4$, $f(d, t) = 4$, $f(a, d) = 0.5$, $f(c, b) = 0.5$, which rounds to $f'(a, b) = 1$, $f'(c, d) = 1$, $f'(b, t) = 4$, $f'(d, t) = 4$, $f'(a, d) = 0$, $f'(c, b) = 0$. Note that such real valued solutions are obtained if, for example, the linear programming toolbox in Matlab is used. If we define a new flow problem using these rounded values and solve it as in the second stage in [35], the maximum number of rounds that the system can operate is only 1, which is 50% of the optimal.

Figure 6.6: Example for worst case performance comparison

In the worst case, the behavior of the relaxed flow approach can be much worse than above. Suppose that in Figure 6.6, sensors $a$ and $c$ have energy budget 1, and $b$ and $d$ have energy budget 2. It is easy to verify that the proposed approach finds the optimal solution. The number of rounds in the optimal solution is one.
The relaxed flow approach, however, may produce the following real-valued solution: $f(a, b) = 0.5$, $f(c, d) = 0.5$, $f(b, t) = 2$, $f(d, t) = 2$, $f(a, d) = 0.5$, $f(c, b) = 0.5$, which, after rounding, leads to an integer solution with 0 rounds.

We say that a solution technique has failed if it produces a solution with zero rounds (i.e. the solution does not gather all the packets in the first round) while there exists a solution with at least one round. Let $\Omega$ represent the number of sensors in the system. The above scenario can be generalized to show:

Theorem 6. For all $\Omega > 4$, there exist instances in which the relaxed technique fails to produce a solution.

6.7 A Distributed Heuristic for the MDG Problem

The proposed RFEC algorithm and the relaxed flow approach can only be executed in a centralized fashion. Next we study the performance of a distributed heuristic for the MDG problem. In this heuristic, every node determines its activities (sensing and transferring) based on its own local information and the information available at its neighbors. The heuristic is based on the self-stabilizing spanning tree construction algorithms [29] and the widely used shortest path routing method. We denote the heuristic as the shortest path heuristic.

In this heuristic, each node $u \in V$ maintains a variable $d(u)$, which is used to record the distance from $u$ to base station $t$. Node $u$ also maintains a data buffer, which is used to store sensed or received data. We assume that every node knows an upper bound $U$ on the total number of nodes. We also assume that a new round is triggered by some external mechanism. The failure of a round is also determined by some external mechanism. The time interval between two rounds is large enough so that the system can complete one round (if possible) before starting a new one.
Let $e(u)$ denote the remaining energy of $u$ and $b(u)$ denote the number of data packets in the data buffer of $u$. The heuristic is described as follows:

1. Initially, $d(u) = 0$ for all nodes $u \in V$.
2. Each $u \in V - \{t\}$ executes the following two operations:
   (a) distance update
       i. if $\min_{v \in \sigma_u,\, e(v) > R_v} \{d(v)\} < d(u) < U$, then set $d(u) \leftarrow \min_{v \in \sigma_u,\, e(v) > R_v} d(v) + 1$
       ii. if $e(u) < T_u$, then set $d(u) \leftarrow U$
   (b) data transfer: if $0 < d(u) < U$, $e(u) > T_u$, and $\exists v \in \sigma_u$ s.t. $d(u) = d(v) + 1$ and $b(v) < b(u)$, then $u$ sends one data packet to $v$
3. Each node $u \in V_c$ has one additional operation to execute: whenever a new round starts, $u$ senses and puts $n_u$ data packets into its data buffer.
4. Since the base station $t$ is assumed to have unlimited energy supply, $t$ simply receives any data packets intended for $t$.

In the shortest path heuristic, the 'distance update' step attempts to establish for each node a shortest path to the base station. Base station $t$ is always at distance 0 (i.e. $d(t) = 0$). Every other node $u \in V - \{t\}$ chooses the neighbor $v$ with the smallest distance $d(v)$ as its successor and sets its own distance to $d(u) = d(v) + 1$. It can be shown that, starting with $d(u) = 0$ for $u \in V$, the distance update step will eventually establish a breadth first search tree. A node simply chooses a neighbor which is one hop closer to the base station as its parent in the tree. This constructs a shortest path to the base station for each node.

As the data gathering proceeds, some nodes may deplete their energy due to sending and receiving data packets. Consequently, for any node, its shortest path to the base station may change. This has also been considered by the distance update step of the heuristic.
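The behavior of the distance update step can be illustrated with a small Python sketch. It is a sequential simulation under simplifying assumptions, not the distributed protocol itself: the function name and arguments are hypothetical, nodes are visited one at a time until nothing changes, and the guard of rule (i) is simplified so that a node recomputes its distance whenever it differs from one plus the smallest distance among neighbors that can still receive.

```python
def update_distances(neighbors, energy, recv_cost, send_cost, base, upper):
    """Repeat the local distance rule until no node changes its value.

    neighbors maps each node to a list of adjacent nodes; `upper` plays the
    role of the bound U. The base station is assumed to have unlimited energy.
    """
    d = {u: 0 for u in neighbors}
    changed = True
    while changed:
        changed = False
        for u in neighbors:
            if u == base:
                continue  # d(t) stays 0
            if energy[u] < send_cost[u]:
                new = upper  # rule (ii): u can no longer send
            else:
                # Only neighbors that can still receive are candidate parents.
                usable = [d[v] for v in neighbors[u]
                          if v == base or energy[v] > recv_cost[v]]
                new = min(min(usable, default=upper) + 1, upper)
            if new != d[u]:
                d[u], changed = new, True
    return d


# A hypothetical line topology t - x - y - z with unit costs.
line = {"t": ["x"], "x": ["t", "y"], "y": ["x", "z"], "z": ["y"]}
cost = {"x": 1, "y": 1, "z": 1}

healthy = update_distances(line, {"x": 5, "y": 5, "z": 5}, cost, cost, "t", 4)
depleted = update_distances(line, {"x": 0, "y": 5, "z": 5}, cost, cost, "t", 4)
```

In the first call the distances settle at the breadth first levels of the line. In the second call node x has no energy left, so x, y, and z all end at the bound U, which signals that none of them can reach the base station any more.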
If a node $u$ does not have enough energy to send a data packet, then it sets its $d(u)$ to the upper bound $U$, indicating that $u$ cannot reach $t$ anymore. If a node $u$ does not have enough energy to receive a data packet, then it is excluded from being the parent of any other node. In effect, the execution of the distance update step continuously updates the breadth first tree rooted at the base station.

After the breadth first tree is constructed, the nodes start transferring the data packets. Node $u$ will send one data packet to its neighbor $v$ if the following conditions are met: (1) $d(u)$ is less than the upper bound $U$, (2) $u$ has enough energy to send one more data packet, (3) $v$ is one hop closer to the base station, and (4) $v$ has fewer data packets in its buffer than $u$. Conditions 1 and 2 indicate that a node sends out a data packet only if a path to the base station exists. Condition 3 tells a node to send data packets only to its parent in the breadth first tree. Condition 4 prevents a node from receiving more data packets than it can later deliver to the base station.

Because the shortest path heuristic continuously updates the breadth first tree according to the remaining energy of the nodes, it can be executed while the data transfer is performed. However, it cannot guarantee optimality. This is illustrated using the example system in Figure 6.7.

Figure 6.7: Example demonstrating the non-optimality of the shortest path heuristic

The system is shown in the MDG formulation. To simplify the notation and figures, we omit the transformation to the RFEC representation and present the following discussion based on this MDG formulation directly. The system consists of 17 nodes, $V = \{a_1, a_2, \ldots, a_{16}, t\}$. $V_c = \{a_1\}$. $T_u = R_u = 1$ for $u \in V$. $S_{a_1} = 1$. $n_{a_1} = 1$. Energy budgets of the nodes are marked on the nodes.
The system can operate over a maximum of 15 rounds. This optimal solution is obtained when $f(a_1, a_5) = f(a_5, a_6) = f(a_6, a_7) = f(a_7, a_8) = f(a_8, t) = 5$, $f(a_1, a_2) = f(a_2, a_3) = f(a_3, a_4) = f(a_4, a_9) = f(a_9, a_{10}) = f(a_{10}, a_{11}) = f(a_{11}, a_{12}) = f(a_{12}, t) = 5$, and $f(a_1, a_{13}) = f(a_{13}, a_{14}) = f(a_{14}, a_{15}) = f(a_{15}, a_{16}) = f(a_{16}, t) = 5$. It can be easily verified that the RFEC algorithm finds this solution. However, using the shortest path heuristic, the following scenario will occur: (1) the distance update step will find $a_1 \to a_5 \to a_9 \to a_{16} \to t$ as the shortest path connecting $a_1$ and $t$; (2) 5 data packets are transferred along this path; (3) no more data packets can be transferred, so the data gathering terminates.

In the above solution, step (2) utilizes the currently available shortest path to transfer the data packets. It uses up the energy of nodes $a_5$, $a_9$, and $a_{16}$. Then no more data can be transferred from $a_1$ to $t$ because there no longer exists any path between $a_1$ and $t$. The shortest path heuristic can achieve a total of 5 rounds, which is only one third of the optimal. If the shortest path from $a_1$ to $t$ contains $n$ nodes instead of 3, we can construct a similar example system where the shortest path heuristic achieves only $1/n$ of the optimal. As before, let $\Omega$ denote the total number of nodes. The above scenario can be generalized to show:

Theorem 7. For any real valued $\epsilon \in (0, 1)$, there exists an integer $\Omega_0 > 0$ s.t. for all $\Omega > \Omega_0$, there exist problem instances for which the shortest path heuristic produces a solution with system lifetime less than ($\epsilon$ × the optimal system lifetime).

However, this heuristic has very good average case performance. This is illustrated through the following simulations. In the simulations, nodes are randomly deployed in a unit square with uniform distribution.
The communication radius is assumed to be the same for all nodes. The base station is located at the lower left corner of the unit square. $S_u$, $R_u$ and $T_u$ are uniformly distributed in $[0, 1]$. A certain number of nodes are randomly chosen as the source nodes. The source nodes are not necessarily direct neighbors of each other. For each source node $u$, the number of data packets to be gathered per round, $n_u$, is uniformly distributed in $[1, n_{max}]$, where $n_{max}$ is a parameter in the simulations. The initial energy at the nodes is uniformly distributed in $[0, e_{max}]$, where $e_{max}$ is another parameter.

The following five parameters were studied: (1) $|V|$, the total number of nodes, (2) the communication radius of the nodes, (3) $|V_c|$, the number of source nodes, (4) $n_{max}$, the maximum number of data packets to be gathered per source node per round, and (5) $e_{max}$, the maximum amount of energy initially available at the nodes. In the simulations, $|V|$ was selected from 40, 80, 120, and 160. The communication radius was selected from 0.2, 0.3, 0.4, and 0.5. $|V_c|/|V|$ was selected from 0.1, 0.2, 0.3, and 0.4. $n_{max}$ was selected from 5, 10, 15, and 20. $e_{max}$ was selected from 1000, 2000, 3000, and 4000. In summary, there were $4^5 = 1024$ combinations of the parameters.

For each of the 1024 combinations, we simulated 200 randomly generated systems. For each system simulated, we calculated the ratio between the lifetime achieved by the shortest path heuristic and the maximum lifetime obtained through the RFEC algorithm. Let $q$ denote this ratio. Summarizing over all the simulation results, we observed that $q$ has mean value 0.68 with standard deviation 0.36. Figure 6.8 shows the histogram of $q$ over the 204800 experiments. In 47% of the experiments, the heuristic achieved the maximum system lifetime.
For the remaining 53% of the experiments, the quality of solution, represented by $q$, was roughly uniformly distributed between 0 and 1.

To identify possible directions to further improve the performance of the heuristic, we studied the impact of each individual parameter. The results are presented in Figures 6.9-6.13.

Figure 6.8: The histogram of q over all the simulations. The value of q for each simulation is calculated as the ratio between the achieved system lifetime and the optimal lifetime. The y-axis has been normalized to the total number of simulations. (x-axis: quality of solution)

Figure 6.9: The impact of the number of nodes on system lifetime. The y-axis has been normalized to the optimal system lifetime. (x-axis: number of nodes)

Figure 6.10: The impact of communication radius on system lifetime. The y-axis has been normalized to the optimal system lifetime. (x-axis: communication radius)

Figure 6.11: The impact of the number of source nodes (normalized to |V_c|/|V|) on system lifetime. The y-axis has been normalized to the optimal system lifetime. (x-axis: number of source nodes)

Figure 6.12: The impact of e_max on system lifetime. The y-axis has been normalized to the optimal system lifetime. (x-axis: e_max)

Figure 6.13: The impact of n_max on system lifetime. The y-axis has been normalized to the optimal system lifetime. (x-axis: n_max)
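For readers who wish to set up a comparable experiment, the simulation setup described earlier in this section can be sketched in Python as follows. The parameter values are those quoted in the text; everything else (the names, the dictionary layout, and the choice to draw n_u as an integer) is an assumption made for illustration, not a description of the original simulator.

```python
import itertools
import math
import random

# Parameter values quoted in the text; the constant names are ours.
NUM_NODES = [40, 80, 120, 160]
RADIUS = [0.2, 0.3, 0.4, 0.5]
SOURCE_RATIO = [0.1, 0.2, 0.3, 0.4]      # |V_c| / |V|
N_MAX = [5, 10, 15, 20]
E_MAX = [1000, 2000, 3000, 4000]
SYSTEMS_PER_COMBINATION = 200

grid = list(itertools.product(NUM_NODES, RADIUS, SOURCE_RATIO, N_MAX, E_MAX))


def random_system(num_nodes, radius, source_ratio, n_max, e_max, rng):
    """Draw one random system; sensors are numbered 0 .. num_nodes - 1."""
    base = (0.0, 0.0)  # base station at the lower left corner
    pos = [(rng.random(), rng.random()) for _ in range(num_nodes)]
    sources = rng.sample(range(num_nodes), round(source_ratio * num_nodes))
    links = [(i, j)
             for i in range(num_nodes) for j in range(i + 1, num_nodes)
             if math.dist(pos[i], pos[j]) <= radius]
    base_links = [i for i in range(num_nodes)
                  if math.dist(pos[i], base) <= radius]
    return {
        "pos": pos,
        "links": links,
        "base_links": base_links,
        # Assumption: n_u is an integer drawn uniformly from 1 .. n_max.
        "packets": {u: rng.randint(1, n_max) for u in sources},
        "energy": [rng.uniform(0, e_max) for _ in range(num_nodes)],
        # (S_u, R_u, T_u) per node, each drawn uniformly from [0, 1).
        "costs": [(rng.random(), rng.random(), rng.random())
                  for _ in range(num_nodes)],
    }


system = random_system(*grid[0], rng=random.Random(0))
total_runs = len(grid) * SYSTEMS_PER_COMBINATION
```

Enumerating the grid this way reproduces the counts stated in the text: 1024 parameter combinations and, at 200 systems each, 204800 experiments.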
The results show that the performance of the heuristic is more sensitive to the number of nodes, the number of source nodes, and the communication radius than to $e_{max}$ and $n_{max}$. Figures 6.9 and 6.11 show that the system lifetime decreases as the number of nodes or the number of source nodes increases. The system lifetime improves when the communication radius increases, as shown in Figure 6.10. On the other hand, the system lifetime did not change much when we varied $e_{max}$ and $n_{max}$, as shown in Figures 6.12 and 6.13. This indicates that the topology of the system has more impact on the system lifetime than the properties of the individual nodes do.

This observation has two implications for future studies. First, a better heuristic may be designed by taking into account more knowledge about the system topology. Second, given a set of nodes with pre-determined properties (for example, available energy), the system performance can be maximized by carefully designing the topology of the deployment.

Chapter 7

Conclusion

This dissertation studied adaptive application execution in two classes of seemingly different yet intrinsically related systems: distributed computer systems and networked sensor systems. For distributed computer systems, the dissertation studied the execution of a large set of independent tasks where each task is to process a certain amount of input data. The objective is to maximize the throughput of the whole system. For networked sensor systems, the dissertation studied data gathering problems where the data collected by the sensors needs to be transferred to the base station. The dissertation also studied in-network processing problems for networked sensor systems where the data collected by the sensors needs to be processed by the sensors before being sent to the base station.
At an abstract level, both systems can be modeled as a graph where the vertices represent either the computers or the sensors and the edges represent the communication links among the computers or sensors. The networked sensor systems have one additional capability constraint on the nodes, namely the energy budget. Despite this difference, we show that the problems of interest in the two classes of systems reduce to the same network flow representation. Consequently, both the task allocation problem and the data gathering/processing problems are solved in a distributed fashion by using the proposed Relaxed Incremental Push Relabel (RIPR) algorithm, which provides a distributed and adaptive solution to the network flow problem.

This dissertation also studied the maximization of the system lifetime for a class of data gathering problems. This problem reduces to a variation of the network flow problem. Unfortunately, the Relaxed Incremental Push Relabel algorithm cannot be applied to it, so a strongly polynomial algorithm was designed instead.

Although the effectiveness of the proposed adaptive solution has been theoretically proved and experimentally verified, it should be pointed out that the solution has its limitations. The adaptive solution assumes that the available capabilities of the nodes and links will stop changing after a certain period of fluctuation, and that the system will then stay in such a non-changing state for a time period of $\tau$. $\tau$ must be long enough for the RIPR algorithm to find the optimal solution. Otherwise, changes in the capabilities of the nodes and links will keep occurring before the RIPR algorithm can find the optimal solution; the algorithm then always lags behind, and the system can never really adapt to the dynamic behavior of the resources.
Unfortunately, rapid changes in the capabilities of the resources are not uncommon, especially when the system is shared by a relatively large number of users. In fact, when the resource characteristics change so rapidly, even defining the optimal performance for the system becomes a big challenge, because adaptation no longer makes sense. One possible solution is to model the capabilities of the resources as random variables whose statistical properties can be extracted from actual experiments. Then the 'optimal' performance of the system can be defined statistically. Of course, the statistical meaning of many optimization constraints (e.g. 'in-coming flow equals out-going flow') must be re-defined too. This statistical optimization can make for some interesting and practically important future work.