ARCHITECTURE-INDEPENDENT PROGRAMMING AND SOFTWARE SYNTHESIS FOR NETWORKED SENSOR SYSTEMS

by

Amol B. Bakshi

A Dissertation Presented to the
FACULTY OF THE GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the Requirements for the Degree
DOCTOR OF PHILOSOPHY
(COMPUTER ENGINEERING)

December 2005

Copyright 2005 Amol B. Bakshi

UMI Number: 3219853
Copyright 2006 by Bakshi, Amol B. All rights reserved.
UMI Microform 3219853. This microform edition is protected against unauthorized copying under Title 17, United States Code, by ProQuest Information and Learning Company, 300 North Zeeb Road, P.O. Box 1346, Ann Arbor, MI 48106-1346.

Dedication

To my parents

Acknowledgements

I am deeply grateful to my advisor Prof. Viktor K. Prasanna for his guidance, kindness, and patience. Working with him has always been – and continues to be – an enriching experience. I also appreciate the encouragement and advice I have received from my committee members Prof. Bhaskar Krishnamachari and Prof. Ramesh Govindan. I owe special thanks to Daniel Larner, Jim Reich, Maurice Chu, Qingfeng Huang, Patrick Cheung, and Julia Liu at the Palo Alto Research Center for making the summer of 2004 a truly memorable and enjoyable experience. I also owe a debt of gratitude to Girish Birajdar for giving me a home during the three months of the internship and for his company in our occasional outdoor adventures. The PhD years have passed by in a flash because of the numerous friends at USC – special thanks to Abhinav Sethy and to the “p-group”. I am grateful for their friendship, although I do sometimes believe I would have graduated sooner had I not felt so much at home here! I am thankful to my family for supporting my academic pursuits. I hope I have made them proud.

Table of Contents

Dedication
Acknowledgements
List Of Tables
List Of Figures
Abstract

1 Introduction
1.1 Networked sensor systems
1.2 Sensor networks and traditional distributed systems
1.3 Macroprogramming: What and Why?
1.4 Contributions and Outline

2 Programming of Distributed Sensor Networks
2.1 Wireless sensor networks
2.1.1 Service-oriented specification
2.1.2 Macroprogramming
2.1.3 Node-centric programming
2.2 Parallel and distributed computing

3 The Abstract Task Graph
3.1 Target applications and architectures
3.2 Key Concepts
3.2.1 Data Driven Computing
3.2.1.1 Program flow mechanisms
3.2.1.2 Why data driven?
3.2.2 Mixed Imperative-Declarative Specification
3.3 Syntax
3.4 Semantics
3.4.1 Terminology
3.4.2 Firing rules
3.4.3 Task graph execution
3.4.4 get() and put()
3.5 Programming idioms
3.5.1 Object tracking
3.5.2 Interaction within local neighborhoods
3.5.3 In-network aggregation
3.5.4 Hierarchical data fusion
3.5.5 Event-triggered behavior instantiation
3.6 Future work
3.6.1 State-based dynamic behaviors
3.6.2 Resource management in the runtime system
3.6.3 Utility based negotiation for task scheduling and resource allocation

4 DART: The Data Driven ATaG Runtime
4.1 Design objectives
4.1.1 Support for ATaG semantics
4.1.2 Platform independence
4.1.3 Component-based design
4.1.4 Ease of software synthesis
4.2 Overview
4.3 Components and functionalities
4.3.1 UserTask
4.3.2 DataPool
4.3.3 AtagManager
4.3.4 NetworkStack
4.3.5 NetworkArchitecture
4.3.6 Dispatcher
(each with Service, Interactions, and Implementation subsections)
4.4 Control flow
4.4.1 Startup
4.4.2 get() and put()
4.4.3 Illustrative example
4.5 Future work
4.5.1 Lazy compilation of channel annotations

5 Programming and Software Synthesis
5.1 Terminology
5.2 Meta-modeling for the ATaG domain
5.2.1 Objectives
5.2.2 Application model
5.2.3 Network model
5.3 The programming interface
5.4 A Case Study: Temperature Monitoring and Object Tracking
5.5 Synthesis, simulation, and visualization

6 Concluding Remarks
6.1 Domain-specific application development
6.2 Compilation and software synthesis

Bibliography

List Of Tables

3.1 Abstract task: Annotations
3.2 Abstract channel: Annotations
3.3 Event-reaction pairs for object tracking
3.4 Event-reaction pairs for neighbor-to-neighbor protocol
3.5 Event-reaction pairs for tree-based aggregation
3.6 Event-reaction pairs for hierarchical data fusion
3.7 Event-reaction pairs for alarm-triggered data collection

List Of Figures

1.1 Multiple coordinate systems for the same deployment
2.1 Layers of abstraction for application development on WSNs
3.1 An overview of the ATaG syntax
3.2 Object tracking
3.3 Neighbor-to-neighbor gradient monitoring
3.4 Tree-based aggregation
3.5 Hierarchical data fusion
3.6 Wide-area data collection triggered by a local alarm
4.1 DART: The ATaG runtime system
4.2 UML class diagram: Supervisor (instance of UserTask)
4.3 ATaG code listing for the Monitor task in Figure 3.6
4.4 UML class diagram: DataPool
4.5 Structure of the data pool
4.6 Storing abstract task declarations in the TaskDeclaration class
4.7 Storing abstract channel information in the ChannelDeclaration class
4.8 Automatically generated section of the AtagManager constructor
4.9 UML class diagram: AtagManager
4.10 UML class diagram: NetworkStack
4.11 UML class diagram: NetworkArchitecture
4.12 UML class diagram: Dispatcher
4.13 Flow of control on a put() invocation
4.14 Data collection control flow at the sampler and collector nodes
5.1 Sensor network application
5.2 Modeling paradigm for the ATaG program (declarative)
5.3 Specifying annotations for tasks and data items
5.4 Modeling paradigm for the network
5.5 Abstract syntax: Temperature gradient monitoring
5.6 Concrete syntax: Temperature gradient monitoring
5.7 Abstract syntax: Object tracking by local leader election
5.8 Concrete syntax: Object tracking by local leader election
5.9 Library of ATaG programs (behaviors)
5.10 Composing ATaG programs from existing libraries
5.11 A library of deployments
5.12 A network of 9 nodes
5.13 Application as a set of behaviors mapped onto one deployment
5.14 Invoking the GME model interpreters
5.15 GME model interpreter: Network visualization
5.16 Automatically generated skeleton code for an abstract task
5.17 Simulation and visualization: Object tracking and gradient monitoring

Abstract

Networked sensor systems such as wireless sensor networks (WSNs) are a relatively new class of distributed computing systems that are closely coupled to the physical environment and provide embedded sense-and-response capabilities. For various reasons, programming the global behavior of such systems by individually configuring each sensor node is not feasible for complex applications. Macroprogramming models seek to provide high level abstractions for the application developer to specify collective behaviors that are then automatically compiled into low level operations at the individual node. This thesis proposes a data driven macroprogramming model called the Abstract Task Graph (ATaG) for architecture-independent development of networked sensing applications. Architecture independence allows application development to proceed prior to decisions being made about the target deployment, and also allows the same application to be automatically synthesized when needed onto different deployments. In ATaG, the application is modeled as a set of abstract tasks that represent types of information processing functions in the system, and a set of abstract data items that represent types of information exchanged between abstract tasks. Code that implements the actual information processing is provided by the user. Appropriate numbers and types of tasks can then be instantiated to match the actual network configuration, with each node incorporating the user-provided code, automatically generated glue code, and a runtime engine. The runtime engine called DART has a component-based software architecture, and manages all low level aspects of coordination and communication in the network. A prototype environment for visual programming and software synthesis has been implemented with an integrated functional simulation and visualization front end.

Chapter 1
Introduction

1.1 Networked sensor systems

A networked sensor system (a ‘sensor network’) is a distributed computing system where some or all nodes are capable of interacting with the physical environment. These nodes are termed sensor nodes, and the interaction with the environment is through sensing interfaces. Sensors typically measure properties such as temperature, pressure, humidity, flow, etc. The sensed value can be one-dimensional or multi-dimensional.
Sensor networks have a wide range of applications. Acoustic sensing can be used to detect and track targets in the area of deployment. Temperature, light, humidity, and motion sensors can be used for effective energy management through climate moderation in homes and commercial buildings.

Wireless sensor networks (WSNs) [46, 3, 18] are a new class of sensor networks, enabled by advances in VLSI technology, and comprised of sensor nodes with small form factors, a portable and limited energy supply, on-board sensing, computing, and storage capability, and wireless connectivity through a bidirectional transceiver. WSNs promise to enable dense, long lived embedded sensing of the environment. The unprecedented degree of information about the physical world provided by WSNs can be used for in situ sensing and actuation. WSNs can also provide a new level of context awareness to other back-end applications, making sensor networks an integral part of the vision of pervasive, ubiquitous computing – with the long term objective of seamlessly integrating fine grained sensing infrastructure into larger, multi-tier systems.

There has been significant research activity over the last few years in the system-level aspects of wireless sensing. System level refers to problems such as: (a) localization [43] and time synchronization [16, 17] to provide the basic ‘situatedness’ for a sensor node, (b) energy-efficient medium access protocols that aim to increase the system lifetime through means such as coordinated sleep-wake scheduling [60], (c) novel routing paradigms such as geographic [35, 49], data-centric [24], and trajectory-based [42] routing that provide the basic communication infrastructure in a network where the assignment and use of globally unique identifiers (such as the IP addresses of the Internet) is infeasible or undesirable, and (d) modular, component-based operating systems for extremely resource constrained nodes [29]. A variety of routing and data fusion protocols for generic patterns such as multiple-source single-sink data gathering trees are also being developed to optimize for a range of goodness metrics [32, 31, 61]. A comprehensive overview of the state of the art in system level aspects of wireless embedded sensing can be found in [33, 19].

1.2 Sensor networks and traditional distributed systems

It is instructive to compare and contrast the fundamental nature of networked sensing with traditional parallel and distributed computing, with a view to identifying the degree to which the research in the latter field over the past few decades can be leveraged (with or without modification) to propose solutions for analogous problems in the former. Since the primary focus of this work is on models and methodologies for programming of large scale networked sensor systems, the comparison is biased towards aspects which influence application development and not so much towards system level issues.

Sensor networks are essentially collections of autonomous computing elements (sensor nodes) that pass messages through a communication network, and hence fit the definition of a distributed computing system proposed in [8]. However, some of the fundamental differences between networked sensor systems and traditional distributed computing systems are as follows:
Transformational versus reactive processing. The primary reasons for programming applications for a majority of traditional distributed computing systems were “high speed through parallelism, high reliability through replication of process and data, and functional specialization” [8]. Accordingly, the objective of most programming models and languages was (i) to allow the programmer to expose parallelism for the compiler and run-time system to exploit, and (ii) to provide support for abstractions such as shared memory that hide the distributed and concurrent nature of the underlying system from the application developer. In other words, the purpose of most abstractions was to allow the programmer to still visualize the target architecture as a von Neumann machine, which provided an intuitive and straightforward mental model for reasoning about sequential problem solving. Alternate approaches such as dataflow and functional programming were also proposed, motivated by a belief in the fundamental unsuitability of the von Neumann approach for parallel and distributed computing [5]. Regardless of the approach, most parallel and distributed applications were ultimately transformational systems, characterized by a function that maps input data to output data. This function can be specified as a sequential, imperative program for a von Neumann architecture, and the purpose of parallelizing and distributing the execution over multiple nodes is mainly to reduce the total latency.

A networked sensor system is not a transformational system that maps a well defined set of input data to an equally well defined set of output data. Instead, like a majority of embedded software, it is a continuously executing and primarily reactive system that has to respond to external and internal stimuli [26]. An event of interest in the environment triggers computation and communication in the network. A quiescent environment ideally implies a quiescent network as far as application level processing is concerned.

Space awareness. An embedded sensor network can be considered to represent a discrete sampling of a continuous physical space. In fact, an abstract model of a distributed sensor network can be defined and analyzed purely in terms of measurements of the space being monitored [34], without any reference to the network architecture. In contrast to traditional distributed computing, where all compute nodes are basically interchangeable and the physical location of a particular computing element is not directly relevant from a programming or optimization perspective, space awareness [63] is an integral part of embedded networked sensing. Most of the data in a sensor network deployment is created through the act of sampling the sensing interface(s), and the time and location of the sampling is in most cases a necessary part of the description of the sampled data. The spatio-temporal origin of a data item also affects the quality and quantity of processing performed on it.

Space awareness implies the existence of a coordinate system in which sensor nodes can be situated. In fact, a typical sensor network deployment is likely to have more than one coordinate system, each designed for a different purpose. For instance, absolute or relative geographic coordinates might be required for tagging data samples at the node level, whereas the routing protocols could be using a different coordinate system that leads to reduced congestion and a higher probability of timely data delivery in the network. Yet another coordinate system could be used for back-end processing which maps a particular (x,y) coordinate to, say, a building, a corridor, or a warehouse, depending on the application domain. Figure 1.1 depicts three coordinate systems overlaid on the same sensor network.

[Figure 1.1: Multiple coordinate systems for the same deployment – 2D coordinates (10,40; 10,90; 200,180; 95,120), virtual topologies (Root-I, Root-II), and domain-specific labels (Office, Parking, Walkway).]
From the perspective of application development for networked sensor systems, a real or virtual coordinate system can be deemed an essential service included in the system level infrastructure, the details of which need not concern the programmer.

Another aspect of space awareness is that the application behavior can be specified more naturally in terms of spatial abstractions than in terms of nodes and edges of the network graph. For example, a temperature monitoring application can be specified as “if more than 70% of nodes within a 2 meter radius of any node report a temperature higher than 90 degrees, activate an alarm at that node location”. The deployment of the network itself can be specified in terms of the desired degree of coverage. The exact placement of sensor nodes might not be of interest to the application developers as long as the set of sensing tasks mapped onto a subset of those nodes at any given time collaboratively ensures the desired coverage. Space-aware specification of the desired functionality is a unique aspect of networked sensor systems that has no analogous equivalent in traditional parallel and distributed computing.

Nature of input data. A majority of the data in a networked sensor system represents the occurrence of events in the physical environment, and/or carries information about the events. Each data instance can be considered a first class entity with associated properties that could change with time and distance from its point of origin. For instance, in embedded sense-and-respond systems where sensing is coupled with local actuation and timely response to detected events is essential, the utility of the data that represents the occurrence of the event decreases with time. If the data is not processed by the application within a certain duration from its time of origin, it is effectively useless. In-network processing that seeks to move the computation close to the source of the data is required in many sensor network applications to guarantee the desired end-to-end functionality. This is in contrast to traditional distributed computing, where the distribution of data and placement of tasks on compute nodes is primarily determined by performance and reliability considerations.
Also, different subsets of the total data in the network will be of interest to different applications at a given time, or to the same application at different times. In a sensor network deployed for climate moderation in a commercial building, an application component that periodically logs all temperature readings in a central database might not be interested in the semantics of that information, whereas another application component that is responsible for maintaining a uniform climate could be interested in temperature gradients that are above a certain threshold. From a programming perspective, it is important to give application developers the freedom to define what is relevant and what is irrelevant, and to produce and consume data at the desired level of semantic abstraction. The semantics of data could also influence the protocols and services used for transporting data through the network, and for prioritizing in-network activities that are triggered in response to certain events. A piece of data that represents a catastrophic event such as a forest fire is much more important than any other data in the network at that time, and the computation and communication resources in the network can be expected to be devoted to expediting the transmission of the forest fire notification to its eventual destination. In a purely transformational system, however, it can be argued that the notion of importance of a particular piece of data does not really exist.

1.3 Macroprogramming: What and Why?

The primary focus of this dissertation is on the programming of large scale networked sensor systems. The purpose of the typical sensor network deployment is to gather and process data from the environment for a single ‘end to end’ objective. The program that executes on each node is part of a larger distributed application that delivers the results of an implicit or explicit domain specific query. Each node is required to be aware of its role in accomplishing the overall objective, i.e., it is required to implement a pre-defined protocol for information exchange within the network. Consider a sensor network deployed for object tracking. The desired result of the implicit and perennial domain specific query in this case is the current location of target(s) (if any) in the network. A node-centric approach to programming the network requires each node to be programmed with the following behavior: the acoustic sensor is sampled periodically with a fixed or varying frequency, a Fourier transform is applied to the time-domain samples, and the result is compared with a set of acoustic patterns of interest to the end user. If a match is found, the time- and location-stamped result is communicated to a designated ‘clusterhead’ node which performs further processing, such as line of bearing estimation, in an attempt to predict the location of the target.

This programming methodology, where the desired global application behavior is manually decomposed by the programmer and subsequently coded into individual node-level programs, is termed node-centric programming and is representative of the state of the art. Node-centric programming has several limitations. Manual translation of global behavior into local actions is likely to be time-consuming and error prone for complex applications. If a new global behavior is to be added to an existing program, the modifications to the existing code are essentially ad hoc. The strong coupling of application-level logic and system level services such as resource management, routing, localization, etc., also results in high coding complexity.

Macroprogramming broadly refers to programming methodologies for sensor networks that allow the direct specification of aggregate behaviors. The existence of a mechanism to translate the macroprogram into the ‘equivalent’ set of node-level behaviors is implicit. The exact interpretation of macroprogramming varies. A Regiment program specifies operations (such as fold and map) over sensor data produced by nodes with certain geographic or topological relationships of interest. Since these subsets of the global network state can be manipulated as a single unit, Regiment is a macroprogramming language. Kairos is a macroprogramming language because the programmer writes a single, centralized program for the entire network, and the compiler and runtime system are responsible for translating this program into node level behaviors and for implementing data coherence, respectively. TinyDB also enables macroprogramming because the programmer who formulates the SQL-like declarative aggregate query over sensor data is not responsible for (or even aware of) the details of the in-network processing that performs data collection and processing.

We define the following two types of macroprogramming that are supported by ATaG.

Application-level macroprogramming means that the programming abstractions should allow the manipulation of information at the desired level of semantic abstraction. The information may indicate the occurrence of an event and/or also carry information about the occurrence. For instance, in an object tracking application, the program should be able to access information such as ‘number of targets currently tracked’, ‘location of nearest target’, etc., without worrying about how that information is obtained.

Architecture-level macroprogramming means that the programming abstractions should allow concise specification of common patterns of distributed computing and communication in the network. Such patterns are represented as part of neighborhood libraries defined for node-centric programming methodologies [55]. These will typically have equivalent, concise abstractions in the macroprogramming language whose node-level implementation invokes the libraries.

A macroprogramming language can be application neutral or application specific. The application-specific approach entails customized language features to support a particular class of networked sensing applications. For example, a programming language explicitly designed for multi-target tracking might provide the current set of target locations or handles to the current targets as a language feature whose implementation is hidden from the user. A language for temperature monitoring might provide a topographic map of the terrain as a built-in data structure that is created and maintained entirely by the runtime system. The advantage of this approach is that the implementation of domain-specific features can be optimized based on a priori knowledge of the pattern of information flow. If domain-specific features are integrated into the language, the complexity of coding a behavior in that domain is also reduced. The drawback of this approach is that the portability and reusability of application-level code across network architectures, node architectures, and domains could be compromised. Also, adding new language features or modifying existing features might require a redesign of the runtime system, and could be impossible or difficult for the application developer.

1.4 Contributions and Outline

The two main contributions of this research are: (i) a programming model called the Abstract Task Graph (ATaG) for architecture-independent application development for a class of networked sensor systems, and (ii) a component-based software architecture for the runtime system. A third contribution is a prototype environment for visual programming in ATaG and automatic software synthesis for the target network deployment. The prototype compiler integrated into this environment is designed to demonstrate functionally correct synthesis of a subset of the program features and does not optimize for any performance related metrics. Indeed, the definition of the compilation problem in the context of ATaG and the design and implementation of optimizing compilers for different scenarios is a significant research problem in its own right and one of the main areas of future work.
The Abstract Task Graph (ATaG). ATaG is a macroprogramming model that builds upon the core concepts of data driven computing and incorporates novel extensions for distributed sense-and-respond applications. In ATaG, the types of information processing functionalities in the system are modeled as a set of abstract tasks with well-defined input/output interfaces. User-provided code associated with each abstract task implements the actual processing in the system. An ATaG program is abstract because the exact number and placement of tasks and the control and coordination mechanisms are not defined in the program but are determined at compile-time and/or run-time depending on the characteristics of the target deployment. Although ATaG is superficially based on the task graph representation, there are significant differences in syntax and semantics, which arise from the requirements of distributed networked sensing. The differentiating factors include the notion of ‘abstract’ tasks and data items, the use of data driven program flow semantics for the graph, the elevation of data items to first class entities in the graph representation along with the computational tasks, the concept of the spatial scope of a directed edge, etc.

There is a growing interest in defining macroprogramming languages [25, 44] and application development environments [12, 54] for sensor networks. ATaG enables a methodology for architecture-independent development of networked sensing applications. The same ATaG program may be automatically synthesized for different network deployments, or adapted as nodes fail or are added to the system. Furthermore, it allows application development to proceed prior to decisions being made about the final configuration of the nodes and the network, and, in future implementations, will permit dynamic reconfiguration of the application as the underlying network changes.

ATaG provides application-neutral support for macroprogramming. Using a small set of basic abstractions, ATaG allows programmers to define their own semantics for tasks and data items. The modularity and composability of ATaG programs means that a library of common behaviors in a particular domain can be defined by the programmer and can later be plugged into other applications that need not know the implementation details of the library components. This approach provides the benefits of using pre-defined domain-specific features while avoiding the restrictiveness of a domain-specific, custom built runtime system.

Data-Driven ATaG RunTime (DART). ATaG is supported by a runtime system called DART whose structure and function are not visible to the programmer. DART has a component-based software architecture for modularity and flexibility. Each component of DART provides one or more well-defined services to other components. The implementation of a service is hidden from the users of the service.
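As an informal illustration of this component-based decomposition, the following Java sketch lists service interfaces named after the DART components described in Chapter 4 (DataPool, AtagManager, NetworkStack, NetworkArchitecture, Dispatcher). Only the component names are taken from DART; the method signatures shown here are assumptions made for the sake of the example and are not the actual DART interfaces.

// Sketch of a component-based runtime decomposition in the spirit of DART.
// Component names follow Chapter 4; the methods shown are illustrative only.
interface DataPool {
    void put(String dataItemName, Object value);   // add an instance of a data item
    Object get(String dataItemName);               // read an instance of a data item
}

interface AtagManager {
    // Decide which user tasks become eligible when a data item is produced.
    void notifyDataProduced(String dataItemName);
}

interface NetworkStack {
    void send(int destinationNodeId, byte[] payload);  // node-to-node transport
}

interface NetworkArchitecture {
    int[] neighbors();          // e.g., nodes within one hop or within k meters
    double[] location();        // node coordinates in the shared coordinate system
}

interface Dispatcher {
    // Route a produced data item to local and/or remote consumers,
    // as dictated by the channel annotations of the ATaG program.
    void dispatch(String dataItemName, Object value);
}

Because each component is used only through such a service interface, an implementation can be replaced (for example, a different routing substrate behind NetworkStack) without affecting user task code or the other components.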
DART is also being implemented on the C/OS-II real-time OS kernel [41] which has been ported to a vast number of devices. The performance of DART is unlikely to compare favorably with hand-optimized runtime systems where different functionalities are tightly integrated into an inflexible, monolithic structure and many cross-layer optimizations are incorporated into the design. However, the tradeoff between usability and flexibility on one hand, and hand-optimized performance on the other is common in all methodologies that seek to automate the design of complex systems. A greater level of experience with implement- ing different applications on a real DART-based system will guide future design choices for the ATaG runtime. Software synthesis. In the context of the ATaG-based programming framework, software synthesis is the process of generating code for each node of the target sensor network deployment for the selected ATaG program. The code that is associated with each application-level functionality (abstract task) is to be provided by the programmer. The task of the software synthesis process is to generate the remainder of the software that is responsible for coordination and communication between the abstract tasks. To ease the task of software synthesis, we designed DART such that a majority of the code base is either agnostic to the application level functionality or can be customized by means of a configuration file that is generated by the software synthesizer. As an example, approximately 3000 lines of Java code runs on each sensor node in the ATaG program for object tracking (Section 3.5.1) - of which only 100 lines are actually provided by the application developer. The rest is comprised of DART code that is used essentially unchanged and some glue code that is generated by the software synthesizer. The newly generated glue code is only 15 lines of Java that basically embeds the declarative part of the ATaG program into the runtime system, and a one-line configuration file for each node in the target network that provides some state information to govern the node’s behavior during the simulation. 11 Outline. The core ideas of ATaG have been individually explored in different contexts in the parallel and distributed computing community. There are also other approaches to the problem of macropro- gramming of sensor networks being explored in the sensor networking community. Chapter 2 discusses some of this related work. Chapter 3 presents the ATaG programming model in detail with a description of a syntax and semantics of ATaG program. A set of programming idioms are also provided to illustrate the formulation of oft cited behaviors in sensor networking as ATaG programs. The design of the DART runtime system is the subject of Chapter 4, which describes the service provided by each of the DART components, the interactions between the various components, and implementation notes. Chapters 3 and 4 also include a discussion of future research directions in the context of the programming model and the design of the runtime system respectively. Chapter 5 presents the visual programming and software synthesis environment for ATaG. A brief primer on the Generic Modeling Environment [22] precedes the discussion of the various modeling paradigms that are provided to the application devel- oper. A case study is included to illustrate the development of an application consisting of two behaviors - object tracking and environment monitoring - using this programming environment. 
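To give a flavor of what such generated glue code might look like, the fragment below embeds the declarative part of a hypothetical two-task program into the runtime at startup. The class names TaskDeclaration and ChannelDeclaration echo the classes described in Chapter 4 (Figures 4.6 and 4.7), but the fields, constructors, task names, and annotation values shown here are assumptions for illustration only, not the synthesizer's actual output.

// Hypothetical synthesizer output: embedding the declarative part of an
// ATaG program (tasks, data items, channels) into the runtime at startup.
// Constructor signatures and annotation values are illustrative.
final class TaskDeclaration {
    final String name; final String firingRule;
    TaskDeclaration(String name, String firingRule) {
        this.name = name; this.firingRule = firingRule;
    }
}

final class ChannelDeclaration {
    final String task; final String dataItem; final boolean isInput; final String scope;
    ChannelDeclaration(String task, String dataItem, boolean isInput, String scope) {
        this.task = task; this.dataItem = dataItem; this.isInput = isInput; this.scope = scope;
    }
}

final class GeneratedProgramDeclarations {
    static TaskDeclaration[] tasks() {
        return new TaskDeclaration[] {
            new TaskDeclaration("Sampler", "periodic:10s"),
            new TaskDeclaration("Monitor", "any-data"),
        };
    }
    static ChannelDeclaration[] channels() {
        return new ChannelDeclaration[] {
            new ChannelDeclaration("Sampler", "Temperature", false, "local"),
            new ChannelDeclaration("Monitor", "Temperature", true, "1-hop"),
        };
    }
}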
Outline. The core ideas of ATaG have been individually explored in different contexts in the parallel and distributed computing community. There are also other approaches to the problem of macroprogramming of sensor networks being explored in the sensor networking community. Chapter 2 discusses some of this related work. Chapter 3 presents the ATaG programming model in detail, with a description of the syntax and semantics of an ATaG program. A set of programming idioms is also provided to illustrate the formulation of oft-cited behaviors in sensor networking as ATaG programs. The design of the DART runtime system is the subject of Chapter 4, which describes the service provided by each of the DART components, the interactions between the various components, and implementation notes. Chapters 3 and 4 also include a discussion of future research directions in the context of the programming model and the design of the runtime system, respectively. Chapter 5 presents the visual programming and software synthesis environment for ATaG. A brief primer on the Generic Modeling Environment [22] precedes the discussion of the various modeling paradigms that are provided to the application developer. A case study is included to illustrate the development of an application consisting of two behaviors - object tracking and environment monitoring - using this programming environment. We conclude in Chapter 6.

Chapter 2
Programming of Distributed Sensor Networks

2.1 Wireless sensor networks

Figure 2.1 depicts our view of the emerging layers of programming abstraction for networked sensor systems. Many protocols have been implemented to provide the basic mechanisms for efficient infrastructure establishment and communication in ad hoc deployments. These include energy-efficient medium access, positioning, time synchronization, and a variety of routing protocols such as data centric and geographic routing that are unique to spatial computing in embedded networked sensing. Ongoing research, such as MiLAN [28], is focusing on sensor data composition as part of the basic infrastructure. Sensor data composition essentially means that the responsibility of interfacing with physical sensors and aggregating the data into meaningful application-level variables is delegated to an underlying runtime instead of being incorporated as part of the application-level logic. We now discuss the layers of abstraction from the highest level of abstraction to the lowest.

[Figure 2.1: Layers of abstraction for application development on WSNs – network deployment; sensor interfaces; sensor data composition; positioning, time synchronization, MAC; data-centric/geographic routing; node-centric programming models; libraries and middleware services (logical neighborhood maintenance, event addressing, logical namespaces); macroprogramming; selection, composition, optimization, deployment; service-oriented application specification.]

2.1.1 Service-oriented specification

To handle the complexity of programming heterogeneous, large-scale, and possibly dynamic sensor network deployments, and to make the computing substrate accessible to the non-expert, the highest level of programming abstraction for a sensor network is likely to be a purely declarative language. The Semantic Streams markup and query language [57] is an example of such a language that can be used by end users to query for semantic information without worrying about how the corresponding raw sensor data is gathered and aggregated. The basic idea is to abstract the collaborative computing applications in the network as a set of services, and to provide a query interpretation, planning, and resource management engine to translate the service requirements specified by the end user into a customized distributed computing application that provides the result. A declarative, service-oriented specification allows dynamic tasking of the network by multiple users and is also easier to understand compared to low level distributed programming.

2.1.2 Macroprogramming

The objective of macroprogramming is to allow the programmer to write a distributed sensing application without explicitly managing control, coordination, and state maintenance at the individual node level. Macroprogramming languages provide abstractions that can specify aggregate behaviors that are automatically synthesized into software for each node in the target deployment. The structure of the underlying runtime system will depend on the particular programming model. While service-oriented specification is likely to be invariably declarative, various program flow mechanisms - functional, dataflow, and imperative - are being explored as the basis for macroprogramming languages.
Regiment [44] is a declarative functional language based on Haskell, with support for region-based aggregation, filtering, and function mapping. Kairos [25] is an imperative, control-driven macroprogramming language for sensor networks that allows the application developer to write a single centralized program that operates on a centralized memory model of the sensor network state. ATaG [6] (discussed in more detail in the next section) explores the data flow paradigm as a basis for architecture-independent programming of sensor network applications.

2.1.3 Node-centric programming

In node-centric programming, the programmer has to translate the global application behavior into local actions on each node, and individually program the sensor nodes using languages such as nesC [20], galsC [13], C/C++, or Java. The program accesses local sensing interfaces, maintains application level state in the local memory, sends messages to other nodes addressed by node ID or location, and responds to incoming messages from other nodes. While node-centric programming allows manual cross-layer optimizations and thereby leads to efficient implementations, the required expertise and effort make this approach impractical for developing sophisticated application behaviors for large-scale sensor networks.

The concept of a logical neighborhood – defined in terms of distance, hops, or other attributes – is common in node-centric programming. Common operations upon the logical neighborhood include gathering data from all neighbors, disseminating data to all neighbors, applying a computational transform to specific values stored in the neighbors, etc. The usefulness and ubiquity of neighborhood creation and maintenance has motivated the design of node-level libraries [56, 55] that handle the low level details of control and coordination and provide a neighborhood API to the programmer.

Middleware services [28, 39, 62] also increase the level of programming abstraction by providing facilities such as phenomenon-centric abstractions. Middleware services could create virtual topologies such as meshes and trees in the network, allow the program to address other nodes in terms of logical, dynamic relationships such as leader-follower or parent-child, support state-centric programming models [37], etc. The middleware protocols themselves will typically be implemented using node-centric programming models, and could possibly but not necessarily use communication libraries as part of their implementation.

2.2 Parallel and distributed computing

ATaG allows programmers to write architecture-independent networked sensing applications using a small set of application-neutral abstractions. Intuitive expression of reactive processing is accomplished in ATaG by using a data driven paradigm, while architecture-independence is made possible through separation of functional concerns from the non-functional. These two core ideas have been explored in the distributed computing community. The Data Driven Graph [53] extended the basic directed acyclic task graph model to support loop representation and dynamically created tasks in parallel programming. The use of data driven semantics coupled with the task graph-like representation enabled clarity and simplicity of program design, and also allowed for some optimizations relating to the data communication between tasks.
The benefits of separating the core application functionality from other concerns such as task placement and coordination motivated the FarGo [30] model, which enabled dynamic layout of distributed applications in large scale networks where the capabilities of nodes and links could vary at run time. By explicitly indicating co-location and re-location semantics of the tasks, FarGo improved the performance and reliability of applications by allowing layout decisions to be deferred to run time. Distributed Oz [27] is perhaps the closest to ATaG in terms of its objective of network transparency and network awareness. Distributed Oz cleanly separates the application functionality from aspects of distribution structure, fault tolerance, resource control and security, and openness. There are no explicit operations to transfer data across the network. All invocations of send() and receive() are done implicitly through language constructs of centralized programming. IBM’s PIMA project [9] explored a “write once, run anywhere” model for application front-ends by specifying device-specific presentation hints separately from the tasks and their interactions – yet again highlighting separation of functional and non-functional concerns as the key enabler of architecture independence.

A tuple space is an abstract computation environment that represents a global communication buffer accessible to computational entities in the system. This was the basis for the generative communication model in the Linda coordination language [21] and is also being applied in networked sensing [14]. Communication orthogonality is a property of generative communication and means that both the sender and the receiver of a message are unaware of each other. ATaG also has this property because the tasks that produce and consume a particular data item in ATaG are not aware of each other. The data pool in ATaG is superficially similar to the notion of a tuple space. However, our active data pool moves data items from producer to consumer(s) as soon as they are produced, and schedules the consumer tasks based on their input interfaces and firing rules. This is different from the passive tuple space that merely buffers the produced data items and whose modifications are really a side effect of control-driven task execution. In fact, the concept of tuple spaces has its roots in the Blackboard architectures [45] of AI research. ATaG’s active data pool is similar to the ‘Demoned data servers’ of DOSBART [36], which enabled distributed data driven computation in a blackboard architecture. The notions of activity classes and trigger activities in DOSBART are similar to the abstract tasks and their firing rules in the ATaG model, respectively.

Chapter 3
The Abstract Task Graph

3.1 Target applications and architectures

ATaG is not designed for a particular sensor node platform, network architecture, or application domain. We model the deployment as a distributed system consisting of a set of autonomous elements (sensor nodes). Each element of the system has on-board computation and storage capability, and can communicate with the rest of the elements through one or more neighbors. In addition, each element may be equipped with one or more types of sensing or actuation mechanisms that can be controlled through software. Since situatedness (localization) is fundamental to embedded networked sensing, we assume that each element is capable of determining its own location in some shared coordinate system and/or namespace.
The programming model makes no assumptions about the communication interface (wired or wireless), or about the computation, storage, and energy resources available to a node. Of course, the resources at a node will constrain the number of tasks that can be mapped onto it, the latency of communication could be affected by the available bandwidth between the node and its neighbors, and the type of energy resources available could also affect system-wide performance. This analysis is expected to be performed at compile time in the context of a specific network architecture, and the suitability of an ATaG program for a particular architecture is not meant to be inherent in the program itself. Thus, the target system can encompass a heterogeneous collection of micro-sensor nodes such as the Motes, more capable nodes such as the Stargate, and even desktop PCs or servers connected to the internet.

ATaG also makes no assumptions about the mobility of nodes or other factors that could lead to changes in network topology at run time. The interpretation of program elements will depend on the nature of the target deployment, but the definition of the features of the programming model is independent of such assumptions. For example, an ATaG programmer can specify the instantiation density of an application level task and state, say, that one instance of task A should be instantiated per square meter of the deployment. In this case, if the nodes are mobile, the runtime system is expected to be capable enough to detect situations when this requirement is no longer satisfied and take corrective measures, such as reassigning tasks to nodes in a way that the expected density is once again achieved. If the nodes are immobile, the initial task assignment at compile time can be expected to be valid until other factors such as energy depletion necessitate reassignment. In this example, the application developer does not care about the static or dynamic nature of the deployment as long as the high-level application requirements as expressed through an ATaG program are met. More important, keeping the programming model free of such assumptions also adds to the architecture independence of the application. Of course, this does not preclude ATaG programs from being designed for specific types of deployments, but the programming model itself is designed for a range of network architectures, with the job of deployment-specific customization largely delegated to the compilation process and to the protocols and services incorporated into the underlying runtime system.

ATaG programs are data driven, which means that tasks are scheduled when their data is available (possibly also subject to other firing rules). Tasks interact only with the data pool, and one task cannot directly control other tasks. This lack of application-level control over task scheduling and execution (which is entirely managed by the underlying runtime system) limits the applicability of ATaG to scenarios where such fine grained control over node level execution is not required. Low duty cycle environment monitoring that requires periodic network-wide data collection, with or without in-network aggregation, is an example of an application that can be programmed in ATaG. On the other hand, if an application requires strict latency guarantees on critical paths from sensing to actuation, a control-driven programming language such as Kairos [25] may be better suited than the data driven semantics of ATaG.
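To make the notion of per-task annotations such as instantiation density concrete, the sketch below encodes the "one instance of task A per square meter" requirement as data that a compiler or runtime could inspect. The class and the annotation vocabulary are hypothetical stand-ins for the actual abstract task annotations listed in Table 3.1.

// Illustrative encoding of abstract-task annotations (cf. Table 3.1).
// The annotation names and values shown here are hypothetical.
final class AbstractTaskAnnotations {
    enum Placement { ONE_PER_NODE, ONE_PER_AREA, AT_DESIGNATED_NODE }

    final String taskName;
    final Placement placement;
    final double instancesPerSquareMeter;  // meaningful only for ONE_PER_AREA

    AbstractTaskAnnotations(String taskName, Placement placement, double density) {
        this.taskName = taskName;
        this.placement = placement;
        this.instancesPerSquareMeter = density;
    }

    // "one instance of task A per square meter of the deployment"
    static AbstractTaskAnnotations exampleTaskA() {
        return new AbstractTaskAnnotations("A", Placement.ONE_PER_AREA, 1.0);
    }
}

A compiler could use such annotations to compute an initial task-to-node mapping for an immobile deployment, while a runtime on a mobile deployment could re-evaluate them as the topology changes.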
3.2 Key Concepts

ATaG is based on two key concepts: data driven program flow, which enables intuitive expression of reactive processing in the network and leads to modular, composable, and reusable programs, and mixed imperative-declarative program specification, which separates the functional and non-functional aspects of the application and provides architecture independence, spatial awareness, and network awareness. We discuss these concepts in more detail in the following subsections.

3.2.1 Data Driven Computing

3.2.1.1 Program flow mechanisms

Three basic program flow mechanisms being explored in the context of programming of networked sensor systems are: control driven, data driven, and demand driven.

In control driven program flow, instructions are executed in an explicitly specified order. An example of this is the well known von Neumann architecture, where the program counter is incremented (or otherwise modified) after every execution and the next instruction in the sequence is decoded and executed. The single thread of control passes from instruction to instruction, and the modifications to the data store are a side effect of instruction execution. Data is passed indirectly between instructions by means of referencing common memory locations. In parallel forms of control flow, there are multiple threads and mechanisms such as fork and join for coordination between the threads. Imperative languages such as C are representative of control driven programming. Paradigms such as object oriented programming, distributed programming through message passing, etc., provide ways to structure complex control driven programs to make them easier to design, maintain, and/or deploy, but the basic model of a set of ‘active’ instructions manipulating a (conceptually) shared ‘passive’ data store remains unchanged.

Data driven program flow is fundamentally different from control driven flow in the following aspects. First, the flow of control is governed by data dependencies and is not determined by an explicitly specified sequence of tasks/instructions to be executed. Tasks are defined in terms of their input and output data items. In the basic dataflow model, an instruction is considered to be enabled when its operands are ready, and the program terminates when no instructions are enabled. Data dependence is the sole means of task scheduling and also of synchronization. Second, data is explicitly passed between tasks. There is a data pool abstraction that tasks write to and read from, but the concept of indirect sharing of data through referencing common locations (shared variables) in the data pool does not exist. Data flow programs are commonly expressed as directed graphs where the nodes of the graph correspond to tasks (instructions) and the directed arrows denote data dependencies between tasks.
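The contrast with control driven flow can be seen in a few lines of Java: in the sketch below, nothing calls the task directly; it is executed by a deliberately simplistic, single-node, hypothetical data pool as soon as the data item it depends on is produced. This toy example is not the ATaG data pool; it only illustrates that the availability of data is what drives scheduling.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Minimal illustration of data driven flow: producing a data item is what
// causes dependent tasks to execute. This toy pool is single-node and
// ignores firing-rule variants, remote consumers, and concurrency.
final class ToyDataPool {
    interface Task { void fire(Object input); }

    private final Map<String, List<Task>> consumers = new HashMap<>();

    void declareConsumer(String dataItemName, Task task) {
        consumers.computeIfAbsent(dataItemName, k -> new ArrayList<>()).add(task);
    }

    void put(String dataItemName, Object value) {
        // Availability of data, not an explicit call by another task,
        // is what schedules the consumers.
        for (Task t : consumers.getOrDefault(dataItemName, List.of())) {
            t.fire(value);
        }
    }

    public static void main(String[] args) {
        ToyDataPool pool = new ToyDataPool();
        pool.declareConsumer("Temperature",
                v -> System.out.println("Monitor fired with reading " + v));
        pool.put("Temperature", 23.5);   // producing the item triggers the consumer
    }
}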
The term ‘event driven processing’ is used in the sensor network community, specifically in the context of the TinyOS operating system for the Berkeley Motes. Event-driven means that processes need not poll or block for input, consuming valuable system resources while doing so. In networked sensor systems where certain kinds of events might be very rare compared to the frequency of polling, such behavior is wasteful. Instead, the event driven philosophy allows the process to sleep until its required trigger input is available and be woken up (activated) at the suitable time. Programming with the nesC language qualifies as event-driven programming because the program is basically structured as a set of modules with well defined interfaces that can be invoked by other modules to request a service (“commands”) or act as a callback to the caller module to indicate completion of the service (“events”). The event-driven execution in this context is essentially control driven program flow where the events correspond not to the availability of input data for a particular module, but to the invocation of an asynchronous function call by another module. The transfer of data between modules (if any) is hidden in the arguments to the function being invoked. The core of the operating system is just a scheduler, and there is no active data store that spawns tasks based on their firing conditions.

The tuple space is another abstraction that is superficially similar to data driven program flow but, at least as used in the Linda coordination language, is basically a mechanism for spatially and temporally decoupled sharing of data among multiple processes in a control driven distributed program. A tuple space is a shared, associative memory maintained by an underlying run time system. Although the shared memory abstraction reduces the complexity of distributed programming compared to message passing, location-based addressing of the shared memory is cumbersome for a variety of reasons. Instead, processes add ‘tuples’ to the shared memory by means of the out() primitive, and read tuples by means of the in() primitive. Tuples are typed groupings of relevant fields that are addressed not by their location in the logically shared memory but by their content and type. Since the reads and writes are directed at the tuple space and not at other processes, programs gain modularity and extensibility. The tuple space can be considered as just another form of shared memory in a control driven program flow, because the thread of control is very much in the processes themselves and not determined by the contents of the tuple space. Like the event driven programming of nesC/TinyOS, which eliminates the need for polling or blocking and thereby makes control-driven programming more efficient, mechanisms such as the notify() primitive of JavaSpaces have been defined for the tuple space abstraction. However, just as event-driven execution does not make nesC a data driven language, the addition of notify() to tuple spaces does not make them a data driven paradigm, although the other benefits of tuple spaces make them a promising approach for sharing information in highly distributed and dynamic systems such as sensor networks. One of the many extensions to the basic Linda model that have been proposed over the past couple of decades is Lime, which, among other extensions, adds the concept of a reaction: a method to be executed when a tuple matching a particular pattern is found among the contents of the tuple space. An overview, classification, and analysis of approaches to embed reactive processing in shared dataspaces can be found in [11].
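The difference can be made concrete with a toy generative-communication buffer, sketched below in the style of Linda's out() and in(). It is not JavaSpaces or any real Linda implementation; the point is that the buffer only stores tuples, and a consumer must actively call in() (here, by polling), so the thread of control stays with the processes rather than with the data store, unlike the active data pool sketched earlier.

import java.util.ArrayList;
import java.util.List;

// Toy tuple space: out() writes a tuple, in() performs a destructive,
// type-matched read. Purely illustrative; no blocking, templates, or
// distribution as in a real Linda or JavaSpaces system.
final class ToyTupleSpace {
    private final List<Object[]> tuples = new ArrayList<>();

    synchronized void out(Object... tuple) {        // write a tuple
        tuples.add(tuple);
    }

    synchronized Object[] in(String type) {         // remove and return a matching tuple
        for (int i = 0; i < tuples.size(); i++) {
            if (type.equals(tuples.get(i)[0])) {
                return tuples.remove(i);
            }
        }
        return null;  // no match; a real in() would block instead of returning
    }
}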
In reduction programs, there is typically no concept of a storage location that can be read and written. All program structures are expressions. When a program is expressed as a function whose arguments in turn can be functions themselves, the programmer is describing the solution space without specifying the exact sequence of instruction execution required to arrive at a solution. Regiment [44] is a functional language based on Haskell that exploits the declarative nature of functional programming to simplify the task of collaborative computing in networked sensor systems.

3.2.1.2 Why data driven?

The individual sensor node will typically have a traditional, sequentially programmable von Neumann or Harvard architecture, and support for one or more control-driven, imperative languages such as C. At the system level, which is the domain of macroprogramming, there are different ways of modeling the collection of von Neumann architectures that forms the overall computing substrate. One approach is to extend the node-level programming paradigm to encompass the entire system, and model the sensor network as a single processing element and a single centralized memory [25]. The von Neumann model can also be abandoned at the system level altogether and the macroprogramming language can be based upon an alternate paradigm such as functional programming [44]. ATaG explores the dataflow paradigm for the following reasons.

Reactive processing. A sensor network application can be intuitively modeled as a set of node-level or system-level responses to node-level or system-level events. Events will be defined by the application developer at desired levels of semantic abstraction, based on the application domain. An event could indicate the occurrence of phenomena in the physical environment (physical event) or the execution of a particular phase of processing in the network (computational event). In addition to denoting occurrence, the event could also carry information about the phenomena in the former case, and the results of intermediate computation in the latter. Similarly, a reaction to an event could involve a sequence of computation and communication involving one or more nodes of the network. Data driven programming is especially suited for expressing reactive applications. A data driven program consists of a set of tasks with well defined input and output interfaces. In the pure data driven model, a task is executed only when all of its inputs are available. However, variants of the basic model (including our variant in ATaG) allow the definition of firing rules that can be used to define the triggering condition of a task. For instance, a task could be triggered when a specific input is available, or when any one of its inputs is available, or when a certain fraction of its inputs are available. These basic rules can be used to define complex behaviors, as will be illustrated in Section 3.5. Also, tasks are disjoint from each other in the sense that all interaction between tasks is indirect - through the production and consumption of data items. Since tasks are decoupled, a given task can be defined to use data items at the desired level of semantic abstraction without having to worry about how they are produced. This supports application level macroprogramming.

Reusability and composability. Modularity, reusability, and composability are important non-functional requirements for sensor network applications.
Ultimately, we envision our programming model to be integrated into an application synthesis framework similar to the vision of service-oriented program composition [38]. Macroprograms will be generated automatically from a high level declarative specification and in turn compiled into node-level specifications. Modularity and composability enable the creation of libraries of commonly encountered behaviors and allow existing applications to be suitably reused as subsets of larger functionalities.

In control driven distributed programming using message passing or other communication libraries, tasks explicitly invoke each other's services. Since this requires a task to have information about the other tasks it communicates with, any modification to a task is likely to affect other tasks in the program. Also, if a new task (functionality) is added to the program, all tasks that are to take advantage of that functionality must be modified to incorporate the suitable calls to the newly added task. This tight coupling of task interfaces restricts the reusability of code and composability of programs. In data driven programming, however, task interfaces are specified as "Task A reads data item Temperature and produces data item Alarm", or "Task B reads data item Temperature and produces data item Maximum". Suppose a new functionality is to be added to this temperature monitoring program. The purpose of this new task is to corroborate the readings from a wider area around the node that produced the alarm and produce another 'verified alarm' based on the results. In data driven programming, all that is required is to simply define a new task as "Task C reads data items Alarm and Temperature and produces data item VerifiedAlarm". The representation of the spatial aspect of this processing - specifically the collection of data from the neighborhood - will be discussed in the next section. The emphasis here is on the fact that the addition of Task C does not change the existing tasks in any way. Also, Task C does not care about how the Alarm is produced by Task A. The new program is simply a concatenation of the three tasks, and their mutual dependency is implicit in their input and output interfaces defined in terms of data items. Such composability of ATaG programs is illustrated in Section 5.4.

3.2.2 Mixed Imperative-Declarative Specification

Imperative programming is a programming paradigm where computation is specified in terms of statements (commands) that are to be executed in sequence and that change the program state. Almost all processors are designed to execute imperative programs and the program state at any given time is represented by the contents of the processor memory at that time. Since imperative programming requires the programmer to specify the 'how' of computation in detail, the advantage of intimate control over program execution is offset by the programming complexity, especially for large scale and/or distributed systems. High level procedural languages and object oriented languages provide constructs such as objects that ease the task of writing complex imperative programs but the basic paradigm remains unchanged. nesC [20] and Kairos [25] are examples of imperative programming languages for sensor network applications. Declarative programming, in contrast, focuses on the 'what' of computation, leaving the 'how' unspecified.
A declarative program can be viewed as the description of a solution space where the sequence of steps to arrive at the solution is left to some underlying interpreter. Functional programming and logic programming are examples of declarative programming. The major advantage of declarative programming from an application development perspective is the reduced complexity of programming that results from delegating most of the selection and synthesis of underlying mechanisms to an unspecified interpreter, while the application developer focuses primarily on formulating the solution space. Regiment [44], TinyDB [40], and Semantic Streams [57] are examples of the declarative programming paradigm for sensor network applications.

Now, the functional aspect of a sensor network application refers to the code (tasks) that runs on the individual sensor nodes and performs data processing. Examples of non functional aspects are task placement and mechanisms for communication and coordination. Consider a simple application where a collector task running on a designated root node periodically receives and logs temperature readings from every node in the network. The functional aspects of this application are completely defined by the code that performs the sampling and the code that performs the logging. As long as there is a mechanism to (i) ensure the placement of one sampling task on each node of the network and one logging task on the root node, (ii) periodically execute the sampling task, and (iii) route the sampled data from its point of origin to the root node, the details of its implementation should not be the application developers' concern.

The ATaG programming paradigm is based on the observation that specification of the functional aspects of the networked sensing application in an imperative style and the non functional aspects in a declarative style affords a tradeoff between the need for control over application execution and the need to reduce the complexity of communication and coordination. The latter is a substantial fraction of a networked sensing application and can really be considered as a service offered by the system instead of an integral and integrated part of the application code. More importantly, ATaG enables architecture independence by clearly separating the "when and where" of processing from the "what". The former constitutes the declarative part and is specified through parameterized spatial and temporal attributes for a generic network architecture. The latter constitutes the imperative part and is the actual task code supplied by the user. The same program can be compiled for a different network size and topology by interpreting the declarative part in the context of that network architecture while the imperative part remains unchanged.

3.3 Syntax

The task graph is a widely used application model. In the task graph notation, the overall computation is represented as an acyclic directed graph. The nodes of the graph correspond to processes (tasks), and a pair of distinct tasks are connected by a directed edge iff the task at the head of the directed edge requires as input the results of execution of the task at its tail. In the simplest model, a task cannot start executing till all its predecessors have finished execution. For transformational applications, the task graph exposes the potential for concurrent execution of tasks and is widely used in task scheduling and allocation [2, 47].
The task graph is also commonly annotated/extended with other information relevant to the problem domain; e.g., the conditional task graph for low power embedded system synthesis [58], the augmented task dependency graph [48] for automated software partitioning and mapping for distributed multiprocessors, the iterative task graph for representing loops [59], etc. Annotation of paths in the task graph with throughput and latency constraints has been employed for resource allocation in distributed sensor-actuator systems [4].

The ATaG model of a program is similar to the task graph model in that the application is represented as a set of tasks and a set of data items connected via directed arrows denoting the input or output relationship between a task and a data item. Tasks and arrows (called 'channels' in ATaG) also have associated annotations that determine the translation of the architecture-independent ATaG program in the context of a particular network deployment. An ATaG program is a set of abstract declarations. An abstract declaration can be one of three types: abstract task, abstract data, or abstract channel. Each abstract declaration consists of a set of annotations. Each annotation is a 2-tuple where the first element is the type of annotation, and the second element is the value. Hereafter, we occasionally omit the word 'abstract' for the sake of brevity.

Figure 3.1: An overview of the ATaG syntax (abstract task annotations for placement and firing rules; abstract channel annotations for initiation and interest)

Figure 3.1 provides a general overview of the ATaG syntax and the broad classification of the annotation types currently supported. The task annotations relate to the placement and firing rules of tasks, while the channel annotations are used to specify different types of 'interests' in instances of the associated abstract data item. Support for task placement based on compile-time or run-time availability of resources or on the desired degree of coverage (for sensing tasks) is not yet implemented in the prototype ATaG programming environment, and is hence italicized in the figure. The set of annotations is open ended - more types can be defined based on the target class of applications, the hardware architecture of the sensor node, and the capabilities of the runtime system.

Type: Instantiation
Value[:parameter] | Description
one-on-node-ID:id | Create one instance of the task on node id.
one-anywhere | Create one instance of the task on any node in the network.
nodes-per-instance:[/]n | Create one instance of the task for each n nodes of the network. When n is preceded by a "/", create exactly n instances of the task and divide the total number of nodes into n non-overlapping domains, each owned by one instance.
area-per-instance:[/]area | Same as nodes-per-instance, except that the parameter denotes area of deployment instead of number of nodes; the non-overlapping domains are in terms of area of deployment, not number of nodes.
spatial-extent | Create one instance of the task on every node that is deployed in the polygon defined by the co-ordinates given as the parameter.

Type: Firing rule
Value[:parameter] | Description
periodic:p | Schedule task for periodic execution with a period of p seconds.
any-data | Schedule task for execution when at least one of the input data items is available.
all-data | Schedule task for execution only when all the input data items are available.

Table 3.1: Abstract task: Annotations

Abstract task: Each abstract task declaration represents a type of processing that could occur in the application. The number of instances of the abstract task existing in the system at a given time is determined in the context of a specific network description by the annotations associated with that declaration. Each task is labeled with a unique name by the programmer. Associated with each task declaration is an executable specification in a traditional programming language that is supported by the target platform. Table 3.1 describes the annotations that can be associated with a task declaration in the current version of ATaG.

Type: Initiation
Value | Description
push | The runtime system at the site of production of each instance of the associated abstract data item is responsible for sending the instance to nodes hosting suitable instances of the consumer task(s).
pull | The runtime system at the node hosting an instance of the consumer task is responsible for requesting the required instance(s) of the associated abstract data item from the site(s) of production.

Type: Interest
Value[:parameter] | Description
local (optionally preceded by a negation qualifier) | Channel applies to the local data pool of the task instance. The negation qualifier excludes the local data pool, and can be used in conjunction with other qualifiers.
neighborhood-hops:n | Channel includes all nodes within the n-hop neighborhood of the node hosting the task instance.
neighborhood-distance:d | Channel includes all nodes within a distance d of the node hosting the task instance.
neighborhood-nearest:k | Channel includes the k nodes nearest in terms of distance to the node hosting the task instance.
instances-nearest:k | Specified ONLY for an input channel, this means that each instance of the data item associated with the input channel should be sent to the k nearest instances of the abstract task associated with the input channel.
all-nodes | Channel includes all nodes in the system.
domain | Channel includes all nodes that are owned by the task instance. This value is used in conjunction with the nodes-per-instance or area-per-instance values of the Instantiation annotation of the abstract task.
parent | Channel applies to the parent of the node hosting the task instance - in the virtual tree topology imposed on the network by the runtime system.
children | Channel applies to all children of the node hosting the task instance - in the virtual tree topology imposed on the network by the runtime system.

Table 3.2: Abstract channel: Annotations

Abstract data: Each abstract data declaration represents a type of application-specific data object that could be exchanged between abstract tasks. ATaG does not associate any semantics with the data declaration. The number of instances of a particular type of data object in the system at a given time is determined by the associated annotations in the context of a specific deployment and depends on the instantiation and firing rules of tasks producing or consuming the data objects. Each data declaration is labeled with a unique name. Similar to the executable code associated with the task declaration, an application-specific payload is associated with the data declaration. This payload typically consists of a set of variables in the programming language supported by the target platform. No other annotations are currently associated with abstract data items.
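As an illustration of such a payload, the following is a minimal sketch of what a Temperature data item (used in the examples of Section 3.5) might look like in the Java-based prototype. The class and field names are hypothetical and chosen only for illustration; the assumption that the payload is serializable for transmission between nodes is ours and is not mandated by the model.

// Hypothetical payload class for an abstract data item named Temperature.
// ATaG attaches no semantics to these fields; they are application-defined.
public class Temperature implements java.io.Serializable {
    private final double value;        // sampled temperature reading
    private final int originNodeId;    // node on which this instance was produced
    private final long timestamp;      // time of sampling, in milliseconds

    public Temperature(double value, int originNodeId, long timestamp) {
        this.value = value;
        this.originNodeId = originNodeId;
        this.timestamp = timestamp;
    }

    public double value() { return value; }
    public int originNodeId() { return originNodeId; }
    public long timestamp() { return timestamp; }
}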
Abstract channel: The abstract channel associates a task declaration with a data declaration and represents not just which data objects are produced and/or consumed by a given task, but which instances of those types of data items are of interest to a particular instance of the task. An abstract channel is called an input (output) channel if the data item is to be consumed (produced) by the task. In an ATaG program, more than one input channel may be defined for a given abstract data item - denoting the fact that more than one consumer exists for that type of data. The current design of the ATaG runtime allows only one output channel to be associated with a particular abstract data item; i.e., there can be at most one producer task. This restriction may be eliminated in the future. Table 3.2 describes the annotations that can be associated with an abstract channel in the current version of ATaG. The abstract channel is the key to concise, flexible, and architecture-independent specification of common patterns of information flow in the network. For instance, spatial dissemination and collection patterns may be expressed using simple annotations such as "1-hop," "local," or "all nodes," on output and input channels. More sophisticated annotations may be defined as needed or desired for a particular application domain. Section 3.5 illustrates the application of these annotations through a set of ATaG programming examples.

3.4 Semantics

3.4.1 Terminology

The following terminology is used in the remainder of this section.

Task: A 'task' may refer to a particular instance of an abstract task or the abstract task itself. For example, a 'periodic task' means that the corresponding abstract task in the ATaG program has a 'periodic' firing rule. On the other hand, a 'periodic task that is ready for execution' refers to a particular instance of that abstract task on some node whose firing condition has been met. Although the usage is overloaded, the meaning should be apparent from the context, especially in light of the fact that an instance of an abstract task is the executable entity, and not the abstract task itself.

Data item: The phrase 'data item' always refers to an abstract data item. If an instance of a particular data item is being referred to, it will be explicitly stated.

Input (output) data item: In the context of a particular abstract task, a data item is called an input (output) data item if there is an input (output) channel that associates the data item with that particular task.

Dependent task: In the context of a particular data item, an abstract task is called a dependent task if there is an input channel associating the data item with that particular task.

3.4.2 Firing rules

The following rules determine when a task is considered to be ready for execution. The actual time of execution of a ready task depends on factors such as the number of tasks that might precede this task in the scheduler's queue, the time remaining for the currently running task to complete execution, the duration of each of the preceding tasks, etc.

A periodic task is ready when the periodic timer expires, regardless of the state of its input data items. The per-task timer is set to zero each time the task begins execution and is said to expire when the timer value becomes equal to the task's period.

An any-data task is ready as soon as a new instance of any of its input data items is available.

An all-data task is ready as soon as a new instance of each of its input data items is available.
A periodic any-data task is ready when the periodic timer expires or a new instance of any of the input data items is available.

A periodic all-data task is ready when the periodic timer expires or a new instance of each of the input data items is available.

If a task is marked both any-data and all-data, the any-data firing rule applies.

3.4.3 Task graph execution

Task execution is atomic. Each instance of an abstract task will run to completion before an instance of any other abstract task can commence execution. All members of the set of dependent tasks of a particular data item are executed before other tasks that might be dependent on the data items output by the tasks in this set. When the production of an instance of a data item results in one or more of its dependent tasks becoming ready, those tasks will consume the same instance when they invoke a get() on the input data item. This means that the particular instance that triggered the task will not be overwritten or removed from the data pool before every scheduled dependent task finishes execution.

3.4.4 get() and put()

A task reads its input data instances from the datapool using the get() primitive invoked as:

d = get(int dataID);

where dataID is the unique integer identifier of the desired data item. Each invocation of an instance of a well-behaved abstract task results in exactly one invocation of get() for each of its input data items. get() is a non-blocking call in the sense that the calling task is not suspended till an instance of the requested data item becomes available. The following rules apply to the get() primitive:

When an any-data task executes, at least one of its get() calls will succeed.

When an all-data task executes, each of its get() calls will succeed.

get() is a destructive read from the task's perspective. Once a particular instance of a data item is read by a task, it is considered to be eliminated from the data pool as far as that task is concerned. Subsequent calls to get() for the same data item in later invocations of the task will fail if no newer instance is available, or will return a new instance if one has been produced since the last invocation.

A task adds its output data items to the data pool by using the put() primitive invoked as:

boolean status = put(d);

where d is an instance of some data item, and status is the boolean indication of success or failure of the call. put() is not guaranteed to succeed. This is because the ATaG runtime allows for at most one instance of each data item to be present in the data pool at a given node. If a new instance of a particular data item is produced at a node, the old instance (if any) must be overwritten, which is possible only if none of the tasks that are scheduled for execution on that node are dependent on it. If there is at least one task scheduled for execution that is dependent on the particular instance, a put() on that node will return with an indication of failure. Otherwise, the instance will be added to the node's data pool and put() will return success. The different valid states of a data item and the structure of the data pool on the node are discussed in the next subsection. The responsibility of determining the success of put() and taking appropriate action(s) at the application level is entirely the programmer's. A common scenario where put() might fail is if a periodic task is producing one or more data items at a faster rate than they can be consumed by the set of dependent tasks.
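To make the producer-side contract concrete, the following minimal sketch shows a periodic task checking the status returned by put(). It assumes the Java-based prototype and reuses the Temperature payload sketched in Section 3.3; the class name, the abstract binding methods, and the drop-on-failure policy are illustrative choices, not part of the defined ATaG semantics.

// Hypothetical periodic producer illustrating put() failure handling.
public abstract class PeriodicSampler implements Runnable {
    // Bindings assumed to be supplied by the synthesized glue code:
    protected abstract boolean put(Object dataItem);   // data pool write
    protected abstract Temperature sample();           // sensor access hook

    private int dropped = 0;   // illustrative bookkeeping only

    public void run() {
        // Invoked by the runtime each time the periodic timer expires.
        boolean status = put(sample());
        if (!status) {
            // The previous instance is still awaited by a scheduled dependent
            // task and cannot be overwritten. The policy is the application's
            // responsibility; here the new reading is simply counted as dropped.
            dropped++;
        }
    }
}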
The impact on the application will depend on the semantics of the data item being produced.

3.5 Programming idioms

In this section, we qualitatively demonstrate the key features of ATaG by providing sample programs for commonly encountered patterns of information flow that form the building blocks of a large class of applications. The purpose of these examples is to specifically highlight the following:

- The ATaG data driven model is a natural fit for specifying reactive applications. The concepts of abstract tasks, data items, and channels concisely capture a variety of task placements, and data dissemination and collection patterns. ATaG allows the coding of symmetric behaviors (e.g., neighbor-to-neighbor protocols), asymmetric behaviors (e.g., many-to-one data collection), and combinations of the two (e.g., local neighbor interaction resulting in an alarm condition that is then routed to a root node).

- ATaG programs are architecture independent. The set of task and channel annotations allows the programmer to control the degree of architecture independence of the specification. Tasks can be placed on specific node IDs or geographic locations, or the placement can be left entirely to the compilation framework. Realistic applications can be expected to employ a compromise between the two extremes, with some tasks assigned to specific nodes or locations that are known a priori, while others can be more flexibly mapped.

- ATaG programming only requires familiarity with a traditional programming language such as C or Java. The declarative part of the ATaG program (depicted by the figures accompanying each example) is specified visually. The imperative part is in a traditional sequential programming language. ATaG programming does not require the mastery of a new syntax or any extensions to traditional programming languages.

3.5.1 Object tracking

Object tracking basically involves determining the location of an object in the area being monitored. A simple algorithm for object tracking [55] requires each node to periodically sample its sensing interface and compare the reading against a pre-defined threshold. A reading that exceeds the threshold is indicative of the presence of a target in the sensing range. The nodes that detect the target elect a leader node, which is the node with the maximum reading among all nodes involved in the election. The leader node then performs some processing of the set of sensor readings and transmits the resultant estimate of target location to a base station. Figure 3.2 is a complete ATaG program for this application behavior.

Figure 3.2: Object tracking (abstract tasks Threshold, Leader-Elect, and Supervisor; abstract data items TargetAlert and TargetInfo)

Event | Reaction | Scope
Periodic timer expires | Acoustic sensor is sampled | Local
Sensor reading exceeds threshold (object in range) | Propagate location- and time-stamped reading | All other nodes that may have detected the same target
Sensor reading arrives at node | Determine if own reading is higher than readings from neighbors | Local
Node elects itself the leader | Send target location to designated root node | -

Table 3.3: Event-reaction pairs for object tracking

A prototype implementation of this application required approximately 100 lines of Java code overall. Threshold performs the sampling and thresholding on each node of the network; a hypothetical sketch of such a task is shown below.
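The following sketch, which follows the same illustrative binding convention as the earlier producer example, is not the actual prototype code: the class structure, the sensor-access hook, the threshold constant, and the payload fields are assumptions. Only the periodic firing rule and the production of a TargetAlert via put() correspond to the ATaG program described here.

// Hypothetical imperative code for the Threshold abstract task
// (fires periodically; produces a TargetAlert when the reading is high).
public abstract class Threshold implements Runnable {
    private static final double DETECTION_THRESHOLD = 50.0;  // assumed units

    // Bindings assumed to be supplied by glue code / the programmer:
    protected abstract double readAcousticSensor();   // sensor access
    protected abstract boolean put(Object dataItem);  // data pool write

    public void run() {
        double reading = readAcousticSensor();
        if (reading > DETECTION_THRESHOLD) {
            // The output channel annotation (push, 10m) determines where the
            // runtime disseminates this instance.
            put(new TargetAlert(reading, System.currentTimeMillis()));
        }
    }

    // Illustrative payload for the TargetAlert abstract data item.
    public static final class TargetAlert {
        public final double reading;
        public final long timestamp;
        public TargetAlert(double reading, long timestamp) {
            this.reading = reading;
            this.timestamp = timestamp;
        }
    }
}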
If a target is detected, it generates a TargetAlert data item which also carries information about the sensor reading. The assumption in this case is that the sensing range is less than half the dissemination range of 10m, which ensures that every node that detects the target communicates its reading to every other node that has detected the target. The Leader-Elect task also runs on each node and receives the TargetAlert notifications from all nodes that have detected the target. Since Threshold is pushing the data item to a 10m radius, the Leader-Elect task can just read from its local datapool and does not need to explicitly pull instances of data items from its neighborhood. After a requisite number of sensor readings are obtained, Leader-Elect generates the TargetInfo data item if its local reading is the maximum of the readings received from other nodes.

3.5.2 Interaction within local neighborhoods

Fig. 3.3 is a complete ATaG program based on neighbor-to-neighbor interaction, which is a common technique to implement collaborative computation where the processing at a given node is a function of its own state or the state of the immediate neighbors. The technique is common because such protocols require a fixed, typically low amount of resources, and scale well with network size. The purpose of this program is for each node to periodically compare its own temperature reading with that of its 1-hop neighbors. This comparison could be used for corroboration or calibration, or to detect unusual conditions such as a fire. Only a single abstract task and a single abstract data item are sufficient to capture this behavior, as shown in the figure. The output channel is annotated with a negated "local" qualifier because an output to the local data pool of the same type of data item that is also an input may cause an infinite loop and unpredictable system behavior, depending on the scheduling policies in the runtime system.

Figure 3.3: Neighbor-to-neighbor gradient monitoring (abstract task Monitor; abstract data item Temperature)

Event | Reaction | Scope
Periodic timer expires | Read temperature from sensor | Local
Temperature reading available | Propagate to 1-hop neighbors | -
Temperature received from neighbor | Compare with own reading | Local

Table 3.4: Event-reaction pairs for neighbor-to-neighbor protocol

3.5.3 In-network aggregation

Fig. 3.4 is a complete ATaG program that sets up a data aggregation tree across the network. Such a mechanism is commonly used in the computation of system-wide properties such as the minimum or maximum reading in the entire system [64].

Event | Reaction | Scope
Periodic timer expires | Temperature sensor is sampled | Local
Temperature reading available from own node or other nodes | Apply aggregation function (say, MAX) | Local
Pre-determined number of applications of aggregation function completed | Send aggregated reading to parent node | -

Table 3.5: Event-reaction pairs for tree-based aggregation

Figure 3.4: Tree-based aggregation (abstract tasks Sampler and Aggregator; abstract data items Temperature and Maximum)

Note that although the program indicates a virtual topology (tree), it does not specify how the tree is to be constructed or maintained. The runtime system that supports the "parent" and "children" annotations is expected to manage the required protocols. Each node of the tree applies an aggregation function to its own periodic reading (Sampler task) and the readings received from its child nodes.
The result is then communicated up the tree to be incrementally aggregated. This is a continuous process, driven by the periodic sampling at each node. To reduce network traffic and save energy, the Aggregator could use static variables to maintain a count of incoming packets (local state) and communicate the reading up the tree only after a certain number of invocations.

3.5.4 Hierarchical data fusion

The data aggregation tree in the previous example is a useful but simple structure. More sophisticated applications can be efficiently programmed using hierarchical data fusion. In this pattern, the network is partitioned into domains, and each domain reports to its leader. The leaders in turn are successively organized into a hierarchy with a root node at the top. A quad tree is an example of such a hierarchy, with applications in topographic querying of sensor fields [7]. Figure 3.5 is a complete ATaG program that sets up a two level quad-tree. The network is divided into four domains, each managed by one instance of the L1Fusion task. Leaf tasks report to the appropriate L1Fusion task. The Root collects the data from the L1Fusion tasks. The data items are labeled LeafMap and L1Map, motivated by the application discussed in [7]. The meaning of the domain annotation and the use of '/4' as a parameter for nodes-per-instance are explained in Tables 3.2 and 3.1 respectively.

Figure 3.5: Hierarchical data fusion (abstract tasks Leaf, L1Fusion, and Root; abstract data items LeafMap and L1Map)

Event | Reaction | Scope
Periodic timer expires on leaf node | Temperature reading sampled | Local
Temperature reading available at leaf node | Reading sent to parent | -
Reading received at L1 clusterhead | Apply aggregation function | Local
Pre-determined number of readings received at clusterhead | Send result of aggregation to root node | -

Table 3.6: Event-reaction pairs for hierarchical data fusion

3.5.5 Event-triggered behavior instantiation

The set of collaborative behaviors used to compose distributed spatial computing applications is usually known at design time. However, it is not desirable from both a performance and functionality point of view to execute all behaviors at all times. Especially in systems that monitor and respond to events in the physical environment, there could be quiescent behaviors that are built into the system at design time, but are to be instantiated only when certain conditions are satisfied at run time. The conditions could denote a variety of events such as resource depletion at a critical node, abnormal sensor readings, etc.

Figure 3.6: Wide-area data collection triggered by a local alarm (abstract tasks Monitor and Corroborator; abstract data items Temperature, LocalAlarm, and GlobalAlarm; the Temperature input channel to Corroborator is a 10m pull)

Event | Reaction | Scope
Temperature gradient exceeds threshold | Produce alarm notification | Local
Alarm notification produced | Request temperature readings for corroboration | All nodes within a 10m radius
Readings corroborate local alarm | Produce global alarm | Local

Table 3.7: Event-reaction pairs for alarm-triggered data collection

The previous examples used abstract data items primarily to pass information such as the sensor reading or information derived from sensor readings, such as a topographic map of the sensor field.
However, the semantics of ATaG also allow the instantiation of new behaviors at run time by using abstract data items to represent the occurrence of events, in addition to passing information about the events. Figure 3.6 is a complete ATaG program for an application that monitors temperature gradients between nodes and triggers a data collection and anomaly corroboration over a larger neighborhood if a node detects a high gradient between itself and its neighbors. Only if the anomaly is confirmed does the node produce an alarm event possibly targeted for some supervisor task. The data item LocalAlarm is used to trigger the collection of data from nodes within a radius of 10m. Note that the firing rule for the Corroborator task is any-data. Also, the input channel from Temperature to Corroborator has pull semantics. When the Monitor detects a discrepancy, it produces a LocalAlarm. Due to the any-data firing rule, the Corroborator is scheduled for execution, and the pull semantics then initiate a collection of data from the neighborhood. The Corroborator will use persistent storage (static variables) across instantiations to store the collected temperature readings, and produce a GlobalAlarm if the LocalAlarm is corroborated by neighboring nodes.

3.6 Future work

3.6.1 State-based dynamic behaviors

The set of task and channel annotations listed and briefly described in Tables 3.1 and 3.2 is useful for describing many behaviors that form the building blocks of networked sensing applications in domains such as environment monitoring and non-real-time object tracking. (Since timing requirements cannot be indicated in the ATaG program, and the runtime system may or may not include routing protocols that provide timing guarantees or other latency-related quality of service (QoS) requirements, we refer to the object tracking example as non-real-time.) What the current set of annotations really provides is an abbreviated, concise, and architecture-independent representation of task placement and coordination in an application that could otherwise be developed manually, although with much greater effort, using a language such as nesC or C. The examples shown as programming idioms can be developed in a top-down manner by first defining the event-reaction-scope tuples and then translating them into the abstract task graph. The same ATaG program could also be developed in a bottom-up manner by inspecting the placement and communication of tasks in the desired application on a concrete network deployment and then abstracting the communication patterns as channels, the types of functionalities as abstract tasks with placement annotations, and the types of data exchanged as abstract data items.

A promising avenue for future work is to define high level annotations that go beyond mere task placement and communication pathway instantiation. An example of such a class of annotations is state-based dynamic selection from among alternate implementations of the same abstract task. State could refer to a broad range of parameters such as the resource availability on a particular sensor node, density of deployment in the neighborhood of the sensor node, the instantiation of one or more abstract tasks in a certain vicinity of the sensor node, etc. The tradeoff between quality of the result of a computation and the resources required to attain that quality - and algorithms to dynamically adjust for this tradeoff - has been an area of research in high performance scientific computing [10].
In sensor networks consisting of energy and bandwidth constrained sensor nodes, the application developer might wish to exercise control over the amount of resources that are devoted to some functionality based on the value of parameters such as the state of the energy resources remaining at that node. Such control can be used to (i) optimize application-level execution by switching to a different implementation of the same task when energy levels decrease, and (ii) provide graceful degradation of functionality as resources are progressively exhausted. To support such a program specification, the abstract task will now be associated with one or more implementations in the same language meant to be invoked under different circumstances at run time. A new class of annotations will be required to allow the user to (concisely and precisely) specify the state of the node that is a trigger for a particular implementation. Although the ability to select a different implementation of the same abstract task at different times on the same node enables new ways of resource management for application-level quality of service, an equally useful feature is the ability to control which implementation of the abstract task will run on a particular sensor node, depending on state information available after deployment. In the latter case, the implementation may or may not change at run time. Note that this is different from the task placement annotations in the current model, which allow the application developer to influence which abstract task is placed on which node in the network, but do not allow the selection between different implementations of abstract tasks.

The idea of the state of a node - a simple example of which is the amount of energy available on that node as a fraction of the total energy at initialization time - can be generalized to represent the state of the neighborhood. For instance, consider a deployment where a designated root node wishes to receive a particular amount of data (e.g., a particular number of temperature samples per hour) from each region of the sensor network. Now, if the density of sensor nodes in a particular region is high, sampler tasks in that region could report their (aggregated) readings with a lower frequency compared to a region where the density of deployment is less. Examples of more sophisticated annotations that will require significant enhancements both to the ATaG model and to the runtime system include: "execute implementation I of task T only if it can be executed for every invocation of task T in the next 2 hours". Such annotations will bridge the gap between the end users' understanding of the application requirements and their corresponding specification in the ATaG program. The challenge in defining this particular annotation is to devise a mechanism in the runtime which is capable of predicting the resource usage on the node (with some degree of confidence) based on activity observed on that node in the past.

3.6.2 Resource management in the runtime system

Two aspects of resource management are of interest in the context of extending the ATaG model. The first deals with the efficient management of sensing resources and the packaging of sensing as a service provided by the runtime instead of a set of APIs to be learnt by the programmer and invoked by the application-level program. The second aspect deals with allowing the application developer to provide performance-related hints to the compiler.
We now discuss each of these in more detail.

Sensing as a service. Currently, there are three classes of APIs available to the ATaG programmer: (i) the get() and put() calls to the data pool for consuming and producing data items respectively, (ii) the network-awareness and spatial-awareness API (also offered by the runtime system) that allows a task instance to determine the composition of the neighborhood of its host node, and (iii) the API to the sensor interface. Since the task instance directly accesses the sensing interface, the runtime system is not aware of the access patterns and cannot optimize for cases where sensing resources might be used inefficiently. Consider a scenario where a periodic Task A is interested in sensor data not more than 10 minutes old, and Task B is interested in the same sensor data but with a tolerance of 30 minutes. In the current model, Task A and Task B will be defined as abstract tasks with periodic firing rules with periods of 10 minutes and 30 minutes respectively. The tasks will read from the sensor at each invocation although it is obvious that the frequency of Task A's sampling is sufficient for Task B. A manual optimization in this case is to declare an abstract data item S produced locally by Task A and consumed locally by Task B, and to change the firing rule of Task B to 'any-data'. Task A will now sample the sensing interface at every invocation but will produce an instance of S (containing the sensor reading) every third invocation. However, such manual optimization is not possible if Task A and Task B are part of different ATaG libraries being composed into a larger application.

Future work in this area involves the management of sensing (and actuation) resources through the ATaG runtime system. The ATaG model will be extended by defining a special class of read-only abstract data items (called 'sensor data items') that can be consumed but not produced by user-defined abstract tasks. These data items will represent readings (scalar values, images, etc.) from the sensing interface(s). Tasks will access sensor data using the get() primitive, and the programmer will not be required to learn the details of accessing the variety of sensor interfaces. A set of annotations will be defined for the sensor data items. These annotations could indicate the type of sensing interface and other parameters such as spatial coverage and temporal coverage (frequency of sampling, freshness of data, etc.). This extension will allow the runtime a greater flexibility in task placement and resource management. More importantly, indirect access of sensor interfaces through the runtime system makes ATaG programs even more architecture independent because the imperative part of the program (i.e., the task code) does not need to incorporate any code that is specific to a particular type of sensor or actuator. Nodes with different sensors of the same type (i.e., producing the same type of sensor data item) can host instances of the same abstract task without the programmer being required to modify the code to adjust for the different sensor APIs.

Application-level control of system performance. In almost all traditional parallel and distributed computing, especially in scientific computing, all data was equal. The scheduling of tasks and handling of data was almost entirely influenced by end-to-end latency considerations.
Hence, the many variants of the basic task graph (or other dependency graphs) did not support the concept of varying levels of 'importance' that could be assigned to tasks or data. The nature of networked sensing is such that some data items and computation pathways could have greater importance than others - where importance could imply preferential processing in terms of immediate scheduling of the tasks involved or allocating more resources to ensure that some data items are routed with better 'quality' (e.g., less latency) than others. For example, if the instance of the abstract data item represents the (possible) detection of a forest fire, the application developer would naturally want the runtime system to expedite the transmission of this data from the producer node to the designated supervisor node. Defining and supporting such annotations also requires a close integration with the network model, the architecture of the runtime system, and the availability of protocols that are capable of providing the required services.

3.6.3 Utility based negotiation for task scheduling and resource allocation

Service-oriented specification of networked sensing applications is a vision where programming for sensor networks essentially involves the specification of semantic information desired by the end user. This purely declarative high level specification is used to first select a set of services from the library of available services for the target network - where each 'service' could map to an independent application with a well defined interface for integration with other applications. In the context of ATaG, where the composition of two independent ATaG programs is equivalent - in the simplest case where the two programs do not share data or functionality - to the concatenation of the corresponding task graphs, each service could naturally map to an ATaG program. Of course, this requires a new markup language for describing ATaG programs in terms of the services they provide to the end user - similar to Semantic Streams [57].

Assuming that the component sub-programs can be identified from the high level specification, and the final mapping of tasks to nodes and the setup of communication pathways in the network is accomplished, the next problem is to manage resource allocation in the face of conflicting requests from application tasks. For example, two tasks on the same node could request an image from the camera at the same time, but require the camera to be pointing in different directions. A utility based negotiator in the runtime could decide the resource allocation in such scenarios. The challenge is to develop a robust and scalable implementation of utility based negotiation, and to define a common utility scale that can be used across disparately developed ATaG libraries that are combined into a larger application. The concept of utility could also model task priorities and resolve conflicts when more than one task simultaneously requests preferential treatment. The key challenge in extending the basic model to handle such scenarios is to maintain the core design objectives - especially application neutrality - while enabling the expression of increasingly sophisticated behaviors.

Chapter 4

DART: The Data Driven ATaG Runtime

4.1 Design objectives

4.1.1 Support for ATaG semantics

The primary objective of DART is to provide the required underlying mechanisms for communication and coordination between instances of abstract tasks specified by the programmer.
Architecture independence of ATaG is ensured primarily by the deployment-specific interpretation of the generic task and channel annotations. Depending on the characteristics of the underlying network, the responsibility of translating the annotations could be distributed between the compile-time code generator and the runtime system itself. For instance, consider an output channel with an annotation neighborhood-hops:1, indicating that the data item produced by the associated task is to be sent to all the 1-hop neighbors of the node where the item is produced.

For a network composed of relatively resource-rich nodes such as Stargates [51] connected by a robust wireless network, this annotation can be translated at compile time. The compiler will analyze the network graph, determine the nodes that will host the associated task, determine the IDs or geographic locations of their 1-hop neighbors (depending on the routing protocol being used), and hardcode the list of destinations for that data item into the runtime system. Every time an instance of the data item is produced, the runtime system will look up the IDs of the destination set (which is, in this scenario, assumed to be unchanging) and send the data item to each member of that set. On the other hand, the same ATaG program could be synthesized onto an underlying network that is dynamic in nature, where the set of neighbors of a node is expected to change frequently - nodes being added to or removed from the network (in a mobile setting), nodes failing due to exhaustion of limited energy resources, unreliable communication due to a hostile environment, etc. Clearly, compile time analysis of the network graph is not feasible in such a scenario, and the runtime system must support run-time translation of the neighborhood-hops:1 annotation into the instantaneous membership of the set of 1-hop neighbors. In addition, there are decisions to be made about how frequently the runtime system should update its view of the neighborhood, the impact of such updates on the performance of the system, etc.

4.1.2 Platform independence

The focus of the DART design is not so much the implementation of an ATaG runtime system for a particular sensor node platform or a particular language and operating system, but rather the architecture of a runtime system template that will hopefully be useful for implementing versions of DART tailored to specific platforms. This means that the assumptions about the underlying operating system implicit in the operation of the DART template should be clearly spelled out and should also be minimized. Specifically, assumptions about the type of scheduler, support for multi-threading, synchronization and inter-process coordination primitives, etc., should be explicitly stated. Ideally, any operating system kernel that provides these basic facilities should be a friendly target for implementing DART. Such platform independence is important because an important purpose of the ATaG programming model is to hide almost all the low level details of control and coordination from the programmer, allowing him/her to focus only on expressing the desired behavior in terms of data driven event-reaction semantics with suitable annotations to govern deployment-specific task placement and communication.
Unless the architecture of the underlying runtime system is defined in a platform independent manner, a ‘seamless’ deployment of ATaG on such systems will not be possible. 4.1.3 Component-based design Components are “units of independent production, acquisition, and deployment that interact to form a functioning system” [52]. A component is the deployment of one or more interfaces that define the service offered by the component to its consumers. Since the customers rarely care about how the particular interface is implemented, the data and algorithms used internally by the component (module) implementation can be considered to be ‘owned’ by the module and the implementation details will typically be hidden from other modules. This also means that development of a component is decoupled from its integration into the system. Indeed, a variety of implementations of the same component (i.e., providing the same service by implementing the same interface(s)) can be developed to meet various requirements, and the suitable implementation selected at the time of composition. The modular structure of component-based design has many significant advantages. First, it greatly simplifies the design by requiring the clear identification of components in terms of what exactly they model in the problem domain. Interactions and dependencies between components are also defined in terms of service provider and service consumer relationships. Second, as mentioned above, hiding the implementation of a module from other modules makes it possible for an entirely different set of protocols to be used to provide the same service interface without affect the rest of the system. In the specific case of the ATaG, this allows the runtime system to be tailored for a specific target platform by selecting the suitable intra-module protocols without requiring a complete redesign. For instance, one of the modules of DART is responsible for translating channel annotations into list of node IDs or locations. The list of channel annotations used by the ATaG program is known at compile time. This knowledge can be used by the software synthesis process to include only those 50 protocols in this module as are required to translate all the annotations actually used in the program and not all the annotations supported in the ATaG model. For example, if the application does not require a virtual topology (say, a tree) and therefore does not employ the parent and children annotations in the ATaG program, the protocol to construct and maintain a logical tree, and thereby the ability to translate the corresponding annotations need not be built into the runtime. Indeed, the runtime system can be customized differently for each node, based on the services (protocols) required by the tasks instantiated on that node. Component design of the ATaG runtime system can also be seen as a step towards defining standards to be followed by the designers of a particular protocol for, say, routing, to ensure that the result is usable in a ‘real’ end-to-end system. In the absence of a holistic template for structuring interactions between diverse protocols and services for sensor networks Another side effect of this design is that it allowed us to use essentially the same runtime system software for functional simulation as is intended for real deployments, by replacing only a subset of the modules - especially those that deal with the transceiver interface - and leaving others intact. This is discussed in Section 5.5. 
4.1.4 Ease of software synthesis

We have built an end-to-end application development framework based on the ATaG programming model that also includes a tool for synthesis of compile-ready customized software for the individual nodes of the target network, based on the ATaG program and the network description. The synthesized software for a node has three components: (i) a common DART kernel that runs on every node and handles basic tasks such as data pool management, managing the basic networking protocols, etc., (ii) user-supplied code for abstract tasks and user-supplied data structures (and methods) for abstract data items, and (iii) glue code for the interface between the runtime and the user-supplied code.

The user-supplied code and the common runtime code are available to the software synthesizer, and ease of software synthesis can be measured by the size of the glue code that is to be generated for a particular node for a particular ATaG program. The choice of data driven computing as the programming paradigm for ATaG is also influenced by the fact that in a data driven software system, the only interaction between the user-supplied code and the runtime system is through the get() and put() calls implemented in the datapool manager. Therefore, the purpose of the glue code that is to be synthesized can be broadly classified as follows:

- Allowing the runtime to interact with application tasks, i.e., to determine their attributes (such as firing rule and input-output interface) and to schedule the tasks for execution through suitable interfaces - such as the Runnable interface of Java if the application tasks are provided as Java classes implementing Runnable.

- Providing state information (context awareness) required by the node to situate itself in the network. For instance, if nodes have pre-assigned identifiers, the ID should be hardcoded into the software and made accessible to the modules of the runtime system through a suitable function call. For scenarios where the program is synthesized onto relatively static and robust networks (as discussed above in Section 4.1.1), information such as the composition of the node's neighborhood will be incorporated into the runtime system at software synthesis time. Other information such as the role of the node in a virtual topology (if any) will also be determined and incorporated into the software. For instance, on initialization after deployment, each node will check if it is supposed to be the designated root node and, based on the (boolean) result of the query, adjust the behavior of its protocols for virtual topology formation.

- Pre-wiring communication pathways. Consider a simple ATaG program for centralized data collection with two abstract tasks and one abstract data item. The programmer uses channel annotations to indicate that all data produced by the Sampler on each node is to be routed to the Collector on some designated root node. The placement of the Collector is specified by the annotation of that abstract task - say, as the node with ID 0. When a data item is produced on some non-root node, the runtime system on that node should know the destination of the data, i.e., the location or ID of the root node. In some deployments, the ID and location of the designated root node could be fixed and known a priori (e.g., it might be a gateway node connected to the desktop PC of the building supervisor).
In such cases, the runtime systems on non-root nodes can be pre-programmed with a destination list (in this case, the root node) for the data item in question. Scenarios where this might not be suitable are if the root node itself is dynamic (say, a PDA device carried by the building supervisor) or if the selection of a node as the root is performed only after the system is initialized.

4.2 Overview

Figure 4.1 is a high level overview of the structure of DART (Data driven ATaG RunTime). In this section, we briefly discuss the functionalities of the various components and their interactions. Later subsections focus in detail on the implementation of each component. The software system on each node can be divided into an application layer that consists of the user-supplied code for each abstract task placed on that node, and a system layer that contains the modules of the runtime system. Presently, the sensor/actuator interfaces are not managed by the runtime, although that is the subject of future work. Hence, if an abstract task requires access to the sensor or actuator interface, the necessary code has to be supplied by the programmer, who is also required to understand the APIs involved. The two system level interfaces that are available to the user tasks are the DataPool and the NetworkArchitecture, as shown in the figure.

Figure 4.1: DART: The ATaG runtime system. Application level: UserTask1 through UserTaskn, sensors, and actuators. System level: DataPool (get() and put(), concurrent access, reference counts), ATaGManager (task code, dependencies, annotations), NetworkArchitecture (spatial awareness, network awareness, logical namespaces), Dispatcher (translate annotations, manage outgoing data), NetworkStack (routing protocols, medium access, physical layer), and the transceiver.

The DataPool components on all nodes in the network together manage the production, consumption, localization, and routing of all instances of abstract data items produced in the network. They provide the abstraction of a single logical pool of data items that is uniformly accessed by all tasks in the system using the basic get() and put() primitives.

The AtagManager component acts as the repository of all relevant information concerning the declarative part of the ATaG program that might be required by other components for intra-node coordination and inter-node communication. This information includes the number of abstract tasks, data items, and channels, the task and channel annotations, input-output relationships between tasks and data items on that node, and the firing rule for each task. The AtagManager also schedules the application-level tasks for execution when their firing conditions are met.

NetworkArchitecture is the component responsible for managing all protocols for neighbor discovery, virtual topology formation, etc., with the objective of providing the mechanism to translate a channel annotation into a list of node identifiers. For instance, if a data item is to be sent to the 'parent' node in the virtual tree topology, it is the role of this component to implement the protocols for tree formation and maintenance, and when queried, return the ID of the neighboring node that is the current parent.

As indicated by its name, the NetworkStack is in charge of communication with other nodes in the network, and manages the routing, medium access, and physical layer protocols.
The Dispatcher is a helper component that coordinates between the DataPool, Atag- Manager,NetworkArchitecture, andNetworkStack with the purpose of transmitting instances of data items produced on the node to their suitable destinations in the network, as indi- cated by the annotations of the output channel associated with the data item in the ATaG program. In the following sections, we describe each component of DART in more detail. To highlight the component-based design of the software system, the service offered by each component is described first, followed by the consumers of that service, and finally the implementation details of the service. Note that the primary objective of the current version of DART is to demonstrate the feasibility and use- fulness of programming with ATaG. The programming and software synthesis environment (Section 5) for ATaG has an accompanying simulation and visualization front-end. The current implementation of DART is meant to be a component of this simulation environment that runs on a single machine. Al- though DART is designed as a component-based template for a general ATaG runtime, some of the low level functionalities (such as routing protocols and topology formation protocols) that will be required for DART to run on a multi-node distributed sensor node deployment have been replaced by code that simulates these functionalities for single-machine simulation. As highlighted in Section 4.1.3, the ad- vantage of component based design is that the implementation of a component can be changed as long as the service it provides remains the same. Hence, the replacement of some of the functionalities within a component by functionally equivalent code that simulates their existence could be performed without affecting other components such asDataPool,AtagManager,Dispatcher, etc. 55 4.3 Components and functionalities 4.3.1 UserTask 4.3.1.1 Service Each abstract task in the ATaG model is required to have an instance of UserTask. This class is the imperative part of the abstract task declaration and contains the application-level code represented by the abstract task. From the perspective of the DART design, the service interface provided by this component is basically the JavaRunnable interface which is invoked when this task is to be scheduled for execution. 4.3.1.2 Interactions The user level task interacts with the DataPool by accessing the get() and put() functions for reading and writing data items respectively. UserTask can also use the interface provided by the NetworkArchitecture component to obtain the list of node IDs (or locations) that constitute a specific neighborhood of the node defined either in terms of hops or Euclidean distance. For instance, the input channel for that user task might be annotated as neighborhood-hops:1, which means that each piece of incoming data is from one of the 1-hop neighbors of that node. If the functionality of the abstract task is to wait till at least one reading is received from each neighbor, and then aggregate the set of readings, it is important for the task to be able to determine how many 1-hop neighbors it has and what their locations or IDs are, so as to be able to decide when the round of collection can be considered complete. This information is maintained by theNetworkArchitecture module and can optionally be accessed via the suitable query interface if required by the user task. Finally, the UserTask can use the APIs provided by the sensors and/or actuators on the node. 
In the current version of DART, sensing resources are to be accessed directly by the UserTask. In future versions, this access will be through the runtime system (see Section 3.6.2 for more details).

4.3.1.3 Implementation

Figure 4.2: UML class diagram: Supervisor (instance of UserTask)

UserTask is basically a Java class that implements the Runnable interface so that the AtagManager can schedule it for execution when its firing rules are deemed to be satisfied. In our programming and software synthesis framework, the code template for each instance of UserTask corresponding to a different abstract task is generated automatically. This template consists of the task constructor and some other attributes such as a reference to the DataPool that is required for invoking get() and put(), etc.

A sample ATaG code listing for the Monitor task of Figure 3.6 is shown in Figure 4.3. The ATaG programmer will write a similar piece of code for each abstract task. Note that the program is written in a traditional language (Java in this case), and involves only the manipulation of data items that correspond to application-level events. No calls to the networking stack or any other system level services are explicitly invoked. The code also does not involve any calls to other application-level tasks – a characteristic of data driven program flow.

public class Monitor extends UserTask {
    // local variables to maintain state
    private static int myReading = 0;
    private static boolean wasAlarm = false, isAlarm = false;
    private static int[] targetReadings;
    [...]

    public Monitor(DataPool dp, Config myconfig,
                   NetworkArchitecture t_networkArchitecture, mGUI t_GUI) {
        super(dp, myconfig, t_networkArchitecture, t_GUI);
        // obtain information about the neighborhood of interest
        neighborIDs = m_networkArchitecture.kHopNeighborIDs(1);
        neighborCoords = m_networkArchitecture.kHopNeighborCoords(1);
        [...]
    }

    public void run() {
        DataItem t_dataItem = m_dataPool.getData(
            IDConstants.T_MONITOR, IDConstants.D_TEMPERATURE);
        [...]
        m_temperature = (Temperature) t_dataItem.core();
        // store the received temperature reading with its origin
        if (t_dataItem.originID() == m_myState.myID())
            myReading = m_temperature.get();
        else
            setNeighborReading(senderID, m_temperature.get());
        // check if gradient is exceeded
        for (int n = 0; n < neighborIDs.length; n++) {
            if (Math.abs(getNeighborReading(n) - myReading) > 5) {
                isAlarm = true;
                break;
            }
        }
        // alarm produced only at a transition
        if (isAlarm && !wasAlarm) {
            // no-alarm -> alarm transition
            wasAlarm = isAlarm;
            m_lAlarm = new LocalAlarm();
            DataItem m_dataitem;
            m_dataitem = new DataItem(IDConstants.D_LOCALALARM,
                                      IDConstants.T_MONITOR, m_lAlarm);
            m_dataPool.putData(m_dataitem);
        } else if (!isAlarm && wasAlarm) {
            // indicate transition from alarm to no alarm
            [...]
        }
    }
}

Figure 4.3: ATaG code listing for the Monitor task in Figure 3.6

4.3.2 DataPool

4.3.2.1 Service

The DataPool provides two types of interfaces. The first interface includes the get() and put() commands used to read data items from and add data items to the data pool, respectively. putFromNetwork() is a minor variant of the put() call that is invoked when the data item arrives from the network interface instead of being produced by an application task. The second interface supports a variety of calls used to query the state of data items in the pool; e.g., whether an instance of a data item is available or unavailable, active or inactive, etc. These terms are defined and explained in more detail in Section 4.3.2.3.
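To make the shape of this service concrete, the following sketch collects the calls named above into a single Java interface. getData(), putData(), and putFromNetwork() mirror the calls used in the Monitor listing of Figure 4.3 and in the text; the names of the status queries are illustrative assumptions rather than the actual DART method names.

// Hypothetical sketch of the DataPool service interface of Section 4.3.2.1.
public interface DataPoolService {
    // Read the current instance of data item dataId on behalf of task taskId;
    // fails (returns null) if the item is unavailable for that task.
    DataItem getData(int taskId, int dataId);

    // Add an instance produced by a local task; fails if the existing
    // instance is still active.
    boolean putData(DataItem item);

    // Variant of put() invoked by the NetworkStack when an instance
    // arrives from another node.
    boolean putFromNetwork(DataItem item);

    // Status queries used by the AtagManager to evaluate firing rules
    // (names illustrative).
    boolean isAvailable(int dataId);
    boolean isAvailableForTask(int dataId, int taskId);
    boolean isActive(int dataId);
}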
4.3.2.2 Interactions

In the current design, the user tasks interact with the DataPool through the get() and put() calls. The NetworkStack invokes the putFromNetwork() call when a data item sent by another node arrives at the network interface. The AtagManager invokes the status query interface to determine if one or more tasks are ready to be scheduled for execution.

4.3.2.3 Implementation

Data pool management involves handling concurrent accesses by more than one user level or system level task, and maintaining reference counts for each instance of a data item in order to determine if a particular instance is active (i.e., still waiting to be consumed by one or more tasks that are scheduled for execution) or inactive (i.e., it can be overwritten when a new instance of the same type of data item is produced by a local task or received by the NetworkStack from another node). The get() function returns a copy of the requested data item to the caller and decrements the reference count of the associated item by one. put() adds an instance of a particular abstract data item to the data pool, unless the existing instance is active, in which case it returns without changing the data pool.

Figure 4.4: UML class diagram: DataPool

Figure 4.5: Structure of the data pool: one entry DP_i per abstract data item d_i, holding the current instance of d_i and the two boolean arrays totalRefs_i and nowRefs_i of length m.

Let T = {t_1, ..., t_m} be the set of abstract tasks, and D = {d_1, ..., d_n} be the set of abstract data items in the ATaG program. At most one instance of each data item can exist on a node at a given time. Let DP_1, ..., DP_n be the entries of the datapool, one per data item. Two boolean arrays — totalRefs_i and nowRefs_i — of length m each are associated with each entry DP_i (see Figure 4.5). When an instance of d_i is produced, these arrays help to keep track of the dependent tasks for that data item, and of the subset of those tasks that is scheduled for execution, respectively. The following explanation will clarify the role of these arrays.

When the node is initialized, the following is true: totalRefs_i[j] = false and nowRefs_i[j] = false for all i = 1, ..., n and j = 1, ..., m.

An instance of d_i can be in one or more of the following states at a given time:
- d_i is available for task t_j if totalRefs_i[j] = true.
- d_i is available if there exists a j such that d_i is available for task t_j.
- d_i is unavailable for task t_j if totalRefs_i[j] = false; d_i is unavailable if it is unavailable for all tasks.
- d_i is active if (i) d_i is available, and (ii) there exists a j such that nowRefs_i[j] = true.
- d_i is inactive if (i) d_i is available, and (ii) nowRefs_i[j] = false for all j.

Suppose task t_j invokes get() for some d_i. The get() succeeds if d_i is available for t_j, and fails otherwise. If get() succeeds, totalRefs_i[j] and nowRefs_i[j] are both set to false, indicating that the task has consumed that instance.

Suppose task t_j invokes put() for some d_i. The put() succeeds if d_i is unavailable or inactive, and fails otherwise. If put() succeeds, the instance of d_i passed by the call occupies entry DP_i of the datapool. Next, the datapool manager determines if there are any dependent tasks for d_i, and further if any of those dependent tasks are ready. Let Dep(d_i) be the set of dependent tasks of d_i and R be the set of ready tasks at the time the put() was invoked, where R is a subset of Dep(d_i). Before the successful put() returns, the datapool manager ensures that totalRefs_i[j] = true for all t_j in Dep(d_i), and nowRefs_i[j] = true for all t_j in R.

An any-data task is scheduled for execution whenever any of its input data items becomes available. Similarly, an all-data task is scheduled for execution whenever all of its input data items become available. The destructive get() by task t_j of some d_i is ensured by setting totalRefs_i[j] and nowRefs_i[j] to false. When a new instance of d_i is created, the corresponding put() will set these values to true again. Also, when a new instance is produced, the number of tasks that are ready to consume that instance is reflected in the number of true entries in nowRefs_i.
Only when that instance is consumed by all the ready dependent tasks do the entries in nowRefs_i become false and any put() is allowed to overwrite that instance. Note that the use of two arrays is necessary because the fact that one or more tasks are dependent but not ready is reflected in the totalRefs array (e.g., an all-data task whose other data items are not yet available). The nowRefs array merely records whether a particular instance is being 'actively' consumed by one or more dependent and ready tasks.

4.3.3 AtagManager

4.3.3.1 Service

The AtagManager supports a notification interface that is invoked whenever a new instance of a data item is produced by one of the tasks running on the node. A second interface provides answers to queries about the declarative part of the ATaG program - e.g., the type and parameters of a particular channel annotation.

4.3.3.2 Interactions

The notification interface is used by the DataPool as part of processing a put() call from the user task or a putFromNetwork() call from the NetworkStack component. The query interface for the declarative part of the ATaG program is used by the Dispatcher component.

4.3.3.3 Implementation

The AtagManager is charged with internally representing the entire declarative part of the ATaG program and maintaining handles to the task code so that instances of abstract tasks mapped onto the node can be invoked when their firing conditions are met. Each abstract task declaration is stored as an instance of the TaskDeclaration class, and each abstract channel is stored as an instance of the ChannelDeclaration class. The UML diagrams for these two classes are shown in Figures 4.6 and 4.7 respectively.

Figure 4.6: Storing abstract task declarations in the TaskDeclaration class

Figure 4.7: Storing abstract channel information in the ChannelDeclaration class

The constructor of AtagManager is where these classes are instantiated - one for each abstract task and abstract channel in the ATaG program (Figure 4.8). This is one of the few methods in the runtime system that contain application-specific code which has to be generated during the software synthesis process. The UML class diagram for AtagManager in Figure 4.9 shows the various attributes and methods in the current version of this class.
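The following sketch (illustrative code, not the DART source) condenses the rules of Section 4.3.2.3 for a single data pool entry: a successful put() records the dependent and ready tasks in the two arrays, and a successful get() by a task clears that task's entries.

// One entry DP_i of the data pool, with the two bookkeeping arrays.
class DataPoolEntry {
    Object instance;        // current instance of data item d_i, if any
    boolean[] totalRefs;    // totalRefs[j]: task t_j still has to consume it
    boolean[] nowRefs;      // nowRefs[j]: task t_j is scheduled to consume it

    DataPoolEntry(int numTasks) {
        totalRefs = new boolean[numTasks];
        nowRefs = new boolean[numTasks];
    }

    boolean isAvailable() {
        for (boolean b : totalRefs) if (b) return true;
        return false;
    }

    boolean isActive() {
        if (!isAvailable()) return false;
        for (boolean b : nowRefs) if (b) return true;
        return false;
    }

    // put() succeeds only if the entry is unavailable or inactive.
    boolean put(Object newInstance, int[] dependentTasks, int[] readyTasks) {
        if (isActive()) return false;
        instance = newInstance;
        for (int j : dependentTasks) totalRefs[j] = true;  // all dependent tasks
        for (int j : readyTasks)     nowRefs[j]  = true;   // subset scheduled now
        return true;
    }

    // A successful get() by task t_j clears both entries for that task.
    Object get(int j) {
        if (!totalRefs[j]) return null;   // unavailable for this task
        totalRefs[j] = false;
        nowRefs[j] = false;
        return instance;
    }
}

In DART itself, this bookkeeping is performed by the DataPool in cooperation with the AtagManager's newInstanceProduced() notification, described below; the application-specific portion of the AtagManager constructor that records the task and channel declarations is shown in Figure 4.8.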
// ************ START OF AUTO-GENERATED CODE
numTaskDecls = 3;
taskDecls.add(IDConstants.T_SAMPLEANDTHRESHOLD,
    new TaskDeclaration(IDConstants.T_SAMPLEANDTHRESHOLD,
        new SampleAndThreshold(m_dataPool, m_config, m_networkArchitecture, m_GUI),
        Thread.MAX_PRIORITY-0, "NODES PER INSTANCE", false, 1,
        "PERIODIC", 1, true));
taskDecls.add(IDConstants.T_LEADER,
    new TaskDeclaration(IDConstants.T_LEADER,
        new Leader(m_dataPool, m_config, m_networkArchitecture, m_GUI),
        Thread.MAX_PRIORITY-1, "NODES PER INSTANCE", false, 1,
        "ANYDATA", 3600, false));
taskDecls.add(IDConstants.T_SUPERVISOR,
    new TaskDeclaration(IDConstants.T_SUPERVISOR,
        new Supervisor(m_dataPool, m_config, m_networkArchitecture, m_GUI),
        Thread.MAX_PRIORITY-2, "ONE INSTANCE ON NODE ID", false, 0,
        "ANYDATA", 3600, false));
numChannelDecls = 4;
channelDecls.add(0, new ChannelDeclaration(
    IDConstants.T_SUPERVISOR, IDConstants.D_TARGETINFO,
    "INPUT", false, "push", "ALLNODES", 0));
channelDecls.add(1, new ChannelDeclaration(
    IDConstants.T_LEADER, IDConstants.D_TARGETALERT,
    "INPUT", false, "push", "", 0));
channelDecls.add(2, new ChannelDeclaration(
    IDConstants.T_SAMPLEANDTHRESHOLD, IDConstants.D_TARGETALERT,
    "OUTPUT", true, "push", "NEIGHBORDISTANCE", 300));
channelDecls.add(3, new ChannelDeclaration(
    IDConstants.T_LEADER, IDConstants.D_TARGETINFO,
    "OUTPUT", true, "push", "", 0));
// ************** END OF AUTO-GENERATED CODE

Figure 4.8: Automatically generated section of the AtagManager constructor.

When a new data item is added to the data pool using the put() or putFromNetwork() function call, part of the processing of the function call in the DataPool class involves an invocation of the newInstanceProduced() function of the AtagManager. The arguments to this function are the ID of the abstract task that produced the data item and the ID of the data item that was produced. If the output channel between the task and the data item is marked as non-local, no further processing is performed, because a non-local output channel is not meant to trigger any scheduling of dependent tasks on the local node. If the channel is not marked as non-local, the AtagManager determines the number of abstract tasks mapped onto that node that are dependent on the data item and populates the totalRefs and nowRefs arrays associated with the data item in the data pool. The role of these arrays was discussed in Section 4.3.2.3. While determining the nowRefs entries, the firing rule and input data items for each of the dependent tasks are checked, and a task is scheduled for execution (and marked correspondingly in the nowRefs array) only if its firing condition is met. This process involves callbacks from the AtagManager to check the status of specific data items and to manipulate the contents of the two arrays. When the newInstanceProduced() function returns, all dependent tasks mapped onto that node whose firing conditions are met are already in the scheduler's queue waiting for execution.

Figure 4.9: UML class diagram: AtagManager

4.3.4 NetworkStack

4.3.4.1 Service

As indicated by its name, the basic service provided by the NetworkStack to the other components of the runtime is sending a data item to one or more nodes in the sensor network. The component is responsible for managing and initializing all the required protocols, which will typically include physical layer, medium access, and routing protocols. The sendData() functions shown in the class diagram (Figure 4.10) provide this service.
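The text above specifies only that the NetworkStack exposes sendData() functions that deliver a data item to one or more destination nodes. As a point of reference, the following is a minimal sketch of what such an interface could look like; the exact signatures are assumptions, not the actual DART API.

// Hypothetical sketch of the NetworkStack service interface.
public interface NetworkStackService {
    // Send one data item to a single destination node.
    void sendData(DataItem item, int destinationId);

    // Send one data item to every node in the destination list produced
    // by the Dispatcher from the output-channel annotation.
    void sendData(DataItem item, int[] destinationIds);
}

In the single-machine simulation described below, an implementation of this service would simply write the data item to the receiver sockets of the destination nodes.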
4.3.4.2 Interactions TheDispatcher and the NetworkArchitecture components interact with the network stack. The former uses the interface to send data items to a set of nodes as indicated by the annotations of the output channel associated with that data item. The topology creation and management, and other similar protocols in the latter also access the transceiver through theNetworkStack. 4.3.4.3 Implementation Figure 4.10: UML class diagram: NetworkStack 67 The implementation of this component is almost entirely dependent on the target sensor node plat- form and the family of protocols available for that platform. The prototype version of DART is im- plemented primarily as a component of the simulation and visualization environment that accompanies the ATaG visual programming interface. Since the simulation is on a single machine, the interaction between independent DART processes representing different nodes of the network is through sockets on the simulation machine. The current DART implementation therefore opens a listener thread on a pre-defined socket number to simulate the receiver and a transmitter thread that sends the data items to the receiver sockets of the destination nodes. In a ‘real’ DART implementation (i.e., one that is deployed on a real or simulated sensor node that can directly communicate only with its 1-hop neighbors), protocols managed by theNetworkArchi- tecture will register their interest in specific message types which will correspond to the protocol- specific information exchanged between nodes. A message queue or similar mechanism will be used to exchange data between these protocols and the receiver and transmitter threads of theNetworkStack. This is similar to the active messages [15] model. 4.3.5 NetworkArchitecture 4.3.5.1 Service NetworkArchitecture is responsible for managing all protocols and maintaining all information related to the situatedness of the node in the network. Situatedness implies a knowledge of the neigh- borhood composition, the role of the node in one or more virtual topologies (such as trees or meshes) that might be permanently or temporarily overlaid on all or part of the network. This service is pro- vided through a query interface that translates architecture-independent specifications such as “ID of parent node”, “IDs of child nodes”, “geographic locations of nodes within 10m of this node”, etc., into the desired ID or location list. To summarize, this component provides context-awareness to the application-level and system-level components of the software system running on the sensor node. 68 4.3.5.2 Interactions UserTask instances may optionally interact with theNetworkArchitecture to obtain informa- tion about the node’s own coordinates, the composition of its neighborhood, its role (if any) in a virtual topology, etc. TheDispatcher also uses this query interface to translate annotations of output chan- nels into list of node IDs and/or locations for transmitting the newly produced data item to its specified destinations. Finally, the NetworkArchitecture uses the services provided by the Network- Stack as required by the protocols it manages. 4.3.5.3 Implementation Figure 4.11: UML class diagram: NetworkArchitecture As mentioned above,NetworkArchitecture is required as a separate (and important) compo- nent of DART because application level tasks require information about the situatedness of the node in the target deployment. 
The architecture-independence and data driven semantics of ATaG means that all the input and output by instances of abstract tasks is through the basicget() andput() primitives. 69 All communication over the network is implicit in the channel annotations and is not directly controlled by the imperative portion of the ATaG program. However, an integral characteristic of networked sens- ing is that the processing of data items could be influenced by factors such as the location of the node, the density of sensor nodes in its region of deployment, etc. This means that if an abstract task with an input channel labeled neighborhood-hops:1 is mapped onto a node, it is highly probable that the task code will want to know the composition of its 1-hop neighborhood in order to meaningfully interpret and suitably process the incoming data items represented by that channel. The current implementation of NetworkArchitecture maintains information about a neigh- borhood whose scope is determined by the channel annotations of abstract tasks mapped onto that node. For example, let task A and task B be the only two abstract tasks of the ATaG program that are mapped onto a particular node. Suppose task A has an input channel with annotation neighborhood-hops:3 and task B has an input channel with annotation neighborhood-distance:50m. At compile time, theNet- workArchitecture component on that node is configured to collect information only about the union of the set of nodes within 3 hops of that node and the set of nodes within 50m of that node. This ensures that the computation, communication, and storage resources required to maintain this informa- tion are justified by the (possible) utility of the information to tasks on that node. The set of function calls that form the query interface supported by this component are shown in the class diagram of Fig- ure 4.11. Decisions about the activation of protocols for virtual topology formation are also taken at compile time. For instance, if the application requires a virtual tree topology, the programmer will pre- sumably have identified the nodes which form the root and non-root members of the tree in the network model that is provided to the compiler. TheNetworkArchitecture modules on all or some of the nodes in the network will then be configured to start the tree formation protocols at node initialization time. The four types of events involving theNetworkArchitecture that can occur at run time are: a data item of interest to one of the protocols managed by this component arrives at the transceiver and is 70 communicated to the protocol by theNetworkStack, a data item is send to theNetworkStack by one of the protocols managed by this component, the query interface is invoked by an application level task, and the query interface is invoked by theDispatcher. 4.3.6 Dispatcher 4.3.6.1 Service TheDispatcher is responsible for transmitting any new instance of a data item produced on the node to other nodes (if any) indicated by the output channel annotation associated with the data item. The component therefore supports a notification interface that consists of anewInstanceProduced() function. 4.3.6.2 Interactions TheDataPool is responsible only for managing the data pool, theAtagManager stores information about the declarative part of the program and also schedules the imperative portions for execution when appropriate, the NetworkStack manages the transceiver, and theNetworkArchitecture is in charge of situatedness information of the node. 
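A minimal sketch of this query interface is shown below. kHopNeighborIDs() and kHopNeighborCoords() appear in the Monitor listing of Figure 4.3; the remaining method names are illustrative stand-ins for the queries described in the text (neighborhood by Euclidean distance, the node's own coordinates, and its parent and children in a virtual tree), not the actual DART signatures.

// Hypothetical sketch of the NetworkArchitecture query interface.
public interface NetworkArchitectureService {
    int myID();                                 // identifier of this node
    double[] myCoords();                        // location of this node
    int[] kHopNeighborIDs(int k);               // IDs of nodes within k hops
    double[][] kHopNeighborCoords(int k);       // their locations
    int[] neighborIDsWithinDistance(double d);  // nodes within d meters
    int parentID();                             // parent in the virtual tree, if any
    int[] childIDs();                           // children in the virtual tree, if any
}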
Hence, a new component - theDispatcher is respon- sible for coordinating between these modules and when an instance of a data item is produced, sending it to the set of destination nodes as indicated in the ATaG program. Specifically, this component uses the query interface ofAtagManager to obtain the output channel annotation associated with the data item, the translation service of theNetworkArchitecture to convert the channel annotation into a list of node IDs (or locations) that correspond to the annotation at that time, and thesend() interface of theNetworkStack to actually dispatch the data to the destinations. 71 Figure 4.12: UML class diagram: Dispatcher 4.3.6.3 Implementation The implementation of theDispatcher is straightforward, mainly because its primary role is one of coordination between other modules. Hence, the only data maintained by this component consists of handles to theAtagManager,NetworkArchitecture, and theNetworkStack,tobeinvoked in that order. 4.4 Control flow The flow of control among the components of DART can be divided into two parts. The first is the set of activities that occur at node initialization. The second is the set of actions triggered during the course of application execution on that node. This set includes events that are generated by the user-level code (e.g., production and consumption of data items) and also events generated by components of the runtime system such as the protocols managed by theNetworkArchitecture component. 4.4.1 Startup Each module of DART is expected to implement astart() function that performs the basic initializa- tion (if any) required for that module. The initialization might involve memory allocation, initialization of variables, spawning of new threads for different protocols and services, etc. A special Startup module of DART is the first to run when the node is turned on, and invokes thestart() functions of 72 the other modules in the following order. First, theDataPool is started, which mainly involves allo- cating memory for each entry of the data pool corresponding to the different data items in the ATAG, and then marking the entries of the datapool as empty by suitably initializing the reference counts. Nat- urally, on resource-constrained platforms where dynamic memory allocation is not supported and the data structures of the data pool are determined and generated as part of software synthesis at compile time, the duties of the startup function will be reduced. Next, theNetworkStack is started, which spawns the listener thread to accept incoming connections, and a transmitter thread to handle outgoing messages. The initialization, if any, needed by the MAC and routing protocols, and also the localization and time synchronization protocols, is performed before thestart() of theNetworkStack returns control toStartup. Now that the basic communication service with other nodes is available, theNet- workArchitecture module is started, which will spawn the protocol threads required for neighbor discovery, virtual topology construction, middleware services, etc. The startup of this module could be deemed complete when some minimum node state has accumulated; e.g., all the information about the neighborhood is available. Finally, theATaGManager is started. This module traverses the list of user-level tasks assigned to that node, and spawns all the tasks that are marked ‘run at initialization’ by the programmer. 
These will typically be the tasks that (periodically) produce the set of sensor readings that will then drive the rest of the in-network processing. It is important to note that a periodic firing rule does not necessarily mean that the periodic execution of the task is started when the node is powered on. This is because the application developer might want some task(s) to execute periodically only when a certain stage of the computation is reached or a certain event is detected. Hence, the boolean property 'run at initialization' is to be specified for each abstract task (false by default), and only the tasks that have this property set to true will be started at node initialization, regardless of the firing rule. The application developer can use this mechanism to define application level functionality that is executed only at initialization.

4.4.2 get() and put()

During the normal course of application execution, three main events can occur: (i) a get() invocation by a user task, (ii) a put() invocation by a user task, or (iii) a put() invocation by the receiver thread when a data item arrives from another node.

As explained in Section 4.3.2.3, a get() invocation merely results in the clearing of the corresponding entries of the totalRefs and nowRefs arrays of the data pool and, as a side effect, can change the state of a particular instance of a data item from available to unavailable, etc. In the current implementation, the processing of a get() call is performed entirely within the DataPool component and none of the services offered by other DART components are used by the DataPool.

Figure 4.13: Flow of control on a put() invocation (steps 1 through 7 among the UserTasks, DataPool, ATaGManager, Dispatcher, NetworkArchitecture, NetworkStack, and transceiver).

The processing of a put() call is more involved. Figure 4.13 shows the flow of control among DART components triggered by a put() - steps 1 through 7 of the figure correspond to the following:

1. An instance of UserTask invokes put() for a particular abstract data item. The DataPool first checks if the corresponding data item can be safely added to the data pool - i.e., if the data item is unavailable or inactive. If the addition fails, the put() returns with an error code and the contents of the data pool are not modified.

2. If the addition succeeds, the DataPool invokes the newInstanceProduced() function of the AtagManager. The AtagManager checks if the output channel annotation for the newly produced data item contains non-local. If not, the AtagManager determines the list of tasks that depend on this data item and checks their firing rules. The arrow that denotes Step 2 is double headed because this process involves some calls back to the DataPool to check the status of certain data items.

3. If one or more tasks are ready to be scheduled for execution, the AtagManager invokes the run() function provided by the Runnable interface supported by each UserTask.

4. The DataPool notifies the Dispatcher and returns control to the user task. Further processing by the Dispatcher can proceed in a separate thread of control.

5. The Dispatcher obtains the output channel annotation for the data item. If the output channel is marked local only, the data item is not to be transmitted to any other nodes in the network and processing of the put() call terminates at this point.
6. If the output channel annotation indicates transmission of the data item to one or more nodes of the network, the Dispatcher queries the NetworkArchitecture to translate the channel annotation into a list of node IDs (or locations). Note that this assumes a scenario where the annotation is not translated into node IDs (or locations) at compile time, which typically will be the case if the network is dynamic. For a static network where some annotations are translated into node IDs (or locations) through an analysis of the network graph at compile time, the run time translation will not be required. Instead, a list of node IDs (or locations) will be provided to the AtagManager instead of an untranslated channel annotation. In this case, Step 6 will be omitted.

7. Finally, the Dispatcher hands over the data item and the list of destinations to the NetworkStack for transmission.

The operating system and compiler support for the platform on which DART is implemented heavily affects the design and implementation of the components, and the management of details of the control flow. For instance, a real-time operating system such as µC/OS-II includes a preemptive priority-based scheduler and support for multithreading, which is not available in an operating system such as TinyOS for resource constrained sensor nodes. Also, if µC/OS-II is the choice of operating system, the DART implementation (and the software synthesis process) will be affected by the target processor. The description of the DART architecture and details of its control flow are hence intended to be a guide (template) for implementing system-level support for the ATaG programming model, with DART implementations on different sensor node platforms differing from one another in the details.

The third type of event – an invocation of the putFromNetwork() call by the receiver thread of the NetworkStack – is handled in much the same way as a local invocation, except that the Dispatcher is not part of the loop.

4.4.3 Illustrative example

Figure 4.14: Data collection control flow at the sampler and collector nodes (steps 1 through 6 at the sampler node, multi-hop routing, and steps 7 through 9 at the collector node).

In centralized data collection, a Sampler task is hosted on each node of the network, and a Collector task is hosted on a single designated root node. The Sampler runs periodically and produces a data item that is to be routed to the Collector at the root node. The ATaG program for this behavior therefore consists of two abstract tasks and one data item. Figure 4.14 depicts the intra-node and inter-node flow of control whenever a data sample is created at a non-root node (left) and communicated to the root node (right). The individual steps have already been explained in the previous subsection. In this example, the invocation of a put() by the Sampler only results in execution of six of the seven steps discussed in the earlier section. This is because the AtagManager does not invoke any task on that node, since no task dependent on the Sampler is mapped on the non-root node. When the data item arrives at the network interface of the root node, the NetworkStack adds it to the data pool, which leads to the scheduling of the Collector task that consumes the newly arrived data item sent by the Sampler.

4.5 Future work

A fully functional albeit simplified version of DART (DART-J) intended for single-machine simulation has been implemented in Java.
DART-J has a modified network interface that communicates through sockets on the local host. Each instance of DART is also aware of the entire network architecture at startup (by reading from a file) and the protocols for neighborhood discovery, etc., are not required and not implemented. An ANSI C version of DART (DART-C) has also been partly implemented for a sensor node with a PIC18LF8720 microcontroller, 3840 bytes of RAM, 128KB of program memory, and 1KB of EEPROM. DART-C is designed for the MicroC/OS-II real-time operating system. Hardware design of the node, implementation of low level APIs, and software development of the runtime is proceeding in concert, and is not yet complete. We do not expect to implement an ATaG runtime on the TinyOS operating system in the near future primarily because of the lack of the prerequisite mechanisms required by DART to guarantee ATaG semantics. We believe that using a small footprint, 77 widely available component-based operating system that provides the necessary mechanisms is a better or as good an option as first implementing these mechanisms as a set of extra nesC modules for TinyOS and then layering the application-level task code on top of these modules. We now discuss some areas of future work for DART. These are in addition to the modifications to the DART design and implementation that will be required to support the proposed enhancements to ATaG (Section 3.6). 4.5.1 Lazy compilation of channel annotations The destination(s) of a particular data item produced on a node is indicated in the ATaG program in an architecture-independent manner. The actual translation of an annotation such as neighborhood- distance:10m into the list of nodes that fall within the 10m radius of a particular node in the network can take place at compile time or at run time. If the network deployment is static and known at design time, theAtagManager can be directly supplied with a list of source or destination IDs corresponding to input and output channels respectively. TheNetworkArchitecture does not need to maintain and update this information, thereby saving the resources required to run the necessary protocols. If the network is dynamic, this translation must happen at run time. The application developer does not care if the translation is eager (compile-time) or lazy (run-time), as long as the communication between tasks in the network occurs according to the scheme indicated in the ATaG program. Indeed, this is the essence of architecture independent high level programming. It also means that the decisions about lazy or eager evaluation of annotations, frequency of refreshing node state in a lazy evaluation scenario, etc., is entirely upto the compilation framework and the runtime system. One of the areas of future work in this context is to define a technique to minimize the cost of execu- tion (using a suitably defined metric) by selecting the evaluation policy for each annotation. The evalu- ation policy will determine whether the compilation of an annotation is eager, lazy, or a combination of 78 both. For lazy compilation, it will also determine how frequently theNetworkArchitecture will update the relevant information about the neighborhood. 79 Chapter 5 Programming and Software Synthesis This chapter describes the application development environment for ATaG. The declarative part of the ATaG program is specified graphically. 
The imperative part, consisting of the code associated with each abstract task and abstract data item, is provided by the user, with assistance from a code template gen- erator tool. Software synthesis, simulation, and visualization is performed by tools that are launched from the visual programming interface. The GUI is based on the Generic Modeling Environment tool- suite [22]. We first introduce the GME toolsuite and then describe how GME was used to implement a programming and software synthesis mechanism for ATaG. 5.1 Terminology Model integrated computing (MIC): MIC is an approach for development of complex systems that is based on capturing all the relevant system information in a structured form (models), and using the model information to drive a set of domain-specific tools for analysis and synthesis. Model: Models are abstractions that allow the representation and manipulation of various aspects of the underlying system. The set of parameters captured in the model depends entirely on the intended usage of the model information and the domain of application. The term ‘model’ is commonly used to refer to mathematical models that describe a system through a set of variables that represent properties 80 of interest, and a set of equations that describe the relationships between the variables. We use the term ‘model’ to denote structural models and not mathematical models. A domain-specific structural modeling language defines the basic building blocks that are available to the designer to describe a particular system in that domain. The domain-specific language also implicitly includes the semantics of each building blocks, and the semantics of relationships between the building blocks. Examples of relationships include association, containment, and physical connectivity. The Generic Modeling Environment (GME): GME is a configurable graphical toolsuite that supports MIC. The configuration of the environment to support domain-specific modeling is done in a formal manner through the use of metamodels. The metamodeling language is the UML class diagram notation. GME allows rapid creation of domain-specific modeling environments that are used by designers to describe systems in that domain, perform desired transformations on the model data, and drive external tools with the model information as input. Model interpreters are the software components that interface with the model database and manipulate and otherwise use the model information. 5.2 Meta-modeling for the ATaG domain 5.2.1 Objectives GME was used to create a programming and software synthesis for the ATaG model. The objective of the customized graphical programming environment was to provide the following basic capabilities: The ability to visually specify the declarative portion of the ATaG program. This means that the abstract task, abstract data, and abstract channel are the basic building blocks of the structural model of the ATaG program, and the annotations associated with each should also be specified (or selected from a list of pre-defined values). The ability to create a library of ATaG programs (also called ‘behaviors’) and compose larger applications by selecting and concatenating programs from this library. 81 The ability to visually specify the parameters of the target network deployment, such as the num- ber of nodes and the coordinates of each node, node identifiers, radio ranges, etc. 
The ability to create a library of network descriptions that will typically correspond to existing deployments, similar to the library of ATaG programs. The ability to indicate which set of ATaG programs is to be compiled on which of the network models, and to invoke the software synthesis tools for generating customized code to be downloaded and deployed on each node in the selected target.

A visual interface for drawing the ATaG program eliminates the need for the programmer to learn a new syntax and also makes it easy to comprehend the structure of an existing program. The ability to create libraries of behaviors and deployments allows reuse of existing applications as components of larger applications, and also allows the same application to be compiled for a different network. At the highest level of abstraction, as will be shown in the following sections, ATaG programming translates into the selection of one or more behaviors from the program library, the selection of one network description from the deployment library, and the invocation of software synthesis tools integrated into the development environment. The software synthesis methodology itself is structured in such a way that the imperative portions of existing ATaG programs (i.e., the code associated with the tasks and data items) can be reused. Ultimately, this means that if a programmer wishes to merely combine existing behaviors to form a larger application, and compile it for one of the existing network descriptions, not a single line of code needs to be written. This feature is one of the biggest strengths of the ATaG model, and the best demonstration of the advantages of using the data driven programming paradigm for modularity and composability, with mixed imperative-declarative program specification for separation of concerns.

The MetaGME paradigm that is used to specify the domain-specific metamodels provides basic building blocks that are used to define the structure of valid models in the target domain. Examples of the building blocks include atom, model, connection, reference, etc. The GME users' manual [23] explains the metamodeling and modeling concepts and processes in more detail. Here, we present the metamodels that are defined to create the ATaG programming environment.

5.2.2 Application model

Figure 5.1: Sensor network application: a SensorNetworkApp model containing zero or more ATaGBehaviorR references (to ATaGBehavior models) and at most one DeploymentR reference (to a Deployment model with attributes NumberOfNodes, RadioRange, SensingRange, VirtualTopology, XRange, YRange, and TreeRoot).

The modeling paradigm for the ATaG programming environment is defined as follows. As shown in Figure 5.1, the sensor network application consists of one or more behaviors and one network description. All behaviors to be synthesized onto the target network are required to be part of this top level model. The individual behaviors are represented as models named ATaGBehavior.

As mentioned in earlier sections, one of the advantages of the data driven paradigm is the composability of programs by literally concatenating sub-programs. This property allows the creation of libraries of ATaG programs for different behaviors, which can be easily composed into the desired application by the end user. To support this drag-and-drop composition of applications from existing
To support this drag-and-drop composition of applications from existing 83 ATaG <<Model>> Task <<Model>> ExecutionPeriod : field FiringRule : enum InstantiationParameter : field InstantiationType : enum TaskID : field RunAtInit : bool DataItem <<Model>> Lifetime : field DataID : field JavaFile <<Atom>> Filename : field Selected : bool CFile <<Atom>> Filename : field Selected : bool OutputChannel <<Connection>> AddToLocal : bool Outgoing : enum Parameter : field InputChannel <<Connection>> Incoming : enum Parameter : field Behaviors <<Model>> 0..* 0..* 0..* 0..* 0..* 0..* src 0..* dst 0..* src 0..* dst 0..* 0..* 0..* 0..* Figure 5.2: Modeling paradigm for the ATaG program (declarative) libraries, we do not include the ATaGBehavior models directly into this high level model. Instead, the top level model contains references to behaviors and a reference to a network description. References act as pointers to other entities; in this case, the actual behaviors and the network description are stored separately in the library and the programmer includes the behaviors in the application by simply point- ing to it. In the figure, the ATaGBehaviorR entity is a reference to an ATaGBehavior model, and the DeploymentR entity is a reference to a Deployment model, each of which is now explained in more detail. The declarative portion of the ATaG program is described by instantiating the ATaG model. The structure of the model is shown in Figure 5.2. The model consists of Tasks and DataItems, corresponding to abstract tasks and abstract data items respectively. The annotations for tasks and data items are 84 Figure 5.3: Specifying annotations for tasks and data items specified as attributes of the models. As shown in the figure, attributes of the Task model include firing rule, type of instantiation, priority of the task, whether the task should be executed at node initialization, etc. The fact that an attribute is associated with a model does not necessarily mean that the programmer has to specify its value. Attributes such as TaskID and Priority could be computed at compile time for a particular application, and recorded in the model for inspection by the programmer. Note that the current version of the metamodel is a prototype primarily meant to demonstrate the power of visual programming with ATaG. Some attributes in the current model are placeholders for information that is not used by the mapping and software synthesis tools. As the programming paradigm evolves, the metamodels will evolve accordingly. One of the main attractions of using the GME toolkit for designing the ATaG programming en- vironment is the ease of modifying the modeling paradigm and automatically generating an updated 85 Deployment <<Model>> NumberOfNodes : field RadioRange : field SensingRange : field VirtualTopology : enum XRange : field YRange : field TreeRoot : field Networks <<Model>> SensorNode <<Atom>> NodeId : field Script : field YCoord : field XCoord : field 0..* 0..* Figure 5.4: Modeling paradigm for the network graphical modeling environment. Attributes of various types (boolean, integer, string, etc.) can be as- sociated with the metamodel entities (atoms, models, connections, references, etc.) by specifying them in the ’Attributes’ aspect of the metamodeling environment. Figure 5.3 shows the Attributes aspect of the ATaG metamodel. The FiringRule is an enum attribute, which means that a list of valid selec- tions is pre-specified in the metamodel. 
The lower right section of the GME window in Figure 5.3 shows the specification of allowable values for the firing rule, in accordance with the ATaG semantics in Section 3.4. 5.2.3 Network model The application developer describes a target network as an instance of the Deployment model. The structure of the Deployment model is shown in Figure 5.4. The description of the target deployment can be separated into network level parameters and node level parameters. Examples of network level parameters include: number of nodes, radio range (assuming all nodes have a fixed radio range), the real or virtual X and Y coordinate range of the localization system, etc. The set of parameters that are captured in our current metamodel are meant to be representative of the information that might be required for the compiler to synthesize an ATaG program on that deployment. By categorizing radio range as a network level parameter, we assume that all nodes have identical radios with fixed radio range, and hence the radio range can be specified at the network level 86 and not for the individual node. The X and Y coordinate range also implies that nodes are localized in a two dimensional space. The advantage of using a configurable modeling environment such as GME is that the metamodel (and hence the programming environment) can be easily modified by including additional network level or node level parameters as desired. 5.3 The programming interface This section describes the use of the visual programming environment configured by the GME meta- models discussed above. The sequence of steps to be followed by the programmer can be summarized as follows: 1. Create a library of behaviors, where each behavior is an instance of the ATaG model. Each instance of the ATaG model in turn consists of instances of abstract tasks, data items, and the input and output connections. If the desired behavior already exists in the library, this step can be omitted. 2. Create a library of network descriptions, where each description corresponds to the parameters of a different target deployment. A network description will consist of network level parameters such as the number of nodes, the scope of the coordinate system (if any) in terms of X and Y coor- dinate range, the availability of protocols for establishing virtual topology, etc. This information will be used to translate the ATaG annotations for that particular network, and will also be used to determine if a particular ATaG behavior selected by the programmer has a valid mapping onto the selected target deployment. For instance, if the ATaG program uses the annotations ‘parent’ and ‘children’ on the channels, the network description must indicate the availability of protocols to establish a virtual tree topology, and any parameters required by that protocol - such as the identity of the root node of the tree. Similar to the library of behaviors, if the desired network description already exists, this step can be omitted. 87 3. Select the set of behaviors to be included in the application, and the target network de- scription. This is accomplished by instantiating the desired number of references to behaviors (ATaGBehaviorR) - one for each behavior from the library - and one reference to the target net- work (DeploymentR). 4. Invoke the software synthesis tools. This process will be described in more detail in Section 5.5. 
Briefly, one of the intermediate steps in this process is the automatic generation of code skeletons for the abstract tasks and abstract data items, which should be filled in by the programmer. Even in this step, if the code for each abstract task already exists (e.g., if the task is part of a behavior already existing in the library), it can be reused. If such reuse is possible, the programmer is not required to write a single line of code in the entire process of ATaG programming. 5.4 A Case Study: Temperature Monitoring and Object Tracking We now illustrate the process of ATaG programming and software synthesis through a case study. In this case study, the programmer is interested in synthesizing an application consisting of two behaviors for a particular network deployment. The first behavior is temperature monitoring. The ATaG program for this was described in Sec- tion 3.5.2. Briefly, each node periodically compares its temperature reading with the reading of its neighboring nodes. If the gradient is above a certain threshold, an alarm notification is sent to a des- ignated root node. The abstract syntax of the ATaG program for this behavior is shown in Figure 5.5. Although the functionality of this ATaG program is same as that of the program in Figure 3.3, the struc- ture of the task graph is different. The current prototype of the DART runtime system does not support compound firing rules such as any-data OR periodic. Hence, it is not possible to define a single abstract task like theMonitor task in Figure 3.3 that is fired periodically for sampling its own reading as well as fired whenever a reading from any of the neighboring nodes is received. The same functionality is therefore accomplished as shown in Figure 5.5. 88 The TSampler task with a periodic firing rule samples the temperature on each invocation and disseminates it to the 1-hop neighborhood. TheMonitor task can now be defined with an any-data firing rule with an input channel that consumes data only from the local pool. TheMonitor uses static variables to store the received readings across invocations and at each invocation, performs the compar- isons required to detect a violation of the pre-defined gradient threshold. If the threshold is exceeded, theFire data item is produced, which is routed by the runtime system to theAlarmActuator on a designated root node (in this case, the node with ID 0). The concrete syntax of this program as modeled in GME is shown in Figure 5.6. Note that the the concrete syntax of the declarative part of the ATaG program is identical to the abstract syntax of the task graph. The programmer directly translates the task graph into the GME model by dragging, naming, and annotating the desired number of abstract tasks, data items, and channels into the modeling window. The ease of use this engenders is perhaps the most significant advantage of visual ATaG programming through GME. The second behavior is object tracking - the ATaG program for which was shown and explained in Figure 3.2. The abstract syntax of the program is also reproduced here (Figure 5.7). The GME model of the ATaG program for object tracking is shown in Figure 5.8. Again, the concrete syntax is identical to the abstract syntax. Each ATaG program thus defined forms part of a library of behaviors that can be reused in other ap- plications. Figure 5.9 shows a library of ATaG programs consisting of three behaviors: object tracking, gradient monitoring, and centralized data collection. 
Currently, the building blocks for each behavior are abstract tasks, data items, and channels, the latter indicated by directed arrows between tasks and data items. This modeling paradigm, developed for prototyping purposes, is not ideal because behaviors might themselves include other behaviors; in other words, the building blocks provided to the programmer should include abstract tasks, abstract data, and pointers (references) to other behaviors in the library. In the gradient monitoring program of Figure 5.6, notice that the pattern of communication implied by the Monitor, Fire, and AlarmActuator subgraph is centralized data collection. Similarly, the pattern of communication implied by LeaderElect, TargetInfo, and Supervisor in the object tracking program of Figure 5.8 is also centralized data collection. The next version of the application modeling paradigm for ATaG will allow the programmer to integrate existing behaviors (such as the centralized data collection behavior shown as CentralizedDC in Figure 5.9) into other behaviors through appropriate building blocks in the GME interface, thereby maximizing reuse. This composition is illustrated in Figure 5.10.

[Figure 5.5: Abstract syntax: Temperature gradient monitoring]
[Figure 5.6: Concrete syntax: Temperature gradient monitoring]
[Figure 5.7: Abstract syntax: Object tracking by local leader election]
[Figure 5.8: Concrete syntax: Object tracking by local leader election]

Next, the target network is described by instantiating a Deployment model and setting the parameter values to match the target deployment. As shown in the metamodel of Figure 5.4, a Deployment consists of one or more atoms of type SensorNode. Node level parameters are specified as attributes of SensorNode, while network level parameters are specified for the model Deployment. The set of attributes can be easily increased or otherwise modified depending on the information required by the particular tools to be driven through the GME framework. Figures 5.11 and 5.12 show the library of deployment descriptions and the details of one particular 9-node deployment, respectively. Network level and node level parameters for this example are shown in the lower right sections of the GME windows.

The library of ATaG programs in GME consists only of the declarative portions - i.e., the number of abstract tasks, data items, and channels, and their annotations. The code associated with each abstract task is to be provided separately as a Java class that extends the UserTask class of DART (see Section 4.3.1). The developer of an ATaG behavior that is contributed to the library is also expected to provide the Java classes associated with the abstract tasks.

Given the library of ATaG programs and the library of deployment descriptions, defining and synthesizing a networked sensing application is straightforward. The application is defined as an instance of the SensorNetworkApp model (Figure 5.1) that consists of one or more references (pointers) to ATaG programs and one reference to a deployment description.
[Figure 5.9: Library of ATaG programs (behaviors)]
[Figure 5.10: Composing ATaG programs from existing libraries]
[Figure 5.11: A library of deployments]
[Figure 5.12: A network of 9 nodes]

Figure 5.13 is an ATaG program that contains two behaviors from the library - object tracking and gradient monitoring. This program is specified by instantiating one ATaGBehaviorR reference for each behavior, linking each reference to its target behavior, instantiating one DeploymentR reference to the target deployment description, and linking it to the desired 20-node deployment. Since the two component behaviors of the program are part of the library, the application code will also be available. Hence, the application developer is not required to write any new code or draw any new ATaG diagrams. Such an interface can be used by end users who have no expertise or knowledge of ATaG, Java, or the lower level aspects of sensor networking.

[Figure 5.13: Application as a set of behaviors mapped onto one deployment]

5.5 Synthesis, simulation, and visualization

The software that runs on each node of an ATaG-programmed system consists of: (i) user-supplied code for each abstract task and abstract data item, (ii) components of the runtime system that are independent of the particular ATaG program being synthesized, and (iii) components of the runtime system that need to be customized for the particular ATaG program being synthesized. Examples of the 'standard' DART components are the DataPool,¹ the Dispatcher, the NetworkArchitecture,² and the NetworkStack.³ The AtagManager has to be customized for the application because the information it maintains includes the task and channel annotations, and handles to the user level code.

Software synthesis is performed through model interpreters in the GME environment. Model interpreters are software components that can access the information entered graphically by the user through an API provided by the GME toolsuite. The building blocks - such as Atom, Model, Reference, and Connection - provided by the GME metamodeling environment do not have associated domain-specific semantics. It can also be argued that the building blocks - such as ATaGBehaviorR, Deployment, SensorNode, Task, and Data - provided by the domain-specific modeling environment also do not have any inherent semantics except in the mind of the programmer. It is the model interpreters written for a particular modeling paradigm that encapsulate the semantics of the domain by suitably interpreting the model components and parameters to accomplish the desired domain-specific objective.

¹ If dynamic memory allocation is not supported on the target platform, the data structures of the DataPool need to be customized for the abstract data items that will be generated and consumed on that node. In the current implementation of DART in Java, however, this customization is not required because the data pool stores data items as instances of a generic class DataItem.
² The current implementation of DART for single-machine simulation does not require customization of the services of this component because the protocols for neighborhood maintenance and topology formation are replaced with equivalent code that reads a configuration file on disk to obtain the topology information. In a 'deployable' version of DART, some code in this component is likely to be customized for the requirements of the ATaG program.

³ If some sensor nodes in the target system have a wired network connection while others communicate through a variety of wireless network interfaces, a suitable NetworkStack will have to be selected for each node, based on the information provided in the network model. Currently, this component requires no customization because the network interfaces are assumed to be homogeneous, and no per-node optimization at compile time or run time is performed in this component.

In our case, the objectives of model interpretation are:

1. to allow the application developer to visualize the network deployment in two dimensions and also inspect node connectivity and sensing coverage,

2. to generate code skeletons for each abstract task (if required) for the user to populate with application-specific code,

3. to customize DART components such as the AtagManager, and

4. to generate files and scripts to configure and launch the simulation and visualization environment.

The compilation and software synthesis process is started by invoking a single model interpreter - the initial dialog box is shown in Figure 5.14. Similar dialog boxes guide the user through the process. If the application developer wants to visualize the deployment, a display similar to the one shown in Figure 5.15 is presented. The visualization is required because the GME model for a deployment is basically a container for Atoms of type SensorNode; inspecting the GME model does not convey the distribution of the nodes in the (two dimensional) field, the connectivity of the network as determined by the communication range of each transceiver, or the degree of coverage of each type of sensing interface in the network.

The model interpreter then generates code skeletons for each abstract task in the application, if required. If a new ATaG behavior is being developed and the associated code is therefore yet to be written, the developer can create a dummy application by including only the ATaG behavior being created, and choosing to generate code skeletons for the abstract tasks and data items. The code synthesizer analyzes the I/O dependencies between abstract tasks and data, as well as the firing rules of the abstract tasks, and generates a generic code skeleton as shown in Figure 5.16. The programmer can then add application-specific code to the body of the Java class, define static variables to store state information across invocations, etc. The remainder of the software synthesis consists of customizing the constructor of the AtagManager (see Figure 4.8) and generating configuration files that provide the basic startup information to each DART process when it is launched as part of the simulation.
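The format of these generated configuration files is not reproduced in this chapter. Purely as an illustration of the kind of startup information they might carry (node ID, position, the socket ID used on localhost, and the attached sensor types), the following sketch reads such a file; the property names are assumptions, not the actual file format used by DART.

// Illustrative sketch only: reading hypothetical per-node startup information.
// The keys nodeId, x, y, port, and sensors are assumed names for illustration.
import java.io.FileInputStream;
import java.io.IOException;
import java.util.Properties;

public class NodeStartupConfig {
    public final int nodeId;
    public final double x, y;       // position in the (virtual) coordinate system
    public final int localPort;     // socket ID opened on localhost for this simulated node
    public final String[] sensors;  // e.g., "acoustic", "temperature"

    public NodeStartupConfig(String path) throws IOException {
        Properties p = new Properties();
        try (FileInputStream in = new FileInputStream(path)) {
            p.load(in);
        }
        nodeId    = Integer.parseInt(p.getProperty("nodeId"));
        x         = Double.parseDouble(p.getProperty("x"));
        y         = Double.parseDouble(p.getProperty("y"));
        localPort = Integer.parseInt(p.getProperty("port"));
        sensors   = p.getProperty("sensors", "").split(",");
    }
}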
[Figure 5.14: Invoking the GME model interpreters]
[Figure 5.15: GME model interpreter: Network visualization]

package atag.application;

import atag.runtime.*;
import atag.sensor.*;
import atag.runtime.config.*;
import visualizer.*;

public class SampleAndThreshold implements Runnable {
    private DataPool m_dataPool;
    private DataItem m_dataitem;
    private Config m_myState;
    private Sensor m_aSensor;
    private NetworkArchitecture m_networkArchitecture;
    private GUI m_GUI;
    private GUIMessage m_guiMessage;

    public SampleAndThreshold(DataPool dp, Config myconfig,
                              NetworkArchitecture t_networkArchitecture, GUI t_GUI) {
        m_dataPool = dp;
        m_myState = myconfig;
        m_networkArchitecture = t_networkArchitecture;
        m_GUI = t_GUI;
    }

    public void run() {
        try {
            for (;;) {
                /* Write output data items */
                Thread.sleep(1000);
            } // end for
        } catch (InterruptedException e) {
            return;
        }
    }
}

Figure 5.16: Automatically generated skeleton code for an abstract task.

Figure 5.17 is a screenshot of the simulation control and visualization interface. The application being simulated is object tracking and gradient monitoring on a 20-node network. Each of the twenty nodes in this network is simulated by an independently launched Java process. There is no central coordination or synchronization. Each process reads its own situatedness information (such as its ID) from its configuration file. All processes are assigned socket IDs on the localhost. The simulation version of DART has a modified sensor interface. When a user task makes a call to the sensor interface to read the current value of the sensor, the corresponding DART module reads the sensor value from a file. The visualization interface allows direct manipulation of the values of the virtual sensor readings through the slider bar shown in the figure. Two types of sensor interfaces - acoustic and temperature - are currently supported, and the value of each of them can be independently varied for each node. The screenshot of Figure 5.17 shows the values of the acoustic sensors at each node as bracketed integers below the circle representing the node. An object tracking mode is also supported for the acoustic sensor. When the object tracking mode is activated, the movement of the cursor simulates the movement of the object. Readings of acoustic sensors on sensor nodes within a certain range of the object (cursor) position are automatically adjusted in inverse proportion to the distance of the target from the node. Any such manipulation of the acoustic or temperature sensor reading through the graphical interface is reflected in the file that will be read by the sensor module when it is next sampled by one of the tasks on the simulated node.

The application-level tasks and other DART components communicate with the visualization interface so that phenomena of interest at the application level or system level can be observed. For instance, the circle around node 14 in the screenshot indicates that the node (which is nearest to the cursor/object) has elected itself the leader and 'acquired' the object in accordance with the ATaG program for object tracking. Similarly, nodes 0 and 15 have detected a temperature gradient anomaly and reported it to the root node. The readings shown below the sensor nodes in this screenshot are zero because the acoustic readings are being displayed and not the temperature readings.

[Figure 5.17: Simulation and visualization: Object tracking and gradient monitoring]
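The file-backed sensor interface described above might look roughly like the following sketch. The class name, file layout, and fallback behavior are assumptions for illustration; the actual DART sensor module is not reproduced here.

// Illustrative sketch only: a simulation-only sensor that serves the most recent
// value written by the visualization GUI to a per-node, per-sensor-type file.
// The file naming scheme is an assumption.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class FileBackedSensor {
    private final Path readingFile;

    public FileBackedSensor(int nodeId, String sensorType) {
        // One file per node and sensor type, e.g. sim/node3.acoustic.reading (assumed layout).
        this.readingFile = Path.of("sim", "node" + nodeId + "." + sensorType + ".reading");
    }

    // Returns the latest value written by the visualization interface;
    // falls back to 0 if the file does not exist or cannot be parsed yet.
    public double read() {
        try {
            return Double.parseDouble(Files.readString(readingFile).trim());
        } catch (IOException | NumberFormatException e) {
            return 0.0;
        }
    }
}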
DART components can also send messages to the visualization interface, which are displayed in the message log pane. The purpose of designing this graphical interface is to be able to evaluate the functionality of the distributed software system that is generated from the GME-based ATaG programming interface. The simulation is realistic in the sense that there is no central coordination, and all the processes representing the various nodes execute independently on the simulation machine. At startup, each process polls the socket IDs that other nodes are expected to open for receiving messages, and the initialization of the NetworkStack on each DART process completes only after it verifies that all other node processes are running. The component-based design of DART insulates the other components of the system from the implementation details of the network stack, and their behavior is independent of the fact that the processes are communicating through sockets on the same machine and not through a real network interface.

Chapter 6
Concluding Remarks

The Abstract Task Graph is an attempt at defining a programming model and methodology that enables application developers to focus on the high level structure of collaborative computation without worrying about the details of the target sensor network deployment. It is based on the belief that ease of application development will ultimately determine the penetration of networked sensor systems into everyday life, and that it can be achieved not just by defining more and more protocols for different aspects of networked sensing but also by providing frameworks where a selection of existing protocols can be packaged and provided as services through an integrated application development environment.

In the following two sections, we comment on the role of ATaG as a framework for defining architecture-independent programming languages for specific application domains, and also as an extensible framework for integrating a variety of compilation and software synthesis tools for multiple platforms and driving their execution from a single application development environment.

6.1 Domain-specific application development

ATaG is based on two concepts. The first is data driven computing, which provides a natural mental model for specifying reactive behaviors and has other significant benefits from a software development perspective, such as composability and reusability. The second concept, which is the key to architecture independence at the network level, is the use of declarative task and channel annotations to specify the placement of functionalities and the patterns of interaction between functionalities.

The task and channel annotations currently defined for ATaG and summarized in Tables 3.1 and 3.2 are merely meant to illustrate the power of declarative programming with ATaG. The choice of annotations was influenced by our desire to express patterns of interaction that form the building blocks of in-network computation in oft-cited behaviors such as object tracking and environment monitoring. The annotations are not intended to be an exhaustive list, and we expect that they will be modified to suit the particular application domain and the services available in the target deployment. For instance, the current set of task annotations allows placement based on node IDs or locations. This can be generalized to placement based on context labels.
The idea of context labels is employed in EnviroTrack [1] as a mechanism to address sensor nodes and also to host context-specific computation. The idea behind context labels is to allow the user to specify dynamic behaviors based on the current state of a node. The fraction of total energy reserves currently remaining in the node can be considered as a context. This context can be used as a task annotation to specify alternate implementations of the same task and tag each implementation with the context of its invocation, adapting the computation to the amount of available energy and providing graceful degradation of functionality where possible. Other interpretations of the context of a node can be used to trigger specific behaviors only if other behaviors are activated on neighboring nodes. For instance, the programmer could want task A to start executing on a node only when at least 50% of its 1-hop neighborhood is executing task B. This requires a context label for each node that indicates whether task B is executing on that node, and a context label that indicates whether 50% of the node's 1-hop neighborhood has the context label indicating task B. The point of these examples is to show that ATaG can be customized to a particular domain by defining task and channel annotations relevant to that domain. The requirement for defining a new domain-specific annotation is the existence of a mechanism to translate the annotation into a set of parameters used to customize DART, and the availability of all the relevant information in the network model provided to the compiler.

6.2 Compilation and software synthesis

Just as the extensible set of ATaG annotations forms a framework for domain-specific customization of the declarative part of ATaG, the component-based design of DART can be considered a framework for integrating a variety of protocols proposed for sensor network applications. The purpose of this integration is to ultimately provide an end-to-end application development methodology that allows an application developer to use these protocols (explicitly or implicitly) for a real world application without necessarily knowing the details of their implementation, or even of their existence.

A critical part of this end-to-end methodology that is only superficially addressed in this work is the ATaG compiler. At a high level, compilation of a networked sensing application can be defined as the translation of a service-oriented specification or a macroprogramming language into an 'equivalent' distributed software system to be deployed on a target network. The exact algorithms used for compilation, the structure of the compilation process, and the scope for compile-time and run-time optimization, however, depend entirely on the particular programming model and runtime system. The contribution of ATaG and DART, and to some extent of the GME-based visual programming and software synthesis environment, is to create a framework for compilation and software synthesis in the following sense. Each annotation (or group of annotations) has a well-defined association with a particular module or configuration parameter in the DART design. For instance, the result of compiling the task annotation nodes-per-instance:k for some abstract task T is that approximately 1/k of the AtagManager modules in the system will have the assignment flag for task T set to true. Channel annotations are also suitably encoded into each node as DART configuration parameters.
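The following is a minimal sketch, under stated assumptions, of how such a compiler pass might turn the nodes-per-instance:k annotation into per-node assignment flags. The class and method names are hypothetical, and the round-robin selection is only one possible policy; the actual ATaG compiler encodes this information into the AtagManager configuration.

// Illustrative sketch only: assign a task to roughly one node in every k,
// producing per-node boolean assignment flags. Names are assumptions.
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class NodesPerInstanceSketch {
    /** Maps each node ID to whether the given abstract task should be instantiated there. */
    public static Map<Integer, Boolean> compile(List<Integer> nodeIds, int k) {
        Map<Integer, Boolean> assignmentFlags = new LinkedHashMap<>();
        int i = 0;
        for (int nodeId : nodeIds) {
            assignmentFlags.put(nodeId, i % k == 0);  // roughly 1/k of the nodes selected
            i++;
        }
        return assignmentFlags;
    }

    public static void main(String[] args) {
        List<Integer> nodes = new ArrayList<>();
        for (int id = 0; id < 9; id++) nodes.add(id);
        System.out.println(compile(nodes, 3));        // three of nine nodes get the task
    }
}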
Every such translation of a task and channel annotation into configuration parameters for DART on some or all nodes in the network can be considered an independent compilation problem. For instance, the issue of optimal sensing coverage has been the focus of much research in distributed sensing. A version of the coverage problem of special interest in the context of ATaG is the static or dynamic selection of a set of sensors of a particular type, from among all sensors of that type in the network, such that the degree of coverage desired by the application developer is guaranteed with high probability. In the ATaG model, the selection of sensors could effectively translate into the selection of a set of nodes on which the sensing tasks (which are abstract tasks in the graphs) will be instantiated. The job of the compiler in this case is to interpret the high level intent of the programmer as specified through suitably defined task annotations and assign the sensing tasks to a particular set of nodes. The algorithm used to select this set of nodes will reflect the quality of the compilation by affecting the communication and computation cost incurred in the deployment.

The choice of the Generic Modeling Environment (GME) for providing the visual programming interface as well as integrating the different tools for software synthesis, simulation, etc., is particularly felicitous from the perspective of the compilation problem. GME allows plug-and-play integration of software components called model interpreters. Each model interpreter, when invoked, can access all information about the current model, which, in our domain, includes the library of behaviors, deployments, and the application to be synthesized. A model interpreter for synthesizing the code skeletons for abstract tasks and data items inspects the I/O relationships between tasks and data to generate the suitable get() or put() calls, the names of the tasks and data items to generate the names of the Java classes, and the firing rules for the abstract tasks to generate a suitably timed loop for periodic execution if specified by the firing rule. Other model interpreters read the model information relevant to their own specific function. The compiler is just another model interpreter (or set of model interpreters) that reads the relevant annotations from the model database and performs the appropriate transformations, either on the model itself or on external objects such as the DART code for a particular node. This flexibility also makes it possible for the same programming environment to seamlessly support a set of compilers and software synthesizers, each for a different target platform.

In summary, the contribution of ATaG is the definition of an extensible language, runtime system, and compilation framework that can be tailored to different application domains, network architectures, performance metrics, and sensor node platforms depending on the requirements of the end user. The work described in this document is a specific instance of this general framework.
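As a concrete illustration of one such per-annotation compilation sub-problem, choosing the nodes that will host a sensing task subject to a coverage requirement, the following is a minimal greedy sketch. The grid model of the field and the greedy selection criterion are assumptions for illustration, not the algorithm used by the ATaG compiler.

// Illustrative sketch only: greedily select sensing-task hosts so that every point
// of a gridded 2-D field lies within sensing range of at least one selected node.
import java.util.ArrayList;
import java.util.List;

public class CoveragePlacementSketch {

    public record Node(int id, double x, double y) {}

    /** Greedily selects nodes until every grid point is covered or no progress can be made. */
    public static List<Node> selectSensingNodes(List<Node> nodes, double range,
                                                double fieldX, double fieldY, double step) {
        List<double[]> uncovered = new ArrayList<>();
        for (double x = 0; x <= fieldX; x += step)
            for (double y = 0; y <= fieldY; y += step)
                uncovered.add(new double[]{x, y});

        List<Node> selected = new ArrayList<>();
        while (!uncovered.isEmpty()) {
            Node best = null;
            int bestGain = 0;
            for (Node n : nodes) {
                int gain = (int) uncovered.stream()
                        .filter(p -> Math.hypot(p[0] - n.x(), p[1] - n.y()) <= range).count();
                if (gain > bestGain) { bestGain = gain; best = n; }
            }
            if (best == null) break;              // remaining points cannot be covered
            final Node chosen = best;
            uncovered.removeIf(p -> Math.hypot(p[0] - chosen.x(), p[1] - chosen.y()) <= range);
            selected.add(chosen);
        }
        return selected;                          // nodes on which the sensing task is instantiated
    }
}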
Bibliography

[1] T. Abdelzaher, B. Blum, Q. Cao, D. Evans, J. George, S. George, T. He, L. Luo, S. Son, R. Stoleru, J. Stankovic, and A. Wood. EnviroTrack: An environmental programming model for tracking applications in distributed sensor networks. In Proceedings of the International Conference on Distributed Computing Systems (ICDCS), 2004.

[2] T. F. Abdelzaher and K. G. Shin. Optimal combined task and message scheduling in distributed real-time systems. In 16th IEEE Real-time Systems Symposium, pages 162–171, December 1995.

[3] I. F. Akyildiz, W. Su, Y. Sankarasubramaniam, and E. Cayirci. Wireless sensor networks: A survey. Computer Networks, 38:393–422, 2002.

[4] S. Ali, A. A. Maciejewski, H. J. Siegel, and J-K. Kim. Robust resource allocation for sensor-actuator distributed computing systems. In International Conference on Parallel Processing (ICPP), 2004.

[5] Arvind and R. A. Iannucci. Two fundamental issues in multiprocessing: The data flow solution. Computation Structures Group Memo 226-2, Laboratory for Computer Science, Massachusetts Institute of Technology, July 1983.

[6] A. Bakshi, V. K. Prasanna, J. Reich, and D. Larner. The abstract task graph: A methodology for architecture-independent programming of networked sensor systems. In Workshop on End-to-end Sense-and-respond Systems (EESR), June 2005.

[7] A. Bakshi, M. Singh, and V. K. Prasanna. Constructing topographic maps in networked sensor systems. In Algorithms for Wireless and Mobile Networks (A-SWAN), August 2004.

[8] H. E. Bal, J. G. Steiner, and A. S. Tanenbaum. Programming languages for distributed computing systems. ACM Computing Surveys, 21(3):261–322, September 1989.

[9] G. Banavar, J. Beck, E. Gluzberg, J. Munson, J. Sussman, and D. Zukowski. Challenges: An application model for pervasive computing. In 6th Annual ACM/IEEE Intl. Conf. on Mobile Computing and Networking, 2000.

[10] K. Bondalapati and V. K. Prasanna. Dynamic precision management for loop computations on reconfigurable architectures. In IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM).

[11] N. Busi, A. Rowstron, and G. Zavattaro. State- and event-based reactive programming in shared dataspace. In Coordination '02, April 2002.

[12] I. Chatzigiannakis, G. Mylonas, and S. Nikoletseas. jWebDust: A Java-based generic application environment for wireless sensor networks. In International Conference on Distributed Computing in Sensor Systems (DCOSS), June 2005.

[13] E. Cheong and J. Liu. galsC: A language for event-driven embedded systems. In Proceedings of Design, Automation and Test in Europe (DATE), 2005.

[14] C. Curino, M. Giani, M. Giorgetta, A. Giusti, G. P. Picco, and A. L. Murphy. TinyLime: Bridging mobile and sensor networks through middleware. In 3rd IEEE Intl. Conf. on Pervasive Computing and Communications, 2005.

[15] T. Eicken, D. Culler, S. Goldstein, and K. Schauser. Active messages: A mechanism for integrated communication and computation. In 19th Intl. Symposium on Computer Architecture, 1992.

[16] J. Elson and D. Estrin. Time synchronization in wireless sensor networks. In International Parallel and Distributed Processing Symposium (IPDPS), Workshop on Parallel and Distributed Computing Issues in Wireless and Mobile Computing, April 2001.

[17] J. Elson, L. Girod, and D. Estrin. Fine-grained network time synchronization using reference broadcasts. In Proc. of the Fifth Symposium on Operating Systems Design and Implementation (OSDI), December 2002.

[18] D. Estrin, D. Culler, K. Pister, and G. Sukhatme. Connecting the physical world with pervasive networks. IEEE Pervasive Computing, pages 59–69, 2002.

[19] D. Ganesan, A. Cerpa, Y. Yu, W. Ye, J. Zhao, and D. Estrin. Networking issues in wireless sensor networks. Journal of Parallel and Distributed Computing (JPDC), 64(7):799–814, July 2004.

[20] D. Gay, P. Levis, R. von Behren, M. Welsh, E. Brewer, and D. Culler. The nesC language: A holistic approach to networked embedded systems. In Proceedings of Programming Language Design and Implementation (PLDI), 2003.

[21] D. Gelernter. Generative communication in Linda. ACM Transactions on Programming Languages and Systems, 7(1):80–112, 1985.

[22] The Generic Modeling Environment, http://www.isis.vanderbilt.edu/projects/gme.

[23] GME Users' Manual, http://www.isis.vanderbilt.edu/Projects/gme/documentation.html.

[24] R. Govindan. Data-centric routing and storage in sensor networks. Book chapter in Wireless Sensor Networks, T. Znati, K. Sivalingam, and C. S. Raghavendra, Eds., 2004.

[25] R. Gummadi, O. Gnawali, and R. Govindan. Macro-programming wireless sensor networks using Kairos. In Intl. Conf. on Distributed Computing in Sensor Systems (DCOSS), June 2005.

[26] D. Harel and A. Pnueli. On the development of reactive systems. Pages 477–498, 1985.

[27] S. Haridi, P. Van Roy, P. Brand, and C. Schulte. Programming languages for distributed applications. New Generation Computing, 16(3):223–261, 1998.

[28] W. B. Heinzelman, A. L. Murphy, H. S. Carvalho, and M. A. Perillo. Middleware to support sensor network applications. IEEE Network, January 2004.

[29] J. Hill, R. Szewczyk, A. Woo, S. Hollar, D. Culler, and K. Pister. System architecture directions for networked sensors. In 9th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2000.

[30] O. Holder, I. Ben-Shaul, and H. Gazit. Dynamic layout of distributed applications in FarGo. In 21st Intl. Conf. on Software Engineering, 1999.

[31] B. Hong and V. K. Prasanna. Constrained flow optimization with applications to data gathering in sensor networks. In First International Workshop on Algorithmic Aspects of Wireless Sensor Networks (ALGOSENSORS), July 2004.

[32] B. Hong and V. K. Prasanna. Optimizing system life time for data gathering in networked sensor systems. In Algorithms for Wireless and Ad-hoc Networks (A-SWAN), held in conjunction with MobiQuitous 2004, August 2004.

[33] S. S. Iyengar and R. R. Brooks, editors. Distributed Sensor Networks. Chapman & Hall/CRC, December 2004.

[34] S. S. Iyengar and L. Prasad. A general computational framework for distributed sensing and fault-tolerant sensor integration. IEEE Transactions on Systems, Man and Cybernetics, 25(4):643–650, April 1995.

[35] B. Karp and H. T. Kung. GPSR: Greedy perimeter stateless routing for wireless networks. In Proc. ACM/IEEE MobiCom, August 2000.

[36] D. L. Larner. A distributed, operating system based, blackboard architecture for real-time control. In International Conference on Industrial and Engineering Applications of Artificial Intelligence and Expert Systems, 1990.

[37] J. Liu, M. Chu, J. Liu, J. Reich, and F. Zhao. State-centric programming for sensor-actuator network systems. IEEE Pervasive Computing, 2003.

[38] J. Liu and F. Zhao. Towards service-oriented networked embedded computing. Technical Report MSR-TR-2005-28, Microsoft Research, February 2005.

[39] T. Liu and M. Martonosi. Impala: A middleware system for managing autonomic, parallel sensor systems. In ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2003.

[40] S. Madden, R. Szewczyk, M. Franklin, and D. Culler. Supporting aggregate queries over ad-hoc wireless sensor networks. In Workshop on Mobile Computing and Systems Applications, 2002.

[41] uC/OS-II RTOS, http://www.ucos-ii.com/.

[42] B. Nath and D. Niculescu. Routing on a curve. In HOTNETS-I, October 2002.

[43] N. Bulusu, J. Heidemann, and D. Estrin. GPS-less low cost outdoor localization for very small devices. IEEE Personal Communications Magazine, pages 28–34, March 2000.

[44] R. Newton and M. Welsh. Region streams: Functional macroprogramming for sensor networks. In 1st Intl. Workshop on Data Management for Sensor Networks (DMSN), 2004.

[45] P. Nii. The blackboard model of problem solving. AI Magazine, 7(2), 1986.

[46] G. J. Pottie and W. J. Kaiser. Wireless integrated network sensors. Communications of the ACM, 43(5):51–58, 2000.

[47] K. Ramamritham. Allocation and scheduling of complex periodic tasks. In International Conference on Distributed Computing Systems, pages 108–115, 1990.

[48] R. S. Ramanujan, J. C. Bonney, K. J. Thurber, R. Jha, and H. J. Siegel. A framework for automated software partitioning and mapping for distributed multiprocessors. In 2nd International Symposium on Parallel Architectures, Algorithms, and Networks, pages 138–145, June 1996.

[49] A. Rao, C. Papadimitriou, S. Shenker, and I. Stoica. Geographic routing without location information.

[50] Real Time Specification for Java, http://www.rtj.org/.

[51] Stargate: A PlatformX project, http://platformx.sourceforge.net/.

[52] C. Szyperski. Component Software: Beyond Object-Oriented Programming. Addison-Wesley, 1997.

[53] V. D. Tran, L. Hluchy, and G. T. Nguyen. Data driven graph: A parallel program model for scheduling. In Proc. 12th Intl. Workshop on Languages and Compilers for Parallel Computing, pages 494–497, 1999.

[54] M. Turon and J. Suh. MOTE-VIEW: A sensor network monitoring and management tool. In 2nd IEEE Workshop on Embedded Network Sensors (EmNets), May 2005.

[55] M. Welsh and G. Mainland. Programming sensor networks using abstract regions. In First USENIX/ACM Symposium on Networked Systems Design and Implementation (NSDI), March 2004.

[56] K. Whitehouse, C. Sharp, E. Brewer, and D. Culler. Hood: A neighborhood abstraction for sensor networks. In 2nd Intl. Conf. on Mobile Systems, Applications, and Services, 2004.

[57] K. Whitehouse, F. Zhao, and J. Liu. Semantic streams: A framework for declarative queries and automatic data interpretation. Technical Report MSR-TR-2005-45, Microsoft Research, April 2005.

[58] D. Wu, B. M. Al-Hashimi, and P. Eles. Scheduling and mapping of conditional task graphs for the synthesis of low power embedded systems. In Proceedings of Design, Automation and Test in Europe (DATE), 2003.

[59] T. Yang and C. Fu. Heuristic algorithms for scheduling iterative task computations on distributed memory machines. IEEE Transactions on Parallel and Distributed Systems, 8(6), June 1997.

[60] W. Ye, J. Heidemann, and D. Estrin. An energy-efficient MAC protocol for wireless sensor networks. Technical Report ISI-TR-543, USC/ISI, 2001.

[61] Y. Yu, B. Krishnamachari, and V. K. Prasanna. Energy-latency tradeoffs for data gathering in wireless sensor networks. In Proceedings of INFOCOM, 2004.

[62] Y. Yu, B. Krishnamachari, and V. K. Prasanna. Issues in designing middleware for wireless sensor networks. IEEE Network, 18(1), 2004.

[63] F. Zambonelli and M. Mamei. Spatial computing: A recipe for self-organization in distributed computing scenarios. In Intl. Workshop on Self-* Properties in Complex Information Systems, 2004.

[64] J. Zhao, R. Govindan, and D. Estrin. Computing aggregates for monitoring wireless sensor networks. In International Conference on Communications (ICC), Workshop on Sensor Network Protocols and Applications, May 2003.