LANGUAGE ABSTRACTIONS AND PROGRAM ANALYSIS TECHNIQUES TO BUILD RELIABLE, EFFICIENT, AND ROBUST NETWORKED SYSTEMS

by

Nupur Kothari

A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(COMPUTER SCIENCE)

December 2010

Copyright 2010 Nupur Kothari

Acknowledgements

I would like to thank my advisors, Prof. Ramesh Govindan and Prof. Todd Millstein, for giving me the opportunity to work on exciting and diverse problems. They gave me the freedom to explore and think independently, while ensuring that I had their guidance when I needed it. Without their unflagging support and encouragement, and their invaluable input, this dissertation would not have been possible.

I would also like to thank the wonderful folks I was fortunate enough to collaborate with. Pleiades was joint work with Ramakrishna Gummadi. MAX was joint work with Ratul Mahajan and Madanlal Musuvathi. Apart from my dissertation research, I collaborated on various other projects with Prof. Vijay Raghunathan, Florin Sultan, Kiran Nagaraja, Aman Kansal, Jie Liu, and Feng Zhao.

My PhD experience was enriched by an excellent set of lab-mates at ENL, who were wells of information on a multitude of topics, research-related and otherwise. Life at USC would have been extremely dull without them and my co-dwellers of Smurfville, who kept me motivated and excited and were an important part of my life in Los Angeles.

I would like to thank my family for their patience and continued support over the years. This dissertation could not have been written without the understanding, support, and encouragement of my husband Rajat.

Table of Contents

Acknowledgements
List of Tables
List of Figures
Abstract
Chapter 1: Introduction
  1.1 Dissertation Overview
  1.2 Dissertation Organization
Chapter 2: Pleiades
  2.1 The Pleiades Language
    2.1.1 Design Rationale
    2.1.2 Parking Cars with Pleiades
      2.1.2.1 A Centralized nesC Implementation
      2.1.2.2 A Distributed nesC Implementation
    2.1.3 Other Features of Pleiades
  2.2 Implementation
    2.2.1 Program Partitioning and Migration
      2.2.1.1 Partitioning
      2.2.1.2 Control Flow Migration
    2.2.2 Serializable Execution of cfors
      2.2.2.1 Distributed Locking
      2.2.2.2 Distributed Deadlock Detection and Recovery
      2.2.2.3 Failure Detection and Recovery
  2.3 Evaluation
  2.4 Summary
Chapter 3: FSMGen
  3.1 Overview
    3.1.1 FSMs as abstractions of TinyOS components
    3.1.2 Deriving FSMs from TinyOS programs
  3.2 Symbolic Execution for nesC/TinyOS
    3.2.1 Basic Symbolic Execution
    3.2.2 Handling features of nesC and TinyOS
    3.2.3 Constraint Solver
  3.3 Deriving State Machines with FSMGen
    3.3.1 Predicate Abstraction
    3.3.2 Generating the FSM
    3.3.3 Minimizing the FSM
  3.4 Results
  3.5 Summary
Chapter 4: MAX
  4.1 Manipulation Attacks: Definition and Challenges
  4.2 Exploring Manipulation Attacks
  4.3 MAX Implementation
  4.4 Evaluation
    4.4.1 Exploring manipulation attacks in TCP
    4.4.2 Exploring manipulation attacks in 802.11 MAC
    4.4.3 Exploring ECN manipulation attacks
    4.4.4 Computational Footprint
  4.5 Summary
Chapter 5: Literature Review
  5.1 Programming Languages in Parallel and Distributed Computing
    5.1.1 Embedded and Sensor Networks Languages
    5.1.2 Concurrent and Distributed Systems
    5.1.3 Parallel Processing Languages
  5.2 Finite State Machines
  5.3 Tools to Detect Errors and Network Attacks
Chapter 6: Conclusions
Bibliography

List of Tables

1.1 Dissertation Overview
4.1 Computational Footprint of MAX

List of Figures

2.1 A street-parking application in Pleiades
2.2 Car Parking in nesC: Centralized Implementation – Reliable but Inefficient
2.3 Car Parking in nesC – Distributed Implementation: Efficient but Unreliable
2.4 Algorithm for determining nodecuts
2.5 Nodecuts generated for the street-parking example
2.6 cfor execution and locking algorithm
2.7 Deadlock detection and recovery algorithm
2.8 PEG performance comparison
2.9 Street parking latency
2.10 Street parking message cost
3.1 FSM embedded within Surge code
3.2 FSM derived manually for Surge
3.3 Structure of FSMGen
3.4 FSM for the RfmToLeds Application
3.5 A state transition as generated by FSMGen for the RfmToLeds FSM
3.6 FSM for the Surge Application
3.7 FSM for MultiHopEngine
3.8 FSM for FTSP
3.9 FSM for TestNetwork
4.1 Code for ECN
4.2 Overview of MAX
4.3 Control Flow Graph for ECN Example
4.4 Experimental Setup for Attack Emulation
4.5 Results of MAX for Daytona
4.6 Results of MAX for 802.11 (NAV Attack)
4.7 Results of MAX for 802.11 (RTS Attack)

Abstract

Networked systems have an important role in our lives. Ranging from the Internet to new and upcoming domains like wireless sensor networks, smart-phones, and data-centers, they are transforming the way we use computing. For networked systems to be of practical use, they need to be reliable, efficient, and robust. Building such systems poses a number of programming challenges. In this dissertation, we show that it is possible to adapt analysis and design techniques from the programming languages community and combine them with domain knowledge and simple user insights to address the programming challenges for specific network domains.

We present tools and techniques to simplify the task of building reliable, efficient, and robust networked systems for the domains of wireless sensor networks and network protocols. We introduce Pleiades, a centralized programming framework for wireless sensor networks. We present FSMGen, a tool for wireless sensor networks that automatically derives user-understandable and compact finite state machines from TinyOS programs. We also describe MAX, a tool that explores network protocol implementations written in C for vulnerability to manipulation attacks. These tools and systems address various programming challenges faced by developers of wireless sensor networks and network protocols.
We demonstrate their utility and benefits via detailed evaluation and experiments under realistic conditions.

Chapter 1: Introduction

Networked systems are an integral part of computing and play an important role in our lives. Apart from the Internet, other classes of networks are emerging that have the potential to transform the way we use computing. Some examples of these emerging classes of networks are sensor networks, smart-phones, and data-centers. There is currently a great push towards developing new systems for these domains to explore the possibilities they offer and maximize their impact.

Three properties that are desirable in a networked system designed for practical use are reliability, efficiency, and robustness. A system should reliably function as expected, and bugs should be minimized to the extent possible. It should be efficient in its use of various resources. Lastly, it should be robust to failures as well as attacks by malicious parties.

Building reliable, efficient, and robust networked systems is challenging. The following are some common challenges faced by networked system programmers.

Functionality distributed over multiple nodes: Programmers have to deal with the inherently distributed nature of a network, which makes it hard to reason about reliability and robustness.

Many application/domain-specific requirements: Programmers have to respect various domain- and application-specific constraints while achieving efficiency, reliability, and robustness.

Gap between programmer intent and functionality: Networked system implementations can be quite complicated, leading to differences between the functionality that the programmer intended and what the implementations exhibit in reality.

Potential vulnerabilities in the implementations: Due to the distributed nature of networked systems and their complicated code, there may be a number of corner-case scenarios that programmers may not be aware of.
These unhandled scenarios may cause bugs, or leave the networked system implementation open to network attacks.

These programming challenges impede innovation and growth in upcoming and existing classes of networks, and limit their impact on society. We believe that analysis and design techniques from the programming languages community can be adapted and combined with some domain knowledge to help solve some of the problems encountered by programmers of networked systems.

The use of domain knowledge is two-fold. First, domain knowledge is used to adapt general techniques to programs written for a specific domain, and to address the specific programming challenges of that domain. For example, knowledge about the severely resource-constrained nature of wireless sensor networks and the common requirements of most of their applications (concurrency, synchronization, etc.) is used to design an appropriate centralized programming framework for sensor networks [44]. Second, while general techniques may be intractable, domain knowledge can yield insights that lead to practical systems. For example, while deriving finite state machines from general programs can lead to state explosion, it is possible to build a tool specific to the sensor network domain that generates compact, user-readable finite state machines [45]. We do this by using domain knowledge to build a coarse approximation of the TinyOS execution model that is still accurate enough for our purpose.

When designing program analysis techniques, a further obstacle is that certain functionality of a networked system can be deeply entwined with the behavior of the hardware and other external factors, and hence hard to derive from the program alone. For example, it is hard to determine which statements in a network protocol implementation impact the delay of the protocol, since delay also depends on the queueing mechanism used in the routers.
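To make the role of such user-supplied knowledge concrete, consider a toy sketch: if a user flags the packet-header fields known to influence delay, a trivial pass can mark the statements that write those fields. The field names, statement representation, and helper below are invented purely for illustration; this is not MAX's actual interface, which is described in Chapter 4.

```python
# Hypothetical sketch: the user flags header fields that affect delay;
# a trivial pass then marks statements (modeled as label -> fields
# written) that touch any flagged field. All names are invented.
flagged_fields = {"ecn_echo", "recv_window"}   # supplied by the user

statements = {
    "s1": {"seq_no"},                # writes no flagged field
    "s2": {"ecn_echo"},              # writes a flagged field
    "s3": {"recv_window", "cksum"},  # writes a flagged field
}

def delay_affecting(stmts, flagged):
    """Labels of statements that write at least one flagged field."""
    return sorted(label for label, writes in stmts.items()
                  if writes & flagged)

print(delay_affecting(statements, flagged_fields))  # ['s2', 's3']
```

The real analysis is far more involved; the point is only that a short user-supplied list can stand in for facts, such as router queueing behavior, that no static analysis of the protocol code alone could infer.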
Without information about such details, any program analysis technique would produce inaccurate results. We believe that in most cases it is possible to work around this obstacle by making judicious use of the high-level knowledge of the user, which may be encapsulated as simple inputs to the analysis. Thus, in the previous example, the user may only need to specify which fields in the packet header have an impact on delay, and this can then be used to determine the statements that cause an increase in delay for the protocol.

The central thesis of this dissertation is that it is possible to design programming abstractions and program analysis techniques, by leveraging domain knowledge and simple user insights, that simplify the task of building reliable, efficient, and robust networked systems.

1.1 Dissertation Overview

Table 1.1 presents an overview of this dissertation. We attempt to address the above-mentioned programming challenges in the context of the domains of sensor networks and network protocols. We first present language-based solutions for the domain of wireless sensor networks.

Wireless sensor networks consist of a system of distributed sensors embedded in the physical world. They are increasingly used in scientific and commercial applications [81, 47].

Table 1.1: Dissertation Overview

However, constructing reliable, efficient, and robust wirelessly-networked systems out of them is still a significant challenge. This is because the programmer must cope with severe resource, bandwidth, and power constraints on the sensor nodes, as well as the challenges of distributed systems, such as the need to maintain consistency and synchronization among numerous asynchronous, loosely coupled nodes.

The state of the art in today's sensor-net programming is centered around a component-based language called nesC. nesC is a node-level language—a program is written for an individual node in the network—and nesC programs use the services of an operating system called TinyOS.
We begin by introducing Pleiades, an approach to programming sensor networks that significantly raises the level of abstraction over current practice. The critical change is one of perspective: rather than writing programs from the point of view of an individual node, programmers implement a central program that conceptually has access to the entire network. This approach pushes to the compiler the task of producing node-level programs that implement the desired behavior.

We present the Pleiades programming language, its compiler, and its runtime. The Pleiades language extends the C language with constructs that allow programmers to name and access node-local state within the network and to specify simple forms of concurrent execution. The compiler and runtime system cooperate to implement Pleiades programs efficiently and reliably, and to provide robustness to node failures. First, the compiler employs a novel program analysis to translate Pleiades programs into message-efficient units of work implemented in nesC. The Pleiades runtime system orchestrates execution of these units, using TinyOS services, across a network of sensor nodes. Second, the compiler and runtime system employ novel algorithms for locking, deadlock detection, and deadlock recovery that guarantee serializability in the face of concurrent execution. They also provide failure detection and recovery. We illustrate the readability, reliability, and efficiency benefits of the Pleiades language through detailed experiments, and demonstrate that the Pleiades implementation of a realistic application performs similarly to a hand-coded nesC version that contains more than ten times as much code.

Next, we present an approach to automatically recover the high-level system logic from sensor network programs. The most common programming languages and platforms for sensor networks foster a low-level programming style.
This design provides fine-grained control over the underlying sensor devices, which is critical given their severe resource constraints. However, this design also makes programs difficult to understand, maintain, and debug.

We describe FSMGen, a tool to derive finite state machines from TinyOS programs. We adapt the technique of symbolic execution developed by the program analysis community to handle the event-driven nature of TinyOS, providing a generic component for approximating the behavior of a sensor network application or system component. We then employ a form of predicate abstraction on the resulting information to automatically produce a finite state machine representation of the component. We have used our tool FSMGen to automatically produce compact and fairly accurate state machines for several TinyOS applications and protocols. We illustrate how this
We also show how MAX helps protocol developers gain confidence that certain aspects of a protocol are robust to manipulation attacks.

These three systems thus address various programming challenges encountered while building reliable, efficient, and robust networked systems. Pleiades raises the level of abstraction for sensor network programs, requiring the programmer to write a single, centralized program and pushing the concerns of reliability, efficiency, and robustness to failures to the compiler and runtime. Hence it addresses the challenges of distributing functionality over multiple nodes and respecting application- or domain-specific constraints. FSMGen obtains compact representations of the functionality of sensor network programs, in an attempt to simplify the task of understanding and debugging sensor network programs and checking their reliability and robustness. It helps the programmer with the challenges of identifying the gap between programmer intent and actual functionality, as well as locating potential vulnerabilities in the code. Lastly, MAX helps developers determine whether a given network protocol implementation is robust to manipulation attacks. It thus helps with the challenge of finding potential vulnerabilities to manipulation attacks in implementations.

1.2 Dissertation Organization

This dissertation is organized as follows.

In Chapter 2 we describe the Pleiades programming framework. We describe the language abstractions in Pleiades and the rationale behind their design. Next we present details of the Pleiades compiler and runtime, and evaluate the benefits of Pleiades, comparing the performance of programs written in Pleiades to nesC/TinyOS programs.

In Chapter 3 we present an approach to deriving high-level functionality from sensor network programs. We describe the details of the tool FSMGen.
We evaluate the accuracy of finite state machines generated by FSMGen for common TinyOS applications, and demonstrate how it may be used to find errors in functionality.

In Chapter 4, we introduce the concept of manipulation attacks. We describe our approach to detecting them in protocol implementations and present the tool MAX. We evaluate how MAX may be used to find manipulation attacks in TCP and 802.11 implementations, as well as to test measures taken to prevent manipulation attacks.

Chapter 5 presents a review of related work, and Chapter 6 discusses conclusions and future directions.

Chapter 2: Pleiades

Current practice in sensor network programming uses a highly concurrent dialect of C called nesC [24], which is a node-level language—a nesC program is written for an individual node in the network. nesC statically detects potential race conditions and optimizes hardware resources using whole-program analysis. nesC programs use the services of the TinyOS operating system [34], which provides basic runtime support for statically linked programs. TinyOS exposes an event-driven execution and scheduling model and provides a library of reusable low-level components that encapsulate widely used functionality, such as timers and radios. TinyOS was designed for efficient execution on low-power, limited-memory sensor nodes called motes.

nesC and TinyOS provide abstractions and libraries that simplify node-level sensor-network application programming, but ensuring the efficiency, robustness, and reliability of sensor network applications is still tedious and error-prone (Section 2.1). For example, the programmer must manually decompose a high-level distributed algorithm into programs for each individual sensor node, must ensure that these programs communicate efficiently with one another, must implement any necessary data consistency and control-flow synchronization protocols among these node-level programs, and must explicitly manage resources at each node.
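To make the node-level burden concrete, the following sketch caricatures the split-phase, event-driven structure that this style imposes: even "sample a sensor, then transmit" becomes a chain of callbacks dispatched from an event queue, all wired up by hand. The handler names and queue are illustrative only, not real nesC/TinyOS interfaces.

```python
# Illustrative only: a caricature of split-phase, event-driven node code.
# Each operation completes via a later callback; the programmer, not a
# compiler, threads state through the handler chain.
from collections import deque

sent = []                 # stands in for the radio's transmit log
events = deque()          # the node's pending-event queue

def timer_fired(_):
    events.append(("read_done", 42))      # sensor read completes later

def read_done(value):
    events.append(("send_done", value))   # hand the sample to the radio

def send_done(value):
    sent.append(value)                    # transmission finished

handlers = {"timer_fired": timer_fired,
            "read_done": read_done,
            "send_done": send_done}

events.append(("timer_fired", None))
while events:                             # the node's event loop
    name, arg = events.popleft()
    handlers[name](arg)

print(sent)  # [42]
```

Multiply this structure by every sensor, timer, and message on every node, add inter-node consistency and resource management, and the appeal of letting a compiler generate such code becomes clear.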
We are pursuing an alternative approach to programming sensor networks that significantly raises the level of abstraction over current practice. The critical change is one of perspective: rather than writing programs from the point of view of an individual node in the network, programmers implement a central program that conceptually has access to the entire network. This change allows a programmer to focus attention on the higher-level algorithmics of an application, and the compiler automatically generates the node-level programs that properly and efficiently implement the application on the network. In the literature, this style of programming sensor networks is known as macroprogramming [82].

We have instantiated our macroprogramming approach in the context of a modest extension to C called Pleiades, which augments C with constructs for addressing the nodes in a network and accessing local state from individual nodes. These features allow programmers to naturally express the global intent of their sensor-network programs without worrying about the low-level details of inter-node communication and node-level resource management. By default, a Pleiades program is defined to have a sequential thread of control, which provides a simple semantics for programmers to understand and reason about. However, Pleiades includes a novel language construct for parallel iteration called cfor, which can be used, for example, to iterate concurrently over all the nodes in the network or all one-hop neighbors of a particular node. The Pleiades compiler translates Pleiades programs into node-level nesC programs that can be directly linked with standard TinyOS components and the Pleiades runtime system and executed over a network of sensor motes.

The key technical challenge for Pleiades is the need to automatically implement high-level centralized programs in an efficient and reliable manner on the nodes in the network.
The Pleiades compiler and runtime system cooperate to meet this challenge in a practical manner (Section 2.2). We make the following contributions:

• Automatic program partitioning and migration for minimizing energy consumption. Energy efficiency is of primary concern for sensor nodes because they are typically battery-powered. Wireless communication consumes significant battery energy, and so it is critical to minimize communication costs among nodes. Pleiades uses a novel combination of static and dynamic information to determine at which node to execute each statement of a Pleiades program. A compile-time analysis first partitions a program's statements into nodecuts, each representing a unit of work to be executed on a single node. The runtime system then uses knowledge of the actual nodes involved in a nodecut's computation to determine at which node it should be executed in order to minimize the communication overhead.

• An easy-to-use and reliable concurrency primitive. Concurrent execution is a natural component of sensor network applications, since each sensor node can execute code in parallel. However, with concurrency comes the potential for subtle errors in synchronization that can affect application reliability. To support concurrency while ensuring reliability, the Pleiades runtime system guarantees serializability for each cfor: the effect of a cfor loop always corresponds to some sequential execution of the loop. To achieve this semantics, the runtime system automatically synchronizes access to variables among cfor iterations via locks, relieving the programmer of this burden. Locking has the potential to cause deadlocks, so the compiler and runtime system also support a novel distributed deadlock detection and recovery algorithm for cfors.

• Annotations for failure recovery. Sensor nodes are fragile and prone to failures.
To detect and recover from such node failures, Pleiades provides a set of annotations that allow the user to specify when a node should be declared failed, as well as measures to recover from such failures.

• A mote-based implementation and its evaluation. We have implemented Pleiades on the widely used, but highly memory-constrained, mote platform. The motes we use have 10 kB of RAM for program variables and 48 kB of ROM for compiled code. Our implementation generates event-driven node-level nesC code that is conceptually similar to what a programmer would manually write today. We evaluate three applications belonging to three different classes (Section 2.3). We first compare the performance of a sophisticated pursuit-evasion game macroprogram with that of a hand-coded nesC version written by others [26]. We find that the Pleiades program is significantly more compact (the source code is less than 10% as large), well-structured, and easy to understand. At the same time, the Pleiades implementation has performance comparable to the native nesC implementation. We then evaluate a car-parking application that requires a strict notion of consistency and show that the Pleiades implementation of the concurrent execution is reliable. We finally demonstrate the utility of control flow migration within a simple network information gathering example.

Researchers have previously explored abstractions for programming sensor networks in the aggregate [82, 28, 67], as well as intermediate program representations to support compilation of such programs [65]. However, to our knowledge, a self-contained macroprogramming system for motes—one that generates the complete code necessary for stand-alone execution—has not previously been explored or reported on. Pleiades is also related to research on parallel and distributed systems. Unlike traditional parallel systems and research on automatic parallelization, we are primarily interested in achieving high task-level parallelism rather than data parallelism, given the loosely coupled and asynchronous nature of sensor networks. Further, we target concurrency support toward minimizing energy consumption rather than latency, since sensor networks are primarily power-constrained. Unlike traditional distributed systems, Pleiades features a centralized programming model and pushes the burden of concurrency control and synchronization to the compiler and runtime. A more detailed comparison with related work is presented in Chapter 5.

2.1 The Pleiades Language

2.1.1 Design Rationale

Pleiades is designed to provide a simple programming model that addresses the challenges and requirements of sensor network programming. Pleiades' sequential semantics makes programs easy to understand and is natural when programming sensor networks in a centralized fashion. Concurrency is introduced in a simple manner appropriate to the domain, via the cfor construct for node iteration. At the same time, the sequential semantics is still appropriate for the purpose of programmer understanding, because Pleiades ensures serializability of cfors. This strong form of consistency and reliability is important for a growing class of sensor network applications, like car parking, and for the part of an application responsible for building a routing tree across the nodes. For these kinds of applications, we argue that Pleiades' sequential semantics is the right one. We have also used Pleiades for applications such as routing, localization, time synchronization, and data collection, which require consistency for at least some program variables. To our knowledge, no other macroprogramming system guarantees even weak forms of consistency.

While Pleiades provides a sequential semantics, it nonetheless efficiently and naturally supports event-driven execution. Pleiades has special language support for sensors and timers that provides a synchronous abstraction for event-driven execution. The synchronous semantics is easy for programmers to understand and fits well with the sequential nature of a Pleiades program. Under the covers, these language constructs are compiled to efficient event-driven nesC code.

2.1.2 Parking Cars with Pleiades

We illustrate the language features of Pleiades and the benefits they provide over node-level nesC programs through a small but realistic example application. It involves low-cost wireless sensors that are deployed on streets in a city to help drivers find a free parking space. (According to recent surveys [76], searching for a free parking spot already accounts for up to 45% of vehicular traffic in some metropolitan areas.) Each space on the street has an associated sensor node that maintains the space's status (free or occupied). The goal is to identify a sensor node with a free spot that is as close to the desired destination of the driver as possible. For ease of explanation, we define distance by hop count in the network, but it is straightforward to base this on physical distance.

We consider an implementation of this application in Pleiades as well as two node-level versions written in nesC [24]. We show that the Pleiades version is simultaneously readable, reliable, and efficient. Each of the two nesC versions is more complex and provides reliability or efficiency, but not both simultaneously.
 1 #include "pleiades.h"
 2 boolean nodelocal isfree=TRUE;
 3 nodeset nodelocal neighbors;
 4 node nodelocal neighborIter;
 5
 6 void reserve(pos dst) {
 7   boolean reserved=FALSE;
 8   node nodeIter, reservedNode=NULL;
 9   node n=closest_node(dst);
10   nodeset loose nToExamine=add_node(n, empty_nodeset());
11   nodeset loose nExamined=empty_nodeset();
12
13   if (isfree@n) {
14     reserved=TRUE; reservedNode=n;
15     isfree@n=FALSE;
16     return;
17   }
18
19   while (!reserved && !empty(nToExamine)) {
20     cfor (nodeIter=get_first(nToExamine); nodeIter!=NULL;
21           nodeIter=get_next(nToExamine)) {
22       neighbors@nodeIter=get_neighbors(nodeIter);
23       for (neighborIter@nodeIter=get_first(neighbors@nodeIter);
24            neighborIter@nodeIter!=NULL;
25            neighborIter@nodeIter=get_next(neighbors@nodeIter)) {
26         if (!member(neighborIter@nodeIter, nExamined))
27           add_node(neighborIter@nodeIter, nToExamine);
28       }
29       if (isfree@nodeIter) {
30         if (!reserved) {
31           reserved=TRUE; reservedNode=nodeIter;
32           isfree@nodeIter=FALSE;
33           break;
34         }
35       }
36       remove_node(nodeIter, nToExamine);
37       add_node(nodeIter, nExamined);
38     }
39   }
40 }

Figure 2.1: A street-parking application in Pleiades.

Figure 2.1 shows the key procedure that makes up a version of the street-parking application written in Pleiades. When a car arrives near the deployed area, a space near the driver's indicated destination is found and reserved for it by invoking reserve, passing the car's desired location. The reserve procedure finds the closest sensor node to the desired destination and checks if its space is free. If so, the space is reserved for the car. If not, the node's neighbors are recursively and concurrently checked. The code in Figure 2.1 makes critical use of Pleiades's centralized view of a sensor network. We describe the associated language constructs in turn.

Node Naming. Pleiades provides a set of language constructs that allow programmers to easily access nodes and node-local state in a high-level, centralized, and topology-independent manner.
The node type provides an abstraction of a single network node, and the nodeset type provides an iterator abstraction for an unordered collection of nodes. For example, variable n (line 8) in reserve holds the node that is closest to the desired position (the code for the closest_node function is not shown), and nToExamine (line 9) maintains the set of nodes that should be checked to see if the associated space is free. The set of currently available nodes in the network is returned by invoking get_network_nodes(), which returns a nodeset. Pleiades also provides a get_neighbors(n) procedure that returns a nodeset containing n's current one-hop radio neighbors. In Figure 2.1, the reserve procedure uses get_neighbors (line 18) to add an examined node's neighbors to the nToExamine set. The Pleiades runtime implements get_neighbors by maintaining a set of sensor nodes that are reachable through wireless broadcast.

Node-Local Variables. Pleiades extends standard C variable naming to address node-local state. This facility allows programmers to naturally express distributed computations and eliminates the need for programmers to manually implement inter-node data access and communication. Node-local variables are declared as ordinary C variables but include the attribute nodelocal, as shown for the isfree variable (line 2) in Figure 2.1. The attribute indicates that there is one version of the variable per node in the network. A node-local variable is addressed inside a Pleiades program using a new expression var@e, where var is a nodelocal variable and e is an expression of type node. For example, the reserve procedure uses this syntax to check if each node in nToExamine is free (line 23). An expression of the form var@e can appear anywhere that a C l-value can appear; in particular, a node-local variable can be updated through assignment.
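Since Pleiades code is not directly executable here, the var@e addressing scheme can be pictured with a small Python sketch (all names hypothetical, not the Pleiades implementation): a nodelocal variable is one copy of the variable per node, while central variables live in a single shared map.

```python
# Hypothetical model of Pleiades variable addressing (illustration only).
# A nodelocal variable has one copy per node; var@e reads or writes the
# copy owned by the node that e evaluates to.

class Network:
    def __init__(self, node_ids):
        # per-node storage: node id -> {variable name: value}
        self.nodelocal = {n: {"isfree": True} for n in node_ids}
        self.central = {}          # ordinary C variables, shared globally

    def read(self, var, node):     # models the expression var@node
        return self.nodelocal[node][var]

    def write(self, var, node, value):  # models var@node = value
        self.nodelocal[node][var] = value

net = Network([1, 2, 3])
net.write("isfree", 2, False)      # isfree@2 = FALSE
print(net.read("isfree", 1))       # True: node 1's copy is untouched
print(net.read("isfree", 2))       # False
```

The point of the model is only that var@e names one of many per-node copies, whereas a central variable has a single copy that follows the thread of control.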
All variables not annotated as nodelocal are treated as ordinary C variables, whose scope and lifetime respect C's standard semantics. In Pleiades, we call these central variables, to distinguish them from node-local variables. In our example code, reserved is a central variable (line 6), which is therefore shared across all nodes in the network.

Concurrency. By default, a Pleiades program has a sequential execution semantics. However, Pleiades also provides a simple form of programmer-directed concurrency. The cfor loop is like an ordinary for loop but allows for concurrent execution of the loop's iterations. A cfor loop can iterate over any nodeset, and the loop body will be executed concurrently for each node in the set. For example, the reserve procedure in Figure 2.1 concurrently iterates over the nodes in nToExamine (line 17), in order to check if any of these nodes is free. While concurrency is often essential to achieve good performance, it can cause subtle errors that are difficult to understand and debug. For example, a purely concurrent semantics of the cfor in reserve can easily cause multiple free nodes to read a value of false for the reserved flag. This will have the effect of making each such node believe that it has been selected for the new car and is therefore no longer free. To help programmers obtain the benefits of concurrency while maintaining reliability, the Pleiades compiler and runtime system ensure that the execution of a cfor is always serializable: the effect of a cfor always corresponds to some sequential execution of the loop. In reserve, serializability ensures that only one free node will reserve itself for the new car; the other free nodes will see the updated value of the reserved flag at that point. Section 2.2.2 explains our algorithm for ensuring serializability for cfor loops. Pleiades allows cfors to be arbitrarily nested. The serializability semantics of a single cfor is naturally extended for nested cfors.
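The reservation race just described can be made concrete with a Python sketch (hypothetical names, not the Pleiades runtime): each "iteration" checks the shared reserved flag and, if it is unset, reserves its node. Without synchronization, every free node can observe reserved as false and reserve itself; making the check-and-set atomic yields an outcome that matches some sequential order of the iterations.

```python
# Sketch of why cfor serializability matters (illustration only).
import threading

def run(free_nodes, use_lock):
    state = {"reserved": False, "winners": []}
    lock = threading.Lock()
    barrier = threading.Barrier(len(free_nodes))

    def iteration(node):
        if use_lock:
            with lock:                  # atomic check-and-set
                if not state["reserved"]:
                    state["reserved"] = True
                    state["winners"].append(node)
        else:
            seen = state["reserved"]    # every thread reads False ...
            barrier.wait()              # ... before any thread writes
            if not seen:
                state["reserved"] = True
                state["winners"].append(node)

    threads = [threading.Thread(target=iteration, args=(n,)) for n in free_nodes]
    for t in threads: t.start()
    for t in threads: t.join()
    return state["winners"]

print(len(run([1, 2, 3], use_lock=False)))  # 3: every node "reserves" itself
print(len(run([1, 2, 3], use_lock=True)))   # 1: exactly one winner
```

The barrier merely forces the bad interleaving deterministically; in a real deployment the same race arises nondeterministically from message timing.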
Intuitively, the inner cfor is serialized as part of the iteration of the serialized outer cfor. So, in Figure 2.1, the programmer could have replaced the simple for in line 19 with a cfor, and the execution would still be correct. It would also increase the available concurrency, because multiple threads from the nested cfor iterations would be active at a node. However, in this case, it would not be efficient to use a cfor, because the message and latency overheads involved in starting and terminating the concurrent threads and remotely accessing nExamined and nToExamine would offset the potential concurrency gain from executing on multiple neighboring nodes of nodeIter. In general, a programmer must weigh the benefits of fine-grained concurrency through nested cfors against the start-up and finalization overheads of such concurrency.

Loose Variables. While serializability provides strong guarantees on the behavior of cfor loops, sensor network applications often have variables that do not need serializability semantics and can obtain timeliness and message efficiency benefits by using a looser consistency model. Examples include routing beacons that are used to maintain trees for sensor data collection, and sensor values that need to be filtered or smoothed using samples from neighboring nodes. Pleiades lets a programmer annotate such variables as loose, in which case accesses to these variables are not synchronized within a cfor. The consistency model used for loose variables closely follows release consistency semantics [38]. Writes to a loose variable can be re-ordered. The beginning of a new cfor statement or the end of any active cfor statement acts as a synchronization point, ensuring that the current thread of control has no more outstanding writes. In Figure 2.1, variables nToExamine and nExamined are annotated as loose (lines 9 and 10) in order to gain additional concurrency and avoid lock overhead on them.
These annotations are based on the two observations that it is safe to examine a node in nToExamine multiple times, and that only a cfor iteration on nodeIter can remove the candidate node nodeIter from nToExamine. Alternatively, the programmer can derive the same concurrency in this case without using loose by temporarily storing the set of nodes that would be added to nToExamine in line 21 and deferring the add_node operations on this set until after statement 31. In general, the programmer can derive maximum concurrency while ensuring serializability by organizing her code so that writes on serialized variables happen toward the end of a cfor. By default, loose variables are still reliably accessed, but the programmer can further annotate a loose variable to be unreliable, so that the implementation can use the wireless broadcast facility. In Section 2.3, we evaluate the street-parking example with reliable loose variables and a separate application that primarily uses unreliable loose variables.

Automatic Control Flow Migration. Ultimately a centralized Pleiades program must be executed at the individual nodes of the network. As described in Section 4.3, the Pleiades implementation automatically partitions a Pleiades program into units of work to be executed on individual nodes and determines the best node on which to execute each unit of work in order to minimize communication costs. For example, the first five statements of the code (lines 6–10) execute at the node invoking reserve. The implementation then migrates the execution of the statements in lines 11–16 to node n. This is because it is cheaper to simply transfer control to n than to first read isfree@n and later write it back if necessary. Similarly, each iteration of the cfor loop will execute at the node identified by the current value of nodeIter (line 17). While it does not happen in this example, the execution of a single cfor iteration can also successively migrate to other nodes.
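The cost comparison behind this migration decision can be sketched in Python (hypothetical cost model, not the Pleiades runtime): executing a chunk that accesses isfree@n either stays put, paying one reliable multi-hop message per remote read and per write-back, or migrates the thread of control (plus central variables) to n in a single message.

```python
# Sketch of the stay-vs-migrate cost comparison (illustrative model only;
# costs are counted in reliable multi-hop messages).

def cost_stay(hops, remote_reads, remote_writes):
    # one reliable message across `hops` per remote read and per write-back
    return hops * (remote_reads + remote_writes)

def cost_migrate(hops, control_msgs=1):
    # one reliable message carries the thread of control + central variables
    return hops * control_msgs

hops = 3  # assumed distance to node n, for illustration
print(cost_stay(hops, remote_reads=1, remote_writes=1))  # 6 message-hops
print(cost_migrate(hops))                                # 3: migration wins
```

Under this model, migrating wins whenever the chunk both reads and writes remote state, which matches the read-then-write-back pattern of isfree@n in lines 11–16.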
Pleiades provides several important advantages over the traditional node-level programming for sensor networks in use today. To make things concrete, we consider how the street-parking algorithm would be implemented in nesC. We describe two different nesC implementations: a centralized version that is relatively simple and reliable but highly inefficient, and a more complex distributed version that is efficient but unreliable. In contrast, the Pleiades version is both reliable and efficient.

2.1.2.1 A Centralized nesC Implementation

First, it is possible to implement a centralized version of the algorithm in nesC, wherein most of the algorithm is executed on a single node. The major advantage of this approach is its relative simplicity for programmers. However, this version is extremely inefficient in terms of both message cost and latency. Figure 2.2 shows the core functions that comprise such a program.

 1 module ReserveM {
 2   uses { ... }
 3   provides { ... }
 4 } implementation {
 5   nodeset nToExamine, nExamined;
 6   boolean reserved, isfree, is_remote_free;
 7   node closest, reserved_node, req, iter, iter1;
 8   pos dst;
 9
10   task void reserve() {
11     call Topology.closest_node(dst);
12   }
13   event void Topology.found_node(node n) {
14     closest = n;
15     req = TOS_LOCAL_ADDRESS;
16     post transfer_control();
17   }
18   task void transfer_control() {
19     uint8_t i;
20     // Trigger remote doReserve() at ‘‘closest’’
21     // node. Also, send ‘‘req’’ and ‘‘closest’’ node values
22   }
23   task void doReserve() {
24     if (isfree) {
25       reserved_node = TOS_LOCAL_ADDRESS;
26       call MsgInt.send_reply(req, FOUND);
27     }
28     else {
29       nToExamine = call Topology.get_neighbors();
30       call RemoteRW.aread(nToExamine, ISFREE);
31     }
32   }
33   event void RemoteRW.aread_done(done_t done) {
34     if (done == ISFREE)
35       continue_reserve();
36     else if (done == NEIGHBORS)
37       build_more_nodes();
38   }
39   void continue_reserve() {
40     for (iter=get_first(nToExamine); iter!=NULL; iter=get_next(nToExamine)) {
41       remove_node(iter, nToExamine);
42       add_node(iter, nExamined);
43       if (is_remote_free = call RemoteRW.read(iter, ISFREE)) {
44         reserved_node = iter; reserved = TRUE;
45         call RemoteRW.awrite(iter, ISFREE, 0);
46       }
47     }
48     if (!reserved)
49       call RemoteRW.aread(nToExamine, NEIGHBORS);
50   }
51   void build_more_nodes() {
52     nodeset nl;
53     for (iter=get_first(nToExamine); iter!=NULL; iter=get_next(nToExamine)) {
54       nl = (call RemoteRW.read(iter, NEIGHBORS));
55       for (iter1=get_first(nl); iter1!=NULL; iter1=get_next(nl))
56         if (!member(iter1, nExamined))
57           add_node(iter1, nToExamine);
58     }
59     call RemoteRW.aread(nToExamine, ISFREE);
60   }
61 }

Figure 2.2: Car Parking in nesC: Centralized Implementation – Reliable but Inefficient

 1 module ReserveM {
 2   uses { ... }
 3   provides { ... }
 4 } implementation {
 5   boolean isfree, seen, reserved;
 6   pos dst;
 7   node start_node[], req, orig, reserved_node;
 8   uint8_t cnt_start_node, hopcount;
 9
10   task void reserve() {
11     call Topology.closest_node(dst);
12   }
13   event void Topology.found_node(node n) {
14     orig = TOS_LOCAL_ADDRESS;
15     start_node[0] = n, req = n, hopcount = HOP_MAX;
16     cnt_start_node = 1;
17     post transfer_control();
18   }
19   task void transfer_control() {
20     uint8_t i;
21     for (i = 0; i < cnt_start_node; i++) {
22       // Trigger remote doReserve() at every start_node[i]. Also, send each node
23       // our req, orig, hopcount values
24     }
25   }
26   task void doReserve() {
27     if (!seen) {
28       seen = TRUE;
29     }
30     if (isfree && !seen) {
31       reserved_node = TOS_LOCAL_ADDRESS;
32       isfree = FALSE;
33       call MsgInt.send_reply(req, FOUND);
34     }
35     else
36       flood_neighbors();
37   }
38   void flood_neighbors() {
39     nodeset nl = Topology.get_neighbors();
40     node iter;
41     hopcount--;
42     if (hopcount > 0) {
43       cnt_start_node = 0;
44       for (iter=get_first(nl); iter!=NULL; iter=get_next(nl))
45         start_node[cnt_start_node++] = iter;
46       post transfer_control();
47     }
48   }
49   event void MsgInt.receive_reply(node rep, msg_t msg) {
50     if (msg == FOUND) {
51       if (!reserved) {
52         reserved_node = rep;
53         call MsgInt.send_reply(rep, ACCEPT);
54         call MsgInt.send_reply(orig, FOUND);
55       }
56       else
57         call MsgInt.send_reply(rep, REJECT);
58     }
59     else if (msg == REJECT) {
60       isfree = TRUE;
61     }
62   }
63 }

Figure 2.3: Car Parking in nesC – Distributed Implementation: Efficient but Unreliable

The overall logic is similar to that of the Pleiades version from Figure 2.1. However, programmers must explicitly manage the details of inter-node communication. Because nesC uses an asynchronous, split-phase approach to such communication [24], the application's logic must be partitioned across multiple callback functions at remote read/write boundaries. The control flow is as follows. A task reserve (line 9) is spawned on the node closest to the car, which, in turn, calls the closest_node function (line 10) in the Topology component (this component is not shown). Since all tasks in nesC run to completion, and since Topology.closest_node performs a split-phase lookup operation for the desired closest node, the callback function found_node is later invoked by Topology (line 12). The callback creates a new task transfer_control (line 14), which ultimately triggers doReserve on the closest node (line 21). The rest of the algorithm then runs centrally on the closest node.
doReserve, executing on closest, either finds itself free (line 22) or creates the nToExamine set with its current neighbor set (line 26). Next, it concurrently and asynchronously reads the isfrees at nToExamine (line 27) using aread of the RemoteRW component (not shown). When the asynchronous read completes, it signals aread_done (line 29), and continue_reserve is called (line 30). Such reads are locally cached in the RemoteRW component, so that continue_reserve can synchronously read them in line 37. If no node with a free spot is found (lines 37–41), more neighboring nodes of the current nodes are searched using another asynchronous read (line 42), which ultimately calls build_more_nodes (line 31). Since the code is executed on a single node, this approach maintains a relatively straightforward structure, similar to that of the Pleiades code. The main drawback of this approach to node-level programming is inefficiency. Message cost is high because isfree of every node is centrally fetched and checked from a single node. In contrast, the Pleiades version from Figure 2.1 uses a cfor to allow each node to locally process its own data, using the code migration techniques described in Section 4.3. Thus, even for small example topologies of two-hop radius, it can be shown that the Pleiades version requires around half the messages required by the nesC version; this message count for Pleiades includes all control overhead for code migration and for ensuring serializability of cfors. The concurrent cfor iterations in Pleiades also find a free spot earlier than is possible in the nesC version. In the nesC version, continue_reserve in line 42 waits on RemoteRW.aread for all remote neighbors in nToExamine to be asynchronously read, and build_more_nodes in line 51 similarly waits until all remote isfrees in nToExamine are read.
2.1.2.2 A Distributed nesC Implementation

The Pleiades version of car parking in Figure 2.1 does a breadth-first search around the closest node, moving to the next depth in a distributed fashion only if no free slot is found in the current one. Unfortunately, a distributed implementation in nesC that provides the same behavior as the Pleiades version would be exceedingly complex. Such an implementation would require the programmer to manually implement many of the same concurrency control techniques that Pleiades automatically implements for cfors, as discussed in Section 2.2.2. For example, to ensure that exactly one free space is reserved for a car, the programmer would have to implement a form of distributed locking for conceptually central variables. In general, the use of locking would then require manual support for distributed deadlock detection or avoidance. Similarly, to ensure that the closest free space is always found, the programmer would have to manually synchronize execution across the nodes in the network, to ensure that a depth d is completely explored before moving on to depth d+1. Therefore, in practice a distributed version in nesC would forgo synchronization, as shown in Figure 2.3. Here we do a distributed flooding-based search around the closest node, in order to find a free spot. The control flow is as follows. After reserve is invoked (line 9), doReserve is ultimately triggered, in a manner similar to the previous version. The only difference here is that doReserve may be active at multiple nodes that receive the flooding request and may be activated multiple times by several neighbors (lines 39–41). Since a node must process a request exactly once even if its doReserve is triggered multiple times by its neighbors, doReserve uses a flag seen (line 26) to ignore all but the first request. To limit the number of duplicate requests at a node, the code also suppresses broadcasts to neighbors when the hopcount reaches 0 (line 37).
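The seen-flag and hop-count suppression pattern can be sketched as a small Python simulation (hypothetical names, not the nesC code): each node processes a request once and re-broadcasts with a decremented hop count until it reaches 0, so duplicate requests are delivered but ignored.

```python
# Sketch of hop-limited flooding with duplicate suppression (illustration
# only). Messages are counted to show that flooding delivers duplicates.
from collections import deque

def flood(neighbors, start, hop_max):
    seen = set()
    queue = deque([(start, hop_max)])
    messages = 0
    while queue:
        node, hops = queue.popleft()
        messages += 1                    # a request delivered to this node
        if node in seen:
            continue                     # duplicate request: ignored
        seen.add(node)
        if hops > 0:                     # suppress broadcast at hopcount 0
            for nb in neighbors[node]:
                queue.append((nb, hops - 1))
    return seen, messages

# A small line topology, for illustration: 0 - 1 - 2 - 3
topo = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
reached, msgs = flood(topo, start=0, hop_max=2)
print(sorted(reached))       # [0, 1, 2]: node 3 is beyond 2 hops
print(msgs)                  # 4: one delivery per node plus a duplicate
```

Even in this four-node line, the initiator receives its own request back as a duplicate, which is exactly the overhead the breadth-first Pleiades search avoids.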
This is an effective technique when the network diameter is unknown and when we want to ensure that the flooded requests prefer shorter hops from the flooding initiator (node req in line 14). receive_reply (line 43) is a callback that is invoked by the local message interface component MsgInt (not shown) whenever a remote node sends a message. When a spot is found at a remote node, it sends FOUND to the flooding initiator (line 30), which rejects all but the first successfully replying node (lines 45–49). If a remote node is rejected, it sets itself back to free (line 50). As described earlier, the Pleiades version performs a breadth-first search on the topology, distributedly determining if there is a free slot at depth d before moving on to depth d+1. By contrast, the flooding approach starts up the free-slot determination concurrently at all network nodes by flooding the transfer of control. Given this distinction, two things follow. First, the Pleiades approach is always more message efficient, since it avoids multiple requests to the same node. Second, the flooding approach has lower latency, since it can find a spot more quickly when the free spot is far away. The flooding approach is also much more efficient in terms of both messaging costs and latency than the centralized nesC version shown in Section 2.1.2.1. Despite the latency advantage, the code in Figure 2.3 is significantly less understandable and reliable than the Pleiades version. The programmer is responsible for explicitly managing the communication among nodes. For efficiency, this requires maintaining information about hop counts and other network details. It also requires that conceptually “central” variables be packaged up and passed among the nodes explicitly, taking care to maintain consistency. For example, a special protocol is used in receive_reply (lines 44–50) to ensure a consistent view of the reserved flag, in order to avoid having multiple nodes be reserved for the same car.
Similarly, in transfer_control (lines 21–22), a node explicitly sends the values of the node originating the request and the node closest to the destination that initiated the search. In the Pleiades version, the combination of central variables and cfors takes care of these low-level details automatically. Finally, the flooding version, unlike the other two versions, makes no guarantee that the first node to reply is the topologically closest node. So, if we want it to reliably return only a closest node, the req node executing MsgInt.receive_reply (line 43) must wait for an indeterminable amount of time before accepting a replying node, negating the latency advantage.

2.1.3 Other Features of Pleiades

Pleiades includes other language constructs to support the implementation of common sensor network idioms, which we briefly describe.

Sensors and Timers. As mentioned earlier, Pleiades uses special kinds of variables as an abstraction for sensors, which are critical components of sensor-network applications. Sensor readings are asynchronous events, and Pleiades provides a facility to synchronously wait for such an event to occur. In particular, Pleiades's wait function takes a sensor variable and returns when the sensor takes a reading. At that point, the associated variable contains the most recent reading and the program can take appropriate action. For example, this mechanism is used in the car-parking application to wait for notification that a parked car has left its spot, at which point the spot's sensor sets its associated isfree variable, defined in line 2 of Figure 2.1, to TRUE (this operation is not shown), so that it can once again service remote reserve requests. A similar technique is used to model timers, which fire at some user-specified rate.

Annotations for Failure Recovery. Pleiades provides annotations to allow users to specify how node failures during execution may be handled.
A node is declared to be failed if it times out (the timeout value is defined by the programmer in the Pleiades program as int failure_timeout = T;, where T is the timeout value) and does not respond to beacons. Node failures are primarily of two kinds: failures of nodes storing values required during execution, and failures of nodes actually executing parts of the program. Pleiades provides annotations to allow the user to specify program behavior whenever either type of failure is encountered. Variables may be annotated to specify default values that can be used in case of the first kind of failure. For example, the node-local variable isfree in the street-parking example in Figure 2.1 may be annotated as boolean nodelocal default FALSE isfree = TRUE;. This ensures that if a node fails, rather than stalling the search for free parking spots, the execution moves on under the assumption that the failed node did not have a free parking spot, and hence gracefully deals with this failure. Pleiades currently only deals with failures of nodes running cfor iterations. It cannot handle failure of nodes running the top-level thread, which results in an automatic abort of the entire program execution. If a node running a cfor iteration fails and the Pleiades runtime is unable to restart the iteration at another node, it declares the iteration as failed. Pleiades allows the user to annotate cfor loops with a condition on how many failed iterations they can tolerate before having to abort the entire thread. For example, the cfor on line 20 in the street-parking example in Figure 2.1 may be annotated with abort>50% to indicate that if more than 50% of the nodes running the cfor iterations fail, the execution of the cfor should be aborted.

Modules. A Pleiades program consists of a number of modules, which are executed concurrently.
Each module encapsulates a logically independent application-level computation, such as building a shortest-path tree rooted at a given node, computing an aggregate, or routing application data to a given node. A module is a set of functions that can invoke each other and define and use global and local variables of both central and nodelocal type. Since modules are meant to be independent tasks, we currently provide no synchronization among modules.

2.2 Implementation

This section describes the Pleiades compiler and runtime system. The Pleiades compiler is built as an extension to the CIL infrastructure for C analysis and transformation [60]. Our compiler accepts a Pleiades program as input and produces node-level nesC code that can be linked with standard TinyOS components and the Pleiades runtime system. The Pleiades runtime system is a collection of TinyOS modules that orchestrates the execution of the compiler-generated nesC code across the nodes in the network. The Pleiades compiler and runtime cooperate to tackle two key technical challenges. First, they must partition a Pleiades program into chunks that can be executed on individual nodes and determine at which node to run each chunk, striving to minimize communication costs. Second, they must provide concurrent but serializable execution of cfors. We discuss each challenge in turn.

2.2.1 Program Partitioning and Migration

2.2.1.1 Partitioning.

The Pleiades compiler performs a dataflow analysis in order to partition a Pleiades program into a set of nodecuts. Each nodecut is then converted into a nesC task [24], to be executed by the Pleiades runtime system on a single node in the network. At one extreme, one could consider the entire Pleiades program to be a single nodecut and execute it at one node, fetching node-local and central variables from other nodes as needed (moving the data to the computation).
The other extreme would be to consider each instruction in the Pleiades program as its own nodecut, executing it on the node whose local variables are used in the computation (moving the computation to the data). Both of these strategies lead to generated code that has high messaging overhead and high latency, in the first case due to the on-the-fly fetching of individual variables, and in the second case due to the per-instruction migration of the thread of control. We adopt a compilation strategy for Pleiades that lies in between these two extremes, involving both control flow migration and data movement. A nodecut can include any number of statements, but it must have the property that just before it is to be executed, the runtime system can determine the location of all the node-local variables needed for the nodecut's execution. We therefore define a nodecut as a subgraph of a program's control-flow graph (CFG) such that for every expression of the form var@e in the subgraph, the l-values in e have no reaching definitions within that subgraph.

FIND-NODECUT(P)
 1  Compute the CFG G of the program P
 2  for all nodes n ∈ G
 3      do nodecut(n) ← entry(G)
 4  for all nodes n ∈ G
 5      do if n contains an expression of the form exp1@v
 6          then NC ← {n′ ∈ G | nodecut(n′) = nodecut(n)}
 7               RD ← {n′ ∈ NC | n′ contains a definition of v that reaches n}
 8               SUB ← ∪ over rd ∈ RD of the graph of all paths from rd to n
 9               D ← {n′ ∈ NC | n′ dominates n in NC and
10                    ∀ rd ∈ RD, n′ post-dominates rd in SUB}
11               pick some node d ∈ D as the entry node of a new nodecut,
12                   nodecut(d) ← d
13               ∀ n′ ∈ NC that are reachable from d in NC without traversing a back edge,
14                   nodecut(n′) ← d
15  return the set of nodecuts formed

Figure 2.4: Algorithm for determining nodecuts.

Given this property, the runtime system can retrieve all the necessary node-local and central variables concurrently, before beginning execution of a nodecut, which improves the latency immensely over the first strategy above.
At the same time, because the runtime system has information about the required node-local variables, it can determine the best node (in terms of messaging costs) at which to execute the nodecut, thereby obtaining the benefits of the second strategy above without the latency and message costs of per-statement migration. Intuitively, the goal is to make each nodecut as large as possible, in order to minimize the control and data costs associated with a migration. Since a nodecut runs to its completion without any further communication, this approach would statically minimize the total communication cost of a program. We make the goal of minimizing migrations precise by striving to minimize the total number of edges in the program's CFG that cross from one nodecut to another, since each such edge represents a migration of the dynamic thread of control from one sensor node to another. This optimization problem is exactly equivalent to the directed unweighted multi-cut problem, which is known to be NP-complete [10]. Therefore, instead of finding the optimal partition of a CFG into nodecuts, the Pleiades compiler uses a heuristic algorithm that works well in practice, as shown in Section 2.3. This algorithm is described in Figure 2.4. The algorithm depends on some standard notions about the relationships among nodes in a CFG. A node n is said to dominate another node n′ if all paths in the CFG from the entry node to n′ contain n. Similarly, a node n is said to post-dominate n′ if every path from n′ to the exit node contains n. Finally, an edge is a back edge if its target node dominates its source node. The algorithm starts by assuming that all CFG nodes are in the same nodecut and does a forward traversal through the CFG, creating new nodecuts along the way. For each CFG node n containing an expression of the form var@e, we find all reaching definitions of the l-values in e and collect the subset R of such definitions that occur within n's nodecut.
If R is nonempty, we induce a new nodecut by finding a CFG node d that dominates node n and post-dominates all of the nodes in R. Node d then becomes the entry node of the new nodecut. Any such node d can be used, but our implementation uses simple heuristics that attempt to keep the bodies of conditionals and loops in the same nodecut whenever possible. The implementation also uses heuristics to increase the potential for concurrency. For example, the body of a cfor is always partitioned into nodecuts that do not contain any statements from outside the cfor, so that these nodecuts can be executed concurrently. The five nodecuts computed by our algorithm for the street-parking example in Figure 2.1 are shown in Figure 2.5.

[Figure 2.5: Nodecuts generated for the street-parking example. The figure shows the CFG of the reserve procedure partitioned into five nodecuts, with transit edges marking where the thread of control migrates between them.]

Nodecut 2 is induced due to the use of isfree@n in line 11 of Figure 2.1, since n is defined in line 8. The transitions from nodecut 2 to 3 and nodecut 3 to 4 are induced to keep the cfor body separate from statements outside the loop, as mentioned above. Further,
an extra nodecut is induced within the cfor body (nodecut 5) to maximize read concurrency. The heuristic attempts to separate read and written variables into different nodecuts so that the acquisition of write locks, which is done before a nodecut starts execution, can be delayed until the write locks are actually required. In the current implementation we assume that a Pleiades program does not create aliases among node variables. Such aliasing has not been necessary in any of our experiments with the Pleiades language so far. It is straightforward to augment our algorithm for generating nodecuts to handle node aliasing by consulting a static may-alias analysis.

2.2.1.2 Control Flow Migration.

The Pleiades runtime system is responsible for sequentially (ignoring cfor for the moment) executing each nodecut produced by the compiler across the sensor network. When execution of a nodecut C completes at some node n, that node's runtime system determines an appropriate node n' at which to run the subsequent nodecut C' and migrates the thread of control to n'. All of the Pleiades program's central variables migrate along with the thread of control, thereby making them available to C'. Because of the special property of nodecuts, the runtime system knows exactly which node-local variables are required by C', so these variables are also concurrently fetched to n' before execution of C' is begun. To determine where the next nodecut should be executed, the runtime uses the overall migration cost as the metric. The runtime knows the number of node-local variables needed from each node for executing the next nodecut as well as the distances (the number of radio hops) of these nodes relative to each other according to the current topology. The runtime chooses the node that minimizes the cost of transfers from within this set. For example, nodecut 2 in Figure 2.1 accesses the node-local variable isfree@n, as well as two central variables reserved and reservedNode.
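The placement decision can be sketched with a simplified cost model (one reliable message per fetched variable, one for the migrating thread plus central variables; the function names, hop table, and exact cost model are illustrative, not the runtime's actual code):

```python
# Hypothetical sketch of nodecut placement: choose the node minimizing
# migration cost plus the cost of fetching needed node-local variables.

def pick_execution_node(current, needed, hops):
    """needed: {node: number of variable messages}; hops[(a, b)]: radio hops."""
    def dist(a, b):
        return 0 if a == b else hops[(a, b)]
    candidates = set(needed) | {current}
    def cost(target):
        migrate = dist(current, target)                      # thread + central vars
        fetches = sum(k * dist(n, target) for n, k in needed.items())
        return migrate + fetches
    return min(candidates, key=cost)

# Running remotely would need a fetch and a write-back of isfree (2 messages
# across 3 hops = cost 6); migrating to n costs one message across 3 hops.
hops = {('c', 'n'): 3, ('n', 'c'): 3}
print(pick_execution_node('c', needed={'n': 2}, hops=hops))  # -> 'n'
```

Under this toy model, migrating to n (cost 3) beats executing in place (cost 6), mirroring the isfree@n example in the text.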
The cost of running this nodecut at the node executing nodecut 1 is the cost of fetching the value of isfree from n at the beginning of nodecut 2 and writing back isfree if necessary. This cost is two reliable messages across multiple radio hops. By contrast, if the runtime at nodecut 1 hands off nodecut 2 to node n, the cost is that of transferring the thread of control along with the central variables. This is only one reliable message across the same number of hops. So, Pleiades executes nodecut 2 at n. Since the nodecuts, along with the set of node-local variables accessed in each nodecut, are statically supplied by the compiler, our migration approach exploits a novel combination of static and dynamic information in order to optimize energy efficiency. We note that this approach does not require every node to keep a fully consistent topological map, but only the relative distances of the nodes involved in the nodecut. In our current implementation, nodes use a statically configured topological map in order to make the migration decision; we will explore lightweight, dynamic approaches to determine approximate topological maps as part of future work.

2.2.2 Serializable Execution of cfors

To execute a cfor loop, the Pleiades runtime system forks a separate thread for each iteration of the loop. We call the forking thread the cfor coordinator. Program execution following the cfor only continues once all the forked threads have joined. Each forked thread is initially placed at the node representing the value of the variable the cfor iterates over, and any subsequent nodecuts in the thread are placed using the migration algorithm for nodecuts described above. A forked thread may itself execute a cfor statement, in which case that thread becomes the coordinator for the inner cfor, forking threads and awaiting their join. To provide reliability in the face of concurrency, Pleiades ensures serializability of cfor loops.
This allows programmers to correctly understand their Pleiades programs in terms of a sequential execution semantics. The Pleiades compiler and runtime ensure serializability by transparently locking variables accessed in each cfor body. The use of locking has the potential to cause deadlocks, so we also provide a novel distributed deadlock detection and recovery algorithm.

EXECUTE(thread t)
 1  while true
 2    do switch NEXT-OPERATION(t)
 3       case read(x):  if t does not have a read lock on x
 4                      then REQUEST-LOCK(x, read, t)
 5                           if lock not obtained
 6                           then SET-STATE(t, blocked)
 7                                suspend execution of t
 8       case write(x): if t does not have a write lock on x
 9                      then REQUEST-LOCK(x, write, t)
10                           if lock not obtained
11                           then SET-STATE(t, blocked)
12                                suspend execution of t
13       case cfor(c):  SPAWN-THREADS(c); return
14       case join:     for each lock l owned by t
15                      do RELEASE-LOCK(l, t)
16                      SEND-JOIN(t); return
17       EXECUTE-NEXT-OPERATION(t)

LOCK-GRANTED(lock l, variable v, mode m, thread t)
 1  store lock l at t
 2  if t was suspended waiting for l
 3  then resume execution of t
 4       SET-STATE(t, executing); return
 5  child_t = first in queue wanting a lock on v at t
 6  if mode of l == read and child_t wanted a write lock on v
 7  then return
 8  remove child_t from queue
 9  LOCK-GRANTED(l, v, mode of child_t, child_t)
10  mark lock l as being used by child_t at t

RELEASE-LOCK(lock l, thread t)
 1  if t is the top-most level thread
 2  then copy l back to the owner of the variable locked by it
 3  else copy l back to cfor coordinator(t)
 4  delete lock l at t
 5  mark l as unused at t
 6  if queue of threads wanting l is non-empty at cfor coordinator(t)
 7  then new_t = first in queue wanting l
 8       remove new_t from queue
 9       LOCK-GRANTED(l, var of l, mode of new_t, new_t)
10       mark lock l as being used by new_t at cfor coordinator(t)

Figure 2.6: cfor execution and locking algorithm.
Figure 2.6: Continued

REQUEST-LOCK(variable v, mode m, thread t)
 1  if t is the top-most level thread
 2  then fetch lock l from owner of variable
 3       LOCK-GRANTED(l, v, m, t); return
 4  if cfor coordinator(t) does not have any locks on v
 5  then REQUEST-LOCK(v, m, cfor coordinator(t))
 6       add t to the queue of threads wanting
 7         a lock on v at cfor coordinator(t); return
 8  if m == read
 9  then if cfor coordinator(t) has read lock l on v
10       then LOCK-GRANTED(l, v, m, t)
11            mark lock l as being used
12              by t at cfor coordinator(t)
13            return
14       if cfor coordinator(t) has a write lock l on v
15       then if l is being used as a read lock or is free
16            then LOCK-GRANTED(l, v, m, t)
17                 mark lock l as being used
18                   by t at cfor coordinator(t)
19            else add t to the queue of threads wanting
20                   a lock on v at cfor coordinator(t)
21            return
22  if m == write
23  then if cfor coordinator(t) has a read lock l on v
24       then add t to the queue of threads wanting
25              a lock on v at cfor coordinator(t)
26            REQUEST-LOCK(v, m, cfor coordinator(t))
27            return
28       if cfor coordinator(t) has a write lock l on v
29       then if l is being used
30            then add t to the queue of threads wanting
31                   a lock on v at cfor coordinator(t)
32                 return
33            else LOCK-GRANTED(l, v, m, t)
34                 mark lock l as being used
35                   by t at cfor coordinator(t)

SEND-JOIN(thread t)
 1  terminate t
 2  if cfor coordinator(t) has no more executing children
 3  then EXECUTE(cfor coordinator(t))
 4  SET-STATE(t, joined)

2.2.2.1 Distributed Locking.

To ensure serializability, the Pleiades implementation protects each node-local and central variable accessed within a cfor iteration with its own lock. We employ a pessimistic locking approach, since this consumes less memory than optimistic approaches such as versioning. The details of the locking scheme we employ are described in Figure 2.6. To ensure serializability, a lock must be held until the end of the outermost cfor iteration being executed; thus, the implementation uses strict two-phase locking.
However, locks are acquired on demand rather than at the beginning of the cfor iteration, thereby achieving greater concurrency. To further increase concurrency, our algorithm distinguishes between read and write locks. Readers can be concurrent with one another, while a writer requires exclusive access. The implementation acquires locks at the granularity of a nodecut. This allows the locks to be fetched along with the associated variables before the nodecut's execution, decreasing messaging costs. Our algorithm acquires locks in a hierarchical manner. Each cfor coordinator keeps track of which locks it holds, the type of each lock (read or write), which of its spawned threads are currently using each lock, and which of its threads are currently blocked waiting for each lock. When a nodecut requires a particular lock, it asks the coordinator of its innermost enclosing cfor for the lock. If the coordinator has the lock, it either provides the lock or blocks the thread, depending on the lock's current status, and updates the lock information it maintains appropriately. If the coordinator does not have the lock, it recursively requests the lock from its cfor coordinator, thereby handling arbitrarily nested cfors. Once the top-level cfor coordinator has been reached, it acquires the lock from the variable's owner and grants the lock to the requesting thread (which then grants the lock to its requesting thread, and so on down to the original requester). Once a thread has obtained the lock on a variable, it fetches the actual value of the variable directly from the owner. When a spawned thread joins, it returns its locks to its cfor coordinator, who may therefore be able to unblock threads waiting for these locks. Also, if any of the locks owned by the joining thread were write locks, it writes back the current value of the variable at the owner before releasing the locks.
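The hierarchical request path can be sketched as follows: a thread asks its innermost coordinator, which escalates to its own coordinator until the top level fetches the lock from the variable's owner. This is a minimal sketch of the grant/deny decision only (no queues, messaging, or lock upgrades), with invented class and method names:

```python
# Hypothetical sketch of hierarchical on-demand locking: read locks are
# shared, write locks are exclusive; a denied request means the caller
# must block until a sibling joins and releases the lock.

class Coordinator:
    def __init__(self, parent=None):
        self.parent = parent
        self.locks = {}          # var -> (mode: 'read'|'write', in_use: bool)

    def request(self, var, mode):
        if var not in self.locks:
            if self.parent is None:
                self.locks[var] = (mode, False)       # fetch from variable's owner
            elif self.parent.request(var, mode):
                self.locks[var] = (mode, False)       # granted by our coordinator
            else:
                return False                          # must block upstream
        held_mode, in_use = self.locks[var]
        if mode == 'read' and held_mode == 'read':
            return True                               # readers share the lock
        if held_mode == 'write' and not in_use:
            self.locks[var] = ('write', True)         # exclusive grant
            return True
        return False                                  # caller must block

top = Coordinator()
inner = Coordinator(parent=top)
print(inner.request('reserved', 'write'))  # -> True (first writer wins)
print(inner.request('reserved', 'write'))  # -> False (second must block)
```

The first write request escalates to the top-level coordinator, which fetches the lock; the second request on the same variable is denied because the write lock is in use, mirroring how contending street-parking threads serialize on reserved.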
We now informally argue that our locking algorithm always results in a serializable execution of a cfor. Consider an execution of a cfor containing two accesses a_i and a_j to the same variable from (possibly nested) threads i and j, where at least one access is a write. Suppose without loss of generality that thread i acquires the necessary lock first. The key property of our algorithm is that a_j can only proceed once thread i has joined. To see this, we note that in order for i to have obtained the lock, the algorithm ensures that each thread on the path from the top-level cfor to i maintains the information that the lock is reserved for its direct child on the path. Let k be the least common ancestor thread of i and j. In order for a_j to obtain the required lock, it must be the case that thread k is no longer reserving the lock for its child on the path to i. For this to happen, i had to have released the lock, and this only happens when i joins. The above property of our algorithm implies that every execution trace of a cfor has an acyclic conflict graph. The conflict graph has one node per thread and an edge from thread i to thread j if the threads have conflicting accesses to the same variable and thread i accesses the variable first. Therefore, it is easy to show that the cfor execution is equivalent to one in which each thread is executed in some topological order consistent with the conflict graph. Let us revisit the street parking example in Figure 2.1. For each cfor iteration, the Pleiades runtime at the coordinator sends a message containing the fork command to each of the remote nodes selected for execution. Each node initially acquires a read and a write lock, respectively, on its own versions of the node-local variables isfree and neighbors.
isfree uses a read lock instead of a write lock even though it can potentially be modified in line 26, because using a read lock first and then upgrading it to a write lock if the conditional in line 23 succeeds significantly enhances concurrency. On receiving these locks, the threads fetch the variable values from the owners and begin concurrent execution of the initial nodecut of the cfor (nodecut 3 in Figure 2.5). Threads that run on nodes with an occupied parking space fail the if condition in line 23, release their locks, and join with the cfor coordinator. Threads on nodes that have a free space contend for a write lock on the central variables reserved and reservedNode and have to execute the second nodecut of the cfor sequentially. The first thread to do so is selected as the winner, and other nodes do not change their isfree status.

2.2.2.2 Distributed Deadlock Detection and Recovery.

While the locking algorithm ensures serializability of cfors, it can give rise to deadlocks. One possibility would be to statically ensure the absence of deadlocks, for example via a static or dynamic global ordering on the locks. However, such an approach would be very conservative in the face of cfors containing multiple nodecuts, nested and conditional cfors, or cfors that contain updates to node variables, thereby overly restricting the amount of concurrency possible. Further, we expect deadlocks to be relatively infrequent. Therefore Pleiades instead implements a dynamic scheme for distributed deadlock detection and recovery. While such schemes can be heavyweight and tricky in general [20], we exploit the fork-join structure of a cfor to arrive at a simple and efficient state-based deadlock detection algorithm. Our algorithm requires only two bits of state per thread, does not rely on timeouts, and finds deadlocks as soon as it is safe to determine the condition. Furthermore, this algorithm is implemented by the compiler and runtime, without any programmer intervention.
The details of this algorithm are described in Figure 2.7. We require every thread to record its state during execution, which is either executing, blocked, or joined. We define a cfor coordinator to be executing if at least one of the coordinator's spawned threads is executing, blocked if at least one of the coordinator's threads is blocked and none are executing, and joined if all of the coordinator's threads are joined. A thread can easily update its state appropriately as its locks are requested and released during the locking algorithm described above, in the process also causing the thread to recursively update the state of its cfor coordinator. The program is deadlocked if and only if the top-level cfor coordinator ever has its state set to blocked. Informally, we can argue that the algorithm detects all deadlocks. Define a Wait-For Graph as a directed graph of links from thread i to thread j such that i is waiting to acquire a lock held by j. If there is a deadlock, this Wait-For Graph must have a cycle. Now consider all the threads in this cycle; every thread in this cycle must reach a persistent blocked state since it is waiting to acquire at least one lock. In general, every thread involved in a cycle will reach a persistent blocked state, and all others will eventually reach the joined state. Consider a thread i which has reached a blocked state. By line 10 of DETECT-DEADLOCK, its cfor coordinator is notified of this. By lines 5-6 of DETECT-DEADLOCK, i's cfor coordinator will set itself to the blocked state when all its children have either joined or blocked. This happens recursively all the way up to the top-level cfor coordinator. Thus, the top-level cfor coordinator is set to the blocked state only when all threads in the system have either joined or blocked. At that point, by line 9 of DETECT-DEADLOCK, deadlock recovery is invoked.
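The state-propagation rule at the heart of this detection scheme can be sketched as follows (the tree encoding, function names, and thread labels are illustrative, not the runtime's data structures):

```python
# Sketch of state-based deadlock detection: a coordinator is 'executing'
# if any child is executing, 'blocked' if no child is executing and at
# least one is blocked, and 'joined' otherwise. Deadlock is flagged
# exactly when the top-level coordinator's state becomes 'blocked'.

def coordinator_state(children):
    states = list(children)
    if any(s == 'executing' for s in states):
        return 'executing'
    if any(s == 'blocked' for s in states):
        return 'blocked'
    return 'joined'

def detect_deadlock(tree, leaf_states, root):
    """tree: {coordinator: [children]}; leaves carry their states directly."""
    def state(t):
        if t in tree:
            return coordinator_state(state(c) for c in tree[t])
        return leaf_states[t]
    return state(root) == 'blocked'

# Two leaf threads, each waiting on a lock the other holds.
tree = {'top': ['t1', 't2']}
print(detect_deadlock(tree, {'t1': 'blocked', 't2': 'blocked'}, 'top'))  # -> True
```

As long as any thread is still executing, the executing state propagates upward and recovery is not triggered; only when every thread has blocked or joined, with at least one blocked, does the top-level coordinator flip to blocked.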
One should note here that it will never be the case that there is a deadlock in the system and DETECT-DEADLOCK is not executed. The last action, before the deadlock becomes detectable, is that of a thread entering the blocked or joined state. In either case, DETECT-DEADLOCK will be called.

SET-STATE(thread t, state s)
 1  state(t) = s
 2  if s = executing
 3  then if t is the topmost-level thread
 4       then return
 5       else SET-STATE(cfor coordinator(t), executing)
 6  else DETECT-DEADLOCK(t)

DETECT-DEADLOCK(thread t)
 1  if t executes join
 2  then state(t) = joined
 3       DETECT-DEADLOCK(cfor coordinator(t))
 4  if t is a cfor coordinator
 5  then if no c in Children(t) has state(c) = executing
 6       then state(t) = blocked
 7  if state(t) = blocked
 8  then if t is the topmost-level thread
 9       then RECOVER-DEADLOCK(t)
10       else DETECT-DEADLOCK(cfor coordinator(t))

RECOVER-DEADLOCK(thread t)
 1  B = {c | c in Children(t) and state(c) = blocked}
 2  if |B| > 1
 3  then for b in B
 4       do for each lock l owned by b
 5          do RELEASE-LOCK(l, b)
            restart and execute b serially
 6  else if |B| = 1
 7  then RECOVER-DEADLOCK(b), where B = {b}

Figure 2.7: Deadlock detection and recovery algorithm.

Once a deadlock has been detected, we use a simple recovery algorithm. Starting from the top-level cfor coordinator, we walk down the unique path to the highest thread in the tree of cfor coordinators that has at least two blocked child threads. We then release all locks held by these blocked threads and re-execute them in some sequential order. This simple approach guarantees that we will not encounter another deadlock after restart. To support re-execution, each thread records the initial values of all variables to which it writes, so that the variables previously updated at their owners can be rolled back appropriately during deadlock recovery. We assume that the iterations are idempotent, so there are no harmful side-effects of re-execution. This is true in many sensor network programs, which primarily involve sensing and actuation as side effects.
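The rollback-and-rerun step can be sketched as follows, assuming (as the text does) that each thread keeps an undo log of initial values for the variables it writes; the data structures, thread names, and the body function here are invented for illustration:

```python
# Hypothetical sketch of deadlock recovery: roll back each blocked
# iteration's writes using its undo log, then re-execute the iterations
# one at a time. Serial execution cannot deadlock, since no two
# iterations ever hold locks concurrently.

def recover_and_rerun(blocked, undo_logs, store, body):
    for t in blocked:
        for var, initial in undo_logs[t].items():
            store[var] = initial          # undo partially committed writes
    for t in blocked:                     # re-execute in some serial order
        body(t, store)
    return store

# Two street-parking-style iterations that each reserve one spot.
store = {'n1': 'free', 'n2': 'free'}
logs = {'t1': {'n1': 'free'}, 't2': {'n2': 'free'}}
def body(t, s):
    node = {'t1': 'n1', 't2': 'n2'}[t]
    s[node] = 'reserved by ' + t
out = recover_and_rerun(['t1', 't2'], logs, store, body)
print(out['n1'], out['n2'])  # -> reserved by t1 reserved by t2
```

The idempotence assumption in the text is what makes the second pass safe: re-running a rolled-back iteration produces the same effects as its first (aborted) attempt would have.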
2.2.2.3 Failure Detection and Recovery

Node and link failures are a common occurrence in sensor networks. The Pleiades runtime can deal with isolated link failures by employing a dynamic routing mechanism. As mentioned in Section 2.1, the Pleiades runtime detects a node failure by using a timeout mechanism. Whenever a node is either waiting for a cfor iteration to complete or for a variable value to arrive, it maintains a failure timer. When the timer expires, the waiting node pings the node from which it is expecting to hear, and if no response is received after multiple pings, declares it to have failed. The Pleiades runtime uses information provided via annotations by the programmer to decide how to deal with a node failure. If the detecting node was waiting for a variable value from the failed node, and if the user had annotated the variable with a default value, the runtime provides this value to the detecting node. The computation at the detecting node then moves ahead with this value assigned to the variable. If the detecting node was waiting for a cfor iteration to execute at the failed node, it first undoes any partially committed changes caused by the execution. It then attempts to restart the cfor iteration at another node. If this is not possible because the failed node stores variables that have no default value, it checks whether the user has specified a condition under which the failure may be ignored. If there is such a condition and it is satisfied despite the failed cfor iteration, the detecting node ignores the failed node and its cfor iteration and continues execution once the other cfor iterations have completed. If the condition is not met after this failure, the detecting node declares the cfor execution failed. If the current thread is part of a higher cfor, then the detecting node declares the particular iteration it was running as failed to the higher cfor coordinator.
If the current thread is the main program thread, then the entire execution is halted. While our simple scheme can deal with and attempt to recover from failure of nodes containing variables and running cfor iterations, it currently cannot deal with failure of any node running the main thread. Handling this would require adding substantial extra functionality at every node, since monitoring for such a failure would break the natural cfor hierarchy. Deciding where to restart the main thread would also require some form of leader selection. We have observed that in practice the main thread is not migrated often. Hence we simply assume that the main thread is only run on a subset of nodes that are sturdier and less failure-prone.

2.3 Evaluation

We have implemented the Pleiades compiler and runtime described in Section 4.3. In this section, we describe an evaluation of this implementation for various applications, with Pleiades running on TelosB Tmote Sky motes. We first discuss the performance of a Pleiades application relative to a nesC implementation of that same application. Then, we quantify the performance of Pleiades support for serializability and nodecut migration.

Pleiades and nesC Comparison. We compare a Pleiades implementation of a Pursuit-Evasion Game (PEG) against a hand-coded node-level nesC implementation of the same application written by others [26] on a 40-node mote testbed. PEGs [74] have been explored extensively in robotics research. In a PEG, multiple robots (the pursuers) collectively determine the location of one or more evaders using the sensor network, and try to corral them. The mote implementation of this game consists of three components: a leader election module performs data fusion to determine the centroid of all sensors that detect an evader; a landmark routing module routes leader reports to a landmark node; in turn, the landmark routes reports to pursuers.
The Pleiades version of PEG implements the leader election component of PEG, and leverages the routing provided by the Pleiades runtime to route the leader reports directly to the pursuer. An important feature of this application is that it requires no serializability semantics for the core leader election module; in fact, the data we present below were obtained using a version of Pleiades that did not support serializability. We also implemented PEG on Pleiades with full serializability support for leader election, and found that it does not incur additional overhead due to locking, because leader election needs only read locks, which are acquired once at the beginning and retained until the end. Figure 2.8 depicts the performance of the Pleiades version of PEG (Pleiades-PEG) as compared to a hand-crafted nesC version (Mote-PEG). We compared these two programs along four different metrics. Pleiades-PEG is almost a tenth the size of the nesC implementation in terms of lines of code. We measured the main application-perceived measure of performance, the error in position estimate on a topological (reduced) map of the environment [49], for both programs. Pleiades-PEG's average error is comparable to (in fact slightly better than) that of Mote-PEG.

Figure 2.8: PEG performance comparison.

We also measured the latency between when a mote detects an evader and when the corresponding leader report reaches the pursuer. Mote-PEG has noticeably lower latency than Pleiades-PEG, but for most nodes (about 80%), this latency difference is within a factor of two. This is because our implementation of the transport layer in Pleiades is unoptimized for handling cfor forks and joins, and because our nodecut placement implementation relies on relatively static hop count information. There is scope for improving both significantly. The message overhead for Pleiades-PEG is around 1.4 times that of Mote-PEG.
While this is partly due to the transport layer in the Pleiades runtime not being fully optimized, it is still respectable for an auto-generated program. Thus, overall these results are highly encouraging; while they do merit further study, they suggest that Pleiades performance can be comparable to that of node-level programming.

Serializability Evaluation. We ran the street-parking application of Figure 2.1 on a 10-node chain mote topology. This topology is an extreme configuration, and thus stresses our serializability implementation, because the efficiency of packet delivery in a chain of wireless nodes drops dramatically with the length of the chain. In our experiments, 10 requests for free spots arrive sequentially at the node in the center of the chain. To illustrate the power of Pleiades's serializability guarantees, and to understand its performance, we ran four different versions of the application: SP-NL, in which we configured the Pleiades compiler and runtime to disable locking; SP, which uses the complete Pleiades compiler and runtime for locking, deadlock detection and recovery; SPID-NR, in which we induced a deadlock into the application and configured the Pleiades runtime to disable deadlock recovery; and SPID, which uses the complete Pleiades implementation with the deadlock-induced application. To improve performance, we implemented message aggregation for lock requests and forwarded locks across consecutive nodecuts. As expected, SP and SPID execute correctly, assigning exactly one spot to each request. SPID-NR fails to allocate a spot to all but the first request; in the absence of recovery code, the program deadlocks after the first request.

Figure 2.9: Street parking latency.
Finally, SP-NL violates the correctness requirements of the application, correctly satisfying the first request, but assigning two free spots in each direction of the center node for the next four requests; consequently, it also fails to satisfy the last four requests.

Figure 2.10: Street parking message cost.

Figure 2.9 plots the time taken to assign a spot to the request, and Figure 2.10 plots the total number of bytes transmitted over the network for each request. The same qualitative observations may be drawn from both graphs. SP and SPID message cost and latency increase since successive requests have to search farther out into the network to find a free spot. However, for the initial requests, the overhead of SP is comparable to that of SP-NL. Moreover, SPID message cost and latency are only moderately higher than SP. The difference is attributable to the sequential execution of the cfor threads during deadlock recovery, with rollback overhead being negligible. The periodic spikes in both plots arise because, for even-numbered requests, there are two free spots at the same distance away from the requester that contend to satisfy the request. These two free spots also cause a deadlock in the case of SPID. Finally, the latency and overhead of SP-NL flatten out for later requests because they each incur the same cost: they search the entire network for a free spot and fail, because spots were incorrectly over-allocated during earlier requests. Thus, our Pleiades implementation correctly ensures serializability and incurs moderate overhead for deadlock detection and recovery.
The absolute overhead numbers imply that even for the request which encounters the highest overhead, the average bandwidth used by Pleiades at a node is around 250 bps, with the maximum being 1 kbps at the node where the requests come in. This is quite reasonable, considering that the maximum data rate for the TelosB motes is 250 kbps. The absolute latency seems modestly high compared to the expected response time for human interactivity. For example, the last request takes almost a minute and a half to satisfy. This is an artifact of the end-to-end reliable transport layer that Pleiades currently uses, which waits for 2 seconds before trying to resend a packet that has not been acknowledged. We believe that the overall latency can be significantly reduced by optimizing the transport layer.

The Benefits of Migration. Finally, we briefly report on a small experiment on a 5-node chain that quantifies the benefit of Pleiades's control flow migration. In this application, a node accesses node-local node-sets from other nodes more than a hop away, so that application-level network information can be gathered. Without migration, the total message cost is 780 bytes, while, with migration, it is 120 bytes. Thus, we see that, even for small topologies, control flow migration can provide significant benefits.

2.4 Summary

Pleiades enables a sensor network programmer to implement an application as a central program that has access to the entire network. This critical change of perspective simplifies the task of programming sensor network applications on motes and can still provide application performance comparable to hand-coded versions. Pleiades employs a novel program analysis for partitioning central programs into node-level programs and for migrating control flow across the nodes. Pleiades also provides a simple construct that allows a programmer to express concurrency.
This construct uses distributed locking along with simple deadlock detection and recovery to ensure serializability. Together, these features ensure that Pleiades programs are understandable, efficient, and reliable. Our implementation of these features runs realistic applications on memory-limited motes.

Chapter 3: FSMGen

Understanding the correctness of sensor network applications is difficult, since programmers often have to manage devices and resources at a relatively low level and be aware of memory, processing, and bandwidth constraints. At the same time, these applications are often required to run unattended for long periods of time in harsh environments. Ensuring the reliability of sensor network applications is thus an important problem. The sensor network community has responded to this problem in three ways. The first has been to propose high-level programming techniques that can simplify the application programmer's task. These include virtual machines [50], macroprogramming approaches [44, 66], and role-based or state-based programming languages [23, 37]. The second has been to develop runtime monitoring (e.g., Sympathy [70]) and debugging tools (e.g., Nucleus [80], Clairvoyant [85]) that can simplify the process of discovering program errors. The third has been to develop compile-time program analysis tools [75, 16, 15] for catching program errors before execution. If history is any guide, none of these approaches is likely to be a panacea in and of itself. Today programmers in other domains use a variety of tools ranging from compiler analysis to profilers and debuggers, even though much work has gone into raising the level of abstraction from assembly code to visual programming. Similarly, we believe that for sensor networks in the future, it will be useful to have an arsenal of tools to catch or avoid program errors, and our work is an attempt to add to the existing arsenal.
In this chapter, we focus on a program analysis tool for TinyOS software written in nesC. TinyOS is the most dominant application-development platform today and is likely to continue to be so in the near future. In the longer term, even if applications were to be written in a higher-level language, there would still likely be a large installed base of TinyOS software that implements the protocols and subsystems executing on sensor nodes. Generally speaking, our goal is to infer a user-readable high-level representation of any component of a TinyOS program. Such a high-level representation accurately captures system logic while abstracting away platform-specific details. This goal is motivated by the following observation: when programmers write code, there is often a disparity between programmer intent and the functionality embedded in the resulting code. Such a disparity can arise, for example, when the programmer assumes a certain contract of an interface where none exists, resulting in a (possibly latent) program error. TinyOS's event-driven programming model [51] can exacerbate this problem, since it makes it hard for the programmer to understand the exact sequencing of operations. Our claim is that, when programmers are presented with a high-level representation of TinyOS components they have written, they can much more easily detect such discrepancies. By a TinyOS component we refer to a logical component which may consist of a single TinyOS module, e.g., the Surge module for the Surge application, or of a number of cooperating TinyOS modules, such as the RfmToInt and IntToLeds modules, which cooperate in the RfmToLeds application. A component can implement application logic, like the above two examples, or a system function, like routing (MultiHopLQI) or time synchronization (FTSP).
Inferring a high-level representation from arbitrary code is a significant challenge, but we can leverage current practice in developing TinyOS code: anecdotal evidence suggests that many programmers often design TinyOS software using finite-state machines (FSMs). Although the nesC programming language provides no explicit support for state machines, programmers track event execution by explicitly maintaining state information (as we show in Section 3.1). Thus, our specific goal is to infer compact, user-readable FSMs corresponding to TinyOS applications and system components. We make the following contributions towards this goal.

Novel Program Analysis. The programming languages community has developed many general-purpose techniques for program analysis. Two of these techniques are symbolic execution and predicate abstraction. The former precisely simulates a program's execution and maintains symbolic information about the program, while the latter maps this symbolic information into predicates that define distinct program states. Our contribution is two-fold. First, we have adapted symbolic execution to TinyOS's event-driven programming model. This entailed approximating the flow of control of an application, which is complicated due to the two-level scheduling structure of TinyOS with events and tasks and due to split-phase operations. To address this issue, we employ a simple model of event-driven execution that is precise enough to capture important program behaviors yet abstract enough to be user-understandable. Second, we have used predicate abstraction to generate compact, user-readable state machines; prior work [2, 32] has focused on generating state machine representations as an internal step within a larger verification effort.

Tool Design, Implementation, and Evaluation.
We have designed a tool called FSMGen that contains a symbolic execution engine for TinyOS programs and employs predicate abstraction, together with an aggressive state-machine minimization technique, to infer user-readable FSMs for any component of a TinyOS program (Sections 3.2 and 3.3). FSMGen is well-suited to infer the higher-level system logic and functionality of components such as, for example, how an incoming message is dealt with in a routing protocol. However, since we use a relatively coarse approximation of the TinyOS event-based execution model, it cannot precisely capture the functionality of low-level interrupt-driven code, like that of the timer component in TinyOS, or the radio component. We have applied FSMGen to a variety of TinyOS programs, generating FSMs for components ranging from simple applications like RfmToLeds, Surge, and TestNetwork to a routing protocol of moderate complexity (MultiHopLQI) and a fairly complex time synchronization protocol (FTSP). We qualitatively discuss the performance of the tool and show how the inferred FSMs reveal surprising (and, we believe, previously unknown) aspects of some of these components (Section 3.4).

3.1 Overview

In this section we provide an overview of our approach to inferring finite state machines for TinyOS components. We begin by discussing the suitability of FSMs as high-level program representations of TinyOS components, provide a definition of an FSM for TinyOS components, and highlight our contributions using a simple example.

    event result_t Timer.fired() {
      if (initTimer) {
        initTimer = FALSE;
        return call Timer.start(TIMER_REPEAT, timer_rate);
      }
      timer_ticks++;
      if (timer_ticks % TIMER_GETADC_COUNT == 0) {
        call ADC.getData();
        return SUCCESS;
      }
    }

    task void SendData() {
      if (..) {
        if ((call Send.send(..)) != SUCCESS)
          atomic gfSendBusy = FALSE;
      }
    }

    event result_t ADC.dataReady(uint16_t data) {
      atomic {
        if (!gfSendBusy) {
          gfSendBusy = TRUE;
          gSensorData = data;
          post SendData();
        }
      }
      return SUCCESS;
    }

    event result_t Send.sendDone(...) {
      atomic gfSendBusy = FALSE;
      return SUCCESS;
    }

Figure 3.1: FSM embedded within Surge code

3.1.1 FSMs as abstractions of TinyOS components

The event-driven programming model enforced by TinyOS is qualitatively different from the thread-based programming models that most programmers are familiar with. To understand and reason about their TinyOS applications or system components, written in nesC, anecdotal evidence suggests that programmers often use a finite-state machine (FSM) based design approach. In such an approach, programmers design applications/protocols as finite state machines and then embed these within nesC code. A TinyOS program may thus consist of multiple FSMs, interacting with one another, each representing the functionality of a single logical component. The programmer maintains state information explicitly as program variables. On receipt of an event, an event handler performs the appropriate action depending on the current state and transitions to a new state by updating variables. Indeed, the developers of TinyOS recognized the relationship between FSMs and the event-driven programming model, as this excerpt from their paper [34] shows: "... in that the requirements of an FSM based design maps well onto our event/command structure."

This relationship to FSMs is evident in TinyOS application code. Consider, for example, the snippet of code in Figure 3.1, taken from the Surge application in TinyOS. Surge periodically (on the Timer.fired event) tries to get readings from a sensor. On receipt of the data from the sensor (on the ADC.dataReady event), Surge routes it back to the base station (using the Send.send function).
The variables initTimer and gfSendBusy represent the explicit state of the Surge application, as maintained by the programmer. In the event handler for the ADC.dataReady event, if gfSendBusy is TRUE, that implies that a packet is currently being sent, and hence the programmer does not try to send another packet. However, if gfSendBusy is FALSE, the programmer sets gfSendBusy to TRUE and uses the Send.send command to send the data back to the base station. When the event Send.sendDone is triggered, i.e. the data has been sent, the programmer resets gfSendBusy to FALSE.

In addition to this state explicitly maintained by the programmer, an application's high-level FSM is dependent on the set of external events that can be signalled at each point. For example, in Surge, before the command Send.send is called, the program is in a state where the external event Send.sendDone cannot occur, since Send.send and Send.sendDone form a split-phase operation. When the command Send.send is called, the program moves into a new state, where the set of possible external events includes Send.sendDone.

[Figure 3.2: FSM derived manually for Surge]

Here, by external events, we refer to events which are triggered from outside the component whose functionality is being examined. These may be events which form part of a split-phase operation, like the Send.sendDone event in the above example, and hence are triggered indirectly by a command in the component, or events that are not triggered (directly or indirectly) by any action performed within the component, e.g. an event indicating the reception of a packet. Thus, the state of execution of TinyOS components is a combination of explicitly maintained program variables and the set of external events possible at that point in execution.
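As a concrete illustration, this combined notion of state can be encoded as a small sketch. This is our own toy reconstruction from the prose above, not FSMGen output; the two named states, their enabled-event sets, and the helper names are assumptions.

```python
# Toy encoding of Surge's high-level state: an explicit program variable
# (gfSendBusy) combined with the set of enabled external events. The
# state names and enabled sets are assumptions based on the description
# above, not FSMGen output.

from dataclasses import dataclass

@dataclass(frozen=True)
class State:
    gf_send_busy: bool
    enabled: frozenset          # external events that may fire here

IDLE = State(False, frozenset({"Timer.fired", "ADC.dataReady"}))
SENDING = State(True, frozenset({"Timer.fired", "Send.sendDone"}))

def on_event(state, event):
    """State-level mirror of the Figure 3.1 handlers."""
    if event == "ADC.dataReady" and not state.gf_send_busy:
        return SENDING          # gfSendBusy := TRUE; Send.send called
    if event == "Send.sendDone":
        return IDLE             # gfSendBusy := FALSE
    return state
```

Note that Send.sendDone appears only in SENDING's enabled set, reflecting the split-phase pairing of Send.send and Send.sendDone described above.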
This suggests the following hypothesis, which we validate in this chapter: armed with limited domain-specific information about external events, it is possible to infer a user-readable finite-state machine corresponding to a given TinyOS application or system component written in nesC.

3.1.2 Deriving FSMs from TinyOS programs

In this section, we present an overview of our technique for inferring FSMs for various components from TinyOS programs. Figure 3.2 depicts an FSM for the Surge application. This FSM was manually derived from the Surge application code, by focusing on the Surge component. Each state is intuitively defined by a combination of the explicitly maintained state information of the component and the set of enabled external events at that point. For example, the initial state of the FSM (denoted by the double circle) represents the situation when initTimer is set to TRUE, and the Timer.fired event is enabled. When the event handler for Timer.fired is invoked with the program in state 0, initTimer is set to FALSE and we transition to state 1. Each edge is labelled with an event, along with an optional predicate about the associated event handler's execution. For example, the transition from state 1 to 2 only occurs when the Timer.fired event occurs and the call to ADC.getData() within the associated event handler returns SUCCESS. If the Timer.fired event occurs and the call to ADC.getData() returns FAIL, the program remains in state 1. In Figure 3.2, for the sake of simplicity, we have not labelled the edges with predicates, since the states they connect provide sufficient intuition.

One approach to inferring FSMs in the literature [3] is to dynamically monitor a component to capture the order of events and the associated values of program variables when these events occur. This information is then fed to a machine learning algorithm to infer the states and state transitions.
However, a dynamic approach is inherently incomplete, since an application can have an infinite number of execution traces. Therefore, the results can easily represent the particular runs of the application that occur during monitoring but fail to capture other program behaviors.

We have pursued an alternate approach based on static analysis of an application. Static analysis has the advantage that it can conservatively consider all possible program executions, including corner cases that could easily be missed in a dynamic approach. Inferring FSMs statically requires two key challenges to be addressed:

• How can we obtain precise information about a component's execution without running the application? This challenge is exacerbated by the TinyOS execution model, with its asynchronous execution of tasks and the possibility of hardware interrupts at any point.

• How can we automatically identify the relevant state information of a component whose FSM we intend to extract, and how can we represent the component's behavior in terms of this state information once it is identified?

We address these challenges by adapting and extending two techniques from the programming languages literature. First, we precisely track the behavior of a component whose FSM we are interested in, via symbolic execution of its TinyOS program. This static analysis technique employs a constraint solver to precisely simulate the program's execution, maintaining symbolic information about the values of program variables. Unlike a dynamic analysis, symbolic execution conservatively considers the behavior of all possible program executions while pruning many infeasible paths from consideration. We have designed and implemented a generic engine for symbolic execution of TinyOS programs. The next section describes this engine in detail.
Second, we use a technique called predicate abstraction to map the program information as tracked by symbolic execution into a finite set of predicates that capture the important state information for the component of interest. The predicates are automatically derived from the branch conditions in the system logic of the component. We discuss this technique in detail in Section 3.3. In addition, we use an adapted version of a well-known FSM minimization technique (the Myhill-Nerode algorithm [61]) to merge "similar" states, resulting (as we show later) in user-readable finite state machines.

3.2 Symbolic Execution for nesC/TinyOS

Symbolic execution is a program analysis technique that statically approximates the behavior of a program. Informally, the technique involves simulating the execution of a program without actually running it, maintaining at each point information about the value of each variable. Because of its generality, symbolic execution has a wide variety of applications for reasoning about programs. We show in the next section how to use the results of our symbolic execution to automatically derive finite-state machines for user-specified components from TinyOS programs.

Our symbolic execution engine is built as an inter-procedural analysis in the CIL [60] front end for C. Our engine takes as input the C file generated as part of the build process for a TinyOS application using the nesC compiler. The engine simulates execution of this program starting from main. We assume the user designates certain modules (and hence the functions in those modules) as interesting, meaning that they are part of the component being analyzed. As we discuss below, uninteresting functions are not traversed during the symbolic execution, but are instead treated conservatively.

Symbolic execution is necessarily approximate. For example, it is not possible in general to know the exact value of each variable at every program point.
Instead the symbolic execution maintains a symbolic store, which maps variables to symbolic values, which are values that can refer to symbolic constants, denoted c_i. For example, at some point we may know that x has the value c_x and y has the value c_x + 5. Further, it is not possible in general to know which path a program will take at a branch point (e.g., a conditional or loop), so the symbolic execution engine must simulate multiple paths. At each program point, the current path is represented by a set of predicates (which can include symbolic constants) that are assumed to be true. The predicates are simply the branch conditions that led to this point on the path.

While symbolic execution has been implemented for C [83, 18], to our knowledge ours is the first symbolic execution engine to handle the unique features of nesC and TinyOS. We first describe the basic technique of symbolic execution, which is relatively standard. Next we describe the ways in which we extended symbolic execution to handle nesC- and TinyOS-specific features. Finally we describe the constraint solver that we use as part of the symbolic execution, in order to prune infeasible execution paths.

3.2.1 Basic Symbolic Execution

Let a symbolic state be a pair of a symbolic store and a set of predicates representing the current path. The result of symbolic execution is the determination of a set of symbolic states for each point in the program, representing the possible runtime states that could arise during execution at that point. In the rest of this subsection we discuss how symbolic execution handles standard C language constructs.

Assignments To symbolically execute an assignment x := e, we evaluate e in the current symbolic store to some symbolic value v and update the symbolic store so that x maps to v. If the left-hand side is an array update a[e_a], then the engine tries to evaluate e_a to a numeric constant in the current symbolic store.
If it is able to do so, then that array element is updated appropriately. Otherwise, all information about the entire array is conservatively removed from the symbolic state. Assignments through pointers are handled similarly.

Conditionals The engine invokes a constraint solver to determine the value of the conditional's guard expression e in the current symbolic state. If the solver determines that the guard is true, then symbolic execution proceeds on the "then" branch, and on the "else" branch if the solver determines that the guard is false. If the solver cannot determine e's value, then the current symbolic execution bifurcates. The engine adds e to the set of predicates assumed to be true and continues traversal of the "then" branch. Separately, the engine instead adds !e to the set of predicates assumed to be true and continues traversal of the "else" branch. We use a work queue to keep track of pending paths to be traversed.

Function calls The function call's actual argument expressions are evaluated to symbolic values in the current symbolic store. If the function being called is part of an interesting module, then the symbolic store is updated with a mapping from the function's formals to the symbolic values of the actuals, and traversal proceeds inside the function body. When the traversal eventually hits a return statement (or the end of the function), control transfers back to the caller, and the returned value (if any) is handled like an assignment statement.

If the function is not designated as interesting, then we do not traverse the function body. Instead we use a precomputed summary of the function body (which we compute before beginning the traversal), which indicates variables in the caller's scope that might be invalidated by the call, in order to conservatively "kill" facts in the current symbolic state. Our engine currently does not deal with recursive functions.
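The store-plus-path-predicates discipline for assignments and bifurcating conditionals can be sketched in a few lines. This is a toy statement language of our own, with strings standing in for symbolic expressions; it is not the CIL-based engine, and it always bifurcates rather than consulting a solver.

```python
# Toy symbolic executor over assignments and branches: a symbolic state
# is (store, path), where store maps variables to symbolic expressions
# (plain strings here) and path is the set of assumed branch conditions.
# The statement encoding and example program are invented.

def exec_stmts(stmts, store, path):
    """Return all reachable symbolic states as (store, path) pairs."""
    if not stmts:
        return [(store, path)]
    stmt, rest = stmts[0], stmts[1:]
    if stmt[0] == "assign":                   # ("assign", var, expr)
        _, var, expr = stmt
        return exec_stmts(rest, {**store, var: expr}, path)
    if stmt[0] == "if":                       # ("if", guard, then, else)
        _, guard, then_b, else_b = stmt
        # Guard undecidable in this toy, so bifurcate: explore both
        # branches, recording the assumed condition on each path.
        return (exec_stmts(then_b + rest, dict(store), path | {guard}) +
                exec_stmts(else_b + rest, dict(store),
                           path | {"!(%s)" % guard}))
    raise ValueError(stmt[0])

program = [
    ("assign", "x", "c_x"),
    ("if", "c_x > 0",
        [("assign", "y", "c_x + 5")],
        [("assign", "y", "0")]),
]
final = exec_stmts(program, {}, frozenset())
```

Running this yields two symbolic states, one per branch, each carrying the branch condition it assumed.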
Loops As with conditionals, the engine invokes the constraint solver to determine the value of the loop's termination condition. If the value is true, then traversal continues after the loop. If the value is false, then traversal continues inside the loop (and returns to the top of the loop upon reaching the end). In this way, we precisely simulate bounded loops (e.g., simple for loops), which we have found to be the common case in TinyOS applications. If the solver is unable to precisely evaluate the termination condition, then we simply traverse the loop exactly once. This is done in order to identify nesC tasks and events that are triggered within the loop (see the next subsection). This simple approach loses information about potentially signaled events, but it has not been a large problem in practice. To continue symbolic execution conservatively after the loop, we invalidate all information in the resulting symbolic state about variables that are potentially modified within the loop body.

3.2.2 Handling features of nesC and TinyOS

The nesC language and TinyOS platform pose several challenges for performing accurate symbolic execution. We discuss the key features of these tools and how our symbolic execution engine handles them.

Tasks TinyOS tasks are a form of asynchronous function. Posting a task pushes a pointer to the task into a task queue maintained by the TinyOS runtime. Tasks from this queue are dequeued and executed (in FIFO order) whenever there is nothing else running. Our symbolic execution engine mirrors this approach. We augment each symbolic state with a task queue. When we encounter the posting of a task during traversal, we simply treat it as a no-op and proceed to the next statement. However, we add the task to the queue. Once this path of execution has been completely simulated, we pop tasks off the queue in FIFO order and simulate the execution of each in succession.
Of course, the simulation of a task may in turn cause more tasks to be added to the queue recursively. Simulation continues until the task queue is empty. Events Our symbolic execution engine must track the events that can fire at any point during the program, in order to properly simulate program execution. There are two main flavors of events in TinyOS, and we handle each in a different manner. First, many events are simply triggered by a direct call from within the program (often from within a task). These events are treated as ordinary function calls, with the traversal continuing inside the corresponding event handler. Second, at each program point, our symbolic execution engine maintains a set of possible external events that may fire. It would be too unwieldy to consider the possibility of these events being handled at each program point. Instead, our engine assumes that such events will only be processed once a prior event handler and all posted tasks have completed their execution and the application is “waiting” for a new event. At that point, our symbolic execution engine explores all possible orders in which the enabled external events may be processed. While this approach can miss potential execution paths, if the programmer ensures that no interrupt handler unwittingly modifies any variables used within a task that it can interrupt, the resulting symbolic state after some missing execution path will be identical to that of some execution path that our model does consider. Possibly for this reason, we have not noticed the loss of precision in practice. Maintaining the set of enabled external events requires tracking two kinds of external events, as described in Section 3.1. External events that are not triggered within the program, but instead can occur at any time, are always considered in the set of enabled events. 
Apart from these, events forming part of a split-phase operation are considered in the set of enabled external events only if the command triggering them is executed. We assume that the user provides our engine with the set of split-phase event pairs, in order to complete our knowledge about external events. In our experiments, we only needed to provide at most five such event pairs. When a call to a split-phase operation (e.g., Send.send) is encountered during traversal, we traverse the corresponding handler as described above. Upon returning to the caller, we invoke the constraint solver to determine the value of the result. If we determine that the result indicates success, then we add the corresponding external event (e.g., Send.sendDone) to the set of enabled external events. If we determine that the result indicates failure, then the external event is not added to the set. If the value of the result cannot be determined, then we simulate both possibilities.

3.2.3 Constraint Solver

As mentioned above, we use a constraint solver to determine the values of predicates during the traversal, in order to prune infeasible paths. Rather than building our own customized tool, we use an off-the-shelf constraint solver, CVC3 [78]. This tool incorporates decision procedures for a variety of logical theories, including propositional logic, linear arithmetic, bit vectors, arrays, and structures. To determine the value of a predicate e in a given symbolic state, our engine automatically mirrors the symbolic state as axioms that are provided to CVC3. For example, if the symbolic store maps y to the symbolic value c_x + 5, then we declare variables y and c_x in CVC3 along with the axiom y == c_x + 5. Similarly, each predicate in the symbolic state's set of assumed predicates is translated into a CVC3 axiom. Finally, the predicate e is translated to CVC3 and posed as a query.
If CVC3 indicates that e is valid in the context of the given axioms, then we know that e has the value true at this point. Otherwise, we pose !e as a query to CVC3. If CVC3 indicates that !e is valid in the context of the given axioms, then we know that e has the value false at this point. Otherwise, we consider the value of e to be unknown.

3.3 Deriving State Machines with FSMGen

Figure 3.3 describes the overall structure of FSMGen, our tool for automatically inferring FSMs for TinyOS applications or system components. In addition to the TinyOS program, FSMGen requires domain-specific information about commands and events that are split-phase, as mentioned earlier, since this information is not derivable from the code. FSMGen also requires the user to annotate modules to be considered interesting for the component whose FSM they want to extract, and to list the events they want included in the resulting state machine. FSMGen interacts with the symbolic execution engine described in the previous section to obtain symbolic states at various program points of interest. It also reuses that engine's constraint solver to perform predicate abstraction, which maps each symbolic state to a state of the resulting FSM. Finally, FSMGen employs a minimization procedure to make the FSM compact and user-readable.

We first describe how FSMGen derives and uses a finite set of predicates as the basis for each state in the FSM. We then describe the algorithm that FSMGen uses to build the FSM, utilizing the symbolic execution engine and predicate abstraction. Finally, we describe FSMGen's algorithm for minimizing the state machine produced by the previous step.
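The three-valued query discipline of Section 3.2.3 (ask whether e is valid, then !e, else report unknown) can be sketched with a toy propositional entailment check standing in for CVC3. The brute-force entails() and the example predicates are our own illustration, not CVC3's decision procedures.

```python
# Sketch of the three-valued query pattern used with the solver. A
# brute-force propositional entailment check stands in for CVC3;
# axioms and predicates are Python functions over atom truth values.

from itertools import product

def entails(axioms, goal, atoms):
    """True iff every assignment satisfying all axioms satisfies goal."""
    for values in product([False, True], repeat=len(atoms)):
        env = dict(zip(atoms, values))
        if all(a(env) for a in axioms) and not goal(env):
            return False
    return True

def decide(axioms, pred, atoms):
    """Mirror the query pattern: ask pred, then !pred, else unknown."""
    if entails(axioms, pred, atoms):
        return True
    if entails(axioms, lambda env: not pred(env), atoms):
        return False
    return "unknown"

atoms = ["busy", "init"]
axioms = [lambda env: env["busy"]]      # path assumes gfSendBusy == TRUE
```

Under this axiom set, decide reports True for "busy", False for "not busy", and "unknown" for "init", since the axioms say nothing about it.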
[Figure 3.3: Structure of FSMGen]

3.3.1 Predicate Abstraction

As described in Section 3.1, each state in the FSM should represent a set of predicates about the program state. We use a simple but effective approach to deriving the appropriate predicates to employ. First, we collect the set of predicates used as guards in conditional expressions within modules declared interesting. Intuitively these predicates are important since they determine the flow of control through the interesting modules, thereby also determining how program state is updated and which events are signaled. Since the states in the FSM are global to the entire application, we remove from this set any predicate that does not refer to a global variable or a formal parameter of a function. Second, we introduce one additional predicate for each split-phase operation provided by the user, which tracks whether we are in the middle of such an operation (e.g., send has been signaled and we are waiting for a sendDone event). These predicates effectively track the set of enabled external events at any program point.

Let us denote this set of predicates as {e_1, ..., e_n}. These predicates induce an FSM with 2^n states, one for each possible valuation to the n predicates. FSMGen employs our symbolic execution engine to determine the relationships among these states, as described below. A key piece of FSMGen's algorithm is predicate abstraction, which maps a symbolic state obtained from symbolic execution to the corresponding FSM state.
Given a symbolic state, we employ the constraint solver as described earlier to obtain the valuation of each of the predicates in {e_1, ..., e_n}. Since in general some of these predicates might not be known to be either definitely true or definitely false in the given symbolic state, predicate abstraction in fact maps a symbolic state to a set of FSM states in which the program might be.

3.3.2 Generating the FSM

To begin, FSMGen uses the symbolic execution engine to analyze the main() function of the program. Predicate abstraction is applied to each of the returned symbolic states, and the resulting FSM states form the initial states of the FSM. Each FSM state is put on a work queue. Further, for each FSM state, FSMGen records the associated symbolic states and the list of enabled events at this point, both of which were obtained from the symbolic execution engine.

After this initialization phase, the main loop of FSMGen begins. An FSM state is removed from the work queue, and the symbolic execution engine is asked to simulate each enabled event, starting from each recorded symbolic state. For each such query, the symbolic execution engine returns a list of new symbolic states as well as the new set of enabled events. FSMGen employs predicate abstraction on each symbolic state and adds a transition to the FSM from the original FSM state to each resulting FSM state, labelled with the simulated event. The label also includes any new predicates that are part of the symbolic state returned from the symbolic execution engine, which represent the conditions under which the state is reached when that event is invoked. If this transition does not already exist in the FSM, then the new FSM states are added to the work queue. The algorithm continues in this way until the work queue is empty.
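Sections 3.3.1 and 3.3.2 can be sketched together as a toy model: FSM states are tuples of predicate values, abstract() expands "unknown" valuations into both possibilities, and the enabled/simulate callbacks stand in for the symbolic execution engine. The one-predicate example below is invented.

```python
# Worklist sketch of predicate abstraction plus FSM construction. An
# FSM state is a tuple of booleans, one per predicate; simulate returns
# three-valued valuations (True / False / "unknown") for successors.

from collections import deque
from itertools import product

def abstract(valuations):
    """Expand three-valued valuations into the set of possible states."""
    choices = [[v] if isinstance(v, bool) else [False, True]
               for v in valuations]
    return set(product(*choices))

def build_fsm(initial_valuations, enabled_events, simulate):
    """simulate(state, event) -> list of three-valued valuations."""
    transitions = set()
    queue = deque()
    seen = set()
    for st in abstract(initial_valuations):
        seen.add(st)
        queue.append(st)
    while queue:
        state = queue.popleft()
        for event in enabled_events(state):
            for val in simulate(state, event):
                for nxt in abstract(val):
                    edge = (state, event, nxt)
                    if edge not in transitions:
                        transitions.add(edge)
                        if nxt not in seen:
                            seen.add(nxt)
                            queue.append(nxt)
    return transitions

# One predicate (gfSendBusy); the two callbacks encode a toy component.
enabled = lambda s: ["Send.sendDone"] if s[0] else ["ADC.dataReady"]
step = lambda s, e: [[True]] if e == "ADC.dataReady" else [[False]]
fsm = build_fsm([False], enabled, step)
```

The toy run produces the two expected edges: not-busy goes busy on ADC.dataReady, and busy returns to not-busy on Send.sendDone.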
We choose to start symbolic execution from the recorded symbolic state rather than the FSM state, for each state in the work queue, in order to have access to the extra information provided in the symbolic state. This extra information allows us to statically prune away transitions which may never be taken at run-time, and hence generates more accurate transitions. However, we only put the resulting FSM state onto the work queue if this FSM transition does not already exist, even if the symbolic state has changed. This choice can cause us to miss possible edges in the FSM. An example of this limitation was observed for the FTSP protocol described in Section 3.4. We are currently exploring ways to balance this tradeoff between the precision and completeness of our algorithm.

3.3.3 Minimizing the FSM

Finally, we employ a variant of the Myhill-Nerode FSM minimization algorithm on the FSM resulting from the above algorithm. The basic idea of that algorithm is to identify equivalence classes of FSM states that can be merged without loss of information. The algorithm works by initially assuming that all states belong to one equivalence class. It then looks at each pair of states (s_1, s_2) to see if they can in fact belong to the same equivalence class. In the Myhill-Nerode algorithm, this is the case if they agree on their outgoing edges. For example, if s_1 has an outgoing edge labelled l to state s_3, then s_2 must also have an outgoing edge labelled l to a state in the same equivalence class as s_3. A label in our context is a pair of an event and the associated conditions under which this edge is taken.

Our algorithm proceeds similarly, except that we place one additional requirement on each pair of states: if they have incoming edges from equivalent states labelled with the same event, then the labels must also agree on the associated conditions. Myhill-Nerode does not constrain incoming edges, since this does not affect the language accepted by the FSM.
However, in our setting we care not only about the language accepted by the FSM, but also about what predicates hold at each point during program execution (i.e., which state we are in). We have found this new requirement on incoming edges to be a useful heuristic for minimizing FSMs while retaining important state information. However, it also causes less minimization than would otherwise be performed, so we allow the user to disable it.

3.4 Results

In this section, we describe our evaluation of FSMGen. Our evaluation is qualitative and aims to demonstrate the practicality of FSMGen, the compactness and user-readability of the resulting FSMs even for some sophisticated programs, and the utility of FSMGen in highlighting interesting and sometimes unexpected features of popular TinyOS applications and protocols. In addition, we discuss various aspects of symbolic execution, predicate abstraction, and minimization that manifest themselves in the generated FSMs.

We used FSMGen to infer FSMs for many TinyOS applications and system components. The selected TinyOS programs covered a range of complexity, from simple applications like RfmToLeds, to complex protocols like FTSP [57]. FSMGen took at most 15 minutes to analyze all but one program. We discuss this exception later in the section. None of our inferred FSMs exceeds 16 states.

RfmToLeds Figure 3.4 depicts the FSM inferred by FSMGen for the RfmToLeds application in TinyOS-1.x. This application listens for packets containing a byte-sized value. When it receives such a packet, the application activates mote LEDs in the appropriate binary pattern. Our FSM captures this functionality accurately. State 0 is the initial state where the program waits to receive a packet. On receiving the packet, depending on the value contained therein, it moves into one of the other states, turning on/off the appropriate LEDs.
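The partition-refinement idea behind the minimization of Section 3.3.3 can be sketched as follows. The incoming-edge requirement is omitted for brevity, and the three-state toy FSM (one wait state and two behaviorally identical output states) is invented.

```python
# Toy Myhill-Nerode-style partition refinement: repeatedly split blocks
# whose states disagree on which block each outgoing label leads to.
# The extra incoming-edge condition described in the text is omitted.

def minimize(states, edges):
    """edges maps (state, label) -> successor; returns a state partition."""
    partition = [set(states)]
    while True:
        index = {s: i for i, block in enumerate(partition) for s in block}
        def signature(s):
            return tuple(sorted((label, index[t])
                                for (src, label), t in edges.items()
                                if src == s))
        refined = []
        for block in partition:
            groups = {}
            for s in block:
                groups.setdefault(signature(s), set()).add(s)
            refined.extend(groups.values())
        if len(refined) == len(partition):      # no block split: done
            return {frozenset(b) for b in refined}
        partition = refined

# State 0 waits; states 1 and 2 act differently only in their labels'
# payloads, so they collapse into one equivalence class.
edges = {(0, "recv_a"): 1, (0, "recv_b"): 2,
         (1, "done"): 0, (2, "done"): 0}
blocks = minimize([0, 1, 2], edges)
```

On this toy FSM the algorithm merges states 1 and 2, mirroring how equivalent output states can collapse into a single class.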
Throughout this section, our graphical depictions of the FSMs include state and edge labels that are slightly simplified, using some information from the application code for expository purposes. For example, the labels on each state in Figure 3.4 indicating the corresponding LED configuration in fact correspond to conditions on edges in the FSMGen-generated FSM. Figure 3.5 zooms in on the two transitions between states 0 and 4, showing the actual edge conditions in the FSM output by FSMGen. Here, value is a local variable within the event handler for Receive.receive, which depends on the packet received. We emphasize that, to a programmer familiar with the actual code, the output of FSMGen is highly readable.

This application also illustrates another feature of FSMGen. There exists a correct, but less informative, 2-state FSM for this application: one initial state in which the program waits for a packet, and a second state in which it activates LEDs. FSMGen can generate this more compact FSM using the unmodified Myhill-Nerode minimization algorithm described in Section 3.3.

[Figure 3.4: FSM for the RfmToLeds Application]

Surge Figure 3.6 shows the FSM generated for the Surge example that we had described in Section 3.1. When we compare this with the manually-produced FSM in Figure 3.2, we notice that most of the states and transitions in the two FSMs match, but the FSM generated by FSMGen has two extra states, 4 and 6. Interestingly, once the program has moved into either state 4 or 6, it stays in one of those states. These two states and the associated edges represent a path of execution that we did not expect to encounter. To understand this execution path, we examined the application code. The only way for the program to move into state 4 is via the edge from 2 to 4 on the ADC.dataReady event.
[Figure 3.5: A state transition as generated by FSMGen for the RfmToLeds FSM]

[Figure 3.6: FSM for the Surge Application]

This transition is taken in the task sendData when the value returned by the call Send.getBuffer is 0. When this happens, the program exits the task without sending any data, but does not reset gfSendBusy to FALSE, so the program incorrectly assumes that data is being sent, and remains waiting for sendDone.

Is this a program error? Quite possibly. Surge uses MultihopLQI, which provides the getBuffer interface. In the current implementation getBuffer never returns 0, so the edge from state 2 to 4 would never be taken. However, the programmer seems to have anticipated the fact that if the underlying implementation were to change, getBuffer might return 0, and added a check for the return value in the code. But in forgetting to reset the gfSendBusy variable to FALSE, the programmer has introduced an anomaly (at best) and a latent bug (at worst) into Surge, one that was not readily apparent upon manual inspection of the code.

MultiHopEngine MultiHopEngine is the component which acts as a packet forwarding engine for the MultiHopLQI and MintRouting routing protocol implementations in TinyOS-1.x. This component provides a Send interface for the programmer to send packets to it. It then forwards these packets to the SendMsg interface, which sends them over the network to the next hop in the routing tree. Also, it forwards packets received from the network over the ReceiveMsg interface to the SendMsg interface.
Unlike our prior examples, MultiHopEngine is not a stand-alone TinyOS application, but a system component. Figure 3.7 represents the FSM generated by FSMGen for MultiHopEngine.

We can see that FSMGen is able to generate a compact as well as accurate FSM for MultiHopEngine. It is able to capture the behavior of the component when the Send.send command is called in state 4 by the application that uses this component. In state 0, the self-loops for Send.send denote cases in which the message is not sent, either because the packet size exceeds TOSH_DATA_LENGTH, or the node on which the application is running does not have a parent in the routing tree, or an attempt to send the packet on the radio fails. Only when the radio SendMsg.send succeeds does the component move into state 1, where it waits for the SendMsg.sendDone event to occur.

MultiHopEngine provides a Receive interface for applications. An application programmer expects that an application running on the root node could use the Receive interface to receive packets sent up the tree. In fact, as we discovered by examining the FSM inferred by FSMGen, the MultiHopEngine implementation does not satisfy this expectation. We noticed that all edges for the ReceiveMsg.receive event in the FSM have a condition involving the calling of the SendMsg.send event, i.e., SendMsg.send is always called within the ReceiveMsg.receive event handler. This implies that for all packets received at any node, the packet is sent out on the network interface, never up to the application. Indeed, upon examining the application code, we found that at the root node, the packet is sent to the UART.
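The forwarding behavior this FSM revealed can be sketched as follows. This is an illustrative Python restatement with hypothetical data structures; the actual component is written in nesC.

```python
# Illustrative sketch of the finding: every packet received by the
# forwarding engine is forwarded onward (to the UART at the root, to
# the next hop otherwise), and the application-level Receive inbox is
# never signaled. The node dictionary and inboxes are stand-ins.

def receive_msg(node, packet, app_inbox):
    if node['is_root']:
        node['uart'].append(packet)    # root: packet goes to the UART
    else:
        node['radio'].append(packet)   # non-root: forward to next hop
    # Note: app_inbox is never touched -- the Receive interface is
    # never signaled, which is the behavior described above.

root = {'is_root': True, 'uart': [], 'radio': []}
app_inbox = []
receive_msg(root, 'pkt', app_inbox)
```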
[Figure 3.7: FSM for MultiHopEngine]

A regular contributor to the TinyOS community expressed surprise at this finding, and we have verified that the TinyOS 2.x forwarding engine does not exhibit this behavior. We suspect that MultiHopEngine is always used with the root connected to a base station, and never with an application at the root node wired to MultiHopEngine.

FTSP To see how FSMGen performed on extremely complex programs, we ran it on the code for FTSP, a popular time synchronization protocol [57]. The code for FTSP contained around 46 branching conditions, of which 19 were part of the predicate abstraction. The FSM for FTSP before minimization had 255 states; after using the unmodified Myhill-Nerode algorithm for more aggressive minimization, it has 16 states (Figure 3.8). We have verified that the FSM reflects the expected behavior but will refrain from discussing it since that requires a detailed description of the FTSP protocol.

[Figure 3.8: FSM for FTSP]

However, an interesting feature of the FSM is that all states have edges to state 6 on the event ReceiveMsg.receive, on the condition that although the node should be synchronized, the packet it receives has an error greater than the allowable error. This condition basically implies that the node has become unsynchronized and needs to clear its table of timestamp entries. Remarkably, FSMGen produces a single "error" state for this, even though this state is not evident when inspecting the code. This FSM also illustrates a limitation of our symbolic execution engine, which we described in Section 3.3.2.
In FTSP, when a node does not receive a beacon within a fixed number of timer events (five in the current implementation), the node sets itself as root and starts sending out beacons. Thus, the value of the predicate controlling this change in state will only change after the Timer.fired event is fired for the fifth time. However, FSMGen simulates the Timer.fired event in this state only twice before stopping, since it realizes that no new states are being introduced. Hence FSMGen is unable to capture this transition.

The FTSP example also illustrates another facet of FSMGen. It took nearly 24 hours to generate the FSM for FTSP. For us, this is not a cause for concern, for two reasons. First, FSMGen is not intended to be a frequently-used interactive tool. Rather, we expect programmers will use it occasionally when they make large-scale changes to system logic, or as part of a regression testing suite. Second, the bottleneck in FSM inference is symbolic execution, and well-known optimizations exist to scale this up to large programs [83]. We intend to implement these in a future version of the tool.

TestNetwork The TestNetwork application comes as part of the TinyOS-2.x distribution. It periodically sends packets up a collection tree rooted at the base station, using the Collection Tree Protocol (CTP). The sending rate is configurable via dissemination. We chose this application to demonstrate that FSMGen can easily be extended to TinyOS-2.x.

We generated an FSM for TestNetwork, which can be seen in Figure 3.9. While in TinyOS-1.x the radio and other services are started using the simple start() function provided in their StdControl interface, in TinyOS-2.x this StdControl interface has been replaced by the SplitControl interface. Thus, in TinyOS-2.x the start calls for Radio, Serial and other such devices form part of a split-phase operation.
For example, calling start for the Radio component has to be followed by a startDone event whose argument is set to SUCCESS. If the argument is set to FAIL, that implies that the radio has not been started and that no packets can be received yet. By contrast, in TinyOS-1.x, after calling start we could safely assume that the radio was operational. Hence, we needed to add this domain-specific information to FSMGen in order to generate FSMs which captured this. We again used the aggressive version of our minimization algorithm to generate this FSM: although the size of the TestNetwork code is reasonably small, we wanted to track the transitions due to six different events, the largest number of events in any of the programs we have used for this evaluation.

[Figure 3.9: FSM for TestNetwork]

In Figure 3.9 we can see that the startup process for the application is captured quite nicely. Also, for transitions on Timer.fired we see that the program checks its state to see if the radio is busy, and then also checks to ensure that the Send.send function was called successfully. This is quite similar to the Surge application in TinyOS-1.x. TestNetwork also receives packets over the radio, and if it has free space in its memory pool and the sending queue, pushes them into the queue to send over the serial port. If it has no space, it simply drops them. We note here that for states 3 and 5, the Receive.receive event causes the program to transition to other states if there is space in the message pool. However, for states 4 and 6, the program always remains in the same state upon a Receive.receive event.
By understanding the state machine and looking at the code, we verified that there should be no difference in how Receive.receive is handled by any of the above states. This inconsistency is due to the fact that we disable our requirement on incoming edges during minimization and hence lose some information in the process. Ideally, for states 4 and 6, if there was no space to store the packets, the program should have transitioned to a new state. However, using aggressive minimization was important here, as otherwise the number of states would have increased to 20. Thus there is a clear and obvious tradeoff between the size and readability of the state machines, and the overall functionality captured by FSMGen.

Summary. Overall, FSMGen performed quite well on a number of applications and other components, and in a couple of cases was even able to capture inconsistencies in code written by others. It managed to generate a respectable state machine for a complex component like FTSP, and also worked well for TestNetwork, a TinyOS-2.x application with a large number of events.

3.5 Summary

In this chapter, we have tackled the problem of inferring compact, user-readable FSMs for applications and system components from TinyOS programs. Our FSMGen tool uses symbolic execution and predicate abstraction to statically analyze implementations in order to infer FSMs. FSMGen uses a coarse approximation of the event-driven execution model of TinyOS, and hence the resulting FSM may not represent all possible execution paths. Through experiments, however, we have shown that this optimistic analysis provides FSMs that are both user-readable and detailed. We have tested FSMGen on a number of applications and system components and found that the inferred FSMs capture the functionality of the target applications quite well, and reveal interesting (potential) program errors.
We suspect, however, that this model may not be applicable to low-level interrupt-driven code.

Chapter 4

MAX

That network protocols are vulnerable to attacks is well-known. One class of attacks exploits bugs or inadequate defenses in protocol implementations to induce crashes or buffer overflows. Such attacks are usually mounted by sending unexpected or malformed messages to the victim. In another class of attacks, (malicious or buggy) nodes unilaterally choose to not follow the protocol, benefitting themselves or hurting others. For instance, a TCP sender can ignore congestion control and flood the network.

Over time, research has led to a good understanding of these classes of attacks. Beyond case-by-case fixes to each attack, researchers have developed general tools and techniques to detect and (where possible) fix classes of vulnerabilities [55, 59, 7, 63]. Examples include tools to automatically generate vulnerability signatures to filter malicious inputs and track the influence of tainted inputs on critical segments of the code.

In this chapter, we focus on a different and more subtle class of attacks that we call manipulation attacks. In these attacks, one or more adversaries modify the messages they send to some of the honest participants (called targets) to induce a behavior that benefits the adversaries or harms the rest of the participants. These message manipulations do not exploit implementation bugs like buffer overflows; they induce, in the targets, perfectly valid protocol behavior that might have occurred at some point during the targets' execution. The manipulations exploit the fact that the target does not have a complete view of the network (e.g., the degree of congestion) and is thus forced to trust the adversaries. Often, a single message manipulation is not sufficient to alter the targets' view of the network or substantially alter the protocol dynamics, and messages may need to be repeatedly manipulated to mount an effective attack.
Several manipulation attacks have been identified in real-world protocols. Savage et al. show that a TCP receiver can manipulate the sender in several ways into sending faster than the rate dictated by congestion control [72]. Ely et al. show that an ECN (Explicit Congestion Notification) receiver can manipulate the sender into ignoring congestion by simply flipping a bit [21], undermining the goal of the protocol. Manipulation attacks have been identified for the 802.11 MAC protocol as well [6].

Our goal is to help protocol developers automatically identify manipulation attacks in real protocol implementations. This capability does not exist today: all attacks to date have been discovered through manual inspection. Two features of manipulation attacks make it difficult to identify them. First, in these attacks, adversaries induce targets to perform actions (e.g., increase congestion window) that they might have performed anyway under appropriate network conditions. Second, each individual manipulation by an adversary may appear relatively innocuous, and noticeable harm occurs only when it is combined with other actions or repeated many times.

In this chapter, we take a first step towards automatically identifying manipulation attacks. We describe the design and implementation of a tool, called MAX (Manipulation Attack eXplorer), that takes as input i) a protocol implementation; ii) a set of vulnerable actions (e.g., increasing the congestion window in a TCP sender) in the protocol whose exploitability the protocol developer wishes to explore; and iii) metrics of interest (e.g., throughput) based upon which a manipulation attack should be declared successful.
Given these inputs, MAX automatically determines i) which (if any) message modifications would allow an adversary to force the target into executing the vulnerable actions; ii) if vulnerable actions can be forced repeatedly; and iii) if that impacts the metrics of interest in a way that significantly benefits the adversary and/or harms the target.

MAX adapts and combines techniques from program analysis and software testing to address the challenges of finding manipulation attacks in large and complex protocol implementations. It employs a form of symbolic execution [42], a general technique for analyzing software that has been recently used successfully for a variety of purposes [39, 46, 48, 9]. MAX adapts symbolic execution to discover modifications to messages from adversaries that may lead to the execution of vulnerable actions at the targets. MAX then employs a form of adversarial system-level testing to produce manipulation attacks using information derived from symbolic execution. It repeatedly executes a complete network configuration specified by the user, but intercepts and manipulates messages to mimic the behavior of an adversary and drive the target to perform a vulnerable action. MAX performs a number of optimizations to scale. For example, in certain cases MAX compromises the precision of symbolic execution in favor of speed, assured that any false positives will be eliminated during concrete execution. The end result is that MAX is able to scale to a full-fledged implementation of TCP.

We demonstrate MAX's efficacy by applying it both to real protocol implementations and simulator code. MAX automatically determines that Daytona [69], a user-level TCP implementation, is vulnerable to the optimistic ACKing attack [72], and is able to successfully demonstrate the functioning of this attack.
MAX is also able to find two distinctly different ways of manipulating a node in the Qualnet 802.11 implementation, to cause it to refrain from sending frames and hence decrease its airtime. We also demonstrate, in the context of the ECN protocol, how MAX may be used not only to identify vulnerabilities but also to help developers gain confidence that the protective measures they take against manipulation attacks are sufficient.

4.1 Manipulation Attacks: Definition and Challenges

 1  void ack_received(char *buffer) {
 2    int num_pkts_to_send;
 3    if (num_pkts_in_flight == 0)
 4      return;
 5    num_pkts_in_flight--;
 6    num_pkts_sent++;
 7    int ecn_bit =
 8      ((packet *)buffer)->ecn_bit;
 9    // if ecn bit is zero, send two
10    // packets in response else send none
11    if (ecn_bit == 0)
12      num_pkts_to_send = 2;
13    else
14      num_pkts_to_send = 0;
15    if (num_pkts_to_send > 0)
16      send_pkts();
17  }
(a) ECN Sender

 1  void pkt_received(char *buffer) {
 2    int ecn_bit =
 3      ((packet *)buffer)->ecn_bit;
 4    packet *ack_buffer;
 5    num_pkts_rcvd++;
 6    ack_buffer = create_ack(buffer);
 7
 8    // if the ecn bit is non-zero,
 9    // set it in the ack header
10    if (ecn_bit != 0) {
11      ack_buffer->ecn_bit = ecn_bit;
12    }
13
14    // send the ACK to the sender
15    send_ack(ack_buffer,
16      ((packet *)buffer)->src);
17  }
(b) ECN Receiver

Figure 4.1: Code for ECN

We address the problem of automatically discovering manipulation attacks in protocol implementations. Our goal is to build a tool that provides systematic exploration for manipulation attacks in any communication protocol. In this section we describe our notion of manipulation attacks more precisely and present a simple example of such an attack. We also discuss the challenges of discovering manipulation attacks in protocol implementations and building a tool which is general, scalable, and precise.

Manipulation Attacks. We consider the following attack scenario. We assume the given protocol has two or more independent participants.
One or more of these participants are adversarial and may deviate from the protocol arbitrarily, while the other participants are assumed to be honest and protocol-compliant. A manipulation attack has the following two characteristics:

1. The attack involves manipulating one or more otherwise honest participants (called targets) to change their behavior either for the benefit of the adversaries or to the detriment of honest participants. (This rules out attacks where a rogue TCP endpoint is unilaterally malicious, for example.)

2. The targets are manipulated into exhibiting behavior that is actually protocol-compliant behavior under some circumstances and network conditions. (This rules out crash attacks, for example.)

Because they are subtle and indirect, most manipulation attacks have a third characteristic:

3. A single manipulative act is insufficient to mount an attack. Instead, the manipulation must be repeated many times for the attack to be considered successful.

While the above definition of manipulation attacks applies to multiple targets, we focus in this chapter only on attacks that involve a single target. Extending our approach to multiple targets is left for future work.

A number of manipulation attacks have been manually discovered in widely used network protocols. For example, a malicious TCP receiver can obtain an unfair share of bandwidth by sending duplicate acknowledgements, sending byte-level acknowledgements rather than packet-level acknowledgements (ACK division), or by acknowledging packets before they are received (optimistic ACKing) [72]. 802.11 implementations have been found to be vulnerable to manipulation attacks that prevent honest nodes from gaining access to the channel and thus starve them; these attacks work by setting a very large value in the duration field in the MAC header for RTS, data and ACK packets [6].

We use the ECN (Explicit Congestion Notification) protocol to illustrate a simple manipulation attack.
This will also serve as a running example later in the chapter to illustrate the conceptual ideas behind our approach. ECN was designed to enable the network to notify TCP endpoints of the onset of congestion. Let us consider an ECN-enabled TCP sender and receiver within an ECN-capable network. When a packet sent from the TCP sender encounters congestion, the congested router marks the ECN bit in the TCP header of the packet. When the TCP receiver receives this packet, it copies the marked TCP header into the acknowledgement packet that it sends out. Hence the TCP sender, on receiving this marked acknowledgement packet, is notified of congestion in the network without causing a packet drop, and can adjust its congestion window accordingly.

Figure 4.1 shows snippets of code for a simplified version of ECN we wrote as a simulator. The sender, on receiving an acknowledgement from the receiver, checks the ECN bit in the header, and if it is set, does not send out any new packets. If the ECN bit is not set, indicating that there is no congestion in the network, the sender sends two new packets to the receiver (Figure 4.1(a)). The receiver, on getting a packet from the sender, obediently reflects the ECN bit from the packet header into the acknowledgement packet that it sends back to the sender (Figure 4.1(b)).

We now describe a possible manipulation attack for this implementation of ECN [21]. Let us assume that the ECN receiver is the adversary, while the ECN sender is honest and is the target. To mount a manipulation attack, a malicious ECN receiver modifies its interactions with the ECN sender such that it never sets the ECN bit to 1. By doing this, the ECN receiver manipulates the sender into believing that the network is not congested (even if it is), and hence sending out two new packets every time it receives an acknowledgement. This manipulation causes an increase in throughput for the receiver.
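The receiver's deviation can be sketched in a few lines. This is an illustrative Python restatement of the ACK-generation logic in Figure 4.1(b); the actual simulator is written in C.

```python
# Sketch of the attack: a compliant receiver reflects the ECN bit from
# the data packet into its ACK, while the malicious receiver always
# clears it, hiding congestion from the sender.

def compliant_ack(pkt_ecn_bit):
    return {'ecn_bit': pkt_ecn_bit}   # reflect what the network marked

def malicious_ack(pkt_ecn_bit):
    return {'ecn_bit': 0}             # always report "no congestion"

# Even when a congested router marked the packet (ecn_bit == 1)...
honest = compliant_ack(1)
attack = malicious_ack(1)
# ...the manipulated ACK makes the sender emit two more packets.
```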
The above attack exhibits all three characteristics of a manipulation attack described above. The malicious ECN receiver manipulates the behavior of the honest ECN sender for its own benefit. Furthermore, this manipulated behavior is protocol-compliant under some circumstances, namely when there is no congestion. For this reason, the attack is hard for the ECN sender to detect. Finally, the manipulation must be performed repeatedly to be useful. Modifying a single message only causes two extra messages to be sent, which will not improve the receiver's throughput by much.

Challenges. To automatically identify manipulation attacks, our approach examines actual protocol implementations. We face several competing challenges in building an effective tool for exploring manipulation attacks in real network protocol implementations.

First, our tool must be precise. Manipulation attacks are inherently subtle. A practical tool must be able to analyze complex protocol implementations precisely enough to capture the possible effects of various manipulations. If the tool is too imprecise, it risks marking manipulations incorrectly as attacks. Furthermore, in general, multiple interactions between the adversaries and the target must be analyzed in order to detect a manipulation attack vulnerability.

Second, our tool must be scalable enough to be practical for use on real protocol implementations with minimal modification. Such implementations can contain complex low-level code and can be as large as tens of thousands of lines of code (e.g., the Linux implementation of TCP is nearly 50K lines of code). Moreover, the need for scalability is in direct conflict with the need for precision described above.

Finally, our tool must be general. It should be able to handle a diverse set of network protocols, including transport protocols like TCP, MAC protocols, routing protocols, and application-layer protocols like peer-to-peer protocols.
Also, manipulation attacks can be diverse, with manipulations taking a variety of forms and targeting a variety of behaviors in honest participants. They can also lead to a variety of malicious results (violation of protocol invariants, resource starvation of honest participants, etc.) and can have varied metrics for success (throughput increase for the manipulator, packet jitter increase for the honest participants, etc.).

An alternative approach to analyzing protocol implementations would have been to examine protocol specifications instead, but we chose the former for several reasons. First, even if a specification is theoretically impervious to certain manipulation attacks, inadvertent programming errors can potentially allow such attacks to occur in a protocol's implementation. Second, specifications are typically incomplete and imprecise in several ways. Therefore different implementations of the protocol will inevitably make somewhat different design choices, leaving each potentially vulnerable to a different set of manipulation attacks. Thus, not all of the manipulation attacks for TCP described above were successful on all TCP implementations.

4.2 Exploring Manipulation Attacks

In this section we describe our approach to systematically exploring manipulation attacks in network protocol implementations. Our exposition focuses on the conceptual contributions, leaving to Section 4.3 the details of a tool that embodies our ideas.

We make the following assumptions about protocol participants:

1. Honest participants faithfully process incoming packets in accordance with the protocol implementation.

2. One or more adversaries manipulate a subset of the honest participants, the targets, only by modifying the protocol messages they send. Our current design assumes that there is only one target; we leave to future work extensions to handle multiple targets.

3. The adversaries know, at each instant, the internal state of the target.
This models an adversary with more capabilities than most real-world adversaries are likely to have. However, some adversaries may be able to guess the internal state of the target based on out-of-band information, thus approaching the capabilities of the adversary in our model. Moreover, as we discuss later, we allow the user to explore weaker attack models by selectively hiding the target's internal state.

Our approach is designed for use by protocol developers and testers. These users provide two main inputs to facilitate systematic exploration of manipulation attacks:

Vulnerable Statements Vulnerable statements are code points in the implementation of the target that perform actions potentially vulnerable to attack (e.g., allocating resources or sending packets). Our approach repeatedly drives the target to a vulnerable statement in order to determine whether that leads to a manipulation attack.

Performance Metric(s) One or more metrics that measure the protocol performance parameters (e.g., throughput, wireless airtime, memory use) which adversaries hope to impact. Our approach lets the user examine and compare these metrics for runs with and without adversaries, in order to determine whether the implementation is vulnerable to manipulation attacks and its impact.

In addition, the users also provide the network configuration they want to test. This configuration includes the network topology and traffic as well as the identities of the target, other honest participants, and adversaries.

We believe it is reasonable to expect protocol developers and testers to have sufficient knowledge of the protocol to provide the inputs above. As discussed later, we also allow users to interactively experiment with a variety of vulnerable statements, performance metrics, and network configurations.

Our approach uses a novel adaptation and combination of two program analysis techniques.
These techniques were originally developed to find bugs in software systems, but we adapt them for our needs.

Symbolic Execution This technique statically analyzes a program's execution and identifies the conditions under which each code path is taken. We use it to derive a small set of feasible paths that lead to the vulnerable statements in the target's protocol implementation, along with constraints on message inputs and internal protocol state that cause execution to follow those paths.

Adversarial Concrete Execution We use a form of system-level testing to produce manipulation attacks given the information from symbolic execution. Specifically, we execute the complete protocol but repeatedly intercept messages sent from adversaries to the target and modify them to drive execution to a vulnerable statement. These modifications mimic an adversary that crafts and sends those messages.

We now describe each technique and illustrate each in turn using our ECN example.

Enumerating Feasible Paths with Symbolic Execution. Symbolic execution [42] statically simulates the execution of a program or function, using a symbolic value σ_x to represent the value of each variable x that is considered an unknown input (e.g., a function parameter). As the symbolic executor runs, it updates a symbolic store which maintains information about program variables. For example, after the assignment y = 2*x the symbolic executor does not know the exact value of y but has learned that its value is 2σ_x. At branches, symbolic execution uses a constraint solver to determine the value of the guard expression, given the information in the symbolic store. This ensures that the symbolic executor only explores feasible paths. If there is insufficient information to determine the guard's value, both branches are explored.
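The symbolic store and forking behavior just described can be sketched as follows. This is an illustrative toy in Python; real symbolic executors consult a constraint solver before forking, which we omit.

```python
# Toy illustration of a symbolic store: after y = 2*x with x unknown,
# the store records y as the term 2*sigma_x rather than a number; a
# branch whose guard cannot be decided forks into two paths, each
# extending its path condition with the guard it assumed.

from dataclasses import dataclass

@dataclass(frozen=True)
class Sym:                            # a symbolic input value, e.g. sigma_x
    name: str

store = {'x': Sym('x')}
store['y'] = ('*', 2, store['x'])     # y = 2*x, kept symbolically

def branch(guard, path_condition):
    """Fork on an undecidable guard: return both extended paths."""
    return [path_condition + [guard],
            path_condition + [('not', guard)]]

paths = branch(('>', store['y'], 4), [])   # guard y > 4 is undecided
```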
In this way, symbolic execution produces a tree of possible program executions; each path through this tree can be summarized with a path condition, which is simply the conjunction of branch choices made to go down that path.

Our approach uses symbolic execution on the target's protocol implementation to explore feasible paths that lead to vulnerable statements. We treat incoming protocol messages as well as internal protocol state as input and give them symbolic values initially. Consider the ECN sender code in Figure 4.1(a) and assume the user chooses the function call send_pkts() in line 16 as the sole vulnerable statement. Of all the possible paths through this code, symbolic execution determines that there is only one feasible path reaching the vulnerable statement, with associated path condition !(num_pkts_in_flight == 0) && ecn_bit == 0 && num_pkts_to_send > 0. In words, this is the path that takes the else branch at line 3, the then branch at line 11, and the then branch at line 15. All other paths either can never lead to the vulnerable statement (e.g., taking the then branch at line 3 or the else branch at line 15) or are determined to be infeasible (e.g., taking the else branch at line 11 sets num_pkts_to_send to 0, which prevents the then branch at line 15 from being taken).

We convert each path condition into an input constraint, which is a predicate on symbolic values (i.e., on the incoming protocol message and internal protocol state) that is sufficient for execution to go down that path. This conversion is straightforward given the symbolic store maintained by the symbolic executor. For the sole path condition described above in the ECN example, the input constraint is !(σ_num_pkts_in_flight == 0) && σ_buffer->ecn_bit == 0.

The number of paths explored in symbolic execution is exponential in the number of program branches and potentially unbounded in the presence of loops.
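The single feasible path and its input constraint can be checked concretely. The following sketch is our own simplification of the ECN sender excerpt, not the actual implementation; the struct layout and function names are illustrative. It mirrors the branch structure at lines 3, 11, and 15 and shows that the derived input constraint agrees with the handler on every input:

```c
#include <assert.h>

/* Simplified model of the ECN sender's packet buffer (illustrative layout). */
struct packet { int ecn_bit; };

/* Returns 1 iff the handler would reach the vulnerable send_pkts() call,
 * mirroring the branches at lines 3, 11, and 15 of the excerpt. */
int reaches_send_pkts(int num_pkts_in_flight, struct packet *buffer) {
    int num_pkts_to_send;
    if (num_pkts_in_flight == 0)   /* then branch at line 3: no send */
        return 0;
    num_pkts_in_flight--;
    if (buffer->ecn_bit == 0)      /* then branch at line 11 */
        num_pkts_to_send = 2;
    else
        num_pkts_to_send = 0;      /* else branch falsifies the guard at line 15 */
    return num_pkts_to_send > 0;   /* then branch at line 15: send_pkts() */
}

/* The input constraint derived by symbolic execution:
 * !(sigma_num_pkts_in_flight == 0) && sigma_buffer->ecn_bit == 0 */
int input_constraint(int num_pkts_in_flight, int ecn_bit) {
    return num_pkts_in_flight != 0 && ecn_bit == 0;
}
```

For this small handler the constraint is both sufficient and necessary for reaching the vulnerable statement.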
Thus, ensuring that symbolic execution can scale to complex implementations is a challenge. We use two techniques to improve its scalability. First, we perform a simple initial pass over the code to identify branches that can never lead to a vulnerable statement, e.g., the then branch at line 3 in Figure 4.1(a). Symbolic execution can then avoid exploring any path that involves such a branch. Second, we allow users to identify relevant parts of the protocol implementation and only symbolically execute that portion of the code, conservatively approximating the behavior of other portions. While doing so can cause infeasible paths to be considered potentially feasible by the symbolic executor, it greatly enhances scalability, and any false positives will be pruned during the concrete execution step that we discuss next.

Attack Construction through Adversarial Concrete Execution. We use system-level concrete execution of the protocol implementation to construct possible manipulation attacks, using the input constraints from symbolic execution and the network configuration provided by the user. However, we use interposition to intercept each protocol message from each adversary that is directed to the target. An adversarial module then attempts to modify the message so as to drive the target to execute a vulnerable statement. It does so by using a constraint solver to find concrete values for the message's fields that, along with the current internal protocol state of the target, satisfy some input constraint obtained by symbolic execution. In the ECN example, the sole input constraint is !(σ_num_pkts_in_flight == 0) && σ_buffer->ecn_bit == 0. Thus, whenever the ECN sender's internal state has a nonzero value for num_pkts_in_flight, the input constraint can be satisfied by setting buffer->ecn_bit to 0, so the adversarial module will perform this modification.
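As an illustration, the adversarial module's action for this constraint can be sketched as follows. This is a simplified model with hypothetical names, not MAX's generated code: given read access to the target's state, it rewrites the intercepted message so the input constraint holds.

```c
#include <assert.h>

/* Simplified model of the intercepted ECN message (illustrative layout). */
struct packet { int ecn_bit; };

/* Sketch of the adversarial module's action for the ECN input constraint.
 * Returns 1 and clears ecn_bit when the modification can drive the target
 * down the sole feasible path; returns 0 when no constraint is satisfiable
 * (num_pkts_in_flight == 0 at the target). */
int handle_intercept_ecn(int target_num_pkts_in_flight, struct packet *msg) {
    if (target_num_pkts_in_flight == 0)
        return 0;          /* no feasible path to send_pkts() */
    msg->ecn_bit = 0;      /* satisfy sigma_buffer->ecn_bit == 0 */
    return 1;
}
```

In the real tool this decision is made by a constraint solver over the generated input constraints; here the constraint is simple enough to solve by hand.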
The adversarial module then passes the message back to the handling function at the target and transfers control back to the protocol implementation. By intercepting every message from the adversary to the target and using the adversarial module to modify it, our system-level concrete execution attempts to repeatedly drive execution to a vulnerable statement. Our tool measures the impact of the emulated adversarial behavior on the target and other protocol participants, using the performance metrics provided by the user. We also execute the same network configuration without emulating adversarial behavior and record the associated performance metrics. The user can compare the cases with and without adversaries to decide if the protocol implementation is vulnerable to manipulation attacks and quantify the impact of the attack.

Discussion. Our approach meets our three goals of generality, scalability, and precision. It is general because our techniques are not limited to a particular protocol (such as TCP), or a particular setting (such as a simulator). It achieves precision and scalability through a judicious combination of static and dynamic analysis that exploits the strength of each technique. Symbolic execution prunes the search space dramatically, achieving scalability, and provides a targeted set of constraints that must be satisfied. System-level concrete execution exploits knowledge of the actual values of the target's internal protocol state to solve these constraints and allows users to directly see how the modifications affect protocol participants. Because we use system-level concrete execution, our approach is precise in that it cannot result in false positives; however, as we discuss in Section 4.3, our approach can have false negatives. Finally, users can easily experiment with different protocol configurations in order to understand the impact of possible manipulation attacks under a variety of scenarios of interest.
While in principle it is possible for symbolic execution alone to identify manipulation attacks, in practice that would be difficult. Static analysis cannot be completely accurate in the presence of loops and pointers and would lead to many false positives, and the need to find not just a single manipulation but a repeated sequence of manipulations would make the approach even more inaccurate. It would also be difficult to statically analyze the impact of such manipulation on the user's performance metrics, in order to help users determine the impact of an attack.

Similarly, it is possible in theory to use adversarial concrete execution alone to identify manipulation attacks. However, the search space of possible modifications is so large that a blind exploration would be unlikely to find attacks in a reasonable amount of time.

4.3 MAX Implementation

We now discuss the details of the tool, called MAX, which we have built based on the approach above. Our tool enables exploration of manipulation attacks for protocol implementations written in C. The low-level nature of C code presents analysis challenges, but we choose C because it is widely used in real-world protocol implementations.

Figure 4.2: Overview of MAX

Figure 4.2 provides an overview of MAX, which consists of four main modules. The path exploration module uses the symbolic execution engine to find feasible paths that lead to vulnerable statements.
The adversarial module generator uses the results from path exploration to generate an adversarial module, which modifies messages produced by protocol participants in order to mimic adversarial behavior. The attack emulator module uses this adversarial module, along with experimental settings provided by the user, to concretely execute an instance of the network protocol in an adversarial manner against the target. This execution produces values for user-provided performance metrics that help users understand the effectiveness of possible manipulation attacks. Next, we describe how each of these modules has been implemented in MAX.

Path Exploration Module. The path exploration module takes three inputs. The first is a set of functions in the target's protocol implementation that the user deems relevant; this set can be the entire implementation if that is of interest. The second input is a set of vulnerable statements. The final input is the name of the function that handles incoming protocol messages, which will be used as the entry point for symbolic execution. The module generates as output a list of paths that lead to vulnerable statements. For each path, the module also generates the constraints on incoming messages and internal protocol state at the target that must hold for that path to be taken. The path exploration module's functionality consists of three steps.

CFG Generation. The module uses the CIL infrastructure for C program analysis and transformation [60] to parse the source code of the protocol implementation and build an inter-procedural control flow graph (CFG) for the incoming message processing function. The generated CFG for the ECN example from Section 4.1 is shown in Figure 4.3.

CFG Pruning.
In order to reduce the complexity of symbolic execution and improve scalability, the path exploration module employs a standard reachability analysis to identify and remove nodes that do not reach nodes with vulnerable statements. Thus, for Figure 4.3, where the sole vulnerable statement is at Node 8, Node 2 would be pruned from the CFG.

Figure 4.3: Control Flow Graph for ECN Example.

Symbolic Execution. In this step, the path exploration module uses the symbolic execution engine (discussed next) to simulate the execution of the incoming message processing function on the pruned CFG, treating the incoming packet as well as the protocol state as symbolic inputs. At each step the path exploration module checks whether the next statement to be symbolically executed is vulnerable. If so, the module marks the path being currently explored as a feasible path, converts the associated path condition into an input constraint, and stores both of these before continuing the exploration. This process completes when there are no paths left unexplored in the CFG (see below for the treatment of loops).

Symbolic Execution Engine. MAX's symbolic execution engine is an advanced version of the engine used in FSMGen and is still built as an inter-procedural analysis over the CIL framework. It differs from the engine described in Chapter 3 in that it is designed for pure C, rather than nesC. Its treatment of loops, pointers, structures, and typecast expressions is more advanced than that of the symbolic execution engine used by FSMGen. Also, it uses the Z3 constraint solver [19], which has proven to be faster, as well as more efficient in terms of memory usage, than CVC3.
Lastly, it is multi-threaded, in order to take advantage of multi-core architectures to reduce execution time.

We chose to implement our own symbolic execution engine rather than use existing ones [79, 8] in order to explore and control the tradeoff between accuracy and scalability, which is critical for handling real-world protocol implementations. Also, most existing symbolic execution engines work only on whole programs, while we employ symbolic execution on partial programs for increased scalability. MAX's symbolic execution engine performs purely static analysis, as opposed to the mixture of symbolic and concrete execution used by many existing engines [27, 73, 79]. A mixed execution engine may work in our setting, but we have not yet explored the possibility.

MAX's symbolic execution engine simulates possible executions of the provided source code, starting at a given entry point function. Below we describe how our symbolic execution engine handles various kinds of statements of interest. We avoid repeating a discussion for the most common statements, for which the treatment remains the same as in the version used by FSMGen.

Function Calls. The symbolic execution engine first checks whether the function definition for the called function is part of the code provided to the engine. If so, the symbolic execution engine adds the formal parameters of the function to its symbolic store, mapping them to the arguments specified in the function call, and starts traversing the function body. If the function is not part of the code, the symbolic execution engine treats the call conservatively. It does so by assigning a fresh symbolic value to each variable whose value might be modified by the call, thereby losing any knowledge of these variables' values.

Loops. A naive treatment of loops would require a symbolic execution engine to consider an unbounded number of paths, since in general the actual number of loop iterations cannot be known until runtime.
To address this problem, our symbolic execution engine partitions the possible paths through a loop into two sets: one for the case when the loop guard is false and the body is never entered, and one for the case when the loop is iterated at least once [22]. The engine first invokes Z3 to evaluate the loop guard. If the guard is known to be false, then only the first case is considered. If the guard is known to be true, then only the second case is considered. Otherwise, both cases are symbolically executed.

To handle the "false" case, the loop body is simply skipped and simulation continues at the subsequent statement. To handle the "true" case, the engine conservatively simulates the execution of an arbitrary iteration of the loop. It does so by invalidating information in the symbolic store about any variable that may be modified by the loop body, similar to what is done for calls to unknown functions above. After this invalidation, the loop body is symbolically executed. Finally, the loop guard is assumed to be false, so its negation is added to the path condition, and symbolic execution continues after the loop. This is more accurate as compared to the treatment of loops by the FSMGen symbolic execution engine.

Low-level Constructs. MAX's symbolic execution engine is meant to be used on network protocol implementations, which make heavy use of features such as pointer manipulation and type casting. One option for handling such features uniformly is to model all variables as byte arrays [8]. While very accurate, this approach dramatically increases the complexity of symbolic execution. Instead, we use a single symbolic value for most variables. For compound types like structs and arrays, we use a symbolic value per field and introduce these symbolic values lazily as each field is encountered during symbolic execution.
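The lazy, per-field introduction of symbolic values can be sketched as follows. This is an illustrative model of our own devising, not the engine's actual data structure: a fixed-size string table stands in for the real symbolic store, and a field receives a fresh symbolic id the first time it is encountered and the same id thereafter.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Toy symbolic store: maps "base.field" names to symbolic value ids,
 * introducing an id lazily on first lookup. */
#define MAX_FIELDS 64

struct sym_store {
    char keys[MAX_FIELDS][64]; /* "base.field" strings */
    int  count;                /* also serves as the next fresh id */
};

int sym_value_for_field(struct sym_store *s, const char *base, const char *field) {
    char key[64];
    snprintf(key, sizeof key, "%s.%s", base, field);
    for (int i = 0; i < s->count; i++)
        if (strcmp(s->keys[i], key) == 0)
            return i;                  /* field already has a symbolic value */
    snprintf(s->keys[s->count], sizeof s->keys[0], "%s", key);
    return s->count++;                 /* introduce a fresh symbolic value */
}
```

Only fields that the analyzed code actually touches ever cost anything, which is the point of the lazy scheme.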
In order to retain accuracy, we employ domain knowledge about how network protocols use low-level features in order to devise precise ways of simulating their execution. For example, consider a statement of the form x = (struct b *)y; where y is of type struct a *. In general, it would be difficult to deduce the relationship between the two struct types, which is necessary to transfer any knowledge in the symbolic store about y and its fields to x. Our engine makes use of the fact that when these structs represent packet header types, it is most often the case that one of the two structs is embedded as a field of the other. We therefore attempt to search for such an embedding in the struct definitions. If found, it is straightforward to update the symbolic store with accurate information about x. If we cannot determine the relationship between struct a and struct b, then we conservatively learn nothing about x and its fields.

Adversarial Module Generator. The adversarial module generator is responsible for automatically generating the code that is used by MAX to mimic the effect of adversaries during system-level concrete execution, as described in Section 4.2. First, it generates C methods that read the current values of the various protocol state variables. It also generates C methods to write values into fields of a message. By default the generator assumes that all internal state can be read by an adversary and all fields in the message can be modified. However, we allow users to experiment with weaker adversaries by restricting the state available to them as well as the message fields that adversaries may modify. Second, the generator creates a function handle_intercept, which is invoked whenever a message from an adversary is sent to the target. The handle_intercept function manages the execution of the adversarial module as described in Section 4.2.
In MAX, the adversarial module reads the current state of the target and invokes the Z3 solver to obtain an assignment of values for the fields of the intercepted message that satisfy an input constraint provided by the path exploration module.

There are many possible ways to explore the space of adversarial executions. Currently our approach is quite simple but still effective, as shown in the next section. At each message interception we consider input constraints in an arbitrary order and use the first one for which Z3 finds a satisfying assignment. We also cache this satisfying assignment and reuse it whenever the same protocol state is encountered in the future, avoiding the need to invoke Z3 again.

The code for much of this functionality is written in OCaml because the CIL framework is implemented as an OCaml library. Hence the generator also creates interfacing functions that link the C functions described above to OCaml counterparts.

Attack Emulator Module. The attack emulator module allows users to concretely explore whether manipulation attacks are possible with respect to the specified vulnerable statements at the target and to understand the impact in practice. The module requires the user to perform the following actions:

• Set up the experimental network topology and conditions under which the user wishes to test the protocol implementation.

• Set up trace collection over this system and add instrumentation to compute performance metrics of interest.

The attack emulator module first instruments the protocol implementation at the target to interpose on all incoming messages. Figure 4.4 provides a pictorial view of this instrumentation. Any messages that are received from an adversary are redirected to the handle_intercept function in the generated adversarial module, described above. Placing the adversarial module on the target provides easy access to the target's internal protocol state, which is used by the adversarial module.
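The satisfying-assignment cache described above can be sketched as follows. This is an illustrative model with hypothetical names: a single integer stands in for the target's protocol state, and a stub function stands in for the Z3 query; on a cache hit the solver is not invoked again.

```c
#include <assert.h>

/* Toy cache of satisfying assignments, keyed by target protocol state. */
#define CACHE_SIZE 32

struct assign_cache {
    int state[CACHE_SIZE];      /* target states already solved for */
    int assignment[CACHE_SIZE]; /* message-field value that satisfied a constraint */
    int count;
    int solver_calls;           /* how often the stub "solver" actually ran */
};

/* Stand-in for the Z3 query on the ECN-style constraint: returns a field
 * assignment (ecn_bit = 0) when the state permits one, or -1 otherwise. */
static int solve(int state) { return state != 0 ? 0 : -1; }

int cached_solve(struct assign_cache *c, int state) {
    for (int i = 0; i < c->count; i++)
        if (c->state[i] == state)
            return c->assignment[i];   /* cache hit: reuse, no solver call */
    c->solver_calls++;
    int a = solve(state);
    if (a != -1 && c->count < CACHE_SIZE) {
        c->state[c->count] = state;    /* remember the satisfying assignment */
        c->assignment[c->count] = a;
        c->count++;
    }
    return a;
}
```

Since the same target state recurs many times over a long run, this simple memoization removes most solver invocations from the interception path.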
The attack emulator module also instruments the protocol implementation at the target to record the modifications performed by the adversarial module, along with information about when and how many times the implementation executes a vulnerable statement.

Figure 4.4: Experimental Setup for Attack Emulation

Next the attack emulator module performs concrete execution over the whole network, given the network setup provided by the user. The module conducts two sets of runs with this setup: one in which messages from adversaries to the target are modified (adversarial) and another where such messages are not modified (honest). In the honest runs, we still execute the interposition computation but simply do not update the message with the results. This ensures that there will be similar delays due to interposition for both sets of runs, allowing for a more direct comparison.

After completing the runs, the module outputs the traces that were collected by the user-provided code, as well as a trace of the manipulations used by the adversarial module and their effect on the ability to reach vulnerable statements in the target. Users can employ this information to decide whether a manipulation attack was successful and to understand the exact mechanism of the attack. Based on the information learned, users can easily iterate all or part of MAX's steps to explore their protocols further. For example, users can update their instrumentation and re-run the attack emulator module to obtain information about other performance metrics. Users can also change the set of vulnerable statements and re-execute MAX in its entirety.

Limitations. MAX is a first effort to build a precise, scalable, and general tool to find manipulation attacks in complex network protocol implementations.
There are a few limitations of our implementation that we plan to address in the future.

First, our use of interception to emulate adversarial behavior introduces latency in protocol processing, which may affect protocol performance metrics. While these delays are not problematic when the protocol being explored is part of a discrete-event simulator which does not account for compute time, they may affect the behavior of actual protocol implementations. We currently limit the impact of these delays by introducing comparable delays in honest runs of the implementation to enable correct performance comparisons.

Second, placing the adversarial module at the target is convenient but can also be a limitation. For example, this approach does not naturally handle broadcast-based protocols. In such a protocol, any modifications to a message from an adversary must be made visible to all recipients to maintain consistency, while our approach would only modify the version received by the target.

Finally, the approximate nature of symbolic execution can lead to both false positives and false negatives. While false positives will be pruned out during concrete execution, false negatives represent possible manipulation attacks that will not be explored during concrete execution. The main source of false negatives comes from imprecisions in mapping constraints produced by symbolic execution back to the appropriate fields of incoming messages. Such imprecision can happen, for example, if an incoming message is cast to another type in a way that we cannot accurately simulate during symbolic execution. False negatives in tools that examine real protocol implementations are not unusual; for example, model-checking based tools like Mace [40] do exhibit unquantifiable false negative rates, but have nevertheless proven to have significant practical value.

4.4 Evaluation

We evaluate MAX along two dimensions.
The first is its ability to find manipulation attacks in real protocol implementations. The second is its computational footprint and the associated impact on performance. To evaluate these aspects of MAX, we use it on implementations of three well-known protocols: TCP, 802.11 and ECN. We show how our tool helps find manipulation attacks as well as confirm the efficacy of preventive measures.

4.4.1 Exploring manipulation attacks in TCP.

TCP is one of the largest and most complex network protocols, so it is a good stress test for MAX. We evaluated MAX on Daytona [69], which is a user-space port of the Linux 2.3.29 kernel's implementation of TCP.

We ran MAX's path exploration module on roughly 50K lines of code of the Daytona implementation, with the TCP sender as target. Due to the size and complexity of this code, the number of possible paths to explore is extremely large (to the order of billions), so we introduced a simple optimization to speed up this phase of MAX. We inserted a profiling step before path exploration, which executes TCP under the user-provided system configuration and records the set of internal states of the target encountered. We then seeded path exploration with this information, ensuring that only paths consistent with at least one of these concrete states would be explored. In this way, path exploration (and the underlying symbolic execution engine that it employs) is targeted to the network conditions that the user will later test the protocol under during concrete execution, improving scalability without sacrificing precision.

Automatically finding manipulation attacks. We first considered any statement that decreases the number of outstanding packets at the sender to be vulnerable. Intuitively, a decrease in the number of outstanding packets causes the window to inflate and the sender to send out more packets. Forcing such a decrease repeatedly can potentially increase the throughput received by the adversary.
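To make the intuition concrete, the following sketch (our illustration, phrased in terms of the standard TCP sender variables snd_una and snd_nxt rather than Daytona's actual code) shows how an adversary with read access to the sender's state can pick the acknowledgment number that shrinks the outstanding set as much as possible:

```c
#include <assert.h>
#include <stdint.h>

/* With access to the sender's state, acknowledge everything in flight:
 * any ACK in (snd_una, snd_nxt] reduces the outstanding count, and
 * snd_nxt is the highest value that is still valid. Sequence-number
 * wraparound is ignored for clarity. */
uint32_t craft_optimistic_ack(uint32_t snd_una, uint32_t snd_nxt) {
    if (snd_una == snd_nxt)
        return snd_una;   /* nothing outstanding: leave the ACK alone */
    return snd_nxt;       /* optimistically acknowledge all data in flight */
}

/* Bytes the sender will remove from its outstanding queue on this ACK. */
uint32_t bytes_freed(uint32_t snd_una, uint32_t ack) {
    return ack - snd_una;
}
```

Repeating this on every intercepted ACK keeps the in-flight count artificially low, which is exactly the vulnerable-statement pattern chosen above.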
MAX's path exploration module found close to 2 million feasible paths. For the network configuration, we used a simple topology of two nodes connected on the same LAN, one being the TCP sender and the other the receiver. We set up background TCP traffic to create congestion conditions; our goal was to explore the response of TCP to manipulation under congestion, when there would be a lot of outstanding packets at the sender. We disallowed the modification of the Flags in the TCP header, i.e., the SYN, ACK flags etc. MAX had access to all state at the TCP sender.

MAX successfully emulated a manipulation attack on the TCP sender by modifying the sequence numbers ACKed by the receiver. In particular, MAX automatically inferred the range of sequence numbers for which an ACK would cause the number of outstanding packets to be reduced. Even when a packet was dropped and a duplicate ACK sent by the receiver, MAX was able to automatically modify it such that its sequence number was in the queue of outstanding packets. MAX's protocol modifications were safe in that they did not result in connection termination.

Thus, MAX automatically discovered the well-documented (but manually discovered earlier) Optimistic ACKing attack [1] in a large and complex protocol implementation. There are subtle differences between the way in which MAX mounts the attack and the way in which prior work [1] suggests it be done. In the latter case, the receiver guesses the sequence number to manipulate; because MAX has access to the sender's internal state, it can determine the highest valid sequence number relative to the sender state. Because of this, MAX is able to ascertain the worst-case throughput increase as a result of manipulation.

We ran this experiment for two hours, and MAX was able to manipulate the sender into executing the vulnerable statements all but 22 times for the 11480 packets it manipulated.
(The 22 incoming packets were received when the number of outstanding packets was zero, so MAX found no feasible paths leading to the vulnerable statement.) Figure 4.5(a) depicts the throughput obtained by the adversarial receiver as modeled by MAX as compared to an honest receiver. Figure 4.5(b) shows that MAX was successfully able to hide all packet drops from the sender, resulting in a monotonically increasing congestion window. Figure 4.5(c) illustrates how MAX modified the sequence numbers in order to fool the sender into sending more packets. The sharp decrease in the sequence number in certain places in the plot is due to the few acknowledgement packets that MAX was not able to modify successfully.

Figure 4.5: Results of MAX for Daytona. (a) TCP Throughput; (b) Congestion Window at Sender; (c) ACKed Sequence Number.

While MAX's interposition reduced the throughput of TCP in our experiments, the functionality of TCP remained intact. Further, by inducing comparable delays for the runs without an adversary, we were able to meaningfully compare honest and adversarial runs.

Confirming robustness to manipulation attacks. We then considered any statement that increases the congestion window as vulnerable. Intuitively, forcing the sender to increase the congestion window will give the receiver more throughput. Researchers have manually identified such attacks in TCP [72] using ACK division and DupACK spoofing, but not all implementations are vulnerable; our goal was to determine if Daytona was susceptible to these attacks. The path exploration module of MAX was able to find over a million feasible paths to such statements. However, the attack emulation module was unable to construct any attacks that repeatedly forced the sender down these paths. Sporadic manipulations of the congestion window did occur during adversarial concrete execution, but they did not lead to any observable increase in throughput for the adversary. Hence MAX provides some confidence that Daytona is robust to manipulations such as ACK division and DupACK spoofing.

4.4.2 Exploring manipulation attacks in 802.11 MAC.

We evaluated MAX on the implementation of the 802.11 MAC protocol in the Qualnet simulator. Qualnet is implemented in C++, but the core of the 802.11 code remains in C. Therefore, MAX's path exploration module was able to analyze the 802.11 code after we removed its bindings to the Qualnet scheduler and redefined a few C++-specific data structures. MAX's ability to focus program analysis on a relevant subset was a key enabler: other symbolic execution engines that perform whole program analysis might not have been able to accommodate this easily. We successfully identified two manipulation attacks on 802.11 that were previously identified manually [6].

NAV Attack. In this experiment, we considered the statement that sets the state of the 802.11 protocol to WF_NAV (i.e., waiting for the Network Allocation Vector value to expire) as vulnerable. A node cannot transmit any data while the NAV value has not expired, so intuitively forcing the NAV value to not expire can potentially induce a DoS attack. MAX's path exploration module generated 16,008 paths that lead to the vulnerable statement.
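The NAV mechanism that makes this statement attackable can be sketched as follows. This is an illustrative model with names of our choosing, not Qualnet's code: every overheard frame can only extend a node's NAV through its duration field, so a single frame carrying an inflated duration silences the target for a long interval.

```c
#include <assert.h>
#include <stdint.h>

/* Toy 802.11 NAV bookkeeping. Times are in microseconds; the 15-bit cap
 * mirrors the width of the 802.11 duration field. */
#define MAX_DURATION 32767u

/* A node's NAV only grows: each overheard frame may push the expiry out. */
uint32_t nav_update(uint32_t now, uint32_t nav_expiry, uint32_t duration) {
    uint32_t d = duration > MAX_DURATION ? MAX_DURATION : duration;
    uint32_t until = now + d;
    return until > nav_expiry ? until : nav_expiry;
}

/* The node defers transmission until its NAV expires. */
int can_transmit(uint32_t now, uint32_t nav_expiry) {
    return now >= nav_expiry;
}
```

Setting duration to its maximum on every frame, as MAX's adversarial module does in this experiment, keeps the target's NAV perpetually unexpired.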
We used as input configuration a simple 8-node topology with a single AP, where all nodes could hear one another. In this topology, we set up two FTP flows such that one of the senders was a target while the other sender was an adversary attempting to increase its share of airtime. MAX was allowed to modify all fields except for the frameType of the intercepted frame. MAX successfully constructed the known NAV attack by repeatedly causing the adversary to set an abnormally high value for the duration field of the packets it sends, resulting in the target using this high value to compute its NAV.

Figure 4.6: Results of MAX for 802.11 (NAV Attack). (a) FTP Throughput; (b) Frames Sent per Sec.

Figure 4.6(a) shows the comparison of the throughput for both flows for an honest as well as adversarial run. The throughput difference is fairly dramatic in the adversarial run; clearly the target is unable to send very much during the adversarial run. We also plotted the number of frames sent out per second by all the nodes in the network (Figure 4.6(b)), and again it is clear that the target has been successfully attacked.

RTS Attack. In this experiment, we considered any statement that sets a sleep timer at the target to be vulnerable. By sleeping, the target is deprived of bandwidth, which makes it vulnerable to denial-of-service. MAX's path exploration module found 147,528 paths to a vulnerable statement. For concrete execution, we set up a simple 6-node ad-hoc wireless network topology in Qualnet where all nodes could hear each other.
Node 2 was designated the sole adversary and node 5 was the target. We set up a constant bit rate (CBR) flow from node 2 to node 5. We also set up two FTP flows, one from node 5 to node 3 and another from node 3 to node 6. For this attack, we allowed modification of all fields of an intercepted frame. MAX's adversarial concrete execution successfully constructed the known RTS attack. CBR frames from node 5 received at node 2 were automatically modified by the adversarial module in two ways: the frameType was modified to be an RTS frame, and the duration field was assigned a large number (as in the previous attack). This modification caused the target to respond with a CTS frame carrying a similar duration value, which in turn caused all honest participants within overhearing range to become quiescent for a long period, in expectation of an imminent data transmission. Further, the target node itself waits for the data. Thus, by sending a single packet, the adversary can impact multiple participants, some of them not even within its broadcast range. This attack has been documented as a virtual jamming attack [13], since the sender is not really sending any packets but still successfully jams the channel.

Figure 4.7: Results of MAX for 802.11 (RTS Attack) ((a) FTP throughput over time; (b) packets transmitted for different CBR sending rates of the adversary; (c) frames sent per second over time).

Figure 4.7(a) shows a comparison between an honest and an adversarial run for the same topology.
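The two frame modifications described above can be sketched as follows. The header layout, field names, and constants are hypothetical illustrations, not Qualnet's actual data structures.

```c
#include <stdint.h>

/* Hypothetical MAC header; field names are illustrative only. */
enum { FRAME_DATA = 0, FRAME_RTS = 1, FRAME_CTS = 2 };

typedef struct {
    uint8_t  frameType;
    uint16_t duration_us;
} MacHeader;

/* Sketch of the rewrite the adversarial module converged on: turn an
 * ordinary data frame into an RTS carrying an abnormally large
 * duration. The resulting CTS echoes a similar duration, and every
 * honest node that overhears it loads that value into its NAV and
 * stays quiet -- the virtual jamming effect. */
void rewrite_as_jamming_rts(MacHeader *h) {
    h->frameType   = FRAME_RTS;
    h->duration_us = 0x7FFF;  /* abnormally large medium reservation */
}
```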
For this particular figure, the adversary sends packets to the target at the low rate of 0.2 packets per second. Using MAX, it was easy to rerun the attack emulator module multiple times in order to understand the impact of different data rates for the adversary. Figure 4.7(b) compares the performance of the two FTP flows for different sending rates of the adversary, and highlights an interesting scenario in which the target manages to send out packets only when the rate is 0.05 packets per second. Lastly, Figure 4.7(c) shows the rate at which frames were sent out by the participants. As can be seen, the adversary succeeds in reducing the rates of all other participants while increasing its own substantially.

As a side effect of MAX's analysis, we also found an interesting bug in Qualnet. In a few places, the Qualnet code fails to check that a timer value being set is nonnegative. Therefore, the input constraints that MAX creates for paths to those timer settings do not require nonnegative values. During concrete execution, MAX sometimes chose a negative timer value, which caused Qualnet to subsequently crash. Thus we see that MAX's techniques can also help identify crash attacks and similar implementation bugs.

4.4.3 Exploring ECN manipulation attacks.

We described the Explicit Congestion Notification (ECN) protocol, which is an extension to TCP's congestion control mechanism, along with a possible manipulation attack, in Section 4.1. We were able to demonstrate that MAX can automatically detect this attack in an implementation of ECN on top of TCP for Qualnet. We also illustrated the utility of MAX as a design tool by patching the implementation using the well-known nonce fix for ECN and then using MAX to gain confidence that the patched implementation is robust to the attack.

Finding a manipulation attack. For ECN, we pick a vulnerable statement such that a path containing it does not contain any statement reducing the congestion window.
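This path-level criterion can be phrased as a simple predicate over an explored path, sketched below. The path representation here is a deliberately simplified illustration; MAX's internal representation differs.

```c
#include <stddef.h>

/* Hypothetical, simplified view of an explored path as a sequence of
 * statements, each flagged by whether it reduces the congestion
 * window. */
typedef struct {
    int reduces_cwnd;  /* nonzero if this statement shrinks cwnd */
} Stmt;

/* A path is "vulnerable" for the ECN attack if no statement on it
 * reduces the congestion window: repeatedly steering execution down
 * such paths keeps the sender's window, and hence the adversary's
 * throughput, large. */
int path_avoids_cwnd_reduction(const Stmt *path, size_t len) {
    for (size_t i = 0; i < len; i++)
        if (path[i].reduces_cwnd)
            return 0;
    return 1;
}
```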
Intuitively, an adversary can increase its throughput by continually forcing the target to avoid reducing the congestion window. The MAX path exploration module found a total of 1,096 paths that avoid reducing the congestion window. During adversarial concrete execution, MAX automatically deduced that, in order to avoid reducing the congestion window, it should not reflect the ECN bit in its packets if that bit is set to 1. By comparing the throughput, we were able to verify that MAX indeed discovers the ECN manipulation attack.

Confirming the robustness of the nonce fix. Next, we modified the Qualnet ECN implementation to protect against this attack using the nonce fix. In this fix, a sender inserts a randomly generated nonce in its packets, and congested routers simply delete the nonce to indicate congestion. When the sender receives an acknowledgement from the receiver, it reduces the congestion window if the nonce in the received packet differs from the original. Intuitively, the receiver cannot guess the nonce correctly often enough to impact throughput.

We ran MAX on this modified version of ECN, but hid the nonce state at the sender in order to emulate the capability of an adversary who could not guess the nonce value. The path exploration module found the same possible code paths as before. However, during system-level concrete execution, MAX was unable to reliably force execution down these paths due to the need to reconstruct the nonce value. Hence the attack was not successful, and MAX was able to confirm that the fix was effective in practice.

Attack    Time to find feasible path    Total paths found    Interposition delay
NAV       6.6s                          16,008               63.75ms
RTS       1.65s                         147,528              3591.6ms
ECN       0.233s                        1,096                12.75ms
OptACK    0.273s                        2 million            4843.6ms

Table 4.1: Computational Footprint of MAX

4.4.4 Computational Footprint.

Finally, we quantify the overheads of MAX.
Table 4.1 presents statistics about the computational overhead of MAX's path exploration and attack emulator modules. As can be seen from the numbers, interposition delay is directly proportional to the number of paths found. This makes intuitive sense, since during concrete execution MAX searches through feasible paths to find satisfying assignments. The average time to find a single feasible path is low for TCP's optimistic ACKing attack and the RTS attack, since it is amortized over a large number of paths. The interposition delay for both of these attacks is uncommonly high as well. In the case of the RTS attack, since we are working with the Qualnet simulator, this delay does not impact the protocol's performance or behavior, only the time taken to run the simulations. In the case of TCP, however, it can impact throughput, and as we have seen from the graphs, it does. But since this delay is more or less constant for all packets, TCP adapts to it, and it does not cause TCP timeouts. Thus, even though large, this delay does not affect TCP's behavior, only its performance.

Hence we showed that MAX is a practical tool that can be used on real implementations and simulators alike to detect, understand, and evaluate possible manipulation attacks.

4.5 Summary

In this chapter, we have described the design of MAX, a tool that makes novel use of symbolic and system-level concrete execution for automatic exploration of manipulation attacks. MAX is general, in that it allows exploration of arbitrary protocol implementations written in C. MAX scales to real-world implementations of TCP, but is flexible enough to accommodate simulator implementations of protocols such as 802.11. It is able to identify a variety of manipulation attacks in these protocols, when given a network configuration and a set of vulnerable statements that a developer might suspect to be susceptible to manipulation attacks.
Chapter 5

Literature Review

In this chapter we present a survey of related work. We broadly divide our review into three distinct topics, each corresponding to research related to the systems Pleiades, FSMGen, and MAX presented in this dissertation. First, we discuss programming languages in parallel and distributed computing. We then discuss the use of finite state machines in sensor networks, and tools to derive finite state machines from programs. Last, we describe tools to find network attacks and program errors in implementations.

5.1 Programming Languages in Parallel and Distributed Computing

Pleiades is related to many programming concepts developed in parallel and distributed computing. We classify these into three broad categories: embedded and sensor systems languages, concurrent and distributed systems languages, and parallel programming languages.

5.1.1 Embedded and Sensor Networks Languages

Several researchers have explored programming languages for expressing the global behavior of applications running on a network of less-constrained 32-bit embedded devices (e.g., iPAQs). Pleiades's programming model borrows from our earlier work on Kairos [28], an extension to Python that also provides support for iterating over nodes and accessing node-local state. However, Kairos does not support automatic code migration or serializability. Kairos provides support for application-specific recovery mechanisms [29], which Pleiades lacks. SpatialViews [67] is an extension to Java that supports an expressive abstraction for defining and iterating over a virtual network. In SpatialViews, control flow migrates to nodes that meet the application requirements. To avoid concurrency errors, SpatialViews restricts the programming model within iterators. Regiment [64] is a functional programming language for centrally programming sensor networks that models all sensor data generated within a programmer-specified region as a data stream.
Regiment is a purely functional language, so the compiler can potentially optimize program execution extensively according to the network topology. On the other hand, since the language is side-effect-free, it does not support the ability to update node-local state. For example, the car parking application would be much harder to write in Regiment. TinyDB [54] provides a declarative interface for centrally manipulating the data in a sensor network. This interface makes certain applications reliable and efficient, but it is not Turing-complete. Because TinyDB lacks support for arbitrary computation at nodes, it cannot easily be used to implement the kinds of applications we support, like car parking. Research on Abstract Regions [82] provides local-neighborhood abstractions for simplifying node-level programming. This work is focused on programmability and efficiency and does not provide support for consistency or reliability.

MacroLab [35] provides a MATLAB-like interface for programming sensor networks. It uses matrices to represent values of node-local variables and sensors across the network. The distribution of functionality over the nodes is done based on a user-specified cost function. The user also has to specify which variables need to be synchronized across the network. There is no support for node failures. Flask [56] is a functional language for programming sensor networks. It has a data-flow programming model and allows for staged programming of wireless sensor networks. DSN [14] is a declarative programming framework for sensor networks. It uses the language Snlog, which is a dialect of the deductive database query language Datalog. ATaG [68] is a programming framework for sensor networks that promotes a mixed imperative and declarative approach. It uses the concept of tasks, which are units of local computation described in an imperative fashion. The communication among tasks is described by the programmer in a declarative manner.
5.1.2 Concurrent and Distributed Systems

Argus [52] is a distributed programming language for constructing reliable distributed programs. Argus allows the programmer to define concurrent objects and guarantees their atomicity and recovery through a nested transactions facility, but makes the programmer responsible for ensuring serializability across atomic objects and for handling any application-level deadlocks. Recently, composable Software Transactional Memory (STM) [31] has been proposed as an abstraction for reliable and efficient concurrent programming. Also, Atomos [11] is a new programming language with support for implicit transactions and strong atomicity features. Our cfor construct, with its serializability semantics and nesting ability, is designed in a similar spirit: a concurrency primitive with simplicity, efficiency, reliability, and composability as goals. Unlike these systems, however, Pleiades derives concurrency from a set of loosely coupled, distributed, resource-constrained nodes. Therefore, the Pleiades implementation of cfor emphasizes message and memory efficiency over throughput or latency. For the same reason, it uses a simple distributed locking algorithm for serializability and a novel low-state algorithm for distributed deadlock detection and recovery. Pleiades' cfors are also similar to atomic sections in Autolocker [58] in that both implementations use strict two-phase locking. But Autolocker guarantees the absence of deadlocks through pessimistic locking, while Pleiades uses an optimistic locking model in which locks are acquired or upgraded as needed, and any deadlocks are detected and recovered from by the runtime.

Approaches to automatic generation of distributed programs have also been explored. For example, Coign [36] is a system for automatically partitioning coarse-grained components. MagnetOS [53] also has support for partitioning a program written to a single-system-image abstraction.
A program transformation approach for generating multi-tier applications from sequential programs is described in [62]. All these systems are primarily meant for partitioning and distributing programs into coarse-grained components that can then be run concurrently on multiple nodes. Pleiades differs from these systems in generating nesC programs with fine-grained nodecuts and supporting lightweight control flow migration across such nodecuts.

5.1.3 Parallel Processing Languages

Pleiades differs from prior parallel and concurrent programming languages such as Linda [25] and Split-C [17] by obviating the need for explicit locking and synchronization code. Pleiades also differs from automatic parallelization languages such as High Performance Fortran [43] by equipping the compiler and runtime with serializability facilities. This is because parallel programming languages focus on data parallelism on mostly symmetric processors, leaving to the programmer the responsibility of ensuring deadlock and livelock freedom at the application level. On the other hand, Pleiades offers task-level parallelism, where data sharing among sensor nodes is common, and where it is desirable to offload the correct implementation of concurrency to the compiler and runtime.

5.2 Finite State Machines

A number of works have supported and inspired our hypothesis that FSMs are good high-level abstractions of event-driven sensor network programs. Kasten et al. [37] present the design of OSM, a programming language that allows the programmer to directly implement sensor networks as finite state machines. Kim et al. [41] propose SenOS, a state-machine-based execution environment for sensor networks. We tackle a different problem than both of these works: rather than trying to build a new programming architecture or operating system based on FSMs, we attempt to abstract programs written in the currently popular sensor network programming architectures into FSMs.
Other work requires FSM specifications to perform validation of sensor network programs. For example, Archer et al. [4] use FSMs to represent correct usage specifications of TinyOS interfaces, enforcing these specifications at run time. Lighthouse [75] uses FSM specifications in order to statically analyze dynamic memory usage in SOS [30] applications. Both of these systems require FSMs to be provided by the user. Our work could be used to infer FSMs as input to these and other kinds of static and dynamic analysis tools for sensor networks.

Tools for program analysis of sensor network programs have also been developed recently. cXprop [16] is an abstract interpreter built over CIL [60], designed for TinyOS. It allows users to define their own value-propagation analyses in the spirit of conditional constant propagation, using abstract value domains. cXprop contains a symbolic execution module, which manages the abstract values in the program state. cXprop uses a conservative concurrency model for nesC/TinyOS in order to track the state of shared variables. In contrast, we use an optimistic approximation of the concurrency and execution model of TinyOS, since our final goal differs from that of cXprop. Safe TinyOS [15] is a tool built using cXprop, which provides memory safety for TinyOS. KleeNet [71] is another program analysis tool; it uses the KLEE symbolic execution engine to run TinyOS programs on symbolic input, injecting non-deterministic failures like node reboots. It generates distributed execution paths that provide high coverage, leading to the discovery of interaction bugs before deployment.

Within the programming languages community, there is a large body of related work on deriving FSMs from programs. Ammons et al. [3] profile multiple executions of an application and then employ machine learning on the resulting execution traces to infer an FSM. Static techniques are closer to our work. In particular, works by Alur et al.
[2] and Henzinger et al. [32] employ forms of symbolic execution and predicate abstraction to infer FSMs. However, the goal of these works is to derive a temporal specification for a single component, which indicates the sequences of function calls that do not cause the component to crash (or throw an exception). This temporal specification can then be fed to automatic verification tools for checking clients of the component [5, 33]. Our differing goals lead to different design decisions. For example, these works drive the construction of an FSM according to the ways in which the component can throw an exception, while we drive the construction of an FSM according to the application's control flow. Also, we perform minimization to make the resulting FSM user-readable, while this is not a concern for those works. Finally, we handle the event-driven and asynchronous constructs of TinyOS, while these works are implemented for mainstream languages (Java and C).

5.3 Tools to Detect Errors and Network Attacks

To our knowledge, MAX is the first tool that can detect manipulation attacks in network protocol implementations. However, symbolic and concrete execution have been used to detect many other kinds of errors and vulnerabilities in software systems. We compare with those works that are most closely related.

A notion of gullibility for network protocols was defined in prior work [77]. Gullibility is broader than our definition of manipulation attacks, since it admits any adversarial strategy for violating desired protocol properties. For example, gullibility can encompass unilateral protocol violations on the part of one of the participants; in MAX we do not consider such violations. In [77] the authors described a prototype tool that attempts to detect gullibilities in protocols implemented in Mace [40], using a user-provided set of attack strategies.
We have introduced the use of symbolic execution to automatically produce attack strategies given a set of vulnerable statements, which greatly lessens the burden on the user. We have also demonstrated that MAX can scale to real and complex protocol implementations and can detect real attacks.

Several systems focus on finding and protecting against other classes of network attacks. Elcano [7] automatically generates signatures to filter out inputs that exploit known vulnerabilities. In contrast to MAX, Elcano assumes that an initial exploit exposing the vulnerability is known and uses this information, along with a form of symbolic execution, to generate the vulnerability signature. SAFER [12] is a static analysis tool that identifies vulnerabilities to DoS attacks using a form of taint analysis. Both of these tools target attacks that require only a single execution of the vulnerable statement. MAX targets a more subtle set of attacks which typically require repeated manipulation, which motivates our use of adversarial concrete execution. Further, neither of these tools has been shown to handle protocol implementations of the size and complexity of TCP.

EXE [9] is a tool that generates inputs of death, which cause a program to crash. Brumley et al. present a technique that automatically generates exploits from software patches that target input validation vulnerabilities [13]. While the underlying techniques, in particular symbolic execution, are similar to those used in MAX, they are used for very different ends. As above, the errors detected in these works require only a single execution to be successful, and MAX's notion of adversarial concrete execution has no analogue in these systems.

CrystalBall [84] is a tool that detects inconsistencies in running distributed systems. It executes the given distributed system and, in parallel, feeds its state to a model checker to find possible future inconsistent states.
It also uses heuristics to steer the execution away from such states. MAX is currently designed for use during testing and development, but it would be interesting to explore the use of adversarial concrete execution to detect manipulation attacks in a running production system, or to divert system execution to avoid such attacks. CrystalBall is implemented for distributed systems written in Mace, while MAX works on arbitrary protocol implementations written in C.

Chapter 6

Conclusions

In this dissertation, we demonstrated that program analysis tools and techniques can be used effectively to address programming challenges faced in building reliable, efficient, and robust networked systems. We showed that by leveraging domain knowledge and simple user insights, it is possible to design language abstractions and program analysis techniques that significantly reduce the burden on the networked system programmer.

We presented Pleiades, a programming framework for wireless sensor networks. Pleiades enables a sensor network programmer to implement an application as a central program that has access to the entire network. Pleiades programs are understandable, energy-efficient, reliable, and robust. Pleiades thus addresses the programming challenges of distributing functionality over multiple nodes and handling application/domain-specific constraints. We demonstrated that our implementation of Pleiades can run realistic applications on memory-limited motes.

We addressed the problem of inferring compact, user-readable FSMs for applications and system components from TinyOS programs. Our FSMGen tool uses symbolic execution and predicate abstraction to statically analyze implementations in order to infer FSMs. We tested FSMGen on a number of applications and system components and found that the inferred FSMs capture the functionality of the target applications quite well, and reveal interesting (potential) program errors.
FSMGen thus contributes towards making networked systems more reliable and robust, and addresses the programming challenges of finding gaps between programmer intent and actual functionality, and potential vulnerabilities in code.

We described MAX, a tool that uses symbolic and system-level concrete execution for automatic exploration of manipulation attacks in network protocol implementations in C. We showed that MAX scales to real-world implementations of TCP, and is also flexible enough to accommodate simulator implementations of protocols such as 802.11. We demonstrated that MAX is able to identify a variety of manipulation attacks in these protocols, with some inputs from the developer. MAX thus helps developers identify potential vulnerabilities to manipulation attacks in network protocol implementations, and contributes towards making these implementations robust to manipulation attacks.

Bibliography

[1] Optimistic TCP acknowledgements can cause denial of service. Vulnerability Note VU 102014, US-CERT, 2005.

[2] Rajeev Alur, Pavol Černý, P. Madhusudan, and Wonhong Nam. Synthesis of interface specifications for Java classes. SIGPLAN Not., 40(1), 2005.

[3] Glenn Ammons, Rastislav Bodik, and James R. Larus. Mining specifications. In Proc. of POPL, 2002.

[4] Will Archer, Philip Levis, and John Regehr. Interface contracts for TinyOS. In Proc. of IPSN, 2007.

[5] Thomas Ball and Sriram K. Rajamani. The SLAM project: Debugging system software via static analysis. In Proc. of POPL, 2002.

[6] John Bellardo and Stefan Savage. 802.11 denial-of-service attacks: Real vulnerabilities and practical solutions. In SSYM'03: Proceedings of the 12th Conference on USENIX Security Symposium, pages 2–2, Berkeley, CA, USA, 2003. USENIX Association.

[7] Juan Caballero, Zhenkai Liang, Pongsin Poosankam, and Dawn Song. Towards generating high coverage vulnerability-based signatures with protocol-level constraint-guided exploration. In Proceedings of RAID, 2009.
[8] Cristian Cadar, Daniel Dunbar, and Dawson Engler. KLEE: Unassisted and automatic generation of high-coverage tests for complex systems programs. In OSDI, 2008.

[9] Cristian Cadar, Paul Twohey, Vijay Ganesh, and Dawson Engler. EXE: A system for automatically generating inputs of death using symbolic execution. In Proceedings of the 13th ACM Conference on Computer and Communications Security (CCS), 2006.

[10] Gruia Calinescu, Cristina G. Fernandes, and Bruce Reed. Multicuts in unweighted graphs with bounded degree and bounded tree-width. LNCS, 1998.

[11] Brian D. Carlstrom, Austen McDonald, Hassan Chafi, JaeWoong Chung, Chi Cao Minh, Christos Kozyrakis, and Kunle Olukotun. The Atomos transactional programming language. In PLDI, 2006.

[12] Richard Chang, Guofei Jiang, Franjo Ivancic, Sriram Sankaranarayanan, and Vitaly Shmatikov. Inputs of coma: Static detection of denial-of-service vulnerabilities. In CSF '09: Proceedings of the 2009 22nd IEEE Computer Security Foundations Symposium, 2009.

[13] D. Chen, J. Deng, and P. K. Varshney. Protecting wireless networks against a denial of service attack based on virtual jamming. In MobiCom Poster, 2003.

[14] David Chu, Lucian Popa, Arsalan Tavakoli, Joseph M. Hellerstein, Philip Levis, Scott Shenker, and Ion Stoica. The design and implementation of a declarative sensor network system. In SenSys, pages 175–188, 2007.

[15] Nathan Cooprider, Will Archer, Eric Eide, David Gay, and John Regehr. Efficient memory safety for TinyOS. In Proc. of SenSys, 2007.

[16] Nathan Cooprider and John Regehr. Pluggable abstract domains for analyzing embedded software. In Proc. of LCTES, 2006.

[17] David E. Culler, Andrea C. Arpaci-Dusseau, Seth Copen Goldstein, Arvind Krishnamurthy, Steven Lumetta, Thorsten von Eicken, and Katherine A. Yelick. Parallel programming in Split-C. In Supercomputing, 1993.

[18] Manuvir Das, Sorin Lerner, and Mark Seigle. ESP: Path-sensitive program verification in polynomial time. In Proc. of PLDI, 2002.
[19] Leonardo Mendonça de Moura and Nikolaj Bjørner. Z3: An efficient SMT solver. In TACAS, pages 337–340, 2008.

[20] Ahmed K. Elmagarmid. A survey of distributed deadlock detection algorithms. SIGMOD Rec., 1986.

[21] David Ely, Neil Spring, David Wetherall, Stefan Savage, and Tom Anderson. Robust congestion signaling. In Proceedings of IEEE ICNP 2001, pages 332–341, 2001.

[22] Cormac Flanagan, K. Rustan M. Leino, Mark Lillibridge, Greg Nelson, James B. Saxe, and Raymie Stata. Extended static checking for Java. In Proceedings of the ACM SIGPLAN '02 Conference on Programming Language Design and Implementation, June 2002.

[23] Christian Frank and Kay Römer. Algorithms for generic role assignment in wireless sensor networks. In Proc. of SenSys, 2005.

[24] David Gay, Philip Levis, Robert von Behren, Matt Welsh, Eric Brewer, and David Culler. The nesC language: A holistic approach to networked embedded systems. In Proc. of PLDI, 2003.

[25] David Gelernter and Nicholas Carriero. Coordination languages and their significance. Commun. ACM, 1992.

[26] Omprakash Gnawali, Ben Greenstein, Ki-Young Jang, August Joki, Jeongyeup Paek, Marcos Vieira, Deborah Estrin, Ramesh Govindan, and Eddie Kohler. The Tenet architecture for tiered sensor networks. In Proceedings of the 4th ACM International Conference on Embedded Networked Sensor Systems (SenSys '06), Boulder, Colorado, November 2006.

[27] Patrice Godefroid, Nils Klarlund, and Koushik Sen. DART: Directed automated random testing. In PLDI '05: Proceedings of the 2005 ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 213–223. ACM, 2005.

[28] Ramakrishna Gummadi, Omprakash Gnawali, and Ramesh Govindan. Macroprogramming wireless sensor networks using Kairos. In DCOSS, 2005.

[29] Ramakrishna Gummadi, Nupur Kothari, Todd Millstein, and Ramesh Govindan. Declarative failure recovery for sensor networks. In AOSD, 2007.

[30] Chih-Chieh Han et al. A dynamic operating system for sensor nodes. In Proc. of MobiSys, 2005.
[31] Tim Harris, Simon Marlow, Simon Peyton-Jones, and Maurice Herlihy. Composable memory transactions. In PPoPP, 2005.

[32] Thomas A. Henzinger, Ranjit Jhala, and Rupak Majumdar. Permissive interfaces. In Proc. of ESEC/FSE, 2005.

[33] Thomas A. Henzinger, Ranjit Jhala, Rupak Majumdar, and Gregoire Sutre. Lazy abstraction. In POPL '02: Proceedings of the 29th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pages 58–70, 2002.

[34] J. Hill et al. System architecture directions for networked sensors. SIGOPS Oper. Syst. Rev., 2000.

[35] Timothy W. Hnat, Tamim I. Sookoor, Pieter Hooimeijer, Westley Weimer, and Kamin Whitehouse. MacroLab: A vector-based macroprogramming framework for cyber-physical systems. In SenSys, pages 225–238, 2008.

[36] Galen C. Hunt and Michael L. Scott. The Coign automatic distributed partitioning system. In OSDI, 1999.

[37] Oliver Kasten and Kay Römer. Beyond event handlers: Programming wireless sensors with attributed state machines. In Proc. of IPSN, 2005.

[38] Pete Keleher, Alan L. Cox, and Willy Zwaenepoel. Lazy release consistency for software distributed shared memory. In ISCA, 1992.

[39] Sarfraz Khurshid, Corina Păsăreanu, and Willem Visser. Generalized symbolic execution for model checking and testing. In Tools and Algorithms for the Construction and Analysis of Systems, volume 2619 of Lecture Notes in Computer Science, pages 553–568. 2003.

[40] Charles Killian, James W. Anderson, Ryan Braud, Ranjit Jhala, and Amin Vahdat. Mace: Language support for building distributed systems. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, 2007.

[41] T. Kim and S. Hong. State machine based operating system architecture for wireless sensor networks. Parallel and Distributed Computing: Applications and Technologies, 2004.

[42] James C. King. Symbolic execution and program testing. Commun. ACM, 19(7), 1976.

[43] Charles Koelbel. An overview of High Performance Fortran.
SIGPLAN Fortran Forum, 11(4), 1992.

[44] Nupur Kothari, Ramakrishna Gummadi, Todd Millstein, and Ramesh Govindan. Reliable and efficient programming abstractions for wireless sensor networks. In Proc. of PLDI, 2007.

[45] Nupur Kothari, Todd Millstein, and Ramesh Govindan. Deriving state machines from TinyOS programs using symbolic execution. In IPSN '08: Proceedings of the 7th International Conference on Information Processing in Sensor Networks, pages 271–282, 2008.

[46] Nupur Kothari, Todd Millstein, and Ramesh Govindan. Deriving state machines from TinyOS programs using symbolic execution. In IPSN, 2008.

[47] Lakshman Krishnamurthy, Robert Adler, Phil Buonadonna, Jasmeet Chhabra, Mick Flanigan, Nandakishore Kushalnagar, Lama Nachman, and Mark Yarvis. Design and deployment of industrial sensor networks: Experiences from a semiconductor plant and the North Sea. In SenSys, 2005.

[48] Christopher Kruegel, Engin Kirda, Darren Mutz, William Robertson, and Giovanni Vigna. Automating mimicry attacks using static binary analysis. In SSYM'05: Proceedings of the 14th Conference on USENIX Security Symposium, pages 11–11, Berkeley, CA, USA, 2005. USENIX Association.

[49] B. J. Kuipers and Y.-T. Byun. A robust qualitative method for spatial learning in unknown environments. In Proceedings of the 7th National Conference on Artificial Intelligence (AAAI-88), Saint Paul, Minnesota, July 1988.

[50] P. Levis, D. Gay, and D. Culler. Bridging the gap: Programming sensor networks with application specific virtual machines. Technical Report UCB//CSD-04-1343, UC Berkeley, 2004.

[51] Philip Levis et al. The emergence of networking abstractions and techniques in TinyOS. In Proc. of NSDI, 2004.

[52] Barbara Liskov. Distributed programming in Argus. Commun. ACM, 31(3), 1988.

[53] Hongzhou Liu, Tom Roeder, Kevin Walsh, Rimon Barr, and Emin Gün Sirer. Design and implementation of a single system image operating system for ad hoc networks. In MobiSys, 2005.

[54] S. Madden, M. J. Franklin, J.
M. Hellerstein, and W. Hong. The design of an acquisitional query processor for sensor networks. InSIGMOD2003. [55] Ratul Mahajan, Steven M. Bellovin, Sally Floyd, John Ioannidis, Vern Paxson, and Scott Shenker. Controlling high bandwidth aggregates in the network. SIGCOMMComput.Com- mun. Rev., 32(3):62–73, 2002. [56] Geoffrey Mainland, Greg Morrisett, and Matt Welsh. Flask: Staged functional program- ming for sensor networks. In ICFP ’08: Proceeding of the 13th ACM SIGPLAN interna- tionalconferenceonFunctionalprogramming, pages 335–346, New York, NY , USA, 2008. ACM. 127 [57] Miklos Maroti, Branislav Kusy, Gyula Simon, and Akos Ledeczi. The flooding time syn- chronization protocol. In Proceedings of 2nd ACM International Conference on Embedded Networked Sensor Systems(SenSys’04), Baltimore, Maryland, November 2004. [58] Bill McCloskey, Feng Zhou, David Gay, and Eric Brewer. Autolocker: synchronization inference for atomic sections. InPOPL’06: Conferencerecordofthe33rdACMSIGPLAN- SIGACT symposiumonPrinciplesofprogramminglanguages, pages 346–358, 2006. [59] Vishnu Navda, Aniruddha Bohra, and Samrat Ganguly. Using channel hopping to increase 802.11 resilience to jamming attacks. InIEEEInfocom Minisymposium, pages 2526–2530, 2007. [60] George C. Necula, Scott McPeak, S.P. Rahul, and Westley Weimer. CIL: Intermediate language and tools for analysis and transformation of C programs. InProc.ofCCC, 2002. [61] A. Nerode. Linear automaton transformations. Proc. of the American Mathematical Soci- ety, 9, August 1958. [62] Matthias Neubauer and Peter Thiemann. From sequential programs to multi-tier applica- tions by program transformation. InPOPL2005. [63] James Newsome and Dawn Song. Dynamic taint analysis for automatic detection, analysis, and signature generation of exploits on commodity software. InNDSS, 2005. [64] R. Newton and M. Welsh. Region Streams: Functional macroprogramming for sensor net- works. InProc. of DMSN2004. [65] Ryan Newton, Arvind, and Matt Welsh. 
Building up to macroprogramming: An intermedi- ate language for sensor networks. InIPSN2005. [66] Ryan Newton, Arvind, and Matt Welsh. Building up to macroprogramming: an intermedi- ate language for sensor networks. InProc.ofIPSN, 2005. [67] Yang Ni, Ulrich Kremer, Adrian Stere, and Liviu Iftode. Programming ad-hoc networks of mobile and resource-constrained devices. InPLDI2005. [68] Animesh Pathak, Luca Mottola, Amol Bakshi, Viktor K. Prasanna, and Gian Pietro Picco. A compilation framework for macroprogramming networked sensors. In DCOSS’07: Pro- ceedings of the 3rd IEEE international conference on Distributed computing in sensor sys- tems, pages 189–204, Berlin, Heidelberg, 2007. Springer-Verlag. [69] P. Pradhan, S. Kandula, W. Xu, A. Shaikh, and E. Nahum. Daytona: A user-level tcp stack. http://nms.lcs.mit.edu/kandula/data/daytona.pdf, 2003. [70] Nithya Ramanathan, Kevin Chang, Rahul Kapur, Lewis Girod, Eddie Kohler, and Deborah Estrin. Sympathy for the sensor network debugger. InProc.ofSenSys, 2005. [71] Raimondas Sasnauskas, Olaf Landsiedel, Muhammad Hamad Alizai, Carsten Weise, Stefan Kowalewski, and Klaus Wehrle. Kleenet: discovering insidious interaction bugs in wireless sensor networks before deployment. InProc.ofIPSN, pages 186–196, 2010. 128 [72] Stefan Savage, Neal Cardwell, David Wetherall, and Tom Anderson. Tcp congestion control with a misbehaving receiver. SIGCOMM Comput. Commun. Rev., 29(5):71–78, 1999. [73] Koushik Sen, Darko Marinov, and Gul Agha. Cute: a concolic unit testing engine for c. SIGSOFT Softw. Eng. Notes, 30(5), 2005. [74] Cory Sharp, Shawn Schaffert, Alec Woo, Naveen Sastry, Chris Karlof, Shankar Sastry, and David Culler. Design and implementation of a sensor network system for vehicle track- ing and autonomous interception. In Proceedings of 2nd European Workshop on Wireless Sensor Networks (EWSN), Sydney, Australia, January 2005. [75] Roy Shea, Shane Markstrum, Todd Millstein, Rupak Majumdar, and Mani B. Srivastava. 
Static checking for dynamic resource management in sensor network systems. Technical Report TR-UCLA-NESL-200611-02, UCLA, 2006. [76] Donald Shoup. New York Times Op-Ed: Gone Parkin’,http://www.nytimes.com/ 2007/03/29/opinion/29shoup.html. [77] Milan Stanojevic, Ratul Mahajan, Todd Millstein, and Madanlal Musuvathi. Can you fool me? towards automatically checking protocol gullibility. InProc.ofHotNets, 2008. [78] Aaron Stump, Clark W. Barrett, and David L. Dill. CVC: A oooperating validity checker. InProc. of CAV, 2002. [79] Nikolai Tillmann and Jonathan de Halleux. Pexwhite box test generation for .net. In Tests and Proofs, volume 4966 of LectureNotes inComputer Science, pages 134–153. 2008. [80] Gilman Tolle and David Culler. Design of an application-cooperative management system for wireless sensor networks. InProc.ofEWSN, 2005. [81] Gilman Tolle, Joseph Polastre, Robert Szewczyk, David Culler, Neil Turner, Kevin Tu, Stephen Burgess, Todd Dawson, Phil Buonadonna, David Gay, and Wei Hong. A macro- scope in the Redwoods. InSenSys 2005. [82] M. Welsh and G. Mainland. Programming sensor networks using abstract regions. InNSDI ’04: Proceedings of the First Symposium on Networked Systems Design and Implementa- tion, pages 29–42, 2004. [83] Yichen Xie, Andy Chou, and Dawson Engler. Archer: using symbolic, path-sensitive anal- ysis to detect memory access errors. InProc.ofESEC/FSE, 2003. [84] Maysam Yabandeh, Nikola Knezevic, Dejan Kostic, and Viktor Kuncak. Crystalball: pre- dicting and preventing inconsistencies in deployed distributed systems. InNSDI’09, 2009. [85] Jing Yang, Mary Lou Soffa, Leo Selavo, and Kamin Whitehouse. Clairvoyant: A compre- hensive source-level debugger for wireless sensor networks. InProc.ofSenSys, 2007. 129
Abstract
Networked systems play an important role in our lives. Ranging from the Internet to new and upcoming domains like wireless sensor networks, smartphones, and data centers, they are transforming the way we use computing. For networked systems to be of practical use, they need to be reliable, efficient, and robust. Building such systems poses a number of programming challenges.
Linked assets
University of Southern California Dissertations and Theses
Conceptually similar
Reliable languages and systems for sensor networks
Robust routing and energy management in wireless sensor networks
Techniques for efficient information transfer in sensor networks
Realistic modeling of wireless communication graphs for the design of efficient sensor network routing protocols
Efficient and accurate in-network processing for monitoring applications in wireless sensor networks
Gradient-based active query routing in wireless sensor networks
A protocol framework for attacker traceback in wireless multi-hop networks
Rate adaptation in networks of wireless sensors
Reliable and power efficient protocols for space communication and wireless ad-hoc networks
Transport layer rate control protocols for wireless sensor networks: from theory to practice
Models and algorithms for energy efficient wireless sensor networks
Robust and efficient geographic routing for wireless networks
On location support and one-hop data collection in wireless sensor networks
Collaborative detection and filtering of DDoS attacks in ISP core networks
Cooperation in wireless networks with selfish users
Global analysis and modeling on decentralized Internet
Dynamic routing and rate control in stochastic network optimization: from theory to practice
Understanding and exploiting the acoustic propagation delay in underwater sensor networks
Aging analysis in large-scale wireless sensor networks
Networked cooperative perception: towards robust and efficient autonomous driving
Asset Metadata
Creator: Kothari, Nupur (author)
Core Title: Language abstractions and program analysis techniques to build reliable, efficient, and robust networked systems
School: Viterbi School of Engineering
Degree: Doctor of Philosophy
Degree Program: Computer Science
Publication Date: 11/29/2010
Publisher: University of Southern California (original), University of Southern California. Libraries (digital)
Tag: network protocols, OAI-PMH Harvest, program analysis, programming languages, symbolic execution, wireless sensor network
Language: English
Contributor: Electronically uploaded by the author (provenance)
Advisor: Govindan, Ramesh (committee chair), Krishnamachari, Bhaskar (committee member), Millstein, Todd (committee member), Prasanna, Viktor K. (committee member)
Creator Email: nkothari@usc.edu, nupurk@gmail.com
Permanent Link (DOI): https://doi.org/10.25549/usctheses-m3565
Unique identifier: UC1428117
Identifier: etd-Kothari-4055 (filename), usctheses-m40 (legacy collection record id), usctheses-c127-411062 (legacy record id), usctheses-m3565 (legacy record id)
Legacy Identifier: etd-Kothari-4055.pdf
Dmrecord: 411062
Document Type: Dissertation
Rights: Kothari, Nupur
Type: texts
Source: University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Repository Name: Libraries, University of Southern California
Repository Location: Los Angeles, California
Repository Email: cisadmin@lib.usc.edu