Topology generation for protocol testing: methodology and case studies
(USC Thesis Other)
TOPOLOGY GENERATION FOR PROTOCOL TESTING: METHODOLOGY AND CASE STUDIES

by Ganesha Bhaskara

A Dissertation Presented to the FACULTY OF THE USC GRADUATE SCHOOL, UNIVERSITY OF SOUTHERN CALIFORNIA, In Partial Fulfillment of the Requirements for the Degree DOCTOR OF PHILOSOPHY (ELECTRICAL ENGINEERING)

May 2010

Copyright 2010 Ganesha Bhaskara

Acknowledgments

I would like to thank my guides Dr. Sandeep Gupta and Dr. Ahmed Helmy for their guidance, support and encouragement, without which this work would not have been possible. I would also like to thank Terry Benzel for her encouragement and support. I would also like to thank my parents for their unconditional love and support. I would like to thank my dear friends Shesha and Deepa for hosting me for a whole month during my stay in Los Angeles at the time of my thesis defense. Last but not least, I would like to thank my dear wife Jai for believing in me and sharing the ups and downs during my graduate school.

Table of Contents

Acknowledgments
List of Figures
List of Tables
Abstract
Chapter 1: Introduction
Chapter 2: Motivation and problem statement
  2.1: Motivation
  2.2: Problem statement
  2.3: Utility of necessary topology conditions
  2.4: Chapter 2 summary
Chapter 3: Background
  3.1: Topology model
  3.2: Partial topologies
  3.3: Extended communicating finite state machine
    3.3.1: Formal model
    3.3.2: Types of eCFSMs
    3.3.3: Advantages and disadvantages
  3.4: Global eCFSM (GCFSM)
    3.4.1: Formal model
    3.4.2: Packet delivery and atomic operation
  3.5: State space of GCFSM
    3.5.1: Behavior of GCFSMs
    3.5.2: Reachability based on target behavior
  3.6: Chapter 3 summary
Chapter 4: Methodology
  4.1: Necessary topology conditions on a given topology
  4.2: Necessary topology conditions on a given topology without state space exploration
  4.3: Necessary topology conditions on all topologies without state space enumeration
    4.3.1: Transitive closure graph
    4.3.2: Paths representing the target behavior in TCG
    4.3.3: Generating necessary conditions from a path in the TCG
    4.3.4: Optimizing the TCG
    4.3.5: Handling loops in TCG
  4.4: COMPRESS methodology
  4.5: Chapter 4 summary
Chapter 5: Relationship of COMPRESS to related work
  5.1: Overview of the testing problem
  5.2: Expressiveness of property under test
  5.3: Comparison of COMPRESS with other existing methodologies
  5.4: Chapter 5 summary
Chapter 6: Analysis of COMPRESS
  6.1: Constructs used in the proof
  6.2: Proof of completeness of COMPRESS
    6.2.1: Baseline
    6.2.2: Completeness of TCG
    6.2.3: Completeness of pruned TCG
    6.2.4: Completeness of topology generation
    6.2.5: Generalization for multiple eCFSM state transitions in target behavior
    6.2.6: Generalization for multiple AEEs
  6.3: Complexity of COMPRESS
  6.4: Tightness of necessary conditions
  6.5: Interpreting negative results
  6.6: Chapter 6 summary
Chapter 7: Evaluation of COMPRESS and analysis of results
  7.1: Case studies I: Static topologies
    7.1.1: Multicast based micro mobility protocol
      7.1.1.1: Target behaviors studied for M&M protocol
      7.1.1.2: Results and data
    7.1.2: Resource discovery protocols
      7.1.2.1: Simple resource discovery protocol
      7.1.2.2: Augmented zero-conf protocol
    7.1.3: Client server based protocols
      7.1.3.1: TCP: same port, same DST
      7.1.3.2: TCP port spray
      7.1.3.3: HTTP
  7.2: Insights into topology generation and COMPRESS
  7.3: Advantages and limitations of COMPRESS
    7.3.1: Advantages of COMPRESS
    7.3.2: Limitations of COMPRESS
  7.4: Chapter 7 summary
Chapter 8: Extensions of COMPRESS methodology
  8.1: Case Study II: Topology dynamics
  8.2: Handling packet loss semantics in COMPRESS
  8.3: Handling Delay Semantics in COMPRESS
  8.4: Chapter 8 summary
Chapter 9: Guidelines for using COMPRESS
  9.1: Topology model
    9.1.1: Vertices
    9.1.2: Fields
    9.1.3: Edges
    9.1.4: Packet delivery rules
    9.1.5: Local, global and path constraints
  9.2: Constraints imposed by COMPRESS
  9.3: Protocol model
  9.4: Partial topologies
  9.5: Target behavior
  9.6: Construction of topology model used in M&M case study
  9.7: Chapter 9 summary
Chapter 10: Contributions and future work
  10.1: Contributions
  10.2: Future work
    10.2.1: Extend and augment COMPRESS framework
    10.2.2: Relationship of design and testing to open processes
  10.3: Chapter 10 summary
References

List of Figures

Figure 1: Protocol analysis using reachability in state space on a given topology
Figure 2: Manual selection of topology during testing
Figure 3: Number of equivalence classes of configurations for a simple LAN topology
Figure 4: A typical eCFSM table
Figure 5: Mapping of address across layers
Figure 6: Local and global state transitions
Figure 7: Path in GCFSM state space exhibiting the target behavior
Figure 8: Compact representation of GCFSM state transition for a given topology
Figure 9: Necessary topology conditions for a partial GCFSM state transition to occur
Figure 10: Paths representing target behavior
Figure 11: Compact representation per instance of topology
Figure 12: TCG of a networked system with 4 types of allowed eCFSM types
Figure 13: Path representing the TCG
Figure 14: Sample transitive closure graph
Figure 15: Compact representation of packet forwarding paths
Figure 16: Optimized representation of TCG
Figure 17: Paths in global state and TCG
Figure 18: COMPRESS methodology
Figure 19: Path in the GCFSM state space of a topology instance transformed to a TCG
Figure 20: Structure of transitive closure graph generated by COMPRESS
Figure 21: Reconstruction of GCFSM paths using topology independent TCG and ti
Figure 22: Unrolling of loops in paths representing the target behavior in TCG*
Figure 23: Tradeoff between quality of reachability based answers and complexity
Figure 24: Client and server FSMs for mobility detection and handover protocol
Figure 25: Pictorial representation of necessary topology conditions for TB 5 and 7
Figure 26: Effectiveness of COMPRESS for target behaviors 1,2,3 for M&M protocol
Figure 27: Effectiveness of COMPRESS for target behaviors 4,5,6,7 for M&M protocol
Figure 28: Effectiveness of COMPRESS as percentage of eliminated topologies
Figure 29: State machine of a simple resource discovery protocol
Figure 30: Pictorial representation for necessary conditions for target behavior 2
Figure 31: Effectiveness of COMPRESS for resource discovery on LAN
Figure 32: Pictorial representation of necessary conditions for TB 4
Figure 33: Performance of COMPRESS for Enterprise wide resource discovery protocol
Figure 34: Effectiveness of COMPRESS for enterprise wide resource discovery protocol
Figure 35: FSM of client server protocol exchanging a sequence of messages
Figure 36: Necessary topology conditions generated by COMPRESS
Figure 37: Performance of COMPRESS for TCP based client server protocol
Figure 38: Pictorial representation of necessary topology conditions
Figure 39: Performance of COMPRESS for TCP based client server protocol
Figure 40: Necessary topology conditions for HTTP based client server protocol
Figure 41: Performance of COMPRESS for HTTP based client server protocol
Figure 42: Dependence of occurrence of target behavior
Figure 43: Comparison between COMPRESS and full state space / topology space analysis
Figure 44: Modeling finite topology dynamics
Figure 45: Physical host model with and without virtualization
Figure 46: FSMs of virtual and physical hosts tracking handovers as states
Figure 47: ECRIT protocol state machine
Figure 48: Topologies satisfying necessary and sufficient conditions (ECRIT)
Figure 49: Effectiveness of COMPRESS for ECRIT case study
Figure 50: Modeling loss and lossy links
Figure 51: Abstract delay semantics modeled in COMPRESS
Figure 52: BS with two interfaces modeled as two vertices with single interface
Figure 53: Physical and virtual edges in an instance of topology
Figure 54: Forwarding of unicast IP packet at IP layer
Figure 55: Undesirable mapping of virtual edge to infinite physical paths
Figure 56: Examples of representing partial topologies

List of Tables

Table 1: Approaches to handling network topology in protocol testing
Table 2: Naive approach vs. COMPRESS approach
Table 3: Comparison of handling of local paths between the naive and COMPRESS approaches
Table 4: Summary of results of case studies
Table 5: Target behaviors for which topologies were generated for variants of M&M
Table 6: Necessary conditions generated by COMPRESS for M&M
Table 7: Performance indicators of COMPRESS for M&M
Table 8: Target behaviors studied for resource discovery protocol on LAN
Table 9: Necessary topology conditions generated by COMPRESS for target behaviors
Table 10: Number of paths evaluated for resource discovery on LAN
Table 11: Target behavior used for topology generation for extended ZeroConf protocol
Table 12: Necessary conditions generated by COMPRESS
Table 13: Number of paths evaluated to generate necessary topology conditions
Table 14: Number of paths evaluated to generate necessary topology conditions
Table 15: Target behaviors studied for ECRIT protocol
Table 16: Necessary topology conditions generated by COMPRESS for ECRIT protocol
Table 17: Number of paths evaluated to generate necessary topology conditions for ECRIT
Table 18: Summary of results of case studies

Abstract

One of the key steps in the design or refinement of an application or network protocol is testing the protocol for correctness and performance.
Simulation and state space exploration are the most popular ways of testing protocols. Simulation tools such as NS and OPNET allow the designer to test and evaluate a protocol's performance on fully specified instances of topologies. State space exploration tools like SPIN allow the designer to test the protocol state space on fully specified instances of topologies for the desired properties. Correctness or (target) behavior of protocols can be expressed using designer-specified reachability properties and standard structural properties (invariants on states), whereas performance of protocols is often expressed as an aggregate over states of the protocol (average throughput, average packet loss, etc.). The scope of this study is limited to the correctness of control or housekeeping functions of protocols.

Correctness and performance of many protocols and distributed applications in Internet Protocol (IP) based networks depend on the network topology, i.e., on the end and intermediate nodes, their physical and logical (MAC, IP, transport, etc.) inter-connectivity, as well as the relationship between the control fields of the different nodes. Theoretically, since topology controls packet delivery, it affects the state space of the system and hence must be considered as an input during testing. Though end-to-end topology abstractions are routinely used to evaluate and test protocols, they are not sufficient, as specific topology configurations can drive the performance of the protocol to unacceptable levels or cause the protocol to misbehave. For example, certain topology configurations may cause the throughput of the protocol to decrease by orders of magnitude (microwave interference for the 802.11 protocol), or the node requesting an IP address may not get an IP address (specific configurations of tunnels), or a node that is supposed to detect mobility may not detect mobility (in the presence of virtual machines).
Not relying only on the high-level abstraction of the end-to-end topology is especially important today due to the large-scale use of intermediate nodes that do not adhere to the end-to-end principles [salt84]. When America Online (AOL) introduced a load-balancing transparent HTTP proxy for its users, a great majority of its subscribers were unable to use the AOL chat client to communicate with others on the other side of the proxy. Detecting such topology-induced errors in protocols after deployment can be a costly affair due to the large-scale disruption and its associated support costs. In short, topology can severely affect protocol performance and correctness, and hence the topology space on which the protocol is expected to execute needs to be characterized.

To thoroughly test IP based protocols and applications, for each protocol behavior or protocol property, the testing procedure must iterate implicitly or explicitly over the set of all topology configurations (potentially large in the case of IP) on which the protocol is expected to execute. Formally, for each behavior B:

  ∀ topologies t_i ∈ T_valid, the set of all valid topologies
    ∀ initial states j of the system on topology t_i
      ∀ permutations x of Autonomous External Events (AEE)
        ∀ permutations k of injecting AEE_x into t_{i,j}
          check if behavior B occurs in the state space of t_{i,j,k}^{AEE_x}

Today, simulation and state space enumeration tools are used to test a protocol on a specific topology instance. In the above procedure, the loops within the outermost topology enumeration loop, i.e., those representing state space enumeration over a given topology instance, have been extensively studied in [alur97], [bing04], [clar00], [henz00], [lai02], [leiy02] and [scho98]. Though this work has to deal with the complexity of state space enumeration indirectly, addressing the state space enumeration problem itself is not a part of this work.
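The nested quantifiers above can be read as a brute-force search. The following is a minimal sketch of that reading on a toy system, not the tooling used in this work: the three node names, the two initial states, the event names, and the `behavior_occurs` callback are all hypothetical stand-ins, and injecting an AEE is assumed to mean choosing a node for each event.

```python
from itertools import combinations, permutations, product

NODES = ("a", "b", "c")                      # hypothetical node names
ALL_LINKS = tuple(combinations(NODES, 2))    # 3 possible undirected links


def all_topologies():
    """Yield every subset of the possible links (2**3 = 8 toy topologies)."""
    for size in range(len(ALL_LINKS) + 1):
        for links in combinations(ALL_LINKS, size):
            yield frozenset(links)


def naive_search(events, behavior_occurs, initial_states=("idle", "busy")):
    """Visit every (topology, initial state, event order, injection) case.

    behavior_occurs is a caller-supplied stand-in for the per-case
    reachability check that a state space exploration tool would perform.
    """
    visited = 0
    witnesses = []
    for topo in all_topologies():                       # outermost loop: t_i
        for state in initial_states:                    # initial state j
            for order in permutations(events):          # AEE permutation x
                # assumed meaning of "injection": one target node per event
                for injection in product(NODES, repeat=len(events)):
                    visited += 1
                    if behavior_occurs(topo, state, order, injection):
                        witnesses.append((topo, state, order, injection))
    return visited, witnesses


# Toy behavior: assume it can occur only when the link (a, b) exists.
visited, witnesses = naive_search(
    ("join", "leave"),
    lambda topo, state, order, injection: ("a", "b") in topo,
)
print(visited, len(witnesses))
```

Even with three nodes and two events, the loops visit 8 x 2 x 2 x 9 = 288 cases, and the topology count alone doubles with every additional possible link, which is the combinatorial growth the outermost loop suffers from.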
Any testing procedure, be it simulation or state space enumeration, has to deal with the very large space of scenarios that can exist in the behavior space. Addressing the large scenario space is beyond the scope of this work. In the above procedure, the outermost loop, which enumerates individual instances of topologies in the topology configuration space, will incur impractical runtime complexity due to the combinatorial explosion in the number of topology instances when the number of components that make up the topology and/or their configurations is very large. Further, to ensure thorough testing, the set of topologies on which the protocol can misbehave must be used during testing. Thus, an approach that can differentiate between the set of topologies on which the behavior occurs and the set on which the behavior does not occur can be very useful in reducing the complexity arising from the need to consider all topology configurations to ensure thorough testing.

Though several studies ([anag02], [guna07], [heid01], [helm00], [helm04-1], [pete03-1] and [pete03]) have attempted to study the influence of topology on protocol behavior, a methodology to systematize the selection or generation of topologies that can be used for effective protocol testing has received little attention. This has often resulted in errors in protocols triggered by specific topology configurations going unnoticed at design time, only to show up after deployment. The focus of this work is to generate necessary topology conditions that any topology on which the target behavior occurs must satisfy. These conditions can be used to systematize topology coverage during protocol testing, by eliminating the large class of topologies that do not satisfy the necessary conditions and focusing the simulation and state space enumeration efforts on the topology space on which the problematic behavior occurs.
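As a rough illustration of how necessary conditions narrow the topology space, the sketch below filters an enumerated set of toy topologies with two made-up conditions. The four node names and the `has_link` conditions are hypothetical and are not conditions produced by this work; a topology that fails any necessary condition cannot exhibit the target behavior and can be skipped.

```python
from itertools import combinations


def enumerate_topologies(nodes):
    """Yield every set of undirected links over the given nodes."""
    links = list(combinations(nodes, 2))
    for size in range(len(links) + 1):
        for chosen in combinations(links, size):
            yield frozenset(chosen)


def has_link(u, v):
    """Build a hypothetical necessary condition: link u-v must be present."""
    return lambda topo: (u, v) in topo or (v, u) in topo


def prune(topologies, necessary_conditions):
    """Keep only topologies that satisfy every necessary condition."""
    return [t for t in topologies
            if all(cond(t) for cond in necessary_conditions)]


topologies = list(enumerate_topologies("abcd"))        # 2**6 = 64 topologies
candidates = prune(topologies, [has_link("a", "b"), has_link("c", "d")])
eliminated = 1 - len(candidates) / len(topologies)
print(len(topologies), len(candidates), eliminated)
```

Because the conditions are only necessary, the surviving candidates may still include topologies on which the behavior never occurs; simulation or state space enumeration is then applied to the candidates alone.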
To generate the necessary topology conditions, the procedure must implicitly or explicitly consider all topology configurations and, for each topology, determine whether the target behavior occurs or not. A direct approach that uses reachability analysis on each topology configuration has very high complexity due to the potentially large state space for each instance of the topology configuration. There is a trade-off between the computational and space complexity of practical search techniques and their accuracy. On one hand, the complexity of the problem can be reduced by restricting the properties of the systems to those for which efficient decidability is guaranteed [emer05]. On the other hand, trade-offs can be made between the quality of the solution (closeness to sufficient conditions and/or decidability) and complexity. Two of the most common techniques used in practical fault detection are state space folding (abstraction) and sampling [youn89]. State space folding, obtained by abstracting details of the system, introduces false positives for error detection, whereas the sampling technique introduces false negatives for error detection.

To generate necessary topology conditions that a topology must satisfy for the target behavior to occur, a methodology called “Compressed Representation of State Space for topology configuration generation” (COMPRESS) is developed. It is based on the novel combination of two basic ideas: (a) represent complex topologies as a composition of simple end-to-end topology component abstractions derived from packet destination types (e.g., IP unicast, IP broadcast, etc.), and (b) use the information already embedded in the protocol FSMs (folded state space) to drive the composition of the end-to-end topology abstractions to obtain more complex, but relevant, topologies.
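Since COMPRESS builds on the folded state space, the earlier contrast between folding and sampling is worth making concrete. The sketch below uses two invented transition systems; the state numbering, the `s // 2` folding function, and the "follow only the first successor" rule (a deterministic stand-in for random sampling) are assumptions made purely for illustration.

```python
def reachable(edges, start, pick=None):
    """Return the set of states reachable from start.

    pick, if given, selects which successors to follow; it is used here
    as a deterministic stand-in for sampling the state space.
    """
    seen, frontier = {start}, [start]
    while frontier:
        state = frontier.pop()
        successors = sorted(edges.get(state, ()))
        if pick is not None:
            successors = pick(successors)
        for nxt in successors:
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen


def fold(edges, abstraction):
    """Merge concrete states that map to the same abstract state."""
    folded = {}
    for state, successors in edges.items():
        folded.setdefault(abstraction(state), set()).update(
            abstraction(nxt) for nxt in successors)
    return folded


ERROR = 4

# Folding: state 4 is NOT reachable from 0, but merging states pairwise
# connects the abstract states, so the folded search reports it anyway.
safe_system = {0: {1}, 1: {2}, 3: {4}}
merge = lambda s: s // 2
false_positive = (ERROR not in reachable(safe_system, 0)
                  and merge(ERROR) in reachable(fold(safe_system, merge), merge(0)))

# Sampling: state 4 IS reachable from 0, but following only the first
# successor of each state never explores the branch that leads to it.
buggy_system = {0: {1, 2}, 2: {4}}
false_negative = (ERROR in reachable(buggy_system, 0)
                  and ERROR not in reachable(buggy_system, 0, pick=lambda s: s[:1]))

print(false_positive, false_negative)
```

Folding over-approximates reachability, so it can only err toward reporting behaviors that do not occur, which is consistent with a folded representation yielding necessary rather than sufficient conditions.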
For a given behavior, COMPRESS indirectly, i.e., without explicit topology enumeration, identifies necessary conditions satisfied by the set of all topologies on which that behavior may occur, without solving the reachability problem. To our knowledge, this is the first work that addresses the problem of generating necessary topology conditions for a target behavior to occur.

The usefulness of the COMPRESS methodology is evaluated using case studies of several different classes of protocols, including a client server protocol, a resource discovery protocol, an emergency location information server protocol and a multicast based micro-mobility protocol. For each protocol, behaviors that include errors that appear to the end user either as 100% performance degradation or as unexpected behavior are studied. The necessary topology conditions generated by COMPRESS were highly non-intuitive and revealed unexpected or severe errors, often triggered by small topology configurations. It is also shown how the necessary topology conditions generated by COMPRESS can not only be used to effectively and efficiently characterize the problematic topology space causing that behavior, by eliminating a very large set of uninteresting topologies (up to 99.4%), but also serve as a powerful tool to characterize the severity of the problematic behavior itself. Through these case studies, it is shown that the topology conditions generated are highly non-intuitive, close to sufficient conditions and practically useful, and that the practical runtime complexity is quite manageable.

The contributions of this work are multifaceted. The first contribution is articulating and characterizing the role of topology in systematic protocol testing. This also includes understanding the fundamental properties of the components that make up the solution space of the topology generation problem and the properties of possible solution approaches.
The second contribution is the COMPRESS methodology, especially the topology augmented TCG data structure, the extensions to handle topology dynamics and the theoretical framework characterizing the properties (including limitations) of the results generated by COMPRESS. The third contribution is the set of case studies that not only illustrate the applicability and usefulness of the COMPRESS methodology, but also reveal unexpected behavior in existing and proposed protocols in the presence of specific topology configurations.

Chapter 1: Introduction

Distributed applications follow a set of rules, also called a protocol, in order to communicate over the network and achieve their objective. The specification of a protocol or a distributed application can be tested¹ for either correctness or performance using tools like Simple Promela Interpreter (SPIN [holz97]), Network Simulator (NS2 [isiu07]), Optimized Network Engineering Tools (OPNET [opne07]), Qualnet [qual07], etc. Correctness of protocols is usually expressed as state space properties such as livelock, deadlock and reachability. Performance of protocols is often expressed as an aggregate over states (throughput, delay, etc.). The focus of this work is on the correctness of protocols, especially the control and housekeeping functions of protocols. The correctness and performance of many protocols and distributed applications in Internet Protocol (IP) based networks depend on the network topology, i.e., on the end and intermediate nodes, their physical and logical (MAC, IP, transport, etc.) inter-connectivity, as well as the relationship between the control fields of the different nodes. Theoretically, since topology controls packet delivery, it affects the state space of the system and hence must be considered as an input during testing.
Though end to end topology abstractions are routinely used to evaluate and test protocols, they are not sufficient, as specific topology configurations can drive the performance of the protocol to unacceptable levels or cause the protocol to misbehave. For example, certain topology configurations may cause the throughput of the protocol to decrease by orders of magnitude (microwave interference for the 802.11 protocol), or a node requesting an IP address may not get one (specific configurations of tunnels), or a node that is supposed to detect mobility may not detect it (in the presence of virtual machines). Not relying only on the high level end to end topology abstraction is especially important today because of the large scale use of intermediate nodes that do not adhere to the end-to-end principles [salt84].

¹ Though conformance testing is important, it is not a part of this study. The focus here is on developing tools and methodologies that can be used by the designer at design time.

Many protocol specific studies have shown that topologies play an important role in understanding the behavior of protocols. The use of dumbbell topologies for transmission control protocol (TCP) simulation studies has received significant attention in the past, and studies like [ebra05] have shown that many interesting scenarios are missed when dumbbell topologies are used. [helm04-1] has shown how topology configurations affect the worst case behavior of the timer suppression mechanism in multicast protocols. Non-tree topologies causing sub-optimal routing in Cellular IP are studied in [bhas03] and [baif04]. Many real world examples indicate that topology can have a drastic impact on both the correctness and the performance of protocols. TCP's throughput on a wireless link with microwave interference may go to zero.
Distributed sensor network performance varies widely with the time of day, depending on which other radio sources are activated by human activity. When America On Line (AOL) introduced a load balancing transparent hypertext transfer protocol (HTTP) proxy for its users, a great majority of its subscribers were unable to use the AOL chat client to communicate with others on the other side of the proxy. Detecting such topology induced errors in protocols after deployment can be costly because of the large scale disruption and the associated support costs. In short, topology can severely affect protocol performance and correctness, and hence the topology space on which the protocol is expected to execute needs to be characterized.

From the above examples, it is clear that topology affects protocol behavior and hence is a critical input when testing a protocol's correctness and performance. Given that topology can drastically affect both the correctness and the performance of protocols, a systematic study of the topology space is required for thorough testing of protocols. Being able to assert properties of the topologies on which the protocols are expected to execute (topology coverage) is one step towards the systematization of protocol testing. The focus of this work is to develop methodologies and techniques that the designer can use to explore the topology space to understand the behavior of network protocols.

This document is organized as follows. Chapter 2 motivates the research problem and defines the problem statement. In Chapter 3, the various definitions and concepts used in this work are introduced. The methodology used to address the research goal is developed in Chapter 4. The developed methodology is compared and contrasted with other approaches in Chapter 5. In Chapter 6, the completeness of the developed methodology is proved. Using case studies, the utility and effectiveness of the proposed methodology are demonstrated in Chapter 7.
In Chapter 8, the developed methodology is extended to handle simple delay semantics. The important contributions and future work are highlighted in Chapter 9.

Chapter 2: Motivation and problem statement

In this chapter, the ideal protocol testing procedure is first introduced. The practical limitations of such an approach are discussed and used to motivate the technical problem addressed in this work. Then, some of the applications of such a methodology are shown.

2.1: Motivation

To thoroughly test IP based protocols and applications, for each protocol behavior or protocol property, the testing procedure must iterate implicitly or explicitly over the set of all topology configurations (potentially large in the case of IP) on which the protocol is expected to execute. Formally, for each behavior B:

∀ topologies t_i ∈ T_valid, the set of all valid topologies
  ∀ initial states j of the system on topology t_i
    ∀ permutations x of Autonomous External Events (AEE)
      ∀ permutations k of injecting AEE_x into t_{i,j}
        check if behavior B occurs in the state space of t_{i,j,k}^{AEE_x}

Today, simulation and state space enumeration tools are used to test a protocol on a specific topology instance (Figure 1). In the above procedure, the loops within the outermost topology enumeration loop, i.e., those representing state space enumeration over a given topology instance, have been extensively studied in [alur97], [bing04], [clar00], [henz00], [lai02], [leiy02] and [scho98]. Though this work has to deal with the complexity of state space enumeration indirectly, addressing the state space enumeration problem itself is not a part of this work. Any testing procedure, be it simulation or state space enumeration, has to deal with the very large space of scenarios that can exist in the behavior space. Addressing the large scenario space is beyond the scope of this work.
Due to the lack of systematic methodologies that address topologies during protocol testing, a typical run in the protocol refinement loop today is as shown in Figure 2. Topologies are often selected manually based on the designer's intuition. Such a manual procedure is error prone and not systematic. Further, manual selection of topologies becomes intractable for a human as the number of variables in the system increases. Thus, abstraction and automation need to be used to systematize protocol testing.

Figure 1: Protocol analysis using reachability in state space on a given topology.

Figure 2: Manual selection of topology during testing.

In the ideal procedure, the outermost loop, which enumerates individual topology instances in the topology configuration space, incurs impractical runtime complexity because of the combinatorial explosion in the number of topology instances when the number of components that make up the topology and/or their configurations is very large. Figure 3 depicts the number of possible equivalence² classes of topology configurations for a simple LAN (wired and wireless) topology consisting of a client, a router and a variable number of DHCP servers.

Figure 3: Number of equivalence classes of configurations for a simple LAN topology.

² An equivalence relation is a binary relation R over a set S which is reflexive, symmetric and transitive. For a given connectivity and node cardinality, the equivalence relation used here is the relationship between the IP addresses of the different entities of the topology expressed using operators like “=”, “<” and “>”.

Further, to ensure thorough testing, the set of topologies on which the protocol can misbehave must be used during testing.
Thus, an approach that can differentiate between the set of topologies on which the behavior occurs and the set on which it does not can be very useful in reducing the complexity that arises from having to consider all topology configurations to ensure thorough testing. Though several studies ([anag02], [guna07], [heid01], [helm00], [helm04-1], [pete03-1] and [pete03]) have examined the influence of topology on protocol behavior, a methodology to systematize the selection or generation of topologies that can be used for effective protocol testing has received little attention. As a result, errors in protocols triggered by specific topology configurations have often gone unnoticed at design time, only to show up after deployment. The focus of this work is to generate necessary topology conditions that any topology on which the target behavior occurs must satisfy. These conditions can be used to systematize topology coverage during protocol testing, by eliminating the large class of topologies that do not satisfy the necessary conditions and focusing the simulation and state space enumeration efforts on the topology space on which the problematic behavior occurs.

2.2: Problem statement

Assuming protocols are modeled as finite state machines and topologies are modeled as graphs, an ideal topology testing procedure, shown below, differentiates between the set of all topologies on which the behavior occurs and the ones on which it does not.

Let T = ∅
∀ topologies t_i ∈ T_valid, the set of all valid topologies
  ∀ initial states j of the system on topology t_i
    ∀ permutations x of Autonomous External Events (AEE)
      ∀ permutations k of injecting AEE_x into t_{i,j}
        if behavior B occurs in the state space of t_{i,j,k}^{AEE_x}
          T = T ∪ {t_i}

When the above procedure terminates, T represents the set of all topologies on which the behavior B occurs. The goal of this work is to generate necessary topology conditions that every topology instance t_i ∈ T must satisfy.
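The ideal procedure of Section 2.2 can be sketched as a set of nested loops. The following is only an illustrative sketch: the function name and the callbacks `initial_states`, `injections` and `behavior_occurs` are hypothetical placeholders (they are not part of this work), and the explicit enumeration shown here is exactly the costly step that COMPRESS is designed to avoid.

```python
from itertools import permutations


def problematic_topologies(valid_topologies, initial_states, aee,
                           injections, behavior_occurs):
    """Return the set T of topologies on which behavior B occurs.

    All arguments are hypothetical stand-ins for the quantities in the
    procedure: valid_topologies is T_valid, initial_states(t) yields the
    initial states j, aee is the collection of autonomous external events,
    injections(t, j, x) yields the ways k of injecting the ordering x, and
    behavior_occurs(t, j, x, k) stands for a state space search.
    """
    result = set()
    for t in valid_topologies:
        # A topology belongs to T as soon as one combination exhibits B.
        if any(
            behavior_occurs(t, j, x, k)
            for j in initial_states(t)
            for x in permutations(aee)
            for k in injections(t, j, x)
        ):
            result.add(t)
    return result
```

Even in this toy form, the cost is the product of the sizes of all four loops multiplied by the cost of each state space search, which is why the outermost loop over topologies is the target of this work.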
To generate the necessary topology conditions, the procedure must implicitly or explicitly consider all topology configurations and, for each topology, determine whether the target behavior occurs. A direct approach that uses reachability analysis on each topology configuration has very high complexity because of the potentially large state space of each topology instance. Practical search techniques trade computational and space complexity against accuracy. On one hand, the complexity of the problem can be reduced by restricting attention to systems whose properties guarantee efficient decidability [emer05]. On the other hand, trade-offs can be made between the quality of the solution (closeness to sufficient conditions and/or decidability) and complexity. Two of the most common techniques in practical fault detection are state space folding (abstraction) and sampling [youn89]. State space folding, obtained by abstracting details of the system, introduces false positives in error detection, whereas sampling introduces false negatives. To generate necessary topology conditions that a topology must satisfy for the target behavior to occur, a methodology called “Compressed Representation of State Space for topology configuration generation” (COMPRESS), which generates necessary topology conditions without resorting to explicit topology enumeration, is developed.

2.3: Utility of necessary topology conditions

In Internet application and network protocol design, one of the strategies used to cope with the extremely large number of topology configurations is to design for an acceptable level of correctness, i.e., for the most common topology configurations. The notion of acceptability is subjective and highly context dependent.
For example, an error in a product shipped to millions of customers that can trigger calls to support centers from a large percentage of users can be extremely expensive compared to an error that could trigger calls from only a minute fraction of the users. Thus, the criticality of the errors and the nature of the fixes are determined by the context in which the errors are viewed. Sufficient topology conditions or topology instances represent a point in the problematic topology space, whereas necessary topology conditions can be used to characterize the relative size of the entire problematic topology space. Therefore, a methodology to generate necessary topology conditions for a given protocol behavior to occur is useful from a practical point of view.

Consider, for example, a transport protocol that guarantees in-order packet delivery. Violation of that property is deemed a correctness problem, which needs to be addressed either by protocol redesign or, if the problem occurs only on certain topologies, by limiting deployment of the protocol to topologies where in-order delivery is preserved. In this case, being able to generate an instance as well as a large class of topologies on which the property of interest is violated will help the designer make the appropriate decision. If the class of topologies on which the property of interest is violated is large, then the designer may choose to redesign the protocol. However, if the class of topologies on which the violation occurs is extremely small and the protocol can be deployed in a controlled/administered environment, then a decision may be made to prevent the error from occurring by avoiding the problematic topology configurations.

Consider another case in which the protocol guarantees probabilistic packet delivery, for example 90% of packets are guaranteed to be delivered.
Here, violation of reliable delivery of a single packet cannot by itself be classified as a correctness or a performance problem: violation of reliable packet delivery with a low probability can be seen as a performance problem, whereas a similar violation with a probability over a given threshold (10% in this case) can be seen as a correctness problem. In this case, being able to characterize the regions in topology space where the behavior of the probabilistic packet delivery protocol changes from a performance problem to a correctness problem will help the designer make an appropriate decision about addressing the problem. In the case where the problem occurs on a small number of frequently seen topology configurations, generating even a few of those topology instances is sufficient for the designer to make the appropriate decision. In the case where there are many instances of problematic topology configurations, generating necessary conditions satisfied by all problematic topologies may be useful to characterize the problem. Thus, the ability to generate behavior specific topology conditions, be they necessary or sufficient, will be a useful tool for the protocol's designer.

Consider a third case, where a protocol is being evaluated for its worst case behavior. If the worst case behavior occurs on topology configurations that very rarely occur in reality³, or occurs with a very low probability on commonly deployed topologies, then the designer may simply decide to leave the protocol unmodified, accepting that the worst case behavior may occur with a very low probability. However, if the same protocol needs to have a bounded worst case behavior and there exists even a single topology instance on which this bound is broken, the designer will be required to redesign the protocol so that the worst case behavior is within the stipulated bound.
The necessary conditions can be used to generate complete topologies on which the protocol behavior occurs, as well as to understand the cause and severity of the protocol behavior based on necessary and sufficient topology properties. The methodology is evaluated by generating the necessary conditions that topologies must satisfy for several behavioral scenarios of several client server based protocols, of an extended ZeroConf [stei05] like protocol, and of the mobility detection and handover mechanisms of the multicast-based micro-mobility protocol [helm04]. Later, it is shown how the developed methodology can be extended to incorporate finite topology dynamics, topology delay and packet loss semantics. To showcase the extensibility of the methodology, necessary topology conditions are generated for several behavioral scenarios of an IP based location resolution protocol [ecri06]. For most behaviors that were targeted, the COMPRESS approach not only generated highly non-intuitive topologies, but also helped characterize the behavior of the protocols at a lower overall complexity.

³ In a controlled deployment, characterization of the deployed topology configuration space may be possible.

2.4: Chapter 2 summary

In Chapter 1, the importance of topology in protocol testing was shown. In this chapter, the ideal protocol testing procedure was dissected to show that existing work addresses only the part of the procedure that deals with state space enumeration on an instance of topology. Based on this, the problem statement, i.e., “generation of necessary topology conditions that any topology on which the behavior occurs needs to satisfy”, is formally defined. Some of the challenges in pursuing such a goal are discussed, followed by a discussion of the utility of such a methodology.

Chapter 3: Background

In this chapter, important models, definitions and concepts used in this work are introduced.
3.1: Topology model

Topology is modeled as a graph G(V, E), where V is a set of vertices and E is a set of edges. Each vertex has a vertex type and address fields. The class to which a vertex belongs is determined by the vertex type. Each edge has an edge type field.

Each vertex has all of the following type fields:
• Phy: wired / radio
• MAC: End node / Base station (BS) / Local Area Network (LAN)
• IP
  ◦ Address_type: local/global, static/dynamic
  ◦ Node_type: End host / Gateway (GW) / Designated Router (DR) / Rendezvous Point (RP) / NAT
• Transport
  ◦ Type: TCP/UDP
  ◦ Functionality: Client / Server / Load Balancer
• Session
  ◦ Type: HTTP
  ◦ Functionality: Client / Server / Transparent Proxy / Transparent Load Balancing Proxy

Vertex address fields have all of the following fields:
• MAC
  ◦ Unicast, multicast, broadcast
• IP
  ◦ Unicast, multicast, sub-domain broadcast, sub-domain multicast, link local broadcast, link local multicast

Edges can be one of the following types:
• Phy: Wired / radio
• MAC: Unicast / broadcast / multicast
• IP: Unicast (transit / stub-(reg/vpn)) / multicast (domain specific) / sub-domain broadcast / sub-domain multicast / link local broadcast / link local multicast
• Transport: TCP / UDP
• Session: HTTP

Using the vertices and edges as defined above, topologies can be constructed. The rules that every topology must follow are captured as a set of rules called the topology building rules (TBR).
TBR specifies the following:
• Local Rules
  ◦ Vertex Rules
    ▪ Type Rules: Enumerate all valid type combinations across Phy, MAC, IP, Transport and Session + Protocol FSMs
    ▪ Address Rules: Specify the relationship between addresses within and across Phy, MAC, IP, Transport and Session (and Protocol FSMs)
    ▪ Type / Address Rules: For a given type of vertex, specify restrictions on the address
  ◦ Edge Rules: Given an edge of type E_type between vertex “x” and vertex “y”, the edge rules specify
    ▪ Types of nodes x, y that can have an edge E_type between them
    ▪ Relationship between x(type, address) and y(type, address) given the existence of E_type(x, y)
• Neighborhood Rules: For a given vertex x and edge type E_p, Neigh(x, E_p) = { y | there exists E_p(x, y) }
  ◦ Given a vertex type
    ▪ For each neighborhood defined by a set of edge_types
      • For each edge type
        ◦ For each vertex type
          ▪ Restriction on cardinality
        ◦ Restriction on cardinality
      • Restriction on cardinality
• Global Rules specific to a graph
  ◦ Vertices rules
    ▪ Relationship between vertex instances in the graph
      • Uniqueness of MAC address across all vertices
      • Uniqueness of mapping MAC to valid global addresses
      • Uniqueness of all valid global IP addresses across all vertices
    ▪ Restrictions on cardinality of node type
    ▪ Restrictions on cardinality of node type + address
  ◦ Edges rules
    ▪ Rules about the set of all edges in the topology graph
      • Restriction on cardinality/set of edge_type
        ◦ Single edge type at one extreme
        ◦ All edge types at the other extreme
  ◦ Path rules
    ▪ Existence of an edge Edge_x(a, b) → existence of a path (a, b) through edges of types defined by set { edge_types }
      • (analogous to layers of network)
    ▪ Number of paths between node x and node y
      • For each Edge_type
    ▪ Path patterns
      • End to end Unicast IP: stub – transit – stub

Following are some of the important topology rules at the MAC layer:
• Every non-packet forwarding eCFSM in the MAC layer has exactly one packet delivery
eCFSM (BS or LAN) in its neighborhood.
• A LAN can only have zero or more BSs and zero or more non-packet forwarding eCFSMs in its neighborhood.
• A BS must have exactly one LAN in its neighborhood and may have zero or more non-packet forwarding eCFSMs.

Following are some of the important properties at the IP layer:
• Every sub-domain has exactly one gateway
  ◦ In private networks, NAT acts as the gateway
• IP routing is assumed to be symmetric and shortest path
• It is assumed that there exists exactly one rendezvous point per domain.

Topology enumeration is defined as the process of enumerating instances of topologies expressed as G(V, E) such that each of them satisfies the TBR.

3.2: Partial topologies

The partial topology representation was newly developed as a part of this work. It is represented as <G(V, E), A>, where G(V, E) is a graph with V and E having the same structure as described earlier, and the annotations A represent additional rules that must be satisfied by the fully specified topology instance of which G(V, E) is a sub-graph. Annotations can be vertex, edge or neighborhood annotations:
• In a partial topology, the vertex address and type fields may not be fully specified. Vertex annotations limit the values the address and type fields can take.
• Edges in a partial topology also may not be fully specified. The edge annotations may limit the values the edge type may take and also assert relationships between the <address, type> fields of the vertices between which that edge exists.
• Neighborhood annotations enforce the following restrictions
  ◦ For each neighborhood defined by a set of edge_types
    ▪ For each edge type
      • For each vertex type
        ◦ Restriction on cardinality
      • Restriction on cardinality
    ▪ Restriction on cardinality

A partial topology PT(G(V, E), A) is said to be valid if there exists at least one fully specified topology G'(V', E') such that the graph G(V, E) is a sub-graph of G'(V', E') and G'(V', E') satisfies the annotations A.
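As an illustration of how topology building rules can be checked mechanically, the three MAC layer neighborhood rules listed in Section 3.1 can be written as a small validator. This is a minimal sketch under stated assumptions: the vertex kinds "END", "BS" and "LAN", the function name and the data layout are all hypothetical, "END" stands for any non-packet forwarding eCFSM, and only the three rules quoted in the text are checked.

```python
from collections import defaultdict

# Assumption: BS and LAN are the only MAC-layer packet delivery kinds.
PACKET_DELIVERY = {"BS", "LAN"}


def mac_rule_violations(kinds, edges):
    """Check the three MAC-layer neighborhood rules on a toy topology.

    kinds maps a vertex name to "END", "BS" or "LAN" (hypothetical labels);
    edges is an iterable of undirected MAC-layer edges (u, v).
    Returns a list of (vertex, reason) pairs, empty if all rules hold.
    """
    neigh = defaultdict(set)
    for u, v in edges:
        neigh[u].add(v)
        neigh[v].add(u)

    violations = []
    for vertex, kind in kinds.items():
        around = [kinds[n] for n in neigh[vertex]]
        delivery = sum(k in PACKET_DELIVERY for k in around)
        if kind == "END" and delivery != 1:
            violations.append((vertex, "needs exactly one BS or LAN neighbor"))
        elif kind == "LAN" and "LAN" in around:
            violations.append((vertex, "LAN may only neighbor BSs and end nodes"))
        elif kind == "BS" and around.count("LAN") != 1:
            violations.append((vertex, "needs exactly one LAN neighbor"))
    return violations
```

For example, a LAN with one base station, one wireless client attached to the base station and one wired client attached to the LAN passes all three rules, whereas additionally wiring the wireless client to the LAN violates the first rule.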
3.3: Extended communicating finite state machine

3.3.1: Formal model

All protocols are modeled as single port, infinite capacity buffer extended communicating finite state machines (eCFSMs). An eCFSM is defined as a 7-tuple <S, s0, E, O, f, V, B>, where S is a set of states, s0 is the initial state, E is the set of consumed messages, O is the set of output messages, f is the state transition function, V is a set of variables and B is a FIFO buffer of infinite capacity. The function f is represented as a state transition table and returns a next state, a set of output signals and an action list for each combination of a current state and a consumed message. The function f also includes a pre-transition predicate that must be satisfied by the consumed message and the current state of the eCFSM before the state transition can occur. It also asserts a post-transition predicate on the output message that is generated as a consequence of the state transition.

As shown in Figure 4, a typical entry in the state transition table is of the form [<consumed-message, current-state>, <pre-transition predicate>, <next-state, output-message>, <post-transition predicate>]. The pre-transition predicate is restricted to the equality and inequality operators. The post-transition predicate is restricted to assignment operators. The state transition function considers only the message at the head of the buffer.

Figure 4: A typical eCFSM table.

Further, the eCFSMs must satisfy the following rules:
• The state machine must have only a finite number of states and messages.
• The eCFSM must be deterministic.
• The eCFSM must consume exactly one message from the head of the buffer to undergo a state transition.
• The eCFSM, after a state transition, may generate at most one message.
• The next state of the eCFSM must depend only on the current state and the message it is about to consume.
• Consumed messages in an eCFSM can be either autonomous external events/messages (AEE) or messages generated by one of the eCFSMs.
• Output or emitted messages can never be AEEs.

To model network addressing and layering, each message is assumed to have an address part and a body part. The address fields consist of a source address and a destination address. eCFSM messages at the MAC, IP, transport (TCP, UDP) and session (HTTP) layers are modeled. The address fields are assumed to be one of the standard message destination types described in Section 3.1. Each of the messages is assumed to have the following structure:
• MAC
  ◦ MAC_body
  ◦ Src_MAC
  ◦ Dst_MAC
• IP
  ◦ IP_body
  ◦ Src_IP
  ◦ Dst_IP
• TCP/UDP
  ◦ TCP/UDP_body
  ◦ Src_port
  ◦ Dst_port
• HTTP
  ◦ HTTP_body
  ◦ HTTP_URI

For the sake of simplicity, option fields in protocol headers are not modeled. Protocol models can be extended by adding message headers and can be incorporated into the methodology as long as the eCFSM properties are satisfied. The mapping of the addresses from the session layer through the MAC layer, as defined by the TBR, is shown in Figure 5.

Figure 5: Mapping of address across layers.

3.3.2: Types of eCFSMs

To faithfully model the network and how it implements communication abstractions based on message destination types, two types of eCFSMs are modeled.
• Packet delivery eCFSMs: All state transitions depend exclusively on the addressing portion of messages and do not use or change the body of the message. They may have one or more states. The states often depend on the end state assumptions made about the functionality that the protocol implements, or may represent a set of possible manual configurations. For example, in this work, though routing protocols are not explicitly modeled, assumptions are made about their end state, i.e., routing is shortest path and symmetric. Different assumptions may lead to different end states. In many cases, the states also reflect the assumptions made about the topology.
In this study, the following packet delivery eCFSMs are modeled:
  ◦ MAC: LAN, BS
  ◦ IP: GW, DR, RP
  ◦ Transport: NATs
  ◦ HTTP: HTTP transparent proxy, HTTP transparent load balancing proxy
• Others: All state transitions depend on the body of the message and do not depend on or change the address of the message. They may have one or more states.

The addressing part of the messages is used for end point identification and for message delivery. It is also assumed that packet delivery, i.e., the network communication abstraction as implemented by the network, depends exclusively on the addressing part of the messages. All protocols or devices, like firewalls, stateful load balancers, etc., that do not adhere to this rule are assumed to be outside the scope of this study.

3.3.3: Advantages and disadvantages

Finite state machines and their various extensions are well studied and their properties well characterized [imme05]. Many network protocols can be modeled as extended communicating finite state machines. The types of protocols that can easily be modeled and analyzed using eCFSM models tend to be the ones that perform local decisions. Using eCFSM models also comes with many limitations. They cannot be used to model global co-ordination protocols, especially when the number of states in the protocol depends on the size of the topology. For example, the states in a routing protocol cannot be formally expressed in a topology agnostic way, as every topology will yield states specific to that topology. In such cases, the scope of the study could be limited to the functionality of the protocols that can be modeled using eCFSM semantics. In spite of all the assumptions that have been made about the eCFSM representation of protocols and eCFSM types, the set of protocols that can be studied is still large and represents an important class of protocols that often perform control or housekeeping functions.
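The eCFSM model of Section 3.3.1 can be made concrete with a small sketch. The class and field names below are hypothetical, and the sketch makes two simplifying assumptions that are not taken from the text: the transition table holds a single entry per <consumed-message, current-state> pair, and a message whose pre-transition predicate fails is dropped as a no-op.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Transition:
    pre: Callable[[dict, dict], bool]    # pre-transition predicate (variables, message)
    next_state: str
    output: Optional[str]                # name of the single output message, if any
    post: Callable[[dict, dict], dict]   # post-transition assignments for the output


@dataclass
class ECFSM:
    state: str                                       # current state, initially s0
    table: dict                                      # (message name, state) -> Transition
    variables: dict = field(default_factory=dict)    # V
    buffer: deque = field(default_factory=deque)     # B, used as a FIFO queue

    def step(self):
        """Consume the message at the head of the buffer; return the output or None."""
        if not self.buffer:
            return None
        msg = self.buffer.popleft()
        tr = self.table.get((msg["name"], self.state))
        if tr is None or not tr.pre(self.variables, msg):
            return None                  # irrelevant message: dropped, state unchanged
        self.state = tr.next_state
        if tr.output is None:
            return None
        out = {"name": tr.output}
        out.update(tr.post(self.variables, msg))
        return out


# Usage: a toy responder that answers a request addressed to it.
responder = ECFSM(
    state="idle",
    table={("REQ", "idle"): Transition(
        pre=lambda v, m: m["dst"] == v["addr"],
        next_state="busy",
        output="ACK",
        post=lambda v, m: {"src": v["addr"], "dst": m["src"]},
    )},
    variables={"addr": 1},
)
responder.buffer.append({"name": "REQ", "src": 2, "dst": 1})
reply = responder.step()
```

The pre-transition predicate here uses only an equality test and the post-transition step only assigns fields of the output message, mirroring the restrictions stated for the transition table.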
3.4: Global eCFSM (GCFSM)

3.4.1: Formal model

A network system made up of different vertices connected to each other can be represented as a graph, i.e., as a set of eCFSMs connected to each other by edges. Formally, it is a graph G(V, E) where
• Vertices V: a set of eCFSMs, each a 7-tuple <S, s0, E, O, f, V, B>
• Edges E: connectivity between vertices

Structurally, a topology and a GCFSM are equivalent. The state of a GCFSM includes, in addition to the topology, the individual state of each eCFSM as well as the state of the buffers of each eCFSM.

3.4.2: Packet delivery and atomic operation

When an eCFSM in the GCFSM consumes a message, undergoes a state transition and emits a message, that message needs to be delivered to the input buffers of the appropriate vertices in the topology. Given a topology T and a vertex “v” in T generating a message “m” with a specific message destination type, the packet delivery function PDF(T, v, m) yields the set of vertices “V” in T into whose input buffers the message has to be inserted. An atomic operation in a GCFSM is defined to include the following two steps:
• Exactly one eCFSM consumes a message, undergoes a state transition and may emit a message.
• If a message is emitted, that message is delivered to the input buffers of the eCFSMs determined by the PDF.

In a global eCFSM, there can be many eCFSMs capable of undergoing an atomic operation. “Enabled atomic operations” is defined as the set of all atomic operations that may occur at any given instant in time.

3.5: State space of GCFSM

Consider a GCFSM specified by a G(V, E). The global state of the GCFSM is defined by
• Static state of the GCFSM
  ◦ Instances of vertices and of the eCFSMs of the protocols they host
  ◦ Edges and other topology related static properties
• Dynamic state of the system
  ◦ States of individual eCFSMs
  ◦ Messages in the buffers of eCFSMs

Figure 6 shows how the state of the GCFSM evolves when an external event is injected into one of the eCFSMs.
Here individual eCFSMs of type [red, yellow, green, orange] along with the set of edges [E] represent the GCFSM. For clarity, only the non-empty buffers are shown. Initially, the GCFSM is in the state [1,2,3,4] with all buffers empty. Let the autonomous external event “A” be injected into the eCFSM in state “2”. At this point, there is exactly one eCFSM with a message in its buffer. An eCFSM state transition, also referred to as a local state transition, is said to be pending if there is a message in the eCFSM's buffer that can cause the eCFSM to undergo a state transition when it consumes that message. Messages at the head of the buffer queue that are irrelevant to the current state of the eCFSM are ignored, i.e., messages which, according to the state transition table of the eCFSM, initiate a “no-op” state transition are silently discarded. A state transition of the GCFSM occurs when a single eCFSM undergoes a state transition; if a message is generated, it is added to the buffers of the receiving eCFSMs. As shown in Figure 6, the message “A” causes an eCFSM state transition from “2” to “5”, resulting in the generation of message “a”, which is deposited in the buffers of the eCFSMs determined by the PDF.

Since each individual eCFSM has its own buffer and can consume the messages in its buffer independently of the actions or states of the other eCFSMs, there is inherent non-determinism in the evolution of the GCFSM (footnote 4). Figure 6 shows two of the three possible sequences of evolution of the GCFSM after the first local state transition. Though the local state transitions in a GCFSM can occur in parallel, they can be effectively represented by serializing all eCFSM state transitions and considering all possible sequences in which they can occur at every step. A GCFSM state transition is represented by two consecutive GCFSM states in the sequence of evolution of the GCFSM.
The space of all possible sequences of evolution of a GCFSM, defined by the static topology and the dynamic state of the component eCFSMs and buffers, is referred to as the enumerated GCFSM sequence space.

Footnote 4: Other models of the buffer, such as random access, can also lead to non-determinism and generate an enumerated state space different from that of the “buffer as queue” model. The COMPRESS methodology can handle these types of non-determinism without modification.

Figure 6: Local and global state transitions.

A GCFSM is said to move from one global state to another when exactly one (footnote 5) of the individual eCFSM instances undergoes a state transition, generates a message (if it does) and the generated message is received by all the components that are supposed to receive it. Thus, when a GCFSM moves from one global state to another, there is exactly one eCFSM component whose state may have changed compared to the previous global state and, if an output message was generated, one or more eCFSM components with an additional message in their buffers compared to the previous global state. The enumerated sequence space of the system modeled as a GCFSM is therefore guided by the state transitions of the individual eCFSMs in each of the nodes and by the sequences of messages and state transitions that can possibly occur. As multiple enabled eCFSM state transitions in the GCFSM can lead to multiple choices for the next step in the sequence, the GCFSM is inherently non-deterministic. Further, in reality, transmission, propagation and queuing delays also affect the time at which a state transition becomes enabled. Since delays are not explicitly modeled, the abstraction must account for any sequence induced by any possible value of the delays. This is effectively handled by exhausting all possible orderings in which atomic operations can be triggered, wherever a choice exists.
For a given set of external events, the state space of a GCFSM represents all ways in which the state of the GCFSM may evolve, starting from all possible initial states and for all possible places and sequences of injection of external events. The state space for a given GCFSM t is represented as G_GCFSMt. The sequencing possibilities of atomic operations are assumed to be exhaustively explored. It has been shown that a GCFSM with infinite state space can be trivially constructed [peng95]. Assuming infinite space and computation time, G_GCFSMt consists of directed trees of evolutions of the GCFSM, each rooted at the initial state and the first external event triggered. The structure thus described has the following properties:
• Two consecutive GCFSM states represent a GCFSM state transition resulting from
◦ A single eCFSM state transition
◦ The generated message being delivered to the appropriate eCFSMs in the system
• GCFSM state “b” is said to be the successor of “a” if “b” is a consequence of a single atomic operation in “a”.

Footnote 5: Since all parallel eCFSM state transitions can be expressed as combinations of sequences of the individual eCFSM state transitions, the states that the GCFSM can take after executing parallel eCFSM state transitions remain the same as those defined by the constrained GCFSM state transition used in this work.
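The atomic operation and the serialized exploration of enabled atomic operations can be illustrated with a minimal Python sketch. Everything concrete in it is an assumption made for illustration only: the transition tables, the three-node topology, the broadcast-to-neighbours packet delivery function `pdf`, and the exploration bound `limit` are hypothetical and are not taken from this work. The sketch also merges identical global states, so it produces a graph rather than the directed trees described above.

```python
from collections import deque

# Hypothetical transition tables:
# (state, consumed message) -> (next state, emitted message or None)
TABLES = {
    "YELLOW": {(2, "A"): (5, "a")},
    "GREEN": {(3, "a"): (7, "c")},
    "RED": {(1, "a"): (6, None), (6, "c"): (11, None)},
}
NODES = ("RED", "YELLOW", "GREEN")  # one instance per type (assumed)
EDGES = {("YELLOW", "GREEN"), ("YELLOW", "RED"), ("GREEN", "RED")}


def pdf(sender):
    """Assumed packet delivery function: broadcast to every neighbour."""
    return [n for n in NODES if (sender, n) in EDGES or (n, sender) in EDGES]


def successors(gstate):
    """Apply every enabled atomic operation to one global state."""
    states, buffers = gstate
    result = []
    for i, node in enumerate(NODES):
        if not buffers[i]:
            continue  # nothing to consume, so no atomic operation is enabled
        msg, rest = buffers[i][0], buffers[i][1:]  # buffer modeled as a queue
        # A message irrelevant to the current state is a no-op and is dropped.
        nxt, out = TABLES[node].get((states[i], msg), (states[i], None))
        new_states = states[:i] + (nxt,) + states[i + 1:]
        new_buffers = list(buffers)
        new_buffers[i] = rest
        if out is not None:
            for receiver in pdf(node):
                j = NODES.index(receiver)
                new_buffers[j] = new_buffers[j] + (out,)
        result.append((new_states, tuple(new_buffers)))
    return result


def enumerate_state_space(initial, limit=10_000):
    """Breadth-first exploration, bounded because termination is not guaranteed."""
    seen, frontier = {initial}, deque([initial])
    while frontier and len(seen) < limit:
        current = frontier.popleft()
        for nxt in successors(current):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen


# External event "A" injected into the buffer of the YELLOW instance.
INITIAL = ((1, 2, 3), ((), ("A",), ()))
SPACE = enumerate_state_space(INITIAL)
```

Each call to `successors` serializes the choice among enabled atomic operations, which is how the sketch stands in for arbitrary transmission and queuing delays.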
3.5.1: Behavior of GCFSMs

The behavior of a global eCFSM can be expressed as one of the following properties:
• Reachability to a state “x”, where “x” is expressed using:
◦ Syntax of the GCFSM
▪ Fully specified, including the states of all eCFSMs and buffers in the GCFSM
▪ Partially specified states of one or more of the eCFSMs and their buffers in the global eCFSM
◦ Syntax of individual eCFSMs
▪ Fully specified, including the state of buffers
▪ Partially specified, excluding the state of buffers
◦ Syntax between the GCFSM at one end and a single eCFSM at the other end
• Reachability to a sequence of states expressed using the above semantics
• Structural properties expressed as properties of the state space graph
◦ Properties of the structure of paths, e.g., livelocks, deadlocks etc.
◦ Looping and branching properties of the state space graph
• Aggregate properties
◦ Total number of times a state is reached in the state space graph (aggregate timeouts)
◦ Aggregate over a set of states (e.g., TCP throughput)

The simplest form of reachability in a state space is defined on GCFSM states. A GCFSM state “x” is said to be reachable if there exists a path in the state space from the root to “x”. Neither aggregate properties nor structural properties can be expressed using the syntax and semantics of eCFSMs alone, and neither kind of property is used to generate topology conditions in this work. The goal of this work is to generate necessary topology conditions; hence, behavior cannot be expressed as GCFSM states, as a GCFSM itself represents a topology. Therefore, in this work only behaviors that can be represented as a partially or fully specified state of an eCFSM instance, excluding the state of the buffers, are considered. The target behavior can have eCFSM state transition instances of the form <consumed message, current state → next state, output message>. As a starting point, a simple target behavior with a single eCFSM state transition instance caused by an external event is considered.
It is expressed as <AEE, current state → next state, output message> leading to one or more eCFSM state transition instances expressed as <consumed message, current state → next state, output message>. A typical example of a target behavior would be a set of two eCFSM state transition instances: (a) node x receiving a MAC handoff signal, relinquishing its IP address, changing its state from having a valid IP address to having an invalid IP address, and sending a request to find DHCP servers, followed by (b) node x receiving a DHCP address from the server, rejecting the assigned IP address (changing state from invalid IP to invalid IP), and sending a message releasing its new IP address. The structural interpretation of one eCFSM state transition causing the other is that the messages generated by the first must be responsible for the eventual generation of the message that causes the second eCFSM state transition. In later sections, the target behavior is generalized to include multiple eCFSM state transition instances caused by multiple external events. The main advantage of such a representation is that behaviors can be represented without knowing the full topology. However, this representation cannot express either the structural properties or the aggregate properties. Despite this limitation, the case studies show that many desired properties of certain systems can be expressed using such a representation. Though this type of specification of target behaviors may seem restrictive, it is routinely used in Test Driven Development (TDD) and Behavior Driven Development (BDD) [chel09] to specify user stories or scenarios.

3.5.2: Reachability based on target behavior

Given a topology and a target behavior expressed as a sequence of eCFSM state transition instances, reachability is one way of determining whether the target behavior occurs on the given topology.
Since two consecutive GCFSM states in the state space tree represent a GCFSM state transition which embeds an eCFSM state transition, determining whether the target behavior occurs for a given sequence of AEEs consists of the following steps:
• Generate the complete GCFSM state space (this generates one or more directed trees, each rooted at the initial state)
• Recognize all GCFSM state transitions embedding the eCFSM state transitions specified in the target behavior
• Find all ways in which the GCFSM state transitions of interest are connected to each other in all the directed trees representing the complete GCFSM state space. The existence of at least one path in the state space connecting the GCFSM state transitions of interest indicates that the target behavior occurs on the topology.

3.6: Chapter 3 summary

In this chapter, some of the basic concepts, such as topology, partial topology, extended communicating finite state machine (eCFSM), global eCFSM (GCFSM), atomic operations, GCFSM state transition and GCFSM state space, as well as the representation of target behaviors, were discussed. The partial topology representation was newly developed as a part of this work, whereas the others are based on prior work.

Chapter 4: Methodology

In this work, a methodology called “COMPressed Representation of State Space for topology generation” (COMPRESS), based on state space folding [youn89], is developed to generate the necessary conditions that topologies must satisfy for the target behavior to occur. To motivate the approach taken by the COMPRESS methodology, necessary topology conditions are first generated for a given instance of topology, and the restrictions are then progressively relaxed until the procedure can handle all topologies.

4.1: Necessary topology conditions on a given topology

The objective here is to generate the necessary topology conditions required for the target behavior to occur on a given instance of topology.
The necessary topology conditions include
• eCFSM types
• eCFSM cardinalities per type
• Edges between eCFSMs

For a given set of AEEs, the entire state space of the given instance of topology is explored to generate a set of trees, each rooted at the initial state of the topology. The behavior is expressed as one eCFSM state transition instance causing another eCFSM state transition instance. Given the state space, all paths representing the target behavior are enumerated. On any path representing the target behavior, every vertex in the path is fully specified (buffers, state). Each GCFSM state on the path has at most one successor and at most one predecessor. Two consecutive vertices represent a GCFSM state transition, which is a consequence of a single eCFSM atomic operation.

Property 1:
• Claim: In a GCFSM state transition a → b caused by the eCFSM state transition <m_i, p → q, m_j>, m_i is either an AEE or was generated in a GCFSM state transition that occurred earlier.
• Proof: Structurally, only two types of messages exist, namely external and internal. Structurally, an eCFSM state transition consumes only one message. Structurally, exactly one eCFSM state transition is responsible for a GCFSM state transition. By definition, an AEE cannot be generated by eCFSM state transitions. If the consumed message is an AEE, the claim is trivially true. Further, only an eCFSM state transition generates a message. If m_i is not an AEE, then it must have been generated by an eCFSM state transition <m_r, g → h, m_i> embedded in a GCFSM state transition l → m. Since m_i must have been generated before a → b, the transition l → m generating m_i must have occurred before a → b, as only one GCFSM state transition can occur at a given time on a path and every GCFSM state transition has at most one predecessor.
Property 2:
• Claim: In a GCFSM state transition a → b embedding the eCFSM state transition <m_i, p → q, m_j> of an instance of eCFSM type x, if “p” is not the initial state of type “x”, then the eCFSM was put in state “p” by a previous GCFSM state transition l → m embedding the eCFSM state transition <m_r, g → p, m_s> of the same instance.
• Proof: All eCFSMs are put into their initial states at the beginning of the state space enumeration process. The only way the state of an eCFSM can change is by being a part of an atomic operation. There is exactly one atomic operation causing a GCFSM state transition. Thus, in a GCFSM state transition a → b, the eCFSM undergoing the state transition <m_i, p → q, m_j> must have been put in state p by an earlier atomic operation which was also performed by the same eCFSM instance.

Property 3:
• Claim: For a given eCFSM state transition, the only previous GCFSM state transitions required are the one that embeds the eCFSM state transition generating the required message and the one that embeds the eCFSM state transition putting the eCFSM in its current state.
• Proof: A given eCFSM state transition occurs only when the eCFSM, in a given state, consumes the required message. Structurally, state and message are the only necessary conditions for an eCFSM state transition to occur. In a GCFSM path, the message is generated by a GCFSM state transition that occurred earlier in the path (Property 1) and the eCFSM in question is put into the right state by another GCFSM state transition (Property 2) which also occurred earlier in the path. Thus the claim is true.

Consider a topology consisting of one instance each of eCFSM types RED, YELLOW, GREEN and ORANGE. Let the target behavior be expressed as the eCFSM state transition instance <A, (2 → 5), a> on the instance of type YELLOW (start condition) causing the eCFSM state transition <c, (6 → 11), -> on the instance of type RED (end condition).
Let there exist at least one path (as shown in Figure 7) representing the target behavior in the state space of the topology thus described, after injecting the external event A into the buffer of the eCFSM of type YELLOW. In a path representing the target behavior, the last GCFSM state transition in the path embeds the end condition eCFSM state transition of the target behavior. To generate necessary topology conditions, the logic is to follow the necessary GCFSM state transitions from the end condition all the way back to the start condition. To do this, two types of path traversal are defined: (a) reverse traversal and (b) forward traversal. Given a path and a GCFSM state transition, reverse traversal determines which GCFSM state transition in that path generated the consumed message (Property 1). For example, in Figure 7, starting from ST1, the reverse traversal determines that ST2 is the GCFSM state transition in which the message “c” was generated. This also implies that there exists an edge between the active eCFSMs in ST1 and ST2 and that this edge is necessary. Given a GCFSM state transition, forward traversal determines which GCFSM state transition in the path put the eCFSM into the required state (Property 2). For example, in Figure 7, forward traversal from ST1 determines that ST3 was the GCFSM state transition that put the RED eCFSM instance into state “6” prior to ST1. From Property 3, ST1 requires ST2 and ST3 to happen before it. In ST1, the RED eCFSM instance is active and hence is marked necessary. In ST2, the GREEN eCFSM is marked necessary as it is the one that generates the message. The two newly recognized GCFSM state transitions are now followed up by generating the necessary GCFSM state transitions (forward and reverse) for each. When the procedure terminates, all the eCFSM instances and edges marked necessary will be the necessary topology conditions for the target behavior to occur.

Figure 7: Path in GCFSM state space exhibiting the target behavior.
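The reverse and forward traversals on a single fully specified path can be sketched in Python as follows. This is only an illustration under stated assumptions: the `Step` record, the example path, and the rule of picking the latest matching earlier step are hypothetical simplifications and are not the implementation used in this work. The ORANGE step is included to show that a transition on the path which is not required by the end condition is never marked necessary.

```python
from typing import NamedTuple, Optional


class Step(NamedTuple):
    """One GCFSM state transition on a path, reduced to its active eCFSM."""
    node: str
    consumed: str
    cur: int
    nxt: int
    emitted: Optional[str]


def reverse_pred(path, k):
    """Latest earlier step that emitted the message consumed at step k (Property 1)."""
    for j in range(k - 1, -1, -1):
        if path[j].emitted == path[k].consumed:
            return j
    return None  # the consumed message is an external event


def forward_pred(path, k):
    """Latest earlier step that put the same instance into its current state (Property 2)."""
    for j in range(k - 1, -1, -1):
        if path[j].node == path[k].node and path[j].nxt == path[k].cur:
            return j
    return None  # the instance is still in its initial state


def necessary_conditions(path):
    """Follow both traversals transitively from the end condition (last step)."""
    nodes, edges = set(), set()
    todo, done = [len(path) - 1], set()
    while todo:
        k = todo.pop()
        if k in done:
            continue
        done.add(k)
        nodes.add(path[k].node)  # the active eCFSM instance is necessary
        r = reverse_pred(path, k)
        if r is not None:
            # Message delivery requires an edge between sender and receiver.
            edges.add(frozenset((path[r].node, path[k].node)))
            todo.append(r)
        f = forward_pred(path, k)
        if f is not None:
            todo.append(f)
    return nodes, edges


# Hypothetical path: start condition first, end condition last.
PATH = [
    Step("YELLOW", "A", 2, 5, "a"),
    Step("RED", "a", 1, 6, None),
    Step("ORANGE", "a", 4, 9, None),  # occurs on the path but is not required
    Step("GREEN", "a", 3, 7, "c"),
    Step("RED", "c", 6, 11, None),
]
NODES, EDGES = necessary_conditions(PATH)
```

For this assumed path, the procedure marks RED, GREEN and YELLOW and the three edges among them as necessary, and leaves ORANGE unmarked.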
Property 4:
• Claim: Forward and reverse traversals in combination generate the necessary topology conditions.
• Proof: From Property 3, the only necessary conditions required for an eCFSM state transition to occur are the GCFSM state transition that generated the consumed message and the GCFSM state transition that put the eCFSM into the required state. Therefore, given a GCFSM state transition, the necessary conditions for the embedded eCFSM state transition to occur are completely captured by a single reverse and a single forward traversal. Thus, any new GCFSM state transitions that are recognized are truly necessary. The necessary conditions are generated for every transitively recognized necessary GCFSM state transition. Thus, when the procedure terminates, the eCFSM instances and edges marked as necessary represent the topology conditions necessary for the target behavior to occur.

Necessary topology conditions are generated for all paths in the GCFSM state space representing the target behavior. The eCFSM instances and edges common to all the generated conditions represent the overall necessary conditions that the topology must satisfy for the target behavior to occur. The state space enumeration problem is undecidable, and hence this approach, even for a single instance of topology, is not guaranteed to terminate. The obvious question to ask at this point is whether the necessary topology conditions on an instance of topology can be recognized without explicit state space enumeration. The next section explores such an approach, including its advantages and limitations.

4.2: Necessary topology conditions on a given topology without state space exploration

In a fully specified GCFSM state, all the states of the eCFSMs as well as the states of the buffers are completely specified, in addition to the eCFSM instances / types and the edges between them.
Each component of the target behavior is represented as an eCFSM state transition and essentially represents the set of all possible GCFSM state transitions that embed that eCFSM state transition instance. Consequently, given a topology, this set can be represented by specifying only the message and the eCFSM state transition while leaving the precise information about the buffers of all the eCFSMs and the states of the remaining eCFSM instances unspecified. For example, ST1 in Figure 8 represents the set of all possible GCFSM state transition instances in the state space of the topology instance (one instance each of RED, YELLOW, GREEN, ORANGE, plus edges) which embed the RED eCFSM state transition instance <c, 6 → 11, ->. Thus both the start and end condition eCFSM state transition instances representing the target behavior can be expressed using such a compact representation. In the remainder of this section, a partial GCFSM state refers to a specification where the consumed message and the state transition of exactly one eCFSM instance are specified.

Figure 8: Compact representation of GCFSM state transition for a given topology.

Property 5:
• Claim: A GCFSM representation in which only one instance's state and the consumable message in its buffer are specified represents the set of all fully specified GCFSM states with the given property.
• Proof: Structurally, the state of a GCFSM depends on the states of the individual eCFSMs and the states of the buffers of each eCFSM. A partial specification of any of the states and/or buffers, with everything else allowed to take any state, will always include the possible states that the GCFSM could assume during full state space enumeration.

Property 6:
• Claim: A partial GCFSM state transition represented using partial GCFSM states represents the set of all possible fully specified GCFSM state transition instances that could exist in the state space of the given topology.
• Proof: Each partial GCFSM has the state of exactly one eCFSM specified.
For the active eCFSM in both partial GCFSMs, the first specifies the consumable message in the buffer of the active eCFSM and the second specifies the last message in the buffers of the eCFSMs that can receive the generated message. These two partially specified GCFSMs represent the set of all fully specified GCFSM state transitions where the given instance of eCFSM undergoes a state transition and emits a message.

The advantage of such a representation is that it can compactly represent the set of all fully specified GCFSM state transitions embedding an eCFSM state transition instance that may occur in the state space of the given topology. However, the disadvantage of such a representation is that it also includes GCFSM state transition instances that embed the same eCFSM state transition instance but may never occur in the state space.

Again, consider a topology consisting of one instance each of eCFSM types RED, YELLOW, GREEN and ORANGE. Let the target behavior be expressed as the eCFSM state transition instance <A, (2 → 5), a> on the instance of type YELLOW (start condition) causing the eCFSM state transition <c, (6 → 11), -> on the instance of type RED (end condition). The goal is to generate the necessary topology conditions required for the target behavior to occur, without explicitly enumerating the state space.

To generate necessary topology conditions, the partial GCFSM state transition embedding the end condition eCFSM state transition instance is used as the starting point. From here, modified forms of reverse and forward traversals are performed until termination, i.e., until the partial GCFSMs consuming external events are reached. Once this is done, the paths representing the target behavior are marked and, eventually, the necessary topology conditions are extracted from them. The modified reverse traversal needs to account for the fact that there is only a partial GCFSM state transition (ST1 in Figure 8) to start with and not a path.
If there were a path, there would be at most one GCFSM state transition that could generate the message on that path. Since there is no path, reverse traversal needs to compensate for the lack of information and enumerate all possible ways in which the message in question could have been generated by eCFSM instances in its neighborhood. To do this, the reverse traversal looks up, for each eCFSM instance in the neighborhood of the eCFSM instance consuming the message, the eCFSM's state transition table to check whether the required message can be generated. If the message can be generated, then a partial GCFSM embedding the eCFSM state transition instance generating the message is added to the graph and made the predecessor, as shown in Figure 8.

Property 7:
• Claim: Reverse traversal as described generates a finite number of partial GCFSM state transitions and represents all fully specified GCFSM state transitions that would have generated the required message on all paths in a fully explored state space.
• Proof:
◦ There are only a finite number of instances of eCFSMs in the topology. Each eCFSM type has only finitely many entries in its state transition table. Thus only a finite number of partial GCFSM state transitions are required to embed the finite number of eCFSM state transition instances that could have generated the required message.
◦ Assume a set of partial GCFSMs has already been generated by reverse traversal. Among the possible eCFSM instances that could generate such a message, trying to add another partial GCFSM state transition to the set of predecessors results in a duplicate entry. Coupled with Property 6, this is a consequence of selecting the new partial GCFSM state transition from the information in the topology instance and the eCFSM state transition tables, while the reverse traversal uses the same information to exhaustively generate all possible predecessors that could generate the required message.
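The modified reverse traversal, which consults the state transition tables of the neighbours instead of a path, can be sketched as below. The tables, the adjacency map and the function name `reverse_predecessors` are hypothetical and chosen only to illustrate the lookup; each returned tuple stands for one partial GCFSM state transition in the form (instance, consumed message, current state, next state, emitted message).

```python
# Hypothetical transition tables:
# (state, consumed message) -> (next state, emitted message or None)
TABLES = {
    "YELLOW": {(2, "A"): (5, "a")},
    "GREEN": {(3, "a"): (7, "c"), (8, "b"): (9, "c")},
    "ORANGE": {(4, "a"): (10, "c")},
    "RED": {(1, "a"): (6, None), (6, "c"): (11, None)},
}

# Hypothetical adjacency: ORANGE is not a neighbour of RED.
NEIGHBOURS = {
    "RED": {"GREEN", "YELLOW"},
    "GREEN": {"RED", "YELLOW", "ORANGE"},
    "YELLOW": {"RED", "GREEN"},
    "ORANGE": {"GREEN"},
}


def reverse_predecessors(consumer, message, neighbours, tables):
    """Enumerate every table entry of every neighbour that emits `message`.

    The result is finite because both the neighbourhood and each
    transition table are finite (the argument used in Property 7).
    """
    preds = []
    for node in sorted(neighbours[consumer]):
        for (state, consumed), (nxt, emitted) in tables[node].items():
            if emitted == message:
                preds.append((node, consumed, state, nxt, emitted))
    return preds


# Predecessors of the end condition <c, 6 -> 11, -> on the RED instance.
PREDS = reverse_predecessors("RED", "c", NEIGHBOURS, TABLES)
```

In this assumed setting both GREEN entries are returned as candidate predecessors, while the ORANGE entry is excluded because ORANGE is not in the neighbourhood of RED. Some candidates may never occur in the real state space, which is the over-approximation the compact representation accepts.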
The forward traversal is also modified to generate all possible ways in which the eCFSM instance can reach the required state. The forward traversal looks up the state transition table of the instance in question and generates a partial GCFSM for every entry in the table where the next state is the required state. Property 8, i.e., completeness and finiteness of forward traversal, can be proved along lines similar to the proof of Property 7.

Property 9:
• Claim: Given a partial GCFSM state transition (x) that represents at least one fully specified GCFSM state transition that occurs in the state space, the reverse and forward traversals each generate partial GCFSM state transitions that include, respectively, all fully specified reverse and forward predecessor GCFSM state transitions on all paths in the state space.
• Proof: From Property 7, the reverse traversal includes all possible fully specified GCFSM state transition instances that could generate the required message. From Property 8, forward traversal generates partial GCFSM state transitions that include all possible instances of fully specified GCFSM state transitions that could put the eCFSM in question into the required state. Since every partial GCFSM state transition instance in the graph satisfies Property 5, the claim is true.

Given a partial GCFSM state transition (x), reverse traversal generated partial GCFSM state transitions p1, p2, p3, and forward traversal generated partial GCFSM state transitions q1, q2, q3, the necessary topology conditions required for x to occur are given by ( NC(p1, x) OR NC(p2, x) OR NC(p3, x) ) AND ( NC(q1) OR NC(q2) OR NC(q3) ), where NC(p1, x) represents the necessary condition for p1 and x to occur.

Property 10:
• Claim: The topology conditions common to all reverse traversal predecessors and those common to all forward traversal predecessors, respectively, represent the necessary topology conditions required for a partial GCFSM state transition to occur.
• Proof: From Property 9, the partial GCFSM state transitions recognized by forward and reverse traversals embed all fully specified predecessors on all paths in the state space. The topology properties common to all partial GCFSM state transitions generated by reverse traversal yield the necessary topology conditions for the reverse path. Similarly, the topology conditions common to the partial GCFSMs generated by forward traversal yield the necessary topology conditions for the forward path. Since both are necessary, the conjunction of the necessary conditions generated from the reverse and forward partial GCFSM state transitions also yields necessary conditions. This is true in spite of some GCFSM state transitions never being the forward or reverse predecessor in reality, as there will exist at least one predecessor in both the forward and the reverse path that does occur in reality.

Figure 9: Necessary topology conditions for a partial GCFSM state transition to occur.

For every new partial GCFSM added by the forward and reverse traversals, further forward and reverse traversals are invoked. Thus, for every new partial GCFSM added to the graph, the necessary topology conditions can be calculated. Assuming infinite computational time and space, the above procedure generates a tree rooted at the partial GCFSM state transition embedding the end condition and terminating in partial GCFSM state transitions consuming an external event. The next step is to separate the paths which embed the eCFSM state transitions representing the target behavior from the ones that do not. To do this, mark all partial GCFSM state transitions that embed the start condition. Then, find paths consisting only of partial GCFSM state transitions recognized by reverse traversal. Consider a single path representing the target behavior (x leading to y) as shown in Figure 10.
For every partial GCFSM state transition vertex in the path, there will be exactly one reverse traversal predecessor. However, for each such intermediate partial GCFSM state transition in the path, there will be a corresponding set of partial GCFSM state transitions recognized by forward traversal from that instance (f1, f2, f3 in Figure 10). To compute the necessary conditions for any partial GCFSM state transition vertex recognized by forward traversal, all forward and reverse paths leading to any partial GCFSM state transition vertex consuming an external event need to be taken into account. Unlike the path representing the target behavior, which is bounded by two conditions, the paths recognized by forward traversal are weak, as only the starting condition is specified. Further, the forward traversal is responsible for putting the eCFSM into the required state. For every intermediate partial GCFSM state transition vertex, the state itself is relaxed to generate all possible state transitions that can generate a given message, and hence attempting to generate necessary topology conditions for a relaxed state is pointless. Thus all topology conditions that would have been accumulated through paths from partial GCFSM state transition vertices recognized by forward traversal are ignored while generating paths representing the target behavior. In Figure 10, the paths f1, f2 and f3 are ignored and the path representing the target behavior is simply (x, r1, r2, y).

Property 11:
• Claim: Ignoring paths through partial GCFSM state transition vertices generated by forward traversal and retaining only the paths through partial GCFSM state transition vertices generated by reverse traversal still yields necessary topology conditions.
• Proof: From Property 7, necessary GCFSM state transition vertices are recognized by reverse traversal. Considering only common topology elements generated from reverse traversal still yields necessary topology conditions.
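The way alternatives are combined into necessary conditions can be illustrated with a small set-based sketch. It assumes, purely for illustration, that a topology condition is a set of required elements (instance names and edge tuples); under that assumption an OR over alternative predecessors keeps only the elements common to all of them (set intersection), and the AND of the reverse and forward requirements is a set union. The names `common` and `combine` and all example sets are hypothetical.

```python
from functools import reduce


def common(alternatives):
    """Elements required by every alternative: OR of alternatives -> intersection."""
    alternatives = list(alternatives)
    if not alternatives:
        return frozenset()
    return reduce(frozenset.intersection, alternatives)


def combine(reverse_alts, forward_alts=()):
    """AND of the reverse and forward requirements -> union of the two common sets.

    Leaving `forward_alts` empty corresponds to ignoring forward paths,
    which still yields necessary (though possibly weaker) conditions.
    """
    return common(reverse_alts) | common(forward_alts)


# Hypothetical conditions accumulated through two reverse predecessors ...
P1 = frozenset({"RED", "GREEN", ("GREEN", "RED")})
P2 = frozenset({"RED", "ORANGE", ("ORANGE", "RED")})
# ... and through two forward predecessors.
Q1 = frozenset({"RED", "YELLOW", ("RED", "YELLOW")})
Q2 = frozenset({"RED", "YELLOW", "GREEN"})

WITH_FORWARD = combine([P1, P2], [Q1, Q2])
REVERSE_ONLY = combine([P1, P2])
```

With these assumed inputs, the reverse-only result is a subset of the result that also uses forward paths, which matches the observation that dropping forward paths loses detail but never introduces a condition that is not necessary.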
The case studies show that high quality necessary topology conditions can be generated from the reverse path alone. As a part of future work, an algorithm that can handle variable depth of exploration of forward paths may be explored.

Figure 10: Paths representing target behavior.

A primary path is a sequence of partially specified GCFSM state transitions recognized by the reverse traversal, starting from the end condition and ending at the start condition. In the graph generated from repeated forward and reverse traversals, many primary paths representing the target behavior may exist.

Property 12:
• Claim: On a given primary path representing the target behavior, every active eCFSM instance in each partial GCFSM is necessary. Further, there exist edges between the active eCFSM instances in two consecutive partial GCFSM state transitions in the path.
• Proof: On the primary path, each partial GCFSM state transition has at most one successor and at most one predecessor. By definition there exists exactly one active eCFSM state transition instance in each partial GCFSM state transition. Thus the active eCFSM instance in each partial GCFSM state transition is necessary. Two consecutive partial GCFSM state transitions are connected to each other as one is the reverse traversal predecessor of the other, implying that the active eCFSM instance in one receives a message from the other. The necessary condition required for packet delivery is an edge between the two active eCFSMs.

4.3: Necessary topology conditions on all topologies without state space enumeration

In the previous section, generating necessary topology conditions for a given target behavior to occur on a specific instance of topology was discussed. In this section an attempt is made to generate the necessary topology conditions that any topology on which the target behavior occurs must satisfy, without resorting to full state space enumeration.
Given a target behavior expressed as one eCFSM state transition instance leading to another eCFSM state transition instance, the necessary topology conditions include the necessary eCFSM instances/types, the necessary edges between instances, and the existence of eCFSM instances/types in the neighborhood of eCFSM types or instances. (In short, the necessary topology conditions are expressed as a partial topology.) The known quantities are the set of all eCFSMs that can be in the system, the static rules specified by the topology building rules (TBR) and the instances specified by the target behavior.

To generate the necessary topology conditions, the approach described in the previous section, compactly representing all GCFSM state transitions for each instance of topology, can be considered. There exists a partial GCFSM state transition representing the end condition of the target behavior for each instance of the topology. As shown in Figure 11, reverse traversal and forward traversal from the partial GCFSM state transition of each instance of topology generate a finite set of predecessors for that instance of topology. However, since there are an infinite number of topology instances, this is not a compact representation. Though the set of all partial GCFSM state transitions in ST1 represents infinite instances of fully specified topology, the only common element in all such representations is the existence of the eCFSM state transition instance representing the end condition of the target behavior.

Figure 11: Compact representation per instance of topology.

This can be compacted further by representing only the common eCFSM state transition instance (the active eCFSM instance in Figure 11) and implicitly assuming that any other eCFSM instance may exist. This allows a potentially infinite number of partial GCFSM state transitions of individual topology instances to be represented with a single eCFSM state transition instance.
Property 13:
• Claim:
  ◦ The compact representation of the GCFSM state transition covers all fully specified GCFSM state transitions in all topology instances.
• Proof:
  ◦ Completeness of representation
    ▪ Case 1: If the partial representation is an instance of an eCFSM, then all combinations of eCFSM instances/types, their individual states and their buffer states are implicitly assumed. Since there is exactly one instance of an eCFSM which causes a GCFSM state transition, the compact representation represents all possible fully specified GCFSM state transitions on all topologies.
    ▪ Case 2: If the partial representation is a type, then the type includes all possible instances of Case 1. Hence the representation is complete.

Given the compact representation (x) of the end condition of the target behavior, a reverse traversal looks up, for each eCFSM type allowed by the TBR in the neighborhood of eCFSM instance (x), the set of all eCFSM state transitions in the transition table that generate the consumed message type, and creates a compact entry for each. It then makes each of them a predecessor of (x). Since the exact instance is unknown, all possible instances capable of generating that message are captured by representing the eCFSM as a type. Similarly, the forward traversal generates the set of all eCFSM state transition types/instances required to put the eCFSM type/instance in (x) into the required state and makes them predecessors of (x).

Property 14:
• Claim:
  ◦ eCFSM types generated by reverse traversals represent all possible predecessor GCFSM state transition instances on all paths in the state space of all topologies that can generate the consumed message.
  ◦ eCFSM types generated by forward traversals represent all possible predecessor GCFSM state transition instances on all paths in the state space of all topologies that can put the eCFSM into the required state.
• Proof:
  ◦ Reverse path: The reverse path exhausts all ways, allowed by the state transition tables and the TBR, in which the required message can be generated. Since each predecessor is a compact representation, from Property 13 it includes all instances of GCFSM state transitions in all paths in the state space of all topologies which can generate the consumed message.
  ◦ Forward path:
    ▪ If the compact representation is that of an instance, the forward path exhausts all ways, allowed by the eCFSM state transition table, in which that instance could be put into the required state. Since each predecessor is a compact representation, from Property 13 it includes all instances of GCFSM state transitions in all paths in the state space of all topologies which can put the target eCFSM into its required state.
    ▪ If the compact representation is that of a type, it includes all instances of that type. From the above argument and from Property 13, the representation is complete.

Property 15:
• Claim: Predecessors generated by forward and reverse traversals represent the necessary predecessors required for any instance of GCFSM state transitions included in the compact representation to occur.
• Proof: The compact representation is specified only on a single eCFSM state transition instance or type. The only necessary conditions for the eCFSM state transition to occur on that instance are that the consumed message must be generated earlier and that the eCFSM instance must be put into the required state. From Property 14, the reverse traversal generates predecessors that include all fully specified GCFSM state transition instances in all paths in the state space of all topologies that can generate the consumed message.
If the eCFSM state transition used in the compact representation is a type, then the same holds based on Property 14. The forward traversal generates complete and compact representations of the predecessors required to put the instance or type into its required state. Thus the predecessors generated by forward and reverse traversals are the predecessors required for the current partial GCFSM state transition to occur.

Property 16:
• Claim: Necessary topology conditions for any fully specified GCFSM state transition instance on any path in the state space of any topology can be generated by considering the predecessors generated by reverse traversal.
• Proof: From Property 15, the reverse path generates the necessary predecessors.
  ◦ Case 1: If the current partial representation is based on an eCFSM instance, then each predecessor asserts that there exists at least one instance of the eCFSM type recognized by the reverse traversal in the neighborhood of the instance.
  ◦ Case 2: If the current partial representation is based on an eCFSM type, then each predecessor asserts that there exists at least one instance of a specified eCFSM type in the neighborhood of another eCFSM type.
• If common topology elements are extracted from all predecessors generated by reverse traversal, then those conditions will be necessary on all topologies and on all paths.

From Property 15, the topology conditions generated by the forward and reverse paths are independently necessary. Therefore ignoring forward traversal predecessors will still yield necessary topology conditions. Forward traversal is concerned with the generation of predecessors which put the eCFSM into the required state. But, to generate all reverse traversal predecessors, the state of the eCFSM is effectively relaxed, and hence ignoring the forward path will most likely not affect the generated necessary conditions.
The case studies show that high quality necessary conditions can be generated from only the predecessors recognized by reverse traversal. Now that only reverse traversal predecessors are considered, generating all paths representing the target behavior is straightforward. Once all the reverse path predecessors are generated, the procedure, given infinite time and space, will terminate at eCFSM state transition types consuming the external events. Mark all eCFSM state transition types that match the type of state transition specified in the start condition. All paths from the eCFSM state transition types matching the start condition of the target behavior to the end condition of the target behavior together represent all possible paths representing the target behavior.

4.3.1: Transitive closure graph

The reverse traversal approach does not lend itself to a compact and finite representation, as loops are not handled. To generate a compact and finite representation of all reverse traversal based paths on all topologies, the transitive closure graph (TCG) is used. A TCG is a directed bipartite graph G(V1, V2, E), where V1 is a set of eCFSM state transition types, V2 is a set of message types and E represents the causal relationships V1 → V2 and V2 → V1.

Initially the TCG is empty. The TCG generation procedure inserts vertices corresponding to all allowed external event types in the transition tables of all the allowed eCFSMs in the system. For each external event type, all state transition types from all the state transition tables that consume it are inserted into the graph and made successors of that external event. For each <message type → state transition type> pair thus inserted, the message type generated by that state transition, as listed in the eCFSM state transition table, is inserted if such a message type does not already exist in the graph. If the message already exists, then that message vertex is reused.
For every new message vertex inserted into the graph, the TCG generation procedure looks up all state transition types of all eCFSM types, inserts the state transition types that consume that message type, and makes the consumed message the predecessor of each newly inserted state transition type. The next step, inserting any messages the state transition types may generate, is the same as described before. The procedure terminates when no new messages can be added or when all eCFSM state transition entries in all the tables have been exhausted. The procedure is guaranteed to terminate as there are only a finite number of eCFSM types, each with a finite sized state transition table. Figure 12 shows a TCG for a system with four types of eCFSMs.

Figure 12: TCG of a networked system with 4 allowed eCFSM types.

The transitive closure graph satisfies the following properties.
• All AEEs allowed by the eCFSM state transition tables are part of the set of graphs
• All message types transitively reachable from an AEE or any other message type exist in the TCG
  ◦ Every transitively reachable message type occurs exactly once in the TCG
• All eCFSM state transition types transitively reachable from an AEE exist in the TCG

4.3.2: Paths representing the target behavior in TCG

A target behavior can be represented as a path in the TCG.
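To make this concrete, the worklist construction of Section 4.3.1 and a simple path search over the resulting graph can be sketched in Python as follows. The `Transition` record, the tagging of vertices as `("msg", name)` or `("tr", name)`, and the four-entry example table are assumptions made for illustration; the sketch enumerates only loop-free paths and is not the dissertation's implementation.

```python
from collections import deque
from typing import Dict, Iterable, List, NamedTuple, Set, Tuple

Vertex = Tuple[str, str]  # ("msg", name) or ("tr", name)


class Transition(NamedTuple):
    """One row of a (hypothetical) eCFSM state transition table."""
    name: str                   # state transition type
    consumes: str               # message or external event type consumed
    generates: Tuple[str, ...]  # message types emitted


def generate_tcg(table: Iterable[Transition],
                 external_events: Iterable[str]) -> Dict[Vertex, Set[Vertex]]:
    """Build the bipartite successor map, reusing message vertices."""
    rows = list(table)
    succ: Dict[Vertex, Set[Vertex]] = {}
    pending = deque()
    for event in external_events:
        if ("msg", event) not in succ:
            succ[("msg", event)] = set()
            pending.append(event)
    while pending:  # terminates: finitely many message types
        message = pending.popleft()
        for tr in rows:
            if tr.consumes != message:
                continue
            succ[("msg", message)].add(("tr", tr.name))
            succ.setdefault(("tr", tr.name), set())
            for out in tr.generates:
                succ[("tr", tr.name)].add(("msg", out))
                if ("msg", out) not in succ:  # new message vertex
                    succ[("msg", out)] = set()
                    pending.append(out)
    return succ


def simple_paths(succ: Dict[Vertex, Set[Vertex]],
                 src: Vertex, dst: Vertex) -> List[List[Vertex]]:
    """Enumerate loop-free paths from src to dst with an explicit stack."""
    found, stack = [], [(src, [src])]
    while stack:
        node, path = stack.pop()
        if node == dst:
            found.append(path)
            continue
        for nxt in sorted(succ.get(node, ())):
            if nxt not in path:
                stack.append((nxt, path + [nxt]))
    return found


table = [
    Transition("a", "E", ("m1",)),
    Transition("c", "m1", ("m2",)),
    Transition("d", "m1", ("m1",)),  # introduces a loop on m1
    Transition("b", "m2", ()),
]
tcg = generate_tcg(table, external_events=["E"])
paths = simple_paths(tcg, ("tr", "a"), ("tr", "b"))
print([[name for _, name in p] for p in paths])  # [['a', 'm1', 'c', 'm2', 'b']]
```

Restricting the search to simple paths sidesteps the loop through `d`; unrolling such loops a bounded number of times is a separate concern.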
Given a target behavior, an eCFSM state transition on the instance 'a1' of type 'a' leading to an eCFSM state transition on the instance 'b1' of type 'b', the following steps are used to generate paths representing the target behavior:
• Mark all eCFSM state transition types of type 'a'
• Mark all eCFSM state transition types of type 'b'
• Find all paths from each eCFSM state transition of type 'a' to each eCFSM state transition of type 'b'
• For each path, assert type 'a' to be instance 'a1' and type 'b' to be instance 'b1'

Figure 12 shows an example target behavior of an eCFSM state transition instance of type 'a' leading to an eCFSM state transition instance of type 'b'. TCGs can have loops, leading to an infinite number of paths when blindly enumerated. Handling loops in the TCG is discussed in Section 4.3.5.

Property 17:
• Claim: All reverse traversal based paths representing the target behavior can be generated from the TCG.
• Proof: Given any eCFSM message type, the TCG has all eCFSM state transition types that can generate such a message as its predecessors. Given any eCFSM state transition type, the TCG includes all message types that can cause the state transition as its predecessors. Thus, given any eCFSM state transition type, the TCG has all the eCFSM state transition types that can generate the required message type. The predecessors of an eCFSM state transition type in a TCG are exactly the predecessor eCFSM state transition types recognized by reverse traversal. Since the TCG is a compact but complete representation of the structure generated by the reverse traversal, all paths extracted from the reverse traversal based structure can be extracted from the TCG. The only additional operation on a TCG path is to assert that the start and end eCFSM state transitions are the appropriate instances specified in the target behavior.

4.3.3: Generating necessary conditions from a path in the TCG
A path in the TCG representing the target behavior consists of eCFSM state transition types, with the additional assertion that the start and end eCFSM state transitions are the appropriate instances specified by the target behavior. Consider a path representing the target behavior as shown in Figure 13.

Figure 13: Path representing the TCG.

Let a partial topology be associated with each depth of traversal of the path. Initially the partial topology is empty. The first step considers the first eCFSM state transition in the path. In this case, assume that it represents the start condition of the target behavior, and hence the first eCFSM state transition (p) in Figure 13 represents an eCFSM instance. This instance is added to the partial topology.

The next step considers an eCFSM state transition type that consumes the message generated by the first. The very first condition to be checked here is whether the eCFSM of the state transition (q) in the path shown in Figure 13 can be in the neighborhood of the eCFSM instance in (p). This information is static and is specified in the TBR. The neighborhood invariants specified in the TBR must be satisfied by the partial topology at all times. Assuming the eCFSM type in (q) is allowed to be in the neighborhood of the eCFSM instance in (p), the next thing to be checked is the cardinality with which that type can be in the neighborhood of the eCFSM instance in (p). If the cardinality is one, then the eCFSM type in (q) is added to the neighborhood of the eCFSM instance in (p) as an instance, and an edge between the instance in (p) and the eCFSM instance in (q) is added to the partial topology. Otherwise, the eCFSM in (q) is added to the neighborhood of the eCFSM in (p) as a type, with the cardinality allowed by the TBR.
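The step from (p) to (q) described above can be sketched as a small Python function. The TBR encoding (a dictionary from type pairs to an allowed cardinality, with `None` standing for "more than one"), the `PartialTopology` fields, and the instance naming scheme are all hypothetical choices for this illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical TBR encoding: (eCFSM type, neighbor type) -> allowed cardinality.
# None means more than one neighbor of that type is allowed; a missing key
# means the neighbor type is not allowed at all.
TBR = Dict[Tuple[str, str], Optional[int]]


@dataclass
class PartialTopology:
    instances: Dict[str, str] = field(default_factory=dict)   # instance id -> type
    edges: List[Tuple[str, str]] = field(default_factory=list)
    typed_neighbors: List[Tuple[str, str, Optional[int]]] = field(default_factory=list)


def extend(topology: PartialTopology, tbr: TBR, p_instance: str, q_type: str) -> bool:
    """Add the eCFSM of step (q) next to the instance of step (p).

    Returns False when the TBR forbids q_type in the neighborhood, i.e. the
    path contradicts the topology building rules.
    """
    p_type = topology.instances[p_instance]
    if (p_type, q_type) not in tbr:
        return False
    cardinality = tbr[(p_type, q_type)]
    if cardinality == 1:
        # Exactly one such neighbor can exist, so it is a concrete instance.
        q_instance = f"{q_type}@{p_instance}"  # hypothetical naming scheme
        topology.instances[q_instance] = q_type
        topology.edges.append((p_instance, q_instance))
    else:
        # Several candidates may exist, so only the type is recorded.
        topology.typed_neighbors.append((p_instance, q_type, cardinality))
    return True


tbr: TBR = {("P1", "LAN"): 1, ("LAN", "P2"): None}
topo = PartialTopology(instances={"p1": "P1"})
print(extend(topo, tbr, "p1", "LAN"))      # True: LAN becomes an instance
print(extend(topo, tbr, "LAN@p1", "P2"))   # True: P2 is kept as a type
print(extend(topo, tbr, "p1", "P2"))       # False: not allowed by this TBR
```

Returning a boolean keeps contradiction detection explicit, so a caller can discard a path as soon as one step violates the TBR.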
On proceeding to (r) in the path, and assuming the eCFSM type in (r) is allowed by the TBR to be in the neighborhood of the eCFSM type/instance in (q), there are three possibilities:
• If (r) turns out to be an instance different from the instance in (p), then the eCFSM instance is simply added to the partial topology and an edge between the eCFSM instances in (q) and (r) is inserted into the partial topology.
• If (r) is an instance and the same as the instance in (p), then the existing eCFSM instance must be reused with any refinements that (r) may impose.
• If (r) turns out to be a type, then all ways in which the new type can be merged into the existing partial topology must be explored.

Therefore a given path in the TCG representing the target behavior may generate more than one partial topology. When this occurs, the common partial topology elements among all the generated necessary topology conditions represent the partial topology generated by that path. The topology generation procedure and its properties are discussed in detail in Chapter 6.2.4, Completeness of topology generation. This procedure is repeated for all paths in the TCG representing the target behavior. The common topology elements across the partial topologies generated from all the paths represent the necessary topology conditions that any topology on which the target behavior occurs needs to satisfy.

4.3.4: Optimizing the TCG

Though the TCG is complete and correct, it is not very efficient (representationally as well as computationally), as it fails to take advantage of some very basic optimization opportunities based on network layering and the standard packet delivery interfaces exposed by lower layers to higher layers. Much of the runtime complexity in full state space enumeration is due to the multiple sequencings that need to be explored at every step.
From the eCFSM specification, packet delivery vertices only use and change the addressing fields of messages and consequently only affect the sequencing of atomic operations by influencing the message delivery sequence. The procedure that generates the TCG ignores sequencing completely. The packet delivery eCFSMs do not affect the body of the message, and hence all eCFSM state transitions and messages related to packet delivery vertices can be abstracted out of the TCG, while still preserving all the behaviors expressed as eCFSM state transitions and messages of non-packet delivery eCFSMs.

IP based networks are built on the layering principle. For example, the MAC layer provides the abstraction of MAC unicast, broadcast and multicast to the IP layer. Another important property is the role of addressing in protocol behavior. As assumed in the eCFSM protocol specification properties, the eCFSM state transition itself does not depend on the address field of the messages. All information relevant for the state transition is assumed to be packed into the body of the message. Further, the eCFSMs that implement the packet/stream abstraction do not change or use the body of the messages. Thus when a packet delivery path between two IP nodes is traced, the intermediate MAC eCFSMs only play the role of enabling packet forwarding between the two IP based eCFSMs. By abstracting away the packet delivery components in any path, the TCG can be optimized while still retaining all the behaviors that can be expressed using the eCFSM state transitions of non-packet forwarding eCFSMs.

Many of the static neighborhood properties imposed by the TBR to enforce a particular network structure can be seamlessly incorporated into the TCG. For example, in a wired LAN based topology, the TBR specifies that every eCFSM that does not perform packet delivery at the MAC layer has one and only one vertex in its neighborhood, namely the LAN.
This statically forces every alternate eCFSM in the TCG to be of type LAN. Since all protocols use standard message destination types, and the TBR forces a higher layer eCFSM to be connected to a lower layer eCFSM, all packets are delivered between two eCFSMs at the same layer using a pattern like <eCFSM_layer_x → {set of eCFSMs in layers x-1, x-2, ..., 1} → eCFSM_layer_x>.

Figure 14: Sample transitive closure graph.

A sample transitive closure graph for a LAN based topology which can only have eCFSMs of type P1, P2, P3 and LAN is shown in Figure 14. The TCG generation algorithm is modified to differentiate between the messages generated by packet forwarding eCFSMs and non-packet forwarding eCFSMs. All messages generated by packet forwarding eCFSMs are reused within that class, and all messages generated by non-packet forwarding eCFSMs are reused within that class.

When all allowed packet forwarding eCFSMs in the MAC layer are considered, as shown in Figure 15, they can be compactly represented in five different ways. This is statically determined by the TBR and is invariant over all topologies and messages, as
• A non-packet forwarding eCFSM in the MAC layer can have exactly one LAN or BS in its neighborhood
• A LAN can have zero or more BSs and/or zero or more non-packet forwarding eCFSMs in its neighborhood
• Each BS must have exactly one LAN and zero or more non-packet forwarding eCFSMs in its neighborhood

Figure 15: Compact representation of packet forwarding paths.

The intermediate packet forwarding vertices are represented by eCFSM types which, when incorporated into a partial topology during topology generation, can be converted into instances. The packet forwarding path itself is represented as a partial topology which connects two non-packet forwarding eCFSMs at the MAC layer. All protocols use the standard message delivery interfaces exposed by the lower layer as listed in the TBR.
Given that the packet forwarding eCFSMs in the TBR are assumed not to use or change the body of the messages, the set of all eCFSMs in the packet delivery path between two non-packet delivery eCFSMs can be abstracted out at all layers. Partial topologies can be used to represent all ways in which two non-packet forwarding vertices can be connected at any layer, as a packet delivery path is determined by
• The TBR, which determines the possible types and cardinalities of eCFSMs in all possible topologies
• The message destination type
• The PDF, which determines which vertices in the topology graph receive the message

From the TBR,
• There are only finitely many types of packet forwarding eCFSMs in the MAC (BS, LAN), IP (GW, DR, RP, NAT*), transport (NAT/Load balancers) and session (*Proxy) layers, and each has a finite number of states
• There are only a finite number of message destination types, and their mapping between layers is static.
  ◦ MAC (unicast, broadcast and multicast)
  ◦ IP (unicast, sub-domain broadcast/multicast, link local broadcast/multicast, domain wide multicast)
  ◦ Transport (tcp/udp, port)
  ◦ Session (http)

Using the above information, a finite number of packet forwarding path patterns, represented by partial topologies, can be used to represent how two non-packet forwarding eCFSMs can be connected.

To build packet forwarding paths at the MAC layer, consider non-packet forwarding MAC vertices x, y and packet forwarding MAC vertices LAN and BS. Assume an atomic operation in x emits a packet destined to MAC broadcast. Find all packet forwarding eCFSMs, in all states, that can be in the neighborhood of x (as specified in the TBR) and consume that message. For each new vertex/FSM added, repeat the above step until all ways in which y can be reached are covered. The paths obtained give all ways in which x and y can be connected through packet forwarding vertices. A similar approach can be used to construct packet forwarding paths for all the other message destination types.
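The MAC layer construction just described can be sketched as a depth-first enumeration in Python. The adjacency table and the single "at most one" constraint below are a hypothetical encoding of the MAC neighborhood rules stated earlier (a BS has exactly one LAN); the type name `N` for a non-packet forwarding eCFSM, the bound `max_forwarders`, and the simplification that every forwarding type consumes the broadcast message are assumptions of the sketch.

```python
from typing import Dict, List, Optional, Set, Tuple

# Which types may be neighbors at the MAC layer ("N" = non-packet forwarding).
ADJACENT: Dict[str, Set[str]] = {
    "N": {"LAN", "BS"},
    "LAN": {"BS", "N"},
    "BS": {"LAN", "N"},
}
# (vertex type, neighbor type) pairs limited to a single neighbor: a BS has
# exactly one LAN, so a BS cannot sit between two LANs on a path.
AT_MOST_ONE: Set[Tuple[str, str]] = {("BS", "LAN")}


def forwarding_paths(max_forwarders: int = 4) -> List[List[str]]:
    """Enumerate type-level forwarding paths from x to y."""
    results: List[List[str]] = []

    def walk(path: List[str]) -> None:
        last = path[-1]
        prev: Optional[str] = path[-2] if len(path) >= 2 else None
        for nxt in sorted(ADJACENT[last]):
            if prev == nxt and (last, nxt) in AT_MOST_ONE:
                continue  # would need a second neighbor of a single-neighbor type
            if nxt == "N":
                if len(path) >= 2:  # reached y through at least one forwarder
                    results.append(["x"] + path[1:] + ["y"])
            elif len(path) - 1 < max_forwarders:
                walk(path + [nxt])

    walk(["N"])
    return results


for p in forwarding_paths():
    print("----".join(p))
```

Under these assumptions the enumeration yields five patterns (x----BS----LAN----BS----y, x----BS----LAN----y, x----BS----y, x----LAN----BS----y and x----LAN----y), which is consistent with the five compact representations mentioned for Figure 15. The explicit bound on the number of forwarders is only a safety net; here the single-LAN rule for a BS already makes the search finite.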
Static mapping of message destination types from one layer to another can be utilized to reuse the partial topologies generated at the lower layer when paths are being generated for the higher layer. Based on the TBR, the following partial topologies are defined for all the message destination types:
• MAC Unicast / Broadcast / Multicast (x, y are vertex types at MAC allowed by TBR)
  ◦ x
  ◦ x----BS1----LAN----y
  ◦ x----LAN----BS1----y
  ◦ x----LAN----y
  ◦ x----BS1----y
  ◦ x----BS1----LAN----BS2----y
• IP – Unicast (x, y vertices allowed by TBR)
  ◦ Global to Global
    ▪ x
    ▪ (x,y)----GW
    ▪ x----GW1----I----GW2----y
  ◦ Local to Local
    ▪ x
    ▪ (x, y)----NAT
  ◦ Global to Local
    ▪ None
  ◦ Local to Global
    ▪ None
• TCP/UDP
  ◦ TCPC----TCPS
  ◦ TCPC----(NAT-LB*----TCPS)
  ◦ (TCPS----NAT-LB*)----TCPS
  ◦ (TCPS----NAT-LB*)----(NAT-LB*----TCPS)
• HTTP/TCP/IP
  ◦ HTTPC----HTTPS
  ◦ HTTPC----TransProxy----HTTPS
  ◦ HTTPC----LBTransProxy----HTTPS

This can be incorporated into the transitive closure graph to yield an optimized representation. The optimized TCG is a directed tripartite graph G(V1, V2, V3, E), where
• V1 → set of state transitions
• V2 → set of messages
• V3 → set of partial topologies corresponding to the message destination types
• E (v1 → v2, v2 → v3, v3 → v1)
  ◦ v1 → v2: causal relationship between v2 and v1
  ◦ v2 → v3: message forwarding paths (v3) used by the message (v2)
  ◦ v3 → v1: given (v2 → v3), the set of all v1s caused by v2

The TCG algorithm remains the same as described earlier, except that when a message type vertex is added to the graph, all the partial topologies representing the packet forwarding paths for that message destination type are added to the graph and made successors of the message. For every eCFSM state transition type that a message type can cause, each of the partial topologies added in the previous step is made a predecessor. This is shown in Figure 16. Extracting paths representing the target behavior remains the same as before.
The true power of this data structure is revealed in the reduction in the number of paths that are generated. For example, if there are five rounds of messages destined to MAC unicast exchanged between eCFSMs, then the unoptimized TCG as shown in Figure 15 would generate 5^4 paths. If an optimized representation of the TCG is used, the branching factor is completely eliminated, leading to a single path. However, there are some disadvantages to such optimizations. If a non-packet delivery eCFSM is specified up to MAC, then the optimized representation will force exploration of more branches than required. Though various optimization techniques can be incorporated to handle every case optimally, the choice (see footnote 6) of using a static definition of the packet delivery path for a given message destination type represents a trade-off between a reduced representation and slightly increased computational complexity.

Figure 16: Optimized representation of TCG.

4.3.5: Handling loops in TCG

When TCGs have loops in them, the loops must be unrolled. To ensure decidability, the unrolling of loops has to be limited to a finite number of iterations. The depth to which unrolling has to be performed is determined by incrementally unrolling the loop and generating the topology for each unrolled path. When the topologies generated by two successive unrollings are the same, unrolling can be stopped, as more unrolling will not add any new information to the generated topology. For example, Figure 17 shows a TCG with a loop. In such a TCG, let the target behavior be represented by the event (a) leading to event (b). There are infinitely many paths representing the target behavior in this TCG, depending on the number of times the loop is unrolled.
However, if Path-1 and Path-2, where the loop is unrolled once and twice respectively, generate the same topologies, then unrolling the loop once captures all the topology information embedded in the path, and hence the set of all paths representing the target behavior on this TCG would include all paths with the loop unrolled once and twice respectively.

Footnote 6: The choice was good enough for the case studies considered here. No claim of optimality or characterization of the trade-off space is made in this study.

Figure 17: Paths in global state and TCG.

4.4: COMPRESS methodology

The COMPRESS methodology takes as input the topology building rules including the eCFSMs of allowed protocols, the topology model, the packet delivery function, the partial topologies for message destination types, and the target behavior expressed as eCFSM state transition instances, and generates as output a partially specified topology representing the necessary topology conditions.

Figure 18: COMPRESS methodology.

The basic steps of the COMPRESS methodology are shown in Figure 18. Based on the inputs, the algorithm first constructs a set of TCGs. Then, all paths representing the target behavior are extracted from the TCG. If there are paths with loops, a bounding operation is performed before the paths are extracted. Once the paths are extracted, partial topologies are generated from each path. Finally, the partial topology representing the necessary topology conditions that any topology on which the target behavior occurs must satisfy is generated by extracting all common elements from the partial topologies generated in the previous step. The path enumeration and topology generation algorithms can be optimized further because, in a tree, paths often share their initial portions, and the result of any computation performed on a shared portion is the same for every path that contains it.
Thus, interleaving the path generation and topology generation procedures enables early contradiction detection on multiple paths at the same time and hence yields significant gains in runtime complexity without any compromise in the quality of solutions. This augmented algorithm is called A-COMPRESS. In the evaluation section, the gains obtained from A-COMPRESS are illustrated.

4.5: Chapter 4 summary

In this chapter, the idea behind the COMPRESS algorithm was developed by initially generating necessary topology conditions for a target behavior to occur on a single instance of topology. Then the problem was relaxed to generate necessary topology conditions on a single instance of topology, but without enumerating the state space. Two concepts, namely reverse traversal and forward traversal, were developed. The problem was finally relaxed to generate necessary topology conditions that all topologies on which the target behavior occurs must satisfy. The TCG data structure was introduced and optimized. Based on the ideas developed and the TCG data structure, the COMPRESS methodology was developed. The following procedure formally captures the steps of COMPRESS.
Let
    the target behavior be the occurrence of eCFSM state transition instance a leading to b
    TCG be a set of directed tripartite graphs
    generateTCG() be the procedure that generates the TCG
    P_{s,d} be a set of paths from s to d
    enumeratePaths(g, (s,d)) be the procedure that enumerates all paths from s to d in graph g
    mapToTopology(p) be the procedure that extracts partial topologies from a path p
    T_p be the set of partial topologies generated from path p
    TN be the necessary topology conditions specified as a partial topology

P_{a,b} = ∅, T_{P_{a,b}} = ∅
TCG = generateTCG()
for each graph j ∈ TCG
    P_{a,b} = P_{a,b} ∪ enumeratePaths(j, (a,b))
for each path p_i ∈ P_{a,b}
    T_{P_{a,b}} = T_{P_{a,b}} ∪ mapToTopology(p_i)
TN_{a,b} = ∩_{i=1}^{n} t_i,  t_i ∈ T_{P_{a,b}}

Chapter 5: Relationship of COMPRESS to related work

Initial work in protocol testing emerged as a result of work in formal methods for protocol testing in telecommunication networks. However, telecommunication network services are almost entirely implemented in the network [loge99], and the formal methods that evolved to serve systems with this property developed techniques to verify state and message properties of a single node and its interaction with the network ([diet02], [mead03]). As observed in [diet02], variations in connectivity between the nodes of the system are not considered, as this does not affect the services in telecommunication networks. Though much of the work cited in the above surveys deals with protocol testing, unlike the methodology proposed in this paper, it either does not directly address the influence of topology on protocol behavior or does not generate topology properties that can reduce the overall complexity of protocol testing in IP networks.

5.1: Overview of the testing problem

Before comparing COMPRESS with other techniques and approaches, an understanding of the testing problem provides context for the approach and goals of the different techniques. Much of this section is inspired by [youn89].
Testing, a.k.a. fault detection, techniques have been broadly binned into static and dynamic approaches, and every practical approach strives to achieve a balance between accuracy and effort. As explained earlier, this is a fundamental consequence of the expressiveness of the formalisms used to represent the system in question. A detailed complexity landscape of different formalisms can be found in [imme05]. Since the complexity to effort tradeoff is common to most practical testing techniques, looking at the approaches based on this tradeoff, i.e., whether they fold or sample the state space, helps one better understand the strengths and limitations of each approach. Folding techniques suffer from false positives whereas sampling techniques suffer from false negatives for the occurrence of a given behavior. However, if the folding or the sampling techniques can assert properties about the consequences of folding or sampling on the state space, the results obtained can be very useful. For example, if the sampling technique asserts a property that its samples will include all states or paths with at least one "red" (see footnote 7) FSM and no states or paths outside the real state space, then any non-occurrence result obtained by this approach asserts the non-occurrence of the behavior on all systems with at least one red FSM. Similarly, if a folding technique guarantees to represent all paths that include the "red" FSM, but is not limited to the states and paths in the real state space, then the results obtained from such a technique will be useful to assert the requirement of a "red" FSM for the occurrence of the fault.

5.2: Expressiveness of property under test

Several formal techniques have been developed for specific structural properties like livelocks, deadlocks etc., which typically can be studied independent of the semantics or meaning of the different states of the FSM in question.
For example, livelock or deadlock detection algorithms work independent of whether the FSM represents the DHCP protocol or the TCP protocol. Further, the structural nature of the properties under study allows development of testing methodologies optimized for the specific structural property. In some cases the solutions can be sub-exponential. Though these properties are very important, they are not sufficient when testing protocols. From the examples quoted in the earlier chapters, it is clear that it is important to study behaviors that are specific to each protocol's semantics, i.e., behaviors expressed in terms of the states and messages (semantics) of the protocol under study. COMPRESS does not focus on, and hence does not compete with, the methodologies that are designed to study specific structural properties of the system.

⁷ Red as a color is used as a placeholder for a specific property that the FSM may satisfy.

5.3: Comparison of COMPRESS with other existing methodologies

In general, the problem of identifying all possible topologies, or the necessary and sufficient conditions satisfied by topologies on which the target behavior, error or worst case performance may occur, has a very high complexity. In fact, the reachability problem of protocols modeled using FSMs or bounded buffer eCFSMs over a given topology has been proved to have exponential complexity [peng95]. Further, for eCFSMs the general reachability problem is undecidable [peng95]. In the general case, if reachability needs to be tested for a protocol modeled using an FSM, an extended FSM or a bounded eCFSM over an infinite topology space, especially for a scenario that does not occur, every topology may need to be exhaustively explored. Since the topology space itself is very large, this problem too has very high complexity. There are some alternative methodologies for topology generation, though none of them can generate necessary conditions.
The methodologies are classified into four categories based on whether they handle state space and topology in a precise or approximate fashion. The following table lists some of the approaches that can be taken to perform topology generation.

Table 1: Approaches to handling network topology in protocol testing.

Space explored: Precise topology, precise state space
    Approach: Explore the complete state space for every enumerated instance of topology. Uses reachability to check if the desired behavior occurs on topologies. Many existing state space exploration techniques fall into this category.
    Complexity: Reachability can be undecidable in the worst case for cFSMs. Automation ease: HIGH.
    Completeness of conditions generated: Cannot produce necessary topology conditions. Quality of answer is proportional to the execution time. Can answer topology instance specific reachability questions (sufficient conditions).
    Properties of topology conditions: Sufficient conditions for both occurrence and non-occurrence of scenarios. Necessary conditions CANNOT be generated.
    Trade-off: Increased complexity (non-termination) to increased precision of any obtained answers.

Space explored: Abstract topology, precise state space
    Approach: [emer00], [emer05] prove topology abstractions for broadcast protocols (extremely simple topology) such that full state space analysis on an instance is sufficient to answer some reachability-based questions for an equivalence class of topologies. [bing04], [fink01] and [henz00] prove that backward search can be efficiently used to generate sufficient conditions for errors to occur for protocols whose state space is upward closed. [helm00] uses backward search to generate sufficient conditions.
    Complexity: Formal proof is usually done manually and is very involved and specific to classes of protocols. Complexity is the same as full state space exploration on an instance of topology (exponential in the worst case). Automation ease: VERY LOW.
    Completeness of conditions generated: If the required properties can be proved, reachability-based answers are complete and precise.
    Properties of topology conditions: Sufficient conditions for both occurrence and non-occurrence of a scenario can be generated for applicable protocols.
    Trade-off: Reduced protocol and scenario space, manual proof procedures to reduced complexity and increased precision in answers obtained. Limitations exist on expressiveness of scenarios.

Space explored: Precise topology, abstract state space
    Approach: Formal proof needed for a state space reduction algorithm like partial order reduction [alur97]. Simultaneous Reachability Analysis (SRA) [ozde97], [kara00], [leiy02] can be used to efficiently prove certain structural properties on a given topology. Uses reachability in the reduced state space to check structural properties for a given topology.
    Complexity: One-time manual formal proof is on a generic representation of the state space. Runs forever as there are infinite topologies. Automation ease: HIGH.
    Completeness of conditions generated: Cannot produce necessary topology conditions. Will answer some reachability questions.
    Properties of topology conditions: Can generate sufficient conditions for both occurrence and non-occurrence of a limited set of scenarios for lossless and lossy reductions of state space.
    Trade-off: Limited expressiveness to increased precision of obtained answers.

Space explored: Abstract topology, abstract state space
    Approach: Formal proof needed for a state space reduction algorithm like COMPRESS. Does not use state space reachability to perform topology generation and hence CANNOT answer reachability questions.
    Complexity: Has finite computational runtime, though the worst case complexity itself may be exponential for behaviors that have non-empty necessary conditions. Automation ease: MEDIUM.
    Completeness of conditions generated: Will produce a set of partial topology conditions that dominates ALL topologies on which the scenario may occur, if it occurs. Cannot answer reachability questions.
    Properties of topology conditions: Produces necessary conditions for topologies assuming the scenario occurs. Tightness of the superset of topologies on which the scenario occurs varies with protocol properties and scenarios.
    Trade-off: Ability to generate necessary topology configurations to limited expressiveness and decreased precision in answers obtained.

From the above table it can be seen that the four approaches vary widely in their capabilities and limitations. Each methodology trades off certain capabilities in order to accomplish its goal. In practice, when solving all aspects of the problem becomes very complex, sampling and folding techniques are used. Sampling techniques tend to introduce false negatives whereas folding techniques tend to introduce false positives. Since solving both state space reachability and topology search together is thought to be a hard problem, the two techniques of interest to us are the ones that use abstract topology with precise state space and the ones that use abstract topology with abstract state space.

Some approaches that address topology in testing have emerged in recent years. [isma05] describes a testing model and methodology that accepts topology as an explicit input. The models considered there can be used to model unicast, broadcast and multicast semantics on a local area network (LAN). One of the approaches taken in existing literature is to manually prove an equivalence class on the topology space such that performing analysis on an instance is sufficient to answer reachability-based questions for all topologies in that equivalence class. [emer00], [emer05] reduce the problem of testing a system with n identical processes to testing a system of k nodes, where k < n. This approach only addresses a system with processes on a broadcast medium, which is only one form of connectivity in IP networks.
In this work, network connectivity that is more complex than a single LAN or broadcast medium is considered, and the focus of this work is topology generation as opposed to proving properties of systems on a given topology. [emer00], [emer05] attempt to define an equivalence class on broadcast topologies, where reasoning on topologies with some minimum number of nodes is sufficient to reason about topologies with more nodes than that number. Another approach, taken by [henz00], [fink01] and [bing04], is to limit the language used to express the protocols and topologies such that the state space is upward closed. A very important property of these approaches is the ability to qualify the answers obtained as being sufficient conditions. However, these methodologies require extremely involved protocol-specific manual proofs and are limited to broadcast topologies. Initial work done in the context of [helm00] used the concept of an equivalence class of topologies to reduce the state space while solving the state space reachability problem.

Another approach used to deal with similar computationally hard problems is to trade off the quality of the solution for reduced computational complexity. Unlike problems with quantitative solutions, correctness problems can seldom be quantified, as the solution is either affirmative or negative. The tradeoff made here is reduced computational complexity to generate necessary topology conditions at the cost of accurate state space reachability, i.e., no attempt is made to perform topology generation by solving the state space reachability problem. Thus, using COMPRESS one cannot affirmatively answer the reachability question. However, the necessary conditions identified can be used to restrict the set of topologies to be analyzed using full state space analysis tools.
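The restriction step can be pictured with a small sketch. It is only an illustration under stated assumptions, not part of COMPRESS: a topology is modeled as a map from node id to node type plus a set of edges, a necessary condition is modeled as a set of required type pairs, and the names `typed_edges`, `satisfies` and `restrict` are hypothetical.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# Assumed representation (not from the thesis): node id -> node type, plus
# undirected edges given as pairs of node ids.
Topology = Tuple[Dict[str, str], Set[Tuple[str, str]]]
TypedEdge = FrozenSet[str]  # unordered pair of node types


def typed_edges(topology: Topology) -> Set[TypedEdge]:
    """Project each edge onto the types of its two endpoints."""
    node_types, edges = topology
    return {frozenset((node_types[a], node_types[b])) for a, b in edges}


def satisfies(topology: Topology, necessary: Set[TypedEdge]) -> bool:
    """True if every required type pair appears on at least one edge."""
    return necessary <= typed_edges(topology)


def restrict(candidates: Dict[str, Topology],
             necessary: Set[TypedEdge]) -> List[str]:
    """Keep only candidates worth handing to a full state space tool."""
    return [name for name, topo in candidates.items()
            if satisfies(topo, necessary)]


# Hypothetical candidates and a hypothetical necessary condition.
candidates = {
    "t1": ({"a": "router", "b": "host"}, {("a", "b")}),
    "t2": ({"a": "host", "b": "host"}, {("a", "b")}),
    "t3": ({"a": "router", "b": "router", "c": "host"},
           {("a", "b"), ("b", "c")}),
}
necessary = {frozenset(("router", "host"))}
survivors = restrict(candidates, necessary)
```

In this example t2 has no router-host edge and is dropped without any reachability analysis. Passing the filter says nothing about whether the behavior actually occurs on t1 or t3; that question is left to the full state space tool.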
As the goal is to identify a subset of the necessary conditions, the set of topologies that are generated will be a superset of the set of topologies where the target behavior may occur. To be effective, a topology generation approach must generate many necessary conditions, thus making the topology space to be searched smaller. Also, heuristic/deterministic forward/backward search approaches must be used to enumerate topologies that satisfy the identified necessary conditions in such an order that the target behavior is identified at a practical average complexity. The approach taken is expected to work well⁸ in the following cases:

1. Target behavior depends on low level details of the topology.
2. The same target behavior does not occur on topologies with no common properties, i.e., if the target behavior occurs on all topologies with only red nodes and all topologies with only green nodes, then the necessary conditions will be empty, assuming the intersection of red and green is empty. In this case the idea of finding necessary conditions itself will not yield any information due to the fundamental nature of the target behavior. The solution, though not straightforward, is to modify the target behavior suitably to get the necessary conditions.
3. Cardinality of the messages transmitted and received does not play a significant role in the behavior of the protocol.

⁸ It is possible to show that for a certain group of protocols and target behaviors that violate one or all of the conditions, COMPRESS yields trivial results. The author believes that the set of protocols and target behaviors for which our methodology works is large enough to warrant a novel methodology.

When either of the first two conditions is violated, the quality of the solutions generated by our methodology deteriorates drastically, whereas the deterioration is expected to be much gentler when the third condition is not satisfied.
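The red/green case in condition 2 can be made concrete with a short sketch of the intersection that defines the necessary conditions. The representation is an assumption made only for illustration: each partial topology is a set of (type, type) edges, and `necessary_conditions` is a hypothetical helper, not a COMPRESS routine.

```python
from typing import FrozenSet, Iterable, Tuple

Edge = Tuple[str, str]  # assumed encoding: a pair of node types


def necessary_conditions(partials: Iterable[Iterable[Edge]]) -> FrozenSet[Edge]:
    """Intersect all partial topologies; what survives is required by all."""
    sets = [frozenset(p) for p in partials]
    if not sets:
        return frozenset()
    return frozenset.intersection(*sets)


# Behavior seen on all-red and on all-green topologies: nothing in common.
red_only = {("red", "red")}
green_only = {("green", "green")}
empty = necessary_conditions([red_only, green_only])

# Hypothetical refined behavior whose paths share a source-relay edge.
via_red = {("source", "relay"), ("relay", "red")}
via_green = {("source", "relay"), ("relay", "green")}
shared = necessary_conditions([via_red, via_green])
```

With disjoint partial topologies the intersection is empty and carries no information, which is the degenerate outcome described in condition 2. The second pair is an invented example of a target behavior whose partial topologies overlap, so one edge survives as a necessary condition.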
To address the first issue, COMPRESS is applied to protocols whose behavior strongly depends on the topology, as in resource discovery protocols. From the results of the case studies it can be shown that a significant number of target behaviors do not violate the last two conditions.

One of the weaknesses of this approach, unlike many of the reachability-based approaches, is that it cannot answer reachability questions and consequently cannot generate sufficient conditions. This is a tradeoff explicitly chosen in order to enable COMPRESS to reason about an arbitrarily large set of topologies. A consequence of this property is that it makes the problem of distinguishing between a semantically meaningful and a meaningless behavioral specification very hard. However, even other methodologies, except [emer00] in some limited cases, will be unable to distinguish between the two, as most of them will likely not terminate while trying to find a behavior that does not occur. Another drawback is that topologies on which a particular behavior does not occur cannot be expressed, and hence no such topology can be generated. Since COMPRESS does not precisely explore the state space, COMPRESS is unable to answer questions regarding cardinality of messages or entities in most cases. COMPRESS can be extended to detect behaviors triggered by multiple external events. In many cases, if an external event is the only event that can trigger a transition from the initial state to some intermediate state, static properties may be used to detect such cases. However, this property may not hold in all cases. Lastly, when the target behavior never occurs in reality, COMPRESS will still generate some set of topology conditions, and there is no way of knowing whether the set of topologies represented by the necessary conditions is actually an empty set. However, even the other methodologies cannot answer this question for the general case.
5.4: Chapter 5 summary

In this chapter, the existing work has been classified into four categories, and it has been shown how our methodology differs from the existing ones. It has also been shown that the methodologies in the four categories are designed to achieve slightly different objectives, with optimizations based on different assumptions. The strengths and weaknesses of COMPRESS are compared and contrasted with the other approaches.

Chapter 6: Analysis of COMPRESS

In this chapter the completeness of COMPRESS is proved, i.e., the topology conditions generated by COMPRESS will include every topology on which the target behavior may occur. The terms and definitions used in the proof are introduced first, followed by a multipart proof. The quality of the necessary topology conditions generated by COMPRESS is discussed and a high level overview of the computational complexity of COMPRESS is given.

6.1: Constructs used in the proof

In Chapter 3 (Background) and Chapter 4 (Methodology), the following concepts were introduced:
• GCFSM sequence space
    ◦ Paths in GCFSM sequence space representing target behavior
• Transitive closure graph generated by COMPRESS
    ◦ Paths in the same representing the target behavior

For the proof, two additional concepts are introduced:
• Paths in GCFSM sequence space converted to transitive closure graphs
• Paths in transitive closure graphs

Any path in the GCFSM state space can be structurally transformed into a directed bipartite transitive closure graph G(V1, V2, E), where V1 is the set of messages, V2 is the set of state transition instances and E is the set of edges representing causal relationships between messages and state transitions. The procedure to transform any path in the GCFSM state space of an instance of topology to a TCG is as follows.
• Let the path in the GCFSM state space be 'p'. Initially the TCG is empty.
• Extract the eCFSM state transition instance embedded in the first GCFSM state space transition and insert it into the TCG
• Follow the state transition instances in the path caused by the generated message and add those instances as the successors of the message
• Repeat the previous step until all the GCFSM state transitions in 'p' are exhausted

The TCG thus generated will have the same number of eCFSM state transition instances, as well as the same causal relationships between the message instances and state transition instances, as the path 'p'. As shown in Figure 19, the path 'p' represents a sequence in the state space on which the target behavior <a, 2 → 5, a> leading to <c, 6 → 11, -> occurs. A similar path can be traced in the TCG extracted from the path, by tracing a path from the starting eCFSM state transition instance to the ending eCFSM state transition instance. The shaded portion of the TCG in Figure 19 represents such a path.

Figure 19: Path in the GCFSM state space of a topology instance transformed to a TCG.

In the proof, the set of all possible sequence spaces for a given GCFSM or topology t is represented as $G_{GCFSM(t)}$. Thus, this will include all the combinations of the initial state and the sequence of injection of external events. As defined earlier, topology represents the static information encoded in the GCFSM state. Given a target behavior TB, $P^{TB}_{G_{GCFSM(t)}}$ represents the set of all paths in $G_{GCFSM(t)}$ on which the target behavior TB occurs.

Figure 20: Structure of transitive closure graph generated by COMPRESS.

It is important to differentiate between the two classes of TCGs and paths in TCGs used in the proof.
● Instance based TCG and paths: One is the graph obtained when each path in $P^{TB}_{G_{GCFSM(t)}}$ is transformed into a transitive closure graph. This is a simple graph operation performed on the individual paths. $TCG_{P^{TB}_{G_{GCFSM(t)}}}$ represents the set of all TCGs that have been extracted from all paths in $P^{TB}_{G_{GCFSM(t)}}$.
It should be noted that this transformation is not a lossless operation and hence multiple instances of paths in $P^{TB}_{G_{GCFSM(t)}}$ may be mapped to a single TCG instance. The structure of the TCG is as shown in Figure 19. $P^{TB}_{TCG_{P^{TB}_{G_{GCFSM(t)}}}}$ represents the set of all paths in each TCG in the set $TCG_{P^{TB}_{G_{GCFSM(t)}}}$ which satisfy the required start and end eCFSM instance state transitions specified by the target behavior. Every construct from $P^{TB}_{G_{GCFSM(t)}}$ through $TCG_{P^{TB}_{G_{GCFSM(t)}}}$ in this bullet point deals with eCFSM instances and is only used in the proof, not by the COMPRESS algorithm.
● Type based TCG* and paths: The other is the TCG* generated by the COMPRESS⁹ algorithm. The set TCG* represents a set of transitive closure graphs generated by COMPRESS, each starting from a given type of AEE. All vertices are types rather than instances. The structure of a single transitive closure graph generated by COMPRESS is as shown in Figure 20. $P^{TB}_{TCG^*}$ represents the set of all paths that satisfy the start and end conditions specified by the target behavior, in each graph belonging to the set TCG*.

⁹ The transitive closure graph generated by the COMPRESS algorithm is denoted by TCG* to differentiate it from the transitive closure graph generated from a path in the state space of an instance of topology.

Given a path “p” in $P^{TB}_{TCG_{P^{TB}_{G_{GCFSM(t)}}}}$, a subset of the eCFSMs and their edges in “t” can be reconstructed from the path, as two consecutive state transition vertices represent eCFSMs that are in each other's neighborhood. When the neighborhood information is accumulated over the entire path, and the accumulated information is represented as a partial topology “t'”, then t' will always be a subgraph of t. When a similar exercise is undertaken on the paths in $P^{TB}_{TCG^*}$, the necessary topology conditions required for the target behavior to occur on all topologies are obtained. The goal of this section is to prove that the above claim is true.
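The reconstruction of a partial topology from a path can be sketched as follows. This is a minimal illustration under assumptions that are not in the text: a path is reduced to its state transition vertices, each tagged with the id of the eCFSM instance on which it occurs, and `map_to_topology` and `is_subgraph` are hypothetical helper names.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

Link = FrozenSet[str]  # undirected link between two eCFSM instance ids


def map_to_topology(path: List[Tuple[str, str]]) -> Set[Link]:
    """Accumulate neighborhood information along a path.

    Each path element is (eCFSM id, state transition label). Two consecutive
    state transition vertices on different eCFSMs imply that the two eCFSMs
    are neighbors.
    """
    partial: Set[Link] = set()
    for (prev_node, _), (next_node, _) in zip(path, path[1:]):
        if prev_node != next_node:
            partial.add(frozenset((prev_node, next_node)))
    return partial


def is_subgraph(partial: Set[Link],
                topology_links: Iterable[Tuple[str, str]]) -> bool:
    """Check that every reconstructed link exists in the full topology."""
    return partial <= {frozenset(link) for link in topology_links}


# Hypothetical topology t (a chain a-b-c-d) and a path observed on it.
t = [("a", "b"), ("b", "c"), ("c", "d")]
p = [("a", "2->5"), ("b", "1->3"), ("c", "6->11")]
t_prime = map_to_topology(p)
```

Here t' contains the links a-b and b-c but not c-d, so it is a proper subgraph of t: the path only reveals the part of the topology it actually traverses.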
6.2: Proof of completeness of COMPRESS

A naive procedure that uses enumerated state space reachability to check if the target behavior occurs on any given topology may completely explore all possible ways in which a given system can evolve. COMPRESS, on the other hand, neither explicitly enumerates each topology nor exhaustively explores the enumerated sequence space for any instance of topology. COMPRESS generates topology conditions using transitive closure graphs. Since COMPRESS deviates from the naive procedure, it is required to prove that all topologies on which the target behavior occurs, as determined by the exhaustive space search, will be included in the set of topologies specified by the necessary topology conditions.

To prove the completeness of COMPRESS, a baseline is first established and its completeness proved. Initially, for a simple definition of target behavior, it is shown that the TCG generated by COMPRESS is complete, as the enumerated sequence space of any topology can be completely reconstructed from the topology independent TCG. After establishing the completeness of the TCG, the completeness of the paths in the TCG representing the target behavior is established by reconstructing all local paths representing the target behavior on any topology. Once the completeness of the paths is established, it is shown that the topology reconstruction algorithm is complete. Eventually, using all the results, it is shown that COMPRESS generates necessary topology conditions. Lastly, the proof is generalized to a more general definition of target behavior including multiple eCFSM state transitions and AEEs.

6.2.1: Baseline

For the purpose of proving completeness, a baseline or the “ground truth” is established, and then the completeness of COMPRESS is established by comparing it to the baseline.
As a starting point, a simple case is considered where the target behavior consists of <AEE, currentState1, nextState1, outputMessage1> occurring and eventually leading to an eCFSM state transition <consumedMessage2, currentState2, nextState2, outputMessage2>. Later, the proof is generalized to include multiple eCFSM state transitions and multiple external events. To establish the baseline, a naive algorithm is considered that enumerates all possible topologies, considers all possible initial states for each topology, considers all possible ways of injecting a single external event A into each of the GCFSMs in its initial states, and enumerates all possible sequences of evolution. Then, the algorithm checks for occurrence of the target behavior on each such enumerated GCFSM state space and collects all topologies on which the target behavior occurs. The following procedure establishes the baseline for comparing the topology conditions generated by COMPRESS.

Assumptions and notations used in the procedure:

Let,
    TB be represented by <A, currentState1, nextState1, outputMessage1> leading to <inputMessage2, currentState2, nextState2, outputMessage2>
    t, a topology, represent a set of eCFSMs connected by a connectivity matrix [c]
    T_valid be the set of all valid topologies
    T_TB, a set of topologies, represent the set of topologies on which the TB occurs
    G_{GCFSM(t)} be a set of graphs representing the state space

Procedure used to generate the set of all topologies on which the target behavior TB occurs:

T_TB = {∅}
for each topology t_i ∈ T_valid
    ## Generate state space
    G_{GCFSM(t_i)} = {∅}
    for each initial state j of t_i
        for each eCFSM k used to inject AEE A into t_{i,j}
            G_{GCFSM(t_i)} = G_{GCFSM(t_i)} ∪ StateSpaceSequence(t_{i,j,k})
    for each graph g_j ∈ G_{GCFSM(t_i)}
        for each path p_k defined by vertex pair <v_x, v_y> ∈ g_j
            for each child c of vertex v_x
                for each child d of vertex v_y
                    if ( (<AEE, currentState1, nextState1, outputMessage1> ∈ <v_x, v_c>) ∧
                         (<inputMessage2, currentState2, nextState2, outputMessage2> ∈ <v_y, v_d>) )
                        T_TB = T_TB ∪ t_i

StateSpaceSequence(t)
    for each enabled transition x ∈ t
        nextGCFSMstate = triggerStateTransition(t, x)
        StateSpaceSequence(nextGCFSMstate)

The above procedure explicitly enumerates the complete sequence space for each topology and checks if the target behavior occurs. When the above procedure terminates¹⁰, T_TB will contain the set of all topologies in T_valid on which the target behavior occurs. The completeness of the above procedure is trivial to prove, as it completely enumerates the entire state space for each and every valid topology configuration.

¹⁰ Since completeness is the only concern here, unbounded computational and space/time capacity is assumed.

6.2.2: Completeness of TCG

The objective here is to show that the TCG generated by COMPRESS, which relaxes detailed sequencing, is capable of generating all paths representing the target behavior in the enumerated GCFSM sequence space on all topologies. This proves that the TCG used by COMPRESS is complete and has the required information to completely reconstruct the GCFSM sequence space for any topology. To prove this, a procedure that could theoretically¹⁰ explore all paths leading to the target behavior in the complete state space for a given instance of topology t1 is designed. Then, a topology independent TCG is considered and paths for topology t1 are instantiated. Then, it is proved that every path that the former would generate is included in the set of paths generated by the latter. The proof is then relaxed for all topologies to prove the claim that the TCG is complete.
Table 2 compares the steps taken by the two approaches.

Naive approach | TCG based approach
Enumerate all possible topologies t_i in T_valid | Generate TCG* using the algorithm used by COMPRESS
Enumerate all possible initial states j for topology t_i | Enumerate all possible topologies t_i in T_valid
Enumerate all possible injection points k for AEE A to get t_{i,j,k} | Generate all mappings of TCG* to t_i
Enumerate complete sequence space for t_{i,j,k} | Generate all possible GCFSM sequence spaces allowed by the mapped TCG
Enumerate all paths in sequence space and check if target behavior occurs | Collect all paths on which target behavior occurs
Collect all paths on which target behavior occurs |

Table 2: Naive approach vs. COMPRESS approach.

Consider a complete exploration of the enumerated GCFSM sequence space to a depth d¹¹, i.e., all ways in which a GCFSM starting from all its initial states can evolve d steps after injecting a single AEE on to one of the eCFSMs. This can be represented as a tree with branches of depth at most d. A path in the enumerated GCFSM sequence space represents an instance of evolution of the GCFSM and is a traversal from the root of the tree to some intermediate GCFSM state.

Let,
    TB be represented by <A, currentState1, nextState1, outputMessage1> leading to <inputMessage2, currentState2, nextState2, outputMessage2>
    t1, a topology, represent a set of eCFSMs connected by edges
    T_TB, a set of topologies, represent the set of topologies on which the TB occurs
    G_{GCFSM(t)} be a set of graphs representing the state space
    StateSpaceSequence(s, n) be a procedure that enumerates the state space sequence starting from GCFSM state s such that there are only n GCFSM state transitions
    P^{TB}_{GCFSM(t_i)} be a set of paths leading to TB in graphs in GCFSM(t_i)

Consider the following procedure that generates GCFSM evolution trees of depth d for a given topology t1, due to a single external event, starting from all possible initial states of the GCFSM.
DEPTH = n
T = {t1}
T_TB = {∅}
for each topology t_i ∈ T
    ## Generate enumerated GCFSM sequence space
    G_{GCFSM(t_i)} = {∅}
    for each initial state j of t_i
        for each eCFSM k used to inject AEE A into t_{i,j}
            G_{GCFSM(t_i)} = G_{GCFSM(t_i)} ∪ StateSpaceSequence(t_{i,j,k}, DEPTH)
    ## Extract paths on which TB occurs
    P^{TB}_{GCFSM(t_i)} = {∅}
    for each graph g_j ∈ G_{GCFSM(t_i)}
        for each path p_k defined by vertex pair <v_x, v_y> ∈ g_j
            for each child c of vertex v_x
                for each child d of vertex v_y
                    if ( (<AEE, currentState1, nextState1, outputMessage1> ∈ <v_x, v_c>) ∧
                         (<inputMessage2, currentState2, nextState2, outputMessage2> ∈ <v_y, v_d>) )
                        P^{TB}_{GCFSM(t_i)} = P^{TB}_{GCFSM(t_i)} ∪ p_k

¹¹ If the target behavior occurs in the state space of the GCFSM, it is assumed that d is large enough to include at least one occurrence of the target behavior.

The above procedure generates GCFSM state evolution trees of depth at most d that result from the occurrence of a single AEE A, starting from all possible initial states of the GCFSM. It also crawls the trees and enumerates all paths in the trees on which the target behavior occurs. Embedded in the vertices of a given path is the following information:
● The eCFSM state transition that generated a particular message
● The eCFSM state transitions that were caused by the consumption of a given message
● If there are multiple pending eCFSM state transitions, which one of them was triggered next in that particular path

The above information can be made explicit by extracting, from a path in the enumerated GCFSM sequence space, a transitive closure graph consisting of the eCFSM state transitions and the messages they receive and generate.
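The extraction can be sketched in a few lines. The encoding below is an assumption made for illustration only: each step of a path is reduced to one eCFSM state transition instance together with the message instance it consumes and the message instance it generates, and `Step` and `extract_tcg` are hypothetical names rather than the procedure used in the thesis.

```python
from typing import List, NamedTuple, Optional, Set, Tuple


class Step(NamedTuple):
    transition: str          # eCFSM state transition instance, e.g. "a:2->5"
    consumed: str            # message instance (or AEE) that triggered it
    produced: Optional[str]  # message instance generated, if any


Tcg = Tuple[Set[str], Set[str], Set[Tuple[str, str]]]


def extract_tcg(path: List[Step]) -> Tcg:
    """Build a bipartite causal graph: message -> transition -> message."""
    messages: Set[str] = set()
    transitions: Set[str] = set()
    edges: Set[Tuple[str, str]] = set()
    for step in path:
        transitions.add(step.transition)
        messages.add(step.consumed)
        edges.add((step.consumed, step.transition))
        if step.produced is not None:
            messages.add(step.produced)
            edges.add((step.transition, step.produced))
    return messages, transitions, edges


# Two hypothetical paths that differ only in the order in which b and c
# consume the message m1 generated by a.
p1 = [Step("a:2->5", "A", "m1"),
      Step("b:1->3", "m1", None),
      Step("c:6->11", "m1", None)]
p2 = [Step("a:2->5", "A", "m1"),
      Step("c:6->11", "m1", None),
      Step("b:1->3", "m1", None)]
same_graph = extract_tcg(p1) == extract_tcg(p2)
```

The graph keeps which transition generated each message and which transitions each message caused, but it drops the order in which pending transitions fired. That is why the two interleavings above collapse to the same graph.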
The transitive closure graph has the following properties:
● If there exists an eCFSM state transition, then there must be a message that was
    ○ An AEE and hence occurred autonomously
    ○ Internally generated as a consequence of a previous eCFSM state transition
● If there exists a non-AEE or internally generated message, then there must be an eCFSM state transition that generated that message

In short, the TCG encodes the causal relationships between the messages and state transitions of eCFSMs. Figure 19 shows the transformation of a path in the GCFSM state space to a TCG. This transformation from a path p_j ∈ P^{TB}_{GCFSM(t_i)} to its corresponding TCG is straightforward, as shown. The following procedure transforms each path in P^{TB}_{GCFSM(t_i)} into a TCG and stores it in the set TCG_{P^{TB}_{GCFSM(t_i)}}.

## Extract TCG from paths representing TB
TCG_{P^{TB}_{GCFSM(t_i)}} = {∅}
for each p_j ∈ P^{TB}_{GCFSM(t_i)}
    TCG_{P^{TB}_{GCFSM(t_i)}} = TCG_{P^{TB}_{GCFSM(t_i)}} ∪ extractTCG(p_j)

The TCG will include all messages and all eCFSM state transitions that are transitively generated as a consequence of a single AEE. Thus no message or eCFSM state transition type that could occur on any instance of a GCFSM will be missing from the TCG. Consider the following procedure that generates the TCG. Many of the constructs used to generate the TCG are shown in Figure 20. We know that the state transition function of an eCFSM maps (mesg_consumed, state_current) to (state_next, mesg_output).

Let,
    StTxTable_SYSTEM represent a set of state transition mappings
    mesgVertices be a set of graph vertices of type message, each of which is also associated with a set of topology equivalence classes
    stTxVertices be an array of graph vertices of type state transition (state_current, state_next)
    stTx(mesg_consumed, state_current, state_next, mesg_output) = (state_current, state_next)
    consumedMesg(mesg_consumed, state_current, state_next, mesg_output) = mesg_consumed
    outputMesg(mesg_consumed, state_current, state_next, mesg_output) = mesg_output

Merge all eCFSM state transition tables into one big table.

StTxTable_SYSTEM = {∅}
for each type i of eCFSM
    for each state transition mapping j = (mesg_consumed, state_current, state_next, mesg_output) ∈ f_i, state transition function f of eCFSM i
        set f_{i,j} as unused
        StTxTable_SYSTEM = StTxTable_SYSTEM ∪ f_{i,j}

Consider the following procedure that builds a transitive closure graph.

mesgVertices = {∅}, stTxVertices = {∅}
for each item_i ∈ StTxTable_SYSTEM
    tmpStTxVertex = tmpMesgVertex = NULL
    if consumedMesg(item_i) == A
        set item_i as used
        tmpMesgVertex = consumedMesg(item_i)
        tmpStTxVertex = stTx(item_i)
        set tmpMesgVertex as parent of tmpStTxVertex
        if tmpMesgVertex ∉ mesgVertices
            mesgVertices = mesgVertices ∪ tmpMesgVertex
        else
            find vertex ∈ mesgVertices such that vertex == tmpMesgVertex
            tmpMesgVertex = vertex
        set tmpStTxVertex as child of tmpMesgVertex
        set tmpMesgVertex as parent of tmpStTxVertex
        append tmpStTxVertex to stTxVertices
        constructTCC(StTxTable_SYSTEM, tmpStTxVertex, outputMesg(item_i), mesgVertices, stTxVertices)

constructTCC(StTxTable_SYSTEM, preStTxVertex, preMesgVertex, mesgVertices, stTxVertices)
{
    for each mesg_i ∈ mesgVertices
        if mesg_i == preMesgVertex
            set mesg_i as child of preStTxVertex
            set preStTxVertex as parent of mesg_i
    if preMesgVertex ∉ mesgVertices
        for each connectivity equivalence class CEC for destination preMesgVertex
            preMesgVertex.topology = preMesgVertex.topology ∪ CEC
        mesgVertices = mesgVertices ∪ preMesgVertex
    for each item_i = (mesg_consumed, state_current, state_next, mesg_output)_i ∈ StTxTable_SYSTEM
        if consumedMesg(item_i) == preMesgVertex ∧ item_i is not used
            set item_i as used
            tmpStTxVertex = stTx(item_i)
            stTxVertices = stTxVertices ∪ stTx(item_i)
            set tmpStTxVertex as child of preMesgVertex
            set preMesgVertex as parent of tmpStTxVertex
            tmpMesgVertex = outputMesg(item_i)
            constructTCC(StTxTable_SYSTEM, tmpStTxVertex, outputMesg(item_i), mesgVertices, stTxVertices)
            set item_i as unused
}

The
above procedure generates the transitive closure graph, which enumerates all possible state transitions that can generate a particular message and all possible state transitions that are caused by any given message, and links the two. Also, for the destination type of every message, all topology equivalence classes are enumerated. This graph specifies all possible topological relationships between the node (state transition) generating the message and the node (state transition) receiving the message at every step. The following procedure enables us to construct all paths of depth d, including the path on which the target behavior occurs, in any given instance of the GCFSM due to a single AEE.

Let,
    TCG^d_Candidate be a set of graphs, each of which is a connected subset of TCG*
    tcg^{t_i} be a set of graphs obtained after tcg ∈ TCG^d_Candidate is mapped to t_i
    P_{tcg_j^{t_i}} be a set of graphs, where each graph represents the enumerated state space sequence of GCFSM extracted from tcg_j^{t_i}

TCG^d_Candidate = {all possible subsets of TCG* with d connected state transition vertices}
P_{tcg^{t_i}} = {∅}
T = {t_1}
for each topology t_i ∈ T
    for each tcg_j ∈ TCG^d_Candidate
        tcg_j^{t_i} = instantiate(tcg_j, d, t_i)
        for each k ∈ tcg_j^{t_i}
            P_{tcg_{j,k}^{t_i}} = tcgToPath(tcg_{j,k}^{t_i})
            P_{tcg^{t_i}} = P_{tcg^{t_i}} ∪ P_{tcg_{j,k}^{t_i}}

instantiate(tcg, d, t)
{
    Generate all possible valid mappings from state transition vertices on path p to nodes ∈ topology t such that
    1) all state transition vertices ∈ tcg are mapped
    2) one state transition vertex ∈ tcg is mapped to exactly one vertex on topology t
    3) the total number of mappings is at most d
}

tcgToPath(tcg_{j,k}^{t_i})
{
    generate all possible sequencings of eCFSM state transitions allowed by tcg (see illustration below)
    traverse each sequencing to validate continuity ∈ eCFSM state transitions
    traverse each sequencing to reconstruct buffers of each eCFSM
}

The above procedure
● Maps a topology independent TCG to an instance of a topology such that
    ○ For each message, all possible types of nodes in all possible states that can receive and react to the message are considered
    ○ For each message destination type, all possible partial topologies, and hence all possible topological ways in which the node receiving that message can be connected in relation to the node generating that message, are considered
    ○ These cover all possible ways in which the eCFSM vertices can be topologically connected, all possible initial states the GCFSM could be in, and all possible eCFSM state transitions
● Constructs all possible GCFSM-like paths from a TCG instantiated on a topology instance
    ○ From the above mapping, there will always be at least one mapping with the exact topological connectivity and eCFSM state transitions such that they are the same as those in the actual GCFSM path
    ○ From the path construction procedure, all sequences allowed by the TCG instantiated on a topology are exhausted, and hence there will exist at least one sequencing which is the same as the sequence of the path in the GCFSM

Since this is true for any given path in the GCFSM, it is true for all paths in the GCFSM, including the path representing the target behavior. This is trivially true for all topologies and hence can be generalized to all topologies in T_valid.

6.2.3: Completeness of pruned TCG

In COMPRESS, only paths in the topology independent TCG that represent the target behavior are used for topology condition generation.
In this section it is shown that any path representing the target behavior in the TCG extracted from a path in the state space of the GCFSM defined on any instance of the topology can be generated from at least one path in the topology independent TCG without adding any new information to it. This shows that the local paths (eCFSM state transitions and messages in Figure 19) generated from the pruned TCG will not exclude any local paths in the enumerated sequence space that lead to the target behavior. To prove this, a procedure is considered that could theoretically generate all possible paths representing the target behavior in the TCG extracted from the paths representing the target behavior in the enumerated sequence space of the GCFSM defined for a topology $t_i$. Then, a pruned topology independent TCG that represents the target behavior is taken and mapped to the topology $t_i$. Finally, it is shown that the set of paths generated by the latter includes every path generated by the former. The following table shows the comparison between the naive approach and the TCG based approach taken by COMPRESS.

Naïve approach | TCG based COMPRESS approach
Generate all paths in GCFSM sequence space for all topologies on which the target behavior occurs | Generate TCG*
Transform each path to a TCG | Enumerate all paths representing the target behavior in TCG*
Enumerate all local paths | For each topology in T_valid, map paths to topology
Collect all local paths representing the target behavior

Table 3: Comparison of handling of local paths between the naive and COMPRESS approaches.

The procedures shown in the previous sections can be used to extract the TCG from the paths in the enumerated GCFSM sequence space of topology $t_i$ on which the target behavior occurs.
Figure 21: Reconstruction of GCFSM paths using topology independent TCG and $t_i$.

The following procedure extracts the paths representing the target behavior from the TCG extracted from the paths in the enumerated sequence space of a GCFSM defined on topology $t_i$. The procedure after it extracts the paths representing the target behavior in the topology independent TCG and instantiates those paths on all topologies.

Let,
    $P^{TB}_{TCG(P^{TB}_{GCFSM(t_i)})}$ be a set of graphs, where each graph represents the paths ∈ $TCG(P^{TB}_{GCFSM(t_i)})$ leading to TB

## extract paths ∈ TCG that represent TB
$P^{TB}_{TCG(P^{TB}_{GCFSM(t_i)})}$ = ∅
for each path $p_k$ of depth DEPTH defined by vertex pair ⟨$v_x$, $v_y$⟩ ∈ $TCG(P^{TB}_{GCFSM(t_i)})$
    vertex $v_c$ = child(vertex $v_x$)
    vertex $v_d$ = child(vertex $v_y$)
    if ( (⟨AEE, currentState1, nextState1, outputMessage1⟩ ∈ ⟨$v_x$, $v_c$⟩) ∧
         (⟨consumedMessage2, currentState2, nextState2, outputMessage2⟩ ∈ ⟨$v_y$, $v_d$⟩) )
        $P^{TB}_{TCG(P^{TB}_{GCFSM(t_i)})}$ = $P^{TB}_{TCG(P^{TB}_{GCFSM(t_i)})}$ ∪ $p_k$

At this point the following is true: $TCG^{*}$ is generated by the generic TCG generation algorithm.

Let,
    $P^{TB}_{TCG^{*}}$ be a set of graphs representing paths, possibly with rolled loops, leading to TB ∈ $TCG^{*}$
    $P^{TB}_{TCG^{*}(t_i)}$ be a set of graphs, where each graph represents a path in $P^{TB}_{TCG^{*}}$ mapped to $t_i$

$P^{TB}_{TCG^{*}}$ = set of all paths with rolled loops representing TB ∈ $TCG^{*}$
for each topology $t_i$ ∈ $T_{valid}$
    $P^{TB}_{TCG^{*}(t_i)}$ = ∅
    for each path $p_j$ ∈ $P^{TB}_{TCG^{*}}$
        $P^{TB}_{TCG^{*}(t_i)}$ = $P^{TB}_{TCG^{*}(t_i)}$ ∪ Instantiate($t_i$, $p_j$, DEPTH)

Instantiate(t, p, depth)
{
    Generate all possible valid mappings from state transition vertices on path p to nodes ∈ topology t such that
    1) all vertices on path p are mapped
    2) one vertex on path p is mapped to exactly one vertex on topology t
    3) paths resulting from the mapping represent TB within the given depth
}

● $P^{TB}_{TCG(P^{TB}_{GCFSM(t_i)})} = [\, p^{t_i}_{1}, \ldots, p^{t_i}_{r}, \ldots, p^{t_i}_{s} \,]$, the paths representing the target behavior in the TCG extracted from the paths in the enumerated sequence space of GCFSMs defined on topology $t_i$
● $P^{TB}_{TCG^{*}} = [\, p_{1}, \ldots, p_{k}, \ldots, p_{n} \,]$, the paths representing the target behavior in the topology independent TCG
● The paths in $P^{TB}_{TCG^{*}}$ instantiated on topology $t_i$:

$$P^{TB}_{TCG^{*}(t_i)} = \begin{bmatrix} p^{t_i}_{1,1} & p^{t_i}_{1,2} & \cdots & p^{t_i}_{1,m_1} \\ \vdots & \vdots & & \vdots \\ p^{t_i}_{k,1} & p^{t_i}_{k,2} & \cdots & p^{t_i}_{k,m_k} \\ \vdots & \vdots & & \vdots \\ p^{t_i}_{n,1} & p^{t_i}_{n,2} & \cdots & p^{t_i}_{n,m_n} \end{bmatrix}$$

To prove completeness of the pruned TCG, the following is claimed:

$$\forall \text{ paths } p^{t_i}_{g} \in P^{TB}_{TCG(P^{TB}_{GCFSM(t_i)})} \;\; \exists \text{ paths } p^{t_i}_{a,b},\, p^{t_i}_{a,c} \in P^{TB}_{TCG^{*}(t_i)} \text{ generated from } p_a \in P^{TB}_{TCG^{*}} \text{ such that}$$

1) $p^{t_i}_{a,b} = p^{t_i}_{g}$
2) $vertices(p^{t_i}_{a,c}) \subset vertices(p^{t_i}_{g})$
3) for any vertex pair $\langle v_x, v_y \rangle \in p^{t_i}_{a,c}$ and $p^{t_i}_{g}$, $Order(v_x, v_y, p^{t_i}_{a,b}) = Order(v_x, v_y, p^{t_i}_{g})$

In the procedure that maps the paths in $P^{TB}_{TCG^{*}}$ to topology $t_i$, all possible states in which the receiving node could be and all ways in which the receiving node could be topologically related to the node generating the message are considered at each step. Using this property, it can be shown that for any given path in $P^{TB}_{TCG(P^{TB}_{GCFSM(t_i)})}$, every step will be included in the paths generated by the above procedure. Thus, there will exist at least one path for which claim (1) is true. Since the compact paths in TCG* can have loops, an explicit mapping of such a path to an instance of topology may include paths where the loop is not considered. Claim (2) is a logical consequence of this and claim (1). Since the TCG is governed by the rules of the eCFSM table, the causal relationships between the messages and eCFSM state transitions are preserved in TCG* (see Figure 21). For paths with loops, when loops are skipped while mapping the path to a topology instance, there will be instances where intermediate eCFSM state transitions are skipped. Therefore, if a pair of vertices exists in both paths, the vertices satisfy the ordering property. Hence claim (3) is true. The above proves that every path $p_j \in P^{TB}_{TCG(P^{TB}_{GCFSM(t_i)})}$ can be generated from $P^{TB}_{TCG^{*}}$.
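The Instantiate procedure above can be illustrated with a minimal Python sketch. It assumes a simplified representation that is not taken from COMPRESS itself: a path is an ordered list of (vertex, required node type) pairs, a topology is a dictionary from node names to node types plus a set of links, and a mapping is treated as valid when node types match and consecutive state transitions land on linked nodes. The names `instantiate`, `node_types` and `links` are hypothetical, and the depth bound and the target behavior check of condition 3 are left out for brevity.

```python
from itertools import product


def instantiate(path, node_types, links):
    """Enumerate mappings of path vertices onto topology nodes.

    path       -- ordered list of (vertex_name, required_node_type)
    node_types -- dict: topology node name -> node type
    links      -- set of frozensets {a, b}: nodes that can exchange messages
    """
    # Candidate nodes for each vertex: only nodes of the required type.
    candidates = [
        [node for node, ntype in node_types.items() if ntype == vtype]
        for _, vtype in path
    ]
    mappings = []
    # Every vertex is mapped to exactly one node (conditions 1 and 2).
    for choice in product(*candidates):
        # Assumption: consecutive transitions must occur on linked nodes.
        linked = all(
            frozenset((a, b)) in links for a, b in zip(choice, choice[1:])
        )
        if linked:
            mappings.append({v: n for (v, _), n in zip(path, choice)})
    return mappings


# Hypothetical example: one mobile node, two DHCP servers, one link.
node_types = {"mn1": "MN", "d1": "DHCP", "d2": "DHCP"}
links = {frozenset(("mn1", "d1"))}
path = [("s1", "MN"), ("s2", "DHCP")]
print(instantiate(path, node_types, links))
```

Only the mapping that places the second transition on `d1` survives, because `d2` is not linked to `mn1` in this toy topology. Enumerating with `product` mirrors the exhaustive nature of the mapping step, which is also why its cost grows quickly with path length.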
6.2.4: Completeness of topology generation

In this section it is shown how the topology generation procedure collects only the static information present in the paths in the TCG representing the target behavior, and why the conditions generated are the necessary topology conditions. To prove this, it is shown that the topology generated from the path that would generate an equivalent path in $P^{TB}_{TCG(P^{TB}_{GCFSM(t_i)})}$ includes the topology $t_i$. Since COMPRESS does not explicitly enumerate a path for any instance of topology, there is no way of knowing which candidate path in the TCG that represents the target behavior is the desired path. COMPRESS handles this by generating topologies for all paths in $P^{TB}_{TCG^{*}}$ and taking an intersection of the conditions. To show that this is feasible, it is shown that there is only a finite number of paths over which the intersection operation needs to be performed. Lastly, all the results from the previous sections are tied together to show that COMPRESS is indeed complete.

In the previous section it was shown that for every path in $P^{TB}_{TCG(P^{TB}_{GCFSM(t_i)})}$ there exists a path in $P^{TB}_{TCG^{*}}$ which can be instantiated on topology $t_i$ to obtain an equivalent path. In this section, it will be proved that the topology conditions generated by COMPRESS for that path include the topology $t_i$.

The idea behind topology generation from a path $p_x \in P^{TB}_{TCG^{*}}$ is extremely simple. If an eCFSM state transition occurs, the message that caused the state transition must either be an AEE or be generated by another eCFSM state transition. Applying this logic at every step leads to the generation of the topology for the path $p_x \in P^{TB}_{TCG^{*}}$. Every state transition vertex in the TCG is preceded by a message vertex that caused the state transition and may be succeeded by a message vertex if a message is generated as a consequence of the state transition.
Also, in a given path $p_x \in P^{TB}_{TCG^{*}}$, the partial topology for a given message destination is selected at every step. Thus the following information is encoded in the path
● Type of message that caused the eCFSM state transition
    ○ The topological relationship between the node that generated the message and the current eCFSM under consideration
● Type of eCFSM that underwent the state transition
    ○ Relationship between the variables of the messages and the eCFSM
● Type of message generated by the eCFSM state transition
    ○ Relationship between the variables of the eCFSM and the generated message
    ○ Topological relationship between the eCFSM and any potential eCFSM that receives this message

This information is available at every step, starting from the AEE and ending at the eCFSM defined in the target behavior specification. The topology generation procedure exploits this information. The following procedure is used to perform topology generation for a path in $P^{TB}_{TCG^{*}}$ whose loops have been unrolled to some depth "d".

Let $p_i$ be the path for which topology generation is being performed
$T_{p_i}$ = ∅
for each stTxVertex $stTx_j$ ∈ $p_i$
    newNode = generateNode($stTx_j$)
    mark all nodes ∈ $T_{p_i}$ as unused
    reconcile($T_{p_i}$, newNode)

Procedure to reconcile topologies
reconcile(T, newNode)
    for each unused node $n_k$ ∈ T
        mark $n_k$ as used
        $pn_1$ = $n_k$ ∩ newNode
        $pn_2$ = $n_k$ − {$n_k$ ∩ newNode}
        $pn_3$ = newNode − {$n_k$ ∩ newNode}
        if $pn_1$ == ∅
            reconcile(T, newNode)
            mark $n_k$ as unused
        elseif $pn_1$ == newNode
            set $n_k$ = newNode
            mark $n_k$ as unused
        elseif $pn_1$ == $n_k$
            reconcile(T, ∅)
            mark $n_k$ as unused
            reconcile(T, $pn_3$)
            mark $n_k$ as unused
        else
            $n_k$ = $pn_1$
            reconcile(T, $pn_2$)
            mark $n_k$ as unused
            reconcile(T, $pn_3$)
            mark $n_k$ as unused

The above procedure¹² generates many instances of candidate topologies for a given path.
The above procedure uses static properties of the nodes to generate topologies, and at every step, if there are multiple possibilities of topology generation, it covers all of them. Since the generic TCG represents a cardinality- and sequence-relaxed version of the GCFSM, the topologies generated from the TCG will not include any information about the state of the eCFSM.

¹² In case additional constraints are imposed on an eCFSM state transition in the path, additional backward traversal may be required to detect contradictions.

From the previous section, the following claims are true:

$$\forall \text{ paths } p^{t_i}_{g} \in P^{TB}_{TCG(P^{TB}_{GCFSM(t_i)})} \;\; \exists \text{ paths } p^{t_i}_{a,b},\, p^{t_i}_{a,c} \in P^{TB}_{TCG^{*}(t_i)} \text{ generated from } p_a \in P^{TB}_{TCG^{*}} \text{ such that}$$

(1) $p^{t_i}_{a,b} = p^{t_i}_{g}$
(2) $vertices(p^{t_i}_{a,c}) \subset vertices(p^{t_i}_{g})$
(3) for any vertex pair $\langle v_x, v_y \rangle \in p^{t_i}_{a,c}$ and $p^{t_i}_{g}$, $Order(v_x, v_y, p^{t_i}_{a,b}) = Order(v_x, v_y, p^{t_i}_{g})$

Consider any path $p_{a,c} \in P^{TB}_{TCG^{*}}$ whose instantiation on topology $t_i$ satisfies the above claims. On performing topology generation on such a path and extracting the necessary topology conditions, it is claimed that the set of topology instances defined by the necessary conditions will include $t_i$ for the following reasons
● If $p_{a,c} \in P^{TB}_{TCG^{*}}$ generates a path equivalent to a path $p^{t_i}_{g} \in P^{TB}_{TCG(P^{TB}_{GCFSM(t_i)})}$, then the former must, at every step, satisfy the node type and the relationship between the sending and receiving nodes. Thus the former path will include all existential static information about the topology, excluding the cardinality of nodes.
● If $p_{a,c} \in P^{TB}_{TCG^{*}}$ generates a path that is a subset of the path $p^{t_i}_{g} \in P^{TB}_{TCG(P^{TB}_{GCFSM(t_i)})}$, then the former path must, at every step of commonality, satisfy the node type and the relationship between the sending and receiving nodes.
Since this is true for all paths $p^{t_i}_{g} \in P^{TB}_{TCG(P^{TB}_{GCFSM(t_i)})}$, there will always be another path $p_{a,c} \in P^{TB}_{TCG^{*}}$ that covers the set of nodes not covered by the path $p_{a,c} \in P^{TB}_{TCG^{*}}$ under consideration.

However, it is impossible to determine which path $p_{a,c} \in P^{TB}_{TCG^{*}}$, when instantiated on topology $t_i$, generates the path $p^{t_i}_{g} \in P^{TB}_{TCG(P^{TB}_{GCFSM(t_i)})}$. To obtain the necessary conditions, topologies are generated for all paths $p_{a,c} \in P^{TB}_{TCG^{*}}$ that could potentially represent the target behavior in TCG*. To be able to do this, it has to be proved that the number of such paths is finite. While generating the topology independent TCG*, all possible states that the next node could be in and all ways in which the nodes can be topologically interconnected are considered. This can result in a path with loops in the TCG. Unbounded unrolling of loops can lead to infinite paths. Figure 22 shows a typical example.

Figure 22: Unrolling of loops in paths representing the target behavior in TCG*.

The path in TCG* representing the target behavior may have loops when the same type of message occurs more than once in the path. In this case the topology generation procedure unrolls the loop up to a point where no new information can be added. Unrolling of loops needs to be done only to a finite depth because
● The FSM state transition tables of all eCFSMs are finite
    ○ They have finitely many types of messages, state transitions and states
● For a given message destination, there are only finitely many topology equivalence classes.

Thus, after unrolling, a point is reached where the topology generation algorithm keeps reusing the existing topological elements, and hence the unrolling can be stopped. This proves that the number of paths over which the topology must be generated and necessary conditions extracted is finite.
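The stopping rule for loop unrolling can be sketched as a fixed-point iteration. The following Python sketch is only an illustration under stated assumptions: a path is split into a hypothetical `prefix`, `loop` and `suffix`, and `topology_of` is a stand-in for the topology generation procedure that returns the set of static topological elements of a path. None of these names come from COMPRESS, and `max_depth` is a safety bound added for the sketch.

```python
def unroll_until_stable(prefix, loop, suffix, topology_of, max_depth=16):
    """Unroll `loop` 0, 1, 2, ... times and stop once a deeper unrolling
    contributes no topological element beyond the previous depth."""
    paths = []
    previous = None
    for depth in range(max_depth + 1):
        path = prefix + loop * depth + suffix
        current = frozenset(topology_of(path))
        if previous is not None and current <= previous:
            # Deeper unrolling only reuses existing elements: stop here.
            break
        paths.append(path)
        previous = current
    return paths


# Hypothetical steps: (sender node type, receiver node type).
prefix = [("MN", "BS")]
loop = [("BS", "BS")]
suffix = [("BS", "DHCP")]
unrolled = unroll_until_stable(prefix, loop, suffix, topology_of=set)
print(len(unrolled))
```

In this toy case the second unrolling of the loop yields the same element set as the first, so only the paths with the loop taken zero times and once are kept. Termination in general relies on the finiteness argument given above; the sketch merely makes the subset test explicit.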
Thus far, it has been shown that
● All paths $P^{TB}_{GCFSM(t_i)}$ for a given topology $t_i$ can be generated from TCG*
● All paths $P^{TB}_{TCG(P^{TB}_{GCFSM(t_i)})}$ for a given topology $t_i$ can be generated from TCG*
● Topology conditions generated from a path $p_{a,c} \in P^{TB}_{TCG^{*}}$ which generates a path equivalent to a path $p^{t_i}_{g} \in P^{TB}_{TCG(P^{TB}_{GCFSM(t_i)})}$ will include the topology $t_i$
● There is only a finite number of paths that can represent the target behavior in a TCG

Together, these prove that the topology conditions generated by COMPRESS are necessary conditions that any topology on which the target behavior occurs satisfies.

6.2.5: Generalization for multiple eCFSM state transitions in target behavior

In the previous section it was proved that the topology conditions generated by COMPRESS for a specific definition of the target behavior are indeed the necessary topology conditions. In this section, the proof is generalized to include multiple eCFSM state transitions. To generate topology conditions for a target behavior with multiple eCFSM state transitions, the necessary topology conditions are generated for each pair of paths starting from the external event and ending at an eCFSM component of the target behavior. The union of the necessary conditions generated for each such pair represents the necessary topology conditions for the entire target behavior. This is a logical consequence of the structure of the TCG and De Morgan's laws. The completeness of the TCG remains the same, as the number of external events is not being changed. Since the completeness of the pruned TCG holds for a target behavior with a single eCFSM component, it also holds individually for each of the paths defined by each of the eCFSM components in this case. Since only static elements of the topology are considered, the necessary condition due to many paths in the TCG is simply the union of the topology conditions generated for each component of the target behavior.
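The way conditions are combined, an intersection over the candidate paths of one component followed by a union over the components of the target behavior, can be sketched in Python. This is an illustrative simplification and not the COMPRESS implementation: each topology condition is reduced to a flat set of required static elements (here, node type labels), and the function names are hypothetical.

```python
def necessary_for_component(path_conditions):
    """Elements required by every candidate path of one component.

    path_conditions -- non-empty list of sets, one per candidate path.
    """
    sets = [frozenset(c) for c in path_conditions]
    # Intersection: keep only what all candidate paths agree on.
    return frozenset.intersection(*sets)


def necessary_for_target(components):
    """Union of the per-component necessary conditions.

    components -- list of path_conditions lists, one per eCFSM component.
    """
    per_component = [necessary_for_component(pc) for pc in components]
    return frozenset().union(*per_component)


# Hypothetical conditions: sets of node types each path requires.
comp1 = [{"MN", "BS", "DHCP"}, {"MN", "BS", "DR"}]
comp2 = [{"MN", "DR"}]
print(sorted(necessary_for_target([comp1, comp2])))
```

Real topology conditions also carry connectivity relationships, so intersecting them involves matching nodes between topologies rather than plain set intersection; that matching is the source of the factorial terms in the complexity discussion.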
6.2.6: Generalization for multiple AEEs

In the previous section a target behavior with only a single external event was considered. In this section the methodology is generalized to accommodate multiple external events. For this generalization, the baseline using the modified GCFSM enumeration procedure is restated.

generateStateSpace($t_i$)
    ∀ GCFSM = combination of initial states of $fsm_i$ ∈ $t_i$ = $\{fsm\}_{1}^{n}$, [c]
(a)     ∀ types of $aee_x$ ∈ AEE including {∅}
            ∀ combinations of injecting $aee_x$ into GCFSM, GCFSM = {GCFSM, $aee_x$}
                ∀ permutations of triggering enabled transitions on GCFSM
                    enumerateOneStep(GCFSM)
                    goto (a)

With this new enumeration procedure, the set of all topologies on which the target behavior occurs is found.

$T_{TB}$ = ∅
for each topology $t_i$ ∈ $T_{valid}$
    ## Generate state space
    $G_{GCFSM(t_i)}$ = ∅
    for each initial state j of $t_i$
        for each node k used to inject AEE A into $t_{i,j}$
            $G_{GCFSM(t_i)}$ = $G_{GCFSM(t_i)}$ ∪ generateStateSpace($t_{i,j,k}$)
    for each graph $g_j$ ∈ $G_{GCFSM(t_i)}$
        allSEIVertex = TRUE
        for each vertex $v_x$ ∈ TB
            if $v_x$ ∉ $g_j$
                allSEIVertex = FALSE
        if allSEIVertex == TRUE
            $T_{TB}$ = $T_{TB}$ ∪ $t_i$
        allSEIVertex = TRUE

The above procedure considers all possible sequences and cardinalities of injection of external events to generate the enumerated GCFSM sequence space for each topology. COMPRESS handles this case using multiple TCGs, one for each type of AEE. The target behavior paths from the external event to the eCFSM component are searched across all TCGs. The requirement for the occurrence of a given component is the existence of a path in at least one of the AEE specific TCGs. To generate the necessary topology conditions for the occurrence of the target behavior, necessary conditions are generated for each ⟨AEE, eCFSM⟩ pair in the target behavior, and the union of the necessary topology conditions for all such pairs represents the overall necessary topology conditions. The completeness proof of the TCGs is the same as the completeness proof of the single AEE case.
Here, all possible TCGs have to be considered while generating the paths in the GCFSM representing the target behavior. The completeness proof of the pruned TCG is the same as before and needs to be shown for each and every path representing all components of the target behavior.

6.3: Complexity of COMPRESS

The overall complexity of COMPRESS is best understood through the complexity of its individual components. The complexity of COMPRESS depends on
● Complexity of generating the TCG from the external autonomous event
    ○ G(V, E), where V = (sum of all messages) * (topology equivalence classes) and E = state transitions
    ○ Worst case = V^2
● Complexity of finding all paths in the TCG from the external event to the components of the target behavior
    ○ Complexity of finding all paths from source to destination = (V^2)! in the worst case
● Complexity of finding and bounding loops
    ○ (# of paths with loops) * (topology generation per path of loop depth i, i+1) * (detecting that the topology generated from a path of loop depth i is a subset of the topology generated from a path of loop depth i+1)
● Complexity of generating the topology for a given path
    ○ n^3, where n is the number of state transitions in the path
● Complexity of taking the intersection of topologies generated by all paths from source to destination
    ○ (Number of paths) * (c - 1)!, where c is the number of nodes in the topologies generated by the paths
● Complexity of reconciling necessary conditions generated for each component of the target behavior
    ○ (# of components of target behavior) * (c - 1)!, where c is the number of nodes in the topology for that component of the target behavior

Though the worst case complexity of the various components of the topology generation algorithm looks hopelessly large, in reality various factors limit the complexity to a great extent.
Firstly, the worst case TCG rarely occurs, as it arises only when every message is acted upon by every state, thus generating a fully connected graph. Secondly, the TCG is rarely densely connected, thus yielding a fairly manageable number of paths from the root of the graph to the required destination. Thirdly, the uniqueness of nodes, combined with the type of node and connectivity, limits the complexity of finding the intersection of topology conditions between the various paths that represent a component of the target behavior, as well as of merging the necessary conditions generated for each component. This argument is further supported by the fact that, during topology generation, only new information, usually in the form of node type, is added.

6.4: Tightness of necessary conditions

Consider the problem of reconstructing all paths of depth d in the GCFSM state space from TCGs generated using only the topological connectivity of the eCFSMs and the FSM transition tables of the component eCFSMs. To obtain the exact GCFSM path from the TCG, the TCG construction methodology must keep track of as much information as an enumerated GCFSM sequence space construction methodology would, i.e., at every step, the state of the individual eCFSMs and their buffers must be tracked. Consider a TCG generation methodology where only the states of the eCFSMs are tracked, ignoring the messages in the buffers. If all possible paths of depth d in the enumerated GCFSM sequence space are to be reconstructed from the TCGs, the TCGs must be generated in a way that compensates for the loss of information resulting from ignoring the contents of the buffers. In the evolution of the enumerated GCFSM sequence space, the contents of the buffers of the eCFSMs result in the sequencing of eCFSM state transitions. Since this information is ignored, at each step the possibility that any message type/cardinality could have been in the buffer is explored.
To do this, at each step, all possible permutations and combinations of triggering eCFSM state transitions for that message are considered. This leads to the generation of TCGs that explore a larger set of possibilities, which also includes the TCG from which all paths in the GCFSM of depth d can be reconstructed. Consider another approach where only the messages in the buffers are precisely tracked while the states of the individual eCFSMs are ignored. The set of all TCGs generated would be able to generate all paths in the GCFSM by considering all possible states in which an eCFSM could be for every message received. The trade-off being made here is between computational complexity and the accuracy of the reachability answers obtained. As more details are relaxed while generating the TCG, fewer TCGs are generated, and hence the computational complexity is reduced. However, if such a TCG is used to generate GCFSM paths, it will generate far more paths, i.e., the ones that are present in the enumerated GCFSM sequence space as well as the ones that are not. Thus, more work needs to be done to reconstruct paths in the GCFSM, and among the reconstructed paths there is no way of knowing which ones are in the enumerated GCFSM sequence space and which ones are not. At one extreme, one can keep track of all information and obtain exact answers at the cost of extremely high computational complexity (which could border on undecidability in some cases). At the other extreme, one can keep track of only static information and obtain partial or incomplete results in exchange for decreased computational complexity. The following figure summarizes the trade-offs being made.

Figure 23: Tradeoff between quality of reachability based answers and complexity.

A fundamental consequence of relaxing the tracking of any information is the loss of the ability to precisely answer reachability questions.
In all three tradeoff quadrants in the above figure, the reachability questions are guaranteed to yield affirmative results if the answer is affirmative in the case where all information is tracked precisely. However, this is not true for negative results. As more information is relaxed, the quality of the results degrades.

6.5: Interpreting negative results

If COMPRESS generates topology conditions for any given target behavior, then the topologies included in that set will always contain the topologies on which the target behavior occurs. However, there can be target behaviors for which
• There exists no path in the TCG
• There exist one or more paths, but none generate topologies

The completeness proof says that if there exists a non-empty set of sufficient topology conditions for a target behavior to occur, then COMPRESS will produce topology conditions which define a superset of topologies that always includes the topologies defined by the sufficient conditions. If there exist no sufficient topology conditions, then COMPRESS may or may not detect this, as COMPRESS does not exhaustively search the sequencing space or cardinality in topologies. It is only when there are static or structural contradictions that COMPRESS is able to detect the non-occurrence. Therefore, the existence of multiple valid paths whose topological intersection is empty is not the same as the case where there exist no valid paths. In the former case, the conditions generated by COMPRESS represent the set of all valid topologies, whereas in the latter case no topology conditions can be generated, and hence the target behavior cannot occur under the conditions in which the TCG is built.

6.6: Chapter 6 summary

In this chapter it has been shown how the partial ordering enabled by transitive closure enables generation of necessary conditions.
The worst case computational complexity of the various steps of COMPRESS has been discussed, along with the reasons why the general case is not expected to be as bad as the worst case.

Chapter 7: Evaluation of COMPRESS and analysis of results

In this chapter, the strengths and weaknesses of the COMPRESS methodology are evaluated by generating necessary topology conditions for different target behaviors of protocols. Then, the necessary topology conditions generated by COMPRESS are used to (a) verify whether the behavior occurs by generating sufficient conditions and (b) evaluate the severity of the target behavior. To evaluate the effectiveness of the generation of necessary topology conditions by COMPRESS, fully specified topologies of small sizes are enumerated and, for each, the occurrence of the target behavior is checked by complete state space analysis. For each such enumerated set, the total number of topologies that satisfy (SAT) the necessary conditions (NC) and the total number of topologies on which the target behavior occurs (sufficient conditions (SC)) are compared with the set of all topologies. The smaller the set of topologies satisfying the necessary conditions compared to the set of all topologies, the bigger the set of topologies that can be pruned before each individual topology instance is fed to a complete state space enumeration algorithm. The greater the pruned topology space, the more effective COMPRESS is. Unlike sequence space exploration, where the number of states is a good metric for understanding complexity, the number of states cannot be used as a metric here, as COMPRESS does not explore the entire state space and there is nothing equivalent to topology generation and reconciliation in the other existing methodologies.
Further, unlike most other topology generation methodologies, which mostly generate sufficient topology and state space conditions for the occurrence of a behavior, COMPRESS generates necessary conditions, and hence a direct comparison of the complexity of the different approaches is not meaningful. The number of paths that are generated and transformed into topology conditions dominates the runtime complexity, as in most cases the runtime complexity of extracting the topology from paths was linear in the depth of the paths.

As a part of the case studies, three classes of protocols, namely the Multicast based Micro-Mobility Protocol, resource discovery protocols and client-server protocols, were initially selected. Topology dynamics were not modeled and topologies were assumed to be static for these case studies. In order to demonstrate the extensibility of COMPRESS, finite topology dynamics (single hop node mobility) are modeled and the effectiveness of COMPRESS is demonstrated using the Emergency Context Resolution with Internet Technologies (ECRIT) case study in the next chapter.
• Multicast based Micro-Mobility Protocol [helm04]: For this protocol, the behavior of the mobility detection and handover mechanisms is studied. These mechanisms are used to detect IP mobility based on handover information provided by the MAC layer in order to obtain a domain wide IP unicast/multicast address. Though this sounds very simple, there are certain behaviors caused by certain topology configurations that are neither intuitive nor obvious. COMPRESS was used to generate topologies for a number of behaviors expressed as target behaviors.
• Resource discovery protocols may be used to discover resources on the LAN using a broadcast request and a unicast response mechanism. Initially, a simple resource discovery mechanism is selected and an attempt is made to generate topologies for different target behaviors.
Then, an extended ZeroConf [stei05] like resource discovery protocol for resource discovery in an enterprise is considered. The ZeroConf protocol enables a host to bind to a unique link local address in the absence of DHCP, bind to a unique link local name, and announce and find services on other hosts on the same link local LAN. Very simple extensions to ZeroConf like protocols are assumed so that enterprise wide resource discovery can be performed. It is assumed that an enterprise wide multicast address and routing tree exist and that nodes announce and find resources using this multicast service. The protocols use a simple query response mechanism. A node initially queries for a resource and gets a response. The response typically consists of the name (enterprise scoped and selected in a distributed fashion) of the node on which the desired service is available. The querier then attempts to resolve the name to an IP address to which it can connect to use the resource.
• Client-server based protocols are extensively used by various infrastructure as well as application network protocols. There are several ways in which a client-server protocol can be designed. For example, the client and server can communicate over different ports or different TCP sessions for each round of message exchange. For the case study, three such categories were considered.
• Emergency Context Resolution with Internet Technologies (ECRIT) [ecri06] is an attempt by the Internet Engineering Task Force (IETF) to define a set of requirements and protocols to enable a 911 like emergency calling feature for IP based systems, mainly for VoIP. One of the explicitly stated goals is to "describe when these (protocols for location resolution) may be appropriate and how they may be used", as overlay routing can render the location information obtained based on IP addresses unsuitable for emergency call routing [ecri06].
The ECRIT protocols use a combination of the DHCP, DNS, LoST and HELD protocols to determine location information from the location information server (LIS).

COMPRESS was able to produce necessary topology conditions for most of the target behaviors. The complexity of producing sufficient conditions from the necessary conditions was very manageable. Table 4 lists the summary of results. The effectiveness of COMPRESS is measured in terms of the percentage of the topology space that can be eliminated based on the necessary topology conditions. The greater that number, the more effective COMPRESS is.

7.1: Case studies I: Static topologies

In the first set of case studies, no topology dynamics are modeled. All topologies are assumed to be static. Links are assumed to be lossless, as packet drops are not modeled. For the case studies in this section, recursive NATs are not modeled and a physical LAN segment is assumed to host only one private network. NATs are assumed to act as gateways for private networks.

Table 4: Summary of results of case studies.

Protocol Class | Protocol Instance | # of TB | TBs addressed by COMPRESS | Existence of sufficient conditions | Avg. effectiveness of COMPRESS
M&M | Single BS | 7 | 7 | 7 of 7 | 27.7
Resource Discovery | LAN | 2 | 2 | 1 of 1 | 77.7
Resource Discovery | WAN | 5 | 4 | 4 of 4 | 87.8
Client-Server | TCP (Same port, Same IP) | 1 | 1 | 1 of 1 | 98.5
Client-Server | TCP (Different port, Same IP) | 1 | 1 | 1 of 1 | 97.9
Client-Server | HTTP (Same port, Same IP) | 1 | 1 | 1 of 1 | 99.4
ECRIT | IP based | 5 | 5 | 4 of 4 | 47

7.1.1: Multicast based micro mobility protocol

The M&M protocol described in [helm04] operates at the same layer as the Dynamic Host Configuration Protocol (DHCP). Additionally, it also manages the subscription to its domain wide multicast group. The client discovers the DHCP servers using a link local broadcast address mapped to a MAC broadcast address. The DHCP servers also reply using the same address, but with the client's MAC ID embedded within the body of the message.
Since the protocol was not designed to handle private IP addresses, and consequently mobility between private and global IP addresses, only global IP addresses are considered in this model. At the MAC layer, the hub and base station act as packet forwarding vertices. The DHCP server, mobile node and designated router form the non-packet delivery vertices, and they can be either wired or wireless hosts. Apart from the obvious wired and wireless physical edges, virtual edges like the MAC broadcast edge and IP broadcast edges are modeled. Further, at the MAC layer the LAN can only have a single hub with zero or more base stations connected to it. There is assumed to be one designated router per LAN. Figure 24 shows the state transition table of the M&M protocol. The M&M client uses link local broadcast as the message destination address to communicate with the DHCP servers on the LAN, and a link local multicast address to communicate with the designated router on the LAN. A detailed description of how this topology model was developed is given in Section 9.6. The partial topology representation for broadcast message destination types is shown in Figure 15.
Figure 24: Client and server FSMs for mobility detection and handover protocol.
7.1.1.1: Target behaviors studied for M&M protocol
For the multicast based micro-mobility protocol, COMPRESS was used to generate partial topology conditions for the following target behaviors.
Target behavior | Description
1 | MN receives a Layer 2 handover (L2HO) signal and sends a join to its multicast group
2 | MN receives L2HO and receives an IP address from a new IP domain
3 | MN with a valid IP address receives L2HO and then receives an IP address from a new IP domain
4 | MN receives L2HO and receives more than one type of reply for its query for DHCP servers
5 | MN receives L2HO and detects a new domain but receives an address from the old domain
6 | MN receives an L2HO and detects no mobility but receives an address from a new domain
7 | MN receives L2HO and sends Join but receives data from a different domain
Table 5: Target behaviors for which topologies were generated for variants of M&M.
7.1.1.2: Results and data
The following table lists the necessary conditions that COMPRESS generated for the M&M protocol with and without timer suppression.
Target behavior | Necessary conditions generated by COMPRESS
1 | At least one DHCP server is in link local broadcast range of MN
2 | At least one DHCP server is in link local broadcast range of MN; domain of DHCP server is different from the domain of the initial IP address of the MN
3 | LAN on which there exists at least one DHCP server serving a domain different from the domain of the MN's original IP address
4 | LAN on which there exist at least two DHCP servers in the link local broadcast range of MN
5 | LAN on which there exist at least two DHCP servers in the link local broadcast range of MN, with at least one DHCP server serving a domain same as the domain of the initial IP address of the MN and another DHCP server serving a domain different from the domain of the initial IP address of the MN
6 | LAN on which there exists at least one DHCP server serving a domain different from the domain of the MN's original IP address
7 | At least one DHCP server and a DR are in link local broadcast range of MN; there exists a DR belonging to a domain different from the domain of the IP address of the MN
Table 6: Necessary conditions generated by COMPRESS for M&M.
From the topologies generated, it can be seen that certain behaviors look interesting and need to be evaluated before deciding whether the behavior is problematic or not. For example, target behavior 5, where the MN detects a new domain but receives an address from its old domain, sounds like incorrect behavior but in fact occurs as a matter of course on topologies where a single physical LAN segment serves more than one domain (Figure 25). The question that then becomes relevant is “does the behavior cause performance degradation”.
Since the necessary condition, namely “a single physical LAN segment serving more than one domain”, is known, it is up to the designer to figure out the distribution of topologies that satisfy this necessary condition and to evaluate the effect of such a topology configuration. Without the necessary topology conditions, the designer would not have a data point to look for in the topology distribution, and hence estimating the effect of the behavior on average performance would not be straightforward. Another behavior that looks interesting is target behavior 7. Here the MN from one domain is able to receive data meant for another MN from another domain. This is an unspecified condition and needs to be addressed explicitly in the design. Another insight obtained from this study came from target behaviors 2, 3 and 4. If the question is under specified, then the topology generation algorithm will generate more topologies, thus leading to an increase in the run time complexity.
Target behavior | # of topologies generated and reconciled | Number and type of elements to be added to convert necessary topology conditions to sufficient topology conditions
1 | 22 | Refine node types and add BS, select initial state of MN (5 of 5 choices)
2 | 12 | Refine node types and add BS, select initial state of MN (4 of 5 choices)
3 | 3 | Refine node types and add BS (initial state of MN specified in target behavior)
4 | 2 | Refine node types and add BS (initial state of MN specified in target behavior)
5 | 2 | Refine node types and add BS (initial state of MN specified in target behavior)
6 | 5 | Refine node types and add BS, select initial state of MN (3 of 5 choices)
7 | 22 | Refine node types and add BS, select initial state of MN (5 of 5 choices)
Table 7: Performance indicators of COMPRESS for M&M.
To evaluate the effectiveness of COMPRESS, small topologies with one and two DHCP servers and one DR were generated, including all configurations of the relationships between IP addresses that the M&M protocol embeds. Figure 26 and Figure 27 show the total number of topologies that were generated with one and two DHCP servers, including the number of topologies on which the target behavior occurs.
Figure 26: Effectiveness of COMPRESS for target behaviors 1,2,3 for M&M protocol. [Bar chart: # of topologies per target behavior; series: # topos (1 DHCP), topo. SAT NCs (1 DHCP), # topos (2 DHCP), topo. SAT NCs (2 DHCP).]
Figure 25: Pictorial representation of necessary topology conditions for TB 5 and 7.
From Figure 26 it is clear that target behaviors 1 through 3 occur on most topologies. This is as expected, since target behaviors 1 through 3 capture the routine expected behavior of the M&M protocol. Behaviors 4 through 7 represent slightly more complex behaviors, some of which occur only on topologies with two DHCP servers.
Figure 27: Effectiveness of COMPRESS for target behaviors 4,5,6,7 for M&M protocol. [Bar chart: # of topologies per target behavior; series: # topos (1 DHCP), topo. SAT NCs (1 DHCP), # topos (2 DHCP), topo. SAT NCs (2 DHCP).]
Figure 28: Effectiveness of COMPRESS as percentage of eliminated topologies. [Chart: percentage of topologies per target behavior; series: topo. !SAT NCs, topo. SAT NCs.]
Figure 28 shows the effectiveness of COMPRESS, measured in terms of the percentage of the topology space that can be eliminated, even for topologies with one and two DHCP servers. COMPRESS is most effective for target behavior 5, generating topology conditions that eliminated 80% of the enumerated topologies.
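The effectiveness measure used throughout this chapter can be written down directly. The following is a minimal sketch, assuming that effectiveness is the share of enumerated topologies that do not satisfy the necessary conditions; the function name and the example counts are illustrative and are not taken from Figures 26 to 28.

```python
def effectiveness(total_topologies: int, satisfying_topologies: int) -> float:
    """Percentage of the enumerated topology space that the necessary
    topology conditions eliminate (topologies NOT satisfying them)."""
    if total_topologies <= 0:
        raise ValueError("total_topologies must be positive")
    if not 0 <= satisfying_topologies <= total_topologies:
        raise ValueError("satisfying_topologies must lie in [0, total_topologies]")
    eliminated = total_topologies - satisfying_topologies
    return 100.0 * eliminated / total_topologies


if __name__ == "__main__":
    # Hypothetical counts: 10 enumerated topologies, 2 satisfy the conditions.
    print(effectiveness(10, 2))  # 80.0
```

A value of 0 means that every enumerated topology satisfies the necessary conditions, so nothing can be pruned; higher values mean a larger part of the space can be skipped during testing.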
7.1.2: Resource discovery protocols
The sole purpose of a resource discovery protocol, as its name suggests, is to enable a client to discover resources that run the server component of the discovery service. A typical setup consists of a communication channel to which servers listen. The client, knowing only the nature of the service required (for example, a printer service) and not knowing the address of the server, broadcasts its request for service over the communication channel. Any server that satisfies the service request has to respond to the request. Upon reception of multiple replies from servers, the client can choose the appropriate server to use. In this case study, it is assumed that the client picks the server whose response is received first. The client then attempts to use the address received in the reply to connect to the service. Various scenarios in this context are studied in this section.
7.1.2.1: Simple resource discovery protocol
In this case study, a specific combination of virtual private network and local routing configuration that is commonly seen in virtual private networks is modeled. It is assumed that a given node in the network has at most one VPN tunnel to the server open at any given time. A default routing scheme in which the default route is through the physical interface is assumed. The configuration of the topology is also assumed to be static. This captures a very commonly seen scenario where resource discovery works on the local LAN or on the remote LAN, but not both, when VPN is enabled. For this case study, the client and resource are IP end hosts and act as non-packet forwarding vertices. The VPN server as well as IP gateways act as IP packet forwarding vertices. The LAN at the MAC layer is modeled as a simple wired LAN, where the hub acts as the MAC packet forwarding vertex. Figure 29 shows the FSM tables of the client and server. All IP nodes in the network are assumed to have acquired an IP address.
Further, the query and reply mechanisms are assumed to use the link local broadcast address and the default interface, while the connect mechanism is assumed to use TCP. Since link local broadcast addresses are not propagated beyond the physical broadcast domain, no private networks or NATs are modeled. The partial topology for IP link local broadcast is represented by the broadcast domain through the default route. The partial topology for TCP is represented by (a) a TCP source connected to a TCP destination with an IP address different from the IP sub-domain of the source, through a VPN server, and by (b) a TCP source connected to the TCP destination on the local LAN with IP addresses belonging to the same IP sub-domain.
Figure 29: State machine of a simple resource discovery protocol.
Two simple target behaviors were studied, both related to the action of discovery followed by the ability or inability to connect to the service. Table 9 depicts the necessary topology conditions generated by COMPRESS. Since no paths were found to generate valid partial topologies for target behavior 1, this scenario does not occur in reality. This is one of the few cases where COMPRESS was able to generate a negative result. The necessary condition (Figure 30) generated for target behavior 2 was close to the sufficient conditions. To measure the effectiveness of COMPRESS, different topologies were generated, each with one, two and three LANs. As shown in Figure 31, the target behavior occurs only on topologies with more than two LANs. Using COMPRESS, nearly 80% of the topologies can be eliminated as uninteresting.
Table 8: Target behaviors studied for resource discovery protocol on LAN.
Table 9: Necessary topology conditions generated by COMPRESS for target behaviors.
TB | Description (LAN)
1 | A querier can discover a resource, but not connect to it
2 | A querier with global IP can connect to the resource, but not discover it
TB | Necessary conditions generated by COMPRESS (LAN)
1 | No paths in TCG generated topologies
2 | Client with Global IP address, Resource with Global IP address on a different LAN and SubDomain, VPN connection between Client and Resource LAN
Figure 30: Pictorial representation for necessary conditions for target behavior 2.
The number of paths for which topologies need to be generated was fairly low, as the protocol model is small and the number of possible partial topologies representing the link local broadcast is small. Using the augmented COMPRESS algorithm (A-COMPRESS), the number of paths evaluated was reduced by 33% (Table 10).
Figure 31: Effectiveness of COMPRESS for resource discovery on LAN. [Charts: # of topologies with up to 3 servers for 1, 2 and 3 LANs (series: # topos, topo. SAT NCs), and percentage of topologies (series: !SAT NCs, SAT NCs).]
Table 10: Number of paths evaluated for resource discovery on LAN.
LAN | # paths (COMPRESS) | # paths (A-COMPRESS)
TB 1 | 27 | 18
TB 2 | 27 | 18
7.1.2.2: Augmented zero-conf protocol
The augmented zero configuration protocol is used to discover resources in an enterprise wide setting. Querier and resource are non-packet delivery vertex types. All nodes in the network are assumed to have a valid IP address and are also assumed to have joined the enterprise wide multicast channel. Multicast routing itself is assumed to be correct. As in any normal enterprise network, both private and public IP addresses are assumed. Since private networks are modeled, in addition to a single default gateway per IP sub-domain, different types of NATs, including multicast pass through, blocking and DMZ NATs, are modeled as packet delivery vertices. IP routing as described in Sections 9.2 and 9.3, but for a single IP domain, is assumed. At the MAC layer, a simple hub based wired LAN is assumed.
All queries and replies are assumed to happen on the multicast channel, whereas the connection to use the service is assumed to use TCP. A simple multicast model is used where all designated routers (DRs) send packets to the single well known rendezvous point (RP) and receive packets from the same. For the sake of simplicity, the RP is assumed to exist on a separate LAN. Partial topologies for the domain wide multicast destination type and for TCP are used in this case study, as they are the only two message destination types used by the protocol. Table 11 depicts some of the target behaviors of interest.
Target behavior | Description
1 | A querier can see a resource, but not connect to it (see footnote 13)
2 | A querier with global IP address can see a resource, but not connect to it
3 | A querier with Local IP address can see a resource, but not connect to it
4 | A querier queries a resource, gets a response but on connecting, it connects to a different resource
5 | A querier with global IP address queries a resource, gets a response but on connecting, it connects to a different resource
Table 11: Target behavior used for topology generation for extended ZeroConf protocol.
Table 12 shows the necessary topology conditions generated for the various target behaviors. Two target behaviors were of interest here. Target behavior 4 was a consequence of a multicast enabled NAT exporting the local address space out of its scope (see Figure 32). It was fairly well known that NATs pose a big problem, as they break the assumptions that were made about IP addressing. The topologies generated represent a manifestation of the same. Though the ZeroConf specification briefly mentions that NATs may cause disruption in the operation of link local resource discovery, the nature of the manifestation was not intuitive, and COMPRESS helped discover the topologies in a systematic manner.
Target behavior 5 failed to produce any partial topologies for all paths in the TCG representing the target behavior, and hence target behavior 5 cannot occur on any topology.
Footnote 13: A message pattern where IP routing is not possible.
Target behavior | Necessary conditions generated by COMPRESS
1 | Client and its DR, Resource with local IP address behind a multicast pass through NAT and its DR, Domain.SubDomain of Client and Resource are not the same, RP
2 | Client with Global IP address and its DR, Resource with Local IP address behind NAT supporting Multicast pass through and its DR, RP
3 | Client with Local IP address behind a multicast enabled NAT and its DR, Resource with a Local IP address behind a different multicast enabled NAT and its corresponding DR, Domain/SubDomain/host of IP address of Client and Resource are different, RP
4 | Client with local IP address behind multicast enabled NAT, Resource with Local IP address behind a different multicast enabled NAT, Domain.SubDomain of Client and Resource are the same, there exists a unicast path from Client to a different node with the same IP address as the Resource
5 | No paths in the TCG generated topologies
Table 12: Necessary conditions generated by COMPRESS.
Figure 32: Pictorial representation of necessary conditions for TB 4.
Table 13: Number of paths evaluated to generate necessary topology conditions.
WAN | # paths (COMPRESS) | # paths (A-COMPRESS)
TB 1 | 157464 | 5808
TB 2 | 157464 | 2904
TB 3 | 157464 | 2904
TB 4 | 6928416 | 7744
TB 5 | 6928416 | 3872
To evaluate the effectiveness of COMPRESS, enterprise networks consisting of one and two public IP LAN segments were enumerated, taking into account the possible existence of private networks behind NATs. Unlike in the previous case studies, as shown in Figure 33, the number of fully specified topologies was almost an order of magnitude higher.
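The path counts reported in Tables 10 and 13 can be turned into a relative reduction achieved by A-COMPRESS over COMPRESS. The following is a small sketch of that arithmetic; the function name and dictionary layout are illustrative assumptions, while the counts are copied from the two tables.

```python
def reduction_percent(compress_paths: int, a_compress_paths: int) -> float:
    """Relative reduction (in %) in the number of paths evaluated when
    A-COMPRESS is used instead of COMPRESS."""
    if compress_paths <= 0:
        raise ValueError("compress_paths must be positive")
    return 100.0 * (compress_paths - a_compress_paths) / compress_paths


# (paths evaluated by COMPRESS, paths evaluated by A-COMPRESS)
PATH_COUNTS = {
    "LAN TB 1": (27, 18),           # Table 10
    "WAN TB 1": (157464, 5808),     # Table 13
    "WAN TB 4": (6928416, 7744),    # Table 13
}

if __name__ == "__main__":
    for label, (full, augmented) in PATH_COUNTS.items():
        print(f"{label}: {reduction_percent(full, augmented):.2f}% fewer paths")
```

For the LAN case the reduction is one third, consistent with the 33% quoted earlier; for the enterprise wide (WAN) case the reduction is much larger because many more candidate paths are contradicted early.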
Though the protocol FSM itself is simple, there are many partial topologies that represent how two non-forwarding eCFSMs using a multicast addressed packet can be connected to each other. For target behavior 5, since no condition was generated, all topologies are marked as satisfying the necessary topology condition. However, since no path actually exists in the TCG representing the target behavior, TB 5 does not occur on any topology. To generate small sized topologies, all topologies with two servers and one client spread across one and two public IP sub domains were explored. This represents a large enough space of topologies within the realm of realistic assumptions. For target behaviors 1 through 4, all topologies which satisfied the necessary conditions were also the topologies on which the corresponding target behaviors occurred. From Figure 34 it is clear that COMPRESS is effective in eliminating a very large portion of the topology space (up to 98% in the case of target behavior 4). Since the topology space itself is large even for topologies with two public IP local area networks, the effectiveness of COMPRESS is high.
Figure 33: Performance of COMPRESS for Enterprise wide resource discovery protocol. [Bar chart: # of topologies for TB 1 through TB 5 on 1 and 2 LANs; series: # topos, topo. SAT NCs.]
Multicast enabled NATs export the local name space outside its scope and cause name space clashes, especially when multiple LANs have the same private network IP sub domain address. The necessary conditions generated by COMPRESS represent a large class of topologies, especially if users do not change the default settings of NAT boxes. Assuming that private networks are restricted to leaf networks, the obvious fix is to filter out responses from private networks.
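The fix suggested above, filtering out responses from private networks, can be sketched as follows. This is only an illustration under stated assumptions: a response is modeled as a (resource name, advertised IP address) pair, the names and addresses are hypothetical, and "private" is approximated with the `is_private` property of Python's standard `ipaddress` module.

```python
import ipaddress
from typing import Iterable, List, Tuple

# Hypothetical response format: (resource name, advertised IP address).
Response = Tuple[str, str]


def filter_private_responses(responses: Iterable[Response]) -> List[Response]:
    """Keep only responses whose advertised address is not private, so that
    addresses leaked by multicast enabled NATs are never used to connect."""
    return [
        (name, addr)
        for name, addr in responses
        if not ipaddress.ip_address(addr).is_private
    ]


if __name__ == "__main__":
    # Two resources behind different NATs advertise the same private address,
    # which is the name space clash behind target behavior 4.
    responses = [
        ("printer-lan-a", "192.168.1.10"),
        ("printer-lan-b", "192.168.1.10"),
        ("printer-public", "128.125.1.20"),
    ]
    print(filter_private_responses(responses))
```

The filter is deliberately conservative: a querier that itself sits in the same private network as the resource would also lose that response, which is why the text restricts the fix to the case where private networks are leaf networks.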
7.1.3: Client server based protocols
Client server protocols based on transport or session layer abstractions are extremely common in today's networked systems. Typically, the client and server exchange messages based on states to achieve their objective. Though the Representational State Transfer (REST) [rich07] architecture, which makes the server completely stateless, is extensively used, there are applications that do not lend themselves easily to this architecture.
Figure 34: Effectiveness of COMPRESS for enterprise wide resource discovery protocol. [Chart: % of topologies for TB 1 through TB 5; series: topo. !SAT NCs, topo. SAT NCs.]
Many e-commerce servers are designed as stateful servers where the client and server go through a series of states to achieve their objective. A typical example is the checkout process in a web e-commerce application, where the client and server go through a series of states eventually leading to the client being presented with the total amount for the checked out items. A simple representative protocol is shown in Figure 35. This protocol has a special property, i.e., the sequence of interactions between the client and server is associated with its own unique number, and the reply to a query is given only if the unique number matches. If a request is out of sequence with a matching unique number, or if there is a mismatch in the unique number, the server responds with an order_reset. This property is used during topology generation to track the progression of the state of the server. Without this property being taken into account, COMPRESS generates trivial necessary topology conditions, namely the existence of a client and a server.
Figure 35: FSM of client server protocol exchanging a sequence of messages.
Three different versions of the protocol, two using the same or different TCP ports and another using HTTP, are studied in this section.
Client and server are the non-packet forwarding vertices at the TCP and HTTP layers respectively. For the HTTP case study, a transparent load balancing proxy acts as the message forwarding vertex. At the TCP layer, NATs and different types of load balancers act as message forwarding vertices. At the IP layer, a single default gateway per IP subdomain per LAN acts as the packet forwarding vertex. At the MAC layer, a hub based wired LAN is modeled. Partial topologies for TCP and HTTP message destination types are used in the TCP and HTTP based case studies respectively. The same target behavior is studied for all three versions of the protocol, i.e., the client receives the response to its first request but then gets an order_reset response from the server.
7.1.3.1: TCP: same port, same DST
In this section, every message exchange between the client and server happens on a different TCP connection but on the same port. The set of all NATs modeled includes all types of load balancers. The necessary topology conditions generated by COMPRESS are shown in Figure 36. From the necessary topology conditions generated, it is obvious why this condition is problematic. Most topologies do not use a load balancer that simply sprays TCP connections blindly to one or more back-end servers. For the next set of case studies, such a dumb load balancer was not included in the set of allowed eCFSMs.
Figure 36: Necessary topology conditions generated by COMPRESS.
One of the fundamental problems faced while modeling load balancers was that the number of states of the load balancer depended on the number of back-ends that the load balancer had. On a hunch, the simplest configuration was selected, i.e., a configuration with two back-ends, based on the observation that a protocol not designed to handle choice breaks even when the simplest choice is presented.
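The interaction between the stateful protocol and a connection spraying load balancer can be illustrated with a small simulation. The following is a sketch under simplifying assumptions, not the eCFSM model used by COMPRESS: the class names are hypothetical, the unique number is modeled as a string token, each request is assumed to travel on a new TCP connection, and the load balancer is assumed to pick back-ends in round-robin order.

```python
from itertools import cycle
from typing import Dict, List


class Server:
    """Stateful back-end: replies only if the request carries the step it
    expects next for that unique number, otherwise answers order_reset."""

    def __init__(self) -> None:
        self.expected: Dict[str, int] = {}  # unique number -> next expected step

    def handle(self, token: str, step: int) -> str:
        if step != self.expected.get(token, 0):
            self.expected.pop(token, None)  # forget the broken sequence
            return "order_reset"
        self.expected[token] = step + 1
        return "reply"


class SprayingLoadBalancer:
    """Dumb load balancer: every new connection goes to the next back-end."""

    def __init__(self, backends: List[Server]) -> None:
        self._next_backend = cycle(backends)

    def handle(self, token: str, step: int) -> str:
        return next(self._next_backend).handle(token, step)


def run_client(front_end: SprayingLoadBalancer, token: str, steps: int) -> List[str]:
    """Send `steps` requests in order, one per (assumed) new TCP connection."""
    return [front_end.handle(token, step) for step in range(steps)]


if __name__ == "__main__":
    one = SprayingLoadBalancer([Server()])
    two = SprayingLoadBalancer([Server(), Server()])
    print(run_client(one, "order-42", 2))  # ['reply', 'reply']
    print(run_client(two, "order-42", 2))  # ['reply', 'order_reset']
```

With a single back-end the sequence completes, while with two back-ends the second request reaches a server that never saw the first one, which reproduces the target behavior: a response to the first request followed by an order_reset.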
This does not guarantee the coverage of all possible topologies, but the set of all allowed load balancers can be kept to a finite number and good results can still be obtained. Figure 37 shows the high degree of effectiveness of COMPRESS, as this behavior does not occur on more than 98% of topology configurations.
Figure 37: Performance of COMPRESS for TCP based client server protocol. [Charts for TCP (Same SIP, Dst Port): # of topologies with up to 2 servers (series: # topos, topo. SAT NCs), and percentage of topologies (series: !SAT NCs, SAT NCs).]
7.1.3.2: TCP port spray
For this case study the protocol is assumed to send each request to a different port. The necessary topology conditions generated by COMPRESS are shown in Figure 38. For this case study, the effectiveness of COMPRESS was high, as more than 98% of the topologies generated were eliminated.
Figure 38: Pictorial representation of necessary topology conditions.
Figure 39: Performance of COMPRESS for TCP based client server protocol. [Charts for TCP (Same SIP, Diff Dst Port): # of topologies with up to 2 servers (series: # topos, topo. SAT NCs), and percentage of topologies (series: !SAT NCs, SAT NCs).]
7.1.3.3: HTTP
In this case study, the protocol modeled is assumed to communicate using HTTP. For this case study, HTTP transparent proxies and HTTP load balancing proxies were modeled. Similar issues, with the states of the load balancer depending on the number of gateway proxies, were faced, and the approach taken to solve this problem was the same as described earlier. COMPRESS was able to detect the interaction between HTTP load balancing proxies and the load balancer at the server side, resulting in failure of the protocol. The partial topology generated by COMPRESS is shown in Figure 40.
This particular scenario was independently discovered by this approach, and it was later verified that this was a widespread problem for users of the America Online (AOL) Internet service: AOL networks were configured with load balancing proxies, and when clients behind the proxies interacted with a source IP or an IP/port based load balancer, e-commerce applications broke. Since this topology is much more complex than in the previous case, the number of possible configurations is very large. However, the number of problematic topologies is extremely small, i.e., just 10 out of 743 topologies (Figure 41). Using COMPRESS, more than 99% of uninteresting topologies can be eliminated, with the caveat that without the protocol's properties described earlier, COMPRESS would generate very trivial conditions that cannot be used to eliminate any topologies.
Figure 40: Necessary topology conditions for HTTP based client server protocol.
Figure 41: Performance of COMPRESS for HTTP based client server protocol. [Charts for HTTP: # of topologies with up to 2 servers (series: # topos, topo. SAT NCs), and percentage of topologies (series: !SAT NCs, SAT NCs).]
Table 14 shows the number of paths that need to be evaluated to generate necessary topology conditions. The very high number of partial topologies representing all the ways in which a client and server could be connected leads to the very large number of paths. However, when the A-COMPRESS algorithm is used, early contradiction detection significantly reduces the runtime complexity. The fix to this problem depends on how the stakeholders in the system are distributed. For example, in the AOL case, the clients are the customers of the ISP. The e-commerce service providers may or may not be the customers of the ISP.
For the ISP, the obvious solution is to disable the transparent load balancing proxies, provided it serves their interest (required traffic characteristics, higher customer satisfaction etc.). For the e-commerce application providers, the obvious solution is to migrate end point load balancing to a higher layer construct like HTTP cookies. Though COMPRESS was able to generate necessary topology conditions for the target behavior to occur, evaluating the nature of the fix required may be much more complicated than in other cases.
Table 14: Number of paths evaluated to generate necessary topology conditions.
Client Server | # paths (COMPRESS) | # paths (A-COMPRESS)
TCP (Same port) | 50625 | 675
TCP (Different ports) | 50625 | 675
HTTP (Same port) | 1336336 | 3468
7.2: Insights into topology generation and COMPRESS
From the results of the various case studies, it is clear that the necessary conditions generated by COMPRESS are close to the sufficient conditions. This indicates that COMPRESS works well for the class of protocols under study. To understand why COMPRESS works even though it does not consider detailed sequencing of messages, the causes of occurrence of target behaviors need to be characterized. As shown in Figure 42, the occurrence of a target behavior depends on connectivity, eCFSM types, cardinalities of eCFSMs, relationships between the various fields of the eCFSMs, and finally protocol states. All these properties, other than protocol states, are called “Topology properties”. A given behavior of the protocol can depend on
• Topology properties (other than connectivity): The occurrence of a target behavior is exclusively dependent on one or more properties of the topology, i.e., on a given topology satisfying some topology properties, all sequences will result in the occurrence of the target behavior. No sequence of events on a topology not satisfying that property will be able to reproduce the same behavior.
◦ There may be target behaviors that occur on more than one topology configuration, and these configurations may or may not have common necessary conditions.
• State space interleaving: The occurrence of a target behavior is exclusively dependent on the interleaving of messages, and consequently the state changes, and will occur on any topology configuration as long as some sequence of state transitions occurs. Here node cardinality and connectivity may also play an important role, as they determine the sequencing of messages. For such target behaviors, topology generation will not play a crucial role, as the required sequencing may be producible even on trivially connected topologies.
• Combination of state space interleaving and topology property
◦ Topology property or state space interleaving: in such target behaviors, either some sequence of events that is possible due to interleaving independent of topology properties, or some topology property, leads to the same scenario. In the limited cases studied, this type of target behavior has not been observed. The approach of generating necessary conditions for such target behaviors does not itself yield any insightful answers, and hence COMPRESS will also not be able to generate any useful information.
◦ Topology property and state space interleaving: a majority of target behaviors fall into this category, as some topology conditions combined with a certain sequence of interleaving of messages will lead to the target behavior. Thus, topological elements with certain properties connected in certain ways may result in the target behavior.
In general, it is not possible to determine which one of the above causes a particular behavior until the various steps leading up to the behavior are analyzed in detail. However, it may not be possible to study some of the topology dependent behaviors if the wrong topology is used. This classification is used to explain why COMPRESS works and how it can fail under certain conditions.
Occurrence of a target behavior generally depends on sequencing as well as topological properties. In general, the sequencing of messages is dependent on topological properties like node cardinality, node type, state of the different nodes, relationships between the various fields of the nodes, and connectivity. Depending on the number of nodes in the topology and the protocol state machine, the number of possible sequences could be extremely high. Thus trying to search all regions at the same time can lead to high complexity. In general, the resource discovery protocols studied in this work fall into the “topology property” and “topology and state space interleaving” categories, as the occurrence of a target behavior is strongly dependent on some topology property as well as some particular interleaving possible over the topology satisfying that property. Generally, most end to end data communication protocols fall into the “state space interleaving” category, with some dependence on an abstract property like delay or jitter on the end to end path. A classic example would be the transmission control protocol (TCP). Introducing packet loss into resource discovery protocols will give rise to target behaviors that fall into the category “topology property or state space interleaving”.
Figure 42: Dependence of occurrence of target behavior.
As shown in Figure 43, COMPRESS performs the search only on certain topological aspects, namely “eCFSM type”, “relationships” and “connectivity”. The topology generation per path in the TCG will be able to enumerate some of the required “eCFSM type”, “relationships” and “connectivity” without exhaustively searching the space of possible sequences or cardinalities of nodes. Since COMPRESS does not exhaustively search the sequencing space resulting from the cardinality and state of the different nodes in the topology, COMPRESS is able to generate information regarding the type of node, the connectivity and the relationships between different fields of the nodes.
Further, if the protocol under study is agnostic to the cardinality of nodes of any particular type, this does not affect the quality of the conditions generated by COMPRESS.
Figure 43: Comparison between COMPRESS and full state space / topology space analysis.
7.3: Advantages and limitations of COMPRESS
Like every methodology, COMPRESS makes assumptions, both about the expressiveness of the formalism used to represent the protocols and about the topology, to enable generation of necessary topology conditions without solving the state space reachability problem. In this section, the advantages of COMPRESS are highlighted, followed by the limitations resulting from the assumptions.
7.3.1: Advantages of COMPRESS
One of the fundamental contributions of this methodology is enabling the protocol designer to think about topology in terms of its components, while the methodology performs the composition of the components to generate topology conditions that may represent complex composite topologies. Generally, topologies used to test protocols are explicitly input to the protocol testing procedure as a composite topology. Since there are many building blocks for the topology, and many ways to compose them, manually generating meaningful composite topologies is not an easy task. Automating the composition of topologies using a systematic enumerative approach results in very high complexity, as the number of compositions for a simple LAN topology with 4 nodes can run into millions (see Figure 3). COMPRESS provides a simple but systematic way for the designer to think about topologies. Instead of thinking about composite topologies, the designer needs to list the message destination types that the protocol under test uses. For each such message destination type, the protocol designer needs to supply a set of partial topologies that define how the node sending the message and the node receiving the message are connected.
The set of topology classes can be systematically obtained by tracing the dependency graph of a protocol at a given layer with the message destination types and protocols that are used at the lower layer. Once generated, this information can be reused across other tests. Further, such a library of topologies for a message destination type can be augmented and distributed in the same way as standard cell libraries are with VLSI design tools. Since the components are simpler, and topologies for message destination types are independent of the protocols that use them, thinking about topologies in terms of components rather than compositions gives a higher degree of clarity to the designer. Though the designer may come across new topology components, a component alone may not indicate how it affects the protocol under test, as it may require other components to cause the target behavior. In many scenarios, it is the composite topology that causes the target behavior of the protocol rather than any single component. Once the target behavior is specified, the COMPRESS methodology uses transitive closure on the states and messages of the protocol under test to constrain the ways in which composite topologies are generated. To the best of our knowledge, this is the first work that uses static analysis of a protocol's state space to generate composite topologies. Using several case studies, it is shown how COMPRESS can be used as an effective tool to generate necessary topology conditions that represent compositions of two or more components.
7.3.2: Limitations of COMPRESS
Many of the limitations of COMPRESS are a consequence of the models used and the assumptions made. Protocols are modeled using a simple eCFSM syntax and are assumed to have a finite number of states and messages.
A protocol whose states depend on the topology or environment in which it executes, with potentially infinite states, may also be represented as a compact finite state machine that includes integers as part of the state variable. For example, the state of a routing protocol includes at least the identity of every node it can reach within one hop. COMPRESS by assumption excludes all protocols that have unbounded integer variables embedded in the state. Though this excludes routing protocols, the case studies show that there are many other protocols that can be modeled within the finite state and message assumption. A networked system is often represented as a GCFSM, including the connectivity between individual eCFSMs. The connectivity itself can be binary, or more expressive to represent directionality, weighted edges, delay etc. Further, the connectivity can be static or dynamic (including probabilistic). In this study, only binary bidirectional connectivity is assumed. The COMPRESS methodology itself is agnostic to these assumptions, and by changing the topology generation and the partial topology representation for message destination types, some of the richer edge semantics can be incorporated. Again, handling real numbers as a property of edges will most likely be problematic, as it may yield a potentially infinite representation of partial topologies for message destination types.
Given a GCFSM state space, the behavior of protocols can be expressed in the following ways.
• Syntax of eCFSM, GCFSM states and state transitions reachable from the initial state
◦ Completely or partially specified state transitions of the eCFSM, defined by the consumed message, the state transition of the eCFSM along with the state of the buffer, and the output message
◦ Complete state of the GCFSM
▪ Including the state of the buffers
▪ Excluding the state of the buffers
◦ Partial state of the GCFSM, i.e., with the state of certain eCFSM components being “don't care”
▪ Including the state of the buffers of the components
▪ Excluding the state of the buffers of the components
• Existence of sequences of eCFSM and GCFSM states and state transitions defined by
◦ Partially or fully specified eCFSM/GCFSM states or state transitions
◦ Sequential relationship between the states or state transitions specified by the above
• Structure of GCFSM sequences
◦ Loops in the enumerated sequence space of a GCFSM may be considered as livelocks
◦ Deadlocks may be states in which not all components of the GCFSM have reached their final state, which is represented by the presence of a GCFSM in the required state in a leaf or terminating vertex of the enumerated sequence space
• Local aggregate behaviors can be expressed as a summation of the number of messages transmitted/consumed, or of the number of state transitions that a given state machine undergoes. A typical example of this type of behavior is the goodput of TCP.
• Global aggregate behaviors can be expressed as a combination of a set of local aggregate state behaviors and properties of the networked system. For example, the average throughput per node in the network requires the number of nodes in the network as well as the throughput of each node.
COMPRESS covers a small, but important, part of this vast range of expressiveness of state machines and behaviors. The COMPRESS methodology requires that the eCFSMs have a finite number of states and messages. Topology connectivity models only include binary connectivity without dynamics.
This model further limits the expressiveness of the target behaviors to behaviors that can be expressed as a sequence of FSM state transitions, as described earlier. If the protocol is a finite representation of an infinite state machine, then the TCG generated will not be finite for one of the following reasons:
• The TCG may contain infinite state transitions and/or messages
• A finite representation of the states may represent a potentially infinite set of topology sets (the number of neighbors in a routing protocol may be unbounded)
Artificially bounding the enumerated states that a compact representation can represent takes care of the finiteness of the TCG. This technique works in cases where the states of the protocol are not dependent on the number of eCFSM instances in the topology. This technique was used in the ECRIT case study to limit the number of handover states that a node can have, essentially limiting the number of handovers that a node can incur. However, if such an exercise is undertaken with a routing protocol, the limitations on the expressiveness of behaviors as a sequence of eCFSM state transitions do not allow one to express global behaviors like convergence, routing loops etc., which are the more interesting aspects of routing protocols. Further, since there are no limitations on the topology connectivity configuration, the enumerated state machine exhibits the worst case behavior, where every message reacts with every state. Since the TCG considers every state in which a message can cause a state transition, the necessary conditions generated for all target behaviors will likely be trivial. Behaviors that require GCFSMs cannot be expressed, as COMPRESS does not assume how many nodes are in the topology or how they are connected.
Though the approach of aggregating multiple local FSM states into a more complex state to increase the expressiveness of target behaviors looks promising, the aggregation requires considering all possible states for the complex state, which is usually a combination/permutation of the individual states that make up the complex state. Since state relaxation of the TCG requires considering all possible states of the aggregate state, this will result in the worst case property for the eCFSM, where every message reacts to every state, leading to trivial solutions while incurring high computational complexity. General aggregate behaviors over messages or states on a given FSM also cannot be expressed, as the COMPRESS methodology uses state relaxation to generate necessary topology conditions without effectively solving the reachability problem. General aggregate behaviors over global states are also not expressible, as COMPRESS does not consider global states or completely solve the reachability problem. The COMPRESS algorithm is fundamentally driven by the causal relationships between messages and state transitions. Consequently, COMPRESS is unable to capture non-occurrence properties. For example, a packet loss leads to the non-occurrence of an event, and it cannot be represented in the TCG unless the packet loss itself is represented as the reception of a lost packet which is appropriately interpreted by the protocol state machines. Further, the necessary topology conditions generation captures all occurrence properties and fails to capture negative conditions that must be satisfied by the topology. For example, the necessary topology conditions generation procedure is unable to generate a negative topology condition like “all topologies which do not have a DHCP server”. This limitation is demonstrated in the ECRIT case study.
However, if the non-occurrence property can be extracted from the eCFSM tables of the protocols, such static conditions can be seamlessly incorporated into the topology generation procedure. Though several limitations of the COMPRESS methodology have been listed in terms of the expressiveness of the formalism used to represent the protocol, as well as the target behavior, it has been shown through case studies that there is a large and interesting set of protocols to which COMPRESS is not only applicable, but for which it also generates verifiable topologies on which several interesting target behaviors occur.
7.4: Chapter 7 summary
In this chapter, protocols like M&M, resource discovery protocols and client server protocols are used to evaluate COMPRESS. Necessary topology conditions required for target behaviors to occur are generated using COMPRESS. Then, to evaluate the effectiveness of COMPRESS, small sized topologies are generated and the occurrence of the behavior is checked on each by full state space enumeration. From the case studies it is clear that the necessary topology conditions generated by COMPRESS are able to eliminate a large number of topologies on which the target behavior does not occur.
Chapter 8: Extensions of COMPRESS methodology
In this chapter, some of the techniques that can be used to work with the limitations of COMPRESS, while exploring topology dynamics, packet loss and delay semantics in the topology, are presented.
8.1: Case Study II: Topology dynamics
Though external events like topology dynamics, node crashes/reboots etc. are typically unbounded in nature, it is often enough to study the effects of only a finite set of such events, as protocols may exhibit erroneous behaviors even with a small number of such external events. Further, other factors may limit the exposure of the protocols to only a finite number of such external events.
For example, the mean time between reboots of a base station is significantly greater than the average time TCP sessions last. In such cases, only a finite number of such events will most likely suffice. One of the fundamental techniques that can be used to model such finite external events is to explicitly identify each external event and incorporate the event identifier into the states and messages of the protocols under consideration. Transforming such external events into an eCFSM state is a powerful mechanism, as it enables modeling limited topology dynamics including mobility (ECRIT). For example, the ECRIT case study uses this technique to study the effect of the mobility of a given node on the ECRIT protocols. In reality, an MN can incur an arbitrary number of handovers. For this case study, each handover is explicitly identified by an identifier and made a part of the eCFSM of the protocols of the MN as well as of all the non-packet delivery eCFSMs that the MN interacts with. The time between two consecutive external events is referred to as an epoch. All events and states resulting from the messages and states triggered by the external event at the start of the epoch are tagged as belonging to that epoch. Further, it is assumed that the messages and states of different epochs do not interact with each other. To keep the state machine finite, only a window in which the MN can incur a finite number of handovers is considered. The mobile node model is changed to have as many interfaces as the number of handovers modeled. Though the node has multiple interfaces, only one interface is assumed to be active at a given time. A simple example of finite mobility (two handovers) is shown in Figure 44. For this case study, two consecutive epochs triggered by one external event are modeled. With this small window, COMPRESS was able to generate scenarios indicating underspecification of the ECRIT protocols.
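The epoch tagging described above can be illustrated with a minimal sketch. It assumes a toy transition table keyed by (state, consumed message); the state and message names (`INIT`, `START`, `REQUEST`, `REPLY`) and the function `tag_with_epochs` are hypothetical and are not taken from the ECRIT state machines. The sketch only shows how folding an epoch identifier into states and messages keeps the machine finite and prevents interaction across epochs.

```python
def tag_with_epochs(transitions, num_epochs):
    """Return an epoch-tagged copy of a transition table.

    transitions maps (state, consumed_msg) -> (next_state, output_msg or None).
    Every state and message is paired with the epoch it belongs to, so a
    message tagged with one epoch can never match a state of another epoch.
    """
    tagged = {}
    for epoch in range(num_epochs):
        for (state, msg), (next_state, out_msg) in transitions.items():
            tagged_out = None if out_msg is None else (out_msg, epoch)
            tagged[((state, epoch), (msg, epoch))] = ((next_state, epoch), tagged_out)
    return tagged


# Hypothetical two-transition fragment, not the ECRIT state machine.
base = {
    ("INIT", "START"): ("WAIT_REPLY", "REQUEST"),
    ("WAIT_REPLY", "REPLY"): ("DONE", None),
}

# Two epochs, i.e. a window containing a single handover.
tagged = tag_with_epochs(base, num_epochs=2)
```

The tagged table grows linearly with the number of epochs modeled, which is why only a small window of external events is practical.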
Similar techniques can be used to handle simple topology dynamics like reboots and crashes. When no handover is modeled, a host, whether regular or virtualized, looks the same, i.e., like a host with a public IP address or like a host behind a NAT. However, when mobility is introduced, the end host model needs to be reconsidered. The model of the end host is shown in Figure 45. The interface that incurs mobility is different in the two cases. In the host model that is not virtualized, the physical interface of the host is also the interface that the ECRIT protocol uses. However, in the virtualized host, the physical interface is not the same as the interface used by the ECRIT protocol, as the virtualization layer connects the virtual interface to the physical interface through a NAT. Thus, the logical/virtual interface always sees a private IP. Further, when a hand-off occurs on the physical interface, there is no protocol defined that will relay the hand-off information to the virtual interface. In the network world, this is equivalent to detecting the mobility of an entire sub-network and relaying that information to all the members of that sub-network.
Figure 44: Modeling finite topology dynamics.
The virtualized and non-virtualized hosts were modeled as states, and the two behaved differently in response to a layer two handover event. For example, protocols running in a virtualized host would not see a handover event and hence would not react to it, whereas protocols in a non-virtualized host would see a handover event and react to it. The FSMs of the two are shown in Figure 46, including the number of external events captured as states. One of the unique challenges faced in modeling this was related to the essence of what the TCG captures, i.e., causal relationships between messages and state transitions.
It is assumed that when a host receives a handover event, all protocols using the interface at that time are notified and the event is appropriately handled. For example, if a handover event occurs when there is an active TCP session, that session needs to be abandoned and appropriately handled at the higher layers. Since the initiation of the connection to the LIS server and the L2HO are both external events, and the scenario under study is one of determining whether handover events affect the ECRIT protocol, the following two new constructs were used:
• On finishing a round trip with a server (DHCP, DNS, STUN, LIS), the client queries external nodes, which are assumed to be always reachable on the link local IP broadcast address, to ask whether it incurred a handover. One replies yes and the other replies no. To keep the changes to the protocol minimal, the state machine was changed only to check whether a handover happened after connecting to the LIS server.
• The above reply is chained with the steps of the protocol as shown in Figure 46, so that the client can either proceed with the next step of the protocol or abandon the steps and restart, based on the type of node (physical or virtual).
Figure 45: Physical host model with and without virtualization.
The topology model was extracted from the description of the ECRIT protocol's operational scenarios in [ecri06]. For this case study, the mobile node, the DHCP server, the DNS server, the STUN server and the LIS servers were modeled as end hosts at the IP, transport and HTTP layers. At the IP layer, a special network configuration was modeled that accounts for a device based LAN, a home based LAN, a private IP subnetwork hosted by the ISP, and a public network hosted by the ISP. Each had its corresponding gateway (single GW / LAN). A single transit vertex performed the function of inter domain routing in the global IP space.
Figure 46: FSMs of virtual and physical hosts tracking handovers as states.
Simple NATs as well as transparent HTTP proxies were modeled. A simple wired LAN was used for all but the home LAN, where a wireless LAN was modeled. Managed and unmanaged VPN servers were modeled such that only the mobile node could connect to the network through them. Virtual edges corresponding to IP link local broadcast, TCP, UDP, HTTP and HTTPS were modeled, including their corresponding partial topologies. The eCFSM state machine of the ECRIT protocol is shown in Figure 47. The goal of the protocol is to enable a host to determine its physical location based on its IP address. To do this, the host uses the DHCP and DNS protocols and tries to find the IP address of its location information server (LIS). Upon finding the address of the LIS server, it connects to the server and obtains the location information. The details of the protocol can be found in [ecri06]. The target behaviors studied are listed in Table 15.
Figure 47: ECRIT protocol state machine.
COMPRESS was modified to handle the scenario in which the client is mobile, i.e., it handles two handover events. In this case, the TCG generation procedure remains the same. The change occurs in the topology generation procedure for a given path. For example, when two handover epochs are modeled, the mobile node is assumed to have two interfaces, sequentially numbered, one for each handover event. Along a path, once the interface with the higher sequence number is used, the lower numbered interface cannot be used again. The MN itself connects to a single partial topology through the multiple interfaces, as shown in Figure 44. Another modification required for this case study was to explicitly add fields to messages, so that messages that passed through tunnels could be distinguished from the ones that did not. The fields added to the messages did not alter or modify the behavior of the protocols generating or receiving the messages. COMPRESS was able to generate necessary topology conditions as shown in Table 16.
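The interface ordering rule used during per-path topology generation can be sketched as a simple check. This is an illustrative sketch only: the function name `interface_order_is_valid` and the representation of a path as a list of interface sequence numbers are assumptions, not part of the COMPRESS implementation.

```python
def interface_order_is_valid(interfaces_used):
    """Check the ordering rule for MN interfaces along one path.

    interfaces_used lists, in path order, the sequence number of the MN
    interface used by each step. Once a higher numbered interface has been
    used, no lower numbered interface may appear again.
    """
    highest_seen = 0
    for iface in interfaces_used:
        if iface < highest_seen:
            return False  # a lower numbered interface was reused
        highest_seen = iface
    return True
```

For example, a path that uses interfaces `[1, 1, 2, 2]` respects the rule, while `[1, 2, 1]` does not, because interface 1 is used again after the handover to interface 2.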
Since the protocol specification explicitly addresses the issue of detection of tunnels, multiple tunnel configurations were modeled, including managed (corporate) and unmanaged (user setup) tunnels.
Table 15: Target behaviors studied for ECRIT protocol.
TB    Description
1     (static) Client initiates LIS discovery and connects to an LIS server
2     (static) Client initiates LIS discovery, but connects to STUN through VPN tunnel
3     (static) Client connects to LIS, but LIS server rejects HTTP connection
4     (static) Client connects to LIS, but LIS server rejects HTTPS connection
5     Client connects to an LIS server through HTTPS and its next connection to the same server gets rejected.
TB1 specifies a very generic scenario of the client connecting to the LIS server. This includes all possible configurations of the topology on which the target behavior occurs. As predicted earlier, this consumes relatively more computational resources and yields very weak necessary topology conditions. The number of paths evaluated by COMPRESS is shown in Table 17. From target behavior 1 it is clear that A-COMPRESS yields enormous gains in computational complexity. TB2 specifies a scenario in which the tunnel detection mechanism, as specified in the protocol specification, fails to detect the presence of a tunnel, leading the client to erroneously connect to the STUN server through the tunnel. This occurs in topologies where there is a user managed tunnel, which may not satisfy corporate compliance rules. TB3 is a scenario in which the client discovers the LIS server and, upon connecting to it, the server rejects the connection. This occurs in a scenario in which the client connects to an LIS server through an HTTP proxy whose domain is different from that of the LIS server. This scenario was in fact speculated in the ECRIT work group discussions, and the suggested fix was to use HTTPS instead of HTTP.
The TB4 scenario is the same as TB3, except for the fact that the client uses HTTPS instead of HTTP. In this case, COMPRESS was unable to find a path connecting the two, and hence this behavior is not expected to occur within the bounds of the models used in this case study. For TB5, the topology model was changed to incorporate topology dynamics as shown in Figure 44 and Figure 45. The state machines of the protocol were changed to incorporate the handover semantics as well as to chain the two HTTPS requests while incorporating the AEE occurrence between the HTTPS sessions. A very simple single tier access network model was considered for this particular target behavior, as this was sufficient to demonstrate the extension without undertaking extensive changes to the topology generation implementation used for other case studies. COMPRESS was able to generate the necessary topology conditions for TB5 as shown in Table 16.
Table 16: Necessary topology conditions generated by COMPRESS for ECRIT protocol.
TB    Description
1     (static) Client, at least one DHCP server and at least one LIS (weak necessary condition)
2     (static) Client, unmanaged bridge tunnel, at least one unmanaged DHCP server behind unmanaged NAT, at least one STUN server
3     (static) Client with HTTP proxy, LIS server whose domain is different from that of HTTP proxy
4     No paths generated valid topologies
5     (mobile) Virtualized client connected to a managed access network with its LIS server with global IP address before handover, and another managed access network with a different domain to which the MN is connected after handover
The main culprit in the TB5 scenario is the virtualized host: the handover information is not conveyed to the virtualized host, and hence the virtualized host reuses the cached LIS address even after handover. To evaluate the effectiveness of COMPRESS, topologies were generated to include VPN tunnels, managed/unmanaged DHCP/DNS servers, LIS, STUN servers and HTTP proxies.
This case study reveals one of the weaknesses of partial topology representation that was predicted earlier. From Figure 48 and Figure 49, it is clear that the necessary conditions generated by COMPRESS for TB1 and TB3 are loose, as the number of topologies on which the TB occurs is significantly less than the number of topologies which satisfy the necessary topology conditions. This is due to the fact that the target behavior occurs on topologies with disparate structure and components. However, the effectiveness of COMPRESS for TB2 is high, as the necessary conditions also represent sufficient conditions and a large number of uninteresting topologies can be eliminated.
Figure 48: Topologies satisfying necessary and sufficient conditions (ECRIT). [Bar chart "ECRIT (Static topologies)": number of topologies (# topos) for TB1 to TB4; series: topologies satisfying NCs, topologies satisfying SCs.]
Table 17 shows the number of paths evaluated to generate necessary topology conditions. A-COMPRESS yields enormous gains for TB1 and TB2. There are no gains for TB3 and TB4, as both scenarios required exploration of only two levels in the TCG tree, where both COMPRESS and A-COMPRESS examine the same number of paths. It is only after three levels that A-COMPRESS starts showing gains compared to COMPRESS. A-COMPRESS for TB5 was not implemented, as TB5 was a scenario used to demonstrate the ability to extend COMPRESS to incorporate finite topology dynamics, and COMPRESS was good enough to demonstrate that capability.
Table 17: Number of paths evaluated to generate necessary topology conditions for ECRIT.
TB    COMPRESS       A-COMPRESS
1     > 5 Million    5216
2     1542           372
3     576            576*
4     64             64*
5     4098           -
Figure 49: Effectiveness of COMPRESS for ECRIT case study. [Bar chart "Effectiveness of COMPRESS ECRIT": percentage (0% to 100%) for TB1 to TB4; series: topologies satisfying NCs, # topos.]
8.2: Handling packet loss semantics in COMPRESS
One way of modeling packet loss in COMPRESS is to model lossy and lossless links as shown in Figure 50. However, it is important to understand the semantics of such a model. Since the source or the destination or both could be types, a lossy link represents the loss of all packets of a given type between the source and destination node types. Modeling selective losses is not possible with this model. Further, one of the consequences of packet loss is that the eCFSM will not change its state. The TCG is generated based on causal relationships of messages and state transitions that occur. Since the consequence of packet loss is the “non-occurrence” of a state transition, it needs to be expressed in the semantics required by the TCG, i.e., as an occurrence event with consequences expressed in terms of messages and state transitions. This requires modifying the protocol eCFSM state machine in a way that consistently expresses the packet loss semantics while retaining all the properties of the original eCFSM. Even if this is done, it is not very clear whether such an exercise is useful, as the states of the eCFSM are relaxed in the COMPRESS methodology. When a message in the TCG sees a given state, it is then not obvious whether the state occurred due to the relaxation procedure that COMPRESS uses or due to packet loss. In general, the COMPRESS methodology is unsuitable for modeling scenarios with packet loss.
8.3: Handling Delay Semantics in COMPRESS
In this section, the system model is extended to model the different delays that a message may encounter on its end-to-end path between two non-packet forwarding eCFSMs. The system delay model is shown in Figure 51. It consists of the following:
• A set of communicating FSMs with unbounded (input/output) buffers are assumed to be connected to a communication medium that may also have internal buffers.
• Enabled state transitions can take time (d_fr) to fire (message consumption delay).
• A state transition may take time (d_g) to generate a message (message generation delay).
• Output buffers may take time (d_fl) to be flushed to the communication medium (message flushing delay).
• A communication medium of a given type is associated with a maximum transmission rate and a minimum propagation delay (D_trans, D_prop).
• Messages are associated with a minimum length (msg_len).
• The total delay to send a message through a communication medium consists of transmission, propagation and queuing delays.
◦ Min transmission delay = min packet length / max transmission rate
◦ Min propagation delay = (static property, externally defined)
◦ Queuing delay (d_link_queue) is a function of the number of internal buffers and their corresponding occupancy as seen by the packet passing through the internal buffers
Figure 50: Modeling loss and lossy links.
Delays can be classified into the following types:
• Static delay (indicated by “D”)
◦ Is invariant and depends on static properties.
• Dynamic delay (indicated by “d”)
◦ Delay due to congestion and queuing. E.g., the queuing delay is a function of the occupancy of the queue, which itself depends on the number of nodes injecting packets into the buffer and the rate of injection at each node.
The following properties hold true for static and dynamic delays:
• Lower bounds on static delays may be calculated when all components of the delay equation are known. When static delays are stacked, and the stack size is unknown due to the partial representation of the topology, the lower bound may still be calculated using a stack size of one.
• Calculating bounds on dynamic delay components depends on the topology as well as on the traffic generated by the nodes in the topology. The trivial lower bound on dynamic delays is zero. Non-trivial upper bounds may be calculated under special circumstances.
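The static lower bound described above can be worked through with a short sketch. The function names, the choice of bits and seconds as units, and the example numbers are assumptions made for illustration; only the relations "min transmission delay = min packet length / max transmission rate" and "use a stack size of one when the stack size is unknown" come from the text.

```python
def min_static_link_delay(min_msg_len_bits, max_rate_bps, min_prop_delay_s):
    """Lower bound on the static delay of one link (D_trans + D_prop)."""
    if max_rate_bps <= 0:
        raise ValueError("maximum transmission rate must be positive")
    d_trans = min_msg_len_bits / max_rate_bps  # minimum transmission delay
    return d_trans + min_prop_delay_s


def static_delay_lower_bound(link_delay_s, stack_size=None):
    """Lower bound for stacked links of the same type.

    When the stack size is unknown (partial topology), a stack size of one
    is used, which still yields a valid lower bound.
    """
    effective_stack = 1 if stack_size is None else stack_size
    return effective_stack * link_delay_s


# Hypothetical numbers: 512-bit minimum message, 1 Mbit/s link, 1 ms propagation.
per_link = min_static_link_delay(512, 1_000_000, 0.001)
unknown_stack_bound = static_delay_lower_bound(per_link)
known_stack_bound = static_delay_lower_bound(per_link, stack_size=3)
```

With these assumed numbers the per-link bound is 0.512 ms of transmission delay plus 1 ms of propagation delay; dynamic components contribute only their trivial lower bound of zero and are therefore omitted.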
In general, the detailed end-to-end delay can be mapped to an abstract model as follows (see Figure 51):
• Post state transition delay (d_postTx)
◦ Message generation delay + output queuing delay + flushing delay
◦ Delay type is dynamic.
◦ Lower bound is zero.
◦ Upper bound can be arbitrarily large to allow for all possible sequencing.
• Delay due to topology component
◦ Transmission + propagation delays (D_trans + D_prop)
▪ Delay type is static.
▪ Lower bound can be calculated using static properties of the link as well as the static properties of packet length.
◦ Queuing delay (d_link_queue)
▪ Delay type is dynamic.
▪ Lower bound is zero.
▪ Upper bound can be calculated with precise information about topology, nodes and traffic generation properties at each node. For partial topologies, determining upper bounds is infeasible.
• Pre state transition delay (d_preTx)
◦ Input queuing delay
▪ Delay type is dynamic.
▪ Lower bound is zero.
▪ Upper bound can be calculated with precise information about topology, nodes and traffic generation properties at each node. For partial topologies, determining upper bounds is infeasible.
◦ Message consumption delay
▪ Delay type is dynamic.
▪ Lower bound is zero.
▪ Upper bound can be arbitrarily large to allow for all possible sequencing.
Figure 51: Abstract delay semantics modeled in COMPRESS.
Proof that the lower bound calculated by COMPRESS is always less than or equal to the lower bound for the target behavior to occur on any topology. In what follows, $\lfloor x \rfloor$ denotes a lower bound on $x$.
Let $tb$ be the target behavior.
Let $t_x \in T_{valid}$ represent an instance of a fully specified topology.
Let $P_{tb}$ represent the set of all possible evolutions of the system $GFSM_{t_x}$ ending with $tb$.
Let $tcg_{p_i}$ represent the transitive graph extracted from path $p_i \in P_{tb}$.
Let $p_j^{tcg_{p_i}}$ represent the path in $tcg_{p_i}$ representing $tb$.
The total delay components of path $p_j^{tcg_{p_i}}$ can be represented as follows:
$$\mathit{totaldelay}_{p_j^{tcg_{p_i}}} = \sum_{a=0}^{p} \left\{ d_a^{msggen} + d_a^{outputqueue} + d_a^{flushing} \right\} + \sum_{b=0}^{q} \left\{ D_b^{trans} + D_b^{prop} + d_b^{linkqueue} \right\} + \sum_{c=0}^{r} \left\{ d_c^{inputqueue} + d_c^{msgconsump} \right\}$$
where $p$, $q$, $r$ are the number of post transition, link and pre transition events in path $p_j^{tcg_{p_i}}$.
Let $D_{p_j^{tcg_{p_i}}}$ be the static component of $\mathit{totaldelay}_{p_j^{tcg_{p_i}}}$:
$$D_{p_j^{tcg_{p_i}}} = \sum_{b=0}^{q} \left\{ D_b^{trans} + D_b^{prop} \right\} \le \mathit{totaldelay}_{p_j^{tcg_{p_i}}}$$
Let $TCG$ represent a set of topology independent transitive closure graphs.
Let $P_{tb}^{TCG}$ represent the set of all paths representing the target behavior in $TCG$.
From the completeness proof, there will always exist a path $p_k^{TCG} \in P_{tb}^{TCG}$ such that it represents a subset of the state transitions in $p_j^{tcg_{p_i}}$:
$$\mathit{totaldelay}_{p_k^{TCG}} = \sum_{a=0}^{p'} \left\{ d_a^{postTx} \right\} + \sum_{b=0}^{q'} \left\{ D_b^{trans} + D_b^{prop} + d_b^{linkqueue} \right\} + \sum_{c=0}^{r'} \left\{ d_c^{preTx} \right\}$$
where $p'$, $q'$, $r'$ are the number of post transition, link and pre transition events in path $p_k^{TCG}$.
Since $TCG$ is topology independent, dynamic delay components cannot be calculated, and hence
$$\lfloor \mathit{totaldelay}_{p_k^{TCG}} \rfloor \ge 0 + \sum_{b=0}^{q'} \left\{ D_b^{trans} + D_b^{prop} + 0 \right\} + 0 \ge D_{p_k^{TCG}}$$
Since $q \ge q'$,
$$\sum_{b=0}^{q} \left\{ D_b^{trans} + D_b^{prop} \right\} \ge \sum_{b=0}^{q'} \left\{ D_b^{trans} + D_b^{prop} \right\}$$
$$\lfloor D_{p_j^{tcg_{p_i}}} \rfloor \ge D_{p_k^{TCG}}$$
$$\lfloor \mathit{totaldelay}_{p_j^{tcg_{p_i}}} \rfloor \ge D_{p_k^{TCG}}$$
The lower bound delay $D_{tb \in TCG}$ for $tb$ to occur on any topology is calculated as follows:
$$D_{tb \in TCG} = \lfloor D_{P^{TCG}} \rfloor$$
$$D_{tb \in TCG} \le \lfloor \mathit{totaldelay}_{p_j^{tcg_{p_i}}} \rfloor \quad \forall\, i, j$$
Though the lower bound on static delays on all topologies can be calculated, this is not a very useful metric of the topology. Protocols which employ static or dynamic timers often do so to handle the dynamic aspects of the topology, i.e., packet loss, node crashes and delays caused by congestion. Though this extension is mathematically valid, its practical applicability is rather limited.
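The bound in the proof above can be illustrated numerically with a small sketch. It assumes that the lower bound over the set of TCG paths is taken as the minimum static delay over those paths, which is one reading of the definition of the lower bound delay and not something the text states explicitly. All names and delay values are hypothetical; each path is represented as a list of (D_trans, D_prop) pairs, one per link event.

```python
def static_delay(path):
    """Sum of D_trans + D_prop over the link events of one path."""
    return sum(d_trans + d_prop for d_trans, d_prop in path)


def tcg_lower_bound(tcg_paths):
    """Assumed reading: smallest static delay over all TCG paths for tb."""
    return min(static_delay(path) for path in tcg_paths)


def total_delay(path, dynamic_delays):
    """Static delay plus the (non-negative) dynamic delay components."""
    return static_delay(path) + sum(dynamic_delays)


# Hypothetical topology-independent TCG paths representing the target behavior.
tcg_paths = [
    [(0.5, 1.0)],
    [(0.5, 1.0), (0.25, 2.0)],
]
bound = tcg_lower_bound(tcg_paths)

# A hypothetical concrete path on a fully specified topology: it contains the
# link events of the first TCG path (q >= q') and adds queuing delays.
concrete_path = [(0.5, 1.0), (0.25, 2.0), (0.5, 1.0)]
concrete_total = total_delay(concrete_path, dynamic_delays=[0.0, 0.3])
```

Because every dynamic component is non-negative and the concrete path has at least as many link events as the matching TCG path, `concrete_total` cannot fall below `bound`, which mirrors the final inequality of the proof.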
8.4: Chapter 8 summary

In this chapter, several extensions to COMPRESS are discussed. Through the ECRIT case study, the usefulness of the extension to handle limited mobility is demonstrated. Though other extensions to handle packet loss and delay are modeled, it is shown why such extensions are not very useful. These extensions perhaps indicate the boundary of the capabilities of COMPRESS as a framework, as presented in this work.

Chapter 9: Guidelines for using COMPRESS

COMPRESS is a methodology that enables the protocol designer to generate necessary topology conditions that any topology on which the target behavior occurs must satisfy. The designer must clearly understand the syntactic structures used in COMPRESS, including their semantics, to be able to use COMPRESS effectively. As shown in Figure 18, the COMPRESS algorithm requires (a) a topology model, (b) a protocol model and (c) the target behavior as inputs. Using these inputs, COMPRESS generates necessary topology conditions for the target behavior to occur. This chapter gives guidelines to the users of COMPRESS regarding modeling of real world networks in the language required by COMPRESS and dealing with the constraints imposed both by the language and by the COMPRESS methodology. This chapter does not attempt to suggest which aspects of the network topology the designer needs to model. The selection of the level of detail of the network model to test a given protocol is left to the discretion of the user.

This chapter is organized as follows:

• Section 9.1 describes the basic building blocks of a fully specified topology, namely vertices, fields and edges, followed by the packet delivery rules and constraints that need to be imposed on fully specified topologies. The goal of this section is to introduce the user to the nuances of modeling topologies as a graph.
• In addition to the generic constraints described in Section 9.1.5, Section 9.2 focuses on additional constraints that the topology model needs to satisfy so that it can be used with COMPRESS.
• Section 9.3 describes how protocols can be modeled as eCFSMs for use in COMPRESS, with emphasis on the limitations imposed by COMPRESS on their syntax and semantics.
• Section 9.4 describes the concept of partial topology with examples.
• Section 9.5 gives a few initial points in the network protocol's eCFSM specification where the designer can start specifying the target behavior.

The goal of this chapter is to enable the user to build a simple model that can be used as input to the COMPRESS methodology.

9.1: Topology model

As described in Section 3.1, a network topology is modeled as a graph G(V, E), where V represents the nodes of the network and E represents the physical and virtual edges that may exist in the network. Additionally, each vertex has fields such as type and addresses. To ensure that a consistent network topology can be constructed, different types of local and global constraints are imposed on the neighborhood of each vertex (instance/type) as well as on the global topology instance. Rules need to be specified for packet delivery at all the layers of the network that the model encompasses. Using the above, partial topologies for message destination types must be constructed. In this section, each aspect of the topology model is described with emphasis on how real world networks can be mapped to the syntactic structures required for COMPRESS. Knowledge of language theory, especially propositional and first order logic, greatly helps while reading this section.

9.1.1: Vertices

The concept of vertices is straightforward, as they are almost analogous to nodes in the network. For example, LANs, switches, base stations, end nodes, routers and hubs are represented as vertices. In the M&M case study, IP end nodes, base stations and LANs are modeled as vertices.
To accommodate the single port constraint imposed by eCFSMs, each vertex is assumed to have a single network interface. Nodes in real networks that have multiple interfaces are modeled as a combination of vertices with single interfaces, as shown in Figure 52.

Figure 52: BS with two interfaces modeled as two vertices with single interface.

9.1.2: Fields

Fields in a vertex represent variables that can hold the type, address or any other relevant information about the vertex. For example, the type field determines whether the vertex is a hub, a base station (wired or wireless) or any other type of node. The type field is assumed to be mapped to a particular value in both partial and fully specified topologies. The address fields are used to hold the different types of addresses of the node. For example, an IP end host will have a MAC address as well as IP unicast, broadcast (link local and sub-domain) and multicast addresses. In fully specified topologies, the fields are mapped to a particular value, i.e., IP unicast → 192.168.0.1, IP broadcast → 192.168.0.255, link-local-broadcast → 255.255.255.255 etc. The use of these fields in a partial topology is different from the way they are used in a fully specified topology and is addressed in Section 9.4. A given vertex may have such fields for each layer of the network protocol stack. For example, the HTTP proxy used in the case study described in Section 7.1.3.3 (HTTP) specified fields to capture the addresses from the MAC layer to the transport layer.

The COMPRESS methodology assumes that fields are static. Changes to the value of a field must be restricted to a finite number of changes, as the changes are mapped to an enumerated set of static fields. For example, in the M&M case study, the IP address of the mobile node is assumed to change once during the lifetime of the study. To accommodate this change, the IP address field is split into two static fields, namely, old IP address and new IP address.
Further, the protocol eCFSM specification is modified to appropriately use the old and the new address fields.

9.1.3: Edges

The concept of an edge is slightly more complicated than that of a vertex. In fully specified instances of topologies, only physical edges, or edges at some lowest level of abstraction, are used. For example, in the M&M case study, an instance of a topology can only have physical wired or physical wireless edges. However, to represent communication between two vertices, virtual edges are used. A virtual edge exists between two vertices which communicate with each other using some message destination type. For example, if two IP end nodes connected to the same LAN communicate via a message destined to an IP link local broadcast address type, then a virtual edge of type IP link local broadcast exists between the two vertices. At each layer of the network protocol stack, virtual edges analogous to the message destination types supported at that layer are assumed to exist (see Figure 5). As shown in Figure 53, consider a four node topology instance consisting of three IP end hosts all connected to a LAN. Let IP end host 1 communicate with IP end host 2 using a message destined to IP link local broadcast. This communication is represented as an IP link local broadcast virtual edge between host 1 and host 2 in the graph.

Figure 53: Physical and virtual edges in an instance of topology.

For the M&M case study, edge types like IP link-local broadcast and IP multicast are modeled, as these are the communication abstractions used by the protocol. Similarly, for the TCP case study, the TCP edge is modeled.

9.1.4: Packet delivery rules

Packet delivery rules are an integral part of a topology model. Given a fully specified topology, packet delivery rules determine how a packet generated by an end host is delivered to its destination.
To enable packet delivery, the following needs to be specified: (a) how a message destination type at the higher layers is mapped to the message destination types supported by the lower layers (see Figure 5), and (b) once a packet is generated, which vertex can receive or forward that packet at every step until the destination is reached. In short, at the MAC layer, packet forwarding rules determine how a packet destined to a MAC unicast/broadcast/multicast address is delivered. At the IP layer, they are equivalent to how an IP end host, gateway or router determines the next hop for each supported message destination type. Such rules must be specified for each message destination type supported at each layer of the network.

Figure 54: Forwarding of unicast IP packet at IP layer.

In the topology models used in the case studies, at the IP layer, the IP end host always delivers the IP unicast packet to an IP gateway. The IP gateway may deliver the packet to an IP end host if it is on the same IP subnetwork, or else forward it to a single vertex that does interdomain routing, as shown in Figure 54. This is a simple transit stub topology where the entire transit network is represented by a single router. At the MAC layer, all MAC end hosts deliver packets destined to all MAC message destination types to either a LAN or a base station. The LAN may only deliver packets to base stations or MAC end hosts. The base station may deliver packets to either a LAN or a MAC end host. While modeling the packet delivery rules, it is best to start with the highest layer of the network and work downwards. Once the packet delivery rules are formulated, their consistency must be checked using the path constraints described in Section 9.1.5 and the additional constraints specified in Section 9.2. The packet delivery rules for each message destination type are encoded as an eCFSM state transition table, whose characteristics are discussed in detail in Section 9.3.
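The delivery rules described above can be written down as small lookup structures, shown here as a minimal Python sketch. The vertex type names, the `ip_unicast_path` and `mac_hop_allowed` helpers and the subnet labels are hypothetical and do not come from COMPRESS. The hop from the interdomain router to the destination gateway is an assumption implied by the transit stub picture rather than stated in the text.

```python
from typing import Dict, List, Set


def ip_unicast_path(src_subnet: str, dst_subnet: str) -> List[str]:
    """Sequence of vertex types an IP unicast packet passes through.

    The end host always hands the packet to its gateway. The gateway
    delivers locally when both hosts share a subnet; otherwise the packet
    goes through the single interdomain router (assumed to hand it to the
    destination gateway).
    """
    path = ["ip_end_host", "ip_gateway"]
    if src_subnet != dst_subnet:
        path += ["interdomain_router", "ip_gateway"]
    path.append("ip_end_host")
    return path


# MAC layer: which vertex types each vertex type may deliver a packet to.
MAC_NEXT_HOP_TYPES: Dict[str, Set[str]] = {
    "mac_end_host": {"lan", "base_station"},
    "lan": {"base_station", "mac_end_host"},
    "base_station": {"lan", "mac_end_host"},
}


def mac_hop_allowed(sender_type: str, receiver_type: str) -> bool:
    """True if the MAC delivery rules permit sender -> receiver."""
    return receiver_type in MAC_NEXT_HOP_TYPES.get(sender_type, set())
```

Because every table entry is a finite set of types, the rule set stays finitely representable, which is the property Section 9.2 asks for.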
9.1.5: Local, global and path constraints

Constraints are imposed to limit the instances of graphs that can be constructed from the available vertices and edges. The types of constraints that can be specified are explained in Section 3.1. The local, global and path constraints are used to express:

• Constraints that general networks are expected to satisfy: The origin of this set of constraints may be in rules set by IETF specifications or in the common practice generally seen in today's networks.
• Additional constraints that are imposed by the designer: This may be to ensure ease of model development and to constrain the valid topology space expressible using vertices and edges to a set with the required properties. For example, the designer may only want to study LAN topologies with no more than three base stations.
• Constraints that COMPRESS expects the topology models to satisfy: These requirements arise out of the constraints that COMPRESS requires to ensure termination and completeness of its various steps. Since the constraints imposed by COMPRESS also add constraints on modeling protocols using eCFSM syntax, the topology constraints imposed by COMPRESS are discussed as part of the larger set of constraints imposed by COMPRESS in Section 9.2.

Some of the above constraints can be mapped directly to the syntax described in Section 3.1. However, constraints that are imposed to simplify model development may need to be modeled as a combination of local, global and path constraints. Additionally, iterative modeling of the constraints and of the eCFSM protocol specification may be required before a satisfactory topology model can be constructed. Local constraints are specified in terms of the types and cardinalities of vertex types that can be in the neighborhood of other vertex types and instances.
The IEEE specification of the infrastructure mode of operation of the 802.11 protocol requires the MAC end node to communicate only with a base station. This was expressed as a local constraint on the neighborhood of the MAC end host in the M&M case study. To keep the modeling complexity low for the M&M case study, the following constraints were imposed: (a) exactly one LAN or one base station can be in the physical neighborhood of a vertex representing a MAC end node, (b) a LAN can only have MAC end nodes or base stations in its physical neighborhood, and (c) a base station can only have either a LAN or a MAC end node in its neighborhood. This was done with the awareness that there is a trade-off between the modeling complexity and the quality of the answers that COMPRESS can generate. For example, if certain behaviors only occur on topologies with two LAN nodes, then such behaviors would be missed in the simplified model.

Global constraints enforce the type/cardinality constraints on all topologies. For example, the IETF specification of multicast protocols requires that there be exactly one designated router (DR) per domain and one RP per group per domain. For the sake of modeling simplicity, in the enterprise wide resource discovery case study, only a single domain is modeled, and this is expressed as the global IP addresses of all vertices belonging to the same IP domain. This simplification also translates to all valid network topology instances having exactly one DR.

Path constraints are required to make sure that communication between two vertices indicated by a virtual edge can eventually be mapped to a path at the physical layer. This is one of the basic requirements that all real world topologies necessarily satisfy. To understand path constraints, the mapping of message destination types across the layers of the network must be understood.
If a TCP virtual edge is considered, it is analogous to understanding how a TCP session between a client and a server is routed at the IP layer and how each IP hop is routed within the MAC layer. Given a virtual edge representing communication at one layer, the mapping of message destination types across layers shown in Figure 5 determines the types of edges that make up the path in the lower layer. For example, a TCP virtual edge is mapped to an IP unicast path, and each hop of the IP unicast path is mapped to a MAC unicast path. A path constraint asserts that if there exists a virtual edge representing communication using some message destination type at some layer, then there must exist at least one path, made up of edges determined by the mapping of message destination types (Figure 5), between the same vertices at every layer of the network stack, all the way down to the physical layer. Figure 53 shows an example where the IP broadcast edge is mapped to a MAC broadcast edge, which in turn is mapped to a physical path.

9.2: Constraints imposed by COMPRESS

Section 9.1 describes some of the general rules that are required to build a topology model. However, these are not sufficient, as the COMPRESS algorithm itself imposes further restrictions to ensure completeness of the algorithm. The COMPRESS algorithm requires that the TCG be finite, i.e.,

• There must be a finite number of eCFSM types, and each eCFSM type must have a finite number of states and messages. This translates to a restriction on the packet delivery rules modeled as eCFSM state transition tables. The states of a routing protocol are dependent on the number of vertices in the network. From a syntactic perspective, a general eCFSM state transition table for routing cannot be specified, as a network of any given cardinality may have as many state transition table entries as the number of vertices in the network.
Thus, the network model needs to be simplified so that the routing table is representable as a finite sized eCFSM table using a finite number of eCFSMs. In the TCP and HTTP case studies (footnote 14), at the IP layer, (a) the IP end host had only one routing table entry, i.e., the entry to a single default gateway, and (b) gateways had only two entries in their routing table, one for delivering packets to the same IP sub-domain and the other for delivering packets to other IP sub-domains. This translates to the simple transit stub routing model shown in Figure 54. This model can be enforced using local and global type/cardinality constraints on gateways and the inter-domain routing vertices. More complex transit stub models can be built as long as they can be finitely represented.

• For each message destination type, all paths in all topologies that such a message may take must be finitely representable, i.e., the number of hops and the sequence of types of eCFSMs that the packet may transit through must be finite. This property is required to ensure completeness of the compact representation of such paths using partial topologies. The desired property can be expressed using local constraints on vertices and path constraints on topologies. For example, in the M&M case study, constraints on the neighborhood of a LAN are designed such that it cannot have another vertex of type LAN in its neighborhood. The assumed model for the M&M case study yields a finite mapping of virtual to physical edges, as shown in Figure 15. If such local constraints are not imposed, the forwarding rules cannot be expressed as a finite number of eCFSMs in all cases, and further, there may be infinitely many sequences of eCFSMs of type LAN switch that the packet may pass through (see Figure 55). Even if complex MAC topologies are allowed, all such topologies must be representable by a finite number of eCFSM types, each with a finite number of messages and states.
Another place where such a restriction was imposed was in the HTTP load balancing proxy model. To ensure finiteness in the number of ways in which a client and a server could be connected while communicating over HTTP, the number of HTTP proxies that can be in the neighborhood of any given HTTP client vertex was limited to a finite number. The selection of the number was driven by a very simple rule of thumb: if a protocol is not designed to handle choice, giving it even the simplest choice will break the protocol. Thus, the number of HTTP load balancing proxies that can be in the neighborhood of any given HTTP client was restricted to two. From the case studies it is clear that this rule of thumb yields good results.

Footnote 14: For simplicity of explanation, routing between local and global IP name spaces is omitted here.

Figure 55: Undesirable mapping of virtual edge to infinite physical paths.

9.3: Protocol model

The formal protocol model is described in Section 3.3. Some of the important concepts that need to be kept in mind while modeling a protocol are:

• Finite number of states and messages
• Distinction between packet delivery and non-packet delivery eCFSMs

The requirement to keep the number of messages and states in the eCFSM state transition specification finite stems from the requirement to be able to build a finite transitive closure graph. The details of how this constraint can be handled are explained in the previous section. In cases where the state of the eCFSM involves an integer variable like a counter, the values that the state can take must be finitely enumerable. For example, if a TCP-like protocol is being modeled, the finite set of sequence numbers must be enumerated in both the messages and the states, and all interactions of the enumerated states and messages must be modeled. This arises because addition/subtraction operations are not supported in the pre/post condition predicates.
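To make the enumeration requirement concrete, the following Python sketch spells out one table row per counter value, so that each row only needs equality tests on named states and messages. The row format, the `expect_`/`data_`/`ack_` names and the wrap-around to zero are illustrative assumptions, not the eCFSM syntax defined in Section 3.3. Arithmetic is used only by the generator, never inside a row.

```python
from typing import Dict, List


def enumerate_counter_transitions(num_values: int) -> List[Dict[str, str]]:
    """Build one explicit transition row per sequence number.

    Each row names its state, consumed message, next state and generated
    message, so no addition is needed when a row is evaluated.
    """
    rows = []
    for seq in range(num_values):
        nxt = (seq + 1) % num_values  # assumed wrap-around of the counter
        rows.append({
            "state": f"expect_{seq}",
            "input": f"data_{seq}",
            "next_state": f"expect_{nxt}",
            "output": f"ack_{seq}",
        })
    return rows


# A hypothetical receiver with four sequence numbers.
table = enumerate_counter_transitions(4)
```

The table grows linearly with the number of enumerated values, which is the cost of keeping the state and message sets finite and explicit.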
There are many real world scenarios where the number of states is dynamically determined at run time or is manually configurable. TCP and HTTP load balancers are classic examples, and the technique to handle such situations is explained in Section 9.2.

eCFSMs that perform packet delivery and eCFSMs that do not perform packet delivery must be clearly distinguished for each layer, and their respective properties (explained in Section 3.3.2) must be enforced. Distinguishing the packet delivery and non-packet delivery vertices aids in generating partial topologies for message destination types at various layers, as explained in Section 4.3.4. While doing this, the fact that a non-packet delivery vertex at one layer may be a packet delivery vertex at another layer must be kept in mind. For example, an IP gateway is a non-packet delivery vertex at the MAC layer, whereas it is a packet delivery vertex at the IP layer.

The level of detail that a network model encompasses is left to the discretion of the designer. Using simplified versions of protocols reduces the size of the TCG, but it also abstracts out information. For example, in all the case studies, a trivial model of the address resolution protocol (ARP) was used, with the understanding that the model would not capture any behaviors resulting from a possibly more complex model of ARP. However, neither the syntax nor the semantics of the COMPRESS algorithm and its inputs prevents the designer from including a more complex version of the protocol.

9.4: Partial topologies

The concept of partial topology is discussed in Sections 3.2 and 4.3.4. Understanding the difference between instance and type is critical to understanding how to model partial topologies. It is very similar to a variable x that can be mapped to one or more elements in a set, say the set of natural numbers. Each element in the set of natural numbers represents an instance.
Simply put, the variable x represents all instances of natural numbers. By asserting additional constraints on x, the elements to which the variable may be mapped can be restricted. For example, x = 2y, where y can be mapped to any natural number, represents the set of all even numbers. In a topology, a vertex is said to be an instance when its MAC address (footnote 15) is mapped to a particular natural number. If the MAC address of a vertex can take any value, then the vertex is said to be a type. Additional restrictions can be placed on the various fields of both instances and types of vertices. Partial topologies heavily use the concept of vertex types. For example, consider a vertex (v) instance with MAC address → 1 and IP address → 192.168.10.1. Consider a partial topology which includes vertex (v) as well as at least one DHCP server serving the same IP subnetwork in its neighborhood. One of the things to be noted while representing the DHCP server is that the fields of the DHCP server are not mapped to a value, i.e., they are represented as variables, and the values that the variables may take are relative to the values of the fields of vertex (v). The MAC address of the DHCP server will be different from the MAC address of vertex (v), and the domain-subdomain part of the IP address will be the same as that of vertex (v). A partial topology representing all the ways in which two vertices sending and receiving a message at a given level of network abstraction may be connected can be represented without specifying instances of vertices. For example, consider the partial topologies shown in Figure 56. The illustration on the left represents a partial topology where one instance of the vertex is fully specified whereas the other vertex is partially specified. Let A and B represent the handles to the structures representing the vertices. Assignment of the MAC address of A makes that structure an instance of a vertex.
However, for the vertex represented by structure B, an assertion about inequality of the MAC address ensures that B represents a partially specified vertex that is not the same as A. The assertion about equality of the IP address subnet ensures that the partial topology represents all fully specified instances of topologies with at least two instances of vertices belonging to the same IP subnetwork. Finally, the last assertion ensures that the vertex represented by structure B is a DHCP server. An important aspect to be kept in mind is that if B were to be instantiated, then its MAC address could be mapped to any natural number except that of A. The same is true for the host component of the IP address as well. From this it is clear that a partially specified topology represents a very large number of instances of fully specified topologies.

Footnote 15: Specifying a global IP address asserts an instance of a vertex, whereas specifying a local IP address does not, as more than one instance of the vertex may have the same local IP address.

In Figure 56, the figure to the right represents a partial topology where neither of the vertices A or B is mapped to any particular instance. The only assertions made are that their MAC addresses are different, that they belong to the same IP subnetwork and that B is a DHCP server. This represents a larger class of topologies compared to the one on the left. Partial topologies representing message destination types are represented similarly, without instantiating the vertex sending the message or the vertex receiving the message. One of the very important concepts that needs to be kept in mind while formulating a topology model is that the number of partial topologies representing all the ways in which a vertex sending a message and a vertex receiving it may be connected, for every message destination type, must be finite (see Section 9.2). Packet delivery rules, constraints on vertex neighborhood and path constraints must be used to enforce the finiteness constraint (footnote 16).
The procedure to generate partial topologies from the protocol eCFSM and the packet delivery rules is described in Section 4.3.4 for the MAC broadcast message destination type. MAC packet delivery rules determine the finite types of vertices that any given vertex at the MAC layer can forward the packet to. Local constraints on the neighborhood ensure that the number of hops is finite for all packet delivery paths in all topologies. Since there are only a finite number of eCFSM types at the MAC layer, there are only a finite number of partial topologies for each message destination type at the MAC layer. Similar rules can be applied to generate partial topologies for other message destination types at other network layers. Since the COMPRESS algorithm deals with topology at the highest level of abstraction possible, virtual edges representing the message destination types used for communication are used to represent partial topologies. Figure 15 is a typical example of a partial topology.

Footnote 16: Finite choices for selecting the next hop, combined with finite path length, ensure that there are only a finite number of ways in which a vertex generating a message and a vertex receiving a message may be topologically connected.

Figure 56: Examples of representing partial topologies.

9.5: Target behavior

The syntax for specifying target behavior is described in Section 3.5.2. Consider a sample target behavior specified as an eCFSM state transition (m0, a → b, m1) leading to (m2, p → q, m3). This type of specification means that the message m1 generated by the first eCFSM state transition transitively generates m2, which causes the second eCFSM state transition. Such sequences can be chained with the understanding that the message generated by the first transitively generates the second, the message generated by the second transitively generates the third, and so on.
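The chaining rule just described can be illustrated with a small Python sketch. It is only a toy reading of the "leading to" relation: the `generates` map, the tuple layout and both function names are hypothetical, and in COMPRESS the relation between messages would come from the transitive closure graph rather than from a hand-written dictionary.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

# (consumed message, from state, to state, generated message)
Transition = Tuple[str, str, str, str]


def transitively_generates(generates: Dict[str, Set[str]],
                           src: str, dst: str) -> bool:
    """True if dst is reachable from src in one or more 'generates' steps."""
    seen: Set[str] = set()
    queue = deque(generates.get(src, ()))
    while queue:
        msg = queue.popleft()
        if msg == dst:
            return True
        if msg not in seen:
            seen.add(msg)
            queue.extend(generates.get(msg, ()))
    return False


def chain_holds(chain: List[Transition],
                generates: Dict[str, Set[str]]) -> bool:
    """Check every consecutive pair: the message generated by the earlier
    transition must transitively generate the message consumed by the next."""
    return all(transitively_generates(generates, first[3], second[0])
               for first, second in zip(chain, chain[1:]))


# (m0, a -> b, m1) leading to (m2, p -> q, m3), with m1 reaching m2 via mx.
chain = [("m0", "a", "b", "m1"), ("m2", "p", "q", "m3")]
generates = {"m1": {"mx"}, "mx": {"m2"}}
holds = chain_holds(chain, generates)
```

Longer chains are checked pairwise in the same way, which mirrors the statement that the first message transitively generates the second, the second the third, and so on.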
The types of behaviors that may be expressed and the limitations that are posed by the specific syntax required by COMPRESS are discussed in Sections 3.5 and 7.3.2 respectively. Determining how to specify the target behavior is fairly straightforward. A protocol is usually designed to go through different steps to achieve its final goal. For example, the M&M protocol performs two main functions, namely, mobility detection and address update. In such a protocol, the obvious target behaviors are the results of the individual steps, i.e., was mobility detected after a handover, or was an address not acquired after a handover? The obvious target behaviors can be further constrained by adding conditions to the message (consumed or generated) and the state. For example, the vertex receiving the layer two handover signal can be constrained to not have a valid address in the beginning, or the vertex receiving the response from the DHCP server can be constrained to have a different IP address from that contained in the message. The resource discovery protocols have two distinct steps, namely, the resource discovery step, in which a message destination type specified by the protocol is used, and the step in which an attempt is made to use the discovered resource. Similarly, the ECRIT protocol consists of several distinct steps which are conditionally chained together to achieve the protocol objective. Another obvious candidate for target behaviors is to check whether topologies can violate the conditional chaining.

Error states also tend to be of interest while specifying target behavior. Target behaviors pertaining to the detection of and response to errors can be specified. Similar constraints can be placed on error states to generate more constrained target behaviors. Other places in the eCFSM state transition table where target behaviors can be specified are the points where the protocol transitions from using one message destination type to another message destination type.
For example, the resource discovery protocol uses IP broadcast to discover the resource and TCP to connect to the resource. Since the introduction of network devices that violate the end to end property changes the assumptions for the different communication abstractions (represented by message destination types), the transition points in the protocol's eCFSM become prime places for specifying target behaviors. The target behaviors used in the TCP/HTTP, ECRIT and resource discovery case studies are prime examples. Since COMPRESS is unable to answer reachability questions precisely all the time (see Section 4.2), already reported erroneous behaviors of protocols are prime candidates for necessary topology condition generation using COMPRESS, as the characteristics of the topology required for that target behavior to occur can be well characterized. The target behaviors studied in Section 7.1.2.1 were based on reported misbehavior of resource discovery behind NATs.

9.6: Construction of topology model used in M&M case study

For the purpose of illustration, the topology model that was used in the M&M case study is constructed from scratch. The first step is to determine the vertices that need to be modeled. The next step is to model edges, followed by topology rules and partial topologies for message destination types. The protocol models for M&M and the DHCP servers, along with the target behaviors for this case study, are described in detail in Section 7.1.1.

The first step in formulating the topology model, as described in Sections 9.1.1 and 9.1.2, is to recognize the required vertices and their fields. For the M&M case study, the mobile node (wireless), DHCP servers and designated router (DR) are obviously required, as these are the vertices that host the set of protocols under test. At the MAC layer, each vertex needs a type field and an address field. Since a wireless network is being modeled, MAC end hosts may be of type wired or wireless.
Since MAC end hosts could be IP end hosts or IP packet delivery hosts, they need to have an IP type field and an IP address field as well. To build a local area network model, consider the commonly seen network configurations. Most networks tend to have hubs and switches which can be connected by physical cables. Hubs are generally stateless, whereas switches are stateful. Generally, the states of a switch are used for routing packets at the MAC layer, and the number of states depends on the size of the network. This violates the property that there have to be a finite number of eCFSM types, as described in Section 9.3. However, a switch with a limited number of states may be modeled. To keep the topology model simple, let all topology models be composed of mobile nodes, DHCP servers, hubs and base stations. All topology instances are further assumed to have IP end hosts with global IP addresses.

The next step in formulating the topology model is to determine the edge types, as described in Section 9.1.3. Wired and wireless edges are obviously required, as there are wired and wireless MAC end hosts. Apart from these edges, virtual edges analogous to the message destination types used by the protocol model are also required. From Section 7.1.1 it is clear that the protocol uses MAC broadcast, IP link local broadcast and IP multicast addresses. For the sake of simplicity, let all IP multicast addresses be mapped to the MAC broadcast address. A wired edge may exist between a wired MAC end host and a hub, and between a hub and the wired part of a base station. A wireless edge may exist between a wireless MAC end host and the wireless part of a base station. A MAC broadcast edge may exist between any two MAC end hosts, and IP broadcast/multicast edges may also exist between any two MAC end hosts.
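The edge rules stated above can be captured in a small admissibility check, sketched here in Python. All type names are hypothetical labels chosen for this example, and the base station is split into `bs_wired` and `bs_wireless` vertices on the assumption that the single interface convention of Section 9.1.1 applies.

```python
from typing import Dict, FrozenSet, Set

# Unordered pairs of vertex types that each physical edge type may join.
PHYSICAL_EDGE_ENDPOINTS: Dict[str, Set[FrozenSet[str]]] = {
    "wired": {
        frozenset({"wired_mac_end_host", "hub"}),
        frozenset({"hub", "bs_wired"}),
    },
    "wireless": {
        frozenset({"wireless_mac_end_host", "bs_wireless"}),
    },
}

MAC_END_HOST_TYPES = {"wired_mac_end_host", "wireless_mac_end_host"}
VIRTUAL_EDGE_TYPES = {"mac_broadcast", "ip_broadcast", "ip_multicast"}


def edge_allowed(edge_type: str, u_type: str, v_type: str) -> bool:
    """True if an edge of edge_type may join vertices of the given types."""
    if edge_type in PHYSICAL_EDGE_ENDPOINTS:
        pair = frozenset({u_type, v_type})
        return pair in PHYSICAL_EDGE_ENDPOINTS[edge_type]
    if edge_type in VIRTUAL_EDGE_TYPES:
        # Virtual edges may join any two MAC end hosts.
        return u_type in MAC_END_HOST_TYPES and v_type in MAC_END_HOST_TYPES
    return False
```

Keeping the physical pairs unordered means the check gives the same answer regardless of which endpoint is listed first.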
With the edges and vertices in place, the next step, as described in Section 9.1.5, is to place restrictions on the neighborhood of each vertex so that the required properties are enforced on all graphs. Consider a wireless MAC end host. It can only be connected to a single base station, so its wireless neighborhood type and cardinality are set to base station and one, respectively. For the wired MAC end host, the type and cardinality constraints for the wired neighborhood are set to hub and one, respectively. For the hub, the wired neighborhood type is set to wired MAC end hosts and base stations. For the MAC broadcast and IP broadcast edges, a path constraint is added so that if one of these edges exists between two MAC end hosts, there must also exist a path between them composed of wired and wireless physical edges. Since it is assumed that there is exactly one DR per local area network, a global constraint that there can only be one DR in all instances of graphs is imposed.

The next step is to formulate packet delivery rules, as described in Section 9.1.4. These rules must work for all vertices and over all possible topology configurations. From the neighborhood constraints, it is obvious that every wired MAC end host has a hub as its neighbor and every wireless MAC end host has a base station as its neighbor. Thus, the packets generated by MAC end hosts can only be picked up by a hub or a base station. Packets received by a base station from any wireless end host are delivered to the hub. All packets received by the hub are in turn delivered to MAC end hosts and base stations. The base station in turn delivers the packet to the wireless end hosts in its neighborhood. The same rules are used for MAC and IP broadcast message destination types. MAC/IP multicast and unicast may be implemented as a filtering mechanism at the MAC/IP end hosts.

The next step is to generate partial topologies for message destination types, as described in Section 9.4.
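The neighborhood, path and global constraints stated above can be checked mechanically on a concrete graph. The Python fragment below is a minimal sketch under stated assumptions, not the COMPRESS implementation: the function names connected and violations are hypothetical, vertex types are plain strings, wired and wireless edges are merged into a single list of physical edges, and the DR is passed in as a set of vertex names.

```python
from collections import deque


def connected(neighbors, src, dst):
    """Breadth-first search over physical edges from src to dst."""
    seen, queue = {src}, deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        for nxt in neighbors[node] - seen:
            seen.add(nxt)
            queue.append(nxt)
    return False


def violations(vertex_types, physical_edges, broadcast_edges, dr_names):
    """Return a list of readable messages, one per violated constraint."""
    neighbors = {v: set() for v in vertex_types}
    for a, b in physical_edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    problems = []
    for v, vtype in vertex_types.items():
        kinds = sorted(vertex_types[n] for n in neighbors[v])
        if vtype == "wireless_host" and kinds != ["base_station"]:
            problems.append(f"{v}: needs exactly one base station neighbor")
        elif vtype == "wired_host" and kinds != ["hub"]:
            problems.append(f"{v}: needs exactly one hub neighbor")
        elif vtype == "hub" and any(
            k not in ("wired_host", "base_station") for k in kinds
        ):
            problems.append(f"{v}: hub neighbors must be wired hosts or base stations")

    # Path constraint: a broadcast edge requires an underlying physical path.
    for a, b in broadcast_edges:
        if not connected(neighbors, a, b):
            problems.append(f"{a}-{b}: broadcast edge without a physical path")

    # Global constraint: exactly one DR per graph instance.
    if len(dr_names) != 1:
        problems.append("exactly one DR is expected")
    return problems


if __name__ == "__main__":
    types = {"mn": "wireless_host", "bs": "base_station", "hub": "hub",
             "dhcp": "wired_host", "dr": "wired_host"}
    edges = [("mn", "bs"), ("bs", "hub"), ("hub", "dhcp"), ("hub", "dr")]
    print(violations(types, edges, [("mn", "dhcp")], {"dr"}))
```

Returning a list of messages instead of a single boolean makes it easier to see which constraint must be revisited when a new vertex or edge type is added to the model.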
Consider a MAC broadcast message destination type. Such a packet can be generated by either a wired MAC end host or a wireless MAC end host. If the packet is generated by a wired MAC end host, then the only vertex that can receive that message is the hub. Thus, on all topologies, the first hop after a wired MAC end host is a hub. The hub can deliver the packet to either a wired end host or a base station, as these are the only two types of vertices that can be in the neighborhood of the hub. If the next hop after the hub is a wired end host, the packet delivery chain ends there. If the next hop after the hub is a base station, then the only vertices that can receive that message are the wireless MAC end hosts in its neighborhood. Once the packet is delivered to the wireless MAC end host, the delivery chain ends there. Similarly, a packet generated by a wireless MAC end host is always first delivered to a base station and then to the hub. Then, the packet may either be delivered to a wired MAC end host or to a base station and then eventually to a wireless MAC end host. Thus, if one tries to trace the path a MAC packet may take, it will be limited to the five paths shown in Figure 15 on all fully specified topologies that can be built within the realm of constraints discussed earlier. Using the packet delivery rules for other message destination types, similar partial topologies for all message destination types can be formulated.

This simple topology model forms a starting point for a more complex topology model. For example, other vertex types like switches with finite states may be added. More complex forms of base stations may be modeled. When a new vertex type, edge or packet delivery rule is introduced, the topology constraints must be checked again for consistency and the partial topologies for message destination types must be regenerated.
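The tracing argument above can be mimicked by enumerating delivery chains at the level of vertex types. The following Python fragment is a hedged sketch rather than the procedure used by COMPRESS: the function names next_hops and delivery_chains are hypothetical, and only the delivery rules literally stated in the text are encoded. Since Figure 15 is not reproduced here, the set of chains this sketch prints should not be assumed to coincide with the paths in that figure.

```python
def next_hops(current, previous):
    """Next-hop vertex types for a MAC broadcast packet.

    `previous` is the type the packet arrived from, or None at the source.
    """
    if current == "wired_host":
        # A source host hands the packet to its hub; a receiving host ends the chain.
        return ["hub"] if previous is None else []
    if current == "wireless_host":
        return ["base_station"] if previous is None else []
    if current == "hub":
        return ["wired_host", "base_station"]
    if current == "base_station":
        # From a wireless host: forward to the hub. From the hub: deliver downstream.
        return ["hub"] if previous == "wireless_host" else ["wireless_host"]
    raise ValueError(f"unknown vertex type: {current}")


def delivery_chains(source):
    """Enumerate all type-level delivery chains starting at `source`."""
    chains = []

    def walk(path):
        previous = path[-2] if len(path) > 1 else None
        hops = next_hops(path[-1], previous)
        if not hops:
            chains.append(tuple(path))
            return
        for hop in hops:
            walk(path + [hop])

    walk([source])
    return chains


if __name__ == "__main__":
    for src in ("wired_host", "wireless_host"):
        for chain in delivery_chains(src):
            print(" -> ".join(chain))
```

If a delivery rule is added or changed, for instance to model a switch, only next_hops needs to be edited and the chains can be regenerated, which mirrors the regeneration step described in the text.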
9.7: Chapter 9 summary

This chapter gives the user of COMPRESS insights into how real networks can be modeled using the syntax and semantics required by COMPRESS. It also includes an example that demonstrates how the topology model for the M&M case study was formulated using the rules discussed earlier in the chapter.

Chapter 10: Contributions and future work

In this chapter, the contributions of this work are highlighted, followed by a discussion of some of the potential directions that this work opens up.

10.1: Contributions

Systematic study of the effect of topology space on different network protocols and distributed applications has received little attention, barring a few protocols in the transport layer. Initial work in protocol testing emerged from work on formal methods for protocol testing in telecommunication networks. However, telecommunication network services are almost entirely implemented in the network [loge99], and the formal methods that evolved to serve systems with this property developed techniques to verify state and message properties of a single node and its interaction with the network ([diet02], [mead03]). As observed in [diet02], variations in connectivity between the nodes of the system are not considered, as this does not affect the services in telecommunication networks. Other formal methods developed for protocol testing focus on studying or testing for a desired property on a given topology. There are many examples in the recent past of network protocols or distributed applications breaking due to the introduction of a new topology component or configuration. In this study, the importance of topology generation in the evaluation of a protocol's specification is illustrated.

The contributions of this work are multifaceted. The first contribution is the characterization of the topology generation problem and the approach taken to solve it.
The second contribution is the COMPRESS methodology, especially the topology augmented TCG data structure. The third is the set of case studies, which not only illustrate the applicability and usefulness of the COMPRESS methodology for network/distributed application protocol testing, but also reveal unexpected behavior in existing and proposed protocols in the presence of specific topology configurations.

In this work, the problem of protocol testing when topology space is taken into account is first characterized. Based on this characterization, a methodology that can generate necessary topology conditions for a given scenario/behavior to occur on static topologies is proposed. This is based on the novel combination of two basic ideas:
• Represent complex topologies as a composition of simple end to end topology component abstractions derived from packet destination types (e.g., IP unicast, IP broadcast, etc.).
• Use the information already embedded in the protocol FSMs to drive the composition of the end to end topology abstractions to obtain more complex, but relevant topologies.

Decoupling the protocol FSM specification and the topology specification enables the topologies to be reasoned about in a much more systematic manner. Further, deconstructing topologies into smaller components that can be composed to obtain more complex topologies simplifies the conceptualization and generation of complex topologies from simple components. This is true in spite of the size of the topologies being small in terms of number of nodes and links. If a designer had to think about the composite topologies without the benefit of composition, it is very unlikely that the designer would have been able to come up with the ones that COMPRESS generated. This is very apparent from the types of topologies that were generated in the various case studies. A classic example of this is the topology that COMPRESS generated for the HTTP case study involving load balancing transparent proxies.
Another good example is the M&M case study, where having multiple DHCP servers serving more than one domain on the same LAN leads to unexpected behavior. Of the dozen papers addressing handover in mobility scenarios, none explicitly or implicitly studied the protocol on topologies in which a LAN hosts more than one domain. Only one paper explicitly assumed that the network topologies it considers do not include such configurations, in order to ensure correctness of its algorithms. Another advantage of decoupling the FSM specification from the topology configuration is that it enables a framework where the topology library can be extended as newer topology configurations are enabled by the introduction of newer types of devices into the network. Further, this work lays some preliminary groundwork for what could become a network topology language to further systematize network protocol design and testing.

The second contribution of this work is the COMPRESS methodology to generate necessary topology conditions for a target behavior to occur. The problem of generating necessary topology conditions has been motivated from both a practical and a theoretical perspective. Theoretically, solving both the reachability problem and the topology space coverage problem for communicating FSMs is undecidable. Hence, an approach that systematically trades off state space reachability against topology coverage is developed. Practically, necessary topology conditions can be used to evaluate the severity of the behavior and hence prioritize the fix to the protocol or topology. A representation to express complex topologies as a composition of partial topologies has been developed. Using the message destination types of messages in the FSM specification enables one to keep the representations of topologies and FSMs separate, but with enough information to link the two together in useful ways.
To drive composition of topology components to form more complex topologies without exhaustively solving the reachability problem, a compressed but complete representation of the state space using a topology augmented transitive closure graph has been developed. Initially, only static topologies are considered, but in the later part of this work it is shown how some of the limitations initially assumed can be relaxed by transforming external inputs to internal states. Though this method can express only a finite number of external autonomous events (crashes, handovers, etc.), it has been shown through case studies that even finite events can have a very big impact on the protocol's behavior and performance. For example, the ECRIT case study with finite mobility exposes a very serious design flaw in the protocol specification. The models have been extended to handle limited delay semantics to demonstrate the flexibility of this methodology. Though the algorithm proposed has exponential complexity due to the limitations in expressiveness of the language used to express the problems, it has been shown through case studies that the practical runtime complexity is fairly manageable. An augmented topology generation algorithm that can decrease the runtime complexity exponentially compared to the one initially proposed has been developed. There is room for more improvement in runtime complexity, especially when algorithms can be restricted to classes of FSMs with specific properties. Compared to the existing work, this work brings a new capability to the field of network protocol testing, as COMPRESS can generate necessary topology conditions based on a target behavior.

The third contribution of this work is the set of results that were generated using COMPRESS to study various protocols. To evaluate the effectiveness and runtime complexity of the COMPRESS methodology, twenty target behaviors spanning seven different protocols have been studied.
The results from the case studies are summarized in Table 18. COMPRESS was able to generate necessary topology conditions for most protocols studied. The necessary conditions were also close to the sufficient conditions in most cases. The quality of the necessary conditions generated by COMPRESS was evaluated as the percentage of topology space that the necessary conditions eliminate. The highest effectiveness obtained was 99.4% elimination for the HTTP case study, i.e., if COMPRESS was used to generate necessary topology conditions before feeding the topologies to a state space enumerator to check for a target behavior, 99.4% of the topologies could be weeded out and only 0.6% of the topologies would need to be explored for occurrence of the target behavior.

Table 18: Summary of results of case studies.
Protocol class | Protocol instance | # of TB | TBs addressed by COMPRESS | Existence of sufficient conditions | Avg. effectiveness of COMPRESS (%)
M&M | Single BS | 7 | 7 | 7 of 7 | 27.7
Resource Discovery | LAN | 2 | 2 | 1 of 1 | 77.7
Resource Discovery | WAN | 5 | 4 | 4 of 4 | 87.8
Client-Server | TCP (Same port, Same IP) | 1 | 1 | 1 of 1 | 98.5
Client-Server | TCP (Different port, Same IP) | 1 | 1 | 1 of 1 | 97.9
Client-Server | HTTP (Same port, Same IP) | 1 | 1 | 1 of 1 | 99.4
ECRIT | IP based | 5 | 5 | 4 of 4 | 47

From a run time perspective, eliminating topologies is useful because the state space enumerator needs to perform a complete and exhaustive state space enumeration to assert non-occurrence of a target behavior. This significantly reduces the overall complexity, as the more expensive state space enumeration needs to be performed only on a few select topologies. For each of the protocol target behaviors studied, it has been shown how necessary topology conditions can be used to evaluate the severity of the behavior and the possible steps that could be taken to fix the problem.
For example, although the necessary topology conditions generated for target behavior #7 in the M&M case study do not include commonly found topologies, the nature of the error itself requires a fix, whereas target behavior #4 does not require a fix, as its necessary topology conditions encompass only a small subset of topologies, which can lead to decreased performance. Using some of the target behaviors, the limitations of COMPRESS are highlighted.

10.2: Future work

This work opens up many directions for future work. Some of them are direct extensions of the methodology, models and the framework, whereas others deal with how this work relates to and can be adapted to today's fast evolving world.

10.2.1: Extend and augment COMPRESS framework

In the process of developing the methodology and the topology abstractions, areas where more research is required to improve the capabilities of the topology generation methodology have been identified. One of the areas is developing efficient search algorithms in topology space so that, when multiple candidate topologies are available for branching, either the most suitable partial topology branch is selected or an abstraction that encompasses all the branches, i.e., an abstraction that includes the topologies in all branches, is automatically generated. Another area which will benefit from additional investigation is formalizing (see footnote 17) and quantifying knowledge representation in topology abstractions, so that one can manually or automatically choose at runtime the abstract representation of the topology that is best suited for expressing the topology conditions required for that target behavior. This will enable easier design and evaluation of set operations on abstractions of topologies.

Footnote 17: Language theory, descriptive complexity [imme05] and the field of mathematical topology may yield useful clues to approaches for formalizing topology representation, abstraction and algebra.
Another area which requires additional work is developing behavioral abstractions of topological artifacts (for example, a blocking NAT as well as a firewall configured to block connections) so that the branching factor of the topology search can be reduced, thus increasing the runtime efficiency of the algorithm. Yet another extension would be to incorporate richer semantics of delay so that interesting and useful upper and lower bounds can be calculated when relevant. Though an attempt to incorporate delay semantics into the framework has been made, its applicability as it stands today is rather limited.

The concepts explored in this work could also be extended or adapted for automating topology generation in the newer space time based architectures like Dynamic Network Renaming using Space-Time Contexts [ping07], which introduces the notion of versions and temporal references into protocol stacks, in which case the path the packet takes in a network stack is resolved using not only the regular header, but also temporal information. Such an architecture is designed with ease of migration and testing of new configurations in mind. Resource discovery protocols in such architectures will be more complex than existing ones, as the search space is not only the existing topology space, but a cross product of topology space and time. For example, a host may query for the IP address of a host name in the current, the previous or some historical time epoch. The Recursive Network Architecture (RNA) [touc06] could potentially support multiple fully connected virtual topologies on top of a physical topology, in which case automating topology generation for protocol testing would be an invaluable tool for protocol and distributed application designers.
In short, as the complexity of the underlying network increases and as network designers and users create novel designs and uses, automating topology exploration and generation will play a crucial role in protocol testing as well as network troubleshooting.

One of the fundamental extensions that has to be made to the COMPRESS methodology is to handle multi-port FSMs. Modeling a protocol that binds to multiple interfaces is fundamentally different from modeling a multiple access protocol. One is a multiple port multiple medium protocol, whereas the other is a multiple port single medium protocol. A generic model may be definable, but the effects of such a model on completeness and expressiveness need to be understood before such an exercise can be pursued.
• Augmenting the transitive closure graph to enumerate all combinations of messages that could be received on the multiple ports.
• Construction, reconciliation and operations as well as expressiveness of topology need to be understood.

The problem of topology generation for protocols with complex state machines (see footnote 18) remains unsolved, as the topologies generated by COMPRESS are rather unsatisfactory when the behavior of protocols depends not only on the low level details of the topology, but also on the cardinality of the messages and nodes. At this time, an approach different from COMPRESS is needed to perform necessary condition generation for protocols whose behavior is strongly dependent on topology as well as on the cardinality of the elements in the topology and of the messages. Further, topology generation for TCP like protocols, whose behavior depends on quantitative feedback from the network as well as on externalities like traffic, though extremely important, is not addressed in this work.

10.2.2: Relationship of design and testing to open processes

The emergence of the Internet has fundamentally altered the way people communicate, collaborate, access and disseminate information.
The cost of communication, collaboration and access to information is almost zero. The open source movement has innovated on the process for building components and composing them. The almost negligible cost of obtaining components and creating compositions has enabled a massive distributed and parallel search for newer components and compositions. This invariably tests components as well as compositions under a very large set of environments, which may lead to a type of environmental or contextual coverage not possible within a given tool or a lab environment. As people attempt to do newer things with components and compositions, they cause them to operate near their design envelopes, exposing limitations and bugs. The near zero cost of communication and co-ordination enables collection of this feedback from distributed sources, which enables the components to be corrected and extended in interesting and useful ways. The pace of change is rapid in components, and the zero cost of compositions accelerates the emergence of newer, richer compositions. Mature components and compositions often embed the collective knowledge and experience of the community, which goes a long way in weeding out bugs in components. The key takeaway is that testing is not a standalone operation, and that pushing components to operate at or beyond their design envelopes is the norm and provides extremely valuable feedback that progressively improves the features and quality of components. This approach to system design and composition opens up new possibilities for research that are unlike anything seen to date. One of the paths that can be pursued will look like design for * (DFX), where different objectives can be explicitly addressed as a part of system design itself.

Footnote 18 (on protocols with complex state machines, Section 10.2.1): The author believes that few protocols will fall into the category where they depend both on low level topology details and on the cardinality of elements and messages.
The other path would deal with the issues of enabling, augmenting and harnessing the power of massive distributed human computation to design and test systems on a continuous basis. The perpetual beta label that many of the recent Internet based services use is an indication of the use of the same.

10.3: Chapter 10 summary

In this chapter, some of the contributions of this work as well as the important insights obtained from this effort are discussed. Various areas of COMPRESS that would benefit from additional research have been elaborated. Some of the fundamental extensions required to add new capabilities to perform topology generation on newer network architectures are discussed. Lastly, the problem of software/application testing is discussed in the context of the new capabilities enabled by the open source movement as well as the emergence of the Internet as a global low cost communication platform.

References

[alur97] Rajeev Alur, Robert K. Brayton, Thomas A. Henzinger, Shaz Qadeer, and Sriram K. Rajamani, "Partial-order reduction in symbolic state-space exploration", Proceedings of the Ninth International Conference on Computer-Aided Verification (CAV), Lecture Notes in Computer Science 1254, Springer-Verlag, 1997, pp. 340-351.
[anag02] K. G. Anagnostakis, M. B. Greenwald, and R. S. Ryger, "On the sensitivity of network simulation to topology," in Modeling, Analysis and Simulation of Computer and Telecommunications Systems, 2002. MASCOTS 2002. Proceedings. 10th IEEE International Symposium on, 2002, pp. 117-126.
[baif04] Fan Bai, Ganesha Bhaskara, Ahmed Helmy, "Building the Blocks of Protocol Design and Analysis Challenges and Lessons Learned from Case Studies on Mobile Ad hoc Routing and Micro-Mobility Protocols", ACM SIGCOMM Computer Communication Review (CCR), Vol. 34 No. 3 - July 2004.
[bhas03] G. Bhaskara, A. Helmy, S.
Gupta, "Micro Mobility Protocol Design and Evaluation: A Parameterized Building Block Approach", IEEE Vehicular Technology Conference (VTC), October 2003.
[bing04] Jesse Bingham, "A New Approach to Upward Closed Set Backward Reachability Analysis", 6th International Workshop on Verification of Infinite-State Systems (INFINITY), September 2004.
[burc92] J.R. Burch, E.M. Clarke, K.L. McMillan, D.L. Dill, and L.J. Hwang, "Symbolic model checking: 10^20 states and beyond", Information and Computation, 98(2):142-170, June 1992.
[chel09] David Chelimsky, Dave Astels, Bryan Helmkamp, and Dan North, "The RSpec Book: Behaviour Driven Development with RSpec, Cucumber, and Friends", Pragmatic Programmers, Aug. 2009.
[clar00] Edmund M. Clarke, Orna Grumberg and Doron A. Peled, "Model Checking", MIT Press, January 2000, ISBN 0-262-03270-8.
[diet02] Dietrich, F., and Hubaux, J.-P., "Formal methods for communication services: meeting the industry expectations," pp. 99-120, Vol. 38, No. 1, Computer Networks, Jan. 2002.
[ebra05] S. Ebrahimi-Taghizadeh, Ahmed Helmy, S. Gupta, "TCP vs. TCP: a systematic study of adverse impact of short-lived TCP flows on long-lived TCP flows", INFOCOM 2005: 926-937.
[ecri06] http://www.ietf.org/html.charters/ecrit-charter.html
[emer00] E. Allen Emerson, Vineet Kahlon, "Reducing Model Checking of the Many to the Few", 17th International Conference on Automated Deduction (CADE 2000), pg 236-254, Pittsburgh, PA, USA, June 17-20, 2000.
[emer05] E. Allen Emerson, Thomas Wahl, "Efficient Reduction Techniques for Systems with Many Components", Symposium on Formal Methods (SBMF), Recife/Brazil, 2004.
[fink01] A. Finkel and Ph. Schnoebelen, "Well-structured transition systems everywhere", Theory of Computer Science, Vol 256 Num 1-2, 2001, issn 0304-3975, pg 63-92, Elsevier Science Publishers Ltd. Essex, UK.
[guna07] R. Gunasundari and S.
Shanmugavel, "Influence of Topology on Micro Mobility Protocols for Wireless Networks", Information Technology Journal, Volume: 6, Issue: 7, pp.: 966-977, 2007.
[heid01] J. Heidemann, K. Mills, and S. Kumar, "Expanding confidence in network simulations," Network, IEEE, vol. 15, no. 5, pp. 58-63, 2001.
[helm00] A. Helmy, D. Estrin, S. Gupta, "Systematic Testing of Multicast Routing Protocols: Analysis of Forward and Backward Search Techniques", The 9th International Conference on Computer Communications and Networks (IEEE ICCCN 2000), pp. 590-597, October 2000.
[helm04] A. Helmy, M. Jaseemuddin, G. Bhaskara, "Multicast-based Mobility: A Novel Architecture for Efficient Micro-Mobility", ALL-IP WIRELESS NETWORKS, IEEE Journal on Selected Areas in Communications (JSAC), 1st Quarter 2004.
[helm04-1] A. Helmy, S. Gupta, D. Estrin, "The STRESS Method for Boundary-point Performance Analysis of End-to-end Multicast Timer-Suppression Mechanisms", IEEE/ACM Transactions on Networking (ToN), Vol. 12, No. 1, pp. 44-58, February 2004.
[henz00] Thomas A. Henzinger and Rupak Majumdar, "A classification of symbolic transition systems", Proceedings of the 17th International Conference on Theoretical Aspects of Computer Science (STACS), Lecture Notes in Computer Science 1770, Springer-Verlag, 2000, pp. 13-34.
[holz97] G. J. Holzmann, "The model checker SPIN", IEEE Trans. on Softw. Eng., 23(5):279-295, May 1997.
[imme05] Neil Immerman, "Guest Column: Progress in Descriptive Complexity", SIGACT NEWS 36(4) (2005), p. 24-35.
[isiu07] Information Science Institute, University of Southern California, University of California-Berkeley, Xerox PARC, Network Simulator, ns-2 Home Page http://www.isi.edu/nsnam/ns .
[isma05] Ismaïl Berrada, et. al., "Testing Communicating Systems: a Model, a Methodology, and a Tool", Page 111-128, Volume 3502/2005, Lecture Notes in Computer Science, ISBN 978-3-540-26054-7.
[kara00] Bengi Karacali.
"Simultaneous Reachability Analysis of Concurrent Systems", PhD thesis, Department of Computer Science, North Carolina State University, 2000.
[kuce99] A. Kucera, "On finite representations of infinite-state behaviors", Information Processing Letters, Vol 70(1), pg 23-30, 1999.
[lai02] R. Lai, "A survey of communication protocol testing", Pages: 21-46, Volume 62, Issue 1, Journal of Systems and Software archive, Elsevier Science. Inc, 2002.
[leiy02] Yu Lei and Kuo Chung Tai, "Blocking-based simultaneous reachability analysis of asynchronous message-passing programs", IEEE Int. Symposium on Software Reliability Engineering, 2002.
[loge99] X. Logean, F. Dietrich, J.-P. Hubaux, S. Grisouard, P.-A. Etique, "On Applying Formal Techniques to the Development of Hybrid Services: Challenges and Directions", pages 132-138, volume 37, IEEE Communications Magazine, 1999.
[mayr98] R. Mayr, "Decidability and Complexity of Model Checking Problems for Infinite-State Systems", PhD thesis, TU-München, 1998.
[mead03] Meadows C., "Formal Methods for Cryptographic Protocol Analysis: Emerging Issues and Trends", pp. 44-54, No. 1, Vol. 21, IEEE Journal on Selected Areas in Communication, January 2003.
[opne07] OPNET Inc., OPNET Modeler - Accelerating Network R&D, http://www.opnet.com/products/modeler/home.html, 2007.
[ozde97] Kadir Ozdemir and Hasan Ural, "Protocol validation by simultaneous reachability analysis", Computer Communications, pages 772-788, 1997.
[peng95] W. Peng, K. Makki, "On Reachability Analysis of Communicating Finite State Machines", p. 58, Fourth International Conference on Computer Communications and Networks (ICCCN '95), 1995.
[pete03-1] L. Peters, I. Moerman, B. Dhoedt and P. Demeester, "Influence of the topology on the performance of micromobility protocols", Proceedings of the workshop WiOpt'03 "Modeling and Optimization in Mobile, Ad Hoc and Wireless Networks" (Sophia Antipolis, France, March 2003) pp. 287-292.
[pete03] L. Peters, I.
Moerman, B. Dhoedt and P. Demeester, "Performance of micromobility protocols in an access network with a tree, mesh, random and ring topology", in: Proceedings of the IST Mobile & Wireless Communications Summit 2003 "Enabling a Pervasive Wireless World", (Aveiro, Portugal, June 2003) pp. 63-67.
[ping07] Pingali, V., Touch, J., "Dynamic Network Renaming using Space-Time Contexts", Poster at IEEE INFOCOM, Anchorage, AK, May 2007.
[qual07] Qualnet, Qualnet Parallel Developer Documentation, http://www.scalable-networks.com/
[rich07] L. Richardson and S. Ruby, "RESTful Web Services", ISBN-10: 0596529260, O'Reilly Media, Inc., May 2007.
[salt84] J. Saltzer, D. Reed, D. Clark, "End-to-end arguments in system design", ACM Trans. Comput. Syst. 2, 4 (Nov. 1984), 277-288.
[scho98] Hans van der Schoot and Hasan Ural, "On improving reachability analysis for verifying progress properties of networks of cfsms", In Proc. 18th Intl. Distributed Computing Systems, pages 130-137, 1998.
[stei05] Daniel H. Steinberg, Stuart Cheshire, "Zero Configuration Networking: A Definitive Guide", ISBN 13: 9780596101008, First Edition, December 2005, O'Reilly Media, Inc.
[touc06] Touch, J., Wang, Y., and Pingali, V., "A Recursive Network Architecture," ISI Technical Report ISI-TR-2006-626, October 2006.
[youn89] M. Young and R.N. Taylor, "Rethinking the Taxonomy of Fault Detection Techniques", in Proceedings of the 11th International Conference on Software Engineering, pp. 53-62, Pittsburgh, PA, May 1989.
Asset Metadata
Creator
Bhaskara, Ganesha (author)
Core Title
Topology generation for protocol testing: methodology and case studies
Contributor
Electronically uploaded by the author (provenance)
School
Andrew and Erna Viterbi School of Engineering
Degree
Doctor of Philosophy
Degree Program
Electrical Engineering
Publication Date
03/09/2010
Defense Date
11/19/2009
Publisher
University of Southern California (original), University of Southern California. Libraries (digital)
Tag
network protocol,OAI-PMH Harvest,protocol,protocol testing,systematic testing,testing,topology generation: computer network protocol
Language
English
Advisor
Gupta, Sandeep K. (committee chair), Govindan, Ramesh (committee member), Silvester, John (committee member)
Creator Email
bhaskara@usc.edu,ganesha@bhaskara.org
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-m2868
Unique identifier
UC1201005
Identifier
etd-Bhaskara-3399 (filename),usctheses-m40 (legacy collection record id),usctheses-c127-294568 (legacy record id),usctheses-m2868 (legacy record id)
Legacy Identifier
etd-Bhaskara-3399.pdf
Dmrecord
294568
Document Type
Dissertation
Rights
Bhaskara, Ganesha
Type
texts
Source
University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Repository Name
Libraries, University of Southern California
Repository Location
Los Angeles, California
Repository Email
uscdl@usc.edu
Abstract
One of the key steps in the design or refinement of an application or network protocol is testing the protocol for correctness and performance. Simulation and state space exploration are the most popular ways of testing protocols. Simulation tools like NS, OPNET, etc., allow the designer to test/evaluate a protocol's performance on fully specified instances of topologies. State space exploration tools like SPIN allow the designer to test the protocol state space on fully specified instances of topologies for the desired properties. Correctness or (target) behavior of protocols can be expressed using designer specified reachability properties and standard structural properties (invariants on states), whereas performance of protocols is often expressed as an aggregate over states of the protocol (average throughput, average packet loss, etc.). The scope of this study is limited to the correctness of control or housekeeping functions of protocols.