Deriving component‐level behavior models from scenario‐based requirements
DERIVING COMPONENT-LEVEL BEHAVIOR MODELS FROM SCENARIO-BASED REQUIREMENTS

by Ivo Krka

A Dissertation Presented to the FACULTY OF THE USC GRADUATE SCHOOL, UNIVERSITY OF SOUTHERN CALIFORNIA, In Partial Fulfillment of the Requirements for the Degree DOCTOR OF PHILOSOPHY (COMPUTER SCIENCE)

May 2014

Copyright 2014 Ivo Krka

Dedication

To my parents, for dedicating themselves to making sure I have a life of limitless opportunities.

Acknowledgments

Six years seems like a long time, and spending six years to achieve the goal of becoming a well-rounded scholar, and to finally reach the milestone of getting that last signature on the graduation form requires dedication, sacrifice, and hard work as well as a pinch of talent every now and then. This may sound like a cliché, but all that would still not suffice if it weren't for the special people around me, the advisors who guided me with their knowledge and experience, the colleagues willing to listen and discuss whatever ideas came to my mind, and family and friends to give me unconditional support without much in return. Without them, not only would a doctoral degree be almost impossible to get, it would be worth a lot less. Because of this, it is my honor to use this opportunity to thank them for their contribution in making me a successful student, researcher, and I hope a better person.

First, I want to thank my advisor, Neno Medvidović, for everything he has done for me. When I started my endeavor at USC, I had only a vague idea of what software engineering research is and how good software engineering research is done. Under Neno's guidance, after the first couple of semesters I already had several publications, and the number has steadily grown since. Working with Neno has been thrilling and inspiring, as he always kept me in awe with his insights, clever ideas, and communication skills that showed me what makes an academic "superstar".
But even more than that, Neno was always ready to meet and listen, willing to stay up all the way until 4 a.m. conference deadlines, and able to provide full trust and plenty of creative freedom so I could pursue relevant problems I found interesting. Besides being an academic advisor, Neno has been a supportive and sincere friend. I will miss watching soccer games on campus together, and am looking forward to all of our future get-togethers around the world.

During all these years, I have shared office space with the fellow members of the Software Architecture Research group. In alphabetical order, they are Daniel, Dave, Farshad, George, Jae, Josh, Reza, and Yuriy. When I arrived in the US for the first time, Daniel was waiting at the airport and let me crash at his place for the first couple of days, so I knew that my colleagues would also become my close friends. Guys, I have learned a lot from each one of you, and your excellence has pushed me to try even harder. I will always remember working on my first papers with George and Yuriy, staying to work on class assignments till early morning with Josh, and grabbing Persian food with Farshad. Our offices in SAL were a place where good ideas were conceived, where we pondered the future, and where we learned a lot about the different backgrounds and cultures we were coming from. Whether it was the daily back and forth dynamic, lunches at Parkside, grabbing coffee at the Architecture Cafe, or fun trips to ICSE and FSE, I immensely enjoyed working and socializing with you. I wish you successful careers and lives, and promise to keep in touch.

My time at USC also involved interacting with a lot of great people outside of the research group. Working at SAL, I found true friends in Isela, Lizsl, Sue, and Tip. Social gatherings with you guys often made my week.
Walking into Lizsl's office at random times to chat about anything from basketball to department gossip made trips to campus worthwhile even in those rare cases when I didn't have much to do. I would also like to thank Dr. Leana Golubchik for being another outstanding research role model as well as the Vice-Chair who supported the PhD Student Committee I was a member of, in our efforts to make the CS Department a real community. I will extend my thanks to the fellow members of that Committee, Erica, Furqan, George, et al.; doing seemingly tedious chores was easy and fun in your company.

One perk of being a doctoral student is the ability to travel. I had the pleasure of spending several months at a time at Google, Infosys in Bangalore, and University of Buenos Aires. I really want to thank my hosts Philo, Naveen and Srini, and Nicolás and Sebastián, for making these industry and research visits eye-opening and unforgettable experiences from both the professional and personal standpoint. Jae and I never thought we would nostalgically talk about our adventures in India, while I personally did not expect to fall in love with Buenos Aires. I made too many new friends in the process to name each one. Instead, I will thank the group of awesome InStep interns and the amazing members of the LaFHIS research group.

The biggest sacrifice I had to make to pursue my PhD studies was leaving my home in Croatia. I was afraid that the challenges ahead would be too great of a sacrifice for what I was leaving behind. The bond that helped me the most to overcome the challenges was the special bond I have with my family. I owe my parents, Marija and Silvestar, everything I have achieved in my life, and I try to make them proud every day. My sister Silvija and my brother Mario have been my biggest supporters. It has been gratifying to see Silvija graduate from college and Mario marry my now sister-in-law Marina.
And hearing my nephew Karlo's voice for the first time was one of the happiest moments in my life. Being away was extremely difficult at times, especially when my Grandpa Spiro passed away, but the difficult times made me appreciate even more the unconditional love that my Grandma Slavka, Uncle Jure, Aunt Mirjana, and other family have for me.

After going through the process, I suppose I have the authority to say that the thrills of getting a PhD are sometimes canceled out with stress, so I must thank the many friends who kept me sane during hard moments and celebrated with me during good ones. Julie, I cannot thank you enough for providing your love and support for several years. Nick and Nakul, you made Downtown a fun place to live, and LAAC my favorite place to play basketball, relax, and make many new friends. Noelle, thank you for providing the support and encouragement at the end of my studies as those "last couple of yards" are always the toughest. Fei, Marija, Milan, and Srdan, thank you for making L.A. a place I will always look to visit again. My L.A. Croats, Dubravka, Ivan, Jelena, Melina, Slobodan, Tonci, and Vedran, meals at Aroma or just grabbing coffee with you made me feel like I was home and for that I will always be grateful. And, finally, I thank my many friends in Croatia for always being excited for my visits and their efforts to maintain and grow our bonds: Ante, Anton, Damir, Hrvoje, Ivan, Igor, Jakša, Leo, Marin B., Marin S., Petar, Tomo, and Veljko.

At the end, I want to say Thank you! once more to every person who was there for me, and apologize to those I failed to mention. And, looking back, these six years actually feel like nothing more than a blink of an eye. I will be forever grateful for the cherished memories, and will do my best to be as good of a friend as you have been to me.
Table of Contents

Dedication
Acknowledgments
List of Figures
Abstract
Chapter 1 Introduction
1.1 Problem Space
1.2 Insights and Hypotheses
1.3 Solution Space
1.4 Dissertation Structure
Chapter 2 Background
2.1 Scenario Specifications
2.2 System Properties
2.3 Transition Systems
2.4 Running Examples
Chapter 3 Heuristic MTS Synthesis
3.1 Phase 1: Component Constraint Generation
3.2 Phase 2: Initial MTS Generation
3.3 Phase 3: Sequence Diagram Annotation
3.4 Phase 4: Final MTS Generation
3.5 Discovering Design Flaws
Chapter 4 Component-Aware Triggered Scenarios
4.1 caTS Syntax
4.2 caTS Semantics
4.3 From caTS to Component MTS Models
Chapter 5 Refinement Distribution Framework
5.1 Refinement Types
5.2 Distributing Transition Refinement
5.3 Distributing Execution-Tracking State Refinement
5.4 Distributing Fluent-Based State Cloning
Chapter 6 Trace-Enhanced MTS Inference
6.1 Preliminaries
6.2 Phase I: Synthesis of the Invariant-Based MTS
6.3 Phase II: Refining the Invariant-Based MTS
6.4 Algorithm Extensions
Chapter 7 Evaluation
7.1 Hypothesis 1: Heuristic MTS Synthesis
7.2 Hypothesis 2: Component-Aware Triggered Scenarios
7.3 Hypothesis 3: Refinement Distribution Framework
7.4 Hypothesis 4: Trace-Enhanced MTS Inference
Chapter 8 Related Work
8.1 Scenario Specifications
8.2 Partial Behavior Models
8.3 Behavior Model Synthesis
8.4 Behavior Model Decomposition
8.5 Specification Mining
Chapter 9 Concluding Remarks
References

List of Figures

1.1 High-level view of the problem space addressed in this dissertation.
2.1 Elements of a sequence chart.
2.2 The rules for the parallel composition operator.
2.3 Web cache specification.
2.4 A uTS [84] for Customer Banking.
2.5 An eTS [84] for Customer Banking.
2.6 Specification effort beyond AccountVerification.
2.7 Behavior specification of Coffee Machine system.
2.8 The requirements elicited for Coffee Machine system.
2.9 The refined MTSs of Coffee Machine system.
2.10 Five example StackAr invocation traces.
2.11 A subset of Daikon's program invariants on StackAr.
3.1 MTS synthesis algorithm phases.
3.2 Algorithm for deriving component-level constraints.
3.3 Client's component-level constraints.
3.4 Algorithm for generating an initial MTS.
3.5 Cache's initial MTS labeled with component states.
3.6 The initial SD annotation steps.
3.7 Annotating Web Cache scenario from Cache's perspective.
3.8 The SD value propagation steps.
3.9 Final MTS generation phase.
3.10 Steps from the initial MTS to the final MTS for Cache.
3.11 Example scenarios with specification discrepancies.
4.1 caTS-specific scenario constructs.
4.2 AccountVerification as a caTS scenario.
4.3 RejectionBranch as a caTS scenario.
4.4 Example LTS models for Customer Banking components.
4.5 Initial preconditions for Customer Banking system.
4.6 Component-level MTS synthesis.
4.7 ATM's MTS for AccountVerification.
5.1 Handling a transition refinement in a system MTS.
5.2 A negative scenario incurred by refining CoffeeMachine.
5.3 Handling execution-tracking state refinement in a system MTS.
5.4 Part of the Transaction-tracking refined CoffeeMachine.
6.1 Construction of an invariant-based MTS.
6.2 The invariant-based StackAr MTS.
6.3 Refining the invariant-based MTS according to the traces.
6.4 The StackAr MTS refined with invocation traces.
6.5 A subset of StackAr transition invariants.
7.1 Philips TV scenarios: (top) original models; (bottom) caTS models.
7.2 Tuner1's MTS obtained from TuneV2.
7.3 Characteristics of the generated specifications.
7.4 Evaluation results for the different triggered-scenario languages.
7.5 The scenarios from the Philips TV case study [84].
7.6 Eight applications that exercise the evaluated libraries.
7.7 Precision (P) and recall (R) comparison of the competing techniques.
7.8 Contractor models with and without TEMI-specific invariant filters.
7.9 Comparison of TEMI and Contractor ? algorithms in a noisy environment.
8.1 An MSD [47] for Customer Banking.
8.2 A merged trace obtained using Lorenzoli et al.'s algorithm [67].

Abstract

Use-case scenarios, with notations such as UML sequence diagrams, are widely used to specify the desired behaviors of a software system. Scenarios are often complemented with formalized system properties (e.g., event invariants). These intuitive requirements notations only partially specify the system-to-be by prohibiting or requiring certain behaviors, while leaving all other behaviors uncategorized. During early stages of a system's life cycle, engineers iteratively specify the scenario-based requirements by elaborating existing scenarios and eliciting new ones. In parallel, engineers design the system's software architecture, consisting of multiple independently running components, which should be consistent with and satisfy the elicited requirements.

Although intuitive, the existing requirements notations allow engineers to specify behaviors with unintended semantic side-effects. In particular, the current practices support reasoning about and specification of behaviors exclusively at the system level, in contrast to the fact that a system consists of interacting components. This runs the risk of arriving at an inconsistent requirements specification (i.e., one that is not realizable as a composition of the system's components), which can prove costly if left unresolved. Furthermore, the lack of a direct mapping from requirements to a specification of components' behaviors duplicates the specification effort, as the same behaviors need to be specified both as a part of requirements and architecture specifications.
This also hampers the traceability that should ideally exist from requirements to the eventual implementation.

To address the shortcomings of the current practices, this dissertation implements three strategies to enable transitioning from a scenario-based requirements specification to a set of component-level behavior models: (1) heuristically creating component MTSs from system-level scenario-based specifications, (2) enhancing the way scenarios are specified, and (3) mapping the refinements performed on a system MTS to refinements to be performed on component-level MTSs. The component models are specified as modal transition systems (MTS), a partial-behavior modeling formalism that accurately captures the required, prohibited, and undefined behaviors of the system components. The implementations of the three strategies are intended for different development contexts and work with different inputs:

1. A heuristic algorithm that synthesizes a set of component MTSs from a set of existential scenarios and event invariants.
2. Component-aware Triggered Scenarios (caTS), a triggered-scenario language that enables expressing reactive behaviors of system components.
3. A framework that, given a system MTS refinement based on a new requirement, propagates that refinement to a set of component MTSs.

The MTSs produced using these techniques can be used for automated analyses (e.g., requirements consistency checking) and requirements elicitation, while ensuring traceability and consistency between the requirements and architecture specifications. To assist traceability and consistency checking between the system specifications and the eventual system implementation, this dissertation proposes the Trace-Enhanced MTS Inference (TEMI) algorithm, which extracts component MTSs from the observed system executions.

The proposed techniques have been theoretically evaluated to analyze their complexity, as well as to establish their correctness and completeness.
The techniques have been applied to a number of real-world and automatically generated case studies. The results suggest that the generated MTSs accurately capture those component implementations that (1) necessarily provide the behavior required by the scenarios, (2) restrict behavior forbidden by the requirements specification, and (3) leave the behavior that is neither explicitly required nor forbidden as undefined. Furthermore, the proposed techniques help to detect potential specification flaws as they are specified, correct the existing errors, and prevent future inconsistencies. The techniques also scale to larger system specifications than the prior state-of-the-art in terms of the running times required to generate component MTSs and the specification effort required to specify the desired behaviors. Finally, the performed evaluations confirm that the TEMI algorithm produces models of significantly higher quality than the state-of-the-art in dynamic model inference.

Chapter 1 Introduction

A prerequisite for the success of a software system is that the system satisfies the requirements of its stakeholders. One of the postulates of software engineering, proven time and time again, is that a requirement omission or a flawed design decision is orders of magnitude cheaper to fix if caught early in a system's life cycle (e.g., before the system's implementation) than later in the life cycle (e.g., after the system is implemented) [11, 99]. To improve the quality and adoption of a software system, the requirements should be collected during early stages of a system's life cycle and well before the system's implementation [97]. Similarly, software architecture, the set of a system's principal design decisions [88], should be devised to provide appropriate abstractions and design idioms that help to effectively deal with the increasing complexity of modern software.
A system's architecture typically defines the system's software components, the components' behavior, and the interactions between components.

One of the daunting tasks faced by software engineers is ensuring that the system design is consistent with the system requirements. The modern software development practices that specify the system's requirements and architecture iteratively and in parallel (e.g., under the Twin Peaks process [75]) make this task even more challenging. In support of this task, the recommended requirements engineering practice is to collect, specify, and analyze software requirements iteratively and incrementally using precise yet straightforward and intuitive notations [97].

Use-case scenarios (e.g., UML sequence diagrams [76]) and system properties (e.g., event invariants and system goals [97]), which this dissertation collectively refers to as scenario-based specifications, have become a widespread means of capturing software requirements across a multitude of development organizations, application domains, and problem types, sizes, and complexities. The scenario-based specifications provide only a partial description of a system-to-be by requiring or prohibiting particular behaviors, which allows an engineer to focus on particularly interesting aspects of a system's behavior and to defer other decisions. For example, a scenario describes a single desired sequence of event exchanges between system components, whereas a safety goal proscribes some behavior as illegal. The component-oriented viewpoint found in scenarios has the potential to directly tie the system requirements (i.e., the desired high-level use-cases) with the system architecture (i.e., the required component behaviors).
Hence, scenario-based specifications can support the eventual aims of the iterative requirements specification and architecture design: (1) to arrive at a correct, comprehensive, and expressive set of system requirements, and (2) to determine and accurately specify how the system's components must behave to achieve those requirements.

This dissertation's focus will be on appropriately capturing those requirements that define the desired interactions between the system components, and, in turn, mapping those requirements to behavioral descriptions of the individual system components. The remainder of this chapter outlines the existing obstacles to moving from a requirements specification to a component-level behavior specification, lists the hypotheses tested in this dissertation, outlines the proposed techniques, and provides the roadmap for the remainder of the dissertation.

1.1 Problem Space

Over the past two decades, researchers and practitioners have contributed to different areas of the target problem space depicted in Figure 1.1. The existing work sets the foundations for specifying requirements using scenarios (Requirements Specification in Figure 1.1), synthesizing component-level or system-level models from scenarios (Model Synthesis), and specifying and maintaining consistency between component-level and system-level models (Behavior Specification). Furthermore, the eventual system implementation should be consistent with the produced architecture design models (i.e., the Behavior Specification): this comparison depends on the availability of models that accurately capture the implementation-level behaviors. However, the existing techniques within each area of the problem space suffer from shared limitations, which are discussed below. This dissertation aims to resolve these limitations.

1.1.1 Overview of Existing Solutions

The existing spectrum of scenario notations supports requirements of different strengths.
In particular, the different scenario notations allow capturing weak existential statements (e.g., Statement 1: "a system use case has ATM receiving the confirmation and then returning the card") as well as strong universal assertions (e.g., Statement 2: "ATM shall return the card when the transaction is confirmed").

[Figure 1.1: High-level view of the problem space addressed in this dissertation. Panels: Requirements Specification, Model Synthesis, Behavior Specification.]

While UML sequence diagrams interpret scenarios primarily as existential statements, researchers have defined several triggered-scenario specification languages [27, 47, 48, 82, 84, 85] with the purpose of more strongly expressing a system's reactive behaviors. An event in a triggered scenario has an associated modality: existential, branching, or universal. The universal modality (e.g., Statement 2) defines the only allowed system response (card return) at a specific point in the scenario's execution (following a confirmation). The branching modality requires some behavior to be enabled at a given point, while allowing other possible behaviors. For example, Statement 3: "Bank shall be able to reject a transaction whenever it receives a request" ascertains the mandatory existence of a rejection behavior branch, while not prohibiting other behaviors (e.g., Bank's approval). Note that the selection of the notation to be used depends on factors such as the development context. For example, development with rapid prototyping may require only existential scenarios. By contrast, experienced engineers on safety-critical projects are more likely to leverage the full semantic power and analysis capabilities of triggered-scenario notations.
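The difference between the three modalities can be illustrated with a small sketch. It is a deliberate simplification rather than the semantics of any of the cited triggered-scenario languages: it assumes a deterministic behavior model, a single trigger event, and a single response event, and all names (`Modality`, `satisfies`, the `atm` and `bank` models and their event labels) are hypothetical.

```python
from enum import Enum
from typing import Dict

# Hypothetical behavior model: state -> {event: next state} (deterministic).
Model = Dict[str, Dict[str, str]]


class Modality(Enum):
    EXISTENTIAL = "existential"
    BRANCHING = "branching"
    UNIVERSAL = "universal"


def reachable(model: Model, start: str) -> set:
    """Collect every state reachable from `start`."""
    seen, stack = {start}, [start]
    while stack:
        for nxt in model.get(stack.pop(), {}).values():
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def satisfies(model: Model, start: str, trigger: str,
              response: str, modality: Modality) -> bool:
    """Check a one-step 'trigger, then response' statement against a model."""
    # Events enabled immediately after each reachable occurrence of the trigger.
    enabled = [
        set(model.get(model[s][trigger], {}))
        for s in reachable(model, start)
        if trigger in model.get(s, {})
    ]
    if modality is Modality.EXISTENTIAL:
        # Some execution shows the response right after the trigger.
        return any(response in events for events in enabled)
    if modality is Modality.BRANCHING:
        # The response is always enabled, possibly next to other events.
        return all(response in events for events in enabled)
    # UNIVERSAL: the response is the only event allowed after the trigger.
    return all(events == {response} for events in enabled)


# Statement 2 (assumed encoding): ATM returns the card after a confirmation.
atm: Model = {"idle": {"confirm": "confirmed"},
              "confirmed": {"returnCard": "idle"}}
# Statement 3 (assumed encoding): Bank may reject or approve a request.
bank: Model = {"waiting": {"request": "deciding"},
               "deciding": {"reject": "waiting", "approve": "waiting"}}
```

In this sketch the `atm` model satisfies the universal reading of Statement 2, while the `bank` model satisfies the branching reading of Statement 3 but not a universal one, because approval remains possible after a request.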
To avoid errors and to support requirements elicitation, the partial scenario-based requirements should be supported with techniques that help (1) to detect inconsistent requirements, (2) to validate and verify the system behaviors, and (3) to explore the currently underspecified behaviors. Similarly, to assist architecture design, the requirements should ideally map to behavior descriptions of the individual, mutually interacting system components [88]. To this end, numerous techniques leverage scenario-based specifications to synthesize behavior models of the system components or a single system-level model (e.g., [26, 31, 44, 96, 104]). These techniques synthesize machine-based models such as labeled transition systems (LTS) [68] and UML statecharts [76], which are used in Architecture Description Languages [73] to specify behavior viewpoints (e.g., a Darwin ADL specification includes component LTSs [68]). Such notations can specify, at a chosen level of abstraction, exactly those behaviors that a component is required to exhibit (meaning that all other behaviors are prohibited). However, this overlooks the inherent partiality of the input specifications: the above synthesis techniques return only one of the many possible models that satisfy the input requirements [94].

To capture the inherent partiality of the input requirements, researchers have proposed synthesizing partial-behavior models, such as modal transition systems (MTS) [61], from scenario-based specifications [39, 40, 85, 91]. MTS is a formalism that distinguishes between required, maybe, and prohibited state transitions as entailed by the requirements. Intuitively, a desired scenario sequence is represented in an MTS using required transitions, while MTS's maybe transitions should not violate the event invariants. A set of component MTSs can be composed into an MTS that captures the overall system behavior [61].
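As a rough illustration of the formalism, the sketch below represents an MTS by its required and maybe transitions (anything absent is prohibited) and composes two MTSs. It assumes the commonly used composition rule in which non-shared events interleave with their modality unchanged and a shared event is required in the composition only when both components require it; this chunk does not show the dissertation's own composition rules (Figure 2.2), so the rule, the `MTS` class, and the `bank`/`atm` examples are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, Hashable, Tuple

Transition = Tuple[Hashable, str, Hashable]  # (source, event, target)


@dataclass(frozen=True)
class MTS:
    alphabet: FrozenSet[str]
    initial: Hashable
    required: FrozenSet[Transition]  # every implementation must provide these
    maybe: FrozenSet[Transition]     # allowed, but not (yet) required

    def outgoing(self, state):
        """Yield (event, target, is_required) for transitions leaving `state`."""
        for src, event, dst in self.required:
            if src == state:
                yield event, dst, True
        for src, event, dst in self.maybe:
            if src == state:
                yield event, dst, False


def compose(m: MTS, n: MTS) -> MTS:
    """Parallel composition over the reachable pairs of component states."""
    shared = m.alphabet & n.alphabet
    init = (m.initial, n.initial)
    required, maybe = set(), set()
    seen, frontier = {init}, [init]
    while frontier:
        s, t = frontier.pop()
        m_out, n_out = list(m.outgoing(s)), list(n.outgoing(t))
        # Non-shared events interleave and keep their modality.
        moves = [(e, (d, t), req) for e, d, req in m_out if e not in shared]
        moves += [(e, (s, d), req) for e, d, req in n_out if e not in shared]
        # Shared events synchronize; required only if required in both.
        moves += [(e1, (d1, d2), r1 and r2)
                  for e1, d1, r1 in m_out
                  for e2, d2, r2 in n_out
                  if e1 == e2 and e1 in shared]
        for event, dst, req in moves:
            (required if req else maybe).add(((s, t), event, dst))
            if dst not in seen:
                seen.add(dst)
                frontier.append(dst)
    return MTS(m.alphabet | n.alphabet, init,
               frozenset(required), frozenset(maybe))


# Hypothetical components: Bank may (but need not) confirm a request,
# whereas ATM requires the confirmation before returning the card.
bank = MTS(frozenset({"request", "confirm"}), "b0",
           frozenset({("b0", "request", "b1")}),
           frozenset({("b1", "confirm", "b0")}))
atm = MTS(frozenset({"request", "confirm", "returnCard"}), "a0",
          frozenset({("a0", "request", "a1"), ("a1", "confirm", "a2"),
                     ("a2", "returnCard", "a0")}),
          frozenset())
system = compose(bank, atm)
```

Here the composed `confirm` transition is only a maybe transition, because Bank has not committed to confirming; the system states are pairs of component states.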
Once the current requirements are captured with MTSs, the MTSs can be validated, verified, and used to elicit additional requirements [85, 91]. In this context, the addition of a newly elicited requirement refines the underlying MTSs by either upgrading some maybe behaviors to required or prohibiting them in the synthesized model.

In addition to the analysis of the requirements and architecture, the component behavior models can be used to check the consistency between the system's design and the eventual implementation. The consistency analysis requires implementation-level models that can be compared with the synthesized design models. To this end, a number of model synthesis algorithms have been designed to extract a behavior specification for an implemented system based on the observed system executions [7, 8, 19, 41, 66, 80, 100].

1.1.2 Limitations of Existing Solutions

The desired seamless mapping of a scenario-based specification to component-level partial-behavior models is not currently possible. This stems from the fact that the existing scenario notations and synthesis algorithms treat a system as a single entity, as opposed to treating it as a composition of independently running components. The specific limitations, detailed below, risk creating an inconsistent specification that cannot be realized as a composition of the given system's components [86, 96]. Discovering and fixing such problems late in a system's life cycle can incur significant costs due to the required redevelopment (e.g., rearchitecting the system to avoid undesired implied scenarios [96]).

Triggered-scenario languages can express stronger requirements than is possible with basic (existential) scenarios. However, the existing triggered-scenario languages are limited in their expressiveness and semantics.
The expressive limitations include (1) unnecessary constraints on the use of different modalities in a scenario (e.g., existential modality must be followed by universal [27]), and (2) the inability to specify an event that is universal only for the components sharing that event. These limitations dictate elaborating the initially elicited existential scenarios into a greater number of shorter, thus potentially less intuitive, and frequently overlapping scenarios of restricted modality. The semantic limitations of the existing languages stem from the monolithic, system-level interpretation, which runs the risk of unforeseen emergent behaviors once the system is implemented as a set of components [96]. The existing notations may in principle be used to capture individual components' behaviors. However, this would require specifying separate "system-level" scenarios for each component, thus duplicating significant modeling effort.

The semantic limitations of triggered-scenario languages extend to model-synthesis algorithms. Researchers have explored directly synthesizing component models from existing languages, but this has been shown to be undecidable [13], while heuristic solutions [12, 46] risk misinterpreting or overlooking the true stakeholder intent. The algorithms that synthesize a system-level MTS from the input requirements [39, 40, 85, 91] may produce models with behaviors that are unachievable in a composition of independently running components. Furthermore, the direct synthesis of a system MTS may become impractical for complex systems due to the resulting model sizes.

Sibay et al. [86] aim to resolve the problems related to synthesizing only a system MTS by decomposing a system MTS into a set of component MTSs. However, their work proves that a system MTS need not be distributable (i.e., transformable into a set of component MTSs whose composition is exactly the system MTS). Sibay et al.
identify a set of conditions for a system MTS to be distributable, and outline a distribution algorithm for such distributable MTSs. However, even if a system MTS obtained at some point of the specification process is distributable, any new requirement may change that. In turn, this would force an engineer to either perform costly manual changes to make the system MTS distributable or to keep working with an inconsistent specification.

Finally, even if one assumes that a set of correct component-behavior models is eventually obtained, these models still cannot be used to check the consistency of an implementation. This is because the existing techniques that synthesize behavior models from observed executions suffer from inaccuracies [64, 78, 79]. Recent research results [22, 66, 67] provide preliminary evidence that some of these inaccuracies may be circumvented by combining the information about the observed event sequences with the information about a component's internal state. The previous efforts, however, have combined this information only in a limited manner. For example, the state-of-the-art technique [67] constructs FSM-based protocol descriptions by combining (1) FSM inference from observed executions and (2) invariants over the library's internal state, but considers only limited subsets of executions, thus still suffering from similar inaccuracies.

1.2 Insights and Hypotheses

To address the limitations of the existing solutions described in Section 1.1, this dissertation proposes a suite of techniques that test the following hypotheses. Each hypothesis is associated with a subsequently proposed technique and relies on a set of insights.

1.2.1 Hypothesis 1

Insight: Existential scenarios do not convey the system and component states from which the specified sequence is required to execute. To infer those states, the potentially available specification of the system event invariants can be utilized.
Assumption: A component will be able to execute its respective part of a scenario starting from a given component state if and only if, starting from that state, the scenario execution does not violate the event invariants.

Hypothesis A: A technique that synthesizes component MTSs can be devised such that each synthesized MTS (1) allows all behaviors that follow the component's event invariants, (2) excludes all behaviors that violate the component's event invariants, and (3) contains only those required transitions that can be mapped to a scenario event under the above Assumption.

Hypothesis B: The component-level synthesis will scale to specifications that cannot be handled by a similar system-level approach due to the size of the system being specified.

1.2.2 Hypothesis 2

Insight: The existing triggered-scenario languages do not account for the fact that two components sharing an event can have different restrictions related to that event. For example, only one of the components may be prohibited from executing any event other than the scenario event.

Insight: The individual scenarios in a scenario-based specification often repeat similar subsequences because the existing languages lack constructs to tie the execution of a scenario to specific component states.

Hypothesis: A component-oriented triggered scenario language can be defined such that (1) it has a manageable number of new constructs compared to existing notations, (2) it prevents specification errors that occur when the existing triggered-scenario languages are used, and (3) it leads to a more compact specification with at least 50% fewer scenarios needed to capture the requirements, compared to the existing languages.

1.2.3 Hypothesis 3

Insight: A state in a system-level MTS corresponds to a tuple of component-level MTS states, where each element in the tuple corresponds to a single state of each component.
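The tuple view of system states can be made concrete with a short sketch that splits one system-level transition into the local transitions of the components that participate in its event. This only illustrates the insight; it is not the refinement distribution framework itself, and the function name, the component alphabets, and the state labels are hypothetical.

```python
from typing import Dict, Hashable, List, Set, Tuple

SystemTransition = Tuple[Tuple[Hashable, ...], str, Tuple[Hashable, ...]]


def project(transition: SystemTransition,
            components: List[Tuple[str, Set[str]]]) -> Dict[str, tuple]:
    """Split a system-level transition into per-component transitions.

    `components` lists (name, alphabet) pairs in the same order as the
    component states appear inside each system-state tuple.
    """
    src, event, dst = transition
    local = {}
    for i, (name, alphabet) in enumerate(components):
        if event in alphabet:
            # The component takes part in the event and moves locally.
            local[name] = (src[i], event, dst[i])
        elif src[i] != dst[i]:
            # A component cannot change state on an event outside its alphabet.
            raise ValueError(f"{name} changed state on foreign event {event!r}")
    return local


# Hypothetical two-component system: states are (Bank state, ATM state).
components = [("Bank", {"request", "confirm"}),
              ("ATM", {"request", "confirm", "returnCard"})]

shared_step = project((("b1", "a1"), "confirm", ("b0", "a2")), components)
local_step = project((("b0", "a2"), "returnCard", ("b0", "a0")), components)
```

In this sketch a shared event such as `confirm` maps to one transition in each participating component, whereas `returnCard` maps to a transition of ATM only.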
Insight: A new system-level requirement modifies the previously synthesized system-level MTS by splitting the MTS states, duplicating MTS states, making previously maybe transitions required, or removing previously maybe transitions.

Hypothesis: Refinements of a system MTS based on a new requirement can be classified into a finite set of types. A technique can be devised that maps system refinements of a certain type to refinements of the same type at the component level. Such a technique will create a distributable system MTS that refines the original system MTS and restricts the system's behaviors per the new requirement.

1.2.4 Hypothesis 4

Insight: The inaccuracies present in the existing techniques for synthesizing models based on observed implementation-level executions can be circumvented by combining the observed invocation sequences and internal-state information.

Insight: The observed invocation sequences are similar to scenario sequences, while inferred method invariants [36] are similar to design-time event invariants.

Hypothesis: Synthesis from scenario-based specifications can be adapted to work with runtime artifacts in a way that yields a new technique with improved precision and recall compared to the existing implementation-level state-of-the-art synthesis techniques.

1.3 Solution Space

To test Hypotheses 1–3, this dissertation explores three strategies to enable the transition from a scenario-based requirements specification to a set of component-level MTSs: (1) enhancing the way scenarios are specified, (2) heuristically creating component MTSs from system-level scenario-based specifications, and (3) mapping the refinements performed on a system MTS to refinements to be performed on component-level MTSs. To test Hypothesis 4, the strategy to heuristically create component MTSs from scenario-based specifications has been modified in non-trivial ways to work with the observed system executions.
This dissertation proposes a suite of techniques that implement these strategies. Next, the techniques are overviewed in terms of their aims and contributions with respect to the problem space.

1.3.1 Proposed Techniques

Heuristic component MTS synthesis. Software engineers need formal, component-oriented models that capture the inherent partiality of early specifications in order to avoid inconsistencies and reason about implications of the system's decomposition to independently running components. To this end, this dissertation proposes an algorithm that derives component-level MTSs from (1) existential system-level scenarios, captured as UML sequence diagrams, and (2) system-level properties, captured as event invariants specified in Object Constraint Language (OCL), a widely used language for specifying formal properties. At a high level, the applied heuristics utilize the invariants to enhance the existential sequence diagrams with state annotations. These annotations denote the component states traversed in the scenario and are extracted from the properties.

component-aware Triggered Scenarios. To address the limitations of the existing triggered-scenario languages, this dissertation introduces a novel language, component-aware Triggered Scenarios (caTS), that enhances prior art. caTS inherits the concepts found in existing languages, with several added constructs and semantics defined at the component level. These additions allow an engineer to specify component-specific behavior restrictions and obligations that would otherwise remain underspecified or require significant added effort (e.g., specifying temporal properties outside of scenarios). For example, consider the statement "Statement 4: Proxy must react to Bank's transaction confirmation by forwarding it to ATM; in turn, ATM expects to receive either a confirmation or a rejection."
Unlike caTS, existing scenario languages cannot adequately distinguish between Proxy's universal behavior and ATM's two enabled behaviors.

Refinement distribution framework. This dissertation proposes a framework that interprets the refinements performed on a system model in response to a new requirement in terms of the refinements of the constituent component models. The framework is built around refinement types that have been derived by studying how a new system-level requirement (be it a scenario, a safety goal, or an event invariant) changes a system MTS. For each refinement, the framework uses the mapping between the system model and the component models to identify the component states that need to be refined in response to refining the system states. An engineer using the framework does not need to be aware of the underlying MTS refinements: the outcome of the refinement process can be captured as a set of additional requirements that ensure specification consistency.

Trace-enhanced MTS inference. To improve behavior model inference for implemented software, this dissertation introduces trace-enhanced MTS inference (TEMI), a non-trivial enhancement of the heuristic MTS synthesis algorithm. The specific enhancements include: (1) a refinement strategy adapted to more exhaustive runtime information and (2) extensions and filters for handling dynamically inferred method invariants [36]. Intuitively, TEMI infers program invariants and uses them to produce an MTS of a library's inferred and observed executions.

1.3.2 Contributions

The high-level contribution of this dissertation is a suite of techniques that improve the mapping and consistency of scenario-based requirements to architectural behavior models, and to the eventual system implementation. As they work with different inputs, selecting the appropriate technique depends on the development context, which can involve diverse factors such as restrictions on adopting new notations and the required level of formality.
The specific individual techniques' contributions are as follows.

Heuristic MTS synthesis. The performed evaluations confirm that: (1) The component-level MTSs produced through heuristic MTS synthesis capture exactly those behaviors that do not violate the event invariants. (2) Heuristic MTS synthesis correctly captures the input scenarios. (3) Heuristic MTS synthesis is an algorithm that can be used to assist reasoning about and elaboration of the system requirements. (4) The generated artifacts help to expose potential requirements and design flaws, including misspecified scenarios and overly restrictive invariants. (5) The synthesis algorithm scales well to very large system specifications.

component-aware Triggered Scenarios. This dissertation formally defines the semantics of caTS, and operationalizes them with a synthesis algorithm that constructs a set of component-level MTSs from a caTS. caTS has been applied on existing requirements specifications and quantitatively evaluated on generated specifications. The performed evaluations reaffirmed the following contributions: (1) caTS is a specification language that correctly captures the requirements while avoiding inconsistencies present in the existing approaches. (2) The synthesis algorithm that generates a set of component MTSs from caTS is correct and complete. (3) caTS leads to a more concise specification, and thus saves modeling effort, when expressing behaviors that were previously specified ambiguously and across a significantly larger number of scenarios.

Refinement distribution framework. The framework has been formally analyzed and applied on a case study found in related literature [84, 89]. The main contributions of the framework, supported by the collected results, are: (1) A characterization of the fundamental steps performed when a new requirement is added to a synthesized system MTS.
(2) A sound and correct way of propagating the system MTS refinements to component MTSs that finally produces a system MTS that is distributable and captures the requirement driving the process. (3) Reduction of inconsistencies that stem from the monolithic system viewpoint when using the existing notations.

Trace-enhanced MTS inference. TEMI has been evaluated on nine open-source libraries with the following contributions: (1) TEMI significantly improves the recall of the inferred models, while maintaining (or, in rare cases, minimally reducing) the already high precision of state-of-the-art techniques. (2) TEMI is applicable to little-used and poorly-known libraries. (3) TEMI is insensitive to potential noise in the algorithm's inputs that are derived from incomplete runtime information.

1.4 Dissertation Structure

The remainder of this dissertation is structured as follows. Chapter 2 provides the background information and terminology necessary to understand the proposed techniques. Chapters 3–6 present the four proposed techniques outlined in Section 1.3. Chapter 7 presents the evaluation results in support of the dissertation hypotheses from Section 1.2. Chapter 8 provides details on the related work. Finally, Chapter 9 concludes with a summary of the contributions and an outline of future work.

Chapter 2 Background

This dissertation relies on terminology and concepts that are similar to those used in related work. This chapter defines scenarios (Section 2.1) and formal properties typically used to augment scenarios (Section 2.2). The chapter also defines modal transition systems that are used to capture the partial behavior of software components (Section 2.3). Finally, the chapter introduces a set of examples that are used to illustrate the proposed techniques in the remainder of the dissertation (Section 2.4).

2.1 Scenario Specifications

A system's use-cases are often specified with sequence charts.
A basic sequence chart consists of vertical lifelines that represent component instances, and labeled arrows between the lifelines that represent interaction events. This representation is shared by the widely used notations, Message Sequence Charts [49] and UML sequence diagrams [76]. An example scenario, depicted in Figure 2.1, contains annotations of the scenario building elements. The locations between adjacent events along a component's lifeline define that component's scenario execution steps. In the simple banking scenario from Figure 2.1, Proxy first sends a verification request verifyWithBank to which Bank responds with an event passIncorrect. The events on a single lifeline are totally ordered, while the overall scenario sequence is partially ordered. A component instance in a basic sequence chart is defined similarly to Uchitel et al. [95].

[Figure 2.1: Elements of a sequence chart. Annotated elements: component instance, lifeline, event sending, event reception, scenario location. Lifelines: Proxy, Bank; events: verifyWithBank, passIncorrect.]

Definition 1 (Component Instance). A component instance $CI$ in a scenario is a tuple $(L, \Sigma, \leq, \lambda)$ where: $L$ is a finite set of locations, $\Sigma$ is the event alphabet, $\leq\ \subseteq L \times L$ is a total order of locations, where for each two locations $l$ and $l'$ either $l \leq l'$ or $l' \leq l$; $suc(l) = l''$ denotes the location immediately following $l$, and $\lambda : L \to \Sigma$ is a labeling function that defines the event following a location.

Definition 2 (Sequence Chart). A sequence chart $Scen$ is a set of component instances such that for every $CI_i, CI_j \in Scen$, if $CI_i$ and $CI_j$ share events $e_1$ and $e_2$, where $e_1 = CI_i.\lambda(l_i) = CI_j.\lambda(l_j)$ and $e_2 = CI_i.\lambda(l'_i) = CI_j.\lambda(l'_j)$, then $CI_i.l_i \leq CI_i.l'_i$ if and only if $CI_j.l_j \leq CI_j.l'_j$.

The condition specified in Definition 2 ensures that the components sharing multiple events have matching orderings of those events.
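The ordering condition of Definition 2 can be illustrated with a small Python sketch. It is not part of the dissertation's tooling; it assumes each lifeline is given as a plain list of event labels in lifeline order, and it checks a slightly stronger condition (equal projections onto the shared events), which coincides with Definition 2 when no event label repeats on a lifeline. The names `projection` and `is_consistent_chart` are hypothetical.

```python
from itertools import combinations


def projection(lifeline, shared):
    """Keep only the events of a lifeline that belong to the shared set."""
    return [event for event in lifeline if event in shared]


def is_consistent_chart(chart):
    """Check that every pair of lifelines orders its shared events identically.

    chart: dict mapping a component name to the list of event labels
    along that component's lifeline (an assumed encoding).
    """
    for (_, first), (_, second) in combinations(chart.items(), 2):
        shared = set(first) & set(second)
        if projection(first, shared) != projection(second, shared):
            return False
    return True


# The banking scenario of Figure 2.1: both lifelines agree on the order.
banking = {
    "Proxy": ["verifyWithBank", "passIncorrect"],
    "Bank": ["verifyWithBank", "passIncorrect"],
}

# A malformed chart in which Bank sees the two shared events in reverse order.
malformed = {
    "Proxy": ["verifyWithBank", "passIncorrect"],
    "Bank": ["passIncorrect", "verifyWithBank"],
}
```

Comparing projections rather than individual location pairs keeps the check linear in the lifeline length for each pair of components.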
Note that, consistent with the existing work in the area, this dissertation uses the term scenario sequence to refer to sequences of events $\lambda(l)$ associated with their respective locations $l$.

In its basic form, a sequence chart just illustrates one event ordering, leaving the relation to a system's overall behavior implicit. Several ways of describing how a scenario relates to an overall system execution have been proposed. One approach is to define the full system behavior as an ordered sequence of basic sequence charts (e.g., high-level Message Sequence Charts [95]). However, specifying the full system behavior at once is seldom feasible under iterative development practices. This dissertation studies triggers as a way to iteratively and incrementally express the legal system executions. The triggered scenario concepts can be conveniently described on the example of Sibay's existential Triggered Scenarios (eTS) and universal Triggered Scenarios (uTS) [84]; eTS and uTS are similar to other popular triggered-scenario languages such as Harel's Live Sequence Charts (LSC) [48].

Definition 3 (Triggered Scenarios). A Triggered Scenario Scen is a pair of basic sequence charts (Pre, Main) where cond is a fluent propositional logic condition, Pre is the prechart, and Main is the main chart. A scenario's modality can be branching (eTS) or universal (uTS).

A Triggered Scenario consists of a prechart and a main chart. A prechart specifies an event sequence whose execution triggers the main chart; a prechart can also have a precondition defined on fluents. A main chart captures an event sequence triggered by the prechart. An eTS implies that a system should have a behavior branch that follows the main chart after each execution of the prechart; an eTS also allows other behaviors to follow the prechart instead. By contrast, a uTS implies the stronger condition that each execution of the prechart must be followed by the execution of the main chart.
The concept of a triggered scenario can be extended to negative scenarios, which imply that an event must not happen after any execution of the prechart. Each scenario has an associated alphabet, which is equal to or a subset of the system's event alphabet. As an example, the uTS CoffeeScenario from Figure 2.8 states that selection of coffee (event coffee) must be followed by a cash payment. The notation used in this dissertation for triggered scenarios has existential events denoted with dotted arrows, branching events with dashed arrows, and the universal events with solid arrows.

2.2 System Properties

Propositional and temporal logic formulas are often used to express a system's high-level properties, preconditions and triggers [97]. In this dissertation, component and system state and domain variables are modeled using simple Boolean propositions called fluents [44]. Fluents are commonly used to define formal properties, invariants, and triggers related to event-intensive specifications (scenarios, MTSs) [26, 91].

Definition 4 (Fluent). A fluent $\phi$ is a triple $(I_\phi, T_\phi, Init_\phi)$ where $I_\phi$ is a set of initiating events, $T_\phi$ is a set of terminating events ($I_\phi \cap T_\phi = \emptyset$), and $Init_\phi$ is the initial logical value of $\phi$ (true or false).

The value of a fluent can be established at each point of system execution based on the event sequence that has been executed to that point. A fluent is set to true when an event from the initiating set ($I_\phi$) occurs and is set to false when an event from the terminating set ($T_\phi$) occurs. Formally, the value of $\phi$ after a sequence $\omega = e_1 \ldots e_n$ is true if and only if one of the following conditions holds:

1. $Init_\phi \wedge (\forall i \in [1, n] .\ e_i \notin T_\phi)$
2. $\exists i \in [1, n] .\ (e_i \in I_\phi) \wedge (\forall j \in [1, n] .\ i < j \Rightarrow e_j \notin T_\phi)$

Informally, a fluent holds when it is initially true and never terminated or if it is initialized at some point and has not been terminated since.
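To make the two conditions of Definition 4 concrete, the following Python sketch evaluates a fluent over an event sequence by a single left-to-right pass, which yields the same truth value as the declarative conditions. The class name `Fluent` and its field names are illustrative assumptions; the example fluent Payment and its events are taken from the Coffee Machine example used later in this chapter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fluent:
    """A fluent as a triple (initiating events, terminating events, initial value)."""
    initiating: frozenset
    terminating: frozenset
    init: bool = False

    def __post_init__(self):
        # Definition 4 requires the two event sets to be disjoint.
        if self.initiating & self.terminating:
            raise ValueError("initiating and terminating sets must be disjoint")

    def value_after(self, events):
        """Return the fluent's value after the given event sequence."""
        value = self.init
        for event in events:
            if event in self.initiating:
                value = True
            elif event in self.terminating:
                value = False
        return value


# Payment: initiated by putCash or putCard, terminated by chargeCoff or chargeCapp.
payment = Fluent(
    initiating=frozenset({"putCash", "putCard"}),
    terminating=frozenset({"chargeCoff", "chargeCapp"}),
    init=False,
)
```

For instance, `payment.value_after(["select", "coffee", "putCash"])` holds, and it stops holding once `chargeCoff` is appended to the sequence.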
Propositional formulas on fluents are defined using standard Boolean operators, while Fluent Linear Temporal Logic (FLTL) [44] properties are specified using standard temporal constructs (X – next, G – globally, U – strong until). As a convention, a fluent with the same name as a system event is initialized with that event and terminated with any other event. Note that specifying propositional fluent formulas can be done similarly in other property specification languages such as the widely used Object Constraint Language (OCL) [76]. To incorporate state information into a sequence chart, the sequence chart may be annotated with references to states of the involved components.

Definition 5 (Component State). Component state $cs$ is a tuple $\langle st_1, \ldots, st_n \rangle$ of values for the fluents $\phi_1, \ldots, \phi_n$ whose initiating and terminating events are elements of the component's alphabet.

2.3 Transition Systems

A labeled transition system (LTS) is a finite state machine formalism that labels transitions to specify which events can occur in each state. LTSs are often used to model the required behavior of a software component. A modal transition system (MTS) [61] generalizes LTS with maybe transitions that are currently neither explicitly required nor prohibited, in addition to the required transitions used to model the required behavior of a software component in an LTS [68]. For example, the MTS ChargingStation (Figure 2.7) has a maybe transition putCard? from $s_1$ to $s_2$.

Definition 6 (Labeled Transition System). A labeled transition system $L$ is a tuple $(S, A, \Delta, s_0)$, where $S$ is a finite set of states, $A$ is a finite set of actions, $\Delta \subseteq (S \times A \times S)$ is a transition relation, and $s_0$ is the initial state.

Definition 7 (MTS). A modal transition system $M$ is a 5-tuple $(S, A, \Delta^r, \Delta^p, s_0)$, where $S$ is a finite set of states, $A$ is a finite set of actions, $\Delta^r \subseteq (S \times A \times S)$ is a required transition relation, $\Delta^p \subseteq (S \times A \times S)$ is a potential transition relation, $\Delta^r \subseteq \Delta^p$, and $s_0$ is the initial state.
In the above definitions, the set of potential transitions is the union of the disjoint sets of required and maybe transitions. Maybe transitions are the transitions from the set $(\Delta^p \setminus \Delta^r)$. The notation $s \xrightarrow{l}_{\gamma} s'$ is used to represent a transition from $s$ to $s'$ labeled with $l$, where $\gamma \in \{r, m, p\}$ denotes required, maybe, and potential transitions, respectively. Any transition that is not potential is considered prohibited.

Intuitively, refining an MTS involves converting some of the maybe transitions into required or removing maybe transitions from the model based on additionally elicited requirements. An MTS $N$ is a strong refinement of $M$ (denoted $M \preceq_S N$) if $N$ has all of the required behavior of $M$, and no potential behavior that is prohibited in $M$.

Definition 8 (Strong Refinement). For MTSs $M$ and $N$, a strong refinement relation is a binary relation $R \subseteq S_M \times S_N$, $(s_{M0}, s_{N0}) \in R$, such that for each $(s_M, s_N) \in R$:

1. $(\forall l, s'_M)\ (s_M \xrightarrow{l}_r s'_M \Rightarrow (\exists s'_N)(s_N \xrightarrow{l}_r s'_N \wedge (s'_M, s'_N) \in R))$
2. $(\forall l, s'_N)\ (s_N \xrightarrow{l}_p s'_N \Rightarrow (\exists s'_M)(s_M \xrightarrow{l}_p s'_M \wedge (s'_M, s'_N) \in R))$

Figure 2.2: The rules for the parallel composition operator.

RT: $\dfrac{m_i \xrightarrow{l}_r m_j}{(m_i, n_k) \xrightarrow{l}_r (m_j, n_k)}\ \ l \notin A_N$
MT: $\dfrac{m_i \xrightarrow{l}_m m_j}{(m_i, n_k) \xrightarrow{l}_m (m_j, n_k)}\ \ l \notin A_N$
MR: $\dfrac{m_i \xrightarrow{l}_m m_j \wedge n_k \xrightarrow{l}_r n_o}{(m_i, n_k) \xrightarrow{l}_m (m_j, n_o)}$
RM: $\dfrac{m_i \xrightarrow{l}_r m_j \wedge n_k \xrightarrow{l}_m n_o}{(m_i, n_k) \xrightarrow{l}_m (m_j, n_o)}$
MM: $\dfrac{m_i \xrightarrow{l}_m m_j \wedge n_k \xrightarrow{l}_m n_o}{(m_i, n_k) \xrightarrow{l}_m (m_j, n_o)}$
RR: $\dfrac{m_i \xrightarrow{l}_r m_j \wedge n_k \xrightarrow{l}_r n_o}{(m_i, n_k) \xrightarrow{l}_r (m_j, n_o)}$

The definition includes the notion of refining a state in the more general model into multiple states in the more specific model. The refinement is valid as long as the new states have all the required transitions of the original model (condition (1) in Definition 8), and do not introduce potential transitions that were prohibited (condition (2)). MTS refinement has also been defined for MTS models with different alphabets [39, 93]. The goal of specification refinement is to eventually arrive at a model that contains only required behavior, i.e., an LTS, referred to as an implementation.
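As an illustration of Definition 8, the following Python sketch decides strong refinement by computing the largest relation that satisfies conditions (1) and (2): it starts from all state pairs and repeatedly discards pairs that violate either condition. This is a generic fixpoint computation under the assumption that both MTSs share the same alphabet, not the algorithm used in this dissertation; the `MTS` class and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MTS:
    states: frozenset
    required: frozenset   # set of (source, label, target) triples
    potential: frozenset  # superset of `required`; the rest are maybe transitions
    initial: str


def _successors(transitions, state, label):
    """Targets reachable from `state` via `label` in the given relation."""
    return {t for (s, l, t) in transitions if s == state and l == label}


def strongly_refines(m, n):
    """Return True if n is a strong refinement of m."""
    relation = {(sm, sn) for sm in m.states for sn in n.states}
    changed = True
    while changed:
        changed = False
        for (sm, sn) in list(relation):
            # Condition (1): every required transition of m is matched by n.
            keeps_required = all(
                any((tm, tn) in relation for tn in _successors(n.required, sn, l))
                for (s, l, tm) in m.required if s == sm
            )
            # Condition (2): every potential transition of n is allowed by m.
            adds_nothing = all(
                any((tm, tn) in relation for tm in _successors(m.potential, sm, l))
                for (s, l, tn) in n.potential if s == sn
            )
            if not (keeps_required and adds_nothing):
                relation.discard((sm, sn))
                changed = True
    return (m.initial, n.initial) in relation


# m has a single maybe transition a?; n_req makes it required, n_none removes it.
m = MTS(frozenset({"s0", "s1"}), frozenset(),
        frozenset({("s0", "a", "s1")}), "s0")
n_req = MTS(frozenset({"t0", "t1"}), frozenset({("t0", "a", "t1")}),
            frozenset({("t0", "a", "t1")}), "t0")
n_none = MTS(frozenset({"t0"}), frozenset(), frozenset(), "t0")
n_bad = MTS(frozenset({"t0", "t1"}), frozenset(),
            frozenset({("t0", "b", "t1")}), "t0")
```

Both `n_req` and `n_none` refine `m`, matching the two refinement moves described above, while `n_bad` does not because it introduces a transition on `b` that `m` prohibits.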
Definition 9 (Implementation). An LTS $I = (S_I, A_I, \Delta_I, \Delta_I, i_0)$ is an implementation of an MTS $M = (S_M, A_M, \Delta^r_M, \Delta^p_M, m_0)$, if and only if $M \preceq_S I$. The set of $M$'s implementations is defined as $Im(M) = \{I \mid M \preceq_S I\}$.

MTSs of the system components are composed into a system MTS using a parallel composition operator. This operator synchronizes the component MTSs on transitions labeled with shared actions, while independently progressing on non-shared actions.

Definition 10 (Parallel Composition). Let $M$ and $N$ be MTSs, where $M = (S_M, A_M, \Delta^r_M, \Delta^p_M, m_0)$, and $N = (S_N, A_N, \Delta^r_N, \Delta^p_N, n_0)$. Parallel composition ($\|$) is a symmetric operator producing an MTS $M \| N = (S_M \times S_N, A_M \cup A_N, \Delta^r, \Delta^p, (m_0, n_0))$, where $\Delta^r$ and $\Delta^p$ are the smallest relations that satisfy the rules in Figure 2.2.

When two MTSs synchronize on a shared action, the parallel composition operator creates a required transition only when both of the synchronized transitions are required (rule RR in Figure 2.2). Otherwise, the action synchronization produces a maybe transition (rules MM, MR, and RM). A component's required transition on a non-shared action is present in the system model as required (rule RT), while a maybe transition labeled with a non-shared action is present in the composite model as a maybe transition (rule MT). The merging operator (+) [39, 40, 93] combines MTSs obtained from the different requirements into an MTS that satisfies all of those requirements.

Distribution of a system LTS tries to create a set of component LTSs whose composition is behaviorally equivalent (denoted $\equiv$) to the system LTS; for a non-distributable system LTS, the composition of component LTSs exhibits implied scenarios [96]. Distribution of a system MTS aims to obtain a set of component MTSs such that the distributable LTS implementations of the system MTS can be composed from the LTS implementations of the obtained component MTSs.

Definition 11 (Distributable LTS).
Given an LTS $I = (S, A, \Delta, s_0)$, and an alphabet distribution $\Gamma = \{A_1, \ldots, A_n\}$, $I$ is distributable if there exist LTSs $I_1, \ldots, I_n$ with alphabets $A_1, \ldots, A_n$, respectively, such that $\|_{i \in [n]} I_i \equiv I$.

Definition 12 (Complete and Sound MTS Distribution). Given an MTS $M = (S, A, \Delta^r, \Delta^p, s_0)$ and an alphabet distribution $\Gamma = \{A_1, \ldots, A_n\}$, a sound and complete distribution of $M$ over $\Gamma$ are MTSs $M_1, \ldots, M_n$ with alphabets $A_1, \ldots, A_n$ such that:

1. for all LTSs $I_1, \ldots, I_n$, if $M_i \preceq I_i$ then $M \preceq \|_{i \in [n]} I_i$
2. for every $I$, where $M \preceq I$ and $I$ is distributable and deterministic, there are $I_i$ for which $M_i \preceq I_i$ and $\|_{i \in [n]} I_i \equiv I$

Sibay et al. [86] have identified a necessary condition for an MTS $M$ to be distributable: the LTS $I$ obtained by changing all of $M$'s maybe transitions to required must be distributable. The procedure to check whether an LTS is distributable is to project the LTS to each component's alphabet, compose the obtained projections, and check whether this composition generates the same language as the original LTS. Note, however, that Sibay et al. prove that MTS distribution is not complete: not every system MTS can be distributed to an appropriate set of component MTSs.

2.4 Running Examples

To motivate and illustrate the proposed techniques, this dissertation uses four example system specifications. These specifications cover a wide range of input information: scenarios of different strength from existential to universal, event invariants, safety goals, dynamically collected traces and invariants. One of the examples also involves MTS synthesis using existing algorithms. Furthermore, the specifications contain subtle and non-obvious inconsistencies, which will be further explained as part of descriptions of the proposed techniques in Chapters 3–6.

2.4.1 Web Cache System

Web Cache is a system that enhances Client-Server interactions with caching functionality. The system's requirements specification consists of two existential UML sequence diagrams and a set of event invariants (expressed as OCL pre- and postconditions). Note that the variables used in the OCL constraints are similar to fluents (recall Section 2.2). In Scenario 1 from Figure 2.3, Client requests data from Cache, Cache fetches the data from Server and returns it to Client. A subsequent request is then directly retrieved from Cache. In the other scenario, Client asks for data that is already cached, and the data is directly returned. Next, a dataUpdate occurs (indicating that the data has changed), and a subsequent data request is redirected to Server.

[Figure 2.3: Web cache specification. Lifelines: Client, Cache, Server.
Scenario 1: requestCacheData, requestServerData, responseServerData, responseCacheData, requestCacheData, responseCacheData.
Scenario 2: requestCacheData, responseCacheData, dataUpdate, requestCacheData, requestServerData, responseServerData, responseCacheData.
Constraints:
  requestCacheData    pre: requestPending = false                      post: requestPending = true
  responseCacheData   pre: requestPending = true and cached = true     post: requestPending = false
  requestServerData   pre: requestPending = true and cached = false    post: (none)
  responseServerData  pre: (none)                                      post: cached = true
  dataUpdate          pre: (none)                                      post: cached = false]

As an example, the invariants in Figure 2.3 imply that Cache requests data from Server only if there is a pending client request (requestPending = true) and the data is not cached (cached = false). The system-level nature of these variables means that event invariants can reference arbitrary variables regardless of which two components share the event.

2.4.2 Customer Banking System

Customer Banking system, which is inspired by the running example found in related literature [95, 104], provides remote banking services via ATM terminals. The system's requirements are first defined using natural language, and then captured using existing triggered-scenario notations. The scenarios involve four components: (1) UI, (2) ATM,
The scenarios involve four components: (1) UI, (2) ATM, 25 uTS Account Verification UI ATM enterPassword Proxy passVerified Bank verifyAccount verifyWithBank validPass dispOptions Figure 2.4: A uTS [84] for Customer Banking. (3) Proxy that mediates between ATM and Bank, and (4) Bank that approves or re- jects transactions. The dotted scenario arrows represent existential, the dashed arrows represent branching, and the solid arrows represent universal events. Requirement set 1 | processing a user's password: 1. At a certain point, a user may enter her password through UI, which is then sent via Proxy to Bank for verication. 2. UI may nally be requested to display transaction options to the user. 3. Once the password is sent, ATM shall receive either a positive or negative verica- tion response from Proxy. 4. In case Proxy receives a positive verication response from Bank, it must forward it to ATM. ATM is then obliged to instruct UI to display the transaction options. These requirements are captured with a universal Triggered Scenario (uTS) [84] AccountVerication (Figure 2.4). uTS is chosen because it supports modeling of a uni- versal (\must") sequence following an existential (\may") sequence. AccountVerication asserts that Bank's conrmation must be forwarded to ATM, which then displays the transaction options via UI. Requirement set 2 | Proxy password verication: 26 eTS Rejection Branch Proxy passIncorrect Bank verifyWithBank Figure 2.5: An eTS [84] for Customer Banking. eTS Acceptance Branch Proxy passVerified Bank verifyWithBank eTS General Rejection Proxy passIncorrect Bank PassVerifying preCondition(verifyWithBank) = ¬PassVerifying, preCondition(verifyAccount) = ¬PassVerifying, preCondition(validPass) = ¬PassVerifying, preCondition(invalidPass) = ¬PassVerifying, … Figure 2.6: Specication eort beyond AccountVerication. 1. When Proxy requests verication, its next event shall be either a positive or a negative Bank response. 2. 
When Bank is in its verifying state, which is entered upon a password verification request, Bank shall be able to reject the password.

The existential Triggered Scenario (eTS) [84] RejectionBranch from Figure 2.5 defines a sequence in which Bank must have an option to reject a password (passIncorrect) following a verification request (verifyWithBank). eTS is chosen as it supports modeling of a branching sequence ("shall be able to" branch) following an existential one. Note, however, that RejectionBranch only approximates the above requirements. Specifying the missing aspects using existing techniques requires non-trivial added effort (Figure 2.6): a precondition to every Proxy event other than passIncorrect and passVerified to ensure that only the two legal branches are allowed (requirement (1)), and GeneralRejection eTS with a property precondition of being in a verifying state (requirement (2)).

[Figure 2.7: Behavior specification of Coffee Machine system: the CoffeeMachine system MTS and the ChargingStation and DrinkSelection component MTSs.]

2.4.3 Coffee Machine System

Coffee Machine is a software system controlling a vending machine that makes coffee and cappuccino, while accepting cash and credit cards as a form of payment. Coffee Machine needs to be built using two components: ChargingStation that deals with the payment and DrinkSelection that monitors beverage choice. Figure 2.7 depicts the CoffeeMachine MTS obtained from a preliminary set of requirements using the synthesis algorithm from [91].
The CoffeeMachine MTS was detected as distributable and then decomposed using the algorithm from [86] into component MTSs of Figure 2.7. Each state in a system MTS corresponds to a combination of states in the component MTSs. For example, the state CoffeeMachine.$s_6$ corresponds to a combination of ChargingStation.$s_2$ and DrinkSelection.$s_3$. Hence, the transitions from CoffeeMachine.$s_6$ are a result of synchronizing the transitions from ChargingStation.$s_2$ and DrinkSelection.$s_3$, as defined by the parallel composition operator from Definition 10.

[Figure 2.8: The requirements elicited for Coffee Machine system.
Requirement 1: Charging Preconditions. chargeCoff: CoffeeSel ∧ Payment; chargeCapp: CappSel ∧ Payment.
Requirement 2: EnabledPayment Property. G(Payment => DrinkSel).
Requirement 3: CoffeeScenario. Lifelines: User, Drink Selection, Charging Station; events: select, coffee, putCash.
Requirement 4: DispenseScenario. Lifelines: User, Drink Selection, Charging Station; events: coffee, chargeCoff, confirm, makeCoff.]

[Figure 2.9: The refined MTSs of Coffee Machine system (CoffeeMachineStep12 and CoffeeMachineRefined).]

The maybe transitions (`?') of the CoffeeMachine MTS are analyzed with a stakeholder, resulting in four new requirements depicted in Figure 2.8. The first requirement is captured with two event preconditions (see footnote 1) stating that "the system shall charge for a particular drink only when that drink has been selected and the payment has been submitted." The second requirement is an FLTL [44] property stating that "a payment can be submitted only when a drink has been selected."
The third requirement is a uTS [84] stating that "the system shall permit only cash payments for coffee." The final uTS states that "the system shall dispense coffee once the coffee payment has been processed."

The CoffeeMachine MTS from Figure 2.7 is refined according to the new requirements. The CoffeeMachineStep12 MTS from Figure 2.9 is obtained by refining the CoffeeMachine MTS based on the first two requirements from Figure 2.8; the dashed lines denote transitions that were removed in the process. The CoffeeMachineRefined MTS from Figure 2.9 is a refinement of CoffeeMachineStep12 based on the scenarios from Figure 2.8.

¹ For CoffeeMachine, the following fluents are defined: CoffeeSel, initiated by coffee and terminated by chargeCoff or chargeCapp; CappSel, initiated by cappuccino and terminated by chargeCoff or chargeCapp; Payment, initiated by putCash or putCard and terminated by chargeCoff or chargeCapp; DrinkSel, initiated by coffee or cappuccino and terminated by confirm.

2.4.4 StackAr Java Library

StackAr is a Java-based implementation of a stack [21, 36]. Initialized with an integer capacity, StackAr has six public methods: push(x), top(), topAndPop(), makeEmpty(), isEmpty(), and isFull(). Internally, StackAr represents a stack as an array (theArray) with a pointer to the top of the stack (topOfStack). A push() on a full stack generates an exception. A top() or topAndPop() on an empty stack returns null.

Two dynamic artifacts are obtained by running StackAr. The first artifact is a set of five StackAr invocation traces, corresponding to creating and using stacks of different capacities (Figure 2.10). The second artifact is a set of inferred invariants consisting of method invariants and StackAr's object invariants (Figure 2.11). The top portion reports three object invariants: theArray is never null, and the topOfStack index is never less than -1 and is always less than the size of theArray. Figure 2.11 also shows the method pre- and postconditions of push(): before a push() invocation, the stack should not be full; after push() is invoked, the pointer to the top of the stack is incremented.

Figure 2.10: Five example StackAr invocation traces. (Trace 1: capacity 0; Trace 2: capacity 1; Trace 3: capacity 2; Trace 4: capacity 5; Trace 5: capacity 6. Legend: ET: isEmpty() = true; EF: isEmpty() = false; FT: isFull() = true; FF: isFull() = false; TN: top() = null; TV: top() ≠ null; TPN: topAndPop() = null; TPV: topAndPop() ≠ null; Emp: makeEmpty().)

    DataStructures.StackAr:::CLASS
    this.topOfStack >= -1
    this.topOfStack <= size(this.theArray[])-1

    DataStructures.StackAr.push(java.lang.Object):::ENTER
    this.topOfStack < size(this.theArray[..])-1
    this.topOfStack >= -1

    DataStructures.StackAr.push(java.lang.Object):::EXIT103
    orig(this.topOfStack) < size(this.theArray[..])-1
    this.topOfStack >= 0
    this.topOfStack - orig(this.topOfStack) - 1 == 0
    size(this.theArray[..]) == orig(size(this.theArray[..]))

Figure 2.11: A subset of Daikon's program invariants on StackAr.
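The relation between StackAr's described behavior and the invariants of Figure 2.11 can be made concrete with a small sketch. The class below is a hypothetical Python port written only for illustration (the original is a Java class, and the method names here are adapted to Python conventions); the assertions restate the object invariants and the push() postconditions reported by Daikon.

```python
class StackAr:
    """Array-backed stack; a hypothetical Python port of the StackAr description."""

    def __init__(self, capacity):
        self.the_array = [None] * capacity
        self.top_of_stack = -1

    def check_object_invariants(self):
        # Object invariants from the text: theArray is never null and
        # -1 <= topOfStack <= size(theArray) - 1.
        assert self.the_array is not None
        assert -1 <= self.top_of_stack <= len(self.the_array) - 1

    def is_empty(self):
        return self.top_of_stack == -1

    def is_full(self):
        return self.top_of_stack == len(self.the_array) - 1

    def push(self, x):
        # Precondition of push(): the stack is not full.
        if self.is_full():
            raise OverflowError("push on a full stack")
        old_top, old_size = self.top_of_stack, len(self.the_array)
        self.top_of_stack += 1
        self.the_array[self.top_of_stack] = x
        # Postconditions of push(): the top pointer grows by one, the array size is unchanged.
        assert self.top_of_stack - old_top - 1 == 0
        assert len(self.the_array) == old_size
        self.check_object_invariants()

    def top(self):
        return None if self.is_empty() else self.the_array[self.top_of_stack]

    def top_and_pop(self):
        if self.is_empty():
            return None
        x = self.the_array[self.top_of_stack]
        self.the_array[self.top_of_stack] = None
        self.top_of_stack -= 1
        return x

    def make_empty(self):
        self.the_array = [None] * len(self.the_array)
        self.top_of_stack = -1
```

A stack created with capacity 0 is both empty and full at the same time, which is the boundary case that a capacity-0 trace exercises.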
Chapter 3 Heuristic MTS Synthesis

Synthesizing component MTSs from a requirements specification is directly motivated by the observation that engineers need tools to reason about and analyze the inherently partial requirements specifications. Note that such support is currently lacking in cases when the specifications are semantically weak (as in the case of existential scenarios). This chapter describes the algorithm that automatically generates a component-level MTS for each component in a software system from a set of scenario and property specifications. The generated MTSs, in addition to being useful as an initial architectural specification of component behaviors, can also be used to discover discrepancies between system-wide and component-oriented perspectives. These discrepancies often represent flaws that, if overlooked, carry over into the system's eventual component-based implementation.

The algorithm assumes that its input system specifications are available in the form of (1) a set of existential UML sequence diagrams [76] and (2) a set of event invariants written in Object Constraint Language [76] specified on a set of system domain variables. The domain variables are not restricted to fluents, but can be any Boolean variable, thus making the algorithm more generally applicable.

Figure 3.1: MTS synthesis algorithm phases. (Phase 1: Component Constraint Generation; Phase 2: Initial MTS Generation; Phase 3: Sequence Diagram Annotation; Phase 4: Final MTS Generation. Inputs: system specifications and SDs; output: final MTSs.)

The final result of the algorithm is a set of component-level MTSs that capture the behaviors required by scenarios and do not allow behaviors proscribed by the properties. Several important obstacles must be overcome to generate the component-level MTSs:

1. Obtaining constraints on the behavior of individual components, when input specifications are given at the level of the whole system.
2. Constructing the state space of each component and determining all potential transitions that are not proscribed by the provided system specification.
3. Inferring the conditions under which a specified existential scenario can execute from each component's perspective.
4. Heuristically incorporating information about required system behavior, captured in system scenarios, into the partial behavior models of components.

The algorithm consists of four phases, as depicted in Figure 3.1. Each phase addresses one of the above challenges. The following sections describe the phases of the algorithm, while the final section presents a set of analyses that expose potential design flaws.

The algorithm has proven scalable to very large system specifications both in theory and practice. Furthermore, the algorithm has been used to improve specification correctness and requirements elaboration. The specific results are discussed in Section 7.1.

3.1 Phase 1: Component Constraint Generation

The first phase of the synthesis algorithm produces a set of OCL constraints on the behavior of each component in the system. These constraints are used in later phases of the algorithm to create initial component MTSs and annotate SDs. For each component $C$, OCL constraints are derived that define pre- and postconditions on provided and expected events of $C$. The individual steps of this phase are described next.

3.1.1 Derive Provided and Required Events.

A component's interface signature is not assumed available; hence the algorithm first extracts the interface from the available specification. The set $P_C$ of component $C$'s provided events and the set $E_C$ of $C$'s expected events are defined as follows:

Definition 13 (Provided events). For every component $C$, for all events $o$, $o \in P_C$ if and only if there exists an SD with event $o$ labeled on an incoming arrow into $C$'s lifeline. $P_C$ is the set of $C$'s provided events.

Definition 14 (Expected events).
For every component $C$, for all events $o$, $o \in E_C$ if and only if there exists an SD with event $o$ labeled on an outgoing arrow from $C$'s lifeline. $E_C$ is the set of $C$'s expected events.

For example, responseCacheData is Client's provided event, while requestCacheData is its expected event (see Figure 2.3).

3.1.2 Derive Significant Domain Variables.

In order to arrive at a component's behavior model, the algorithm leverages domain variables. Component states are determined based on the value assignments of domain variables that affect that component's behavior: different states will have different value assignments. Additionally, assigning values to domain variables for each state helps to determine whether a component event is allowed in a particular state.

Not every domain variable affects the behavior of a component. For example, the value of cached does not affect the behavior of Client in the Web Cache system (recall Figure 2.3): Client sends data requests regardless of whether the data is cached. Therefore, it is necessary to determine which domain variables restrict the behavior of a specific component. For a component $C$, the algorithm first determines which domain variables restrict $C$'s outgoing behavior (referred to as $C$'s significant domain variables).

Definition 15 (Significant Domain Variables). Let $V$ be the set of system domain variables. For each component $C$, $V_C \subseteq V$ is the set of significant domain variables if and only if for all $v \in V_C$, $v$ appears in a precondition of an expected event in $E_C$.

For Cache in the Web Cache system, cached and requestPending are Cache's significant domain variables because Cache should invoke requestServerData and responseCacheData only depending on the values of these variables.

3.1.3 Derive Scoped Domain Variables.

To complete a component's behavior model, it is necessary to determine for every component state whether any provided events may be invoked in that state.
The algorithm utilizes the system specification to infer the conditions that have to hold for a provided event to be invoked. For a component $C$, the algorithm extracts the set of domain variables that are only modified by events in which $C$ participates. These variables are termed scoped domain variables because they exist globally inside $C$'s "scope". The underlying heuristic is that constraints that specify when $C$'s provided events may be invoked from $C$'s perspective should be defined in terms of the scoped domain variables.

Definition 16 (Scoped Domain Variables). Let $V$ be the set of system domain variables. For each component $C$, let $U_C$ be the set of scoped domain variables, such that $v \in U_C$ if and only if $v$ is modified only by the events $o \in P_C \cup E_C$.

For example, requestPending is Client's only scoped domain variable; Client has global knowledge of requestPending's value.

    GenCompConstr(constr Cons, comp C)
      create new empty constraint set Cons_C
      for each event o in E_C
        for each expression e in pre_o
          if e is defined on any variable in V_C
            Cons_C.add(o.pre, e)
        for each expression e in post_o
          if e is defined on any variable in V_C ∪ U_C
            Cons_C.add(o.post, e)
      for each event o in P_C
        for each expression e in pre_o
          if e is defined on any variable v in U_C
            Cons_C.add(o.pre, e)
        for each expression e in post_o
          if e is defined on any variable in V_C ∪ U_C
            Cons_C.add(o.post, e)
      return Cons_C

Figure 3.2: Algorithm for deriving component-level constraints.

    requestCacheData
      pre:  requestPending = false
      post: requestPending = true
    responseCacheData
      pre:  requestPending = true
      post: requestPending = false

Figure 3.3: Client's component-level constraints.

3.1.4 Translate System-Level to Component-Level Constraints.

After obtaining the sets $V_C$ and $U_C$, component-level constraints can be derived from the available system-level constraints.
To this end, the information about a component $C$'s significant and scoped domain variables is used to decide which subexpressions of the event pre- and postconditions are relevant from $C$'s perspective. For example, Client's component-level constraints (Figure 3.3) contain the precondition requestPending = true for responseCacheData, which differs from responseCacheData's system-level precondition (Figure 2.3) because cached does not restrict the behavior of Client. Pseudocode in Figure 3.2 describes the steps for obtaining the component-level constraints: the main heuristic is that the preconditions of expected events are defined only on significant domain variables, while preconditions on provided events are defined on scoped domain variables. In the rest of the chapter, the component $C$'s significant domain variables and scoped domain variables are jointly referred to as $C$'s significant variables.

3.2 Phase 2: Initial MTS Generation

After deriving the set of component-level constraints for each component $C$, the synthesis algorithm creates an initial MTS $M_C$ that captures all the possible behaviors that are not proscribed by the constraints. These behaviors are captured as maybe transitions labeled with the corresponding events, according to the steps described below.

3.2.1 Extend the MTS Definition.

During construction of the initial MTS, it is necessary to decide whether or not a transition between two states should be added to the MTS. To address this requirement, for a component $C$, the synthesis algorithm creates an initial MTS whose states have a one-to-one mapping to $C$'s component states (recall Definition 5 of Section 2.2).

Definition 17 (MTS Extension). Extended MTS $M_C$ is a structure $(S, A, \Delta^r, \Delta^p, s_0, T_C, Map)$, where $(S, A, \Delta^r, \Delta^p, s_0)$ is an MTS as specified in Definition 7, $T_C$ is the set of vectors of component states, and function $Map : S \to T_C$ maps each state in $M_C$ to a component state of $C$.
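As an illustration of Definition 17, the sketch below stores an extended MTS as plain Python sets plus a dictionary for Map. The class and field names are hypothetical, and the sketch assumes the usual MTS convention that every required transition is also a possible one.

```python
from dataclasses import dataclass, field


@dataclass
class ExtendedMTS:
    """Hypothetical container for an extended MTS (names are illustrative)."""
    states: set = field(default_factory=set)        # S
    alphabet: set = field(default_factory=set)      # A
    required: set = field(default_factory=set)      # required transitions (src, event, dst)
    possible: set = field(default_factory=set)      # possible (maybe or required) transitions
    initial: str = "s0"                             # s_0
    state_map: dict = field(default_factory=dict)   # Map: state -> component state vector

    def add_state(self, name, component_state):
        # The component state is kept as a sorted tuple of (variable, value) pairs
        # so that it can be compared and hashed.
        self.states.add(name)
        self.state_map[name] = tuple(sorted(component_state.items()))

    def add_transition(self, src, event, dst, required=False):
        self.alphabet.add(event)
        self.possible.add((src, event, dst))
        if required:
            # Assumption: required transitions are a subset of possible transitions.
            self.required.add((src, event, dst))


# Two Cache states labeled with truth assignments of requestPending and cached.
cache = ExtendedMTS()
cache.add_state("s0", {"requestPending": False, "cached": False})
cache.add_state("s2", {"requestPending": True, "cached": False})
cache.add_transition("s0", "requestCacheData", "s2")  # a maybe transition
```

Keeping Map as an explicit dictionary makes it straightforward to look up which component state a given MTS state stands for when transitions are added later.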
For Cache, the states in the initial MTS $M_{Cache}$ (Figure 3.5) are labeled with the different possible truth assignments of significant variables requestPending and cached (i.e., the different component states of Cache).

3.2.2 Create an Initial State.

The first step in the construction of an initial MTS is setting $Map(s_0)$ to the initial values of the component's significant variables. The initial values can typically be extracted from the requirements or otherwise obtained from a domain expert. For example, the initial state of Cache's MTS (Figure 3.5) has the corresponding component state mapping $Map(s_0)$ = <requestPending = false; cached = false>.

3.2.3 Expand the MTS with Legal Transitions and States.

The feasible states of a component's MTS are not known a priori. Therefore, after creating the initial MTS state, the algorithm gradually constructs the desired MTS by (1) adding new transitions from existing states and (2) adding new states if the transition should lead to a state that is not already in the MTS.

    GenInitialMTS(comp C, cons Cons_C, vec init)
      create new MTS M_C
      create initial state s_0 with Map(s_0) = init
      M_C.S.add(s_0)
      iterator it = M_C.S.iterator()
      while it.hasNext()
        s_curr = it.nextElement()
        for each event o in E_C ∪ P_C
          if Map(s_curr) satisfies C.pre_o
            {let S_next be a set of states s for which
             transition s_curr --o-->_p s is allowed}
            for each s in S_next
              if s not in M_C.S
                M_C.S.add(s)
              M_C.Δ^p.add(s_curr --o-->_p s)
      return M_C

Figure 3.4: Algorithm for generating an initial MTS.

Definition 18 (Allowed MTS Transitions). A transition $s \xrightarrow{op}_p s'$ labeled with the name of the event $op$ is allowed if and only if (1) $op$'s precondition is satisfied in $s$ (i.e., for $Map(s)$) and $op$'s postcondition is satisfied in $s'$, and (2) $s$ and $s'$ have identical values for those significant variables that are not modified by $op$.
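Definition 18 and the worklist construction of Figure 3.4 can be sketched as follows. The example uses Client's component-level constraints from Figure 3.3; the function names are hypothetical, and the sketch assumes that an event modifies exactly the variables mentioned in its postcondition.

```python
from itertools import product

# Client's constraints (Figure 3.3): event -> (precondition, postcondition).
CONSTRAINTS = {
    "requestCacheData": ({"requestPending": False}, {"requestPending": True}),
    "responseCacheData": ({"requestPending": True}, {"requestPending": False}),
}
VARIABLES = ["requestPending"]  # Client's significant variables


def satisfies(state, condition):
    """True if the state agrees with every value fixed by the condition."""
    return all(state[v] == value for v, value in condition.items())


def allowed(src, event, dst):
    """Definition 18: precondition holds in src, postcondition holds in dst,
    and variables not modified by the event keep their values."""
    pre, post = CONSTRAINTS[event]
    modified = set(post)  # assumption: only postcondition variables are modified
    unchanged = all(src[v] == dst[v] for v in VARIABLES if v not in modified)
    return satisfies(src, pre) and satisfies(dst, post) and unchanged


def gen_initial_mts(init):
    """Expand states reachable from init; every added transition is a maybe transition."""
    start = tuple(init[v] for v in VARIABLES)
    states, maybe, worklist = {start}, set(), [start]
    while worklist:
        cur = worklist.pop()
        cur_state = dict(zip(VARIABLES, cur))
        for event in CONSTRAINTS:
            for values in product([False, True], repeat=len(VARIABLES)):
                if allowed(cur_state, event, dict(zip(VARIABLES, values))):
                    if values not in states:
                        states.add(values)
                        worklist.append(values)
                    maybe.add((cur, event, values))
    return states, maybe


states, maybe = gen_initial_mts({"requestPending": False})
```

For Client this yields two states, one per value of requestPending, connected by a requestCacheData transition in one direction and a responseCacheData transition in the other.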
Hence, the algorithm adds transitions that are allowed from existing states to the MTS under construction, and adds their destination states $s'$ to the MTS if these are not already contained in the state set. These steps ensure that transitions that violate the constraints are not created. The pseudocode in Figure 3.4 details the steps performed to generate an initial MTS for a component.

For example, in Cache's initial MTS, starting with state $s_0$, the MTS is expanded with transitions $s_0 \xrightarrow{requestCacheData}_p s_2$ and $s_0 \xrightarrow{responseServerData}_p s_1$, because the preconditions of requestCacheData and responseServerData are satisfied in $s_0$. The newly added state $s_1$ preserves $s_0$'s value of requestPending and satisfies responseServerData's postcondition; in contrast, state $s_2$ preserves $s_0$'s value of cached and satisfies requestCacheData's postcondition. The algorithm finally arrives at the complete MTS from Figure 3.5.

Figure 3.5: Cache's initial MTS labeled with component states. (States: S0 <F,F>, S1 <F,T>, S2 <T,F>, S3 <T,T>, where each vector lists the values of requestPending and cached; all transitions are maybe transitions.)

3.3 Phase 3: Sequence Diagram Annotation

This phase of the algorithm annotates the input scenarios to enrich the weak existential information they convey. In particular, the most general conditions that guarantee that the scenario execution would not violate the invariants are determined. For each sequence diagram, the algorithm creates an annotated $SD_C$ that represents the scenario from $C$'s perspective. Subsequently, these annotations are used to heuristically determine the component states from which to require scenario events.

To annotate an SD, the algorithm makes two passes through it. The first pass adds an initial set of annotations that pertain to the individual event occurrences, while the second pass propagates values between adjacent annotations.
Each annotation is a vector defining the values of the component's significant variables. Variables that must be true are annotated with 'T', those that must be false are annotated with 'F', and the remaining variables are annotated with '?'. In the case of Cache, each annotation will characterize the necessary values of requestPending and cached at different points of the Web Cache system scenarios (Figure 2.3).

    Annotate(SD SD_C, comp C)
      for each event invocation i in SD_C
        {create two new vectors an_{C,i} and an'_{C,i} indexed
         by C's significant variables and initialized to ?}
        for each C's significant variable v
          if v must be assigned a value x to satisfy C.pre_i
            an_{C,i}[v] = x
          if v must be assigned a value x to satisfy C.post_i
            an'_{C,i}[v] = x
          if an_{C,i}[v] = ? and an'_{C,i}[v] ≠ ?
             and an_{C,i}[v] = an'_{C,i}[v] is not a requirement
            an_{C,i}[v] = *
          if an'_{C,i}[v] = ? and an_{C,i}[v] ≠ ?
             and an_{C,i}[v] = an'_{C,i}[v] is not a requirement
            an'_{C,i}[v] = *
      return SD_C

Figure 3.6: The initial SD annotation steps.

3.3.1 Create the Initial Annotations.

The initial set of annotations is created from the component-level constraints using the algorithm in Figure 3.6. The annotation before (after) an event specifies the values of significant variables that have to hold to satisfy the event's precondition (postcondition).

Definition 19 (Minimal Annotation). For all components $C$, SDs $SD_C$, and event invocation instances $i$ in $SD_C$, an annotation $an_{C,i}$ before (after) $i$ is a minimal annotation if and only if $an_{C,i}$ satisfies $i$'s precondition (postcondition), and for all fields $an_{C,i}[v]$ in the annotation vector with a specified truth assignment, modifying the field would violate $i$'s precondition (postcondition).
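The initial annotation pass can be sketched as a function that turns each event's pre- and postcondition into a pair of vectors. The constraints below are assumptions: they are not taken from Figure 2.3, but chosen so that the output matches the first annotation vectors shown for Cache in Figure 3.7a. The reading of the '*' marker (a variable whose prior value is unconstrained because the event sets it) is likewise an assumption.

```python
UNDEFINED, MODIFIED = "?", "*"
VARIABLES = ("requestPending", "cached")  # Cache's significant variables

# Assumed component-level constraints for Cache: event -> (pre, post).
CONSTRAINTS = {
    "requestCacheData": ({"requestPending": "F"}, {"requestPending": "T"}),
    "requestServerData": ({"requestPending": "T", "cached": "F"},
                          {"requestPending": "T", "cached": "F"}),
    "responseServerData": ({}, {"cached": "T"}),
    "responseCacheData": ({"requestPending": "T", "cached": "T"},
                          {"requestPending": "F", "cached": "T"}),
}


def annotate(scenario):
    """Return one (before, event, after) triple per event invocation."""
    annotated = []
    for event in scenario:
        pre, post = CONSTRAINTS[event]
        # Only the fields forced by the pre/postcondition get a value.
        before = {v: pre.get(v, UNDEFINED) for v in VARIABLES}
        after = {v: post.get(v, UNDEFINED) for v in VARIABLES}
        for v in VARIABLES:
            # Assumption: a variable fixed only by the postcondition is set by
            # the event, so its value before the event is marked with '*'.
            if before[v] == UNDEFINED and after[v] != UNDEFINED:
                before[v] = MODIFIED
        annotated.append((tuple(before.values()), event, tuple(after.values())))
    return annotated


scenario = ["requestCacheData", "requestServerData",
            "responseServerData", "responseCacheData"]
for before, event, after in annotate(scenario):
    print(before, event, after)
```

Only the first of the two marking rules in Figure 3.6 is sketched here; the symmetric rule for the annotation after an event would need explicit information about which variables an event modifies.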
Creating only minimal annotations ensures that (1) annotations do not violate the constraints, and (2) the most general conditions are captured (i.e., adding more undefined fields violates the constraints). Note that multiple annotations may be needed to capture the event execution conditions (e.g., when a component's state is changed depending on the value of event parameters). This case would involve creation of multiple copies of the SD with different valid annotations in the algorithm from Figure 3.6. Figure 3.7a depicts the initial set of annotations from Cache's perspective for Scenario 1 of the Web Cache system. The initial annotation before the first invocation of requestCacheData in Figure 3.7a is <F,?> because requestPending must be false to satisfy requestCacheData's precondition, while cached is left undefined.

Figure 3.7: Annotating the Web Cache scenario from Cache's perspective. (Annotation vectors in reading order. (a) Initial annotations: <F,?>, <T,?>, <T,F>, <T,F>, <?,*>, <?,T>, <T,T>, <F,T>, <F,?>, <T,?>, <T,T>, <F,T>. (b) Final annotations: <F,F>, <T,F>, <T,F>, <T,T>, <F,T>, <T,T>, <F,T>.)

3.3.2 Propagate Annotation Values.

In an SD, an event is preceded and/or followed by other events; the surrounding context of an event can impose additional conditions that have to hold. Specifically, the annotation after an event should not conflict with the annotation before the next scenario event (the assumption is that there are no side effects between adjacent invocations); i.e., the two annotated vectors should be consistent.

Definition 20 (Vector Consistency). For all components $C$ and annotation vectors $an1_C$ and $an2_C$ of component $C$'s significant variable assignments, $an1_C$ and $an2_C$ are consistent if and only if for each variable $v$, $an1_C[v] = an2_C[v]$ whenever both $an1_C[v]$ and $an2_C[v]$ have a defined truth assignment.

The objective of the propagation is for the already consistent annotations after one invocation and before the next in the SD to be identical. Hence, values between the adjacent annotations are propagated: each field $an1_C[v]$ that is undefined in one annotation is assigned the value of the field $an2_C[v]$ in the other annotation if $an2_C[v]$ is defined. The new value is then also propagated to the undefined field in the other annotation of the same event if the event does not modify the value of that variable. These propagation steps are iteratively applied as long as there are values that can be further propagated. The details of the propagation process are elaborated in Figure 3.8.

    Propagate(SD SD_C, comp C)
      boolean changeFlag = true
      while changeFlag = true
        changeFlag = false
        for each invocation i in SD_C
          for each C's significant variable v
            if an_{C,i}[v] ≠ ? and an'_{C,i}[v] = ?
              an'_{C,i}[v] = an_{C,i}[v] and changeFlag = true
            if an_{C,i}[v] = ? and an'_{C,i}[v] ≠ ?
              an_{C,i}[v] = an'_{C,i}[v] and changeFlag = true
        for each adjacent invocation pair (i_k, i_{k+1}) in SD_C
          for each C's significant variable v
            if an'_{C,i_k}[v] is assigned and an_{C,i_{k+1}}[v] in {?, *}
              an_{C,i_{k+1}}[v] = an'_{C,i_k}[v] and changeFlag = true
            if an'_{C,i_k}[v] in {?, *} and an_{C,i_{k+1}}[v] is assigned
              an'_{C,i_k}[v] = an_{C,i_{k+1}}[v] and changeFlag = true
      report unification conflicts and inconsistencies
      return SD_C

Figure 3.8: The SD value propagation steps.

Figure 3.7b shows the final annotated SD from Cache's perspective. Observe that the annotation before the topmost requestCacheData invocation became <F,F>, although it was initially <F,?>. The value propagation occurred as follows. The initial annotation before requestServerData imposes cached to be false. This value is propagated to the annotation after requestCacheData, which initially had cached undefined.
Since requestCacheData does not modify cached, the value (false) is further propagated to the annotation before requestCacheData. The final topmost annotation asserts that there should be no pending requests and that the data entry should not be cached for the scenario to execute.

3.4 Phase 4: Final MTS Generation

The last phase of the algorithm leverages the set of initial MTSs and the set of annotated SDs to construct the set of final component-level MTSs. In the process of MTS refinement, the algorithm first determines the MTS states from which a scenario can execute by heuristically interpreting the scenario annotations. The algorithm then traverses the MTS according to the scenario and converts the traversed maybe transitions to required transitions.

3.4.1 Determine the Launching State(s) for a Scenario.

The set of MTS states from which the scenario execution can start, which are referred to as the SD's launching MTS states, is heuristically deduced based on the following definition.

Definition 21 (Launching MTS State). For all components $C$, SDs $SD_C$, and MTSs $M_C$ with the state set $S$ and the component state mapping $Map$, $s \in S$ is a launching MTS state for $SD_C$ if and only if $Map(s)$ is consistent with the first annotation in $SD_C$.

For example, the only launching state in Cache's initial MTS (Figure 3.5) for $SD_{Cache}$ (Figure 3.7b) is $s_0$ because $Map(s_0)$ = <F,F> is identical to the first annotation in $SD_{Cache}$.

    GenerateFinalMTS(MTS M_C, SD SD_C)
      MTS M = M_C
      {stateSet S' ⊆ M_C.S where s in S'
       satisfy annotation before SD_C.firstOp}
      for each op in SD_C.orderedEvents
        stateSet S_next = ∅
        for each s in S'
          for each t: s --op--> s_2 in M
              where s_2 satisfies the annotation after op
            MTS N = M
            if t is a required transition
              S_next.add(s_2)
            if t is a potential transition
              if exists t_2: (s_3 --op-->_r s_2) in M
                set t required in N
                S_next.add(s_2)
              else (s', s'') = Refine(M, N, s, s_2, op)
                set s' --op-->_r s'' in N
                S_next.add(s'')
            M = N
        S' = S_next
      return M

    Refine(MTS M, MTS N, st s, st s_2, oper op_curr)
      {refine s_2 in N into s'_2 and s''_2
       where Map(s'_2) = Map(s''_2) = Map(s_2)}
      if s = s_2 in M
        s = s'_2 in N
      for each t_2: s_3 --op'--> s_2 in M and s_2 ≠ s_3
        create t'_2: s_3 --op'--> s'_2 in N
      for each t_2: s_2 --op'--> s_3 in M and s_2 ≠ s_3
        create t'_2: s'_2 --op'--> s_3 in N and t''_2: s''_2 --op'--> s_3 in N
      for each t_2: s_2 --op'--> s_2 in M
        create t'_2: s'_2 --op'--> s'_2 in N and create t''_2: s''_2 --op'--> s'_2 in N
      for each t_2: s_3 --op_curr--> s'_2 in N
        create t'_2: s_3 --op_curr--> s''_2 in N
      return (s, s''_2)

Figure 3.9: Final MTS generation phase.

3.4.2 Traverse Through the MTS.

For each state in the set of launching MTS states, the MTS is traversed starting from that state. The first traversed transition $s_1 \xrightarrow{op} s_2$ is labeled with the name of the first invoked event $op$ in the SD, while $s_1$ is the launching state and $s_2$ is a state consistent with the annotation after $op$ in the SD. If the traversed transition $t$ is a potential transition, the MTS is refined by making $t$ required, provided that the conditions discussed in the next step hold. The same step is then iteratively performed from $t$'s destination state for the next event in the scenario. In the case of Cache, the first traversed transition is $s_0 \xrightarrow{requestCacheData}_p s_2$. Figure 3.9 provides a detailed description of the MTS traversal.

Figure 3.10: Steps from the initial MTS to the final MTS for Cache. (Panels: initial MTS; traverse and refine S0 → S2; traverse and refine S2'' → S2'; traverse and refine S2' → S3; final MTS, with the required transitions enumerated (1) to (6).)

3.4.3 Refine the MTS with Required Scenario Behaviors.

Refining a traversed potential transition $t$ by simply modifying it to required would make the resulting MTS overspecified. For example, imagine the traversal of Cache's initial MTS from Figure 3.5 over $s_2 \xrightarrow{requestServerData}_p s_2$ for some SD. Modifying this transition to required would introduce a required self-loop in $s_2$ on requestServerData. The resulting MTS now incorrectly imposes that subsequent invocations of requestServerData must be supported, although the SD requires that only one such invocation is supported.

To address this issue, the destination state $s_2$ of a traversed transition $t$ is refined into new states $s'_2$ and $s''_2$. State $s''_2$ is the new destination of all of $s_2$'s incoming transitions labeled with the traversed event $op$, while $s'_2$ is the destination state for $s_2$'s remaining incoming transitions. Transition $t$ is then modified into a required transition and the next step in the MTS traversal is performed from $s''_2$. Refine in Figure 3.9 details these steps. Finally, after iterating over all of the scenarios, the final MTS supports all the behaviors defined in the sequence diagrams via required transitions, while remaining a strong refinement of the initial MTS (a formal proof can be found in Section 7.1.1.2).
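The state-splitting step can be sketched as follows. The function is a simplified, hypothetical rendering of the prose description above rather than a line-by-line port of Refine from Figure 3.9: transitions are kept in a dictionary that maps (source, event, destination) to a required flag, incoming transitions on the traversed event are redirected to the doubly primed state, and all other incoming transitions go to the singly primed state.

```python
def refine(transitions, src, op, dst):
    """Split dst into dst' and dst'' and make src --op--> dst'' required.

    transitions: dict mapping (source, event, destination) -> True if required.
    Returns the refined transition dict and the name of dst''.
    """
    d1, d2 = dst + "'", dst + "''"
    refined = {}
    for (a, event, b), is_required in transitions.items():
        # Outgoing transitions of dst are copied for both new states.
        sources = [d1, d2] if a == dst else [a]
        if b == dst:
            # Incoming transitions on the traversed event lead to dst'',
            # all remaining incoming transitions lead to dst'.
            target = d2 if event == op else d1
        else:
            target = b
        for s in sources:
            refined[(s, event, target)] = is_required
    # If the traversal starts in the state being split, continue from dst'.
    new_src = d1 if src == dst else src
    refined[(new_src, op, d2)] = True
    return refined, d2


# A fragment of Cache's initial MTS: both transitions are maybe transitions.
fragment = {
    ("s0", "requestCacheData", "s2"): False,
    ("s2", "requestServerData", "s2"): False,
}
refined, next_state = refine(fragment, "s0", "requestCacheData", "s2")
```

On this fragment the requestCacheData transition becomes required and now ends in s2'', while the requestServerData self-loop turns into maybe transitions from s2' and s2'' into s2', so a single required requestCacheData does not force repeated requestServerData invocations.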
Figure 3.10 shows the first two steps in the MTS refinement for Cache and the final MTS that is obtained after stepping through the whole $SD_{Cache}$ (Figure 3.7b). The bold parts in the first three MTSs show the traversed transition, its source and destination state, as well as the transition that was refined in the previous step. The enumerations in Cache's final MTS depict how Cache supports Scenario 1 from Figure 2.3 through required transitions. In the first step of Cache's MTS refinement, the transition $t$: $s_0 \xrightarrow{requestCacheData}_p s_2$ is traversed: the destination state $s_2$ is refined into $s''_2$, which has all of $s_2$'s incoming transitions defined on requestCacheData, and state $s'_2$, which has $s_2$'s incoming transitions defined on requestServerData. Subsequently, $t$ is modified to become a required transition. Note that the final MTS satisfies the constraints from Figure 2.3 and realizes the behavior described in Scenario 1. The final MTS also captures behavior that has yet to be decided, such as whether Cache can repeatedly invoke requestServerData.

3.5 Discovering Design Flaws

Besides producing component MTSs, the proposed algorithm's artifacts aid the discovery of potential design flaws, which are overlooked by the existing synthesis approaches. First, the SD annotation phase of the algorithm discovers all scenarios that cannot execute as specified. Second, analysis of the annotations on the component-level and system-level SDs created by the algorithm can suggest subtle design flaws that result from inconsistencies between component perspectives and the system perspective. Third, analysis of the generated MTSs of different components can suggest likely design flaws resulting from overly restrictive or overly permissive constraints. This section explores the origins and implications of each of these discrepancies, and outlines the solutions.
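The first kind of discrepancy, a scenario that cannot execute as specified, shows up as a pair of adjacent annotations that fail the consistency check of Definition 20. The sketch below is a minimal illustration with hypothetical function names; it covers only the conflict between the annotation after one event and the annotation before the next, and the annotation vectors in the example are illustrative values, not transcribed from a figure.

```python
UNDEFINED = "?"


def consistent(a, b):
    """Definition 20: vectors agree wherever both fields are defined."""
    return all(x == y for x, y in zip(a, b) if UNDEFINED not in (x, y))


def find_conflicts(annotated):
    """annotated: list of (before, event, after) triples in scenario order.

    Returns the pairs of adjacent events whose annotations clash.
    """
    conflicts = []
    for (_, first, after), (before, second, _) in zip(annotated, annotated[1:]):
        if not consistent(after, before):
            conflicts.append((first, second))
    return conflicts


# Vectors list requestPending and cached; the second field clashes.
scenario = [
    (("F", "?"), "requestCacheData", ("T", "F")),
    (("T", "T"), "requestServerData", ("T", "T")),
]
print(find_conflicts(scenario))
```

Reporting the offending pair of events, rather than only a yes/no answer, points the engineer at the place in the scenario where a constraint or an event ordering needs to be revisited.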
3.5.1 Scenario Cannot Execute as Specified

By modeling the behavior of the system in two different and complementary ways, namely, via scenarios and properties, the engineer is forced to truly understand system specifications and the behavior implied by those specifications. During the annotation propagation (Section 3.3), it is possible to discover discrepancies between the input scenarios and properties. Discrepancies arise when a scenario is supposed to exhibit behavior that is prohibited by some invariant. This can happen in two ways: (1) when the annotations after one and before the next scenario event are not consistent, and (2) when an event does not modify a significant variable $v$, but the annotation fields corresponding to $v$ before and after the invocation have different values.

Figure 3.11: Example scenarios with specification discrepancies. ((a) Web Cache scenario over the Client, Cache, and Server lifelines with events requestCacheData, dataUpdate, and requestServerData. (b) Web Server scenario over the Client, WebServer, and SubscribeServer lifelines with events responseData (post: requestPending = false) and removePermission (post: permission = false); the highlighted annotations are <T,T> for the system perspective and <T,F> for the component perspective.)

For example, Scenario 2 of the Web Cache system (Figure 2.3) contains a discrepancy depicted in Figure 3.11a. The discrepancy is unveiled during SD annotation from Cache's perspective: the value of cached in the annotation following requestCacheData is false, while it is true in the annotation before requestServerData. According to the scenario, Cache should request Server data after the dataUpdate event. However, Cache cannot observe that event and considers the data cached. Therefore, the requestServerData invocation would not occur, as cached = true conflicts with the precondition of requestServerData.

A conflict of this type has multiple possible causes and solutions. For example, one or more constraints may be overly restrictive (i.e., the scenario is valid, but a constraint prevents it).
In this case, the engineer should consider relaxing the constraint that prevents the scenario's execution. More importantly, the engineer should investigate the reasons why the constraint is not required to hold for the particular scenario: Are there special cases that require application of a different constraint set? Is the system in a different operating mode in which some constraints are irrelevant? For example, the problem in Figure 3.11a may be resolved by relaxing the constraint on requestServerData.

Another cause of this type of discrepancy is that the scenario is misspecified and one or more constraints correctly prevent its execution. In this case, the engineer should either correct the scenario or remove it. This problem can result if the scenario is lacking an event, performs prohibited invocations, or has events that occur out of order. The problem from Figure 3.11a may be resolved by adding to Cache a new event dataChanged, which updates cached to false whenever the data is updated.

3.5.2 System And Component Views Differ

There are cases when a scenario can execute under both the component- and system-level perspectives, but undesired behavior is still present due to internal component states that differ from the expected system state. For example, consider the Web Server system from Figure 3.11b, which resembles the Web Cache system. The WebServer component provides partial data to an unsubscribed Client, and full data if Client is subscribed. In the depicted scenario, SubscribeServer, which manages the subscriptions, removes Client's subscription, thus making a domain variable permission false. Because responseData does not have any preconditions, the scenario executes correctly. However, comparison of annotations in $SD_{WebServer}$ and $SD_{SYS}$ discovers that the fields corresponding to permission differ (highlighted in Figure 3.11b), which, in this case, unveils undesired behavior.
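The comparison between a component's annotated SD and the system-wide annotated SD can be sketched as a field-by-field check at each point of the scenario. The function name, the dictionary encoding of annotations, and the example values are assumptions made for illustration; only the idea that the permission fields differ after removePermission comes from the text.

```python
def out_of_sync(component_annotations, system_annotations, shared_variables):
    """Return (point, variable, component value, system value) for every
    scenario point where both views define a shared variable differently.

    Each annotation list holds one dict per point of the scenario;
    a missing or '?' entry means the value is undefined in that view.
    """
    differences = []
    pairs = zip(component_annotations, system_annotations)
    for point, (comp, system) in enumerate(pairs):
        for v in shared_variables:
            c, s = comp.get(v, "?"), system.get(v, "?")
            if "?" not in (c, s) and c != s:
                differences.append((point, v, c, s))
    return differences


# Hypothetical annotations before and after removePermission:
# WebServer never observes the event, so it still believes permission holds.
web_server_view = [{"permission": "T"}, {"permission": "T"}]
system_view = [{"permission": "T"}, {"permission": "F"}]
print(out_of_sync(web_server_view, system_view, ["permission"]))
```

Fields that are undefined in either view are skipped, so the check only flags points where both perspectives commit to a value and the values disagree.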
These types of issues can be discovered by comparing each component's annotated SD, which captures the component's internal states, with the system-wide annotated SD, which captures the expected system state at different points of the scenario execution. The comparison of component-level and system-wide annotations provides an automatic detection of which components are not "in sync" with the expected system state. These state inconsistencies can result from either of the following causes: (1) a scenario allows a domain variable to be legally modified via some invocation, but not all interested components are notified, hence their states become inconsistent with the system state; (2) a scenario allows a variable to be modified in a manner that is incompatible with the system-wide scenario; as a consequence, a component with an inconsistent state may perform an event that moves other components to an inconsistent state.

For such inconsistencies, the engineer should decide whether the specified behavior is indeed legal behavior. If not, the most common strategy for addressing these problems is to add an event to the scenario that notifies all the relevant components of the new variable value. Ultimately, the discovery of this issue can lead to substantial design modifications such as employing a publish-subscribe architecture to distribute updates.

3.5.3 Component-Level MTSs Differ

The final output of the synthesis algorithm is a set of component-level MTSs. The reachable states in a component MTS represent the valid component states. It is thus possible to compare the different components' MTS state sets to enumerate the valid value combinations for the significant variables that the components have in common. For example, if the initial value of the domain variable cached is true, then Cache's MTS would only have states in which cached = true, while Server's MTS would have states where cached = true and states where cached = false.
Such a discrepancy may (though it need not) indicate a design flaw and should be further analyzed.

There are two possible causes of such discrepancies. First, the system may be underconstrained so that certain undesired behaviors are not prevented and a component can end up in an illegal state. In this case, the engineer should modify existing constraints to make them more restrictive or add new constraints to explicitly disallow the behaviors leading to the illegal state. Second, a component may be overconstrained and unable to reach a desirable state. To address this issue, the engineer should relax the constraints that apply to the component or introduce new events that lead to the desired state.

Chapter 4
Component-Aware Triggered Scenarios

To address the limitations of the existing triggered-scenario languages (recall Section 1.1), this dissertation proposes a novel language: component-aware Triggered Scenarios (caTS). To illustrate the limitations of the existing languages and motivate caTS, consider the scenarios for the Customer Banking system of Section 2.4.2.

The AccountVerification uTS from Figure 2.4 tries to capture how a user's password is processed (Requirement set 1 of Section 2.4.2). As specified in Figure 2.4, however, AccountVerification does not define whether UI and ATM must wait for dispOptions and validPass, respectively, or may also permit other events, as suggested in requirements (2) and (3). The reason for the inability to specify these requirements under the system-level uTS semantics is that validPass and dispOptions become universally required only after verifyWithBank and passVerified occur; however, neither UI nor ATM can observe these events. Hence, a more effective way to specify and interpret behavior obligations and restrictions related to specific components is necessary.
To capture Requirement set 2 of Section 2.4.2, the initially specified RejectionBranch eTS from Figure 2.5 had to be augmented with non-trivial added effort (Figure 2.6). The scenario was extended to ensure that only the two legal branches are allowed (per requirement (1)), and an additional scenario, GeneralRejection, was specified with a property-based precondition of being in a verifying state (per requirement (2)). The added effort stemmed from inadequate capabilities of the existing languages: these languages have limited options for direct elaboration of a scenario to specify alternative enabled behaviors or to specify how scenario steps relate to component states.

To resolve the existing limitations, caTS inherits the concepts found in existing languages with several important additions. These additions empower an engineer to specify behavior constraints and obligations of system components in the context of a system scenario. First, the semantics of caTS are defined at the component level. Second, caTS permits varying event modalities in a single scenario and, in contrast to [47], is the first to allow modality mixing for a single event. Third, caTS introduces additional language constructs (context annotations and alternative event annotations) to model information about the components' states and enabled events. Using caTS, the problematic Customer Banking scenarios can be fixed by adding a small number of annotations to those scenarios (AccountVerification and RejectionBranch are re-specified as caTS in Figure 4.2 and Figure 4.3) without any other additional effort.

This chapter operationalizes caTS semantics with a synthesis algorithm that constructs a set of component-level MTSs from a caTS: one such model for each component represented in the caTS. A component MTS synthesized by the algorithm defines the set of all component implementations that satisfy the caTS specification.
The combination of caTS specification and MTS synthesis can serve as a useful tool for iterative specification refinement. The intent is to capture the basic system use-cases as a set of existential caTS and then to gradually elaborate them. It is then possible to merge [40] the MTSs synthesized from the different caTS, and compose [61] the component MTSs into a system MTS. In turn, an engineer can analyze the undefined MTS behavior to elaborate the existing caTS and elicit new ones.

This chapter introduces the syntax (Section 4.1) and semantics (Section 4.2) of component-aware Triggered Scenarios (caTS). Subsequently, the chapter presents a synthesis algorithm that creates an MTS for each component such that every implementation of a generated MTS satisfies the given caTS. Section 7.2 proves the correctness and completeness of the synthesis algorithm, provides a case-study-based analysis of the benefits of using caTS, and quantitatively analyzes the savings in modeling effort when using caTS.

4.1 caTS Syntax

The syntax of caTS builds on the features of basic sequence charts described in Section 2.1. Triggered scenarios in general, and caTS in particular, support events of different modalities. To reiterate, the visual syntax denotes (from weakest to strongest) the existential events with dotted arrows, branching events with dashed arrows, and the universal events with solid arrows. In addition to the constructs shared with existing languages, caTS has four novel constructs depicted in Figure 4.1. These constructs allow an engineer (1) to specify component-level event obligations, (2) to set the context of a scenario, and (3) to specify the alternatives to an event. Note that while these are new syntactic constructs, the semantic concepts they represent are similar to those found in existing specification languages, thus not requiring a major "paradigm shift".
caTS adapts these existing semantic concepts (e.g., assigning varying modalities to events, property-based annotations) in a way that enables the specification of behavioral obligations and restrictions on individual components.

Figure 4.1: caTS-specific scenario constructs (existential event, branching event, universal event, universal obligation, branching obligation, context annotation with a state label, alternative event annotation).

The analysis of the AccountVerification uTS from Figure 2.4 identified that validPass should be a universal event from Proxy's perspective, but a branching event from ATM's perspective. caTS supports specifying such diverging component-level obligations using the one-sided obligation construct, which can have a universal or branching modality (Figure 4.1). For example, AccountVerification revised as a caTS (Figure 4.2) assigns a universal validPass obligation to Proxy, denoted by the solid circle on Proxy's end of the validPass arrow, while ATM's obligation remains branching. A one-sided obligation may be used only on event arrows of strictly weaker modality. For example, the branching one-sided obligation may be used only on an existential event arrow.

Figure 4.2: AccountVerification as a caTS scenario.

Reasoning about a component's reactive behavior frequently relates to specific component states. For example, Bank's requirement from Requirement set 2 of Section 2.4.2 asserts that passIncorrect should be an enabled behavior for every verifying execution state (represented with the PassVerifying fluent defined in Figure 4.5). In other words, being in a PassVerifying component state is a trigger for the branching obligation to generate passIncorrect. To specify such triggering conditions, the locations along caTS
lifelines can be annotated with context annotations that specify propositional expressions on fluents (e.g., PassVerifying in Figure 4.3); this is in contrast to the existing notations such as eTS [84] that allow property-based trigger conditions only at the initial scenario location. Without this ability to assign a context annotation to the desired location, an engineer would have to specify an additional scenario with a single event and a precondition referring to the specific states (e.g., eTS GeneralRejection from Figure 2.6).

Any event defined for a component is a possible alternative to that component's existential or branching scenario event. This may be undesirable in practice when it is known that only some events are enabled at a certain point of a component's execution. To explicitly specify the legal alternatives in a compact and intuitive way, caTS supports alternative event annotations. For example, the caTS RejectionBranch from Figure 4.3 implies passVerified as the only legal alternative to passIncorrect from Proxy's viewpoint. Current solutions cannot restrict this elegantly as they require additional specifications, such as the preconditions discussed in Section 2.4.2 that would complement the RejectionBranch eTS from Figure 2.6.

Figure 4.3: RejectionBranch as a caTS scenario.

The definition of a caTS component instance includes the notions of event modality and caTS annotations:

Definition 22 (caTS Component Instance). A caTS component instance is a tuple $(L, \leq, \Sigma, \lambda, mod, \psi, alt)$ where:
- $(L, \leq, \Sigma, \lambda)$ is a basic component instance (Definition 1),
- $mod: L \to \{exi, brc, uni\}$ is a function that assigns the modality to a location,
- $\psi: L \to \{\emptyset\} \cup \Phi$ is a state annotation function with $\Phi$ as the set of fluent propositional logic formulas, and
- $alt: L \to \{\emptyset\} \cup 2^{\Sigma}$ is an alternative event annotation function that assigns a subset of the component's events to a location.
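As a rough illustration of Definition 22, a caTS component instance can be encoded as a small data structure. The sketch below is an assumption-laden example rather than part of caTS itself: the class names `Location` and `ComponentInstance`, the `problems` helper, the use of list order to stand for the location ordering, the string encoding of context annotations, and the modality assumed for verifyWithBank are all hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional

MODALITIES = {"exi", "brc", "uni"}  # modality names as listed in Definition 22


@dataclass(frozen=True)
class Location:
    event: str                                     # event associated with the location
    modality: str                                  # mod(l)
    context: Optional[str] = None                  # context annotation, None if absent
    alternatives: Optional[FrozenSet[str]] = None  # alt(l), None if absent


@dataclass
class ComponentInstance:
    name: str
    alphabet: FrozenSet[str]
    locations: List[Location]  # list order stands in for the location ordering

    def problems(self) -> List[str]:
        """Report locations that are not well-formed with respect to the alphabet."""
        found = []
        for i, loc in enumerate(self.locations, start=1):
            if loc.modality not in MODALITIES:
                found.append(f"l{i}: unknown modality {loc.modality!r}")
            if loc.event not in self.alphabet:
                found.append(f"l{i}: event {loc.event!r} not in alphabet")
            if loc.alternatives is not None and not loc.alternatives <= self.alphabet:
                found.append(f"l{i}: alternatives outside the alphabet")
        return found


# Bank's instance in RejectionBranch (Figure 4.3). The modality of
# verifyWithBank is an assumption; the text only fixes passIncorrect's.
bank = ComponentInstance(
    name="Bank",
    alphabet=frozenset(
        {"verifyWithBank", "passVerified", "passIncorrect", "verifySpecial", "retry"}
    ),
    locations=[
        Location("verifyWithBank", "exi"),
        Location("passIncorrect", "brc", context="PassVerifying"),
    ],
)
print(bank.problems())
```

Keeping the annotations optional mirrors the definition, in which both the state annotation and the alternative event set may be empty at a location.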
Informally, each location $l$ of a component instance has an associated event ($\lambda(l)$) with an assigned modality ($mod(l)$), an optional alternative event set ($alt(l)$), and an optional annotation with a propositional logic expression ($\psi(l)$). For example, Bank's instance in RejectionBranch (Figure 4.3) has two locations: $l_1$ before verifyWithBank and $l_2$ between verifyWithBank and passIncorrect. $l_2$ has an associated branching (brc) event $\lambda(l_2)$ = passIncorrect, and a context annotation $\psi(l_2)$ = PassVerifying.

4.2 caTS Semantics

This section gradually defines the caTS semantics. First, Section 4.2.1 overviews the semantics. Section 4.2.2 defines the obligations incurred by the event modalities. Section 4.2.3 formalizes the meaning of caTS annotations. Finally, Section 4.2.4 uses these elements to define the full caTS semantics. The semantics is defined via conditions that characterize whether a component implementation, described as an LTS (recall Section 2.3), satisfies a caTS.

4.2.1 caTS Semantic Ingredients

To express the behavior a component should exhibit and the conditions under which that behavior is exhibited, caTS offers three semantic ingredients: (1) specification of behavior obligations, (2) specification of the execution context that triggers the obligations, and (3) classification of alternatives as possible or undesired.

To comprehend the need for these caTS ingredients, consider the component LTSs depicted in Figure 4.4. The event names have been abbreviated for visual clarity, and correspond to those in Figures 4.2 and 4.3. These models are implementations of the component MTSs that were synthesized using the heuristic MTS synthesis algorithm [52] described in Chapter 3. These MTSs are synthesized from the scenarios depicted in Figure 2.4 and Figure 2.6, and the event preconditions from Figure 4.5. The LTSs depicted in Figure 4.4 are used to illustrate the potential issues; other implementations are possible.
While the LTSs from Figure 4.4 are correct implementations of the synthesized MTSs, due to the expressive and semantic limitations of the input scenarios, the synthesized MTSs (and, in turn, their LTS implementations) fail to capture relevant information already specified in the natural language requirements (Section 2.4.2). Note that some of the events in the LTSs, such as cancel, appeared only in the invariant specification (e.g., cancel's invariant is listed in Figure 4.5). Furthermore, on top of regular user verification, Bank also verifies corporate clients (verifySpecial event).

Figure 4.4: Example LTS models for Customer Banking components (ATM, Proxy, Bank).

The first ingredient of caTS semantics, provided via events of different modalities, allows an engineer to specify a component's desired reaction once a trigger is reached (e.g., ATM must display options after a positive Proxy response). The distinct features of caTS event modalities are that (1) the components sharing an event can have different modalities for that event, and (2) an engineer can assign modalities to events in a scenario as mandated by the requirements without any restrictions. In the example, ATM's LTS from Figure 4.4 does not comply with the caTS AccountVerification from Figure 4.2, even though the LTS was generated from the ostensibly analogous scenario of Figure 2.4. Specifically, the ATM can execute cancel from state $s_4$ despite the one-sided universal obligation to generate dispOptions at that point of execution. This one-sided obligation is consistent with the stakeholder intent specified with the natural language requirement (2) from Requirement set 1 of Section 2.4.2.
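The violation described above can be detected mechanically. The following Python sketch checks branching and universal obligations against an explicit LTS, taking the trigger to be the event sequence that precedes the obligated event on the component's lifeline. It is a simplified illustration under stated assumptions: the function names, the modality strings "branching" and "universal", and the transition table (a hypothetical approximation of ATM's LTS in Figure 4.4, whose exact transition set is not recoverable here) are all assumptions.

```python
from typing import Dict, Iterable, List, Set, Tuple

# LTS transitions: state -> list of (event label, target state).
Transitions = Dict[str, List[Tuple[str, str]]]


def run(trans: Transitions, starts: Iterable[str], word: List[str]) -> Set[str]:
    """States reachable from any state in `starts` by executing `word`."""
    current = set(starts)
    for label in word:
        current = {t for s in current for (lbl, t) in trans.get(s, []) if lbl == label}
    return current


def violations(trans: Transitions, trigger: List[str], event: str, modality: str) -> Set[str]:
    """States reached via `trigger` that break the obligation on `event`."""
    states = set(trans) | {t for outs in trans.values() for (_, t) in outs}
    bad = set()
    for s in run(trans, states, trigger):
        labels = {lbl for (lbl, _) in trans.get(s, [])}
        # Branching and universal obligations both require `event` to be enabled.
        if modality in ("branching", "universal") and event not in labels:
            bad.add(s)
        # A universal obligation additionally forbids every other event.
        if modality == "universal" and labels - {event}:
            bad.add(s)
    return bad


# Hypothetical approximation of ATM's LTS (abbreviated event names).
ATM: Transitions = {
    "s1": [("enterPass", "s2")],
    "s2": [("verifyAcc", "s3"), ("cancel", "s1")],
    "s3": [("validPass", "s4"), ("badPass", "s1"), ("cancel", "s1")],
    "s4": [("dispOptions", "s1"), ("cancel", "s1")],
}

print(violations(ATM, ["enterPass", "verifyAcc", "validPass"], "dispOptions", "universal"))
print(violations(ATM, ["enterPass", "verifyAcc"], "validPass", "branching"))
```

With this assumed transition table, the universal dispOptions obligation is reported as violated in $s_4$ because cancel is also enabled there, while the branching validPass obligation is satisfied in $s_3$.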
The second semantic ingredient, supported with context annotations, allows an engineer to describe the triggering context: the component states to which a certain event obligation applies. The default triggering context for an event with an obligation $mod(l)$ is the scenario sequence preceding $\lambda(l)$. By contrast, a context annotation explicitly ties event obligations to a specific set of component states (recall Definition 5), and thereby to their related LTS states.¹ For example, both $s_2$ and $s_3$ of Bank's LTS in Figure 4.4 should be able to execute passIncorrect per the RejectionBranch caTS (Figure 4.3) because the fluent PassVerifying evaluates to true in those states (see the fluent definition in Figure 4.5). Note that RejectionBranch is consistent with requirement (2) from Requirement set 2 of Section 2.4.2, which specifies PassVerifying as the triggering condition. However, Bank's LTS was generated from the eTS RejectionBranch of Figure 2.5, and since passIncorrect is not supported in $s_3$, it violates the original requirement. The desired behavior was lost during the refinement process because passIncorrect's invariant (Figure 4.5) was intended to express the obligation. However, an invariant can only forbid undesired behaviors and cannot require some desired behavior.

Figure 4.5: Initial preconditions for Customer Banking system.
    fluent Validating = <enterPass, {badPass, dispOptions, cancel}> init false
    fluent PassProcessing = <verifyAcc, {validPass, badPass}> init false
    fluent SimpleVerifying = <verWBank, {verifySpecial, passVfied, passIncorrect, retry}> init false
    fluent PassVerifying = <{verWBank, verifySpecial}, {passVfied, passIncorrect, retry}> init false
    preCondition(verifyWithBank) = PassProcessing
    preCondition(cancel) = Validating
    preCondition(passIncorrect) = PassVerifying

The third semantic ingredient of caTS, supported via alternative event annotations, defines the only legal alternatives to existential and existential branching events.
An alternative event annotation is thus semantically similar to having a set of negative scenarios: one for each event that does not appear in the annotation. Proxy's LTS from Figure 4.4 violates the RejectionBranch caTS from Figure 4.3, even though the LTS was obtained after refining a specification that includes the ostensibly analogous eTS scenario of Figure 2.5. Specifically, Proxy's state $s_3$ has a verifyWithBank transition, which contradicts passIncorrect's caTS annotation. The origin of this inconsistency was an overly permissive verifyWithBank precondition (Figure 4.5), which is a common occurrence in the case of the frequently incomplete invariant specifications.

¹ In general, the relationship between component states and LTS states is many-to-many. Given a component's LTS $L$, each of its states $L.s$ has a set of associated component states $CS$ to which it can map during component execution such that, for each $cs_i \in CS$, there exists a sequence from $L.s_0$ to $L.s$ that establishes the fluent evaluation $cs_i$ [25].

4.2.2 caTS Event Obligations

A purely existential event does not impose specific obligations on the behavior of a component. A branching event $e$ requires a component to be able to generate/receive $e$ whenever the trigger is reached. A component event's trigger in an annotation-free caTS (i.e., without state and alternative event annotations) is the sequence $\omega$ preceding $e$ on the component lifeline. A universal event $e$ requires a component to generate/exclusively accept $e$ whenever $e$'s trigger is reached. For example, AccountVerification in Figure 4.2 obliges ATM to exclusively generate dispOptions whenever it executes the sequence $\langle$enterPassword, verifyAccount, validPass$\rangle$. Event obligations are defined in a way that is intentionally similar to the existing scenario languages [27, 84].

Definition 23 (Annotation-free caTS Event Obligation).
Let LTS $C = (S, A, \Delta^r, \Delta^r, s_0)$ be a component implementation and $CI = (L, \leq, \Sigma, \lambda, mod, \psi, alt)$ be an annotation-free caTS component instance. $C$ satisfies an obligation assigned to a location $l$ and event $\lambda(l)$ if and only if:
1. $\forall s_i, s_j \in S: ((mod(l) = ebr) \wedge (s_i \xrightarrow{\omega} s_j)) \Rightarrow (s_j \xrightarrow{\lambda(l)})$,
2. $\forall s_i, s_j \in S: ((mod(l) = uni) \wedge (s_i \xrightarrow{\omega} s_j)) \Rightarrow ((s_j \xrightarrow{\lambda(l)}) \wedge (\forall lbl \in A \setminus \{\lambda(l)\}: (s_j \overset{lbl}{\nrightarrow})))$,
where $\omega \in \Sigma^*$ is the sequence that precedes $l$ in $CI$.²

² Assume $A = \Sigma$ for clarity of presentation; the full caTS semantics will be defined for the case in which $A$ and $\Sigma$ differ.

The above conditions define the structure of a component's LTS that satisfies branching and universal event obligations. Informally, an LTS state $s_j$ reachable via $\omega$ ($s_i \xrightarrow{\omega} s_j$), where $\omega$ is $\lambda(l)$'s trigger, must satisfy $\lambda(l)$'s obligation. When $\lambda(l)$ is a branching event, at least one of $s_j$'s transitions must be labeled with $\lambda(l)$. For example, ATM's state $s_3$ (Figure 4.4), reachable via $\langle$enterPassword, verifyAccount$\rangle$, satisfies the branching validPass obligation modeled in AccountVerification of Figure 4.2. When $\lambda(l)$'s modality is universal, every one of $s_j$'s transitions must be labeled with $\lambda(l)$. Hence, ATM's LTS violates the universal dispOptions obligation because it has a conflicting transition $s_4 \xrightarrow{cancel} s_1$.

4.2.3 Semantics of caTS Annotations

caTS annotations (1) generalize the triggering context of event obligations (context annotations) and (2) prune undesired behaviors (alternative event annotations).

As discussed above, a context annotation explicitly associates some of the event obligations to specific component states. A context annotation $\psi(l)$ asserts that event obligations specified after $l$ should be fulfilled starting with any implementation LTS state in which $\psi(l)$ can be satisfied. This is in contrast to stating that an event $\lambda(l)$'s obligation must be fulfilled only when the sequence preceding $\lambda(l)$ occurs. For example, the context
annotation PassVerifying in the RejectionBranch caTS (Figure 4.3) asserts that Bank should be able to generate passIncorrect from any PassVerifying state.

Definition 24 (caTS Triggering Context). Let $CI = (L, \leq, \Sigma, \lambda, mod, \psi, alt)$ be a caTS component instance. The triggering context of a location $l$ and its corresponding event $\lambda(l)$ (denoted $ctx(l)$) is a pair $(\phi, \varpi)$, where $\phi \in \Phi$ is a fluent propositional formula and $\varpi \in \Sigma^*$ is a word of event labels. $\phi$ and $\varpi$ are defined as follows:
- $ctx(l).\phi = \psi(l')$, where $l'$ is such that $(l' \leq l) \wedge (\forall l_a \in L \setminus \{l, l'\}: ((l' \leq l_a) \wedge (l_a \leq l)) \Rightarrow (\psi(l_a) = \emptyset))$,
- $ctx(l).\varpi$ is the scenario sequence between $l'$ and $l$ along the component lifeline.

The triggering context of an event $\lambda(l)$ consists of the property $\psi(l')$ established in the nearest context annotation, and the scenario sequence $\varpi$ between $l'$ and $l$. $\lambda(l)$'s event obligation activates whenever $\varpi$ is executed from a state in which $\psi(l')$ can be satisfied. The triggering context is incorporated into caTS semantics by replacing the statement $(s_i \xrightarrow{\omega} s_j)$ from the conditions in Definition 23 with $(\exists \omega' \in \Sigma^*: (s_0 \xrightarrow{\omega'} s_i) \wedge (\omega' \models ctx(l).\phi)) \wedge (s_i \xrightarrow{ctx(l).\varpi} s_j)$. The first clause in the new statement evaluates whether an LTS state $s_i$ satisfies the context annotation $ctx(l).\phi$; the second clause evaluates whether $s_i$ can reach $s_j$ via $ctx(l).\varpi$. As discussed in Section 4.2.1, Bank's states $s_2$ and $s_3$ in Figure 4.4 satisfy the triggering context of passIncorrect's obligation from Figure 4.3.

caTS semantics impose two additional restrictions on context annotations. First, a context annotation can only refer to "local" fluents observable by a component in order to prevent specification of obligations that a component cannot satisfy. Second, a context annotation $\psi(l)$ must be satisfied by the fluent evaluation obtained by executing the scenario from the scenario's start to $l$. This ensures that a component that satisfies a caTS with a context annotation always satisfies the same caTS without the context annotation.
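The second restriction can be checked by evaluating the fluents along the scenario prefix that leads to the annotated location. The Python sketch below illustrates this with the SimpleVerifying and PassVerifying fluents from Figure 4.5. It assumes the usual reading of a fluent (it becomes true on an initiating event and false on a terminating event); the function names and the encoding of a context annotation as a predicate over the fluent valuation are assumptions made for the example.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

# A fluent: (initiating events, terminating events, initial value).
Fluent = Tuple[FrozenSet[str], FrozenSet[str], bool]

# Fluent definitions copied from Figure 4.5 (abbreviated event names).
FLUENTS: Dict[str, Fluent] = {
    "SimpleVerifying": (
        frozenset({"verWBank"}),
        frozenset({"verifySpecial", "passVfied", "passIncorrect", "retry"}),
        False,
    ),
    "PassVerifying": (
        frozenset({"verWBank", "verifySpecial"}),
        frozenset({"passVfied", "passIncorrect", "retry"}),
        False,
    ),
}


def evaluate(fluents: Dict[str, Fluent], trace: List[str]) -> Dict[str, bool]:
    """Fluent valuation after executing `trace` from the initial state."""
    values = {name: init for name, (_, _, init) in fluents.items()}
    for event in trace:
        for name, (initiating, terminating, _) in fluents.items():
            if event in initiating:
                values[name] = True
            elif event in terminating:
                values[name] = False
    return values


def annotation_is_valid(
    fluents: Dict[str, Fluent],
    prefix: List[str],
    annotation: Callable[[Dict[str, bool]], bool],
) -> bool:
    """A context annotation must hold after the scenario prefix leading to it."""
    return annotation(evaluate(fluents, prefix))


# The PassVerifying annotation placed after verifyWithBank in RejectionBranch.
print(evaluate(FLUENTS, ["verWBank"]))
print(annotation_is_valid(FLUENTS, ["verWBank"], lambda v: v["PassVerifying"]))
```

Representing the annotation as a predicate keeps the sketch independent of any particular formula syntax; a full implementation would parse propositional formulas over fluents instead.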
For example, executing verifyWithBank in the context of RejectionBranch (Figure 4.3) yields the fluent evaluation $SimpleVerifying = T \wedge PassVerifying = T$; hence, the context annotation after verifyWithBank is valid.

The final ingredient of the caTS semantics interprets alternative event annotations. Formally, when $\lambda(l)$ has a defined alternative event set, each component state $s_j$ satisfying $\lambda(l)$'s triggering context must have outgoing transitions labeled exclusively with $\lambda(l)$ and its alternative events. In addition to their previously stated benefits, these annotations help to scope reasoning in the subsequent requirements elicitation. For example, annotating passVerified as the only alternative to passIncorrect in RejectionBranch from Figure 4.3 avoids extraneous questions about the legality of sending an additional verifyWithBank before receiving a response.

4.2.4 Complete caTS Semantics

The full caTS semantics incorporates all semantic ingredients and extends Definition 23 with additional clauses. For example, $s_i \xrightarrow{\omega} s_j$ is replaced with $s_i \xrightarrow{\sigma} s_j \wedge \sigma|_{\Sigma} = \omega$, where $|_{\Sigma}$ is event projection, to account for the possible differences between the scenario and component alphabets.

Definition 25 (caTS Component Instance Satisfaction). Let LTS $C = (S, A, \Delta^r, \Delta^r, s_0)$ be a component implementation, $A^*$ be the set of words defined on $A$, and $CI = (L, \leq, \Sigma, \lambda, mod, \psi, alt)$ be a caTS component instance. A component implementation $C$ is said to satisfy a caTS component instance $CI$ (denoted $C \models CI$) if and only if the following conditions hold for every location $l$ in $L$ and every state $s_i$, $s_j$ in $S$:
1. $[(mod(l) \in \{exi, ebr\}) \wedge (\exists \sigma_1 \in A^*: (s_0 \xrightarrow{\sigma_1} s_i) \wedge (\sigma_1|_{\Sigma} \models ctx(l).\phi)) \wedge (\exists \sigma_2 \in A^*: (s_i \xrightarrow{\sigma_2} s_j) \wedge (\sigma_2|_{\Sigma} = ctx(l).\varpi))] \Rightarrow [\forall \sigma_3 \in A^*: ((s_j \xrightarrow{\sigma_3}) \wedge (\sigma_3|_{\Sigma} \neq \emptyset)) \Rightarrow (start(\sigma_3|_{\Sigma}) \in alt(l) \cup \{\lambda(l)\})]$
2. $[(mod(l) = ebr) \wedge (\exists \sigma_1 \in A^*: (s_0 \xrightarrow{\sigma_1} s_i) \wedge (\sigma_1|_{\Sigma} \models ctx(l).\phi)) \wedge (\exists \sigma_2 \in A^*: (s_i \xrightarrow{\sigma_2} s_j) \wedge (\sigma_2|_{\Sigma} = ctx(l).\varpi))] \Rightarrow [\exists \sigma_3 \in A^*: (s_j \xrightarrow{\sigma_3}) \wedge (\sigma_3|_{\Sigma} = \lambda(l))]$
3. $[(mod(l) = uni) \wedge (\exists \sigma_1 \in A^*: (s_0 \xrightarrow{\sigma_1} s_i) \wedge (\sigma_1|_{\Sigma} \models ctx(l).\phi)) \wedge (\exists \sigma_2 \in A^*: (s_i \xrightarrow{\sigma_2} s_j) \wedge (\sigma_2|_{\Sigma} = ctx(l).\varpi))] \Rightarrow [(\exists \sigma_3 \in A^*: (s_j \xrightarrow{\sigma_3}) \wedge (\sigma_3|_{\Sigma} = \lambda(l))) \wedge (\forall \sigma_4 \in A^*: ((s_j \xrightarrow{\sigma_4}) \wedge (\sigma_4|_{\Sigma} \neq \emptyset)) \Rightarrow (start(\sigma_4|_{\Sigma}) = \lambda(l)))]$

The first condition above handles alternative events; the following two conditions define obligations stemming from existential branching and universal event modalities, respectively. The antecedent of each condition evaluates whether a state $s_j$ fulfills the triggering context. When the context is satisfied, the consequent evaluates the validity of event alternatives (condition (1) above), the satisfaction of existential branching obligations (condition (2)), and the satisfaction of universal obligations (condition (3)). A system satisfies a caTS scenario if its components satisfy the caTS component instances.

4.3 From caTS to Component MTS Models

A single caTS only partially specifies the system components' behavior. Hence, there can exist many component implementations that satisfy a caTS. To accurately characterize the set of LTS implementations that satisfy a given caTS, this section uses MTS as a compact target model for synthesis from caTS. These can, in turn, be transformed into an appropriate implementation through the notions of MTS refinement and implementation set introduced in Section 2.3.

The synthesis algorithm, specified in Figure 4.6, builds on the heuristic MTS synthesis algorithm from Chapter 3. The scenario interpretation heuristics described in Chapter 3 are semantically similar to translating the existential UML sequence diagrams into a special form of caTS: a caTS with strictly branching events and context annotations inferred from the event invariants. The primary differences between the two synthesis algorithms are that synthesis from caTS needs to handle the different event modalities and caTS annotations, while having to work without separately specified event invariants.
To illustrate the concepts behind the algorithm, consider ATM's MTS (Figure 4.7) obtained from the AccountVerification caTS (Figure 4.2). The four states in Figure 4.7 are intended to track the execution of AccountVerification: ATM's state $s_1$ denotes that the scenario is not executing, while $s_4$ tracks that AccountVerification's execution is in its last step ($s_4$ is only reachable via the sequence $\langle$enterPassword, verifyAccount, validPass$\rangle$). Such an "execution-tracking" MTS is needed to prohibit or require behaviors precisely as specified by the caTS. For example, the algorithm keeps only a single required dispOptions transition in ATM's state $s_4$ due to the universal obligation from AccountVerification; the state $s_3$ has a required validPass transition due to AccountVerification's branching obligation, while allowing all other events via maybe transitions.

Figure 4.6: Component-level MTS synthesis.

    GenerateSystemMTS(caTS Sc)
     1  for each instance CI in Sc
     2      MTS M_C = CreateInitialMTS(CI)
     3      for each sequence subSc in CI
     4          RefineByScenario(M_C, subSc)
     5  return M = M_C1 || ... || M_Cn

    RefineByScenario(MTS M_C, sequence subSc)
     1  set explore = ∅
     2  for each state M_C.s that satisfies subSc.preCond
     3      add M_C.s to explore
     4  for each state s in explore
     5      remove s from explore
     6      location l = subSc.initLoc
     7      while l ≠ ∅
     8          if ∃ s --λ(l)--> s'
     9              add new state s'' to M_C.S
    10              replace s --λ(l)--> s' with s --λ(l)--> s''
    11              for each s' --λ(l)--> t in M_C.Δ
    12                  add s'' --λ(l)--> t to M_C.Δ
    13              if alt(l) ≠ ∅ or mod(l) = uni
    14                  remove all s --lbl-->_p t such that lbl ∉ alt(l) ∪ {λ(l)}
    15              if mod(l) ∈ {ebr, uni}
    16                  change s --λ(l)--> s'' to required s --λ(l)-->_r s''
    17              if s' ∈ explore then add s'' to explore
    18              l = l.next, s = s''
    19          else if mod(l) ∈ {ebr, uni} then report error(l, s)

The method GenerateSystemMTS from Figure 4.6 defines the algorithm's high-level steps; RefineByScenario specifies how a component MTS is refined according to a caTS.
GenerateSystemMTS iterates through component instances to create an MTS for each component (lines 1-4). The method first creates an initial MTS (line 2) that allows every behavior in terms of maybe transitions, while tracking the values of fluents used in context annotations [44] to be able to identify states that satisfy those annotations. For example, ATM's initial MTS based on the AccountVerification caTS has a single state with maybe self-transitions on every ATM event. GenerateSystemMTS then splits a lifeline along context annotations, as permitted by the triggering context definition (recall Definition 24), and refines the MTS according to the obtained scenario subsequences (lines 3-4). A system-level MTS is finally created as a composition of component MTSs (line 5).

Figure 4.7: ATM's MTS for AccountVerification, with Alt = {verifyAcc, validPass, badPass, cancel, dispOptions}.

The method RefineByScenario iterates through MTS states (lines 4-19) in which a scenario may start executing; these are the states that satisfy the initial context annotation, if present, or otherwise every state. The method steps through the scenario (lines 7-19) and refines the MTS in relation to the caTS obligations. Beginning with a state $s$ from which the scenario may start, the algorithm takes the first event $\lambda(l)$, finds the transition $s \xrightarrow{\lambda(l)}_p s'$, and creates a new state $s''$ (line 9 in Figure 4.6). $s''$ is designated to track the scenario execution by changing the target state of $s \xrightarrow{\lambda(l)}_p s'$ from $s'$ to $s''$ (line 10). Subsequently, the method removes transitions or upgrades them to required according to event obligations (lines 13-16). The algorithm finally proceeds to $s''$ and handles the next scenario location (line 18).

The synthesis algorithm creates component-level MTS models whose implementation sets contain exactly those LTSs that satisfy a given caTS.
This is phrased as Theorem 4 and proven in Section 7.2.1. Once the component and system MTSs are synthesized, they can be utilized to elicit additional requirements, as suggested by existing research [39, 52, 91, 93]. Furthermore, multiple MTSs synthesized for the same component from multiple caTS can be merged into a single MTS [39, 40]. The composite system MTS can be used to validate or verify the correctness of the overall system behavior.

Chapter 5
Refinement Distribution Framework

The work presented in this chapter aims to support the interpretation and distribution of a new system-level requirement even if it at first produces a non-distributable underlying MTS. In particular, the chapter proposes a framework that interprets the refinements performed on a system model in response to a newly introduced requirement in terms of the refinements of the constituent components' models. The framework ensures the consistency of the system model and the component models. In doing so, it also prevents the requirement inconsistencies that may stem from the monolithic viewpoint.

To illustrate the types of inconsistencies that may arise without the framework, consider the Coffee Machine system from Section 2.4.3. In particular, the initial MTS specification of Coffee Machine (Figure 2.7) was refined into the MTS depicted in Figure 2.9 according to the new requirements from Figure 2.8. Although CoffeeMachineRefined correctly refines CoffeeMachine per Definition 8, it is incorrect with respect to the system's decomposition into two interacting components. Specifically, the second and third requirements conflict as they relate to the same behavior of ChargingStation: CoffeeMachine cannot prevent payment (putCard or putCash) before a drink is selected and simultaneously impose only cash payments for coffee because ChargingStation does not monitor the drink selection.
The existing algorithm for MTS decomposition [86] can detect that CoffeeMachineRefined is not a distributable MTS, but does not suggest how to make it distributable nor which requirements caused the inconsistency. By contrast, the aim of the framework is to continuously keep the system MTS and component MTSs "in sync" as new requirements are added, thus immediately detecting any inconsistencies.

The framework handles three different types of system MTS refinements. These refinement types have been defined after studying how a new system-level requirement (be it a scenario, a temporal property, or an invariant) changes a system MTS. For each refinement, the framework uses the mapping between the states in the system model and the states in the component models to identify the component states that need to be refined in response to refining the system states. An engineer using the framework does not need to be aware of the underlying models' refinements: the outcome of the refinement process can be captured as a set of additional requirements that ensure the overall consistency of the requirements specification.

To reason about the component states and transitions affected by a system-level refinement, the distribution framework uses the mapping between the system states and the components' states. This mapping stems from the parallel composition operator (Definition 10), which represents each system state as a tuple of component states. The notation $M.s_{M_i} = M_i.p$ is used to denote the mapping from a system state $M.s$ to a component $M_i$'s state $M_i.p$. Similarly, $M_i.p_M = \{M.s \mid M.s_{M_i} = M_i.p\}$ denotes the reverse mapping from a component state to a set of system states.

The framework has been evaluated to prove its soundness and correctness, as well as to assess its practical utility on a case study. The case-study experience suggests that the framework helps to correctly refine the MTSs, to avoid inconsistencies, and to analyze system behaviors.
The specific results are presented in Section 7.3. Section 5.1 defines the types of refinements incurred by the system requirements. Sections 5.2–5.4 define elements of the framework that handles such refinements.

5.1 Refinement Types

To reason about requirement-driven refinements, it is necessary to understand the structure of an MTS obtained from a requirement. The existing synthesis procedures [84, 91], including those from Chapters 3 and 4, produce MTSs in which each state is uniquely characterized by (1) the evaluation of the currently specified fluents, and (2) the scenario step tracked in the particular state. For example, the state $CoffeeMachineRefined.s'_3$ (Figure 2.9) has the CoffeeSel fluent set to true and tracks that the execution of CoffeeScenario (Figure 2.8) has reached its third step. Addition of a new requirement may refine an MTS (1) to distinguish between states with alternative fluent evaluations or to track execution of a scenario, and (2) to prohibit or require some of the previously maybe behaviors. Three atomic refinement types described below capture these refinement notions: transition refinement, execution-tracking state refinement, and fluent-based state cloning.

5.1.1 Transition Refinement

Transition refinement is performed when all of the fluents and scenario steps are tracked in a system MTS and it is only necessary to (1) require a maybe transition according to a scenario or (2) remove a maybe transition that violates a property, an invariant, or a universal scenario. For example, as part of the refinement of CoffeeMachineStep12 to CoffeeMachineRefined based on CoffeeScenario from Figure 2.8, the transition $CoffeeMachineRefined.s'_3 \xrightarrow{putCash}_r CoffeeMachineRefined.s'_6$ is required due to CoffeeScenario's universal obligation to require putCash.

Definition 26 (Transition Refinement). For MTS $M = (S, A, \Delta^{r}, \Delta^{p}, s_0)$, refining a transition $M.s \xrightarrow{l}_m M.s'$ produces an MTS $N = (S, A, \Delta^{r'}, \Delta^{p'}, s_0)$, where $\Delta^{r'} = \Delta^{r} \cup \{N.s \xrightarrow{l}_r N.s'\}$ when a transition is refined to required, or $\Delta^{p'} = \Delta^{p} \setminus \{N.s \xrightarrow{l}_p N.s'\}$ when a transition is prohibited.

5.1.2 Execution-Tracking State Refinement

Refining an MTS according to a scenario splits some of the MTS states to distinguish between the executions that follow the scenario and other possible executions. Therefore, it is possible to require or prohibit events only for those executions that follow the scenario (recall the refinement procedures from Chapters 3 and 4). For example, DispenseScenario from Figure 2.8 refines CoffeeMachineStep12 to CoffeeMachineRefined by refining the states $s_7$ and $s_8$ to track DispenseScenario's execution. These state refinements make it possible to require makeCoff and to prohibit makeCapp exclusively following the execution of the prechart in DispenseScenario (i.e., the initial, purely existential event subsequence of the scenario). Execution-tracking state refinement refines an MTS state $s$ to distinguish (1) transitions incoming to $s$ that follow the scenario sequence, and (2) transitions incoming to $s$ that follow other possible execution sequences.

Definition 27 (Execution-Tracking State Refinement). Refinement of a state $s$ in an MTS $M = (S, A, \Delta^{r}, \Delta^{p}, s_0)$ to track execution of $(p \xrightarrow{e} s)$ produces an MTS $M' = ((S \setminus \{s\}) \cup \{s', s''\}, A, \Delta^{r'}, \Delta^{p'}, s_0)$, where $\Delta^{r'}$ and $\Delta^{p'}$ satisfy the following rules (with $\gamma \in \{r, m, p\}$):

1. $(p = s) \Leftrightarrow (s' \xrightarrow{e}_\gamma s'') \in \Delta^{\gamma'}$
2. $(p \neq s) \Leftrightarrow (p \xrightarrow{e}_\gamma s'') \in \Delta^{\gamma'}$
3. $((q \xrightarrow{l}_\gamma s) \in \Delta^{\gamma} \wedge (l \neq e \vee q \neq p)) \Leftrightarrow (q \xrightarrow{l}_\gamma s') \in \Delta^{\gamma'}$
4. $((s \xrightarrow{l}_\gamma q) \in \Delta^{\gamma} \wedge (l \neq e \vee q \neq s)) \Leftrightarrow (s' \xrightarrow{l}_\gamma q) \in \Delta^{\gamma'}$
5. $(s \xrightarrow{l}_p q) \notin \Delta^{p} \Rightarrow (s'' \xrightarrow{l}_p q) \notin \Delta^{p'}$
6. $(s \xrightarrow{l}_r q) \in \Delta^{r} \Rightarrow (s'' \xrightarrow{l}_r q) \in \Delta^{r'}$
7. $\forall q \neq s, t \neq s: (q \xrightarrow{l}_\gamma t) \in \Delta^{\gamma} \Leftrightarrow (q \xrightarrow{l}_\gamma t) \in \Delta^{\gamma'}$

A state refinement redistributes the incoming transitions of the original state across the new states: one state $s''$ has the incoming transition corresponding to the scenario execution (rules (1) and (2) above), while the other state $s'$ has all other incoming transitions of the original state (rule (3)).
The execution-tracking state $s''$ satisfies the refinement relation from Definition 8, but may also require or prohibit some of the originally maybe transitions, as required by the scenario (rules (5) and (6)). For example, makeCoff is required and makeCapp is prohibited from $CoffeeMachineRefined.s'_8$ to satisfy the DispenseScenario uTS. An execution-tracking state refinement does not modify the transitions between other MTS states (rule (7)).

5.1.3 Fluent-Based State Cloning

Fluent-based state cloning is required whenever a new requirement introduces a previously unused fluent. The cloning splits the states of an MTS so that every state has a unique fluent evaluation. This enables subsequent refinements explicitly for states with a specific evaluation of the new fluent. For example, the preconditions in Figure 2.8 introduce two new fluents, CoffeeSel and CappSel. The states $s_3$ and $s_6$ of the CoffeeMachine MTS (Figure 2.7) are then split in the refined CoffeeMachineStep12 MTS to track the value of CoffeeSel and CappSel. By doing so, it is possible to prohibit the transition on chargeCapp only from the state in which CoffeeSel evaluates to true (state $CoffeeMachineRefined.s'_6$) in order to satisfy chargeCoff's precondition. At the same time, chargeCapp remains allowed in the states for which CoffeeSel is false (state $CoffeeMachineRefined.s''_6$).

Definition 28 (Fluent-Based State Cloning). For MTS $M = (S, A, \Delta^{r}, \Delta^{p}, s_0)$, cloning states based on a fluent $\varphi = (I_\varphi, T_\varphi, In_\varphi)$ produces an MTS $N = (S', A, \Delta^{r'}, \Delta^{p'}, s_{0,In_\varphi})$, where $S' = S_{true} \cup S_{false}$, while $\Delta^{r'}$ and $\Delta^{p'}$ satisfy the following rules (with $\gamma \in \{r, m, p\}$ and $\beta \in \{true, false\}$):

1. $((s_\beta \in S') \wedge (s \xrightarrow{l}_\gamma q \in \Delta^{\gamma}) \wedge (l \in I_\varphi)) \Leftrightarrow ((q_{true} \in S') \wedge (s_\beta \xrightarrow{l}_\gamma q_{true} \in \Delta^{\gamma'}))$
2. $((s_\beta \in S') \wedge (s \xrightarrow{l}_\gamma q \in \Delta^{\gamma}) \wedge (l \in T_\varphi)) \Leftrightarrow ((q_{false} \in S') \wedge (s_\beta \xrightarrow{l}_\gamma q_{false} \in \Delta^{\gamma'}))$
3. $((s_\beta \in S') \wedge (s \xrightarrow{l}_\gamma q \in \Delta^{\gamma}) \wedge (l \notin I_\varphi \cup T_\varphi)) \Leftrightarrow ((q_\beta \in S') \wedge (s_\beta \xrightarrow{l}_\gamma q_\beta \in \Delta^{\gamma'}))$

Intuitively, the cloning duplicates each MTS state $s$ into state $s_{true}$ with the fluent's evaluation true and state $s_{false}$ with the fluent's evaluation false.
A transition from a state with a fluent's evaluation true goes to a state with the fluent's evaluation false only for a fluent-terminating event. The converse case requires a fluent-initiating event. All other transitions do not change the fluent evaluation.

The following three sections describe the solutions for handling system-level transition refinement, execution-tracking state refinement, and fluent-based state cloning by propagating them to component-level refinements of matching types.

5.2 Distributing Transition Refinement

Each transition in a system MTS relates to a transition in one or more component MTSs, depending on whether the event is shared by multiple components. Requiring or prohibiting a system transition should be reflected by requiring or prohibiting transitions in certain component MTSs. Without doing so, costly inconsistencies may arise, such as the one that arose during refinement of the CoffeeMachine MTS (Figure 2.7) into the CoffeeMachineRefined MTS (Figure 2.9). To reiterate, the performed transition refinements (1) prohibited cash as a method of payment in the initial system state $s_1$, and (2) required cash payments for coffee. However, these refinements overlook the fact that ChargingStation cannot observe drink selection: prohibiting a putCash transition from the initial system state should effectively prohibit cash as a method of payment altogether.

Refining a maybe transition to required in a system model implies refining all of the related component maybe transitions to required. This stems from the parallel composition rules, which produce a required system transition only when the synchronized component transitions are required (Figure 2.2). In turn, each transition refinement to required in a component MTS may trigger the parallel composition rule RR (Figure 2.2) for some of the system MTS transitions. Therefore, once a transition refinement is propagated to the components, the resulting refinements are propagated back to the system MTS.
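The round trip described above (system transition to component transitions and back) can be illustrated with a minimal Python sketch. It is not the dissertation's implementation: the `Component` class, the dictionary encoding of transitions, and the two toy components with events `x` and `y` are all assumptions made for illustration. The only rule taken from the text is that a composite transition is required exactly when every participating component transition is required.

```python
MAYBE, REQUIRED = "maybe", "required"

class Component:
    """Hypothetical component MTS: an alphabet plus modal transitions."""
    def __init__(self, alphabet, transitions):
        self.alphabet = set(alphabet)
        # (source, label, target) -> MAYBE or REQUIRED
        self.transitions = dict(transitions)

def system_modality(components, src, label, dst):
    """Modality of the composite transition src --label--> dst, or None.

    System states are tuples of component states. Components that do not
    know the label must stay in place; the others must offer a matching
    transition.
    """
    modalities = []
    for comp, p, q in zip(components, src, dst):
        if label in comp.alphabet:
            modality = comp.transitions.get((p, label, q))
            if modality is None:
                return None
            modalities.append(modality)
        elif p != q:
            return None
    if not modalities:
        return None
    return REQUIRED if all(m == REQUIRED for m in modalities) else MAYBE

def refine_to_required(components, src, label, dst):
    """Require a system transition by requiring its component transitions."""
    if system_modality(components, src, label, dst) is None:
        raise ValueError("no such system transition")
    for comp, p, q in zip(components, src, dst):
        if label in comp.alphabet:
            comp.transitions[(p, label, q)] = REQUIRED

# Toy system: A owns event x, B owns event y (both names are illustrative).
comp_a = Component({"x"}, {(0, "x", 1): MAYBE})
comp_b = Component({"y"}, {(0, "y", 1): MAYBE})
system = [comp_a, comp_b]

sibling_before = system_modality(system, (0, 1), "x", (1, 1))
refine_to_required(system, (0, 0), "x", (1, 0))
sibling_after = system_modality(system, (0, 1), "x", (1, 1))
unrelated_after = system_modality(system, (0, 0), "y", (0, 1))
```

In this sketch the back-propagation is implicit: because system modalities are recomputed from the component models, requiring the transition from state (0, 0) also turns the sibling transition from state (0, 1) into a required one, while the transition on `y` stays maybe.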
RefineToRequired(tr $M.s \xrightarrow{l}_m M.s'$)
1  refine $M.s \xrightarrow{l}_m M.s'$ to $M.s \xrightarrow{l}_r M.s'$
2  for each MTS $M_i \in M.comp$
3      if $(M.s_{M_i} = M_i.p) \wedge (M.s'_{M_i} = M_i.p') \wedge (\exists\, M_i.p \xrightarrow{l}_m M_i.p')$
4          refine $M_i.p \xrightarrow{l}_m M_i.p'$ to $M_i.p \xrightarrow{l}_r M_i.p'$
5          for each $M.q \in M_i.p_M$, $M.q' \in M_i.p'_M$ s.t. $\exists (M.q \xrightarrow{l}_m M.q')$
6              if $\neg\exists M_j$ s.t. $(M.q_{M_j} \xrightarrow{l}_m M.q'_{M_j})$
7                  refine $M.q \xrightarrow{l}_m M.q'$ to $M.q \xrightarrow{l}_r M.q'$

RemoveFromComposite(tr $M.s \xrightarrow{l}_m M.s'$)
1  remove $M.s \xrightarrow{l}_m M.s'$
2  for each MTS $M_i \in M.comp$
3      if $(M.s_{M_i} = M_i.p) \wedge (M.s'_{M_i} = M_i.p') \wedge (\exists\, M_i.p \xrightarrow{l}_m M_i.p')$
4          add(candidateRef, $M_i.p \xrightarrow{l}_m M_i.p'$)
5  for each tr $M_i.p \xrightarrow{l}_m M_i.p' \in$ candidateRef
6      if size(candidateRef) = 1 or controls($M_i$, $l$)
7          remove $M_i.p \xrightarrow{l}_m M_i.p'$
8          for each $M.q \in M_i.p_M$, $M.q' \in M_i.p'_M$ s.t. $\exists (M.q \xrightarrow{l}_m M.q')$
9              remove $M.q \xrightarrow{l}_m M.q'$

Figure 5.1: Handling a transition refinement in a system MTS.

Algorithm RefineToRequired in Figure 5.1 outlines the solution to requiring a transition in a system MTS $M$. Let $t = M.s \xrightarrow{l}_m M.s'$ be a transition that is refined; initially, $t$ is modified to required (line 1 in RefineToRequired). In the subsequent steps, RefineToRequired iterates through the component MTSs (lines 2–7) to find the transitions that produce $t$ (condition in line 3). All such transitions must become required (line 4). After a transition in a component MTS is refined, the algorithm explores whether additional maybe transitions in the system model should become required (lines 5–7).

[Figure 5.2: A negative scenario incurred by refining CoffeeMachine.]

Prohibiting a system's maybe transition is realized by prohibiting a maybe transition in a component model. The algorithm RemoveFromComposite outlines this procedure with conceptual similarities to RefineToRequired. RemoveFromComposite first detects the components whose transition may be affected by the refinement (lines 2–4). If only one component with a maybe transition to be refined is identified (the first condition in line 6), that maybe transition is prohibited. Otherwise, based on the notion that each component is responsible for certain events, the solution prohibits a transition in the component that controls the event of the removed transition (the second condition in line 6 of RemoveFromComposite).

As an example of applying the solution, consider prohibiting CoffeeMachine's transition $s_1 \xrightarrow{putCash}_m s_4$. RemoveFromComposite propagates this refinement to ChargingStation, which incurs additional modifications to the behavior of CoffeeMachine. One of these modifications is depicted as a negative scenario discarding cash as a method of payment for coffee (Figure 5.2). Importantly, by propagating the refinement to the component level, the framework prevents the otherwise possible incorrect refinement of CoffeeMachine to CoffeeMachineRefined described above.

5.3 Distributing Execution-Tracking State Refinement

Refining a state in a system model implies that some states in the component models should consequently be refined. Without performing the matching refinements at the component level, it would remain unclear whether the requirement is realizable as specified, while costly inconsistencies may be introduced. The solution proposed in this section identifies the component states to be refined, and performs transition refinements to require or prohibit behaviors as defined in the scenario.

The candidate components that may be affected by the system refinement are automatically identified based on the event whose execution is being tracked. For example, the state $CoffeeMachineStep12.s_7$ from Figure 2.9 is refined to track the execution of $CoffeeMachineStep12.s'_6 \xrightarrow{chargeCoff}_m CoffeeMachineStep12.s_7$ according to the DispenseScenario uTS from Figure 2.8. The solution identifies that this refinement should be reflected in both component state $DrinkSelection.s_4$ and component state $ChargingStation.s_3$.
Analyzing an execution-tracking state refinement in isolation suggests that the refinement should be propagated to all of the components that share the tracked event. However, as discussed next, this is not true in general.

RefineSysState(MTS $M$, tr $M.q \xrightarrow{curr.name} M.s$, set changes, ts Scen, ev curr)
1   for each MTS $M_i$ in curr.components
2       if not lastEvent($M_i$, Scen, curr)
3           add(toRefine, $M_i$)
4   for each MTS $M_i$ in toRefine
5       $M_i.p = M.s_{M_i}$, $M_i.n = M.q_{M_i}$
6       refine $M_i.p$ to track $(M.n \xrightarrow{curr.name} M.p)$ into $M.p'$, $M.p''$
7       for each $(M.r \in M_i.p.sys,\ M.r' \in M_i.n_M)$ s.t. $\exists (M.r' \xrightarrow{curr.name}_m M.r)$
8           refine $M.r$ to track $(M.r' \xrightarrow{curr.name} M.r)$ into $M.r'$, $M.r''$
9   for each change ch in changes
10      if ch.type = require then RefineToRequired(ch.tran)
11      else if ch.type = remove then RemoveFromComposite(ch.tran)

Figure 5.3: Handling execution-tracking state refinement in a system MTS.

Refining a component's state to track a scenario step depends on whether that component participates in the remainder of the scenario. In particular, a component's state need not be refined when that component's role in the scenario ends with the current step. This occurs when refining $CoffeeMachineStep12.s_8$ to $CoffeeMachineRefined.s'_8$ and $CoffeeMachineRefined.s''_8$ to track the execution of confirm from $CoffeeMachineRefined.s'_7$. In this case, ChargingStation's state is not refined to track the execution of confirm because confirm is ChargingStation's final event in DispenseScenario from Figure 2.8. In contrast, an execution-tracking state refinement is performed for those components that participate in the subsequent steps of the scenario. For this reason, the refinement of $CoffeeMachineStep12.s_8$ to track DispenseScenario's event confirm is propagated to DrinkSelection. Once the identified component states are refined, the framework refines the transitions outgoing from the execution-tracking system state using the solution from Section 5.2.
Such a refinement is performed to require makeCoff from the refined state $CoffeeMachineRefined.s'_8$.

The algorithm RefineSysState, depicted in Figure 5.3, details the solution. Initially, RefineSysState determines the components to be refined (lines 1–3). RefineSysState then propagates the desired system state refinement to the component level by performing component state refinements (lines 5–6). These component-level refinements split the component state to track the current scenario event curr. The component state refinement is propagated back to the system model by refining the related system states (lines 7–8). Finally, RefineSysState performs the transition refinements (lines 9–11) as implied by the scenario.

To validate the system's behavior after the introduction of a new scenario, an engineer should analyze either the MTSs produced using RefineSysState or the emergent positive or negative scenarios. The purpose of this validation is not to ensure that the scenario was properly mapped to the component level, but to analyze whether the requirement was appropriate. This risk is particularly prominent for universal scenarios [84], which prohibit every alternative to the main chart once the prechart occurs. In certain instances, such a strong requirement results in deadlocked components, in order to prevent any scenario-violating event. Once such undesired emergent restrictions are detected using the obtained MTSs, the original requirement should be revised or discarded.

5.4 Distributing Fluent-Based State Cloning

Cloning the system MTS states to track a newly specified fluent should intuitively be reflected in cloning the states of some component MTSs to track the component-level "versions" of the fluent. If a component shares all of the fluent's initiating and terminating events, the component's states can be directly cloned to track the fluent.
By contrast, if no component shares all of the fluent's initiating and terminating events, it may be impossible to require or prohibit behaviors exclusively for a selected (true or false) fluent evaluation. This, however, contradicts the purpose of specifying the fluent.

As an example, consider an extension of the Coffee Machine system (Section 4.2.4): the revised system reports to a central server, via ChargingStation's new highFunds event, that the cash container should be emptied. The new event is added to the CoffeeMachine's MTS as a self-transition on highFunds in every state. Additionally, a new fluent Transaction, initiated by putCash or putCard and terminated by makeCoff or makeCapp, is specified and used to prohibit highFunds during ongoing transactions. A portion of the system MTS, obtained via refinements of the CoffeeMachine MTS according to the new requirement, is depicted in Figure 5.4. The dashed lines denote the prohibited transitions and the gray states denote that Transaction = true in those states.

[Figure 5.4: Part of the Transaction-tracking refined CoffeeMachine.]

While this MTS refinement is technically sound, it is inconsistent with respect to the system components. In particular, ChargingStation cannot support highFunds in $CoffeeMachine.s_1$ and, at the same time, prohibit highFunds in $CoffeeMachine.s'_8$. This is because ChargingStation cannot track the current value of the fluent: despite being able to observe the Transaction-initiating events putCash and putCard, it cannot observe the Transaction-terminating events makeCoff and makeCapp.

More generally, a fluent whose initiating and terminating events are not included in any one component's alphabet can be considered unenforceable.
This stems from the notion that, in such a case, no one component's event can be required or prohibited exclusively for a selected fluent evaluation: the fluent evaluation can change by the fluent's initiating and terminating events that are not a part of a component's alphabet. Hence, an unenforceable fluent cannot be used to specify accurate restrictions on the system behaviors. To avoid unenforceable fluents altogether, the refinement distribution framework permits only fluents whose initiating and terminating events belong to one component.

To revise a system fluent that cannot be mapped appropriately to the component MTSs, the framework suggests potential new fluents, initiated and terminated by a single component's events, to replace it, based on a set of heuristics. To propose candidate new fluents, the framework considers that an event that affects the fluent's value, but is not in a component's alphabet, is often followed by or preceded by an event that initiates or terminates the implicit, unspecified "local" fluent of that component.

If a component $M_i$ shares an unenforceable fluent's terminating events, the framework proposes a new fluent local to that component. The terminating events of this new fluent are the same as those of the original fluent. The initiating events, however, need to be selected by an engineer from the following heuristically identified events: (1) $M_i$'s events that follow the initiating event of the original fluent in some scenario and (2) $M_i$'s events that follow the initiating event of the original fluent in the system MTS.

The framework similarly proposes a new fluent for each component $M_i$ that shares an unenforceable fluent's initiating events. The distinction compared to the above case is that it is now necessary to identify the candidate terminating events from: (1) $M_i$'s events that precede the terminating event of the original fluent in some scenario and (2) $M_i$'s events that precede the terminating event of the original fluent in the system MTS.
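The enforceability condition and the scenario-based part of the heuristic can be sketched in a few lines of Python. This is an illustrative reading, not the framework's implementation: the function names, the dictionary encoding of a fluent, the component alphabets, and the event order assumed for DispenseScenario are all assumptions. "Precedes" is interpreted here as the nearest earlier event in the component's alphabet, and the MTS-based heuristic (2) is not covered.

```python
def enforcing_components(fluent, alphabets):
    """Names of components whose alphabet holds every fluent event."""
    events = fluent["initiating"] | fluent["terminating"]
    return sorted(name for name, alphabet in alphabets.items()
                  if events <= alphabet)

def candidate_terminators(fluent, alphabet, scenario):
    """Component events that most closely precede a terminating event.

    One possible reading of heuristic (1): for each occurrence of a
    terminating event in the scenario, pick the nearest earlier event
    that the component can observe.
    """
    candidates = set()
    for index, event in enumerate(scenario):
        if event in fluent["terminating"]:
            for earlier in reversed(scenario[:index]):
                if earlier in alphabet:
                    candidates.add(earlier)
                    break
    return candidates

# Transaction fluent as described in the text.
transaction = {
    "initiating": {"putCash", "putCard"},
    "terminating": {"makeCoff", "makeCapp"},
}

# Assumed alphabets; the source does not list them in full.
alphabets = {
    "ChargingStation": {"putCash", "putCard", "chargeCoff", "chargeCapp",
                        "confirm", "highFunds"},
    "DrinkSelection": {"select", "chargeCoff", "chargeCapp", "confirm",
                       "makeCoff", "makeCapp"},
}

# Assumed event order of DispenseScenario.
dispense_scenario = ["chargeCoff", "confirm", "makeCoff"]

enforcers = enforcing_components(transaction, alphabets)
proposed = candidate_terminators(
    transaction, alphabets["ChargingStation"], dispense_scenario)
local_fluent = {"initiating": transaction["initiating"],
                "terminating": proposed}
local_enforcers = enforcing_components(local_fluent, alphabets)
```

Under these assumptions no component can enforce Transaction, while the proposed local fluent, terminated by confirm, falls entirely within ChargingStation's alphabet. An engineer would still need to judge whether the proposed event captures the intended meaning of the fluent.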
To illustrate how the refinement distribution framework handles fluents, consider the Transaction fluent that could not be tracked by ChargingStation because makeCoff and makeCapp are not in ChargingStation's alphabet. Both makeCoff and makeCapp are preceded by confirm in the CoffeeMachine MTS from Figure 2.7, and confirm precedes makeCoff in DispenseScenario from Figure 2.8. The framework thus proposes a revised fluent, referred to as $Transaction_{ChargingStation}$, that has putCash and putCard as its initiating events and confirm as its candidate terminating event, as identified by the above heuristics. The revised fluent is then used to refine the ChargingStation MTS appropriately. Note that an engineer need not accept the proposed fluents: the proposed fluents can be discarded or modified based on the desired system behaviors. For example, chargeCoff and chargeCapp would be specified as the terminating events for $Transaction_{ChargingStation}$ if the stakeholders consider a transaction to be complete once the charge for a beverage is committed.

Chapter 6 Trace-Enhanced MTS Inference

A promising way to enable checking consistency between a software library's implementation and its original requirements and design is to automatically mine accurate specifications from existing uses of those libraries [106]. Such a specification should, for example, assert that communication libraries (e.g., Apache's [3] libraries) require a connection to be established before data is sent. Numerous techniques have been proposed to that end [7, 8, 20, 23, 36, 41, 66, 67, 81, 103]. These techniques either (1) infer finite state machine models that match the observed executions [7, 8, 19, 41, 66, 80, 100], or (2) identify the declarative class and method invariants by considering a library's state (i.e., its internal variables) [20, 36, 102].
While the artifacts produced by these techniques can assist development tasks (e.g., test case generation [22, 67] and debugging [8]), their utility critically depends on how close they are to the "true model." This, in turn, depends on the quality of the input executions, which are inherently partial. It has been observed previously that techniques that infer LTSs from observed executions suffer from inaccuracies [64, 78, 79]. On the other hand, with one notable exception [29], techniques that identify invariants capture restrictions on sequences of method invocations only implicitly. Moreover, automated verification of compliance with requirements and design or protocol enforcement techniques that rely on these models would likely report many false positives.

Recent research results [22, 66, 67] provide preliminary evidence that some of the existing inaccuracies may be circumvented by combining the different (state vs. sequence) types of execution information. The previous efforts have combined this information in a limited manner. The state-of-the-art technique [67] constructs LTS-based protocol descriptions by combining (1) LTS inference from observed executions and (2) invariants over the library's internal state. However, this technique considers only limited subsets of executions. The work presented in this chapter relies on the hypothesis that other unexplored ways of combining state and sequence information have the potential to result in more-complete models, in terms of the supported valid protocol sequences, and richer models, with respect to the types of the captured information.

This chapter focuses on a strategy that augments program state information with observed execution evidence. To this end, the chapter proposes trace-enhanced MTS inference (TEMI), an algorithm that infers an MTS that includes (1) the behavior observed in the traces and (2) the behavior asserted legal by the invariants. TEMI consists of two phases.
The first phase constructs an MTS with only maybe transitions, capturing all invocation sequences of an object's interface allowed by the invariants. This model is referred to as an invariant-based MTS. The second phase promotes transitions observed in the traces from maybe to required. The remaining maybe transitions stem from generalizations performed during invariant inference. The maybe vs. required dichotomy is particularly useful when working with a limited set of invocation traces, which may result in imperfect invariants. TEMI is conceptually similar to the heuristic MTS synthesis algorithm from Chapter 3. The main differences are that TEMI considers richer invariant types that include non-Boolean variables, and incorporates a different refinement strategy to account for runtime traces that are denser than scenario requirements.

The evaluation results, which are presented in Section 7.4, confirm that TEMI models have significantly improved quality compared to models generated using the existing state-of-the-art techniques. The remainder of the chapter is organized as follows. Section 6.1 overviews the information available at runtime and the basic algorithm for synthesizing runtime models. Sections 6.2 and 6.3 detail the two TEMI phases. Section 6.4 discusses additional enhancements to the algorithm and the generated models.

6.1 Preliminaries

This section first discusses how the widely used k-tail algorithm infers an MTS from invocation traces (Section 6.1.1). Then, it discusses dynamic inference of program invariants (Section 6.1.2). Finally, it defines program state in terms of the predicates that appear in the invariants (Section 6.1.3).

6.1.1 Invocation Traces and the k-tail Algorithm

The k-tail algorithm [10] has been widely used as the basis of the existing state of the art in dynamic model inference [19, 64–67, 80, 100]. The input to k-tail is a set of observed execution traces.
An invocation trace, a runtime recording of method invocations and internal data values between those invocations, can be represented by an MTS, with states corresponding to variable values and transition labels composed of the method name, input values, and return value.

The k-tail algorithm concisely captures a library's API protocol by merging invocation trace states to create an LTS. The algorithm merges every pair of states with identical sequences of the next $k$ invocations (hence "k-tail"). k-tail looks only at the invocation sequences and does not consider the state information. Selecting an appropriate $k$ typically involves a tradeoff between precision (smaller $k$ implies more spurious merges due to the limited scope) and completeness (larger $k$ implies fewer merges, and less generalization) of the generated model.

To illustrate the k-tail algorithm, and two of its shortcomings, consider five StackAr invocation traces, corresponding to creating and using stacks of different capacities (Figure 2.10). Consider how 2-tail, which considers the method return values, works on these traces. The algorithm correctly merges states $S_1$ and $S_6$ in Trace 1 because the two following invocations from each state are isEmpty()=true and top()=null. The algorithm also merges states $S_1$ in Trace 1 and $S_{11}$ in Trace 2. However, this merge is incorrect as it allows a non-zero-capacity stack to change capacity to zero after an invocation of isFull() from $S_{10}$ to $S_{11}$ in Trace 2. The k-tail models are also incomplete for those traces in which pairs of methods happen, but are not actually required, to be invoked in a specific order (e.g., top() is always invoked after isEmpty() in Figure 2.10). The existing k-tail-based techniques (e.g., [66, 67, 80]) all suffer from these types of limitations.

6.1.2 Program Invariants

Program invariants are mathematical properties that relate program variable values. Invariants hold true at certain program execution points.
For example, an object-level invariant, such as $size \geq 0$, holds at all program points. While developers can manually specify program invariants in the code or in other documentation [29, 77], static and dynamic analyses can automatically infer invariants. Such inferred invariants are thus reflective of the actual program implementation. TEMI uses the Daikon dynamic invariant detector [36] to observe data values of program executions and infer invariants that hold over all observed executions. The inferred invariants consist of method pre- and postconditions and object invariants. For example, Figure 2.11 illustrates the invariants inferred for the StackAr library introduced in Section 2.4.4.

Contractor [29] creates LTS models exclusively based on program invariants. Contractor uniquely characterizes the model's states by a combination of methods enabled in each state. This abstraction is thus referred to as enabledness abstraction. A transition on a method exists between two states if that method's full precondition is satisfiable in the source state, and its postcondition is satisfiable in the target state.

6.1.3 Program State

A program's concrete state can be defined by the values of the program's variables at a given snapshot in the program's execution. However, for non-trivial programs, there exist intractably many concrete states.
Therefore, TEMI considers abstract program states [23, 109], which are defined with first-order predicates over program variable values. Different program states correspond to different combinations of predicate evaluations. TEMI uses those predicates that appear as clauses in method preconditions. This abstraction yielded a tractable number of states in practice. For StackAr, Daikon reports preconditions with four predicates: $P_1 = (topOfStack \geq -1)$, $P_2 = (topOfStack \geq 0)$, $P_3 = (topOfStack < size(theArray) - 1)$, and $P_4 = (topOfStack \leq size(theArray) - 1)$. This results in $2^4 = 16$ possible program states, plus one initial state. However, the number of actual program states is generally much smaller due to invalid predicate combinations. For example, $P_2$ cannot be true when $P_1$ is false.

GenerateInvariantMTS(set pred, inv-set invariants)
1   MTS invariantMTS, set toProcess, isProcessed = $\emptyset$
2   set predCombinations = Combine(pred)
3   for each combination $\in$ predCombinations
4       if SMT-isConsistent(combination)
5           add combination to invariantMTS.states
6   add invariantMTS.initSt to toProcess
7   while toProcess $\neq \emptyset$
8       currentSt = toProcess.pop
9       add currentSt to isProcessed
10      for each methodInv $\in$ invariants
11          if SMT-isConsistent(methodInv.pre $\wedge$ currentSt)
12              for each targetSt $\in$ invariantMTS.states
13                  if SMT-isConsistent(methodInv.post $\wedge$ targetSt)
14                      if targetSt $\notin$ isProcessed add targetState to toProcess
15                      add $currentSt \xrightarrow{methodInv.name}_m targetSt$ to invariantMTS.transitions
16  return invariantMTS

Figure 6.1: Construction of an invariant-based MTS.

6.2 Phase I: Synthesis of the Invariant-Based MTS

The method GenerateInvariantMTS in Figure 6.1 synthesizes an invariant-based MTS. This MTS describes all invocation sequences that do not violate the methods' inferred invariants. GenerateInvariantMTS starts by constructing the prospective state space invariantMTS.states (lines 3–5) based on the set pred of predicates from method preconditions.
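The pruning of invalid predicate combinations can be illustrated with a small Python sketch. It deliberately swaps the SMT consistency check for a bounded brute-force search for witness values, which is only adequate for a toy example of this size. The predicate definitions follow the four StackAr predicates listed above; the function names, the search bound, and the object invariant `stack_invariant` are assumptions made for illustration and are not taken from the Daikon output.

```python
from itertools import product

# The four precondition predicates, over (topOfStack, size(theArray)).
PREDICATES = (
    ("P1", lambda top, size: top >= -1),
    ("P2", lambda top, size: top >= 0),
    ("P3", lambda top, size: top < size - 1),
    ("P4", lambda top, size: top <= size - 1),
)

def satisfiable_valuations(constraint=lambda top, size: True, bound=6):
    """Predicate valuations that have at least one witness within the bound.

    Stand-in for an SMT satisfiability check: a valuation is kept when
    some (top, size) pair in the searched range produces it.
    """
    found = set()
    for top, size in product(range(-bound, bound + 1), range(bound + 1)):
        if constraint(top, size):
            found.add(tuple(pred(top, size) for _, pred in PREDICATES))
    return found

def stack_invariant(top, size):
    """Assumed object invariant: the top index stays within the array."""
    return -1 <= top <= size - 1

unconstrained = satisfiable_valuations()
with_invariant = satisfiable_valuations(stack_invariant)
```

In this sketch only 6 of the 16 valuations have a witness, and none of them sets $P_2$ to true while $P_1$ is false. Adding the assumed object invariant leaves 4 valuations. A bounded search can miss witnesses outside the searched range, which is one reason a solver is the more robust choice for real invariants.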
For each possible combination of the predicate evaluations (line 2), GenerateInvariantMTS uses the Yices Satisfiability Modulo Theories (SMT) solver [108] to check whether that combination is legal (line 4). For StackAr, Daikon inferred four predicates (recall Section 6.1.3); Yices rejects as unsatisfiable every predicate combination with $P_1 = \mathit{false}$ and $P_2 = \mathit{true}$.

After determining the valid states, GenerateInvariantMTS creates transitions between those states (the loop in lines 7–15). Each transition added in line 15 has a source state that satisfies the appropriate method preconditions and a destination state that satisfies the postconditions. Yices (via the SMT-isConsistent calls on lines 11 and 13) determines the satisfiability of the method invariants.

Figure 6.2 depicts the invariant-based MTS for StackAr. Although the largest theoretical state space for StackAr is $1 + 2^4 = 17$ states, the generated invariant-based MTS has only 5 states, including an initial state. There are several self-transitions that capture methods that do not change the program state. By contrast, the k-tail-based algorithms implicitly consider every method to be state changing.

Figure 6.2: The invariant-based StackAr MTS.

TEMI's construction of the invariant-based MTS is conceptually similar to Contractor's algorithm [29] as it uses a predicate-based abstraction of the program state. In contrast to Contractor, whose predicates are the full method preconditions, TEMI uses the individual clauses that appear in the invariants. The reason behind choosing this finer-grained abstraction is that the automatically inferred postconditions tend to be more complex than the manually written ones [77]. For example, consider a method postcondition that consists of a set of implication clauses that relate the program state before and after a method invocation: $[(PreState_1) \Rightarrow (PostState_1)] \wedge \ldots \wedge [(PreState_n) \Rightarrow (PostState_n)]$. The states in TEMI's invariant-based MTS would correspond to the different states $PreState_1, \ldots, PreState_n$, and each state would have a transition to its appropriate next state $PostState_1, \ldots, PostState_n$. On the other hand, Contractor's model would have a single state corresponding to all the pre-states with non-deterministic transitions to the post-states, thus resulting in a less precise model.

6.3 Phase II: Refining the Invariant-Based MTS

TEMI uses observed trace information to refine the invariant-based MTS by promoting to required those maybe transitions that correspond to observed invocations. RefineInvariantMTS in Figure 6.3 describes this refinement algorithm.

A direct approach to incorporating trace information into the invariant-based MTS is to simulate the traces on the MTS and promote each traversed maybe transition (denoted with '?' on the label) to a required transition (without '?'). However, this approach can result in imprecisions because states may be visited multiple times, and the produced model would not distinguish between the different visits. This can "stitch" together a required transition from one trace to a required transition from another trace, resulting in an invocation ordering that never occurred. For example, consider the direct refinement of the invariant-based MTS from Figure 6.2 based on the StackAr traces from Figure 2.10. The resulting MTS allows a spurious sequence in which push(x) from an empty stack (state $S_1$) to a full stack ($S_3$), based on Trace 2, is followed by topAndPop() to a partially full stack ($S_2$), based on Trace 3.

RefineInvariantMTS(MTS invMTS, set traces)
 1  currentSt = invMTS.initialState
 2  for each currentEv ∈ traces
 3      for each nextSt ∈ invMTS.states
 4          if SMT-isConsistent(nextSt ∧ currentEv.post) ∧ currentSt --currentEv-->_m nextSt
 5              RefineState(currentSt, nextSt, currentEv, invMTS)
 6              currentSt = nextSt'
 7  for each st1 ∈ invMTS.refinedSt
 8      for each st1 --l-->_m st2 ∈ invMTS.transitions
 9          if ∃ st1 --l-->_r st3 remove st1 --l-->_m st2
10  for each state1, state2 ∈ invMTS.refinedSt
11      where state1.programSt = state2.programSt
12      if st1.outTrans ≡ st2.outTrans Merge(st1, st2)
13  return invMTS

RefineState(state currentSt, state nextSt, event currentEv, MTS invMTS)
 1  if currentSt = nextSt require currentSt --currentEv-->_r nextSt
 2  else
 3      add nextSt' to invMTS.states, rename nextSt to nextSt''
 4      replace currentSt --currentEv--> nextSt'' with currentSt --currentEv-->_r nextSt'
 5      for each nextSt'' --l--> otherSt in invMTS.transitions
 6          if nextSt'' ≠ otherSt add nextSt' --l--> otherSt to invMTS.transitions
 7          else add nextSt' --l--> nextSt' to invMTS.transitions

Figure 6.3: Refining the invariant-based MTS according to the traces.

To avoid such issues, RefineInvariantMTS enhances the direct refinement strategy by also refining the visited states (the refinement process is captured in lines 2–6). When RefineInvariantMTS visits a state in the invariant-based MTS, it first splits it into two states (line 5, at which RefineState, shown in the bottom portion of Figure 6.3, is called). The first refined state (nextSt' in RefineState) has only one incoming transition, which corresponds to the currently processed trace invocation (currentEv). The second refined state (nextSt'') keeps the remaining incoming transitions of the original state. Each state keeps all of the original outgoing transitions and self-transitions (lines 5–7 in RefineState). The state splitting enables each state to express different behavior according to the incoming transitions. This refinement strategy is more conservative (i.e., introduces fewer required behaviors) than the one used in heuristic MTS synthesis.
The reason for this is that the number of traces is typically much larger than the number of scenarios; hence, each individual trace should be used to introduce very limited required behavior into the MTS.

As an example of trace-based refinement, StackAr's MTS depicted in Figure 6.4 is obtained by refining the invariant-based MTS from Figure 6.2 with the traces from Figure 2.10. The push(x) invocation from $S_5$ to $S_6$ in Trace 2 (Figure 2.10) splits StackAr's invariant-based state $S_3$ (Figure 6.2) into $S'_3$ and $S''_3$ (Figure 6.4). $S'_3$ is reachable from empty-stack states, while $S''_3$ is reachable from a partially full stack. Furthermore, the transition $S'_1 \xrightarrow{push}_r S'_3$ is promoted to required since it has been observed in the traces.

Once the MTS is refined according to the traces, for each state with non-deterministic transitions on a method, if some of those transitions are observed (required) and others are unobserved (maybe), RefineInvariantMTS removes the unobserved transitions (lines 7–9). By doing so, RefineInvariantMTS relies on trace information to resolve outgoing non-determinism that frequently stems from imprecise and incomplete invariants. In particular, an invariant can have (1) weak predicates that permit illegal transitions and (2) missing predicates whose absence requires state refinement and further distinction of the refined states in terms of their outgoing behavior. For example, StackAr's refined MTS in Figure 6.4 does not have a maybe transition from $S'_3$ on topAndPop() to a partially full stack state $S_2$. This is because such behavior was never observed. By contrast, a direct transition to an empty stack ($S'_3 \xrightarrow{topAndPop}_r S''_1$) was observed in Trace 2. This distinction was not present in the original invariant-based MTS due to topAndPop()'s incomplete postcondition.

Figure 6.4: The StackAr MTS refined with invocation traces.

Finally, RefineInvariantMTS (lines 10–12) merges back states that correspond to the same program state and whose state-changing required transitions are the same. This is done because these states were previously split only to ensure that differences in their outgoing behavior were captured. For example, when RefineInvariantMTS processes Trace 3 from Figure 2.10, it initially splits state $S_2$ in the invariant-based MTS from Figure 6.2 into a state with an incoming push(x) transition and a state with an incoming topAndPop() transition. However, when the trace processing ends, their behavior does not differ, so they are merged back into $S_2$ (Figure 6.4).

  S'_1 --push(x)-->_r S_2:
      size(this.theArray[]) >= 2
      x != null
      post(this.theArray[this.topOfStack]) = x
  S_2 --topAndPop()!=null-->_r S'''_1:
      size(this.theArray[]) >= 2
      this.topOfStack = 0
      return = this.theArray[this.topOfStack]

Figure 6.5: A subset of StackAr transition invariants.

6.4 Algorithm Extensions

To further enhance the synthesized SEKT and TEMI models, the transitions are decorated with transition-specific invariants, such as invariants related to method parameters. Moreover, to avoid noise resulting from less meaningful invariants, a set of invariant filters is employed. These extensions, described next, make the resulting models more meaningful. Furthermore, TEMI's implementation contains additional optimizations that were introduced to improve the algorithms' scalability.

The aim of transition invariants in a final MTS is to represent information about the values of the program variables and method parameters specific to the execution of a particular transition [67]. For each required transition that was executed more than once, Daikon is used to compute transition invariants.
The reported invariants define: (1) the values of method parameters, (2) the relationship between the method parameters and the program variables, and (3) the values of program variables that are different from those reported in the full method invariants. Figure 6.5 depicts some of the transition invariants obtained for StackAr's final MTS from Figure 6.4. The invocations of push(x) leading from an empty stack ($S'_1$) to a partially full stack ($S_2$) had a stack of size larger than 1 and a non-null parameter x, which is inserted at the top of the stack. Similarly, invocations of topAndPop() leading from a partially full stack to an empty stack had the top-of-the-stack pointer (topOfStack) pointing to the remaining element.

To limit the number of program states and filter out unnecessary restrictions in the models, TEMI's implementation selects a subset of the invariants detected by Daikon [21, 36]. Invariant filtering is needed in approaches that use dynamically inferred invariants, since these invariants tend to be more complex than the manually specified ones [77]. TEMI considers the relational invariants on boolean and integer variables (e.g., IntEqual, IntGreaterThan), and the IsNull invariant on objects. Further, TEMI considers internal variables up to a depth of one (i.e., an object's fields are considered, but not those fields' fields). For collections, TEMI's implementation considers their sizes but not their elements (the elements of a collection are considered only in transition invariants). Note that this set of invariants was established empirically.

To describe system states, TEMI only considers those predicates that occur in the inferred program invariants (recall Section 6.1.2). However, that space can still be exponentially large in the number of predicates. In order to reduce the computational time of Yices, and in turn of TEMI, two additional optimizations are applied.
First, to reduce the number of Yices queries, TEMI's implementation avoids evaluation queries when matching a particular point of the invocation trace with a program state: simulating the trace on the invariant-based MTS reveals the program state when a deterministic transition captures the invocation; a query is only necessary in the case of non-determinism for the particular invocation. Second, TEMI avoids querying Yices for satisfiability of every combination of predicate evaluations when constructing the invariant-based MTS (Figure 6.1 depicts the non-optimized algorithm). Instead, it incrementally prunes inconsistent combinations by recursively adding individual predicate evaluations and checking their satisfiability. Once a set of predicate evaluations is unsatisfiable, considering its supersets is unnecessary.

Chapter 7

Evaluation

The techniques proposed in Chapters 3–6 have been designed according to the hypotheses of Section 1.2. This chapter tests the hypotheses via theoretical analyses, case studies, and quantitative evaluations of the four respective techniques and their implementations.

To theoretically evaluate the hypotheses, the chapter contains both algorithm complexity analyses [50] and proofs based on the formal theories presented in Chapter 2. In contrast to the running examples of Section 2.4, the case studies used in this chapter involve more complex system specifications and aim to demonstrate how the four techniques perform in real-world applications. Finally, the quantitative evaluations measure the complexity, scalability, and practical benefits of the four techniques. These evaluations have been conducted on real-world specifications when feasible, and on generated specifications otherwise (e.g., analyzing scalability requires several large-scale specifications, which are often not readily available).

The following sections respectively evaluate the four hypotheses of Section 1.2.
7.1 Hypothesis 1 – Heuristic MTS Synthesis

Heuristic MTS synthesis (Chapter 3) has been designed to support Hypothesis 1 of this dissertation: "A technique that synthesizes component MTSs can be devised such that each synthesized MTS allows all behaviors that follow the component's event invariants, excludes all behaviors that violate the component's event invariants, and contains only those required transitions that can be mapped to a scenario event. The component-level synthesis will scale to specifications that cannot be handled by a similar system-level approach due to the size of the system being specified." Therefore, to confirm Hypothesis 1, heuristic MTS synthesis must (1) satisfy the correctness conditions stated in the hypothesis and (2) achieve the expected scalability.

The technique's correctness has been analyzed via the theoretical analyses presented in Section 7.1.1. The technique's scalability has been analyzed both theoretically and empirically. The theoretical component presented in Section 7.1.2 analyzes the worst-case complexity and the expected practical complexity of the technique. The empirical component presented in Section 7.1.3 uses a set of generated specifications to measure the runtimes of the technique's implementation. Finally, based on the evaluation results, Section 7.1.4 determines the validity of Hypothesis 1.

7.1.1 Correctness of Heuristic MTS Synthesis

The correctness of heuristic MTS synthesis requires that each synthesized component MTS "(1) allows all behaviors that follow the component's event invariants, (2) excludes all behaviors that violate the component's event invariants, and (3) contains only those required transitions that can be mapped to a scenario event." Section 7.1.1.1 proves that a generated initial component MTS satisfies the first two conditions.
Section 7.1.1.2 proves the third condition by proving that the final MTS generation correctly refines the initial MTS; the first two conditions still hold because final MTS generation does not remove transitions.

7.1.1.1 Correctness of Initial MTS Generation

Theorem 1 (Allowed Behaviors). For component C's execution trace $tr = e_1 \ldots e_n$ that does not violate C's event invariants, the initial MTS $M_C$ generated for C simulates tr.

Proof. (by induction) If the trace tr comprises a single event $e_1$ that does not violate C's invariants, the initial MTS $M_C$ will have a transition $s_0 \xrightarrow{e_1}_p s'$ because lines 7–14 of algorithm GenInitialMTS (Figure 3.4) create outgoing transitions for every event ev whose preconditions are satisfied in $s_0$. Assume that $M_C$ can simulate every valid trace of length n. The theorem will thus hold if, for every trace tr of length n + 1 that does not violate C's event invariants, $M_C$ can simulate tr. Let state s be the state reached by tr's subsequence of length n and $e_{n+1}$ be tr's $(n+1)$-st event. According to the conditions in lines 8–10 of algorithm GenInitialMTS, the algorithm will create a transition for every event o whose invariants hold in s, which includes event $e_{n+1}$.

Theorem 2 (Prohibited Behaviors). For component C's execution trace $tr = e_1 \ldots e_n$ for which at least one element of the trace would violate C's event invariants, the initial MTS $M_C$ generated for C cannot simulate tr.

Proof. (by induction) Assume that $M_C$ can simulate every valid trace of length n. The theorem will thus hold if, for every trace tr of length n + 1 that would violate C's event invariants and in which $e_{n+1}$, tr's $(n+1)$-st event, is the violating event, $M_C$ is not able to simulate tr. Let state s be the state reached by tr's subsequence of length n. According to the conditions in lines 8–10 of GenInitialMTS, the algorithm will create a transition for every event o whose invariants hold in s, which excludes event $e_{n+1}$.
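What it means for an MTS to simulate a trace, as used in Theorems 1 and 2, can be sketched in a few lines. The representation below (a dictionary from state and event to successor states) and the one-slot buffer example are hypothetical and are not taken from GenInitialMTS; maybe and required transitions are treated alike because both permit the behavior.

```python
def simulates(transitions, initial, trace):
    """Return True if some path from `initial` follows `trace` event by event.

    `transitions` maps (state, event) -> set of successor states.
    """
    current = {initial}
    for event in trace:
        current = {nxt for st in current
                   for nxt in transitions.get((st, event), set())}
        if not current:        # no reachable state offers this event
            return False
    return True

# Hypothetical one-slot buffer: `put` is enabled only when the buffer is
# empty and `get` only when it is full (the toy's "event invariants").
toy = {
    ("empty", "put"): {"full"},
    ("full", "get"): {"empty"},
}

print(simulates(toy, "empty", ["put", "get", "put"]))  # invariant-respecting
print(simulates(toy, "empty", ["put", "put"]))         # second put violates
```

The first trace respects the toy invariants and is simulated; the second is rejected at its violating event, which is the shape of argument the two inductive proofs make for the generated initial MTS.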
7.1.1.2 Correctness of Final MTS Generation

Theorem 3. The initial MTS and the final MTS produced by the presented algorithm share the strong refinement relation.

Proof. (by induction) Base case: By definition of strong refinement (Definition 8), the initial MTS is its own strong refinement.

Inductive Hypothesis: Given an MTS M, after a single iteration of the algorithm from Figure 3.9 on a transition t (the loop in lines 7–19 of GenerateFinalMTS), the produced MTS N is a strong refinement of M. Because the strong refinement relation is transitive, proving the inductive hypothesis implies that the final MTS is a strong refinement of the initial MTS.

The proof is based on constructing a relation between states in M and N. This relation depends on the transition t. If t is a required transition (lines 10–11 of GenerateFinalMTS), N is identical to M, which preserves the strong refinement relation. If t is a potential transition, $s \xrightarrow{op}_p s_2$, and there exists a required incoming transition into $s_2$ labeled with op (lines 13–15 of GenerateFinalMTS), then N has a single potential transition modified to a required transition, and thus N is a strong refinement of M.

The final MTS N, by construction (ensured by Refine), may have only two distinct types of states: type 1 states have only incoming potential transitions, and type 2 states have at least one incoming required transition and all the incoming transitions are labeled with the same event. Traversed transitions with a destination state of type 1 have already been discussed above. Refine and lines 16–18 of GenerateFinalMTS take care of traversed transitions $t: s \xrightarrow{op}_p s_2$ that reach type 2 states and ensure that $R = \{(s_i, s_i) \mid s_i \neq s_2\} \cup \{(s_2, s'_2), (s_2, s''_2)\}$, which, as will be shown below, is a strong refinement relation.

For all the MTS states that do not have a transition to $s_2$, no modifications are made, so R is a strong refinement for those states.
For each state $s_j$, $s_j \neq s_2$, with an outgoing transition to $s_2$, the first refinement condition (see Definition 8) is directly satisfied since type 2 states have no incoming required transition. The second condition of strong refinement is satisfied because:

1. For each transition $(s_j \xrightarrow{l}_p s'_2)$ in N, the corresponding transition $(s_j \xrightarrow{l}_p s_2)$ existed in the MTS M (accounted for by lines 5–6; unless otherwise specified, line numbers refer to algorithm Refine).
2. For each transition $(s_j \xrightarrow{l}_p s''_2)$ in N, transition $(s_j \xrightarrow{l}_p s_2)$ existed in the MTS M (ensured by lines 13–14).

Finally, it is necessary to show that $(s_2, s'_2) \in R$ and $(s_2, s''_2) \in R$ satisfy the strong refinement conditions. This is done by enumerating over all possible transitions to and from those states:

1. For each transition $(s_2 \xrightarrow{l}_r s_j)$ in M, N has required transitions $(s'_2 \xrightarrow{l}_r s_j)$ and $(s''_2 \xrightarrow{l}_r s_j)$ (ensured by lines 7–9). This statement satisfies the first condition of the strong refinement definition.
2. No transition $(s'_2 \xrightarrow{l}_p s_j)$ in N, such that $s_j \neq s'_2$ and $s_j \neq s''_2$, violates the second refinement condition as transition $(s_2 \xrightarrow{l}_p s_j)$ existed in M (ensured by line 8).
3. No transition $(s''_2 \xrightarrow{l}_p s_j)$ in N, such that $s_j \neq s'_2$ and $s_j \neq s''_2$, violates the second refinement condition as transition $(s_2 \xrightarrow{l}_p s_j)$ existed in M (ensured by line 9).
4. No transition of either the form $(s'_2 \xrightarrow{l}_p s'_2)$ or $(s''_2 \xrightarrow{l}_p s'_2)$ in N violates the second refinement condition as transition $(s_2 \xrightarrow{l}_p s_2)$ existed in M (ensured by lines 10–12).
5. No transition of either the form $(s'_2 \xrightarrow{l}_p s''_2)$ or $(s''_2 \xrightarrow{l}_p s''_2)$ in N violates the second refinement condition as transition $(s_2 \xrightarrow{l}_p s_2)$ existed in M (lines 10–14).

Therefore, no transitions in N violate the strong refinement condition, and N is a strong refinement of M.
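The two refinement conditions used throughout this proof can be checked mechanically on small examples. The sketch below assumes the standard two-condition formulation of strong refinement (required transitions of M are preserved in N; possible transitions of N are already possible in M), which Definition 8 is assumed to match. The dictionary representation and the state names are hypothetical, and the checker only validates a given relation; it does not search for one or check the initial states.

```python
def is_strong_refinement(m, n, relation):
    """Check that `relation` witnesses that MTS `n` strongly refines MTS `m`.

    Each MTS is a dict with 'required' and 'possible' sets of
    (source, label, target) triples, with required a subset of possible.
    """
    for s, t in relation:
        # Condition 1: every required transition of m from s is matched
        # by a required transition of n from t, with related targets.
        for (src, label, dst) in m["required"]:
            if src == s and not any(
                    a == t and l == label and (dst, b) in relation
                    for (a, l, b) in n["required"]):
                return False
        # Condition 2: every possible transition of n from t is matched
        # by a possible transition of m from s, with related targets.
        for (src, label, dst) in n["possible"]:
            if src == t and not any(
                    a == s and l == label and (b, dst) in relation
                    for (a, l, b) in m["possible"]):
                return False
    return True

# M: s2 has two incoming maybe transitions on `op`.
M = {"required": set(),
     "possible": {("s1", "op", "s2"), ("s3", "op", "s2"), ("s2", "x", "s1")}}
# N: s2 split into s2' (observed, now required) and s2'' (remaining).
N = {"required": {("s1", "op", "s2'")},
     "possible": {("s1", "op", "s2'"), ("s3", "op", "s2''"),
                  ("s2'", "x", "s1"), ("s2''", "x", "s1")}}
R = {("s1", "s1"), ("s3", "s3"), ("s2", "s2'"), ("s2", "s2''")}

print(is_strong_refinement(M, N, R))
print(is_strong_refinement(N, M, {(b, a) for (a, b) in R}))
```

The example mirrors the state split in the proof: the relation pairs the original state with both of its copies, so N refines M, whereas the reverse direction fails because M lacks the required transition that N introduced.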
The final step in proving the correctness of heuristic MTS synthesis is to prove that the introduced required behavior corresponds to the input scenario. This, however, trivially holds as lines 4–20 of GenerateFinalMTS (Figure 3.9) follow the scenario steps.

7.1.2 Complexity of Heuristic MTS Synthesis

The scalability of heuristic MTS synthesis would imply that the technique can be used to synthesize models for system specifications with increasing numbers of scenarios and components. This section analyzes the worst-case complexity of each synthesis phase, while discussing the practical impact of the complexity.

7.1.2.1 Complexity of Phase 1

Let $N_C$ be the number of components in the system, $N_{SD}$ be the number of scenarios, $N_{OP}$ be the number of distinct events, $N_V$ be the number of domain variables, $L_{SD}$ be the maximum length of a scenario, $N_{SV}$ be the maximum number of a component's significant variables, and $N_{COP}$ be the maximum number of a component's events (these symbols are used in the complexity analysis of each of the algorithm phases). In Phase 1, a single pass through each scenario is made, as well as two passes through each event's constraints. Thus, the worst-case time complexity for this phase is on the order of $N_{SD} \cdot L_{SD} + N_C \cdot N_{OP}$. In practice, $N_{SD}$ and $N_{OP}$ are expected to be the largest factors, ranging up to several hundred, so this phase of the algorithm will execute in time linear in the number of scenarios and distinct events.

7.1.2.2 Complexity of Phase 2

In Phase 2, the state space is gradually built and the necessary transitions are added to the initial MTSs. The worst-case time complexity for this phase is $2^{N_{SV}} \cdot N_{COP} \cdot N_C$. Although exponential, this complexity should not be problematic in practice for several reasons. First, $N_{SV}$ will be small because, in practice, a component will be concerned only with a subset of domain variables, and $N_{SV}$ will not increase with the system size.
Second, the maximum number of states, $2^{N_{SV}}$, is, in practice, significantly reduced because the constraints will prohibit a number of combinations of variable assignments.

7.1.2.3 Complexity of Phase 3

This synthesis phase has a worst-case complexity on the order of $N_C \cdot N_{SD} \cdot L_{SD}^2 \cdot 2^{N_{SV}}$. This worst case would occur only for complex constraints that include the majority of the significant variables. Since high-level constraints constructed manually by architects tend to be notably simpler [1], the factor exponential in $N_{SV}$ will, in practice, be polynomial.

7.1.2.4 Complexity of Phase 4

Similarly to Phase 3, the worst-case complexity of Phase 4 is $N_C \cdot N_{SD} \cdot L_{SD}^2 \cdot 2^{N_{SV}}$. By using appropriate data structures, i.e., hash tables that return matching scenario steps for a given state, this complexity can be reduced to $2^{N_{SV}} \cdot N_{COP} \cdot N_C$. Note that such hash tables would be built as part of Phase 3. As discussed earlier, the constraints' nature is expected to render the practical complexity polynomial in $N_{SV}$.

The overall worst-case complexity of heuristic MTS synthesis is thus on the order of $N_C \cdot 2^{N_{SV}} \cdot N_{COP} + N_{SD} \cdot L_{SD}^2$. As discussed above, while this complexity is exponential, the practical complexity for real-world software specifications is expected to be polynomial.

7.1.3 Scalability of Heuristic MTS Synthesis in Practice

Theoretical evaluation of the technique's complexity suggests that heuristic MTS synthesis will scale to real-world specifications. To further test the scalability, heuristic MTS synthesis was implemented in a prototype tool, MTSGen [74]. MTSGen takes a system's specification in terms of scenarios and properties defined on Boolean variables, and automatically constructs the component-level MTSs. To assess the scalability of the technique, MTSGen was used to generate component MTSs for 100 automatically generated specifications. Generated specifications were used because of the difficulty of finding a sufficiently large number of real-world large-scale specifications.

The generated specifications consisted of 50 components, 300 system events, 200 domain variables, and 200 scenarios and were used to synthesize models on a mid-range Windows PC. Generating system-level models from these specifications, using heuristic MTS synthesis while considering the whole system as a single component, proved infeasible: the synthesis ran in excess of 60 minutes and would typically terminate with an out-of-memory exception. This result is unsurprising given that, in the worst case, the system's state space would comprise $2^{200}$ states. By contrast, the average runtime for generating all 50 component models using MTSGen was only 36 seconds, and the synthesis successfully terminated for each specification. Furthermore, the synthesized component models had 60 states. This is consistent with the specification and resulting model sizes that are expected of real-world systems [101].

7.1.4 Validity of Hypothesis 1

The evaluations presented in this section confirm Hypothesis 1. In particular, it has been proven that heuristic MTS synthesis synthesizes component models that correctly capture the input specifications. This result also confirms that the technique is well suited to the incremental provision of specifications: additional scenarios refine the previously synthesized model, while elaborated invariants result in additionally restricted models. Furthermore, both theoretical and empirical analyses confirmed the scalability of heuristic MTS synthesis to large-scale system specifications.
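To put the worst-case system-level state space behind this scalability result in perspective, the size implied by 200 Boolean domain variables can be estimated with rough arithmetic. The estimate is an illustration only and is not a figure reported in the evaluation.

```latex
2^{200} = \left(2^{10}\right)^{20} = 1024^{20} \approx 1.6 \times 10^{60}
```

A state space of this magnitude cannot be enumerated explicitly, which is consistent with the out-of-memory terminations observed for system-level synthesis.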
7.2 Hypothesis 2 – Component-Aware Triggered Scenarios

Component-aware Triggered Scenarios (caTS) is a language intended to support Hypothesis 2: "A component-oriented triggered scenario language can be defined such that (1) it has a manageable number of new constructs compared to existing notations, (2) it prevents specification errors that occur when the existing triggered-scenario languages are used, and (3) it leads to a more compact specification with at least 50% fewer scenarios needed to capture the requirements, compared to the existing languages." To this end, it is necessary to demonstrate that caTS has a low learning curve ("a manageable number of new constructs"), improved specification correctness ("prevent specification errors"), and higher compactness of the specification ("at least 50% fewer scenarios").

The argument that caTS satisfies the first condition stems from its design: caTS has four new constructs (one-sided obligations, context annotations, and alternative event annotations) as compared to over 30 constructs found in state-of-the-art languages such as UML sequence diagrams [76]. To determine whether caTS improves the correctness of a specification, this section (1) analyzes the correctness and completeness of the MTS synthesis algorithm (Section 7.2.1), and (2) describes a case study conducted on an industrial specification (Section 7.2.2). Furthermore, to show that using caTS leads to more concise specifications, Section 7.2.2 first provides a couple of examples of conciseness savings. The conciseness of caTS-based specifications is subsequently thoroughly evaluated on a set of generated examples (Section 7.2.3). Section 7.2.4 aggregates the evaluation results to form a final validity argument for Hypothesis 2.

7.2.1 Correctness and Completeness of Synthesis from caTS

caTS aims to improve the correctness of a system's specification in two ways. First, caTS provides a set of constructs that are hypothesized to capture the intent more accurately.
Second, the component MTS synthesis capabilities create an alternative viewpoint, which, in theory, enhances the understanding of the system's behaviors and the detection of specification errors. Note, however, that synthesizing component models first requires those models to be an accurate (correct and complete) representation of the caTS.

Theorem 4 (Correctness and Completeness). A component C's LTS $I_C$ satisfies a caTS component instance $CI$ ($I_C \models CI$) iff $I_C$ refines the component MTS $M_C$ ($M_C \preceq I_C$) synthesized in GenerateSystemMTS (Figure 4.6).

Proof. (by contradiction) ($M_C \preceq I_C \Rightarrow I_C \models CI$) Assume that there is an implementation $I_C$ of $M_C$ that does not satisfy $CI$. This implies that a state $I_C.p_i$, which violates $CI$ at some location l, refines a state $M_C.s_j$, which can be reached via the triggering context of the caTS event at location l. Since MTS $M_C$ is synthesized in GenerateSystemMTS, $M_C.s_j$ is one of the states $s''$ that are created to track the location l (line 9). For each state $M_C.s''$ that tracks l, the synthesis algorithm removes all transitions that are inconsistent with the alternative event annotation or the universal obligation (lines 13–14), and strengthens transitions labeled with the event at location l to required for an existential branching or universal obligation (lines 15–16). The caTS satisfaction definition (Definition 25 from Section 4.2.4) implies that $I_C.p_i$ has either (I) an event not permitted by the alternative annotation or the universal obligation (conditions 1 and 3 of Definition 25), or (II) a missing transition that is required by the obligation of the event at location l (conditions 2 and 3). This creates a contradiction because if $M_C \preceq I_C$ then $I_C.p_i$ has all required behavior of $M_C.s_j$, thus avoiding condition (II) above, and has no behavior forbidden in $M_C.s_j$, thus avoiding condition (I) above.
($I_C \models CI \Rightarrow M_C \preceq I_C$) Assuming the contrary for this direction implies that it is impossible to establish a refinement relation between $M_C$ and $I_C$ due to $I_C$'s states that either (iii) exhibit behaviors that are not possible in $M_C$ or (iv) lack behaviors that are required in $M_C$. For each such state $I_C.p$, if $I_C.p$ cannot be reached via the triggering context of a caTS event at some location l, then $I_C.p$ cannot satisfy conditions (iii) or (iv) above. The reason for this is that $M_C$'s states that do not track scenario execution are left unchanged and have outgoing maybe transitions on every event. Alternatively, if $I_C.p$ is a state reachable via the triggering context of the event at location l, $I_C.p$ still cannot satisfy conditions (iii) and (iv) above. Specifically, while $I_C.p$ should miss required transitions or have extra transitions compared to a state $M_C.s$ that tracks l, GenerateSystemMTS prohibits or requires a transition only if necessary to satisfy a caTS (lines 13–16). Thus, to satisfy $CI$, a refinement relation between $I_C.p$ and $M_C.s$ must exist, causing a contradiction.

7.2.2 Case Study: Philips TV

This section discusses the experience with applying caTS to a non-trivial case study, with a focus on how caTS helped to arrive at a correct specification by: (1) accurately expressing component behaviors that pose a challenge for existing languages, (2) concisely specifying behaviors that would otherwise require a number of similar scenarios and logical properties, and (3) eliciting additional requirements based on iterative strengthening of the event obligations and analysis of the synthesized MTSs.
Figure 7.1: Philips TV scenarios: (top) original models; (bottom) caTS models. The original specification comprises the eTS scenarios BasicTune, BasicSwitch, and NestedTune1; the caTS-based specification comprises TuneV1, TuneV2, and TuneInactive. The figure defines the following conditions:
  Pre1 = (not T1_Tuning) and (T1_Active)
  Pre2 = (not SignalDropped) and (T1_Active)
  Pre3 = (T1_Tuning) and (T1_Active) and (T1_WaitingDropAck)
  Cond1 = (not T1_Tuning)
  Cond2 = (not S_T1_Tuning) and (T1_Active)
  Cond5 = (not ValueUpdating) and (WaitingDropAck)
  Cond6 = (not ValueUpdating) and (PreparingRestore)
  Cond7 = (not S_T1_Tuning) and (not T1_Active) and (not Switching)
  Cond8 = (not T1_Active) and (not Switching) and (WaitingDropAck)

The studied specification is an industrial protocol from the Philips television product line [98]. The initial models and the derived system requirements were obtained from the case studies performed by Uchitel et al. [92] and Sibay et al. [84], who modeled the system's behavior with a set of existential Triggered Scenarios (eTS) [84, 85] and universal Triggered Scenarios (uTS) [27,84]. The studied product instantiation consists of two tuning components (Tuner1 and Tuner2), a Switch, and a Video component. The two tuners are connected to the screen (i.e., Video) via the Switch. Only one tuner is active (i.e., its picture displayed) at a time. A user can (1) change the tuners' frequency and (2) switch between tuners.
This case study is concerned with two protocols that keep the screen blank to avoid flicker while either of the two user actions is performed.

The case study started with an informal description of the system requirements [84, 92], and a set of twelve system-level triggered scenarios [84]. Two scenarios depicted in Figure 7.1, BasicTune and BasicSwitch, capture the basic re-tuning and switching behavior. In [84], these scenarios were a starting point for requirements elicitation, which resulted in ten additional scenarios. The scenarios from [84] were thus transformed into caTS scenarios and gradually elaborated to accurately and concisely capture the intended behaviors. This section presents three illustrative examples by foreshadowing the limitations of the existing techniques, and then explaining how caTS helped to alleviate them.

Revising BasicTune. The BasicTune eTS [84] captures a sequence in which Tuner1 stores a new frequency value, and asks Switch to blank the screen while the frequency is changing. Once the frequency has been set (t1.restore), the screen should display the picture again (unblank). The system-level BasicTune was first revised into the TuneV1 caTS depicted in Figure 7.1. This revision weakened the t1.dropReq and blank arrows from their original form to avoid overspecification at the component level. For example, keeping blank's arrow as existential branching would require a blank transition to exist in Video's every state. However, the requirements extracted from [92] indicate that Video strictly alternates blank and unblank (i.e., blank is not always enabled).

The BasicTune eTS was specified to hold if Tuner1 is active and is not already re-tuning. BasicTune's formal precondition ($\neg$T1_Tuning $\wedge$ T1_Active) (see footnote 2) does not consider that fluent T1_Active is outside of Tuner1's scope, while T1_Tuning is unobservable for both Switch and Tuner1.
Consequently, the conditions under which Tuner1 and Switch should respect the scenario event obligations remain underspecified. For example, Tuner1 may behave optimistically by assuming that BasicTune's precondition holds (i.e., unblank has terminated the last t1.tune and Tuner1 is active), and always follow its obligations. Alternatively, Tuner1 may pessimistically assume that BasicTune's precondition does not hold, and ignore its obligations. As it turns out, neither of these interpretations complies with the true requirements.

Footnote 2: T1_Tuning is the fluent initiated with t1.tune and terminated with unblank; T1_Active is initiated with active_t1 and terminated with active_t2.

By contrast, a caTS context annotation must exclusively refer to the local fluents, thus imposing the components' viewpoint. Based on system requirements, two component instance preconditions were thus specified in the TuneV1 caTS (Figure 7.1). In this instance, fluent T1_Tuning was changed to have t1.restore as the terminating event. In addition, a new fluent S_T1_Tuning, initiated with t1.dropReq and terminated with unblank, was defined to specify conditions on Switch's behavior. Notably, TuneV1 precisely specifies that Tuner1 must adhere to the scenario whenever it is idle (i.e., Cond1 from Figure 7.1 is satisfied). To confirm that TuneV1 correctly represents the true intent, it was validated against the final LTS models independently produced in [92]. This example illustrates that caTS can accurately specify the desired behavior of the system components, thus avoiding prospective errors that may arise from the inaccuracies.

Strengthening Tuner1 obligations. Four of the case study scenarios in [84] were produced by analyzing BasicTune's alternative branches. The resulting scenarios specified that a nested re-tuning request may occur in BasicTune only after t1.dropReq.
In [84], the authors also discovered that once Tuner1 updates the frequency based on a nested request, Tuner1 should continue, as opposed to restart, the protocol. One of the additional scenarios, the NestedTune1 eTS, is depicted in Figure 7.1. This scenario is similar to BasicTune as they share the last three events. In NestedTune1, these events denote protocol continuation.

[Figure 7.2: Tuner1's MTS obtained from TuneV2. States S1 to S5, S4', and S5'; transition labels t1.tune, t1.newVal, t1.dropReq, t1.dropAck, and t1.restore; maybe transitions t1.tune? and t1.newVal?.]

Instead of specifying a set of similar caTS, the additional requirements were incorporated simply by strengthening Tuner1's lifeline in TuneV1. This process resulted in the TuneV2 caTS depicted in Figure 7.1, which compactly describes the desired Tuner1 behavior. As an example, the context annotations Cond5 and Cond6 specify that the protocol shall be able to resume with t1.dropAck (t1.restore) whenever Tuner1 is in the protocol state WaitingDropAck (PreparingRestore) and any nested re-tuning requests have been processed (the clause $\neg$ValueUpdating). Note that the Cond5 annotation precisely captures the same information as NestedTune1.

Based on the requirements from [84, 92], TuneV2 was further enriched with annotations that specify t1.tune (a nested re-tuning request) as the only legal alternative to t1.dropAck and t1.restore, and strengthened with the initial event obligations on Tuner1's lifeline. These requirements, whose addition to the caTS was trivial, were not formally captured in the original case study. To validate TuneV2, Tuner1's MTS was synthesized (Figure 7.2), which contains underspecified behavior following a nested re-tuning request. Hence, the specification was refined with a scenario prescribing a universal obligation to generate t1.newVal whenever t1.tune occurs, which was consistent with a scenario elicited in [84].
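The fluent definitions used in TuneV1 and TuneV2 can be illustrated with a small sketch. The Python below is not part of the case study tooling: the function name fluent_values and the example trace are assumptions made for illustration. Only the initiating and terminating events are taken from the text, and the result relies only on t1.dropReq following t1.tune and on t1.restore preceding unblank.

```python
def fluent_values(initiating, terminating, trace, initially=False):
    """Return the fluent's value after each event of the trace."""
    value = initially
    values = []
    for event in trace:
        if event in initiating:
            value = True
        elif event in terminating:
            value = False
        values.append(value)
    return values


# Illustrative trace (assumed ordering of one BasicTune execution).
TRACE = ["t1.tune", "t1.dropReq", "blank", "t1.dropAck",
         "t1.newVal", "t1.restore", "unblank"]

# T1_Tuning as originally defined: terminated by unblank.
original_t1_tuning = fluent_values({"t1.tune"}, {"unblank"}, TRACE)
# T1_Tuning as revised for TuneV1: terminated by t1.restore.
revised_t1_tuning = fluent_values({"t1.tune"}, {"t1.restore"}, TRACE)
# S_T1_Tuning: initiated by t1.dropReq, terminated by unblank.
s_t1_tuning = fluent_values({"t1.dropReq"}, {"unblank"}, TRACE)

after_restore = TRACE.index("t1.restore")
print(original_t1_tuning[after_restore])  # True: still waiting for unblank
print(revised_t1_tuning[after_restore])   # False: ended by t1.restore
```

Under the revised definition the fluent changes only on events that appear on Tuner1's lifeline, which is what allows Cond1 to be evaluated from Tuner1's local viewpoint.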
caTS facilitated concise specification of complex requirements, hence reducing the engineer's burden. The support for modality mixing in caTS yielded straightforward specification of requirements that may otherwise remain too weak or even be omitted from the specification. Furthermore, the conciseness also led to straightforward specification of requirements that were not even specified originally.

Exploring Switch's behavior. Refining Switch's behavior involved specifying additional caTS pertaining to the tuner's switching protocol (whose essence is captured by BasicSwitch from Figure 7.1). The resulting caTS that are not crucial to this discussion are omitted and can be found on an external website [14] (the website also contains additional examples of caTS usage).

After transforming BasicSwitch into caTS (not shown), the scenario's event alternatives were explored using "what-if" questions. Analysis of the scenarios and synthesized MTS models uncovered an important aspect of the system behavior not considered in [84]: simultaneous switching and re-tuning. To explore this behavior, the four possible relationships between switching and re-tuning were studied. The requirement stating that tuner switching preempts active re-tuning but not vice versa proved to be particularly interesting in that it led to uncovering another behavior, discussed below. In response to this requirement, the previously specified caTS was strengthened with a clause $\neg$Switching that was added to the precondition on Switch's lifeline in TuneV2.

Further iteration prompted a question regarding Switch's communication with an inactive Tuner1, raised while inspecting a composite system MTS. The models provided in [84] did not explore this behavior, but the authors' prior work [92] did state a requirement that Switch should communicate with an inactive Tuner1 according to the basic protocol sequence, while skipping events shared with Video.
The resulting caTS TuneInactive (Figure 7.1) contains a context annotation Cond8 that ensures proper continuation of the protocol even when TuneV2 is preempted by tuner switching.

The "what-if" questions posed about locations along the caTS lifelines thus helped to elicit relevant new requirements. Furthermore, the ability to merge and compose synthesized MTSs provided a comprehensive viewpoint for exploration of underspecified behaviors. More importantly, many of the elicited requirements were never explored or considered in the original case study, which thus resulted in an incorrect final model.

7.2.3 Conciseness of caTS

The final element of Hypothesis 2 is concerned with the impact of caTS on the size of a scenario-based specification: a more concise specification implies that an engineer needs to specify a smaller number of scenarios. Section 7.2.3.1 describes how conciseness has been evaluated, while Section 7.2.3.2 discusses the obtained results.

7.2.3.1 Objectives and Setup

The objective of the evaluation is to answer the following research questions:

(RQ1) How much more concise is a specification based on caTS compared to one based on the existing state-of-the-art triggered scenario languages?
(RQ2) For what sizes of systems, in terms of state space size and number of components, is using caTS particularly beneficial?
(RQ3) Does making a specification more concise require adding a large number of context annotations (thus actually leading to more complex scenarios)?

To answer these questions, this section compares how using caTS differs from using existing scenario languages when describing a system's behavior. Obtaining a sufficiently large number of real-world systems and specifying their behavior using different languages is not practical. Instead, a set of component LTS models and the corresponding system-level LTS were generated.
Then, two scenario-based specifications that fully describe the given system's behavior were created: one based on a state-of-the-art scenario language (the combination of eTS and uTS [84]) and one based on caTS. The resulting specifications are compared in Section 7.2.3.2.

The procedure for generating LTSs takes as inputs the average number of fluents per system component, the number of components, the upper limit on the number of system events, and the upper limit on the number of links between components. The procedure also incorporates several constraints to mitigate the risk that a generated model is not representative of a typical software model [101] (e.g., models in which a component is able to generate an arbitrarily large number of events should not be generated because software components typically follow structured protocols). In this process, three constraints are placed on the generated models: (1) fluents define the system states (each LTS state corresponds to a distinct fluent evaluation); (2) each system event has a precondition defined on those fluents; and (3) the average number of events a component can generate in each of its states is bounded to five events. Under these constraints, a set of random fluents and a set of random event preconditions are generated first. This is followed by an iterative LTS generation that starts with an LTS with a single (initial) state. The procedure takes the state and adds new transitions on randomly selected events whose preconditions are satisfied, while also adding the new states reached by the new transitions; the following iterations process the new states.

For evaluation purposes, six categories of models, defined in Figure 7.3, are used; each category is defined by the combination of the system LTS size and the number of components (light gray area in Figure 7.3). A total of 100 system specifications of each category were generated; their average characteristics are reported in Figure 7.3.
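The iterative generation described above can be sketched as follows. This is a simplified, system-level illustration under stated assumptions, not the generator used in the evaluation: it ignores components and links, caps the out-degree of every state instead of bounding an average per component, and assumes that each event sets exactly one fluent to a fixed value. The function name generate_lts and the parameters max_out, max_states, and seed are hypothetical.

```python
import random
from collections import deque


def generate_lts(num_fluents, num_events, max_out=5, max_states=50, seed=0):
    rng = random.Random(seed)
    fluents = [f"f{i}" for i in range(num_fluents)]

    # Random event preconditions: required values for up to two fluents.
    # Assumed effect: the event sets one fluent to a fixed value.
    events = {}
    for i in range(num_events):
        chosen = rng.sample(fluents, k=min(2, num_fluents))
        precondition = {name: rng.choice([True, False]) for name in chosen}
        effect = (rng.choice(fluents), rng.choice([True, False]))
        events[f"e{i}"] = (precondition, effect)

    # Each state is a distinct fluent evaluation; start from "all false".
    initial = tuple(False for _ in fluents)
    states = {initial}
    transitions = set()
    queue = deque([initial])

    while queue:
        state = queue.popleft()
        valuation = dict(zip(fluents, state))
        enabled = [name for name, (pre, _) in events.items()
                   if all(valuation[fl] == val for fl, val in pre.items())]
        rng.shuffle(enabled)
        for name in enabled[:max_out]:
            target_fluent, target_value = events[name][1]
            successor = dict(valuation)
            successor[target_fluent] = target_value
            nxt = tuple(successor[fl] for fl in fluents)
            if nxt not in states:
                if len(states) >= max_states:
                    continue  # respect the size limit of the category
                states.add(nxt)
                queue.append(nxt)
            transitions.add((state, name, nxt))

    return fluents, events, initial, states, transitions


fluents, events, initial, states, transitions = generate_lts(4, 6, seed=1)
print(len(states), "states,", len(transitions), "transitions")
```

Because states are fluent evaluations, two paths that lead to the same evaluation reach the same state, which keeps the generated LTS finite without any explicit merging step.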
Note that systems with more than eight components are not considered due to the very large resulting system-level LTSs that would limit the scalability of the analysis. Instead, it is safe to assume that such systems would be modeled hierarchically [88].

Once the LTS models are generated, the generation procedure iteratively creates scenarios with 6–10 events, as recommended by [18], until a set of scenarios is arrived at that describes the system behavior. In each iteration, the procedure selects a random system state, creates a new scenario whose precondition is the fluent evaluation in that state, and generates a scenario sequence by following random transitions from that state; the scenario generation is done separately for eTS/uTS and for caTS. Intuitively, a generated scenario specifies that a transition $tr$ incoming to a state $s$ can be followed by a transition $tr'$ outgoing from $s$ if (1) the scenario sequence traverses $tr$ followed by traversing $tr'$, or (2) $tr'$ is traversed as the first scenario step or following a caTS context annotation, which implies that $tr'$ is always possible from $s$. Note that, for eTS/uTS, $s$ refers to the system LTS states, while for caTS, due to its component-level semantics, $s$ refers to the states in the component models.

The system behavior is fully described when the generated set of scenarios specifies that, for each LTS state $s$, a transition $tr$ incoming to $s$ can be followed by any transition $tr'$ outgoing from $s$. The difference between the procedure
The dierence between the procedure 119 Category Range SC Ev Fl SS ST CS CT Category 1 11{20 2 8 9 15 33 13 29 Category 2 21{100 2 11 9 52 122 42 105 Category 3 21{100 5 13 18 50 246 7 18 Category 4 101{500 5 18 18 226 1170 15 47 Category 5 101{500 8 21 24 247 1902 6 19 Category 6 501{5000 8 28 24 1933 16455 13 52 Figure 7.3: Characteristics of the generated specications to generate eTS/uTS and the procedure to generate caTS, is that generating caTS also takes into account the possibility of adding event obligations and context annotations to a scenario. When a caTS is generated, an event is specied as universal or one-sided universal whenever possible. Furthermore, the generation procedure considers adding a context annotation before the event. To test the eect of the context annotations accord- ing to RQ3, the scenario generation procedure takes as an input the allowed number of context annotations per scenario. 7.2.3.2 Results Figure 7.4 reports the characteristics of the generated scenario sets for the model cat- egories of Figure 7.3. The light gray area in Figure 7.4 contains the number of the generated eTS and uTS that fully describe the system behavior. The dark gray area in Figure 7.4 contains the data related to the use of caTS without context annotations. The column Universal events denotes the cumulative number of caTS events with universal or one-sided universal modality. The unshaded area in Figure 7.4 captures the data on the use of caTS with context annotations. Next, the results are discussed in relation to the three research questions posed in Section 7.2.3.1. 120 Category eTS and uTS Annotation-free caTS Annotated caTS eTS Scenarios uTS Scenarios Scenarios Universal ev. Scenarios Universal ev. Context annot. Annot. 
per caTS Category 1 23 3 25 39 13 22 33 2.5 Category 2 86 11 95 119 48 66 126 2.6 Category 3 99 16 47 77 22 39 52 2.4 Category 4 439 49 140 129 63 61 137 2.2 Category 5 437 76 89 114 42 57 86 2.1 Category 6 2764 353 274 209 129 98 232 1.8 Average 641 85 112 115 53 62 111 2.3 Figure 7.4: Evaluation results for the dierent triggered-scenario languages. RQ1. The collected results imply that using caTS signicantly reduces the size of the scenario set needed to describe the system behavior. In particular, the size of the generated eTS/uTS scenario sets ranged between 26 and 3117 scenarios (the sum of the two columns in Figure 7.4), and the number of required caTS was only between 13 and 129 scenarios. While in a realistic development setting, the scenario specication may stop before the system behavior is fully described |specifying thousands of scenarios may be unlikely| the ratio between the required number of caTS scenarios and the required number of other scenarios would still hold. Using annotation-free caTS reduced the size of the original specication by approximately a factor of 6. The reduction is even higher for caTS with context annotations: the average reduction factor was 14, with the specication size reduced by a factor of 2 for (the smallest) models of Category 1 and a factor of 24 for (the largest) models of Category 6. These reductions were a result of the component-level semantics specic to caTS and the context annotations that eectively elaborate the desired behaviors by relating them to component states. Contrary to the intuition, the number of caTS universal events was often higher than the number of uTS scenarios. This is because the caTS generation procedure species a universal event as such in each scenario in which it appears. By contrast, for eTS/uTS, a 121 universal event appears as universal only in uTS, while the universality remains implicit when it appears in eTS scenarios. RQ2. 
The data in Figure 7.4 confirms that languages with strictly system-level semantics require similar specification effort for system LTSs of similar sizes regardless of the number of underlying components (e.g., for Category 4 and Category 5). By contrast, capturing the behavior of a 5-component system of Category 4 required 140 caTS scenarios on average, while capturing the behavior of an 8-component system of Category 5 required around 50 fewer caTS. Furthermore, caTS was particularly effective as the size of the system LTS increased: while a larger LTS also increases the size of a caTS specification, this increase was noticeably smaller than that observed for eTS/uTS. For example, the number of eTS/uTS scenarios generated for models of Category 6 was more than six times larger than the number of eTS/uTS generated for Category 5; the sizes of the caTS-based specifications differed only by a factor of three. These results suggest that, although using caTS led to a more concise specification in general, caTS is particularly well suited for larger component-based systems.

RQ3. The evaluation results confirm that context annotations further reduce the size of a caTS specification: an average of only 2–3 annotations per scenario reduced the number of scenarios by more than a factor of two in comparison to the annotation-free caTS. Even a single annotation per scenario reduced the number of scenarios by at least 20%. At the same time, it is important to highlight that these results do not take into account the potential cognitive load stemming from annotations, and a more extensive and rigorous study of this issue is warranted, but is out of scope of this dissertation.

7.2.4 Validity of Hypothesis 2

The collected evaluation results suggest the validity of Hypothesis 2. In particular, the experience from the industrial case study confirmed that caTS, unlike the existing languages, effectively leads to a correct specification.
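As a cross-check of the conciseness results that feed into this conclusion, the reduction factors quoted in Section 7.2.3.2 can be recomputed from the scenario counts in Figure 7.4. The sketch below assumes one reading that is consistent with the reported numbers: the baseline is the sum of the eTS and uTS scenario counts, and the average factors are taken from the Average row rather than averaged per category. The names FIGURE_7_4 and reduction_factors are hypothetical.

```python
# (eTS, uTS, annotation-free caTS, annotated caTS) scenario counts, Figure 7.4
FIGURE_7_4 = {
    "Category 1": (23, 3, 25, 13),
    "Category 2": (86, 11, 95, 48),
    "Category 3": (99, 16, 47, 22),
    "Category 4": (439, 49, 140, 63),
    "Category 5": (437, 76, 89, 42),
    "Category 6": (2764, 353, 274, 129),
    "Average": (641, 85, 112, 53),
}


def reduction_factors(ets, uts, annotation_free, annotated):
    """Return (baseline / annotation-free caTS, baseline / annotated caTS)."""
    baseline = ets + uts
    return baseline / annotation_free, baseline / annotated


for name, counts in FIGURE_7_4.items():
    free_factor, annotated_factor = reduction_factors(*counts)
    print(f"{name}: {free_factor:.1f}x annotation-free, "
          f"{annotated_factor:.1f}x annotated")
```

Under this reading, the Average row gives roughly 6 for annotation-free caTS and roughly 14 for annotated caTS, and Categories 1 and 6 give roughly 2 and 24 for annotated caTS, matching the figures reported for RQ1.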
caTS leads to a correct specification as a result of both more accurate scenario constructs and the support for an alternative viewpoint (component MTSs), which stems from the correct and complete synthesis algorithm. Furthermore, the analyses conducted on a large set of generated models overwhelmingly confirm that caTS specifications are naturally more concise. Finally, the burden of learning caTS is reasonably low: as stated earlier, caTS introduces a small number of new constructs compared to the numbers of constructs an engineer needs to learn to use existing state-of-the-art scenario languages. This argument is further reinforced when considering that the new caTS constructs have semantics that resemble concepts in already existing languages.

7.3 Hypothesis 3 – Refinement Distribution Framework

The refinement distribution framework described in Chapter 5 is an implementation of the hypothesized technique from Hypothesis 3: "Refinements of a system MTS based on a new requirement can be classified into a finite set of types. A technique can be devised that maps system refinements of a certain type to refinements of the same type at the component level. Such a technique will create a distributable system MTS that refines the original system MTS and restricts the system's behaviors per the new requirement".

The evaluation will thus determine whether the framework satisfies the three required conditions: (1) the viability of arriving at a given MTS via a set of the framework's atomic refinements, (2) sound refinement mappings ("refines the original system MTS" and be "distributable"), and (3) correct refinements ("restrict the system's behaviors per the new requirement"). Section 7.3.1 proves the first condition by defining a set of steps to arrive from a given initial MTS to any of its refinements. Section 7.3.2 then theoretically analyzes the soundness of the proposed mappings between the system-level MTS refinements and the component-level MTS refinements.
Finally, both Section 7.3.2 and Section 7.3.3 evaluate the framework's correctness: while Section 7.3.2 provides a theoretical correctness argument, Section 7.3.3 demonstrates how the framework facilitates correct refinements of an industrial specification.

7.3.1 Generality of Atomic Refinements

As noted in Section 5.1, the states in a system MTS generated from the popular requirements notations can be characterized (1) with the unique evaluation of the specified fluents in the particular state, and (2) with the scenario execution step tracked in that state. Hence, the application of the refinement distribution framework depends on the ability to express the transition from a less refined to a more refined requirements-based system MTS as a sequence of atomic refinements.

Theorem 5 (Generality). Given two MTSs $M$ and $N$ that are based on requirements, where $N$ is a strong refinement of $M$ ($M \preceq_S N$), $N$ can be obtained from $M$ through a sequence of atomic refinement operations performed on $M$.

Proof. The proof builds a procedure that achieves a strong refinement by applying a sequence of atomic refinements. For MTSs $M$ and $N$, with $M \preceq_S N$, $R_{ref}(m_i)$ denotes $N$'s states that refine $m_i \in S_M$. Furthermore, each state in the MTSs is uniquely characterized by the combination of fluent evaluations $cs(s)$ and the tracked scenario step $sc(s)$. Let $FL_M$ be the set of $M$'s fluents, and $FL_N$ be the set of $N$'s fluents, which is, in turn, a superset of $FL_M$. Note that for each state $n_j \in R_{ref}(m_i)$, $cs(n_j)$ is a superset of $cs(m_i)$, while either $sc(n_j)$ and $sc(m_i)$ are equivalent or $sc(m_i)$ is empty.

The procedure for arriving from $M$ to $N$ proceeds as follows. As a first step, fluent-based state cloning is performed for all fluents $FL_N \setminus FL_M$ so that an MTS $M'$ is obtained. Since fluent-based cloning, which does not remove or add behavior, is performed, $M$ and $M'$ are behaviorally equivalent.
In turn, this also means that $M' \preceq_S N$, while for each state $n_j \in R_{ref}(m'_i)$, the states' fluent evaluations are equivalent and only the tracked scenario step may differ. To arrive at $N$, it is necessary to refine $M'$ based on the differences between $sc(n_j)$ and the corresponding $sc(m'_i)$. Hence, starting from those states that track the first scenario step not tracked in $M'$, $M'$ is refined via execution-tracking state refinements that only split states and neither promote maybe transitions to required nor remove them from the model, which ensures behavior equivalence of $M'$ and $M''$. This procedure is iteratively repeated until $M''$ is obtained such that, for each state $n_j \in R_{ref}(m''_i)$, $sc(n_j)$ and $sc(m''_i)$ are equivalent. Finally, to arrive at $N$, $M''$ can be simply refined with a sequence of transition refinements.

7.3.2 Soundness and Correctness of the Framework

This section proves that the refinement distribution is sound, i.e., it produces a system MTS that (1) is obtained through valid MTS refinements and (2) is distributable, by proving that the solutions for handling the different refinement types are sound. Furthermore, the section provides correctness arguments that relate the finally obtained MTS with the intent of the input system refinement and, in turn, of the system requirement.

7.3.2.1 Transition Refinement

Soundness. The algorithms in Figure 5.1 modify a system MTS $M$ into $M'$. In turn, the component refinements are propagated back to the system model by potentially requiring or prohibiting additional transitions, eventually yielding a final system MTS $M''$. To prove that the algorithms perform legal refinement, i.e., $M \preceq_S M'$, it is sufficient to show that they modify only the previously maybe transitions (lines 4 and 7 of RefineToRequired and lines 7 and 9 of RemoveFromComposite).
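The soundness argument rests on the observation that changing only maybe transitions yields a legal refinement. The sketch below illustrates that observation on a toy MTS; it is not the RefineToRequired or RemoveFromComposite algorithm of Figure 5.1. The class MTS and the functions promote_maybe, remove_maybe, and is_strong_refinement are hypothetical names, and the check assumes the standard definition of modal refinement: required transitions of the less refined model are preserved, and possible transitions of the more refined model are allowed by the less refined one.

```python
from itertools import product


class MTS:
    """Modal transition system; transitions are (source, label, target)."""

    def __init__(self, initial, required=(), maybe=()):
        self.initial = initial
        self.required = set(required)
        self.maybe = set(maybe)

    @property
    def possible(self):
        return self.required | self.maybe

    @property
    def states(self):
        found = {self.initial}
        for src, _, dst in self.possible:
            found.update((src, dst))
        return found


def promote_maybe(mts, transition):
    """Turn a maybe transition into a required one."""
    if transition not in mts.maybe:
        raise ValueError("only maybe transitions may be refined")
    return MTS(mts.initial, mts.required | {transition},
               mts.maybe - {transition})


def remove_maybe(mts, transition):
    """Prohibit a maybe transition by removing it."""
    if transition not in mts.maybe:
        raise ValueError("only maybe transitions may be refined")
    return MTS(mts.initial, mts.required, mts.maybe - {transition})


def is_strong_refinement(m, n):
    """Check whether n refines m, via the largest refinement relation."""
    rel = set(product(m.states, n.states))
    changed = True
    while changed:
        changed = False
        for ms, ns in list(rel):
            req_ok = all(
                any(s == ns and l == label and (md, nd) in rel
                    for s, l, nd in n.required)
                for src, label, md in m.required if src == ms)
            pos_ok = all(
                any(s == ms and l == label and (md, nd) in rel
                    for s, l, md in m.possible)
                for src, label, nd in n.possible if src == ns)
            if not (req_ok and pos_ok):
                rel.discard((ms, ns))
                changed = True
    return (m.initial, n.initial) in rel


m = MTS("s0", maybe={("s0", "t1.tune", "s1"), ("s1", "t1.newVal", "s0")})
required_tune = promote_maybe(m, ("s0", "t1.tune", "s1"))
no_new_val = remove_maybe(m, ("s1", "t1.newVal", "s0"))

print(is_strong_refinement(m, required_tune))  # True
print(is_strong_refinement(m, no_new_val))     # True
print(is_strong_refinement(required_tune, m))  # False
```

The last check shows why the restriction to maybe transitions matters: weakening a required transition back to maybe is not a refinement.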
Furthermore, $M'$ is the composition of the refined component MTSs because (1) lines 5–7 of RefineToRequired correctly update the applied parallel composition rule (Figure 2.2) when, following a component refinement, the system transition should become required; (2) lines 8–9 of RemoveFromComposite correctly remove those system transitions for which their underlying component transitions were prohibited; and (3) other transitions are correctly left unmodified.

Correctness. The intent behind refining a system transition $M.s \xrightarrow{l}_m M.s'$ is to have some system components that respectively require or prohibit that system behavior, while not doing so trivially by prohibiting the system to ever reach $M.s$. Given this intent, it is necessary to show that the algorithms perform necessary transition refinements at the component level. Consider an event sequence $\omega$ that transitions to $M.s$ from the initial MTS state $M.s_0$, with $\omega_{M_i}$ as the projection of $\omega$ to a component $M_i$'s alphabet ($M.s$ is the only state reachable by $\omega$ for a deterministic MTS). For a component $M_i$ that shares the event $l$ of the refined transition, $\omega_{M_i}$ transitions to state $M.s_{M_i}$ from $M_i$'s initial state. Hence, the transition refinements necessary to still allow $\omega$ while requiring a system transition $M.s \xrightarrow{l}_m M.s'$ are the transition refinements performed in line 4 of RefineToRequired. Similarly, the transition refinement required to support $\omega$ while prohibiting a system transition $M.s \xrightarrow{l}_m M.s'$ is the removal of the matching transition in at least one component state $M.s_{M_i}$, as done in line 7 of RemoveFromComposite.

7.3.2.2 Execution-Tracking State Refinement

Soundness. The algorithm RefineSysState refines the system MTS $M$ into an MTS $M'$ by (1) splitting the states of $M$, and (2) refining its transitions. The soundness of the second step has already been demonstrated.
To prove the soundness of the first step, note that the state refinement in lines 7–8 of RefineSysState only redistributes a state's incoming transitions, while leaving the outgoing behaviors unmodified. Hence, MTS $M'$ is a legal refinement of $M$. To prove that $M'$ is the composition of the refined component MTSs, note that lines 7–8 revise the system states to reflect the refinement of the component state $M_i.p$ into $M_i.p'$ and $M_i.p''$.

Correctness. An execution-tracking refinement of a system state $M.s$ aims to create a component composition in which the components will require or prohibit behaviors according to a scenario and its current event $e$. Requiring or prohibiting events after $e$ is executed should not be satisfied (1) by preventing the system from ever reaching $M.s$ via $e$, or (2) by requiring or prohibiting those events for any execution that reaches $M.s$. Given this intent, it will be illustrated that the algorithm performs the appropriate state refinements and transition refinements at the component level.

Let $\Omega$ be the set of event sequences that reach $M.s$ from the initial state $M.s_0$ such that $e$ is the last event for every $\omega \in \Omega$, and consider the event sequences $\omega'$ that reach $M.s$ from $M.s_0$ with events other than $e$ as the last events. RefineSysState performs an execution-tracking state refinement in a component $M_i$ only in cases when $M_i$ shares $e$ (i.e., $\omega_{M_i}$, $M_i$'s projection of $\omega$, ends with $e$). By doing so, it is possible to subsequently require or prohibit $M_i$'s events solely for those executions that reached $M.s$ through $e$. For the converse case, when $M_i$ does not share $e$, transition refinements are appropriate for the component state $M.s_{M_i}$ because, for each $\omega$, either (1) the different $\omega_{M_i}$ have the same final event that is already tracked in $M.s_{M_i}$, or (2) the different $\omega_{M_i}$ have different final events, in which case requiring or prohibiting behaviors must be done for all the different ways of transitioning to $M.s_{M_i}$.
For components $M_j$ that are not involved in the scenario after $e$ executes (i.e., the scenario does not subsequently affect $M_j$'s events), a transition refinement is appropriate because $M.s.t(M_j)$ may be visited again through a different path before the scenario's execution proceeds.

7.3.2.3 Fluent-Based State Cloning

Soundness. Fluent-based cloning of states in an MTS $M$ does not require or prohibit existing behaviors, nor does it introduce new, previously prohibited behaviors. Hence, the resulting MTS $M'$, despite a larger number of states, is behaviorally equivalent to $M$. The only distinction is that $M'$ has a larger number of states to track the fluent values. Furthermore, the same set of fluents is used to clone the states in the system MTS and the component MTSs. Therefore, the composition of the component MTSs is equal to the finally obtained system MTS.

Correctness. The intent behind specifying a new fluent is to be able to restrict behaviors for a particular fluent evaluation. The correctness of the solution for fluent-based state cloning stems from two factors: (1) the framework allows only fluents that can be mapped to components so the intended system behaviors can be later restricted, and (2) any revision to a fluent is validated by an engineer using the framework.

7.3.3 Case Study: Philips TV

The refinement framework was also applied to the industrial protocol from the Philips TV product line [84, 89, 98] (recall Section 7.2.2). The primary focus of this case study is on the benefits that the framework provides when aiming to arrive at a correct specification, in contrast to the existing, purely system-level, approaches. The case study setting replicates the setting described in Section 7.2.2: the product instantiation consists of two tuning components (Tuner1 and Tuner2), a Switch, and a Video component. The tuners are connected to the screen (i.e., Video) via the Switch with one tuner active. The case study follows the steps from Sibay et al.
[84] where scenarios and event invariants were incrementally specified for a protocol that keeps the screen blank to avoid flicker during re-tuning. As part of the case study, several inconsistencies have been identified (described below) that were overlooked in the original case study, which was conducted exclusively at the system level.

The original case study consisted of several iterations. Each iteration involved either specifying a new requirement, or revising a requirement based on the validation results for a synthesized MTS. The case study starts by describing the main protocol sequence (the BasicTune eTS from Figure 7.5). In BasicTune, Tuner1 stores a new frequency value, and asks Switch to blank the screen while the frequency is changing. Once the frequency is set (t1.restore), the screen displays the picture again (unblank). The validation of the synthesized MTS discovered that the tuning protocol will restart instead of continuing where it stopped if another tuning request is made during the execution of BasicTune. Therefore, a new fluent T1_Tuning was defined to capture an ongoing protocol, and the clause $\neg$T1_Tuning included in BasicTune's precondition to prevent this protocol restart. In the following iteration, the NestedTune eTS, depicted in Figure 7.5, is specified to capture how a new frequency is immediately stored (t1.newVal) for a nested tuning request. The NestedTune's precondition is then found overly permissive and revised with a clause that references two new fluents. The next two iterations specify a precondition and a scenario that describe when a user's request (t1.tune) is allowed and when it is prohibited. Finally, two new eTSs are specified to capture the continuation of the protocol for a nested tuning request.

The framework was applied to the above iterations, while revising the requirements only if necessary. For each iteration, the first attempt would be adding the requirement as specified in [84].
In case a requirement caused a subsequent inconsistency in the original case study, it was revisited in a way that avoids the inconsistency. The system MTS and component MTSs were then validated to determine whether they can be used to detect problems with the requirements or additional behaviors of interest that were not discussed in [84]. Next, a discussion is provided on how the refinement framework helped to avoid inconsistencies and facilitated behavior validation.

[Figure 7.5: The scenarios from the Philips TV case study [84]. BasicTune (existential; prechart T1_Active) with lifelines User, Tuner1, Switch, and Video and messages t1.tune, t1.dropReq, blank, t1.dropAck, t1.newVal, t1.restore, and unblank. NestedTune (existential; prechart (T1_Tuning) and (T1_Active)) with lifelines User and Tuner1 and messages t1.tune and t1.newVal.]

Fluent-based state cloning. From the four originally specified fluents, the framework directly performed fluent-based state cloning at the component level for three fluents, while correctly suggesting a revision of T1_Tuning. T1_Tuning, initiated by Tuner1's event t1.tune and terminated by Switch's events unblank, setActiveT1, and setActiveT2, caused inconsistencies that were not detected in the original case study. For example, the final MTS from [84] prohibits a tuning request before blank and unblank in those executions that follow the BasicTune eTS from Figure 7.5, but allows them immediately after those events. However, Tuner1 cannot enforce these restrictions because blank and unblank are not Tuner1's events. To fix the inconsistencies, t1.restore was made the terminating event for T1_Tuning, as suggested by the framework based on the BasicTune eTS in which t1.restore precedes unblank.

Transition refinement and execution-tracking state refinement. The refinement framework successfully handled the transition refinements and execution-tracking state refinements for the case study requirements. Notably, none of the requirements were conflicting once the fluent T1_Tuning was revised as described above.
This result was expected since the scenarios were specified as semantically weak existential scenarios: existential scenarios only require the transitions related to the behaviors captured in the scenarios, while allowing all other behaviors as maybe transitions. Hence, at the end of the original case study, the synthesized MTS contained a number of unexplored maybe transitions. To further explore the framework's ability to refine the remaining maybe behaviors while preventing inconsistencies, an attempt was made to strengthen the NestedTune eTS into a uTS. Validation of the resulting model uncovered that the uTS was overly restrictive for some of Switch's events: t1.dropAck was prohibited in Tuner1's model although it was simultaneously required by Switch, while prohibiting blank was not possible as it was already required. This prompted a revision of the scenario by excluding the events controlled by Switch from the scenario's alphabet. These inconsistencies would not have been detected with purely system-level approaches.

Behavior discovery. While the synthesized MTSs can be used to discover behaviors similar to those discussed in [84], the propagation of the component refinements back to the system MTS helped to identify the otherwise overlooked implications of having a system composed of components. For example, the MTSs capture that Tuner1 tries to follow the same protocol regardless of whether it is active or inactive. This discovery, in turn, raised the question of how Switch should respond to an inactive Tuner1. By contrast, the system MTS in [84] fails to capture this behavior of Tuner1, and the reported case study does not resolve the question. A separate treatment of the Philips TV system [89] does confirm the relevance of the identified behavior and defines additional requirements related to Switch's communication with an inactive Tuner1.
In addition to identifying the implications of the components' behaviors better, the refinement framework also improved the scalability of behavior validation: since a system MTS can become very large, existing behaviors of interest can be validated and new behaviors identified in a smaller component MTS. The full system MTS can then be used to "replay" the analyzed component behaviors.

The conducted case study confirmed that the refinement framework helps (1) to specify requirements more accurately, (2) to avoid potential inconsistencies, (3) to identify and validate the behaviors of interest, and (4) to arrive at a correct system specification.

7.3.4 Validity of Hypothesis 3

Evaluating Hypothesis 3 involved collecting evidence to test whether the refinement distribution framework is sound, correct, and generally applicable in the context of requirements engineering. To this end, the constructed theoretical proofs and arguments confirm that the framework indeed satisfies these conditions. The conducted case study then further reinforced that the adopted notions of correctness and generality are meaningful from the requirements engineering standpoint. In particular, the framework did not restrict the types of requirements that can be used, and it helped avoid specification errors in several non-trivial instances.

7.4 Hypothesis 4 – Trace-Enhanced MTS Inference

The final hypothesis of Section 1.2 suggested: "Synthesis from scenario-based specifications can be adapted to work with runtime artifacts in a way that yields a new technique with improved precision and recall compared to the existing implementation-level state-of-the-art synthesis techniques". To this end, Chapter 6 introduced TEMI, a technique developed to satisfy the hypothesis.

To evaluate whether the hypothesis holds, TEMI was compared to existing state-of-the-art algorithms on the task of synthesizing models of nine libraries. The libraries were exercised using readily available, open-source applications.
This section provides an in-depth comparison of the respective quality of the inferred models. Section 7.4.1 describes the evaluation setup. To assess whether TEMI has improved precision and recall over the state of the art, the quality of TEMI models is tested both in the case of perfect inputs (Section 7.4.2) and in the case of imperfect inputs, specifically incomplete invariants (Section 7.4.3). Finally, Section 7.4.4 tests the hypothesis and addresses the threats to validity.

7.4.1 Evaluation Setup

This section describes the libraries used in the evaluation, the evaluation procedure and the adopted quality metrics, the process for assessing a model's quality, and the specific experiments.

7.4.1.1 Subject Libraries and Applications

The libraries used to evaluate TEMI span five common categories:

1. Data structures (DataStructures.StackAr)
2. Data processing (org.apache.xalan.templates.ElemNumber.NumberFormatStringTokenizer, which will be referred to as NFST, and java.util.StringTokenizer)
3. Authentication and data-integrity verification (java.security.Signature)
4. Data streaming (java.util.zip.ZipOutputStream and org.apache.xml.serializer.ToHTMLStream)
5. Distributed communication and message exchange (org.columba.ristretto.smtp.SMTPProtocol, net.sf.jftp.net.wrappers.SftpConnection, and java.net.Socket)

Application | Type | Exec. | Libraries
StackArTester | unit test | unit test | StackAr
jEdit | text editor | end-user | StringTokenizer
jlGUI | media player | end-user | StringTokenizer
Columba | e-mail client | end-user | Signature, Socket, SMTPProtocol
jFTP | file transfer | end-user | Signature, SftpConnection
JarInstaller | packaging | end-user | ZipOutputStream
DaCapo Xalan | benchmark | performance test | NumberFormatStringTokenizer, ToHTMLStream
Voldemort | distributed database | unit tests | Socket

Figure 7.6: Eight applications that exercise the evaluated libraries.

To collect invocation traces for these libraries, eight open-source applications were used.
Figure 7.6 describes the applications and the way they were run to exercise the libraries' functionalities. For example, StringTokenizer's functionality was exercised by running jEdit as an end-user who edits and saves text. As another example, Voldemort's unit tests that involve remote communication were used to collect traces for Socket. The same set of traces was used for both the invariant inference and MTS refinement.

7.4.1.2 Evaluation Method

Metrics. To measure the quality of a model, the model is compared to a ground-truth model. This comparison is done in the standard terms of precision and recall [64]. Precision measures the fraction of the traces generated by the inferred model that are allowed by the ground truth. Recall measures the fraction of the ground-truth traces that are allowed by the inferred model, suggesting how complete the model is.

Assessing Model Quality. Evaluating the quality of the inferred models requires a set of ground-truth models that represent the libraries' legal behavior. To this end, this evaluation uses the ground-truth models from related work [22, 78]. These models were modified when they had imprecisions confirmed by inspecting the source code, and when the models included non-public methods. All transitions on methods that were never invoked in the collected traces were removed from the ground-truth model. (This simplification equally impacts the recall of TEMI and of the existing techniques, and thus does not affect the conclusions.) For libraries without an existing ground truth (NFST, ToHTMLStream, and SftpConnection), the source code and API documentation were inspected to manually construct the models.

Experiments. The performed experiments evaluated seven versions of TEMI and existing inference algorithms with certain enhancements. The first four are traditional k-tail (recall Section 6.1.1) and SEKT, an adaptation of k-tail with a stronger merging criterion of predicate equivalence [53], with k ∈ {1, 2}.
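As a rough illustration of the merging criterion behind k-tail, the following Python sketch builds a prefix tree from a set of traces and groups the tree states whose sets of continuations of length at most k coincide. It is a simplified reading of the classic algorithm and not the implementation used in this evaluation; the example traces and all function names are hypothetical.

```python
from collections import defaultdict


def build_prefix_tree(traces):
    """Return {state: {event: next_state}} for a prefix tree; state 0 is the root."""
    tree = {0: {}}
    for trace in traces:
        state = 0
        for event in trace:
            if event not in tree[state]:
                new_state = len(tree)
                tree[state][event] = new_state
                tree[new_state] = {}
            state = tree[state][event]
    return tree


def k_tails(tree, state, k):
    """All event sequences of length <= k that can follow `state` in the tree."""
    tails = {()}
    if k == 0:
        return tails
    for event, nxt in tree[state].items():
        for tail in k_tails(tree, nxt, k - 1):
            tails.add((event,) + tail)
    return tails


def merge_classes(traces, k):
    """Group prefix-tree states that share the same set of k-tails."""
    tree = build_prefix_tree(traces)
    classes = defaultdict(list)
    for state in tree:
        classes[frozenset(k_tails(tree, state, k))].append(state)
    return sorted(classes.values())


# Hypothetical invocation traces of a connection-like library.
TRACES = [
    ("connect", "send", "quit"),
    ("connect", "send", "send", "quit"),
    ("connect", "send", "send", "send", "quit"),
]

if __name__ == "__main__":
    for k in (1, 2):
        print(k, merge_classes(TRACES, k))
```

On these hypothetical traces, k = 1 places the states reached after one and after two send events in the same class, while k = 2 keeps them apart, so the larger k yields more classes and fewer merges.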
Results of k-tail for k > 2 are not reported because larger values of k led to fewer merges, which lowered the recall without notable precision improvement. Furthermore, other k-tail-based algorithms (e.g., [66,67]) are not included in the experiments because, by definition, they perform fewer merges than the traditional k-tail with insignificant, if any, observed precision gains. The other three algorithms included in the evaluation are invariant-based: optimistic and pessimistic TEMI, and Contractor [29]. The pessimistic TEMI removes maybe transitions (thus treating unobserved invocations as illegal), while the optimistic model retains all transitions.

Contractor [29] is included in the evaluation as a state-of-the-art approach for generating invariant-based MTS models. Previously, Contractor was neither quantitatively evaluated nor used for dynamic specification mining. In order to use Contractor in the evaluation, its inputs were enhanced in two significant ways. First, since Contractor does not consider predicates on the methods' output values, it was enhanced to capture each (method, return-value) combination as a distinct method with its own invariants. Second, Contractor was provided with Daikon invariants after they were filtered in a way specific to TEMI (recall Section 6.4). The enhanced version of the published technique is thus referred to as Contractor*.

The experiments consisted of three steps: (1) executing applications that use the library to generate traces, (2) running the inference algorithms on those traces, and (3) measuring the precision and recall of the inferred models. The implementation, observed traces, inferred models, and ground-truth and good-practice models are publicly available: http://softarch.usc.edu/wiki/doku.php?id=inference:start.

7.4.2 Inferred Model Precision and Recall

This section assesses the quality of the inferred models, as compared to the ground truth.
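The comparison against a ground-truth model can be sketched in Python as follows. The sketch assumes that both models are deterministic automata in which every state is accepting, and it approximates the trace sets by exhaustively enumerating traces up to a bounded length; the bound, the two example models, and the function names are illustrative assumptions rather than the procedure used in this evaluation.

```python
def traces_up_to(model, start, max_len):
    """Enumerate all event sequences of length 1..max_len allowed by a model.

    `model` maps a state to {event: next_state}; every state is accepting.
    """
    result = set()
    frontier = [((), start)]
    for _ in range(max_len):
        next_frontier = []
        for trace, state in frontier:
            for event, nxt in model.get(state, {}).items():
                extended = trace + (event,)
                result.add(extended)
                next_frontier.append((extended, nxt))
        frontier = next_frontier
    return result


def precision_recall(inferred, truth, start=0, max_len=4):
    """Precision and recall of `inferred` with respect to `truth` over bounded traces.

    Both models are assumed to allow at least one trace.
    """
    inferred_traces = traces_up_to(inferred, start, max_len)
    truth_traces = traces_up_to(truth, start, max_len)
    common = inferred_traces & truth_traces
    precision = len(common) / len(inferred_traces)
    recall = len(common) / len(truth_traces)
    return precision, recall


# Hypothetical ground truth: any number of "send" events between "open" and "close".
TRUTH = {0: {"open": 1}, 1: {"send": 1, "close": 2}}
# Hypothetical inferred model: exactly one "send" is allowed.
INFERRED = {0: {"open": 1}, 1: {"send": 2}, 2: {"close": 3}}

if __name__ == "__main__":
    print(precision_recall(INFERRED, TRUTH, max_len=3))
```

In this example the inferred model allows only legal traces, so its precision is perfect, but it misses the ground-truth traces with repeated or absent send events, which lowers its recall.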
Figure 7.7 shows the precision (P) and recall (R) of each algorithm's models. Figure 7.7 distinguishes the algorithms along two dimensions: TEMI is separated from the existing algorithms by the two vertical lines; the different shadings distinguish the k-tail-based algorithms from the algorithms that infer models using invariants.

Each cell lists P / R.
Library | Traditional 1-tail | Traditional 2-tail | SEKT 1-tail | SEKT 2-tail | Optimistic TEMI | Pessimistic TEMI | Contractor*
StackAr | 64% / 30% | 83% / 30% | 97% / 30% | 97% / 30% | 99% / 94% | 100% / 71% | 99% / 94%
NFST | 100% / 21% | 100% / 21% | 100% / 21% | 100% / 21% | 96% / 57% | 96% / 44% | 96% / 57%
StringTokenizer | 100% / 52% | 100% / 51% | 100% / 51% | 100% / 50% | 100% / 51% | 100% / 50% | 93% / 52%
Signature | 100% / 61% | 100% / 61% | 100% / 61% | 100% / 61% | 100% / 88% | 100% / 88% | 100% / 88%
ToHTMLStream | 100% / 26% | 100% / 26% | 100% / 26% | 100% / 26% | 100% / 100% | 100% / 100% | 100% / 100%
ZipOutputStream | 100% / 37% | 100% / 36% | 100% / 37% | 100% / 35% | 100% / 63% | 100% / 47% | 100% / 63%
SMTPProtocol | 100% / 20% | 100% / 20% | 100% / 20% | 100% / 20% | 96% / 77% | 100% / 66% | 88% / 78%
Socket | 94% / 24% | 97% / 21% | 100% / 23% | 100% / 21% | 100% / 67% | 100% / 51% | 100% / 67%
SftpConnection | 61% / 31% | 96% / 29% | 100% / 30% | 100% / 29% | 97% / 48% | 100% / 33% | NA / NA
Average | 95% / 34% | 98% / 33% | 100% / 34% | 100% / 33% | 99% / 75% | 99% / 65% | 97% / 75%

Figure 7.7: Precision (P) and recall (R) comparison of the competing techniques.

Next, three aspects of the results are analyzed: (1) the precision and recall of TEMI models compared to other models; (2) the reasons for the better recall of the invariant-based algorithms, TEMI and Contractor*; and (3) the differences between TEMI and Contractor*.

For all libraries except StringTokenizer, TEMI produces models with superior recall and comparable or better precision than the k-tail algorithms. For example, the k-tail models of SMTPProtocol allow only a single message to be sent before terminating the connection. The TEMI model correctly generalizes the observed behavior and allows multiple messages, which improves the recall.
Compared with SEKT, the precision of the optimistic TEMI is lower in the cases where TEMI either included an erroneous transition or inferred erroneous states due to incomplete invariants. The imprecisions due to erroneous transitions are resolved when only the required transitions are considered (i.e., in the pessimistic TEMI).

The data in Figure 7.7 demonstrate that the invariant-based algorithms, TEMI and Contractor*, have significantly higher recall than the k-tail-based algorithms. This is because the invariants help to distinguish between those invocations that change program state, in turn restricting future invocations, and those that do not. For example, ToHTMLStream's invariants indicate no restrictions on its method invocations, and the TEMI model has a single state with a self-transition for every method. In contrast, the k-tail models capture irrelevant invocation restrictions inferred from the traces.

The evaluation results also suggest that TEMI is slightly-to-moderately more precise than Contractor*, while having nearly identical recall. The differences in precision stem from the ways these two approaches construct model states and the way TEMI incorporates the observed invocations. As discussed in Section 6.2, TEMI distinguishes states by atomic predicate clauses, while Contractor* defines states by the predicates that correspond to full method preconditions. Multiple TEMI states may be logically mapped to a single Contractor* state; the Contractor* state may have additional transitions that do not exist in the TEMI states. Therefore, Contractor* has theoretically higher recall at the expense of model precision.

The difference in precision is especially pronounced in the cases of StringTokenizer and SMTPProtocol, whose recall Contractor* improved by up to 1%, but at a precision cost of 7–8% as compared to optimistic TEMI. For example, Contractor*'s model of StringTokenizer allowed illegal invocation sequences in which a first invocation of hasMoreTokens() returned false but a subsequent invocation returned true. This occurred because Contractor* is not always able to precisely capture postconditions that relate post-state values to pre-state values. For SMTPProtocol, TEMI removed unobserved transitions on getState(), which remained in the Contractor* model due to incomplete invariants. In general, the causes behind Contractor*'s imprecision were more varied than those that impacted the recall of TEMI: while TEMI's recall may deteriorate because of overly restrictive invariants that are abstracted by Contractor*, Contractor*'s precision may be hampered by incomplete invariants, intricate relationships between invariants, and invocation dependencies that cannot be captured using invariants. While an issue such as overly restrictive invariants can be mitigated by collecting additional executions, invariant complexity and implied dependencies cannot be mitigated in such a way.

Each cell lists P / R.
Library | Contractor* | No Invariant Filtering | No Method Distinction
StackAr | 99% / 94% | 99% / 94% | 77% / 100%
NFST | 96% / 57% | 98% / 40% | 78% / 66%
StringTokenizer | 93% / 52% | 91% / 77% | 47% / 77%
Signature | 100% / 88% | 100% / 78% | NA / NA
ToHTMLStream | 100% / 100% | 100% / 100% | NA / NA
ZipOutputStream | 100% / 63% | NA / NA | NA / NA
SMTPProtocol | 88% / 78% | 100% / 24% | NA / NA
Socket | 100% / 67% | 96% / 60% | 86% / 68%

Figure 7.8: Contractor models with and without TEMI-specific invariant filters.

The relatively high precision and recall of Contractor* are also a result of enhancements to the original Contractor algorithm [29]. These enhancements, which are a contribution of the work presented in this dissertation, are applicable primarily in the context of dynamic model inference (in contrast to the design-time model synthesis targeted by Contractor [29]). Figure 7.8 outlines the quality of the models obtained when the enhancements are not applied.
First, when the input invariants are not filtered, a modest 1% average increase in precision comes at the expense of a sizable drop in recall (up to 54% in the case of SMTPProtocol). Second, when the different method return points are not handled separately, the resulting models are 25% less precise on average, although the average recall increases by 10%. Omitting the enhancements also accentuated Contractor's scalability problems, discussed next.

Each Contractor* SMT query includes the invariants of all methods, and such a query is generated for every possible combination of methods' invariant evaluations (recall Section 6.1.2). Hence, Contractor* queries are longer and more resource-consuming than TEMI's queries. In the case of SftpConnection, which has 22 methods with 684 invariant clauses, the SMT solver runs out of memory (Figure 7.7); this happens in five additional cases when the enhancements are not applied (Figure 7.8).

7.4.3 Sensitivity to Invariant Quality

In the real world, the collected execution data can be unrepresentative, and the results of invariant inference noisy. The aim is to have a technique that can maintain high precision under noise: even a moderate drop in precision could render the produced model useless for a number of development tasks.

To evaluate the impact of invariant noise, random subsets of invariant clauses were removed and the resulting average precision and recall of the TEMI and Contractor* models were measured. For each library, 20 models were generated for each of the cases when 10% and 20% of the invariant clauses are removed. Figure 7.9 shows the results, which suggest that (1) the precision of pessimistic TEMI is robust to variations in invariant quality, (2) optimistic TEMI models outperform Contractor* by yielding higher precision and recall, (3) noisy environments require enhancement of invariants with trace information, and (4) decreased invariant quality negatively affects Contractor*'s performance. Each point is elaborated below.

Each cell lists P / R. The first data column uses the full invariant set; the next three columns have 10% of the invariants removed; the last three have 20% removed.
Library | Optimistic TEMI (full) | Optimistic TEMI (10%) | Pessimistic TEMI (10%) | Contractor* (10%) | Optimistic TEMI (20%) | Pessimistic TEMI (20%) | Contractor* (20%)
StackAr | 99% / 94% | 67% / 96% | 100% / 71% | 70% / 95% | 66% / 96% | 100% / 80% | 61% / 97%
NFST | 96% / 57% | 90% / 59% | 96% / 44% | 86% / 60% | 86% / 60% | 96% / 44% | 82% / 69%
StringTokenizer | 100% / 51% | 90% / 77% | 100% / 50% | 92% / 69% | 82% / 79% | 100% / 50% | 91% / 72%
Signature | 100% / 88% | 96% / 90% | 100% / 88% | 96% / 88% | 92% / 91% | 100% / 88% | 88% / 77%
ToHTMLStream | 100% / 100% | 100% / 100% | 100% / 100% | 100% / 100% | 100% / 100% | 100% / 100% | 100% / 100%
ZipOutputStream | 100% / 63% | 100% / 69% | 100% / 47% | 100% / 69% | 100% / 76% | 100% / 47% | 100% / 73%
SMTPProtocol | 96% / 77% | 96% / 77% | 100% / 66% | 86% / 74% | 94% / 78% | 100% / 66% | 80% / 62%
Socket | 100% / 67% | 61% / 69% | 100% / 51% | NA / NA | 58% / 71% | 100% / 51% | NA / NA
Average | 99% / 76% | 91% / 81% | 99% / 67% | 90% / 79% | 89% / 83% | 99% / 67% | 86% / 79%

Figure 7.9: Comparison of TEMI and Contractor* algorithms in a noisy environment.

1. Invariant incompleteness did not affect the near-perfectly precise (99%) pessimistic TEMI models. This confirms that the MTS refinement procedure (recall Section 6.3) appropriately introduces the trace information even when the initial model is imperfect. The impact of this result is that pessimistic TEMI models are the most appropriate choice for development tasks that require high precision (e.g., code understanding, debugging, documentation) when the confidence in invariants is low due to limited traces. In particular, pessimistic TEMI models have 8–13% higher precision than optimistic TEMI and Contractor* models, while also having 32% higher recall than comparably precise k-tail models (Figure 7.7).

2. When faced with noisy invariants, the optimistic TEMI models outperform the competition in the average case by 1–4% in both precision and recall.
The reasons were twofold: (1) Incomplete invariants add erroneous non-deterministic transitions; the degree of non-determinism is higher in Contractor* models, leading to lower precision. (2) Incomplete invariants cause TEMI's refinement procedure to correctly split states and subsequently remove the undesired maybe transitions only from the appropriate state.

3. In the average case, the precision of TEMI models dropped from 99% when noise is not present (Figure 7.7) to 89–91% under noise (Figure 7.9); the precision dropped by 33% and 42% in the two extreme cases, StackAr and Socket, respectively. A reason for the extremes is that the states in TEMI's Socket model can have a number of incorrect non-deterministic transitions that confirm the connection (isConnected()=true) before it has been established. This highlights the crucial role of augmenting invariant-based models with trace information: isConnected() never returns this result in the actual traces, and the erroneous transitions do not exist in the pessimistic TEMI models.

4. Incomplete invariants affect performance, making Contractor* less efficient due to a higher number of allowed program states. This, in turn, results in a higher number of SMT queries that cause the scalability problems already discussed in Section 7.4.2.

7.4.4 Validity of Hypothesis 4

In contrast to the evaluations of Hypotheses 1-3, the evaluation of the dissertation's final hypothesis was exclusively empirical. For this reason, the evaluation involved a higher number of real-world systems in order to gather sufficiently rich and strong evidence. This process also required addressing potential threats to the validity of the empirical study. To this end, the potential ground-truth bias was avoided by using the publicly available ground truths from other researchers [22,78], and modifying them only if source code inspection confirmed an imprecision.
The risks stemming from subject library and application selection were mitigated through the selection of libraries of different types and popularity (e.g., well-documented Java libraries vs. the less widely known NFST) as well as applications from several different domains.

The collected results strongly support Hypothesis 4: utilizing program state information significantly improves the quality of dynamically inferred models, particularly over state-of-the-art k-tail-based techniques. The results also suggest that program state information should be used to create an initial model before augmenting it with the information about invocation sequences, consistent with TEMI. Moreover, it was confirmed that TEMI is an appropriate solution even in the case of imperfect inputs, still outperforming competing techniques. Finally, the invocation trace information has been shown necessary to circumvent potential imprecisions in the invariant-based models: while lower-quality invariants reduced the precision of the maybe transitions, the pessimistic TEMI models remained almost perfectly precise with unchanged recall.

Chapter 8 Related Work

This dissertation's related work can be grouped into prior scenario specification languages (Section 8.1), existing partial-behavior modeling formalisms (Section 8.2), techniques for behavior model synthesis (Section 8.3), techniques for behavior model decomposition (Section 8.4), and techniques for implementation-level specification mining (Section 8.5).

8.1 Scenario Specifications

The focus of this section is the existing scenario specification languages, with particular emphasis on triggered scenario languages. As noted earlier, the popular scenario notations such as UML Sequence Diagrams [76] and Message Sequence Charts (MSC) [49] have weak existential semantics. To describe the overall system behaviors, multiple Message Sequence Charts can be connected into a high-level Message Sequence Chart (hMSC) [49].
However, specifying an hMSC may be infeasible during iterative system development where individual scenarios are only gradually discovered and specified.

To express stronger requirements statements using scenarios, several triggered scenario languages have been proposed. Damm and Harel [27] first introduced triggers into sequence chart-based specifications with Live Sequence Charts (LSC), which come in both existential (eLSC) and universal (uLSC) forms (e.g., a universal LSC comprises an existential sequence that triggers a universal sequence). LSCs are defined via a system-level semantics that defines legal system traces. In contrast to caTS, LSCs do not support the existential branching modality or arbitrary modality mixing. The state annotations in an LSC define the conditions that may/must hold at a certain point, while caTS context annotations define the conditions that trigger certain obligations.

Sibay et al. [85] first proposed the existential branching event modality in their existential Triggered Scenarios (eTS). Unlike caTS, eTS and the recently proposed universal Triggered Scenarios (uTS) [84] are defined via system-level semantics, with certain limitations from the component-level viewpoint (recall the discussion from Chapter 4). It is suggested that eTS is useful for capturing preliminary system-level requirements, and can be used as a starting point for component-level reasoning and scenario elaboration. Sibay et al. [84,85] operationalize the eTS and uTS semantics with a system-level MTS synthesis algorithm. eTS and uTS do not support arbitrary mixing of modalities or the additional caTS constructs (Figure 4.1).

Inspired by LSC, UML [76] provides a universal assert sequence diagram construct. Harel and Maoz [47], however, demonstrated the semantic ambiguity of this construct and proposed Modal Sequence Diagrams (MSD) as a solution. Similarly to caTS, MSD allows arbitrary mixing of universal and existential events.
However, the system-level MSD semantics may cause an incorrect component-level interpretation.

To illustrate the limitations of MSD, consider the MSD [47] Verification, depicted in Figure 8.1, for the Customer Banking system from Section 2.4.2. Verification aims to specify two requirements: (1) upon reception of a card verification request, Proxy may report that the card information is invalid and may then send a report to the Bank; and (2) when a card is input, ATM may send a verification request to Proxy. If the card is invalid, ATM shall immediately instruct UI to return the card.

Figure 8.1: An MSD [47] for Customer Banking. (Instances: UI, ATM, Proxy, Bank; events: cardEnter, verifyCard, invalidCard, returnCard, sendUpdate.)

While Verification seems to have a direct correspondence to these requirements, it is counter-intuitive from the components' viewpoint. In particular, under the system-level MSD semantics, the only legal continuations after invalidCard are (a) returnCard followed by any other event or (b) sendUpdate followed by returnCard. Since Proxy cannot observe returnCard, it must generate sendUpdate to satisfy the event interleaving in which the next Proxy event happens before returnCard (case (b)). This implies that sendUpdate, originally specified as existential at the system level, becomes universal at the component level.

Other scenario languages with a "triggered" flavor are Triggered Message Sequence Charts (TMSC) [82] and Property Sequence Charts (PSC) [4]. The TMSC semantics is defined at the component level. In TMSC, the individual scenarios are connected into a high-level MSC [49] that describes the overall system behavior. TMSC uses purely existential triggers to defer making decisions about the exact behavior of an hMSC node. By contrast, caTS does not require an hMSC and its triggers relate to the overall system behavior. PSC have a narrower scope than caTS as they are meant only to express system-level LTL properties.
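The observability argument made above for the Verification MSD can be illustrated with a small Python sketch that projects system-level traces onto the events a component takes part in. The assignment of events to Proxy is an assumption read off the two requirements stated for Verification, and the trace prefixes and the project function are hypothetical.

```python
# Assumed from the requirements: Proxy receives verifyCard and emits
# invalidCard and sendUpdate; it takes no part in cardEnter or returnCard.
PROXY_ALPHABET = {"verifyCard", "invalidCard", "sendUpdate"}


def project(trace, alphabet):
    """Keep only the events of a system-level trace that a component can observe."""
    return tuple(event for event in trace if event in alphabet)


# Two hypothetical system-level prefixes that differ only in whether
# ATM has already instructed UI to return the card.
BEFORE_RETURN = ("cardEnter", "verifyCard", "invalidCard")
AFTER_RETURN = ("cardEnter", "verifyCard", "invalidCard", "returnCard")

if __name__ == "__main__":
    print(project(BEFORE_RETURN, PROXY_ALPHABET))
    print(project(AFTER_RETURN, PROXY_ALPHABET))
```

Both prefixes project to the same local view for Proxy, which is why Proxy cannot base its decision about sendUpdate on whether returnCard has already occurred.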
As an alternative to visual scenario specification, Ben-David et al. [6] have recently proposed the Conditional Scenario Specification Language (CSSL), a linear temporal logic with support for specifying branching sequences. While more expressive than the existing triggered scenario languages, CSSL does not provide means to distinguish system components, does not consider fluents, and, as a formal logic, may require a higher specification effort compared to scenarios. Expanding CSSL to fluents may enable the mapping from caTS to CSSL formulas and, in turn, access to CSSL's efficient model checking facilities.

8.2 Partial Behavior Models

The Modal Transition System (MTS) [61] is the most commonly used partial-behavior modeling formalism. The recent success of MTSs as the target models for model synthesis (e.g., [52,91]) is due to their ability to represent compactly the full set of legal implementations and expose the undefined behaviors. Generating MTSs from system requirements has proven useful when analyzing how the different requirements work together [39,91,93]. Furthermore, MTSs can be used to drive elaboration of existing and elicitation of new requirements [84, 91, 94] as they explicitly capture the underspecified behaviors, i.e., those behaviors that are neither explicitly required nor prohibited.

One limitation of the MTS formalism is that it does not distinguish the control of an event, that is, whether an event is incoming, outgoing, or internal. Motivated by the existing work on specifying component interfaces with event directionality for implementation models [28], researchers have recently proposed modal IO automata [5, 62] to specify partial component interfaces. However, these proposed formalisms define consistency of two automata with only two values (true or false), while, in the context of partial models, there should also exist a notion of maybe consistency, which would then depend on the future refinements.
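To make the three-valued nature of partial behavior models more concrete, the following Python sketch represents a deterministic MTS by separate required and maybe transition maps and classifies a trace as required, maybe, or prohibited. The class, its field names, and the example transitions are hypothetical simplifications for illustration and do not reproduce the formal MTS definition or any tool used in this dissertation.

```python
from dataclasses import dataclass, field


@dataclass
class MTS:
    """A deterministic MTS sketch: transitions are keyed by (state, event)."""
    initial: int
    required: dict = field(default_factory=dict)  # (state, event) -> next state
    maybe: dict = field(default_factory=dict)     # (state, event) -> next state

    def classify(self, trace):
        """Return 'required', 'maybe', or 'prohibited' for a sequence of events."""
        state, verdict = self.initial, "required"
        for event in trace:
            key = (state, event)
            if key in self.required:
                state = self.required[key]
            elif key in self.maybe:
                # One maybe step makes the whole trace underspecified.
                state, verdict = self.maybe[key], "maybe"
            else:
                return "prohibited"
        return verdict


# Hypothetical model: a tuning request must be followed by a restore;
# whether a nested tuning request is allowed is left undecided.
EXAMPLE = MTS(
    initial=0,
    required={(0, "tune"): 1, (1, "restore"): 0},
    maybe={(1, "tune"): 1},
)

if __name__ == "__main__":
    for trace in (("tune", "restore"), ("tune", "tune", "restore"), ("restore",)):
        print(trace, EXAMPLE.classify(trace))
```

Under this reading, a refinement either promotes a maybe transition to required or removes it, which is what moves a trace out of the undecided category.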
MTSs and modal IO automata [62] can express both general component interfaces [5,62] and product-line components [37,38,62]. Additional examples of partial behavior modeling formalisms include disjunctive MTSs [63], which extend MTSs with "or-transitions", and Feature Transition Systems (FTS) [17], used to model product-line components by attaching a set of features to each transition. This dissertation works with MTSs because the proposed formalisms for defining modal interfaces are still preliminary. The underlying assumption, however, is that the presented techniques can be minimally adapted to generate partial-behavior models of other flavors.

8.3 Behavior Model Synthesis

There exist a number of techniques for the synthesis of system-level and component-level behavior models from scenario-based system specifications. The existing approaches for component-level model synthesis predominantly produce final LTSs (as opposed to partial MTS models), thus arbitrarily selecting one of the potentially many component implementations that satisfy the input requirements.

Whittle and Schumann [104] proposed an algorithm for generating component statecharts from scenarios and system operation invariants. Their algorithm works on inputs similar to those of the heuristic MTS synthesis algorithm from Chapter 3, but without considering the behavior that is neither prohibited nor required by the requirements. Uchitel et al. [94] demonstrated the importance of considering the specifications' partiality during iterative architecture design on the case study used by Whittle and Schumann.

Uchitel et al. [95,96] also put forward a technique for synthesizing component-level LTS models from a scenario-based specification and discovering implied scenarios. The resulting LTS models are constructed from LTS models of individual MSCs, which are composed according to an additional high-level MSC. The techniques proposed in this dissertation do not require the architect to specify an hMSC.
Several techniques [24,26,69] allow an engineer to interactively build a scenario-based specification through the specify-synthesize-validate cycle. Mäkinen and Systä [69] developed a semi-automated technique that uses the architect's guidance to synthesize component statecharts from existential sequence charts. Damas et al. [24] inductively infer a system-level LTS from scenarios interactively provided by the user and subsequently decompose it into component-level LTSs (the decomposition algorithm used is simple LTS projection [68]). A later extension of this technique reduces the number of questions to the user [26] by incorporating FLTL properties [44]. However, these techniques can create overspecified models that exhibit more behavior than actually desired. Furthermore, these techniques make restrictive assumptions, such as assuming that every specified scenario starts in the system's initial state.

Several heuristic algorithms synthesize behavioral models from LSCs [12,46], but their heuristics risk misinterpreting the true stakeholder intent. Furthermore, the algorithm proposed by Harel et al. [46] introduces events that were not originally specified into the component statecharts to synchronize the states of different components. However, by doing so, the system becomes monolithic as each component must have global knowledge of the system's execution.

A couple of recently proposed techniques construct LTSs based on pre- and postcondition invariants [2,31]. De Caso et al. [31] focus on generating abstract models to support invariant validation and elaboration. Their tool Contractor accepts a rich set of variable types, including integer and floating point variables. De Caso's algorithm, however, does not consider scenarios. Alrajeh et al.'s technique [2] facilitates refinement of a system's event invariants based on system goals and scenarios.
These two techniques complement the techniques proposed in this dissertation by supporting validation and refinement of system-level invariants.

D'Ippolito et al. [32–34] have developed a suite of techniques for synthesizing a model for a single component (controller) that communicates with the remainder of the system. The initial technique [32] synthesizes an LTS model that, under assumptions specified using FLTL formulas, satisfies a set of given liveness properties. The technique was later extended [33] to work in domains in which some of the events may fail a finite number of times (i.e., a controller is able to predict possible errors and retry). In these two cases, synthesizing an LTS is sufficient because the techniques target self-adaptive systems. In particular, such systems need to obtain, at runtime, any controller that satisfies the properties. The most recent technique [34] analyzes whether the LTS implementations of an environment MTS have a corresponding LTS controller that satisfies a given property.

Compared to the existing work on MTS synthesis from a scenario-based specification [85,90], the techniques proposed in this dissertation generate component-level, as opposed to system-level, MTSs. The algorithm designed by Uchitel et al. [90] generates system-level MTSs from scenarios and safety properties while requiring each scenario to start from the initial MTS state. Sibay et al. [85] synthesize system-level MTSs from eTS scenarios. Since the generated models are system-level, these techniques do not guarantee that a corresponding component-based system implementation is feasible.

8.4 Behavior Model Decomposition

This section overviews the techniques for system-level model decomposition, and discusses techniques that maintain the consistency of design models. Such techniques are foremost conceptually related to the refinement distribution framework proposed in Chapter 5.
The idea of creating component-level behavior models from system-level models has been explored in different settings [9,45,86,96]. Bianculli et al. [9] propose a Web service-oriented technique that creates a set of component models such that their composite behaviors are a subset of the initially specified system-level behaviors. Hallé and Bultan [45] proposed a technique that determines whether a system-level LTS can be decomposed in a way that preserves all of the desired system-level behaviors. Uchitel et al. [96] devise a set of component models such that their composition may provide more behaviors than the initial system-level specification. Contrary to this dissertation's refinement distribution framework, these existing techniques do not deal with partial behavior models.

The only other technique that tries to devise component MTSs from a system MTS was proposed by Sibay et al. [86]. As discussed in Section 2.3, Sibay et al. present an algorithm that creates correct component MTSs from a distributable system MTS. However, contrary to the refinement distribution framework, for a non-distributable initial system MTS their algorithm does not return any MTSs and does not point out the specific issues.

The work on the refinement distribution framework also relates to research on model traceability and inconsistency management [105]. Egyed [35] and Xiong et al. [107] create consistency rules between different types of UML models. The rules, built into a tool [35] or specified by an engineer [107], are checked and propagated (semi-)automatically when a model is modified. By contrast, the refinement distribution framework maintains behavioral consistency between models of the same type, but of different scopes. The framework also automatically propagates transition and state refinements, while involving an engineer when revising a problematic fluent.
8.5 Specification Mining

This section focuses on the prior research closest to TEMI, namely the techniques that use program invariants and invocation traces to mine LTS specifications, while also overviewing static specification mining.

Program invariants have been used to directly synthesize FSM models [29], and to augment the k-tail algorithm with transition invariants [67]. As noted in Section 7.4, TEMI is more appropriate than Contractor [29] for dynamic specification mining because (1) it has higher precision, (2) it is more resilient to invariant noise, and (3) unlike Contractor, it distinguishes between observed and unobserved invocations. By contrast, starting from the observed executions and using Daikon [36] to infer transition invariants [67] results in other imprecisions. This approach learns invariants for those executions that follow an identical sequence of method invocations. For example, this approach would merge StackAr's Trace 1 and Trace 4 from Figure 2.10 because they follow the same invocation sequence, despite the different method parameter and return values. Figure 8.2 depicts the initial steps of the resulting Trace 14. While the merging procedure creates sound invariants (e.g., capacity is a non-negative integer), it merges the traces for stacks of different sizes. Consequently, the resulting merged trace erroneously implies a stack of size 0 is the same as one of size 5, even though isFull returns different values.

[Figure 8.2: A merged trace obtained using Lorenzoli et al.'s algorithm [67]. Recoverable labels: states S0 to S5 of Trace 14; transitions StackAr (capacity ≥ 0), isEmpty (size ≥ 0, return = true), topAndPop (size ≥ 0, return = null), makeEmpty (size ≥ 0), top (size ≥ 0, return = null), isFull (size ≥ 0, return = true ∨ return = false), ...]

The k-tail algorithm [10] serves as a basis for many techniques for mining LTS-based models from invocation traces [19,64–67,80,100].
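To make the role of k-tail concrete, the following is a minimal sketch of one simplified variant: every trace prefix becomes a state, and prefixes are merged when the sets of event sequences of length at most k that follow them are identical. The function `k_tail`, the integer state ids, and the stack-like traces are hypothetical illustrations; the original algorithm [10] and the extensions cited in this section differ in their details.

```python
from collections import defaultdict


def k_tail(traces, k):
    """Merge trace prefixes that share the same set of futures.

    A future of a prefix is the sequence of at most k events that
    follows it in some trace. Returns (initial_state, transitions),
    where transitions is a set of (source, event, target) triples.
    """
    futures = defaultdict(set)  # prefix -> set of event tuples
    for trace in traces:
        trace = tuple(trace)
        for i in range(len(trace) + 1):
            futures[trace[:i]].add(trace[i:i + k])

    # Prefixes with identical future sets collapse into one state id.
    ids = {}
    state_of = {}
    for prefix, tails in futures.items():
        state_of[prefix] = ids.setdefault(frozenset(tails), len(ids))

    transitions = set()
    for prefix in futures:
        if prefix:  # the empty prefix is the initial state
            transitions.add(
                (state_of[prefix[:-1]], prefix[-1], state_of[prefix])
            )
    return state_of[()], transitions


# Hypothetical invocation traces of a stack-like object.
TRACES = [["push", "pop"], ["push", "push", "pop", "pop"]]
initial, transitions = k_tail(TRACES, k=1)
print(initial, sorted(transitions))
```

With k = 1 on these two traces, the prefixes "push push" and "push push pop" have the same single future and collapse into one state, which introduces a self-loop on pop. The inferred model therefore accepts sequences with more pops than pushes, a small instance of the imprecision that stricter merging conditions aim to reduce.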
The algorithms that build on k-tail (1) extend it to improve its precision or recall [19,66,80,100], (2) build larger frameworks with k-tail as the inference algorithm [65,80], and (3) enhance the models with information about invocation probabilities [65] and program state and method parameters [67]. Additional merges can make the models more compact [19,80], while stricter merging conditions based on pairwise sequencing invariants improve precision [66,100]. However, the evaluation detailed in Section 7.4 suggests that k-tail-based techniques have inferior recall. Synoptic [8] uses the CEGAR [16] approach to create a coarse initial model and then refine it using counterexamples that falsify temporal invariants. Meanwhile, InvariMint [7] presents a declarative specification language for expressing model-inference algorithms and improves the efficiency of the algorithms, but neither their precision nor their recall.

Static [30,83] and hybrid approaches [23,103] provide two alternatives to specification mining based strictly on invocation traces. Shoham et al. [83] infer, from client-side code, FSM models that over-approximate the actual invocation sequences. By contrast, TEMI works on traces generated from multiple parts of the code and, potentially, from multiple applications. De Caso et al. [30] statically analyze C programs for invariants and use Contractor to create models that allow more behavior than the ground truth.

ADABU [23] infers the concrete program state by statically finding side-effect-free invocations and combines that information with test case executions. While the concrete program state is also abstracted using predicates, these predicates are predetermined and do not relate multiple variables (e.g., ADABU abstracts integers only as negative, zero, or positive). Together with test case generation, ADABU can improve model quality, enabling code verification [22]. However, this requires tailored unit test executions.
Compared to TEMI, ADABU uses limited invariants and infers one model per runtime object, which hampers its applicability to rich classes and executions that involve many objects of the same type. Whaley et al. [103] create a separate submodel for each field of a class, analyzing whether a method modifies a given field, and creating a 1-tail model that combines static and dynamic information about method invocations with respect to that field. Unlike TEMI, this approach requires static analysis, creates multiple submodels, and considers only one-step history (1-tail), limiting its applicability.

Chapter 9 Concluding Remarks

Designing and implementing software systems that successfully satisfy the stakeholders' requirements is a daunting task. This task becomes even more challenging given the added extrinsic complexity: the typical development scenario requires building a system consisting of multiple components under tight deadlines and limited budgets. Consequently, thorough requirements collection and specification as well as careful and rigorous architecting of the system are, contrary to the recommended software engineering practices [88], often overlooked. This, in turn, results in erroneous and unsatisfactory software products being delivered to the users. It is important to note, however, that these occasionally haphazard practices stem not only from deadline and budget pressures, but also from often lacking support for accurate and concise specification of requirements, limited ways of mapping those requirements to the system design and, finally, relating the design to the implementation.

To reduce these limitations, this dissertation proposed four techniques: heuristic MTS synthesis, component-aware Triggered Scenarios, the refinement distribution framework, and trace-enhanced MTS inference.
These techniques (1) let engineers specify the requirements related to the behavior of the system-to-be accurately and concisely, (2) map the specified requirements to faithful design-level models, and (3) produce implementation-level models that can be compared against the design-level models. Heuristic MTS synthesis devises partial component-level behavior models from scenario-based system specifications. Generating partial component-level models, as opposed to system-level models, introduces more rigor into the requirements elicitation and architectural refinement processes and aids the discovery of several classes of potential problems. Component-aware Triggered Scenarios (caTS) is a novel scenario modeling language designed to deal with the common design choices that pervade existing scenario modeling languages and underlie a number of unintended, undesirable semantic side-effects when aiming to specify and relate requirements and architecture. The syntactic features of caTS allow it to concisely model the behaviors, constraints, and obligations of the system components stemming from the system requirements. The refinement distribution framework handles a refinement of a system model stemming from a new system-level requirement by propagating the changes to component models in order to maintain consistency. Finally, trace-enhanced MTS inference (TEMI) automatically infers models of implementation-level libraries by combining observed invocation sequences with automatically inferred program-state invariants.

The proposed techniques have been evaluated, and the results confirmed each technique's underlying hypothesis. While each technique addressed a separate hypothesis, several common themes emerged: (1) the improved specification quality, (2) the techniques' scalability, and (3) the effective support for elaboration and understanding of the specification.
First, the techniques have proven to lead to a correct and more accurate specification than was possible with the existing techniques. In the case of heuristic MTS synthesis and the refinement distribution framework, this refers to earlier discovery of defects stemming from the system-level viewpoint. For caTS, the novel language features allowed more accurate specification of the intent, while TEMI led to significant improvements in precision and recall of runtime models. Second, heuristic MTS synthesis and caTS have shown marked improvements in scaling to specification sizes that would have been impossible to handle previously. With heuristic synthesis it is now possible to analyze and find prospective defects in existing scenario-based specifications of almost arbitrary sizes. caTS, on the other hand, can be used to fully specify the behavior of large-scale systems with a significantly reduced risk that rigorous specification is avoided because of the exorbitant effort it requires. Third, through support for synthesis of component-level MTS models, which has been proven correct and complete, the four techniques provide an alternative component-oriented viewpoint that improves the understanding of the system's behavior. In addition, the partial nature of these models has provided invaluable assistance in detecting, analyzing, and elaborating underspecified and unexplored behaviors.

On top of considering the above strengths, choosing the best technique for moving from a scenario-based specification to a set of component MTSs involves tradeoffs related to the desired expressive power and the required training curve (this discussion does not relate to TEMI, which works on implementation-level artifacts). In terms of expressive power, caTS provides constructs that can accurately capture component-specific requirements and that also lend themselves to iterative scenario elaboration. The refinement distribution framework supports a range of existing system-level notations, which makes it widely applicable.
However, the framework is also unable to capture how the requirements relate to components in an explicit way. Heuristic MTS synthesis is the most restrictive, as it allows only existential scenarios. The considerations related to the techniques' required training curve are the opposite. In particular, it can be argued that caTS requires the most training, as it is a new language. By contrast, heuristic MTS synthesis requires understanding of intuitive scenario annotations, while the refinement distribution framework allows engineers to use their favorite notations. The resolution of these tradeoffs depends on the context in which a technique is intended to be applied. For example, safety-critical systems can benefit from rigorously modeled and traceable scenarios, which makes caTS the suitable solution. On the other side of the spectrum, heuristic MTS synthesis is appropriate when time or budget constraints prevent detailed scenario modeling, but synthesized "sanity checks" still help to prevent costly defects.

The research described in this dissertation has been disseminated to the wider research audience through a series of publications [51–55,58–60]. Additionally, different elements of the research have been applied on related projects in the areas of architecture reliability analysis [15,56,57,87], architecture recovery [42,43,71], and architecture-based system adaptation [70,72]. The results of this dissertation also open up new research avenues. For example, the novel ways of creating partial component-level models with maybe behaviors create opportunities for automated techniques that interactively propose additional requirements that cover those behaviors. The synthesized component models can serve as design contracts that independent teams in a large-scale development project need to satisfy. Similarly, the synthesized models could be used to find candidate matches among the available off-the-shelf software components.
The TEMI-based implementation-level models can help to devise caTS-like implementation scenarios or to find inconsistencies between the intended design and the implemented system. Given the specific contributions of this dissertation as well as the future opportunities that it enables, the dissertation is a step toward software systems whose behaviors are guaranteed to be correct from the system's conception, and through the system's design, implementation, and operation.

References

[1] Jörg Ackermann and Klaus Turowski. A library of OCL specification patterns for behavioral specification of software components. In Proceedings of the 18th Conference on Advanced Information Systems Engineering, 2006.
[2] Dalal Alrajeh, Jeff Kramer, Alessandra Russo, and Sebastián Uchitel. Learning operational requirements from goal models. In Proceedings of the 31st International Conference on Software Engineering, 2009.
[3] Apache HTTP Client. http://hc.apache.org/httpcomponents-client-ga/httpclient/apidocs/, 2014.
[4] Marco Autili, Paola Inverardi, and Patrizio Pelliccione. Graphical Scenarios for Specifying Temporal Properties: An Automated Approach. Automated Software Engineering, 14(3), 2007.
[5] Sebastian S. Bauer, Philip Mayer, Andreas Schroeder, and Rolf Hennicker. On weak modal compatibility, refinement, and the MIO workbench. In Proceedings of the 16th International Conference on the Tools and Algorithms for the Construction and Analysis of Systems, 2010.
[6] Shoham Ben-David, Marsha Chechik, Arie Gurfinkel, and Sebastian Uchitel. CSSL: A logic for specifying conditional scenarios. In Proceedings of the 19th ACM SIGSOFT symposium and the 13th European conference on Foundations of software engineering, 2011.
[7] Ivan Beschastnikh, Yuriy Brun, Jenny Abrahamson, Michael D. Ernst, and Arvind Krishnamurthy. Unifying FSM-inference algorithms through declarative specification. In Proceedings of the 35th International Conference on Software Engineering, 2013.
[8] Ivan Beschastnikh, Yuriy Brun, Sigurd Schneider, Michael Sloan, and Michael D. Ernst. Leveraging existing instrumentation to automatically infer invariant-constrained models. In Proceedings of the 8th Joint Meeting of the European Software Engineering Conference and ACM SIGSOFT Symposium on Foundations of Software Engineering, 2011.
[9] Domenico Bianculli, Dimitra Giannakopoulou, and Corina Pasareanu. Interface Decomposition for Service Compositions. In Proceedings of the 33rd International Conference on Software Engineering, 2011.
[10] Alan W. Biermann and Jerome A. Feldman. On the synthesis of finite-state machines from samples of their behavior. IEEE Transactions on Computers, 21(6), 1972.
[11] Barry W. Boehm. Software engineering economics. IEEE Transactions on Software Engineering, 1(1), 1984.
[12] Y. Bontemps and P. Heymans. Turning High-level Live Sequence Charts into Automata. In Proceedings of the 1st Workshop on Scenarios and State Machines: Models Algorithms and Tools, 2002.
[13] Y. Bontemps, P. Heymans, and P.Y. Schobbens. From Live Sequence Charts to State Machines and Back: A Guided Tour. IEEE Transactions on Software Engineering, 31(12), 2005.
[14] caTS Supplementary Material. http://www-scf.usc.edu/~krka/caTS/caTS.htm.
[15] Leslie Cheung, Ivo Krka, Leana Golubchik, and Nenad Medvidovic. Architecture-level reliability prediction of concurrent systems. In Proceedings of the third joint WOSP/SIPEW international conference on Performance Engineering, 2012.
[16] Edmund Clarke, Orna Grumberg, Somesh Jha, Yuan Lu, and Helmut Veith. Counterexample-guided Abstraction Refinement. In Proceedings of the 12th International Conference on Computer Aided Verification, pages 154–169, 2000.
[17] Andreas Classen, P. Heymans, P.Y. Schobbens, A. Legay, and J.F. Raskin. Model checking lots of systems: Efficient verification of temporal properties in software product lines. In Proceedings of the 32nd International Conference on Software Engineering, 2010.
[18] Alistair Cockburn. Writing effective use cases, volume 1. Addison-Wesley Reading, 2001.
[19] Jonathan E. Cook and Alexander L. Wolf. Discovering models of software processes from event-based data. ACM Transactions on Software Engineering and Methodology, 7(3), 1998.
[20] Christoph Csallner, Nikolai Tillmann, and Yannis Smaragdakis. DySy: Dynamic symbolic execution for invariant inference. In Proceedings of the 30th International Conference on Software Engineering, 2008.
[21] The Daikon invariant detector. http://groups.csail.mit.edu/pag/daikon, 2009.
[22] Valentin Dallmeier, Nikolai Knopp, Christoph Mallon, Gordon Fraser, Sebastian Hack, and Andreas Zeller. Automatically generating test cases for specification mining. IEEE Transactions on Software Engineering, 38(2), 2012.
[23] Valentin Dallmeier, Christian Lindig, Andrzej Wasylkowski, and Andreas Zeller. Mining object behavior with ADABU. In Proceedings of the 4th International Workshop on Dynamic Analysis, 2006.
[24] Christophe Damas, Bernard Lambeau, Pierre Dupont, and Axel Van Lamsweerde. Generating annotated behavior models from end-user scenarios. IEEE Transactions on Software Engineering, 31(12), 2005.
[25] Christophe Damas, Bernard Lambeau, Francois Roucoux, and Axel Van Lamsweerde. Analyzing critical process models through behavior model synthesis. In Proceedings of the 31st International Conference on Software Engineering, 2009.
[26] Christophe Damas, Bernard Lambeau, and Axel Van Lamsweerde. Scenarios, Goals, and State Machines: A Win-win Partnership for Model Synthesis. In Proceedings of the 14th ACM SIGSOFT International Symposium on Foundations of Software Engineering, 2006.
[27] Werner Damm and David Harel. LSCs: Breathing life into message sequence charts. Formal Methods in Systems Design, 19(1), 2001.
[28] Luca de Alfaro and Thomas Henzinger. Interface automata.
In Proceedings of the 8th European software engineering conference held jointly with 9th ACM SIGSOFT international symposium on Foundations of software engineering, 2001.
[29] Guido de Caso, Victor Braberman, Diego Garbervetsky, and Sebastian Uchitel. Automated abstractions for contract validation. IEEE Transactions on Software Engineering, 38(1), 2012.
[30] Guido de Caso, Victor Braberman, Diego Garbervetsky, and Sebastian Uchitel. Enabledness-based program abstractions for behavior validation. ACM Transactions on Software Engineering and Methodology, 22(3), 2013.
[31] Guido de Caso et al. Validation of contracts using enabledness preserving finite state abstractions. In Proceedings of the International Conference on Software Engineering, 2009.
[32] Nicolás D'Ippolito, Victor Braberman, Nir Piterman, and Sebastián Uchitel. Synthesis of live behaviour models. In Proceedings of the 18th ACM SIGSOFT international symposium on Foundations of software engineering, 2010.
[33] Nicolás D'Ippolito, Victor Braberman, Nir Piterman, and Sebastián Uchitel. Synthesis of live behaviour models for fallible domains. In Proceedings of the 33rd International Conference on Software Engineering, 2011.
[34] Nicolás D'Ippolito, Victor Braberman, Nir Piterman, and Sebastián Uchitel. The modal transition system control problem. In Proceedings of the 18th International Symposium on Formal Methods, 2012.
[35] Alexander Egyed. Automatically Detecting and Tracking Inconsistencies in Software Design Models. IEEE Transactions on Software Engineering, 37(2), 2011.
[36] Michael D. Ernst, Jeff H. Perkins, Philip J. Guo, Stephen McCamant, Carlos Pacheco, Matthew S. Tschantz, and Chen Xiao. The Daikon system for dynamic detection of likely invariants. Science of Computer Programming, 69(1), 2007.
[37] Alessandro Fantechi and Stefania Gnesi. Formal modeling for product families engineering. In Proceedings of the 12th International Software Product Line Conference, 2008.
[38] Dario Fischbein, Victor Braberman, and Sebastian Uchitel. A Sound Observational Semantics for Modal Transition Systems. In Proceedings of the 6th International Colloquium on Theoretical Aspects of Computing, 2009.
[39] Dario Fischbein, Nicolas D'Ippolito, Greg Brunet, Marsha Chechik, and Sebastian Uchitel. Weak Alphabet Merging of Partial Behaviour Models. ACM Transactions on Software Engineering and Methodology, 21(2), 2012.
[40] Dario Fischbein and Sebastian Uchitel. On correct and complete strong merging of partial behaviour models. In Proceedings of the 16th ACM SIGSOFT International Symposium on Foundations of Software Engineering, 2008.
[41] Mark Gabel and Zhendong Su. Javert: Fully automatic mining of general temporal properties from dynamic traces. In Proceedings of the 16th ACM SIGSOFT Symposium on Foundations of Software Engineering, 2008.
[42] Joshua Garcia, Ivo Krka, Chris Mattmann, and Nenad Medvidovic. Obtaining ground-truth software architectures. In Proceedings of the 35th International Conference on Software Engineering, Software Engineering in Practice, 2013.
[43] Joshua Garcia, Ivo Krka, Nenad Medvidovic, and Chris Douglas. A framework for obtaining the ground-truth in architectural recovery. In Proceedings of the Joint 10th Working IEEE/IFIP Conference on Software Architecture & 6th European Conference on Software Architecture, 2012.
[44] Dimitra Giannakopoulou and Jeff Magee. Fluent model checking for event-based systems. In Proceedings of the 9th European Software Engineering Conference held jointly with 11th ACM SIGSOFT International Symposium on Foundations of Software Engineering, 2003.
[45] Sylvain Hallé and Tevfik Bultan. Realizability Analysis for Message-Based Interactions Using Shared-State Projections. In Proceedings of the 18th ACM SIGSOFT International Symposium on the Foundations of Software Engineering, 2010.
[46] David Harel, Hillel Kugler, and Amir Pnueli.
Synthesis revisited: Generating statechart models from scenario-based requirements. In Formal Methods in Software and Systems Modeling. Springer, 2005.
[47] David Harel and Shahar Maoz. Assert and Negate Revisited: Modal Semantics for UML Sequence Diagrams. Software and Systems Modeling, 7(2), 2008.
[48] David Harel and Rami Marelly. Come, let's play: scenario-based programming using LSCs and the play-engine. Springer, 2003.
[49] ITU. Message sequence charts, 2004.
[50] Jon Kleinberg and Eva Tardos. Algorithm Design. Pearson, 2006.
[51] Ivo Krka. From requirements to partial behavior models: an iterative approach to incremental specification refinement. In Proceedings of the 18th ACM SIGSOFT Symposium on Foundations of Software Engineering, Doctoral Symposium, 2010.
[52] Ivo Krka, Yuriy Brun, George Edwards, and Nenad Medvidovic. Synthesizing Partial Component-Level Behavior Models from System Specifications. In Proceedings of the 7th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT International Symposium on the Foundations of Software Engineering, 2009.
[53] Ivo Krka, Yuriy Brun, and Nenad Medvidovic. Automatically mining specifications from invocation traces and method invariants. Technical Report CSSE-2013-509, Center for Systems and Software Engineering, University of Southern California, 2013.
[54] Ivo Krka, Yuriy Brun, Daniel Popescu, Joshua Garcia, and Nenad Medvidovic. Using dynamic execution traces and program invariants to enhance behavioral model inference. In Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering, New Ideas and Emerging Results, 2010.
[55] Ivo Krka, George Edwards, Yuriy Brun, and Nenad Medvidovic. From system specification to component behavioral models. In Proceedings of the 31st International Conference on Software Engineering, New Ideas and Emerging Results, 2009.
[56] Ivo Krka, George Edwards, Leslie Cheung, Leana Golubchik, and Nenad Medvidovic.
A comprehensive exploration of challenges in architecture-based reliability estimation. In Architecting Dependable Systems VI. Springer, 2009.
[57] Ivo Krka, Leana Golubchik, and Nenad Medvidovic. Probabilistic automata for architecture-based reliability assessment. In Proceedings of the ICSE Workshop on Quantitative Stochastic Models in the Verification and Design of Software Systems, 2010.
[58] Ivo Krka and Nenad Medvidovic. Revisiting modal interface automata. In Formal Methods in Software Engineering: Rigorous and Agile Approaches, 2012.
[59] Ivo Krka and Nenad Medvidovic. Distributing refinements of a system-level partial behavior model. In Proceedings of the 21st International Conference on Requirements Engineering, 2013.
[60] Ivo Krka and Nenad Medvidovic. Component-aware triggered scenarios. In Proceedings of the 11th Working IEEE/IFIP Conference on Software Architecture, 2014.
[61] Kim G. Larsen and Bent Thomsen. A modal process logic. In Proceedings of the 3rd IEEE Symposium on Logic in Computer Science, 1988.
[62] Kim G. Larsen, Ulrik Nyman, and Andrzej Wasowski. Modal I/O automata for interface and product line theories. In Proceedings of the 16th European Symposium on Programming, 2007.
[63] Kim Guldstrand Larsen and Liu Xinxin. Equation solving using modal transition systems. In Proceedings of the 5th IEEE Symposium on Logic in Computer Science, 1990.
[64] David Lo and Siau Khoo. QUARK: Empirical assessment of automaton-based specification miners. In Proceedings of the 13th Working Conference on Reverse Engineering, 2006.
[65] David Lo and Siau-Cheng Khoo. SMArTIC: towards building an accurate, robust and scalable specification miner. In Proceedings of the 14th ACM SIGSOFT Symposium on Foundations of Software Engineering, 2006.
[66] David Lo, Leonardo Mariani, and Mauro Pezzè. Automatic steering of behavioral model inference.
In Proceedings of the 7th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT International Symposium on the Foundations of Software Engineering, 2009.
[67] Davide Lorenzoli, Leonardo Mariani, and Mauro Pezzè. Automatic generation of software behavioral models. In Proceedings of the 30th International Conference on Software Engineering, 2008.
[68] Jeff Magee and Jeff Kramer. Concurrency: State Models & Java Programs. John Wiley & Sons, 2006.
[69] Erkki Mäkinen and Tarja Systä. MAS – an interactive synthesizer to support behavioral modeling in UML. In Proceedings of the 23rd International Conference on Software Engineering, 2001.
[70] Sam Malek, George Edwards, Yuriy Brun, Hossein Tajalli, Joshua Garcia, Ivo Krka, Nenad Medvidovic, Marija Mikic-Rakic, and Gaurav S. Sukhatme. An Architecture-Driven software mobility framework. Journal of Systems and Software, 83(6):972–989, 2010.
[71] Chris A. Mattmann, Joshua Garcia, Ivo Krka, Daniel Popescu, and Nenad Medvidovic. The anatomy and physiology of the grid revisited. In Proceedings of the Joint 8th Working IEEE/IFIP Conference on Software Architecture & 3rd European Conference on Software Architecture. IEEE, 2009.
[72] Nenad Medvidovic, Hossein Tajalli, Joshua Garcia, Ivo Krka, Yuriy Brun, and George Edwards. Engineering heterogeneous robotics systems: A software architecture-based approach. Computer, 44(5):62–71, 2011.
[73] Nenad Medvidovic and Richard N. Taylor. A classification and comparison framework for software architecture description languages. IEEE Transactions on Software Engineering, 26(1), 2000.
[74] MTSGen. http://www-scf.usc.edu/~krka/MTSGen.zip.
[75] Bashar Nuseibeh. Weaving together requirements and architectures. Computer, 34(3), 2001.
[76] OMG. UML 2.2 specification. http://www.omg.org/spec/UML/2.2/, 2009.
[77] Nadia Polikarpova, Ilinca Ciupa, and Bertrand Meyer. A comparative study of programmer-written and automatically inferred contracts.
In Proceedings of the 18th International Symposium on Software Testing and Analysis, 2009.
[78] Michael Pradel, Philipp Bichsel, and Thomas R. Gross. A framework for the evaluation of specification miners based on finite state machines. In Proceedings of the 26th IEEE International Conference on Software Maintenance, 2010.
[79] Michael Pradel and Thomas R. Gross. Leveraging test generation and specification mining for automated bug detection without false positives. In Proceedings of the 34th International Conference on Software Engineering, 2012.
[80] Steven P. Reiss and Manos Renieris. Encoding program executions. In Proceedings of the 23rd International Conference on Software Engineering, 2001.
[81] Matthias Schur, Andreas Roth, and Andreas Zeller. Mining behavior models from enterprise web applications. In Proceedings of the 9th Joint Meeting of European Software Engineering Conference and Symposium on Foundations of Software Engineering, 2013.
[82] Bikram Sengupta and Rance Cleaveland. Triggered Message Sequence Charts. IEEE Transactions on Software Engineering, 32(8), 2006.
[83] Sharon Shoham, Eran Yahav, Stephen J. Fink, and Marco Pistoia. Static Specification Mining Using Automata-Based Abstractions. IEEE Transactions on Software Engineering, 34(5), 2008.
[84] German Sibay, Victor Braberman, Sebastian Uchitel, and Jeff Kramer. Synthesising modal transition systems from triggered scenarios. IEEE Transactions on Software Engineering, 39(7), 2013.
[85] German Sibay, Sebastian Uchitel, and Victor Braberman. Existential live sequence charts revisited. In Proceedings of the 30th International Conference on Software Engineering, 2008.
[86] German E. Sibay, Sebastián Uchitel, Victor Braberman, and Jeff Kramer. Distribution of modal transition systems. In Proceedings of the 18th International Symposium on Formal Methods, 2012.
[87] Marin Silic, Goran Delac, Ivo Krka, and Sinisa Srbljic. Scalable and accurate prediction of availability of atomic web services.
IEEE Transactions on Services Computing, 2013. [88] Richard N Taylor, Nenad Medvidovic, and Eric M Dashofy. Software Architecture: Foundations, Theory, and Practice. John Wiley & Sons, 2009. [89] S. Uchitel, R. Chatley, Je Kramer, and J Magee. System architecture: the context for scenario-based model synthesis. In Proceedings of 12th International Symposium on Foundations of Software Engineering, 2004. [90] Sebastian Uchitel, Greg Brunet, and Marsha Chechik. Behaviour model synthesis from properties and scenarios. In Proceedings of the 29th International Conference on Software Engineering, 2007. [91] Sebastian Uchitel, Greg Brunet, and Marsha Chechik. Synthesis of Partial Behavior Models from Properties and Scenarios. IEEE Transactions on Software Engineer- ing, 35(3), 2009. [92] Sebastian Uchitel, Robert Chatley, Je Kramer, and Je Magee. System Architec- ture: The Context for Scenario-based Model Synthesis. In Proceedings of the 12th ACM SIGSOFT Symposium on Foundations of Software Engineering, 2004. [93] Sebastian Uchitel and Marsha Chechik. Merging partial behavioural models. In Proceedings of the 12th ACM SIGSOFT International Symposium on Foundations of Software Engineering, 2004. [94] Sebastian Uchitel, Je Kramer, and Je Magee. Behaviour model elaboration using partial labelled transition systems. In Proceedings of the fourth joint meeting of the European Software Engineering Conference and ACM SIGSOFT Symposium on the Foundations of Software Engineering, 2003. [95] Sebastian Uchitel, Je Kramer, and Je Magee. Synthesis of behavioral models from scenarios. IEEE Transactions on Software Engineering, 29(2), 2003. [96] Sebastian Uchitel, Je Kramer, and Je Magee. Incremental elaboration of scenario-based specications and behavior models using implied scenarios. ACM Transactions on Software Engineering and Methodology, 13(1), 2004. [97] Axel van Lamsweerde. Requirements Engineering: From System goals to UML models to software specications. John Wiley & Sons, 2009. 
[98] R. Van Ommering, Frank van der Linden, Je Kramer, and Je Magee. The koala component model for consumer electronics software. Computer, 33(3), 2002. [99] Hans van Vliet. Software Engineering: Principles and Practice. John Wiley & Sons, 2008. 168 [100] N. Walkinshaw and K. Bogdanov. Inferring nite-state models with temporal con- straints. In Proceedings of the International Conference on Automated Software Engineering, 2008. [101] Neil Walkinshaw, Bernard Lambeau, Christophe Damas, Kirill Bogdanov, and Pierre Dupont. Stamina: a competition to encourage the development and as- sessment of software model inference techniques. Empirical Software Engineering, 18(4), 2012. [102] Yi Wei, Carlo A Furia, Nikolay Kazmin, and Bertrand Meyer. Inferring better contracts. In Proceedings of the 33rd International Conference on Software Engi- neering, 2011. [103] John Whaley, Michael C Martin, and Monica S Lam. Automatic extraction of object-oriented component interfaces. In Proceedings of the 13th International Sym- posium on Software Testing and Analysis, 2002. [104] Jon Whittle and Johann Schumann. Generating statechart designs from scenarios. In Proceedings of the 22nd International Conference on Software Engineering, 2000. [105] S. Winkler and J. von Pilgrim. A survey of traceability in requirements engineering and model-driven development. Software and Systems Modeling, 9(4), 2010. [106] Tao Xie, Suresh Thummalapenta, David Lo, and Chao Liu. Data mining for soft- ware engineering. Computer, 42(8), 2009. [107] Y. Xiong, Z. Hu, H. Zhao, H. Song, M. Takeichi, and H. Mei. Supporting automatic model inconsistency xing. In Proceedings of the the 7th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT International Symposium on the Foundations of Software Engineering, 2009. [108] Yices SMT Solver. http://yices.csl.sri.com, 2009. [109] Sai Zhang, David Sa, Yingyi Bu, and Michael D Ernst. Combined static and dynamic automated test generation. 
In Proceedings of the 20th International Sym- posium on Software Testing and Analysis, 2011. 169
Abstract
Use‐case scenarios, with notations such as UML sequence diagrams, are widely used to specify the desired behaviors of a software system. Scenarios are often complemented with formalized system properties (e.g., event invariants). These intuitive requirements notations only partially specify the system‐to‐be by prohibiting or requiring certain behaviors, while leaving other behaviors uncategorized. During early stages of a system's life cycle, engineers iteratively specify and elaborate the scenario‐based requirements by refining existing scenarios and eliciting new ones. In parallel, engineers design the system's software architecture, consisting of multiple independently running components, which should be consistent with and satisfy the elicited requirements.

Although intuitive, the existing requirements notations allow engineers to specify behaviors with unintended semantic side‐effects. In particular, the current practices support reasoning about and specification of behaviors exclusively at the system level, even though a system consists of interacting components. This runs the risk of arriving at an inconsistent requirements specification (i.e., one that is not realizable as a composition of the system's components), which can prove costly if left unresolved. Furthermore, the lack of a direct mapping from requirements to a specification of components' behaviors duplicates the specification effort, as the same behaviors need to be specified both in the requirements and in the architecture specification. This also hampers the traceability that should ideally exist from requirements to the eventual implementation.

To address the shortcomings of the current practices, this dissertation implements three strategies to enable transitioning from a scenario‐based requirements specification to a set of component‐level behavior models: (1) heuristically creating component MTSs from system‐level scenario‐based specifications, (2) enhancing the way scenarios are specified, and (3) mapping the refinements performed on a system MTS to refinements to be performed on component‐level MTSs. The component models are specified as modal transition systems (MTSs), a partial‐behavior modeling formalism that accurately captures the required, prohibited, and undefined behaviors of the system components.

The implementations of the three strategies are intended for different development contexts and work with different inputs:

1. A heuristic algorithm that synthesizes a set of component MTSs from a set of existential scenarios and event invariants.
2. Component‐aware Triggered Scenarios (caTS), a triggered‐scenario language that enables expressing reactive behaviors of system components.
3. A framework that, given a system MTS refinement based on a new requirement, propagates that refinement to a set of component MTSs.

The MTSs produced using these techniques can be used for automated analyses (e.g., requirements consistency checking) and requirements elicitation, while ensuring traceability and consistency between the requirements and architecture specifications. To assist traceability and consistency checking between the system specifications and the eventual system implementation, this dissertation proposes the Trace‐Enhanced MTS Inference (TEMI) algorithm, which extracts component MTSs from the observed system executions.

The proposed techniques have been theoretically evaluated to analyze their complexity, as well as to establish their correctness and completeness. The techniques have been applied to a number of real‐world and automatically generated case studies. The results suggest that the generated MTSs accurately capture those component implementations that (1) necessarily provide the behavior required by the scenarios, (2) restrict behavior forbidden by the requirements specification, and (3) leave the behavior that is neither explicitly required nor forbidden as undefined. Furthermore, the proposed techniques help to detect potential specification flaws as they are specified, correct the existing errors, and prevent future inconsistencies. The techniques also scale to larger system specifications than the prior state‐of‐the‐art in terms of the running times required to generate component MTSs and the specification effort required to specify the desired behaviors. Finally, the performed evaluations confirm that the TEMI algorithm produces models of significantly higher quality than the state‐of‐the‐art in dynamic model inference.
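The three-way split between required, prohibited, and undefined behavior that an MTS captures can be illustrated with a minimal sketch. This is not the dissertation's implementation: the `MTS` class, its `classify` method, and the `sensor` component with its `request`, `reply`, and `timeout` events are hypothetical names chosen for illustration, and the sketch assumes a deterministic MTS in which each (state, action) pair has at most one target.

```python
from dataclasses import dataclass, field


@dataclass
class MTS:
    """Hypothetical, deterministic modal transition system (illustration only)."""
    initial: str
    # (state, action) -> next state, for transitions every implementation must offer
    required: dict = field(default_factory=dict)
    # (state, action) -> next state, for transitions that are possible but not required
    maybe: dict = field(default_factory=dict)

    def classify(self, trace):
        """Classify a sequence of actions as required, undefined, or prohibited."""
        state, verdict = self.initial, "required"
        for action in trace:
            key = (state, action)
            if key in self.required:
                state = self.required[key]
            elif key in self.maybe:
                # One maybe transition is enough to make the whole trace undefined.
                state, verdict = self.maybe[key], "undefined"
            else:
                # No transition at all: the behavior is ruled out.
                return "prohibited"
        return verdict


# Assumed example component: replying to a request is required,
# timing out is left open, anything else is prohibited.
sensor = MTS(
    initial="idle",
    required={("idle", "request"): "busy", ("busy", "reply"): "idle"},
    maybe={("busy", "timeout"): "idle"},
)

print(sensor.classify(["request", "reply"]))    # required
print(sensor.classify(["request", "timeout"]))  # undefined
print(sensor.classify(["reply"]))               # prohibited
```

In this sketch, refining the model toward an implementation would amount to moving each entry of `maybe` either into `required` or out of the model entirely.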
Linked assets
University of Southern California Dissertations and Theses
Conceptually similar
Design-time software quality modeling and analysis of distributed software-intensive systems
Automated synthesis of domain-specific model interpreters
Calculating architectural reliability via modeling and analysis
Techniques for methodically exploring software development alternatives
Constraint-based program analysis for concurrent software
Reducing inter-component communication vulnerabilities in event-based systems
Proactive detection of higher-order software design conflicts
Detecting anomalies in event-based systems through static analysis
Model-driven situational awareness in large-scale, complex systems
A reference architecture for integrated self‐adaptive software environments
A model for estimating schedule acceleration in agile software development projects
Face recognition and 3D face modeling from images in the wild
Data-driven and logic-based analysis of learning-enabled cyber-physical systems
Integration of digital twin and generative models in model-based systems upgrade methodology
Multidimensional characterization of propagation channels for next-generation wireless and localization systems
A system framework for evidence based implementations in a health care organization
Feature-preserving simplification and sketch-based creation of 3D models
Benchmarking interactive social networking actions
Learning logical abstractions from sequential data
Modeling and optimization of energy-efficient and delay-constrained video sharing servers
Asset Metadata
Creator
Krka, Ivo (author)
Core Title
Deriving component‐level behavior models from scenario‐based requirements
School
Viterbi School of Engineering
Degree
Doctor of Philosophy
Degree Program
Computer Science
Publication Date
03/11/2014
Defense Date
10/03/2013
Publisher
University of Southern California (original), University of Southern California. Libraries (digital)
Tag
formal methods, modal transition systems, model inference, model synthesis, OAI-PMH Harvest, requirements specification, software architecture modeling, use‐case scenarios
Format
application/pdf (imt)
Language
English
Contributor
Electronically uploaded by the author (provenance)
Advisor
Medvidović, Nenad (committee chair), Golubchik, Leana (committee member), Gupta, Sandeep K. (committee member), Uchitel, Sebastian (committee member)
Creator Email
ivo.krka@gmail.com
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-c3-368265
Unique identifier
UC11296246
Identifier
etd-KrkaIvo-2287.pdf (filename), usctheses-c3-368265 (legacy record id)
Legacy Identifier
etd-KrkaIvo-2287.pdf
Dmrecord
368265
Document Type
Dissertation
Rights
Krka, Ivo
Type
texts
Source
University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions
The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the a...
Repository Name
University of Southern California Digital Library
Repository Location
USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA