DESIGN-TIME SOFTWARE QUALITY MODELING AND ANALYSIS OF DISTRIBUTED SOFTWARE-INTENSIVE SYSTEMS

by

Leslie Chi-Keung Cheung

A Dissertation Presented to the FACULTY OF THE USC GRADUATE SCHOOL, UNIVERSITY OF SOUTHERN CALIFORNIA, In Partial Fulfillment of the Requirements for the Degree DOCTOR OF PHILOSOPHY (COMPUTER SCIENCE)

May 2011

Copyright 2011 Leslie Chi-Keung Cheung

Acknowledgements

First of all, I would like to express my thanks to my advisor, Leana Golubchik. Without her guidance I would not have completed this dissertation. On technical matters, she gave me many useful comments to improve this dissertation. Personally, she gave me a tremendous amount of support and encouragement, especially during some very hard times.

Thanks also go to Nenad Medvidovic, who has helped me significantly during my PhD journey. Neno has helped me with his software engineering wisdom, and has always been available when I needed help.

I thank Sandeep Gupta, Gaurav Sukhatme and Shahram Ghandeharizadeh for serving on my guidance committee, and for providing invaluable feedback on my work. Also, I thank William GJ Halfond and Fei Sha for their comments on improving an earlier version of the Web service work.

I would like to take this opportunity to thank my first research advisor, Andrea Arpaci-Dusseau of UW-Madison, for introducing me to computer systems research. I would not have thought about doing a PhD without the enjoyable experience of working with her.

I am grateful to have had the opportunity to work with my colleagues in the Internet Multimedia Lab: Alix Chow, Yan Yang, Yuan Yao, Kai Song, Bo-Chun Wang, Sung-Han Lin, Ranjan Pal and Abhishek Sharma. I thank each of you for your feedback on my work, and for staying for my long and endless presentation dry-runs. I enjoyed my time working with each of you.

I appreciate all the help I got from my colleagues in the software architecture group.
Ivo Krka deserves special thanks, as he has helped me tremendously in improving SHARP, as well as reading early drafts of my papers and this dissertation. I also thank Roshanak Roshandel for the fruitful discussions on the component operational profile estimation work, Sam Malek for his help on the DeSi experiments, and George Edwards and Chiyoung Seo for their help on the MIDAS experiments.

Finally, I cannot imagine completing my PhD without the love and support of my family. My parents, Alex and Susanna, have been huge supporters, and have helped me in any way they can. I always enjoy spending time with my brothers, Frank and Felix, whenever we can. Felix, I wish you good luck on your own PhD journey.

Table of Contents

Acknowledgements
List of Tables
List of Figures
Abstract
Chapter 1: Introduction
1.1 SHARP: A Scalable, Hierarchical, Architecture-Level Reliability Prediction Framework
1.2 An Approach to Performance Estimation of Third-Party Web Services from a Client's Perspective
1.3 Design-Time Operational Profile Estimation
1.4 Contributions and Validation
1.5 Roadmap
Chapter 2: Related Work
2.1 Design-Level Software Reliability Analysis
2.1.1 Applicability to Concurrent Systems
2.1.2 Parameter Estimation
2.2 Testing-Based Software Performance Estimation Techniques
Chapter 3: SHARP: A Scalable Framework for Reliability Prediction of Concurrent Systems
3.1 Background
3.1.1 Architectural-Level Defect Analysis
3.2 An Overview of the SHARP Framework
3.3 Reliability Computation
3.3.1 Basic Scenarios
3.3.2 SEQ Scenarios
3.3.3 PAR Scenarios
3.4 Evaluation
3.4.1 Complexity Analysis
3.4.2 Accuracy
3.5 Conclusions
Chapter 4: Performance Estimation of Third-Party Components
4.1 A Framework for WS Performance Prediction
4.1.1 Step 1: Performance Testing
4.1.2 Step 2: Regression Analysis
4.2 Validation
4.2.1 Interpolation Errors
4.2.2 Extrapolation Errors
4.3 Conclusion
Chapter 5: Parameter Estimation in Quality Analysis of Software Components
5.1 Component Reliability Prediction Framework
5.2 Operational Profile Modeling
5.3 Evaluation of Operational Profile Estimation
5.3.1 Evaluation of SCRover's Controller
5.3.2 Evaluation of DeSi
5.4 Conclusions
Chapter 6: Conclusions and Future Work
6.1 Summary and Contributions
6.2 Future Work
6.2.1 Integrating Firmware Properties
6.2.2 Performability Analysis
6.2.3 Reliability Testing
Bibliography

List of Tables

3.1 r_i and t_i of the MIDAS scenarios
3.2 Values of P(C_k) and R(C_k) in the System scenario
3.3 Values of P_j(c_j) in the System scenario
3.4 Worst-case complexities
3.5 Summary of computational costs in practice
4.1 Comparisons of TPC WS interpolation results
4.2 Errors in response time estimates using QN_3
4.3 TPC WS interpolation errors
4.4 Average interpolation errors
4.5 Extrapolation errors
5.1 Defects injected in DeSiController

List of Figures

1.1 The problem space in early software quality analysis — Cost vs. Information Availability
1.2 The problem space in early software quality analysis — Quality metric vs. Information Availability
2.1 An example of Cheung's model [20]
3.1 An overview of the MIDAS system
3.2 Components' state diagrams
3.3 Sequence diagrams
3.4 MIDAS scenarios organized in a hierarchy
3.5 An illustration of SHARP applied to the complex Sensor measurement scenario
3.6 Component submodels of SensorGW
3.7 SBMs of the basic scenarios
3.8 QN model of the SensorGW scenario
3.9 Rate redistribution in GUIRequest
3.10 SBMs of the SEQ scenarios
3.11 Models for completion rate computation for GUI LOOP and ControlAC
3.12 SBMs of the PAR scenarios
3.13 Probability distribution of the number of completed instances
3.14 Computational cost in practice
3.15 Sensitivity analysis of the Sensor PAR scenario
3.16 Sensitivity analysis at the system level
3.17 Sensitivity analysis of the Client-Server system
3.18 Computational cost of SHARP with and without truncation
3.19 Errors caused by model truncation
4.1 An overview of our WS performance prediction framework
4.2 Extrapolation using standard regression methods
4.3 An example objective function
4.4 Extrapolation using queueing models
4.5 TPC WS architecture
4.6 An overview of the hybrid approach
4.7 Results using QN_3
4.8 Interpolation
4.9 Extrapolation
5.1 Dynamic behavior model of the Controller component
5.2 Software component reliability prediction framework
5.3 Reliability model of the Controller component
5.4 Analysis of sensitivity to information sources of SCRover's Controller
5.5 Analysis of sensitivity to operational profiles of SCRover's Controller
5.6 Dynamic behavior models of the Controller component at two different levels of granularity
5.7 Analysis of sensitivity to models of different granularities of SCRover's Controller
5.8 Architectural models of the DeSiController component at different levels of detail
5.9 Analysis of sensitivity to information sources of DeSiController
5.10 Analysis of sensitivity to operational profiles of DeSiController
5.11 Analysis of sensitivity to models of different granularities of DeSiController

Abstract

As our reliance on software systems grows, it is becoming more important to understand a system's quality, because systems that provide poor quality of service have costly consequences. It has been shown that addressing problems late, such as after implementation, is prohibitively expensive, because it may involve redesigning and reimplementing the software system. Thus, it is important to analyze software system quality early, such as during system design. In early software quality analysis, in addition to analyzing components that are developed from scratch, it is also necessary to analyze existing components that are being integrated into the system, because software designers make use of them to save development cost.
We focus on two aspects of early software quality analysis: the cost of analysis and parameter estimation. First, we address the high cost of existing design-level quality analysis techniques. In modeling complex systems, existing design-level approaches may generate models that are computationally too expensive to solve. This problem is exacerbated in concurrent systems, as existing design-level approaches suffer from the state explosion problem. To address this challenge, we propose SHARP, a design-level reliability prediction framework that analyzes complex specifications of concurrent systems. SHARP analyzes a hierarchical scenario-based specification of system behavior and achieves scalability by utilizing the scenario relations embodied in this hierarchy. SHARP first constructs and solves models of the basic scenarios, and combines the obtained results based on the defined scenario dependencies; this process continues iteratively through the specified scenario hierarchy until finally obtaining the system reliability. Our evaluations indicate that (a) SHARP is almost as accurate as a traditional non-hierarchical method, and (b) SHARP is more scalable than other existing techniques.

Second, we address the high cost of testing-based approaches, which are typically used in analyzing the quality of existing software components. Since testing-based approaches require sending a large number of requests to the components under testing, they are quite expensive, particularly when testing at high workloads (i.e., where performance degradations are likely to occur) — this may render the component under testing unusable during the tests' duration (which is also a particularly bad time for a system to be unavailable). Avoiding testing at high workloads by extrapolating from data collected at low workloads, e.g., through regression analysis, results in a lack of accuracy.
To address this challenge, we propose a framework that utilizes the benefits of queueing models to guide the extrapolation process while maintaining accuracy. Our extensive experiments show that our approach gives accurate results as compared to standard techniques (i.e., use of regression analysis alone).

Finally, we address the problem of parameter estimation in existing design-level approaches. An important step in software quality analysis is to estimate the model parameters, which describe, for example, how the system and its components are used (this is known as their operational profile). This information is assumed to be available in existing design-level approaches, but it is unclear how existing approaches obtain such information to estimate model parameters. We identify sources of information available during design, and describe how information from different sources can be translated for use in the context of component reliability estimation. Our evaluation and validation experiments indicate that use of our approach in determining operational profiles results in accurate reliability estimates, where implementations are used as ground truth.

Chapter 1

Introduction

Software systems play a major role in our everyday lives. Nowadays, we rely on software systems to perform many tasks, including personal communication using email and instant messaging, business applications for online shopping and business-to-business services, software controllers of medical devices, aircraft control systems, and so on. As our reliance on software systems grows, analyzing a system's quality has become more important. Software systems providing poor quality of service may cause inconvenience, hurt business income and reputation, and may even cause loss of human lives. Traditionally, software system quality is assessed after the system has been built, using testing-based approaches.
For example, to ensure the correctness of a system, software engineers prepare a suite of test cases, with a variety of valid and invalid requests, and compare the system's output with the anticipated output, determined using the system's specifications. Another example is performance testing, for ensuring that the system provides acceptable performance. In a performance test, software engineers generate a large number of requests according to some traffic models or existing workloads, and measure various system performance metrics, such as system throughput, average response time, and utilization. However, testing-based approaches are expensive, because they involve sending a large number of requests to ensure the system is tested thoroughly. In many cases, correcting the problems can be even more expensive, as mitigating them may involve redesigning, rebuilding, and retesting the entire system, which can be orders of magnitude more expensive than if such problems are discovered and addressed during system design [14].

At the same time, during system design, software engineers are faced with many design decisions, many of which have a significant impact on software quality. For example, if the software system is designed to be deployed in harsh physical environments (e.g., in a tropical rain forest), system designers need to consider the level of redundancy needed for their applications: this requires studying the tradeoff between cost and system reliability in deploying redundant components. As another example, software engineers may choose between developing a software component from scratch, or utilizing a third-party component. The use of third-party components saves development cost, but integrating the component into the system under design may not always be trivial, and it may be more difficult to address problems when they arise.
Therefore, early quality analysis, such as during software architecture design, is important in building high-quality software systems. Software architecture provides high-level abstractions for representing the structure, behavior, and key properties of a software system [53, 71]. A software system's architecture comprises a set of computational elements (components), their interactions (connectors), and their compositions into systems (configurations). Early software quality analysis allows software designers to study design tradeoffs and assess their impacts on software quality, such that software designers can make more informed design decisions, and hence architect better software systems.

We focus on two important aspects of early software quality analysis: the cost of the analysis and parameter estimation. Before discussing these in detail, let us consider the problem space in early software quality analysis, which is depicted in Figure 1.1. The x-axis describes the amount of information available about system components. Software engineers may design new components or integrate existing components, possibly provided by a third party, into the system. Information available about existing components varies: making use of existing components that are developed in-house (e.g., from an older version of the system) or open-source software allows access to the source code and perhaps expert information about the components (Source Code Available); purchasing software from third-party vendors may allow access to only the binaries (Binaries Available); and utilizing software that is deployed by a third party allows access to the component's services, but neither the source code nor the binaries are available (Binaries Accessible).

[Figure 1.1: The problem space in early software quality analysis — Cost vs. Information Availability]

In the cases above, we can apply testing-based techniques, as the component has been built. On the other hand, when software engineers design new components, only design models are available for software quality analysis (Design Models). Testing-based techniques are not applicable, because the component has not yet been built.

The y-axis in Figure 1.1 describes the cost of analysis. While testing-based techniques can be applied when the implementation is available, they are expensive, as they involve making a large number of requests. If the source code or the binaries are available, program analysis- and/or reverse engineering-based techniques can be applied. While these techniques have lower cost, they are not applicable when the binaries are accessible but unavailable for quality analysis (i.e., software in the "Binaries Accessible" category). For example, reverse engineering-based approaches cannot be applied when the software is hosted by a third party, where users have access to the service but cannot obtain the binaries. While a number of approaches that leverage design models have been proposed for the case when only design models are available (see Chapter 2 for details), they are costly to apply. As the scale and degree of concurrency of modern software systems have grown significantly, incorporating the complex relationships between different parts of the system in a tractable way has become more challenging.
An intractable approach would be too expensive to apply, and hence not useful in evaluating and improving the system. Thus, the computational cost of solving for the quality metric of interest is prohibitively high in existing design-level approaches for larger systems, and this scalability problem is exacerbated in concurrent systems, in which the state space is much larger.

Moreover, none of the existing design-level approaches discusses how model parameters, which describe the software system's runtime behaviors, can be obtained. These approaches are not useful without accurate estimation of the model parameters. This is illustrated as a dotted box in Figure 1.1.

Figure 1.2 shows another dimension of the problem space. The y-axis represents software quality metrics, where each metric describes a different aspect of the system's quality. In this dissertation, we focus on analyzing system performance and reliability. Performance usually refers to the response time or throughput as seen by the users [67], and reliability can be informally defined as the probability that the system performs "correctly", as specified in its requirements specification. When the source code is available, program analysis-based techniques can be applied to evaluate the system's performance and reliability. For example, software profiling techniques (e.g., [27]) identify how much time is spent on different parts of the code, while code-level reliability analysis techniques (e.g., [20]) build a reliability model from the source code, and solve the model for a reliability estimate.

[Figure 1.2: The problem space in early software quality analysis — Quality metric vs. Information Availability]
When the source code is unavailable, although reverse engineering-based approaches (e.g., [40]) have been applied to performance estimation in the "Binaries Available" case, it is typical to rely on testing-based approaches to evaluate a system's performance and reliability. For example, we send a large number of requests and measure the average response time for performance analysis, and observe the number of errors the system returns for reliability analysis. As noted earlier, the cost of testing-based approaches is high, and we strive to reduce their cost in this dissertation.

In Chapter 3, we address the high cost of design-time reliability analysis. We choose to tackle this problem because (1) during system design, it is imperative to ensure the system is reliable, or the system would not be usable; and (2) existing design-level approaches are unable to scale to larger systems. We focus on reliability instead of performance here because reliability can be defined more broadly to include performance characteristics as well, by specifying performance requirements in the requirements specification. (Recall that a system is considered unreliable when it violates any requirement documented in its requirements specification.) For instance, a system designer may specify a requirement that the system should process a time-sensitive request within X seconds. If the system fails to process such a request within the specified time, it is considered unreliable. As part of our future work, we plan to integrate performance and reliability into one unified framework, in what is typically known as performability [50] (see Chapter 6.2.2).

In Chapter 4, we focus on reducing the cost of testing-based approaches in evaluating the performance of software in the "Binaries Accessible" category.
We choose to tackle this problem because (1) performance testing is expensive, especially at high workload; and (2) there is no alternative to testing-based approaches for evaluating software in this category. Reliability testing is also very expensive, as it involves sending a very large number of requests to observe an error. For example, a system's reliability is usually specified using the five 9's rule, i.e., its reliability is expected to be at least 99.999%. This implies that, on average, it requires sending 100,000 requests to this system before we observe an error. Reducing the cost of reliability testing remains a challenge; we consider our work in Chapter 4 on reducing the cost of performance testing as a first step in this direction, and consider reliability testing as part of our future work (see Chapter 6.2.3).

In Chapter 5, we address the problem of parameter estimation in design-level reliability analysis. This is an important problem because it is unreasonable to assume the availability of an "oracle" to provide model parameters, as they typically correspond to the system's runtime behavior. In addition, we will show, in Chapter 5.3, that even if such an oracle existed, the information may be inaccurate, which results in inaccurate reliability estimates. Parameter estimation in design-level performance estimation approaches is also an important topic. However, unlike reliability estimation, performance estimation requires information from the underlying platform, and integrating such information during system design is challenging. We address this problem as part of our future work (see Chapter 6.2.1). We note that system reliability is also affected by the reliability of the underlying platform. Yet, many reliability problems are rooted in design errors, and these problems manifest themselves regardless of which platform we deploy the software on. Integrating firmware properties into our reliability analysis is also part of our future work.
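The cost argument behind the five 9's example above follows from treating each request as an independent trial that fails with probability 1 - r, so the number of requests until the first observed failure is geometrically distributed. A minimal sketch (the function name and the second example value are ours, for illustration only):

```python
def expected_requests_to_first_failure(reliability: float) -> float:
    """Expected number of requests before observing the first failure,
    assuming each request independently fails with probability
    1 - reliability (i.e., a geometric distribution with mean 1/p)."""
    failure_prob = 1.0 - reliability
    if failure_prob <= 0.0:
        raise ValueError("a perfectly reliable system never fails")
    return 1.0 / failure_prob

# "Five 9's" reliability: roughly 100,000 requests per observed error
print(expected_requests_to_first_failure(0.99999))
# "Three 9's" reliability: roughly 1,000 requests per observed error
print(expected_requests_to_first_failure(0.999))
```

This makes concrete why reliability testing is so much more expensive than performance testing: the higher the required reliability, the more requests are needed, on average, just to witness a single error.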
We approach these problems as follows:

1. We address the high cost of design-level approaches in the "Design Models" category in Figure 1.1 by proposing SHARP (Chapter 3). SHARP is a design-level reliability prediction framework that has significantly lower computational cost than existing approaches. More specifically, SHARP analyzes a small part of the system at a time, according to the system use-case scenarios, which is a standard way software engineers divide a system into smaller pieces. The results are then combined to obtain a system reliability estimate using a hierarchical solution technique we describe in Chapter 3. The motivation behind SHARP is that solving many smaller models is significantly less expensive, in terms of computational cost, than solving one huge model.

2. We address the high cost of testing-based approaches by proposing a framework for estimating system performance at high workload intensity, using performance information collected at low workload intensity and applying regression-based analysis (Chapter 4). More specifically, estimating system performance at high workload intensity is very expensive, as it involves generating a large number of requests. This process may saturate the system under testing, rendering it unusable. We propose a framework that leverages regression analysis to predict system performance at high workload intensity, using performance information collected through testing at low workload intensity, which is less expensive to collect. We have applied this technique to predicting the performance of third-party Web services, which correspond to the "Binaries Accessible" category in Figure 1.1, because, as discussed earlier, there is no alternative to estimating the performance of software in this category other than through testing-based approaches.

3.
To remove the assumption in existing design-level approaches that model parameters are available, we explore the sources of information available during design, and study how such information can be used to estimate model parameters in the context of reliability estimation in Chapter 5. Specifically, important information which may be unavailable or uncertain during architectural design is a component's operational profile. An operational profile is unavailable because the component has not yet been implemented, hence it is not obvious how one can reliably predict its actual usage. We discuss how we estimate candidate operational profiles for reliability estimation by leveraging and combining information from different sources, and applying the hidden Markov model (HMM)-based approach in [19] to estimate operational profiles using such information.

The remainder of this chapter is organized as follows: we discuss SHARP in more detail in Chapter 1.1; performance estimation of third-party components in Chapter 1.2; and parameter estimation at the design level in Chapter 1.3. Chapter 1.4 presents the contributions of this dissertation. Finally, a roadmap of the remainder of this dissertation is given in Chapter 1.5.

1.1 SHARP: A Scalable, Hierarchical, Architecture-Level Reliability Prediction Framework

In a nutshell, existing design-level approaches generate a performance or a reliability model from the software system's architecture, which takes information about component interactions and the performance or reliability of individual components as parameters. The metric of interest is computed by solving the system-level model. Several survey papers have been published in this area: for example, [10, 12] survey early performance analysis, while [34, 38, 30] discuss early reliability prediction. As a concrete example, the performance modeling approach in [22] generates an execution graph to describe component interactions.
Then, it generates a queueing network from the execution graph and the system's deployment plan, which describes the host on which each component is deployed.

In combining component models to compute a system reliability estimate, existing approaches have not considered the scalability of the reliability model and its solution; i.e., they may result in intractable models for larger systems. This is especially the case in reliability prediction of concurrent systems, in which it is typical (e.g., as in [28, 59]) to keep track of the status of all components. The size of the model is O(M^C), where M is the number of states in a component, and C is the number of components. That is, the number of states grows exponentially with the number of components, causing the so-called state space explosion problem, which makes the model solution intractable.

To address the problem of scalable reliability prediction of concurrent systems, in Chapter 3 we propose SHARP, a hierarchical reliability prediction framework. In SHARP, rather than considering a concurrent system as having simultaneously running components, as in existing approaches (such as [59]), we view it as having different use-case scenarios that execute concurrently. For example, consider a sensor network application where a number of sensors take measurements and users can read the processed data at a GUI. We view it as having a sensor measurement scenario and a GUI display scenario running simultaneously. SHARP is also capable of handling complex scenarios, in which a scenario is composed of several lower-level scenarios. The lower-level scenarios may be complex scenarios themselves, and SHARP can handle complex scenarios with an arbitrary number of levels.
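The gap between a flat model and a hierarchical decomposition can be illustrated with a back-of-the-envelope state count. The sketch below is only illustrative: the numbers are hypothetical, and SHARP's actual savings depend on how scenario submodels are constructed and combined, not on this simple sum.

```python
def flat_state_space(states_per_component: int, num_components: int) -> int:
    """A 'flat' concurrent-system model tracks every component's state
    simultaneously, so its state space is M^C."""
    return states_per_component ** num_components

def hierarchical_work(submodel_sizes) -> int:
    """If each scenario submodel is solved separately and only the results
    are combined, the work grows with the sum of submodel sizes."""
    return sum(submodel_sizes)

# Hypothetical system: 8 components with 10 states each.
print(flat_state_space(10, 8))       # 100000000 states in the flat model
print(hierarchical_work([10] * 8))   # 80 states solved across submodels
```

Even at this modest scale, the flat model's 10^8 states are far beyond what a tractable solution can handle, while the per-submodel work stays linear in the number of scenarios.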
As inputs to our framework, we require the system use-case scenario models (e.g., UML sequence diagrams), a description of how scenarios interact (e.g., Scenario 2 starts after the execution of Scenario 1), and the system's operational profile (estimates of which we discuss in detail in Chapter 5). We also need to identify the defects that cause system failures; we leverage the approach in [63] to identify mismatches between architectural models of the system's components. Our framework produces system reliability estimates, as well as the reliability of each basic scenario, as output. To generate a system model, first we generate models of the basic scenarios by leveraging system use-case scenario models. Then, we combine the models of the basic scenarios to form a higher-level model, according to the relationships between the lower-level scenarios. Thus, system reliability is the reliability of the highest-level scenario. The motivation here is that a model of a scenario is expected to be relatively small, and that solving a number of smaller submodels (rather than one huge model) results in space and computational savings. We note that the use of scenario-based models is also explored in [35, 59, 78]. However, these works differ from SHARP in that [35, 78] assume a sequential system, while [59] considers a "flat model" and thus suffers from the very scalability problem we are striving to address. We are able to achieve better scalability without sacrificing the level of system detail we can model. More specifically, in modeling concurrent systems, some existing works (e.g., [28]) model a component as being either on or off. By doing so, while we know which component has failed, it is very difficult to tell what causes a component failure. In SHARP, by using a hierarchical approach, we are able to retain a greater level of detail about the system being modeled, while doing so in a scalable way.
The notion of system failure is different in different operational contexts and usages, and from different perspectives, even within a single system. Therefore, in order to provide architects with a comprehensive analysis approach, a reliability estimation framework should be able to capture different notions of system failure. Failure rules specify the conditions under which the system fails, and are more complex in concurrent systems. Existing approaches designed for sequential systems assume the system fails when the running component fails, and it is not obvious how to incorporate other failure rules into their approaches. Thus, we propose an approach which captures different notions of system failure. To this end, we allow designers to specify conditions under which the system is considered reliable, in terms of the number of failed instances of scenarios. In turn, failure rules determine how we combine the solutions of lower-level scenarios in order to compute the overall system reliability. Moreover, our approach to capturing failure rules can be applied to existing reliability prediction approaches for concurrent systems.

1.2 An Approach to Performance Estimation of Third-Party Web Services from a Client's Perspective

As discussed earlier, software designers may utilize third-party components to reduce development cost, such as reusing components from a previous project, or buying software from a third-party vendor. In the case where the component is deployed by a third party, one has to rely on testing-based approaches. However, testing the component at high workload may render it unusable during testing, as serving the testing traffic depletes its resources. We have chosen to study performance estimation in the Web service paradigm, which falls under the "Binaries Accessible" category in Figure 1.1, for the following reasons.
We argue that analyzing the performance and reliability of third-party Web services (WSs) is more challenging than in other categories (e.g., using open-source software and components that are bought from a third party) because (1) WSs are only required to publish their interfaces (via WSDL [8]); information about their internal structure (e.g., whether they are deployed on a single host or a cluster server) and the external resources (e.g., whether they use a remote database or another WS) they rely on is typically unavailable; and (2) testing-based approaches adversely affect the normal operation of a WS, which is already operational. Existing work on evaluating the quality of WSs has focused on evaluating WSs from a system administrator's or a designer's perspective. For example, [74] assumes the system's architecture is known and models a WS-based system using a multi-tiered architecture. Other works assume the system's architecture (e.g., how the third-party WSs are connected [75]), and/or the system's parameters (e.g., the amount of I/O time needed to complete a service [46]), are known. We argue that such an assumption is not reasonable in evaluating third-party WSs from a client's perspective: it is not clear how such information can be obtained by a client, and service providers may be reluctant to provide it. We focus on evaluating the performance of third-party WSs from a client's perspective, with an emphasis on average response time estimation. Our major challenge is the lack of information about the target WS. This includes (1) the structure of the WS, as discussed above, and (2) the parameters of each WS that provides service to complete a client's request. Our proposed approach makes use of data collected from performance testing [51], which involves sending requests and collecting performance data at low workloads, and applying regression analysis [26] to such data for response time prediction at high workloads.
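One concrete instantiation of this "measure at low load, predict at high load" idea can be sketched by fitting a simple M/M/1 mean-response-time curve, R(lambda) = 1/(mu - lambda), to low-workload measurements and extrapolating. The service rate and the (noise-free) measurements below are synthetic, and the actual framework in Chapter 4 is more elaborate:

```python
# Sketch: fit an M/M/1 mean-response-time curve R(lam) = 1/(mu - lam)
# to measurements taken at low arrival rates only, then use the fitted
# service rate mu to extrapolate to a high arrival rate. The true
# service rate and the noise-free "measurements" are synthetic.

def mm1_response(lam, mu):
    return 1.0 / (mu - lam)

true_mu = 10.0                      # hypothetical true service rate (req/s)
lams = [1.0, 2.0, 3.0, 4.0]         # low-workload test points
obs = [mm1_response(l, true_mu) for l in lams]

def fit_mu(lams, obs):
    """Least-squares fit of mu via a grid search (keeps the sketch stdlib-only)."""
    lo = max(lams)
    best_mu, best_err = None, float("inf")
    for i in range(1, 10000):
        mu = lo + 0.01 * i
        err = sum((mm1_response(l, mu) - r) ** 2 for l, r in zip(lams, obs))
        if err < best_err:
            best_mu, best_err = mu, err
    return best_mu

mu_hat = fit_mu(lams, obs)
# Extrapolate to lambda = 9, far outside the tested range:
print(round(mm1_response(9.0, mu_hat), 2))  # near 1/(10 - 9) = 1.0
```

Because the queueing model encodes the saturation behavior near mu, the fitted curve extrapolates sensibly, whereas a polynomial fitted to the same four points has no reason to blow up at the right arrival rate.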
Our experiments have shown that applying standard regression analysis gives poor extrapolation results, which, in this context, corresponds to predicting the response time outside the range of parameters used in performance testing. Therefore, we propose to fit the performance data to queueing models for WS response time prediction. Queueing models have been widely used in performance modeling of computer systems. Thus, we believe they are useful in modeling WS performance as well, and hypothesize that predicting performance using queueing models fitted to performance testing data is more accurate than using standard approaches from the regression literature. However, the interpolation results of using queueing models, which correspond to predicting response time within the range of parameters used in performance testing, are not as good as those of standard regression approaches. Hence, we derive a hybrid approach that combines the benefits of using queueing models and standard regression approaches.

1.3 Design-Time Operational Profile Estimation

One common theme across existing design-level approaches is that they assume run-time information about a system or a component is available, which is an unreasonable assumption because the system/component has not been implemented. For example, one important ingredient in performance and reliability modeling is the system's and its components' operational profiles, which describe how the system and its components are used [52]. It appears that existing approaches have assumed the availability of an "oracle" that can provide model parameters; yet, such an oracle typically does not exist. Even if an oracle is available, as we will show in Chapter 5, this information is subjective and may be inaccurate, due to the complexity of the system, or to unexpected interaction patterns between components.
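In the simplest setting, where component interactions are directly observable in execution logs (e.g., from a functionally similar system), an operational profile can be estimated by maximum-likelihood counting of observed transitions; the HMM-based machinery we build on in Chapter 5 generalizes this to the case where the relevant states are hidden. A counting sketch, with hypothetical traces:

```python
# Sketch: maximum-likelihood estimation of an operational profile
# (transition probabilities between components) from execution traces.
# The traces are hypothetical. When the underlying states are not
# directly observable, the Baum-Welch algorithm for HMMs plays the
# role of this counting step.
from collections import Counter, defaultdict

traces = [
    ["Sensor", "Gateway", "Hub", "Gateway", "Sensor"],
    ["GUI", "Hub", "AC"],
    ["Sensor", "Gateway", "Hub", "AC"],
]

counts = defaultdict(Counter)
for trace in traces:
    for src, dst in zip(trace, trace[1:]):
        counts[src][dst] += 1

profile = {
    src: {dst: n / sum(c.values()) for dst, n in c.items()}
    for src, c in counts.items()
}
print(profile["Hub"])  # Hub hands off to Gateway 1/3 and to AC 2/3 of the time
```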
The lack of operational profile information forces us to devise ways of deriving, combining, and applying other existing sources of information available during architectural design. For example, (1) system engineers' intuitions can be combined with (2) simulations of a component's behavior constructed from the architectural model and (3) execution logs of functionally similar systems/components (e.g., from a previous version of the system under construction). By leveraging these different information sources, we can produce candidate operational profiles for reliability prediction. Although the aforementioned uncertainties present significant challenges, the availability of formal software architecture models presents an opportunity which we leverage in this work. Specifically, we leverage a component's state-based models to generate corresponding stochastic models which, in turn, can be used to predict reliability. In thus utilizing architectural models we observe that another important ingredient in reliability prediction is information about a system's or component's potential failure modes. However, since software engineers most often design their systems for correct behavior, failure modes are not typically part of an architectural specification. Thus, to handle uncertainties associated with the lack of failure information, we leverage architectural defect classification and analysis techniques [63, 61] to identify inconsistencies within a component's as well as between components' architectural models. We demonstrate a way to study the effects of different failure modes by exploring the design space, i.e., varying the failure-related parameters over a range of possible values and observing the resulting effects on the component's reliability prediction.

1.4 Contributions and Validation

To summarize, we present the contributions of this dissertation, as well as an overview of our validation process.
Our first contribution is the SHARP framework, which can accurately predict concurrent system reliability while significantly reducing the computational cost needed to solve for system reliability, as compared to existing design-level, brute-force type approaches. SHARP achieves scalability through the use of a hierarchical approach, and our solution technique allows us to solve for the reliability of one part of a system at a time, and to combine the results of lower-level scenarios appropriately. The motivation is that solving many smaller, scenario-based models is more efficient than solving a model of the entire system. Through extensive experimentation we validate the complexity and accuracy of this approach. Lastly, we note that SHARP is an approximation of the "flat model" (i.e., one that keeps track of the status of all components) used in other techniques. However, we argue that its potential scalability benefits are achieved at the cost of fairly small losses in accuracy. To address the high cost of testing-based approaches in analyzing the performance of third-party components, specifically WSs, we propose a framework for estimating performance at high workload intensity, using information collected at low workload intensity and applying regression analysis; this is another contribution of this dissertation. Such an approach allows system designers to evaluate the target WS, for example, for its response time and stability conditions, and can be used in determining how the target WS should be used in designing a new WS. We evaluate the accuracy of our approach by studying its interpolation and extrapolation errors. Our results indicate that our approach is able to overcome the poor extrapolation results while maintaining the accuracy in interpolation, as compared to standard regression-based techniques. Finally, we investigate estimating a component's operational profile during design by utilizing a variety of available information sources.
For instance, we utilize information from domain experts, requirements documents, simulation, and functionally similar components, and apply an HMM-based approach to estimating the operational profiles of a component. We evaluate the effectiveness of the reliability prediction process as a function of different information sources. For instance, our results indicate that expert knowledge alone, on which existing approaches often appear to rely, may lead to inaccurate predictions. A rigorous evaluation process on a large number of software components shows that our framework has a high degree of predictive power and resiliency to changes in the identified parameters. The framework is validated by comparisons to an implementation-level technique, which is used as the "ground truth". Our results indicate that the framework can meaningfully assess the reliability of a component even when the information is distributed, sparse, and itself not entirely reliable. For instance, our initial hypothesis — that more information about a component (e.g., actual operational profile and failure behavior, and a faithful detailed design model or implementation) will result in more precise reliability predictions — has in fact been borne out in our evaluation. Additionally, our results indicate that less information consistently yields more pessimistic predictions, which we consider to be a desirable trait of the framework.

1.5 Roadmap

We discuss related work in Chapter 2, and SHARP in Chapter 3. In Chapter 4, we describe our approach to performance analysis of third-party WSs. This is followed by our approach to operational profile estimation, and its application to component reliability modeling, in Chapter 5. Finally, we discuss future research directions, and conclude in Chapter 6.
Chapter 2 Related Work

This chapter describes existing works in more detail, especially design-level approaches and testing-based techniques, because of their relevance to the work proposed in this dissertation. In Chapter 2.1, we describe existing design-level software reliability analysis techniques, and highlight their solution cost and their assumptions about the availability of model parameters. Then, we focus on testing-based approaches and their applications to performance estimation of software in the "Binaries Accessible" category in Chapter 2.2.

2.1 Design-Level Software Reliability Analysis

As discussed in Chapter 1, it is important to start analyzing system quality early to save development cost. To this end, many design-level software reliability analysis techniques have been proposed. These approaches include [20, 33, 32, 29, 35, 36, 39, 41, 42, 28, 58, 59, 62, 66, 65, 76, 78]. A comprehensive description of these can be found in existing surveys on the topic [30, 37, 38] and the references therein.

Figure 2.1: An example of Cheung's model [20]

At a high level, they make use of the system's structure in predicting system reliability. Many of them are influenced by [20], which is one of the earliest works on design-level reliability prediction that considers a system's internal structure using discrete-time Markov chains (DTMCs). An example of a model built using [20] is depicted in Figure 2.1. The states in the reliability model represent components, while the transitions represent transfer of control between components. These transitions are assumed to follow the Markov property (i.e., a transition to the next state is determined only by the current state).[1] Each component may fail with a failure probability f_i, and a transition from State i to a failure state F represents a system failure. Since failures are assumed to be irrecoverable in [20], the failure state is an absorbing state that has no outgoing transition.
When the system has finished its execution, it transitions to a correct state C, which is an absorbing state representing that the system has completed without error. Therefore, system reliability is defined as the probability of eventually reaching C, which can be computed using standard techniques [69]. A number of approaches have built upon [20]. For example, in [58], instead of assuming the reliabilities of components are available, it considers each component to be providing a number of services, and computes a component's reliability by combining the reliabilities of its services. Another example is [21], which has considered error propagation in computing system reliability. They argued that an error that is caused by a component may not cause that component (and hence the system) to fail immediately. Rather, an error may propagate to other components, which then causes system failure. [32] proposed another approach using Markov chains. Instead of computing system reliability directly from a Markov chain as in [20], they compute the number of visits to each component before the system has terminated. System reliability can then be computed by combining the components' reliabilities, according to the number of visits to each component. An advantage of this approach is that when a component's reliability changes, we only need to multiply the component reliabilities, and do not need to solve the entire Markov chain again. The approaches we have discussed thus far are classified as state-based approaches in [37], which classifies other approaches as path-based approaches.

[1] [37, 30] have suggested that this assumption does not hold in many software systems. Higher-order Markov chains can be used to alleviate this problem, but the size of the model grows much faster, and the resulting model may be intractable.
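The state-based computation in [20] can be made concrete on a toy three-component system: with absorbing states C (correct completion) and F (failure), reliability is the probability of absorption in C, which satisfies r_i = sum_j P(i, j) r_j with r_C = 1 and r_F = 0. A sketch with hypothetical transition and failure probabilities:

```python
# Sketch of a Cheung-style DTMC reliability model on a toy system:
# states 0, 1, 2 are components; "C" and "F" are absorbing states.
# Transition/failure probabilities are hypothetical. Reliability is
# computed by fixed-point iteration (no linear-algebra library needed).

P = {
    0: [(1, 0.95), ("F", 0.05)],             # comp 0 hands off to comp 1
    1: [(2, 0.90), (0, 0.05), ("F", 0.05)],  # comp 1 may loop back to comp 0
    2: [("C", 0.97), ("F", 0.03)],           # comp 2 finishes the execution
}

def reliability(P, start=0, iters=10000):
    """Probability of eventual absorption in C, starting from `start`."""
    r = {s: 0.0 for s in P}
    r["C"], r["F"] = 1.0, 0.0
    for _ in range(iters):
        for s in P:
            r[s] = sum(p * r[t] for t, p in P[s])
    return r[start]

print(round(reliability(P), 4))  # 0.8707
```

Solving the linear system directly (e.g., by Gaussian elimination) gives the same answer; iteration just keeps the sketch short.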
In path-based models, such as [39], system reliability is defined as the weighted sum of the reliability of each execution path, where the weights represent the probabilities that each path is executed. Some path-based approaches are tied to the system's use-case scenarios. A use-case scenario describes the interactions between components to achieve a subset of the system's functionalities, and a standard way to specify scenarios is to use sequence diagrams. For example, [35] converts a sequence diagram into a DTMC, and defines system reliability as a weighted sum of scenario reliabilities. Using path-based or scenario-based approaches allows system designers to analyze the reliability of a small part of the system at a time, and may reduce the complexity of solving the model.

2.1.1 Applicability to Concurrent Systems

All approaches we have discussed so far assume a sequential system. Typically, in such approaches a reliability model keeps track of which component is running. For example, the state-based approaches, such as [20], model transfer of control between components, and assume only one component is executing at a time: once a component has transferred control to another component, it remains idle until control is transferred back to it again. The path- or scenario-based approaches, such as [35, 39], assume only one path/scenario is being executed at a time, as they assume the probabilities that a path/scenario executes sum up to 1. This assumption is problematic in modeling concurrent systems, in which many components may be running simultaneously. Thus, in analyzing systems with simultaneously running components, one typically needs to keep track of the status of all components. Such an approach is taken, for instance, in [28, 76], and we refer to it as a "flat model". In [28, 59, 76], a state S in the model is described using C variables, where C is the number of components in the system, i.e., S = (S_1, S_2, ..., S_C).
In [28, 76], components are modeled as black boxes, which are either active or idle, i.e., S_i = 0 when Component i is idle and S_i = 1 when Component i is active. In addition to scalability problems, this is also a shortcoming since representing the internal structure of components facilitates more accurate models. For example, some defects may only be triggered when the component performs certain functions, and thus not having a sufficient level of granularity in the reliability model could lead to poorer reliability estimation. To address this, instead of modeling the status of a component as either active or idle, one can use a finer-granularity component model; this would result in the type of model used in [59], where S_i represents the state of Component i. Specifically, [59] generates component models from scenario sequence diagrams and then generates a system model by combining the component models using parallel composition. Since such approaches essentially generate reliability models in a brute-force manner, they suffer from scalability (i.e., "state explosion") problems. As a result, such models are often prohibitively costly to generate and solve, even for systems with a modest number of components. We note that, as in [59], SHARP (Chapter 3) models systems at a finer granularity through the use of scenarios. However, unlike [59], we employ a hierarchical approach, which is intended to yield better scalability. Other state-based approaches, such as those based on stochastic Petri nets (SPNs), suffer from the same state explosion problem. Existing SPN-based approaches focus on performance analysis based on UML models (see [10] for a survey); such models can be used in reliability analysis as well. However, solving the SPN requires generating the SPN's reachability graph, which has the same state explosion problem described above.
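The product construction underlying these flat models can be sketched with two toy component automata (hypothetical; pure interleaving semantics, ignoring synchronization on shared events for brevity):

```python
# Sketch of the product state space behind "flat model" approaches:
# a global state is the tuple of all component states, so the state
# space is (up to) the product of the component state spaces. The two
# tiny component automata below are hypothetical.
from itertools import product

# Each component: {state: {event: next_state}}
sensor = {0: {"measure": 1}, 1: {"send": 0}}
gateway = {0: {"recv": 1}, 1: {"forward": 0}}

def parallel_states(*components):
    """All global states of the composition (interleaving, no pruning)."""
    return list(product(*(c.keys() for c in components)))

states = parallel_states(sensor, gateway)
print(len(states))  # 2 * 2 = 4 global states; grows as M^C in general
```

Adding a third two-state component doubles the count again, which is exactly the exponential growth the hierarchical approach avoids.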
While non-state-based approaches, such as [39, 62, 77], may not have the state explosion problem and (implicitly) consider concurrency, they are not as descriptive as state-based approaches, and hence may not give accurate estimates. For instance, the work in [39] computes system reliability as a weighted average of the reliabilities of all execution paths, and the reliability of each path is the product of component reliabilities. In addition, [62] explored the use of Bayesian Networks (BNs) to model the reliability of concurrent systems. States in the component state diagrams are interpreted as nodes in a BN, and transitions are interpreted as dependencies between nodes. The BN can then be solved for reliability given these dependencies and component reliabilities. However, the notion of concurrency in these approaches is limited, as they do not describe flow of control. For example, the two approaches above are not able to model the time spent in each component, so that a lightly-used component has the same effect on reliability as a heavily-used component. At the same time, the notion of system failure is more complex in a concurrent system. In existing approaches that assume sequential systems, failures are represented by transitions to a failure state in the (Markov-chain-based) reliability model. In these models, which assume a single-threaded system, being in state S_i indicates that Component i is active, while all other components are idle. A transition from a state S_i to a failure state indicates that Component i has failed. This means that if any active component has failed, the entire system is considered to have failed. On the other hand, in a concurrent system, since more than one component may be running, assuming the system has failed if any active component has failed may be inflexible. However, existing approaches that are applicable to concurrent systems are not able to handle complex failure conditions.
For example, in [59], the system transitions to a failure state when any active component fails. The work in [28, 32] does not include failure states explicitly; rather, essentially a reward is assigned to each state (with the value of the reward representing the probability of the system failing in that state), and the system's reliability is computed as a Markov reward function [69]. This system failure description is also limited, as it assumes that the system fails when any (active) component fails. [76] provides a somewhat richer description of system failures, where a reliability model includes backup components that can provide services when the primary component fails; the system fails when the primary component and all backup components fail. Unfortunately, this approach is not capable (without significant changes) of describing other notions of system failure, e.g., an OR-type relationship (the system fails when Component A or Component B fails).

2.1.2 Parameter Estimation

Another common theme across these design-level reliability approaches is that it is not clear how the model parameters are estimated. This is a major challenge in design-level software quality analysis: since the implementation is not available, it is hard to gather information for analysis. The parameters that are needed for software reliability analysis are (1) the system's operational profile, which corresponds to, for example, transition probabilities in [20], as well as the probability that a scenario executes in scenario-based approaches, such as [78]; and (2) component reliabilities, which correspond to transition probabilities to a failure state in [20]. Existing design-level reliability estimation approaches (sometimes implicitly) assume that the system's or its components' operational profiles are known. A system's operational profile is typically estimated after the system has been implemented [52].
Typically, this involves analyzing the system's traces to determine, for example, the transition probabilities between components. Estimating an operational profile of a system becomes non-trivial at the design level, because the implementation of the system is unavailable. To determine a component's reliability, we can rely on component reliability prediction approaches, such as [19] (Chapter 5). However, as at the system level, it is challenging to estimate the component's operational profile and failure information. Some existing works acknowledge this fact, and study the effect of uncertainties about a system's operational profile on the resulting reliability estimates. For example, [36] provides an analytical evaluation of the effect of uncertainties in model parameters on the resulting system reliability. Others assume a fixed operational profile and varying component reliability, and apply traditional Markov-based sensitivity analysis [20, 66]. [60] proposes an approach to using hidden Markov models (HMMs) [55] in estimating a component's operational profile. However, it is not clear how "training data", an input to the Baum-Welch algorithm (a parameter estimation algorithm for HMMs [55]), can be obtained. Part of our contribution in this context (see Chapter 5) is to identify how training data can be obtained at the design level.

2.2 Testing-Based Software Performance Estimation Techniques

There is a vast literature on software performance evaluation, going back to [67], which proposed the software performance engineering process that has been in wide use; it examines issues in software performance evaluation, e.g., information gathering, model construction, and performance measurements. More recently, research has focused on performance evaluation using software architectural models, e.g., [10] provides a representative survey on the topic.
These works leverage software architectural models of their choice to generate performance models, and focus on performance evaluation from a system designer's perspective; this allows early performance evaluation, which aids in avoiding costly design problems. Given the scope of our work in Chapter 4, here we discuss works that have focused on performance evaluation of third-party WSs. Although there has been significant interest in this topic, the main shortcomings of existing techniques (as detailed below) include (a) the high cost of measurements at high workloads (needed by those techniques to estimate system response time) and (b) assumptions made by those techniques about the availability of information about third-party WSs. Several black-box approaches consider predicting the performance of third-party components, where the performance model is built from the component's documentation [54], or by examining the component's binary code (e.g., Java bytecodes) [40]. However, these approaches assume the availability of design models, documentation, or binaries of a third-party component, which are typically unavailable in the case of a third-party WS. Thus, they are not readily applicable. In [17] an approach to WS performance evaluation is proposed; however, it requires testing WSs at high workloads, which is expensive. In [68], a simulation-based approach to estimating WS response time is proposed, in which results from performance testing, collected while the WS being tested is lightly loaded, are used to obtain simulation parameters, and response time at heavier loads is predicted using simulation. However, a shortcoming of this work is the assumption (when generating the simulation model) of knowledge of the architecture of the WS being tested. Moreover, simulations could take a fairly long time to converge, and thus at design time, analytic techniques may be more desirable.
In [75] a queueing network-based model of a composite WS is generated, in which each WS is modeled as a server in the queueing network. However, if the WS being tested is a third-party WS, it is not clear how information about the structure of a composite WS can be gathered (e.g., to what other WSs the WS under testing makes requests). Another approach is to include performance information in a WS's service description, so that clients can use such information for performance evaluation. For example, [24] proposes that P-WSDL include service performance characteristics of the system (e.g., utilization and/or throughput), network information (e.g., network bandwidth), and workload characteristics (e.g., request arrival rate). We argue that service providers may be reluctant to provide such information, and it is not clear how this information can describe a composite WS, in which the service performance depends on other WSs. Lastly, [46] proposes to include demands on server resources for each interface (e.g., a service requires X units of CPU time and Y units of I/O). Unfortunately, it is not clear how the service demand can be obtained, as it is difficult to map a high-level service to low-level hardware demands. Existing work on performance evaluation specific to Web applications focuses on evaluation from a service provider's perspective. For example, [74] models a Web application as a multi-tier system. Each tier represents a different type of server, and the routing between the servers is assumed to be known. In such efforts, the goal of the analysis is to help service providers make service-level agreements (SLAs) with their clients. Our approach is orthogonal in that we evaluate a WS from the client's perspective. The work in [56] monitors the quality of a third-party WS, including performance, to detect violations of an SLA. This work is similar to ours in its collection of statistics to evaluate a third-party WS.
On the other hand, they assume the system is already operational, and are interested in the distribution of performance measures (e.g., the response time distribution) to look for SLA violations, while we use the information to predict the performance of a WS to aid service selection.

Chapter 3 SHARP: A Scalable Framework for Reliability Prediction of Concurrent Systems

We present SHARP, a scalable, hierarchical, architecture-level reliability prediction framework for concurrent systems. We first present background information on a running example we use throughout this chapter, the MIDAS system [45], along with information on software design models and architecture modeling in Chapter 3.1. This is followed by an overview of the SHARP framework in Chapter 3.2, and the details are described in Chapter 3.3. Finally, we evaluate SHARP in Chapter 3.4.

3.1 Background

For the ease of exposition of our framework and for its subsequent evaluation, we use a sensor network application, built using the MIDAS framework [45] and depicted in Figure 3.1, as our running example. This system monitors room temperature and turns the air conditioner (AC) on or off in order to satisfy user-specified temperature levels.

Figure 3.1: An Overview of the MIDAS system

Figure 3.2: Components' state diagrams. Parameter values: q(E1) = 0.2, q(E2) = 1, q(E3) = 0.1, q(E4) = 0.2, q(E5) = 0.5, q(E6) = 0.005, q(E7) = 1, q(E8) = 2, q(F1) = 0.03, q(F2) = 0.01, q(R1) = 0.8, q(R2) = 0.4. Events: E1: GetSensorData; E2: SensorMeasurement; E3: GWMeasurement; E4: HubAckGW; E5: GWAckSensor; E6: GUIRequest; E7: GUIAck; E8: ChangeACTemp.

We refer to this example system as the MIDAS system.
The MIDAS system consists of five different types of components: a Sensor measures temperature and sends the measured data to a Gateway. The Gateway aggregates and translates the data and sends it to a Hub, which determines whether it should turn the AC on or off. Users can view the current temperature and change the thresholds using a GUI component, which then sends an update to the Hub.

The state diagrams capturing the behavior of the MIDAS components are given in Figure 3.2. In a component state diagram, an event E is either a sending event or a receiving event. In this paper, we use the notation introduced in [79], in which sending and receiving events are represented by "-" and "+", respectively. In SHARP, an event needs to come with a specification of its arrival rate in states in which that event is enabled. These rates should be available from different information sources that are available during a system's design; further assessment and discussion of these information sources is detailed in Chapter 5. Some of the state machines in Figure 3.2 include failure states (labeled by a negative number) that represent erroneous behavior triggered by a failure event F. In the following section, we discuss how we derive the failure states.

[Figure 3.3: Sequence diagrams for the basic scenarios: (a) SensorGW (events E1, E2 between Sensor and Gateway), (b) GWHub (E3, E4 between Gateway and Hub), (c) GWACK (E5 from Gateway to Sensor), (d) GUIRequest (E6, E7 between GUI and Hub), and (e) ChangeACTemp (E8 from Hub to AC).]
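These event rates parameterize every model SHARP builds from this point on. As a minimal illustration (the dictionary encoding below is ours, not part of SHARP), the rate table of Figure 3.2 can be kept as a lookup, with a state's total outflow rate being the sum of the rates of the events enabled in that state:

```python
# Event arrival rates q(E) from the operational profile (the parameter table
# accompanying Figure 3.2).
RATES = {"E1": 0.2, "E2": 1.0, "E3": 0.1, "E4": 0.2, "E5": 0.5,
         "E6": 0.005, "E7": 1.0, "E8": 2.0,
         "F1": 0.03, "F2": 0.01, "R1": 0.8, "R2": 0.4}

def total_outflow(enabled_events):
    """Total rate of leaving a state in which the given events are enabled."""
    return sum(RATES[e] for e in enabled_events)
```

For example, a GUIRequest state in which both the acknowledgment E7 and the failure F2 are enabled has total outflow q(E7) + q(F2) = 1.01.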
The system-level behavior of MIDAS is captured using five basic scenarios: the SensorGW scenario, which covers the processing of measurements from a Sensor component by a Gateway (Figure 3.3(a)); the GWHub scenario, which covers the processing of aggregated measurements sent from a Gateway to the Hub (Figure 3.3(b)); the GWACK scenario, which covers acknowledging the Sensor's measurement (Figure 3.3(c)); the GUIRequest scenario, which covers updating the temperature readings and changing temperature thresholds (Figure 3.3(d)); and the ChangeACTemp scenario, which covers turning the AC on or off according to the temperature readings (Figure 3.3(e)).

The five basic scenarios are in turn combined to form more complex system behaviors, as shown in Figure 3.4. The complex system behavior consists of relations between basic and complex scenarios that altogether form a scenario hierarchy.¹ The different scenarios can run concurrently (PAR relationship) or sequentially one after the other (SEQ relationship). In MIDAS, the complex scenario Sensors_PAR represents the parallel execution of multiple Sensors running the SensorGW scenario (Figure 3.4 describes a system variant with four sensors). Sensors_PAR is considered complete once all the concurrently running scenario instances are complete. Furthermore, the complex scenario SensorMeasurement specifies a longer sequence that summarizes how a sensor measurement is propagated from Sensors to Gateway to Hub and back.

Hierarchical scenario descriptions similar to the one presented for MIDAS are commonly created during system design. In SHARP, an engineer also needs to annotate the scenario hierarchy with the following quantitative information: (1) branching probabilities when one scenario can be sequentially followed by multiple other scenarios, and (2) the number of scenario instances that run in parallel.
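An annotated scenario hierarchy of this kind fits in a small tree data structure. The sketch below mirrors the SensorMeasurement branch of Figure 3.4 (the node and field names are our illustrative assumptions, not SHARP's notation); the `count` field carries annotation (2), the number of parallel instances, and `prob` would carry annotation (1), a branching probability such as p_ControlAC:

```python
# Scenario kinds, following the SEQ/PAR/basic distinction in the text.
PAR, SEQ, BASIC = "PAR", "SEQ", "BASIC"

def node(name, kind, children=(), prob=None, count=None):
    # prob:  branching probability annotation (for sequential branching)
    # count: number of instances running in parallel (for PAR scenarios)
    return {"name": name, "kind": kind, "children": list(children),
            "prob": prob, "count": count}

# The SensorMeasurement subtree of Figure 3.4: a SEQ scenario over
# Sensors_PAR (two parallel SensorGW instances, per the figure), GWHub,
# and GWACK_PAR (two parallel GWACK instances).
sensors_par = node("Sensors_PAR", PAR, [node("SensorGW", BASIC)], count=2)
gwack_par = node("GWACK_PAR", PAR, [node("GWACK", BASIC)], count=2)
sensor_measurement = node("SensorMeasurement", SEQ,
                          [sensors_par, node("GWHub", BASIC), gwack_par])
```

SHARP's bottom-up analysis (Chapter 3.3) then amounts to a post-order traversal of such a tree, solving child nodes before their parents.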
While other approaches require and utilize the branching probabilities (e.g., [59]), SHARP is unique in its ability to effectively handle scenario multiplicities. The information about how many scenarios of a certain type will be running in parallel should be derivable either from the system requirements or from the architectural configuration.

¹ We assume that the system designers have already handled implied scenarios [73] and updated the specification to be free of them.

[Figure 3.4: MIDAS scenarios organized in a hierarchy. The System scenario (level 0) comprises SensorMeasurement and ControlAC. SensorMeasurement is a SEQ scenario over Sensors_PAR, GWHub, and GWACK_PAR, where Sensors_PAR and GWACK_PAR each run two parallel instances of SensorGW and GWACK, respectively. ControlAC is a SEQ scenario in which GUIRequest_LOOP (repeating GUIRequest while the user's input is invalid, with p_invalid = 0.02) is followed by ChangeACTemp with probability p_ControlAC = 0.3. Starred scenarios are specified by the system designer.]

3.1.1 Architectural-Level Defect Analysis

In this paper, we focus on analyzing the reliability effect of architectural defects [63], such as operation signature mismatches and mismatches between components' interaction protocols. We define system reliability as the probability that a user does not experience a failure caused by architectural defects. When a defect is triggered, a failure may occur, after which a component behaves erroneously with respect to the system's requirements. SHARP accounts for the fact that a recovery from failure is possible.

In system reliability analysis, a system's failure is typically defined in terms of the failures of its components. We analyze an individual system component's architectural model by applying a defect classification technique [63] to determine the failure states. We then add a failure transition from all the states in which a defect may be triggered.
For example, using the defect classification technique [63] on MIDAS, we determine that a Sensor is unable to notify the Gateway when it is running out of battery. This defect was discovered as a mismatch between the two components' interaction protocols. Failures caused by this defect are represented as the failure state −1 in Figure 3.2(a). Furthermore, the Sensor returns to State 2 upon recovery. In general, a component can return to any state designated by the system designer during defect analysis. We extend the event send/receive nomenclature to failure and recovery events by viewing a component as sending failure (recovery) events when it fails (recovers). We assume that the failures are recoverable, but we can model irrecoverable failures without significantly changing SHARP; only a few equations in Chapter 3.2 would need to be updated, where transient [72] (rather than steady state) analysis would be used. For brevity, in this paper we omit details of irrecoverable failure analysis.

3.2 An Overview of the SHARP framework

The SHARP framework estimates a system's reliability based on (a) a behavioral specification provided primarily as a scenario hierarchy, (b) an operational profile, and (c) a definition of failure states. At a high level, SHARP works by partitioning the system behavior into smaller analyzable parts according to the scenario specification, with the premise that analyzing multiple smaller models is more efficient than analyzing one very large model. This is in stark contrast to a state-of-the-art scenario-based technique [59] that generates a full-blown reliability model from a complex scenario specification. Conversely, the idea of state space partitioning using scenarios has been explored in the literature [23, 78], but with notable shortcomings. Specifically, the existing research does not resolve two crucial obstacles to reliability estimation of complex, concurrent systems: 1.
How can one efficiently solve a reliability model that captures complex system behaviors consisting of elaborate multi-scenario sequences, when solving the whole model at once does not scale?

2. How can one efficiently estimate system reliability when a system consists of tens or even hundreds of concurrently running components and scenarios with possibly similar behavior?

The first obstacle corresponds to the ability to deal with sequential scenario combinations without having to solve the corresponding "flat" model used by other approaches [59] and without making simplifying assumptions about scenario independence like some existing techniques [23]. The second obstacle relates to the need to handle situations in which multiple scenario instances are running concurrently (PAR relationship). For example, we want to be able to efficiently solve the Sensors_PAR scenario from Figure 3.4 even in situations when we have thousands of concurrently running Sensors. SHARP resolves both of these obstacles by first generating and solving the reliability models of smaller scenarios and then incorporating the results into reliability models of the complex scenarios; this is done in a bottom-up way throughout the specified scenario hierarchy.

To solve for the reliability of a complex scenario with sequential dependencies (e.g., the SensorMeasurement scenario), in which there may be a large number of scenarios running one after another, we propose a technique based on stochastic complementation [49]. Stochastic complementation is a standard technique for solving large Markov chains that relies on partitioning a large model into smaller analyzable parts that have a low number of incoming and/or outgoing transitions in the original model. To be able to do this, we utilize the partitioning that is intrinsically present in a SEQ scenario, where each sub-scenario has only one entry point.
For example, when analyzing the MIDAS scenario hierarchy (Figure 3.4), SHARP utilizes the SEQ relations in the SensorMeasurement scenario to solve Sensors_PAR, GWHub, and GWACK_PAR first, and then incorporates the obtained results into a small, high-level SensorMeasurement model with only three states. The outlined method for estimating reliability of complex SEQ scenarios demonstrates the synergy between structured software specifications and stochastic methods that is present in our framework.

To estimate the reliability of a PAR scenario (e.g., Sensors_PAR in Figure 3.4), instead of generating parallel scenario models by simply composing all of the instances together [43] or keeping track of the internal states of all components, SHARP works by keeping track of the number of concurrently running scenario instances. To be able to abstract a scenario's execution state to either running or completed, SHARP calculates completion rates of the different scenarios using queueing network (QN) models [69]. For example, we aggregate the overall behavior of Sensors_PAR from Figure 3.4 with a model that tracks whether there are zero, one, or two concurrently running instances of SensorGW. The transition rates between these different Sensors_PAR states amount to the previously calculated SensorGW completion rate.

In certain cases (e.g., very large scale systems), even the described symbolic model can become intractably large. SHARP applies model truncation [69] on such models. Model truncation removes the rarely visited states (i.e., rare scenario combinations), thus achieving notable scalability gains with a minimal loss in accuracy.

Moreover, previous techniques fall short when analyzing concurrent systems because they do not consider that components in such systems often compete for the same set of resources. To address this, we take resource contention into account.
Specifically, SHARP includes information about the possible concurrently running scenarios when constructing the basic scenario QNs. For MIDAS, we identify that the Gateway is a possible contention point, as multiple Sensors may be using it concurrently; SHARP includes this information when building the SensorGW QN and reliability model.

[Figure 3.5: An illustration of SHARP applied on the complex Sensor measurement scenario.]

3.3 Reliability Computation

In this section, we present the technical details of SHARP. SHARP has three distinct activities that target (1) basic scenarios, (2) SEQ complex scenarios, and (3) PAR complex scenarios. We describe each activity in Chapters 3.3.1, 3.3.2, and 3.3.3, respectively. Each of these activities consists of steps for solving the corresponding models for reliability and completion time; the completion times are used when solving the PAR scenarios, as elaborated below.

As an example, Figure 3.5 illustrates the steps that SHARP performs to analyze the reliability of the SensorMeasurement scenario from Figure 3.4. The different SHARP activities are used to analyze the different parts of the scenario hierarchy. The process for obtaining the reliability information for GWHub and GWACK_PAR, for brevity not shown in Figure 3.5, is identical to the process for SensorGW and Sensors_PAR. Intuitively, SHARP first analyzes the low-level, basic scenarios and incrementally incorporates the lower-level analysis results in the higher-level SEQ and PAR scenario models. Note that each activity comprises efficient steps for analyzing the scenario reliabilities and completion times, while the basic scenario activity also contains a contention modeling step. In the following sections, we describe the SHARP activities with their corresponding steps. The order of applying the different activities ultimately depends on the structure of the scenario hierarchy.
3.3.1 Basic Scenarios

As discussed earlier, the first step in solving for system reliability in SHARP is to solve for the reliability and completion time of the basic scenarios. We generate the scenario-based reliability models (SBMs for short) in a similar manner to existing research [23, 59, 78] (described as Step 1.1 in the following section). The unique aspect of our approach is that we generate the failure states for a basic scenario based on the correspondence between the component reliability models (e.g., Figure 3.2) and the generated SBM. By doing so, we manage to reuse component-level information about architectural defects, thus making the reliability analysis more meaningful, as opposed to having an engineer "guess" the failure states.

Next, we augment the generated SBM to model resource contention with special "queueing" states (Step 1.2). Intuitively, such states simulate a situation when an event cannot be processed immediately. For example, while the Gateway is processing data from one Sensor, it may receive data from another; consequently, we augment the SBM by adding a "queueing" state to represent queueing of the Sensor's request. Contention-related parameters are computed using queueing networks (QNs) [69]. For example, to compute the average waiting time of a Sensor request, we build the QN model depicted in Figure 3.8. Parameters needed to build this model, e.g., the frequency and the processing time of requests, are derivable from the operational profile. This contention-related behavior is included with only a minor increase in the reliability model's size, as the QNs are solved separately and only their results are "plugged" back into the reliability model.

[Figure 3.6: Component submodels of SensorGW: (a) the Sensor submodel, with failure state −1 (failure event F1, recovery event R1), and (b) the Gateway submodel; both cover events E1 and E2.]

Once the SBM reliability model is constructed and the QN is solved for contention, we solve the SBM for scenario reliability (Step 1.3) and completion time (Step 1.4) using standard methods.
Step 1.1: Generating SBM

To generate the SBM for a scenario, we first generate a component submodel for each component in each scenario, and then apply parallel composition, as in [59]. A component submodel of component Comp_c in scenario Scen_i, Comp_c_Scen_i, is a state machine model describing the behavior of Comp_c in Scen_i, in which a state transition represents the occurrence of an event in the corresponding sequence diagram. In our MIDAS example, the component submodels for the SensorGW scenario (recall Figure 3.3) are depicted in Figure 3.6.

The next step is to add failure states to each component submodel. We identify the possible points of failure in a component submodel by leveraging the component model using our work [19]. As we discussed in Chapter 3.1, we assume we have identified the architectural defects in the components, and represent them as failure states in a component model. For example, to model the defect in the Sensor, represented by the failure state −1 in Figure 3.2(a) (recall Chapter 3.1), we add a failure state, State −1, in Figure 3.6, a failure transition from State 2 to State −1, and a recovery transition from State −1 to State 2.

[Figure 3.7: SBMs of the basic scenarios: (a) SensorGW, with rates q(E1), q(E2), queueing-state rate q(Ready), failure rate q(F1), and recovery rate q(R1); (b) GWHub, with rates q(E3) and q(E4); (c) GWACK, with rate q(E5); (d) GUIRequest, with rates q(E6), q(E7), failure rate q(F2), and recovery rate q(R2); and (e) ChangeACTemp, with rate q(E8).]

We then generate an SBM for each Scen_i by applying parallel composition [43] to the component submodels Comp_c_Scen_i for all Comp_c. The SBMs of the five scenarios specified by the system designer are depicted in Figure 3.7. In our example, applying parallel composition to the component submodels in Figure 3.6 results in the SBM for the SensorGW scenario depicted in Figure 3.7(a). Note that the states corresponding to normal behavior are marked in white, while the failure states are marked in grey.
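Parallel composition itself can be sketched as a synchronous product: a send event "-E" in one submodel fires jointly with the matching receive "+E" in the other, while events private to one submodel (here, failure and recovery) fire alone. The encoding below is our simplified two-component sketch, with one plausible reading of Figure 3.6's send/receive directions; it is not SHARP's actual composition algorithm [43]:

```python
from collections import deque

def alphabet(c):
    """Event names occurring in a submodel, with +/- polarity stripped."""
    return {ev.lstrip('+-') for trans in c.values() for ev, _ in trans}

def compose(c1, c2, start=(1, 1)):
    """Synchronous product of two submodels given as {state: [(event, next)]}."""
    shared = alphabet(c1) & alphabet(c2)
    seen, edges = {start}, []
    work = deque([start])
    while work:
        s1, s2 = work.popleft()
        moves = []
        for ev, n1 in c1.get(s1, []):
            name = ev.lstrip('+-')
            if name in shared:
                # A shared event fires only as a send/receive handshake.
                comp = ('+' if ev.startswith('-') else '-') + name
                moves += [(name, (n1, n2)) for ev2, n2 in c2.get(s2, [])
                          if ev2 == comp]
            else:
                moves.append((name, (n1, s2)))       # private event of c1
        moves += [(ev.lstrip('+-'), (s1, n2)) for ev, n2 in c2.get(s2, [])
                  if ev.lstrip('+-') not in shared]  # private events of c2
        for name, nxt in moves:
            edges.append(((s1, s2), name, nxt))
            if nxt not in seen:
                seen.add(nxt)
                work.append(nxt)
    return seen, edges

# Figure 3.6 submodels (our reading): the Gateway requests data (-E1), the
# Sensor replies (-E2) and may fail (F1) and recover (R1) from State 2.
sensor = {1: [('+E1', 2)], 2: [('-E2', 3), ('-F1', -1)], -1: [('-R1', 2)]}
gateway = {1: [('-E1', 2)], 2: [('+E2', 3)]}
states, edges = compose(sensor, gateway)
```

On this input the product has four reachable joint states, matching the shape of the SensorGW SBM before the queueing state is added: start, in-progress, end, and one composed failure state.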
(Note that State 3 in the SensorGW scenario (Figure 3.7(a)) corresponds to contention modeling, which we describe next.)²

² For ease of presentation, we assume that the execution of a scenario has failed if any component is in a failure state. SHARP is flexible enough to allow designers to specify more complex failure rules; this is done using the same technique as in PAR scenarios, described in Chapter 3.3.3.

A major difference between our approach and [59] is that [59] first combines the component submodels of each component across all scenarios (i.e., it combines the models Comp_c_Scen_i over all Scen_i to synthesize a model for Comp_c), and then applies parallel composition to all synthesized component models. Conversely, we apply parallel composition to the component submodels of each scenario separately (i.e., for each Scen_i, we apply the algorithm to Comp_c_Scen_i for all Comp_c); we then combine the results of solving the SBMs by solving their parent's SBM. As a consequence, our approach addresses the common scalability problems: generating and solving many smaller models, rather than the single huge model of [59], results in space and computational savings, as discussed earlier.

Finally, we determine the transition rates between the states based on the provided operational profile. Formally, let Q_i be the transition rate matrix for Scen_i's SBM. If the transition from State j to State k corresponds to the event E, the transition rate is Q_i(j,k) = q(E), where q(E) is the rate at which event E occurs. To complete the SBM, according to [69], we set the diagonal entries Q(j,j) such that each row of Q sums to zero.³

Step 1.2: Modeling Contention

To model contention in SHARP, we augment the SBMs with contention information. When several components (callers) request services from a servicing component (callee), the callee needs to allocate its resources appropriately to serve a caller, while other callers need to wait to obtain service.⁴
Since the system behavior may be different when a request is waiting for service and when it is being processed, we add a queueing state to represent that a caller's request is queued. Formally, let E be an event that triggers a transition from State j to State k in an SBM. If there is a component that may be servicing other requests upon receiving E, we add a queueing state State q such that Q(j,q) = q(E) and Q(q,k) = q(Ready), where Ready is an event indicating that the callee is ready to process the request of the caller of interest. As an example, in the SensorGW scenario, the Gateway could be servicing one Sensor's request when another Sensor triggers event E1. Therefore, we add State 3 to represent queueing, as in Figure 3.7(a). Any other points of contention would be modeled in a similar manner.

³ Note that self-loops in a component model (i.e., an event that causes a transition from a state to itself) have been implicitly accounted for here. Since self-loops do not cause any state transitions, they do not affect the probability distribution of being in a state in a CTMC, and are therefore dropped in an SBM.

⁴ Since the flat model results in callees serving the callers on a FCFS basis, we also use FCFS in our exposition (as the flat model is used as the ground truth); however, SHARP allows other queueing disciplines, which can be modeled similarly.

The next step is to determine q(Ready), the outgoing rate of the queueing state. We define q(Ready) = 1/T_wait, where T_wait is the average time a caller spends waiting to receive service. To compute T_wait, we solve a queueing network (QN) [69] that describes the queueing behavior of the callers' requests. In this QN, the callee is represented by a server.
To build such a QN, we utilize the following information: (a) the number of different types of callers (i.e., the different types of components, where each type can request different services with different processing times), and the maximum number of callers of each type that may request a service; (b) how often a caller requests a service (the arrival rate); (c) how long the callee takes to serve a request (the service rate); and (d) the callee's queueing discipline. Note that (a) is available from the system's requirements and architectural models, which contain architectural configuration information; (b) is available from the operational profile (i.e., the rate of an event E); and (c), the total rate leaving a state, is also derivable from the operational profile. The operational profile information and the other model parameters are determined using an approach described in Chapter 5. Lastly, (d) describes how the callee's resources are allocated among callers. For example, the callee can serve the incoming requests in a first-come-first-serve (FCFS) or a round-robin (RR) fashion. This information should be available from the system's requirements and the architectural models; for example, the choice of a given middleware for implementing the system may impose FCFS service of callers. The constructed QN is finally solved for the average waiting time in the queue using standard methods [69].

[Figure 3.8: QN model of the SensorGW scenario: two Sensors generate requests served by the Gateway.]

The QN for the SensorGW basic scenario is depicted in Figure 3.8. As an example, we are interested in modeling the case when a Sensor sends measurements to the Gateway while the Gateway is processing another Sensor's request. Hence, we have one class of callers: two Sensors may send measurements to the Gateway, with the arrival rate being 2 × q(E2) = 2, the processing rate being q(E3) = 1, and the Gateway being a FCFS callee.
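A QN of this shape, a single FCFS server with a small, fixed caller population, can be solved in closed form from its birth-death balance equations. The sketch below uses illustrative per-caller request and service rates (placeholders, not the thesis's calibrated values) and applies Little's law to obtain the mean queueing delay T_wait, from which q(Ready) = 1/T_wait:

```python
# A finite-population ("machine repairman") single-server queue:
# n_callers callers, each issuing requests at rate lam while idle;
# the server completes requests at rate mu, FCFS.
def mean_wait(n_callers, lam, mu):
    # p[k]: steady-state probability of k requests at the server, from the
    # birth-death balance (n_callers - k) * lam * p[k] = mu * p[k + 1].
    p = [1.0]
    for k in range(n_callers):
        p.append(p[-1] * (n_callers - k) * lam / mu)
    total = sum(p)
    p = [x / total for x in p]
    throughput = mu * (1 - p[0])                   # rate of completed services
    queued = sum(max(k - 1, 0) * pk               # mean number waiting (not
                 for k, pk in enumerate(p))        # counting the one in service)
    return queued / throughput                     # Little's law: W = Lq / X

# Illustrative rates only (not the calibration that yields the text's q(Ready) = 5).
q_ready = 1 / mean_wait(2, 1.0, 1.0)
```

With these placeholder rates the mean wait is 0.5 time units, giving q(Ready) = 2; the thesis's own parameterization of Figure 3.8 yields the value 5 used next.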
After solving the QN for the average waiting time at the Gateway's queue in Figure 3.8, the resulting rate of leaving the queueing state (State 3) in Figure 3.7(a) is estimated to be 5. Other points of contention in the SBMs are treated analogously.

Step 1.3: Computing Scenario Reliability

Scenario reliability is defined as the probability of not being in a failure state. Solving for it involves finding the steady state solution of an SBM. Note that the process we apply to compute basic scenario reliability needs to take into account that, at a higher level, we utilize stochastic complementation [49] to handle SEQ scenarios. Intuitively, stochastic complementation breaks a large Markov model into a number of submodels, solves the submodels separately, and reconstructs the results of the original model. The special structure required for an efficient solution using stochastic complementation is that each submodel has only one start state. Notably, the generated basic scenario SBMs satisfy this requirement, as they have a single starting state.

To obtain the basic scenario reliability, we assume the system has been executing for a long time (i.e., it is in its steady state), so that after the execution of the Scen_i we are analyzing, it will eventually execute Scen_i again. Therefore, at the level of a basic scenario, we are interested in finding its reliability r_i, which is the conditional probability of not being in a failure state, given that Scen_i is executing. To complete the model, we need to account for the transitions going out of Scen_i and determine the state to which control is eventually transferred from another scenario. In the case of a basic scenario, the execution always returns to the scenario's Start state. The execution of a scenario ends when it is about to go to an End state, which represents the start of another scenario. Therefore, based on [49], we remove the End state and redistribute the transition rates to the Start state (i.e., we set Q_i(j,1) = Q_i(j,End)).
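The redistribution and steady-state solve of Step 1.3 amount to a small linear system, pi Q = 0 with sum(pi) = 1, where one balance equation is replaced by the normalization. The sketch below applies this to the GUIRequest SBM using the Figure 3.2 rates (q(E6) = 0.005, q(E7) = 1, q(F2) = 0.01, q(R2) = 0.4); the resulting reliability agrees with the 0.9999 reported for GUIRequest in Table 3.1:

```python
def solve_linear(a, b):
    """Gauss-Jordan elimination with partial pivoting; solves a x = b."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col and m[r][col]:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

# GUIRequest SBM after redistribution; states ordered 1, 2, -1.
# q(E7) now leads from State 2 back to the Start state.
Q = [[-0.005, 0.005, 0.0],
     [1.0, -1.01, 0.01],
     [0.0, 0.4, -0.4]]
n = len(Q)
# pi Q = 0  <=>  Q^T pi^T = 0; replace one balance equation by sum(pi) = 1.
a = [[Q[r][c] for r in range(n)] for c in range(n)]  # transpose of Q
a[-1] = [1.0] * n
pi = solve_linear(a, [0.0, 0.0, 1.0])
reliability = 1 - pi[2]   # State -1 is the only failure state
```

The solve gives pi roughly [0.9949, 0.0050, 0.0001], so the failure state is occupied about 0.01% of the time.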
For example, in GUIRequest's SBM in Figure 3.7(d), we replace the transition from State 2 to State 3 with a transition from State 2 to State 1, with the transition rate being q(E7), as in Figure 3.9. Now we can solve this model for its steady state probability vector ~π_i using standard techniques [69], and the scenario reliability r_i can be computed as follows:

r_i = 1 − Σ_{f∈F_i} π_i(f)    (3.1)

where F_i is the set of failure states in Scen_i's SBM. The rate matrix of GUIRequest's model in Figure 3.9 after rate redistribution (with states ordered 1, 2, −1) is

[ −0.005   0.005   0    ]
[  1      −1.01    0.01 ]    (3.2)
[  0       0.4    −0.4  ]

Solving this matrix for ~π_i using standard techniques yields ~π_i = [0.9949, 0.0050, 0.0001], and hence the reliability of the GUIRequest scenario is r_i = 1 − 0.0001 = 0.9999.

[Figure 3.9: Rate redistribution in GUIRequest: State 1 goes to State 2 at rate q(E6); State 2 returns to State 1 at rate q(E7) and fails to State F at rate q(F2); State F recovers to State 2 at rate q(R2).]

Step 1.4: Computing Scenario Completion Time

Recall that a scenario's completion time, t_i, is needed in building the concurrency-level models for a PAR scenario (Chapter 3.3.3). We can compute t_i from Scen_i's SBM, after updating its parameters. We note that t_i includes time spent in normal operation as well as time spent in recovering from failures, because the definition of the "completion" of a scenario includes all of the scenario's behavior. Let T_i(s) be the completion time when we are in State s in Scen_i's SBM; i.e., t_i = T_i(1). We compute ~T_i by performing transient analysis, which corresponds to solving Eq. (3.3) [72]:

Q'_i ~T_i = −e    (3.3)

where Q'_i is the matrix after eliminating the row and column corresponding to the End state in Q_i, and −e is a column vector of −1's of the appropriate dimension.
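Eq. (3.3) is again just a linear solve. The sketch below applies it to GUIRequest with the same Figure 3.2 rates; in Q' the End row and column are removed, so the exit rate q(E7) = 1 survives only in State 2's diagonal entry, and the result matches the t_i = 201.03 listed for GUIRequest in Table 3.1:

```python
def solve_linear(a, b):
    """Gauss-Jordan elimination with partial pivoting; solves a x = b."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col and m[r][col]:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

# Q' for GUIRequest (states ordered 1, 2, -1): End row/column eliminated,
# so q(E7) = 1 appears only inside State 2's diagonal entry -1.01.
Qp = [[-0.005, 0.005, 0.0],
      [0.0, -1.01, 0.01],
      [0.0, 0.4, -0.4]]
T = solve_linear(Qp, [-1.0, -1.0, -1.0])   # solve Q' T = -e
t_i = T[0]                                 # completion time from the Start state
```

The solve yields T of approximately [201.03, 1.025, 3.525] time units, so t_i = T(1) = 201.03.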
Consider the GUIRequest scenario; the rate matrix Q'_i (with states ordered 1, 2, −1) is

        [ −0.005   0.005   0    ]
Q'_i =  [  0      −1.01    0.01 ]    (3.4)
        [  0       0.4    −0.4  ]

Applying Eq. (3.3) to Q'_i, GUIRequest's ~T_i is ~T_i = [201.03, 1.025, 3.525], and hence t_i = 201.03. The values of t_i for the MIDAS scenarios are given in Table 3.1.

Table 3.1: r_i and t_i of the MIDAS scenarios

Scenario            r_i     t_i
SensorGW            0.9900  6.2625
GWHub               1       15
GWACK               1       2
Sensors_PAR         0.9867  9.3937
GWACK_PAR           1       3
SensorMeasurement   0.9867  27.394
GUIRequest          0.9999  201.03
ChangeACTemp        1       0.5
GUI_LOOP            0.9999  205.13
ControlAC           0.9999  205.28
System              0.9940  287.46

3.3.2 SEQ Scenarios

To analyze a complex scenario with sequential dependencies, we apply stochastic complementation to generate a SEQ scenario's SBM by combining the SBMs of the child scenarios (Step 2.1). This is a novel use of an advanced stochastic method for analyzing a software system's quality, and comprises an important contribution of this paper. We solve the resulting SEQ scenario SBM for scenario reliability (Step 2.2) and completion time (Step 2.3) in a similar manner to our solution for basic scenarios.

Step 2.1: Generating SBM

We generate an SBM for a SEQ scenario as follows: we first generate the states of the model, and then compute the transition rates with respect to the applied stochastic complementation [49]. The states in a SEQ scenario's SBM correspond to the child scenarios. We determine the transitions according to the dependencies between the child scenarios. If a SEQ scenario Scen_i has a child scenario Scen_k executing after another child scenario Scen_j, we add a transition from State j to State k in Scen_i's SBM. For example, the SBMs of the SEQ scenarios in MIDAS are depicted in Figure 3.10.
The transition rates for each transition determined above are calculated as follows [49]:

Q_i(j,k) = p_i(j,k) · out_j    (3.5)

where p_i(j,k) is the probability that Scen_k executes after the execution of Scen_j, and out_j is defined in a similar manner to [49]:

out_j = Σ_{s∈S_j} π_j(s) · Q_j(s,End)    (3.6)

where S_j is the set of states in Scen_j, π_j(s) is the steady state probability of being in State s in Scen_j's model, and Q_j(s,End) is the transition rate going from State s to the End state in Scen_j's model. For example, in GUI_LOOP (Figure 3.10(b)), let Scen_i = GUI_LOOP and Scen_j = GUIRequest; the transition rate from State 1 to State 2 in Scen_i is

Q_i(1,2) = p_i(1,2) (Σ_{s∈S_j} π_j(s) Q_j(s,End)) = (1 − p_invalid)(π_j(2))(q(E7)) = (0.98)(0.005)(1) = 0.0049

Furthermore, when we move up a level in the hierarchy to ControlAC (Figure 3.4), the transition rate going from State 1 to State 2 in ControlAC's SBM (Figure 3.10(c)) becomes

Q_i(1,2) = p_i(1,2) Σ_{s∈S_j} π_j(s) Q_j(s,End) = (p_ControlAC)(1)Q_j(1,End) = (0.3)(1)(0.0049) = 0.0015

⁵ Note that the self-loop in State 1 of the GUI_LOOP scenario (depicted as a dotted arrow in Figure 3.10), representing that the user's input is invalid, has been dropped, because a CTMC implicitly accounts for self-loops.

[Figure 3.10: SBMs of the SEQ scenarios: (a) SensorMeasurement, (b) GUI_LOOP, and (c) ControlAC, with transition rates expressed in terms of out_j, q(E4), p_invalid, and p_ControlAC.]

Step 2.2: Computing Scenario Reliability

Similarly to the way SHARP solves the basic scenario SBM, we redistribute the rate going to the End state of a SEQ scenario's SBM and solve the model for its steady state probability vector, ~π_i, using standard techniques.
Once we have computed ~π_i, and r_j for all child scenarios Scen_j, we compute the scenario reliability using the equation

r_i = Σ_j π_i(j) · r_j    (3.7)

Continuing with our example, to solve for ~π_i using the SBM of the ControlAC scenario, after redistributing the rate going to the End state, we have the following rate matrix:

[ −0.0015   0.0015 ]    (3.8)
[  2       −2      ]

Solving this model gives us ~π_i = [0.9993, 0.0007]. The reliability of ChangeACTemp is 1, as there is no defect identified in that scenario. Hence, the reliability of the ControlAC scenario is r_i = (0.9993)(0.9999) + (0.0007)(1) = 0.9999. The reliabilities of the other scenarios, given in Table 3.1, are computed similarly.

[Figure 3.11: Models for completion rate computation for (a) GUI_LOOP and (b) ControlAC, with transition rates given by the child scenarios' completion rates 1/t_j, weighted by p_ControlAC where applicable.]

Step 2.3: Computing Scenario Completion Time

To compute a SEQ scenario's completion time, we combine the completion times of the child scenarios, each represented by a state in this SBM. Formally, if State j represents a child scenario Scen_j, we update the rate going to another State k to Q_i(j,k) = p_i(j,k) · (1/t_j), where t_j is the completion time of the child scenario Scen_j, and p_i(j,k) is the probability of going from State j to State k in the parent scenario Scen_i. The difference compared to the model generated in Step 2.1 is that the transition rates here are a function of the child scenario completion times, whereas in Step 2.1 the transition rates were a function of the transition rates going into the End states of the child scenarios. The reason for this change is the purpose of the analysis (reliability or completion time), which requires different information to be incorporated from the child scenarios. Subsequently, we compute the completion time using the standard methods (Eq. 3.3). For example, consider the GUI_LOOP scenario from Figure 3.4.
The rate matrix for completion time computation in Figure 3.11(a) is obtained by updating the rates in Figure 3.10(b) as follows: let Scen_i = GUI_LOOP and Scen_j = GUIRequest; using the rate matrix and steady state probability vector from Chapter 3.3.1, we update the transition rate from State 1 to State 2 to Q_i(1,2) = p_i(1,2) · (1/t_j) = (1 − 0.02)(1/201.03) = 0.0049. Applying Eq. (3.3) gives us the completion time of GUI_LOOP as 205.12 time units.

Let us move up one level in the hierarchy and consider the ControlAC scenario, whose model for completion time computation is depicted in Figure 3.11(b). Let Scen_i = ControlAC and Scen_j = GUI_LOOP; the transition rate from State 1 to State 2 in ControlAC is Q_i(1,2) = p_i(1,2)(1/t_j) = (p_ControlAC)(1/t_j) = (0.3)(1/205.12) = 0.0015. Similarly, the transition rate from State 1 to State 3 is Q_i(1,3) = p_i(1,3)(1/t_j) = (1 − p_ControlAC)(1/t_j) = (0.7)(1/205.12) = 0.0034. The transition rate from State 2 to State 3 is the completion rate of ChangeACTemp, which is Q_i(2,3) = 1/0.5 = 2. Given these parameters, the rate matrix of the model in Figure 3.11, Q' (defined in Chapter 3.3.1), is

Q' = [ −0.0049   0.0015 ]
     [  0        −2     ]

Thus, the completion time of ControlAC after solving Q' using Eq. (3.3) is t_i = 205.28 time units. Note that the completion time of the ControlAC scenario is similar to that of the GUI_LOOP scenario. This is because the completion time of the ChangeACTemp scenario is low compared to that of the GUI_LOOP scenario (0.5 vs. 205.12 time units); therefore, ChangeACTemp has little impact on the completion time of the ControlAC scenario.

3.3.3 PAR Scenarios

The primary goal in the generation of a PAR scenario's SBM is to avoid the scalability problems that arise when handling systems in which there are many concurrent scenario instances, some of which are of the same type. To this end, we propose a method based on a symbolic representation of the system execution state.
Specifically, we abstract the execution of concurrent scenarios by creating a model that keeps track of the currently running instances of each scenario. Each state of a PAR scenario SBM can be described as a combination of child scenarios, or simply, a combination. Our model also allows us to avoid redundancy in the models we generate when the system can have several instances of the same scenario. The generated symbolic model is referred to as the concurrency-level model in the following sections. SHARP first determines the feasible scenario combinations (Step 3.1) and constructs the concurrency-level model (Step 3.2). Next, SHARP calculates the probabilities (Step 3.4) and the reliabilities (Step 3.5) of the different scenario combinations. SHARP ultimately uses the obtained information to compute the overall PAR scenario reliability (Step 3.6) and completion time (Step 3.7). To further address the potential scalability problems when dealing with very large-scale systems, for which concurrent scenario instances may number in the hundreds or thousands, SHARP employs model truncation [69] (Step 3.3).

Step 3.1: Determining Scenario Combinations

Determining the possible combinations is the first step in solving for the reliability and completion time of a PAR scenario; as a reminder, we consider a PAR scenario complete when all of its child scenarios end their execution. A combination, C_i, is defined as C_i = (c_1, c_2, ..., c_{S_c}), where c_j is the number of completed instances of Scen_j, and S_c is the number of child scenarios.^6

Footnote 6: Given that the system essentially experiences scenario "completions", we assume that the probability that more than one scenario completes in the exact same instant in time is negligible. This is a standard assumption in Markov chain models, which makes them more tractable without a significant loss in what is expressible with such models.

We also define I_j to be the number of instances of
Scen_j that need to be completed, and I = max(I_j) to be the largest number of possible instances among all Scen_j. The execution of a PAR scenario is completed only when all child scenarios have been completed.

Table 3.2: Values of P(C_k) and R(C_k) in the System scenario

  C_k    P(C_k)   R(C_k)   t_i     |  C_k    P(C_k)   R(C_k)   t_i
  (0,0)  0.0420   0.9606   287.46  |  (1,0)  0.0630   0.9735   260.07
  (2,0)  0.1260   0.9866   232.67  |  (3,0)  0.7266   0.9999   205.28
  (0,1)  0.0077   0.9607   82.181  |  (1,1)  0.0116   0.9736   54.787
  (2,1)  0.0231   0.9867   27.394  |

In order to find the scenario reliability, we need to compute the distribution of the possible combinations. Since, in general, not all combinations of scenarios in a system may be possible, we allow a system architect to specify the combinations of scenarios that are not possible (or not allowed). For instance, in MIDAS, such a restriction may be put in place to avoid exhausting the resources of the Hub, by allowing no more than three requests in the Hub. Hence, in the System scenario, if we set I_1 = 3 and I_2 = 1, and include the restriction that I_1 + I_2 ≤ 3, then the possible scenario combinations are those depicted in Table 3.2.

Step 3.2: Concurrency-Level Model Generation

The next step is to generate a concurrency-level model for each child scenario Scen_j. Specifically, a concurrency-level model is a CTMC representing the number of completed instances of Scen_j; the determination of its End state depends on the number of unique child scenarios. When there is one child scenario, completing I_j instances of Scen_j represents the completion of the PAR scenario, i.e., state I_j in the concurrency-level model is considered to be the End state. For example, the concurrency-level models corresponding to Sensors_PAR and GWACK_PAR in MIDAS are depicted in Figure 3.12(a) and (b).
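Returning to Step 3.1, the enumeration of feasible combinations could be sketched as follows. The helper and its `allowed` predicate are hypothetical, and we assume here that the all-completed state (3,1) is excluded because it marks completion of the PAR scenario itself rather than a visited combination:

```python
from itertools import product

def scenario_combinations(I, allowed=lambda c: True):
    """Enumerate combinations C = (c_1, ..., c_S) with 0 <= c_j <= I_j,
    keeping those permitted by the architect-specified predicate and
    dropping the all-completed state (assumed to mark PAR completion)."""
    combos = [c for c in product(*(range(i_j + 1) for i_j in I)) if allowed(c)]
    return [c for c in combos if c != tuple(I)]

# System scenario: I_1 = 3 (SensorMeasurement), I_2 = 1 (ControlAC)
combos = scenario_combinations([3, 1])   # the seven combinations of Table 3.2
```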
[Figure 3.12: SBMs of the PAR scenarios; (a) Sensors_PAR, (b) GWACK_PAR, (c) System]

When there is more than one child scenario, completing I_j instances of Scen_j means that the execution of all of its instances has been completed. Scen_j can only execute again when all other scenarios have been completed and the parent scenario executes again. We add a state E to model this behavior, and define State E to be the End state of the concurrency-level model. We also add a transition from state I_j, which corresponds to completing all instances of Scen_j, to State E. For example, the concurrency-level model corresponding to the System scenario is depicted in Figure 3.12(c).

We determine the transition rates as follows: the transition rate from State c_j (c_j < I_j), i.e., when there are c_j completed instances of Scen_j, to State c_j + 1 is (I_j − c_j)(1/t_j(1)), where t_j(1) is the scenario completion time, computed as in Chapters 3.3.1 and 3.3.2. This transition rate corresponds to the rate at which an instance of Scen_j completes when there are c_j completed instances. When there is more than one unique child scenario, the rate of the transition from State I_j to State E, which corresponds to the average time to wait for the completion of all instances of the other scenarios, is 1/d_j, where d_j is the average of the total delay the other scenarios Scen_k, k ≠ j, have caused. Here, we set d_j = \sum_{k≠j} I_k t_k, where t_k is the completion time of Scen_k.^7

Footnote 7: Note that d_j is an approximation, which assumes that the average time to complete an instance of Scen_k, given that Scen_j has completed, is still t_k. An exact computation of d_j involves transient analysis, which is computationally more expensive [72].
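Under these rate definitions, assembling the generator matrix of a concurrency-level model might look like the following sketch (the helper name and the numeric values are illustrative, not from the text):

```python
import numpy as np

def concurrency_level_generator(I_j, t_j, d_j=None):
    """Generator matrix of a child scenario's concurrency-level CTMC.
    States 0..I_j count completed instances; when d_j is given (more than
    one unique child scenario), an extra End state E is appended."""
    n = I_j + 1 + (1 if d_j is not None else 0)
    Q = np.zeros((n, n))
    for c in range(I_j):
        Q[c, c + 1] = (I_j - c) / t_j    # rate of the next instance completion
    if d_j is not None:
        Q[I_j, I_j + 1] = 1.0 / d_j      # average wait for the other scenarios
    np.fill_diagonal(Q, -Q.sum(axis=1))  # generator rows sum to zero
    return Q

# Illustrative numbers only: three instances, t_j = 10, d_j = 5 time units
Q = concurrency_level_generator(3, 10.0, d_j=5.0)
```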
[Figure 3.13: Probability distribution of the number of completed instances, for t = 0.01, t = 1.00, and t = 100.00]

Step 3.3: Performing Model Truncation

To further reduce the computational cost, we drop the combinations in a PAR scenario that are rarely visited, using model truncation [69]. The steady-state probability distribution of c_j, the number of completed instances of Scen_j, depends on the value of \vec{t}_j (the scenario completion time), as well as on the completion times \vec{t}_k of the other scenarios Scen_k ≠ Scen_j. P_j(c_j) can be obtained by solving a concurrency-level model using standard techniques. As an illustration, we depict the steady-state probability distribution of c_j in Figure 3.13. We assume the completion rates of the other scenarios are fixed, and d_j = 1 (recall Step 3.2). Also, we set I = 50 and varied t_j(1) over different values (0.01, 1, and 100). For instance, when t_j(1) = 100, 29 (out of 51) possible values of P_j(c_j) are less than 1%.

In generating the scenario combinations, we can drop the values of c_j that occur rarely. That is, instead of considering that c_j could be any value between 0 and I, we consider a smaller range of values, determined as follows: in generating the scenario combinations, we consider x as a possible value of c_j if P(c_j = x) is larger than a threshold ε. That is, a small threshold allows us to consider a wider range of values, and we can regard the case without truncation as having ε = 0. For example, if ε = 0.01 (depicted as a dotted line in Figure 3.13), when t = 100, d_j = 1, and I = 50, we only consider 29 ≤ c_j ≤ 50 in generating the scenario combinations (instead of 0 ≤ c_j ≤ 50). There is a tradeoff between the number of states we drop and the penalty in accuracy when applying model truncation. We evaluate this tradeoff in Chapter 3.4.2.
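A sketch of this truncation rule (the distribution used here is hypothetical):

```python
def truncated_support(P_j, eps):
    """Model truncation: keep only values x with P_j(c_j = x) > eps."""
    return [x for x, p in enumerate(P_j) if p > eps]

# Hypothetical steady-state distribution over c_j = 0..4
P_j = [0.002, 0.008, 0.09, 0.30, 0.60]
kept = truncated_support(P_j, eps=0.01)   # -> [2, 3, 4]
```

With ε = 0 the full range 0 ≤ c_j ≤ I is kept, matching the untruncated case.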
Step 3.4: Computing Combination Probability

Once the concurrency-level models are constructed, SHARP solves them for the probability distribution of each combination. We define P(C_i) = P(c_1, c_2, ..., c_S) to be the probability that there are c_j completed instances of Scen_j for each j = 1...S. Since we assume all instances of all child scenarios run independently,

P(C_i) ≈ (\prod_j P_j(c_j)) / W    (3.9)

where P_j(c_j) is the probability that c_j instances of Scen_j have completed, and W is a normalization factor^8 that ensures that the P(C_k) sum to 1 (i.e., W = \sum_k \prod_j P_j(c_j), summed over the allowed combinations C_k).

Footnote 8: The normalization factor is needed because, in general, not all combinations of scenarios may be allowed, as described earlier.

In the System scenario, P(c_1, c_2) is the probability that there are c_1 and c_2 completed instances of SensorMeasurement and ControlAC, respectively. Hence,

P(c_1, c_2) ≈ P_1(c_1) × P_2(c_2) / W

We redistribute the transition rate from the End state to State 1 as in Chapter 3.3.1, and solve the model for P_j(c_j) for all c_j, using standard techniques [69]. This corresponds to solving for the probability of being in state c_j in Scen_j's concurrency-level model. Table 3.3 gives the probability distribution P_j(c_j) in the two PAR scenarios of our MIDAS example; these are computed using the concurrency-level models of each scenario (as described above).

Table 3.3: Values of P_j(c_j) in the System scenario

  Parameter  Value     Parameter  Value     Parameter  Value
  P_1(0)     0.0305    P_1(2)     0.0916    P_2(0)     0.8949
  P_1(1)     0.0458    P_1(3)     0.8321    P_2(1)     0.1051

Furthermore, Table 3.2 gives the corresponding combination probabilities, computed using the data from Table 3.3 by applying Eq. (3.9). Lastly, since the computation of the distribution of different scenario combinations is done in an approximate manner as described above, in Chapter 3.4 we evaluate the accuracy of this approximation, as well as the reduction in computational cost.
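The mechanics of Eq. (3.9) can be sketched as follows; note that this fragment illustrates only the normalized product, with the allowed combinations taken as given (the Table 3.2 values come from the full analysis, so they are not reproduced exactly by this simplification):

```python
from itertools import product
from math import prod

def combination_probabilities(P, allowed_combos):
    """Eq. (3.9): P(C) proportional to prod_j P_j(c_j), normalized by W
    over the allowed combinations."""
    raw = {c: prod(P[j][c[j]] for j in range(len(P))) for c in allowed_combos}
    W = sum(raw.values())                 # normalization factor W
    return {c: p / W for c, p in raw.items()}

P = [[0.0305, 0.0458, 0.0916, 0.8321],   # P_1(c_1) from Table 3.3
     [0.8949, 0.1051]]                   # P_2(c_2) from Table 3.3
allowed = [c for c in product(range(4), range(2)) if c != (3, 1)]
probs = combination_probabilities(P, allowed)
```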
At the level of the entire system (i.e., the highest level in the hierarchy), the synchronization requirement, namely that a PAR scenario is considered completed only when all child scenarios have been completed, may be too restrictive. Some systems are considered to be continuously running, where child scenarios can be regarded as starting and completing independently. For example, after the Sensors have finished taking measurements, they do not necessarily have to wait for the GUI to update the data before taking another set of measurements. We can use our previous work in [18] to model this behavior.

Step 3.5: Computing Combination Reliability

Given that we now know how to compute the probabilities of having various combinations of child scenarios, as well as the reliabilities of the child scenarios, what remains is the computation of the reliability of each combination. We can then compute scenario reliability by combining the reliabilities of the combinations. We will use the combination (1,0) in the System scenario — one completed instance of the SensorMeasurement scenario, and no completed instance of the ControlAC scenario — as an illustrative example in this section; the reliabilities of other combinations are calculated analogously.

To compute the reliability of a scenario combination, we need to first examine how scenario failure is defined. In SHARP, system designers can specify the conditions under which the scenario is considered to have failed as follows. If there are x_j or more failed instances of any Scen_j, the system is considered to have failed, i.e.,

(F_1 ≥ x_1) ∨ (F_2 ≥ x_2) ∨ ... ∨ (F_S ≥ x_S)    (3.10)

where F_j is the number of failed instances of a child scenario Scen_j (recall that S is the number of distinct child scenarios).^9 As an example, the system is considered reliable if it can control the temperature appropriately and display the current room temperature to the user.
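Since each of the I_j − c_j remaining instances of Scen_j fails independently with probability 1 − r_j, each clause probability in Eq. (3.10) reduces to a binomial tail sum; a sketch (the helper name is ours, and r_1 = 0.9905 is the SensorMeasurement reliability from the running example):

```python
from math import comb

def p_at_least(x_j, remaining, r_j):
    """Binomial tail: probability that at least x_j of the remaining
    instances fail, each independently with probability 1 - r_j."""
    return sum(comb(remaining, f) * (1 - r_j) ** f * r_j ** (remaining - f)
               for f in range(x_j, remaining + 1))

# Combination (1,0): two SensorMeasurement instances remain, r_1 = 0.9905
p_fail = p_at_least(1, 2, 0.9905)   # about 0.019
```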
This requires that (a) Sensors on at least one Gateway correctly measure and send data to the Hub; and (b) the GUI displays the current temperature obtained from the Hub, and the Hub controls the AC appropriately. Therefore, we define the system to be unreliable when one or more instances of SensorMeasurement, or one or more instances of ControlAC, have failed. Thus, x_1 = 1 and x_2 = 1.

To compute the probability that combination C_k satisfies the failure condition in Eq. (3.10), we consider the clauses one at a time, and compute the probability of satisfying a clause as follows:

P(F_j ≥ x_j) = \sum_{f=x_j}^{I_j − c_j} P(F_j = f)    (3.11)

The next step is to compute the probability distribution of F_j, the number of failed instances of Scen_j out of the I_j − c_j remaining instances. An instance fails with probability 1 − r_j, where r_j is the reliability of Scen_j as defined in Eq. (3.1); otherwise, with probability r_j, it has not failed. Note that F_j is a binomial random variable, and its probability distribution, according to [69], is as follows:

P(F_j = f) = \binom{I_j − c_j}{f} (1 − r_j)^f (r_j)^{I_j − c_j − f}    (3.12)

Footnote 9: The OR-clauses are used for ease of presentation. SHARP can easily specify more general failure conditions, by using disjunctive normal form and modifying Eq. (3.13) accordingly.

In the MIDAS example, based on Eq. (3.11) and Eq. (3.12), we compute P(F_1 ≥ 1) as follows:

P(F_1 = 1) = \binom{2}{1} (1 − 0.9905)^1 (0.9905)^1 = 0.0188
P(F_1 = 2) = \binom{2}{2} (1 − 0.9905)^2 (0.9905)^0 = 9 × 10^{-5}
P(F_1 ≥ 1) = \sum_{f=1}^{2} P(F_1 = f) = P(F_1 = 1) + P(F_1 = 2) = 0.0188

Similarly, P(F_2 ≥ 1) = 0.0001. Since the system is considered to have failed when any clause in Eq. (3.10) is satisfied, the reliability of a combination, R(C_k), can be defined as:

R(C_k) = 1 − \sum_{j=1}^{S} P(F_j ≥ x_j)    (3.13)

To complete our example, we combine the above results according to Eq.
(3.13), i.e.,

R((1,0)) = 1 − \sum_{j=1}^{S} P(F_j ≥ x_j) = 1 − (0.0188 + 0.0001) = 0.9811

We repeat this calculation for each combination; Table 3.2 gives the reliabilities of all scenario combinations for the MIDAS example under the failure conditions given above.

Step 3.6: Computing Scenario Reliability

We compute scenario reliability by combining the results of the previous steps. The scenario reliability of a PAR scenario is defined as the sum of the scenario combinations' reliabilities, weighted by the probability that each combination occurs, i.e.,

r_i = \sum_k P(C_k) R(C_k)    (3.14)

In our running example, the solution of Eq. (3.14) gives the reliability of the System scenario, and hence system reliability, as 0.9940, which, in this case, is within 0.5% of the ground truth of 0.9935, obtained by solving the "flat model" as detailed below.

Step 3.7: Computing Completion Time

To complete the PAR scenario analysis, SHARP computes the completion time for each combination C_j. Intuitively, the average completion time of a combination is t(C_j) = max(T_i(c_i)), where T_i(c_i) is a random variable representing the completion time of Scen_i when there are c_i completed instances. Computing this directly is difficult, because we would need to compute the completion time distribution of every child scenario, and then the distribution of max(T_i(c_i)).

To simplify the calculation of t(C_j), we assume the completion of a scenario instance is memoryless, as in the flat model. As an example, consider the combination (0,0) in the concurrency-level models of the System scenario in Figure 3.12, i.e., no instances of either the SensorMeasurement or ControlAC scenarios have been completed. The memoryless assumption means that if an instance of the SensorMeasurement scenario has completed, the average time it takes to complete an instance of the ControlAC scenario is "reset" to t_10.
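(As a numerical check of Step 3.6, combining the Table 3.2 values per Eq. (3.14) reproduces the reported System-scenario reliability:)

```python
# P(C_k) and R(C_k) for the seven combinations, as listed in Table 3.2
P = [0.0420, 0.0630, 0.1260, 0.7266, 0.0077, 0.0116, 0.0231]
R = [0.9606, 0.9735, 0.9866, 0.9999, 0.9607, 0.9736, 0.9867]
r_system = sum(p * r for p, r in zip(P, R))   # Eq. (3.14), about 0.9940
```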
Given this assumption, we can see that the average time to complete a combination, t(C_j), is simply the sum of the completion times for each scenario, t_i(c_i). That is, the completion time for C_j is

t(C_j) = \sum_{i=1}^{S} t_i(c_i)    (3.15)

To compute t_i(c_i), we solve the concurrency-level model for the average completion time, by setting State I_i as the End state of the concurrency-level model and applying Eq. (3.3) to it. For example, in Sensors_PAR, we solve the model in Figure 3.12(a) using Eq. (3.3); the resulting completion times are T_i = [9.3937, 6.2625], and hence t_i = 9.3937.

3.4 Evaluation

We evaluate SHARP along two dimensions: (a) the complexity of generating and solving concurrent systems' reliability models as compared to those that can be derived from existing approaches (Chapter 3.4.1), and (b) the corresponding accuracy of SHARP (Chapter 3.4.2). More specifically, we compare SHARP against a flat model, which is used here as the "ground truth". Our flat model is essentially the same as [59], where a system reliability model is generated by applying parallel composition.^10 We note that the difference between our application of parallel composition (in Chapter 3.3.1) and that in [59] is that we use parallel composition to generate an SBM of basic scenarios (which, as argued below, is expected to be relatively small), while [59] uses it to generate a model of the entire system at once.

We applied SHARP to a variety of systems, with different numbers of components and scenarios, as well as different numbers of scenario instances. We show representative results obtained from the following systems:

1. The MIDAS example system used throughout this chapter. This system has five basic scenarios, and may potentially have a large number of instances of a scenario (e.g., multiple sensors taking measurements). There are four Sensors, one Gateway, one Hub, one GUI, and one AC in the instantiation of MIDAS used in this evaluation.

2.
A GPS system with route guidance, audio player, and Bluetooth phone capabilities. This system has five major components: RouteGuidance (RG), EnergyMonitor (EM), MediaPlayer (MP), BluetoothPhone (BT), and Database (DB). This system is modeled using 21 basic scenarios. Note that it is unlikely that there will be more than one instance of a scenario in this system, because of the system's structure (e.g., it typically makes little sense to have two instances of a route guidance scenario performing the same route guidance service).^11 To evaluate SHARP in a controlled manner, we injected the following defects into this GPS system: (a) a defect in the EM component, which may lead to failure to notify other system components when the battery is low; and (b) a defect in the RG component, which may lead to failure in updating a user's location accurately.

Footnote 10: The only difference between our flat model and the one in [59] is that [59] assumes that failures are irrecoverable. As we discussed earlier, SHARP can model irrecoverable failures without significant changes.

Footnote 11: One exception to this would be the situation when the system designers are concerned about service failures, and hence introduce redundancy. We do not consider such a variant of the GPS system.

3. A simple client-server system with one server and possibly many clients, modeled using one basic scenario. In this system, a client sends a Request message to the server to request service, and the server replies with a Reply message. If a request arrives while the server is servicing another request, the new request waits in the server's buffer in FCFS fashion. We assume the server has enough buffer space that requests are not dropped. We selected such a simple system because it is difficult to study the accuracy of larger systems: the flat models of larger systems would become too large to solve, and hence we would lack a baseline for comparison.
We use this system to illustrate the effect of contention when there are many clients, as its simplicity allows us to generate flat models with a larger number of components as compared to MIDAS (see Chapter 3.4.2 for details). We consider a defect in the server that may lead to failure to reply to the client when a requested file cannot be retrieved.

3.4.1 Complexity Analysis

We now explore the complexity of SHARP as compared to the flat model. We first describe the theoretical worst-case complexity of each approach, and then discuss the computational cost that is likely to arise in practice.

Worst Case Complexity

Let U be the number of unique components, C the total number of components, S the number of scenarios (basic and intermediary), S_B the number of basic scenarios, and S_I the number of intermediary scenarios (i.e., S = S_B + S_I). Also let I_i be the number of instances of Scen_i, and I = max(I_i) over all Scen_i. Further, let M_j be the number of states of Comp_j, and M = max(M_j) over all Comp_j. The resulting complexities are summarized in Table 3.4.

Table 3.4: Worst-case complexities

              Time Complexity                                                    Space Complexity
  SHARP       O(S_I max(S^3, S^{I+1} I^3) + S_B(M^{3C} + M^C S_B (I+1)^{S_B}))   O(S + max(M^{2C}, S^2, S I))
  Flat Model  O(M^{3C})                                                          O(M^{2C})

Let us first analyze the complexity of SHARP.

Basic scenarios: In the worst case, every state in every component participates in a basic scenario, and hence the SBM may have as many as O(M^C) states. Once we have determined the states in the SBM, we need to determine the transitions between each pair of states. Therefore, the complexity of the generation of an SBM is O(M^{2C}). The complexity of solving an SBM^12 is O(M^{3C}). Thus, the time complexity of generating and solving the SBM for a basic scenario is O(M^{2C} + M^{3C}) = O(M^{3C}). The space complexity of generating and solving a basic scenario's SBM is O(M^{2C}) — once we have solved an SBM, we can reuse its space, as we generate SBMs one at a time.
Contention Modeling: In the worst case, there is contention in every state of the SBM of a basic scenario. If, as a result, we add a queueing state corresponding to each state, we double the size of every basic scenario's SBM, which does not affect the worst-case complexity of solving it (O((2M^C)^3) = O(8 M^{3C}) = O(M^{3C})). Thus, in the worst case, we have O(M^C) QNs to solve. Determining the worst-case complexity of solving a QN can be complicated, as it depends on the type of QN we have. Given the special structure of our contention models (refer to Chapter 3.3.2), we can ensure that the corresponding QNs have product form [69] by adjusting the visit ratios accordingly;^13 this would result in a worst-case complexity of O(S_B (I+1)^{S_B}) [69] for solving one QN. Thus, the worst-case time complexity of solving all QNs would be O(M^C S_B (I+1)^{S_B}).

Footnote 12: The time complexity of solving a Markov chain with N states is O(N^3), and the space complexity for storing the corresponding rate matrix is O(N^2).

SEQ scenarios: Since there are at most S scenarios in the system, there are at most S states in the SBM of a SEQ scenario, because each scenario is represented by a state in the SBM of a SEQ scenario. Therefore, the complexities of generating and solving the SBM of a SEQ scenario are O(S^2) and O(S^3), respectively, and the space complexity is O(S^2), as discussed above.

PAR scenarios: In the worst case, all S scenarios run in parallel. The most expensive step in solving for P(C_j) is to solve a concurrency-level model for each scenario (each with at most O(I) states). Therefore, the complexity of solving for P(C_j) is O(S I^3). Solving for R(C_j) involves combining the reliabilities of the child scenarios according to the failure conditions. The complexity of solving Eq. (3.11) is O(I), and hence the complexity of computing R(C_j) using Eq. (3.13) is O(S I).
Therefore, the complexity of solving for the reliability of a PAR scenario is O(S^I (S I^3 + S I)) = O(S^I S I^3) = O(S^{I+1} I^3). The space complexity of solving the PAR scenarios is O(S I), as we need to store the results of the concurrency-level models. Note that we have not considered the computational cost savings of model truncation in this complexity analysis, as model truncation does not reduce the worst-case complexity.

Overall Complexity: First, since there are S_B basic scenarios, the complexity of generating and solving all S_B SBMs of the basic scenarios is O(S_B (M^{3C} + M^C S_B (I+1)^{S_B})). There are S_I intermediary scenarios, each of which could be either a SEQ or a PAR scenario. As we do not know whether a SEQ or a PAR scenario is more expensive to solve in the worst case (it depends on the values of S and I), we describe the complexity of solving an intermediary scenario as O(max(S^3, S^{I+1} I^3)). Therefore, the overall time complexity is O(S_B (M^{3C} + M^C S_B (I+1)^{S_B}) + S_I max(S^3, S^{I+1} I^3)).

Footnote 13: The visit ratios correspond to the number of times the "callee queue" (e.g., the Gateway queue in Figure 3.8) is visited per visit to the "caller queue" (e.g., the Sensor queue in Figure 3.8). In our example, these are 1:1, but in general, the "callee queue" can be visited multiple times per visit to the "caller queue".

In analyzing the overall space complexity, we need to consider the space needed to store the results of the scenarios that have been processed, in addition to the space needed to store the SBM of the scenario being processed. Since, in the worst case, we store the r_i and t_i of each of the S scenarios, the space needed to store the results of the S scenarios is O(2S). The "last" scenario could be a basic, SEQ, or PAR scenario, so the space complexity is the maximum space needed among the three types of scenarios. Thus, the overall space complexity is O(S + max(M^{2C}, S^2, S I)).
In the flat model, we first apply parallel composition using all components, for which the complexity is O(M^{2C}). The time complexity of solving the flat model is O(M^{3C}). Therefore, the overall time complexity of the flat model approach is O(M^{2C} + M^{3C}) = O(M^{3C}). Since the flat model has as many as O(M^C) states, its space complexity is O(M^{2C}).

Computational Cost in Practice

In our worst-case analysis above, it appears as if the flat model has the better time complexity. This is (partly) because, in the worst case, each SBM of a basic scenario in SHARP has M^C states (i.e., just as in the flat model). In practice, however, the worst case is very unlikely. Specifically, the worst-case analysis assumes that all states in all components participate in all scenarios. In practice, we expect that (a) the number of states participating in a scenario from a particular component, as well as (b) the number of components participating in that scenario, will be substantially smaller than M and C, respectively. In contrast, even in practice, the flat model approach requires generation of the entire system model, which involves all states in all components. Thus, we expect the worst-case analysis of the flat model approach to be reflective of practice.

Table 3.5: Summary of computational costs in practice

  System                     U   C    M    S_B   Flat Model    SHARP
  MIDAS (6 sensors)          5   12   5    5     1.52×10^12    692
  GPS                        5   5    17   21    1.17×10^11    1331
  Client-Server (8 clients)  2   9    2    1     1.34×10^8     737

Another reason for the increased worst-case complexity of SHARP is the assumption that all scenario types participate in all resource contention points, which leads to more costly solutions of the QNs used for contention modeling. Again, in practice, we expect that the number of scenario types contending for the same resource would be substantially smaller than S.
Moreover, many approximation techniques exist in the QN literature which, based on our experience, should work well given the simple structure of our QNs. For instance, Schweitzer's approximation [64] would result in an O(S) worst-case solution.

Table 3.5 summarizes the computational costs, in practice, of solving for system reliability using the flat model and SHARP for the three systems we evaluated. The computational cost savings using SHARP are significant in all three systems. This illustrates that SHARP is able to avoid scalability problems by generating and solving many smaller models, instead of generating and solving one huge model as in the flat model. Comparing the computational costs of the three systems yields some interesting observations. We noticed that it is more expensive to solve the GPS system than the MIDAS system, since the GPS system is modeled as 21 basic scenarios, and we generate and solve an SBM for each scenario. Although systems with more basic scenarios (which are typically more complex) are more expensive to solve in SHARP, the computational costs in practice are still significantly lower than those of the flat model. While the client-server system is a simpler system than MIDAS, it costs more to solve. This is because I is larger in the client-server system (I = 8) than in MIDAS (I = 3), which results in a larger model for the PAR scenario. The computational cost of the client-server system can be reduced using model truncation, as described in Chapter 3.3.3 (Step 3.3).

[Figure 3.14: Computational Cost in Practice — number of addition/multiplication operations vs. number of Gateways, for SHARP and the flat model]

Figure 3.14 illustrates how computational costs increase as the system becomes more complex. Here, we vary the number of Gateways in MIDAS (x-axis), and assume that each Gateway connects to two Sensors. We plot the number of addition/multiplication operations needed to solve the two resulting models on the y-axis.
Otherwise, the system is the same as the example used throughout this chapter. Note that the y-axis of Figure 3.14 is plotted on a logarithmic scale. As can be seen from the figure, the computational cost of SHARP is much lower, and grows significantly more slowly, than that of the flat model. For example, it takes more than 10^12 operations to compute the reliability solution of the MIDAS system with 6 Gateways using the flat model, while it takes only 692 operations to compute the solution using SHARP. Since the SBMs are likely to be smaller than the flat model, we argue that SHARP also requires significantly less space in practice than the flat model. The savings are also due to the fact that we can generate and solve the SBMs one at a time, and thus reuse the space.

As we discussed in Chapter 3.3.3, it is also possible to reduce the computational cost of SHARP via model truncation. We study the tradeoff between further reducing the computational cost and the loss of accuracy in Chapter 3.4.2. Lastly, given that SHARP takes the approach of solving many smaller models rather than one large model, if parallel processing is available, we could solve our models in parallel.

3.4.2 Accuracy

Our goal is to provide evidence that SHARP is sufficiently accurate to be used in making design decisions. The goal of design-time approaches is to analyze the effect of different design decisions on reliability, rather than to obtain absolute reliability measurements.^14 Therefore, we compare the sensitivities of SHARP and the corresponding flat model: if the differences in the change in reliability estimates are reasonably small when the same parameter is varied in both SHARP and the flat model, then SHARP can be considered accurate.

Sensitivity Analysis

First, we compare the sensitivities of SHARP and the flat model when model parameters change. We vary a parameter within a range (to be specified below), and observe how system reliability changes.
Here, we present results corresponding to varying failure-related parameters in the MIDAS and GPS systems. We performed similar experiments by varying other parameters and using several other systems' models. The results were qualitatively similar and are omitted here for brevity.

Footnote 14: For example, at implementation time, it may be appropriate to evaluate a system's reliability using the five 9's standard. However, this is not typically meaningful at design time.

[Figure 3.15: Sensitivity Analysis of the Sensor PAR scenario; (a) Sensor failure, (b) Sensor recovery]

The inaccuracies in our estimates come from the solution of the PAR scenarios, because of the approximations we made (recall Chapter 3.3.3). We generate the SBMs of the basic scenarios using the same technique as existing approaches, and therefore the results are the same. The solution of the SEQ scenarios is exact: the steady-state probability obtained using our stochastic complementation-based approach is the same as one would obtain by solving the model directly (by "flattening out" the model and connecting the child scenarios appropriately) [49].

We compare the sensitivity at the level of a scenario; our results for the Sensor PAR scenario are depicted in Figure 3.15. We varied the failure rates of the two Sensors between 0.1 and 0.5 in Figure 3.15(a), and the recovery rates between 0.2 and 0.8 in Figure 3.15(b). We vary the parameters one at a time, keeping the other parameters fixed at their default values (default values for the MIDAS system are given in Figure 3.2). We varied other parameters in the Sensor PAR scenario as well, and the results are qualitatively similar. As we can see, the reliability estimates of both SHARP and the flat model vary at a similar rate when the parameters change.
For example, in Figure 3.15(a), scenario reliability drops from 0.96 to 0.87 in SHARP, and from 0.95 to 0.84 in the flat model, respectively.

Figure 3.16: Sensitivity analysis at the system level ((a) Sensor failure; (b) Hub failure; (c) EM failure; (d) RG failure; (e) Sensor recovery; (f) Hub recovery; (g) EM recovery; (h) RG recovery; curves: Flat Model, SHARP).

Next, we study how the inaccuracies propagate to the system level. In Figures 3.16(a)-(d), we vary the failure rates of the Sensor and Hub components in MIDAS, and the EM and RG components in GPS, between 0.1 and 0.5. In Figures 3.16(e)-(h), we vary the recovery rates of the Sensor and Hub components in MIDAS, and the EM and RG components in GPS, between 0.2 and 0.8. As in the experiments at the level of a scenario, the other parameters are fixed at their default values. In these experiments, we observe that the results obtained from SHARP closely follow the flat model. This suggests that SHARP is accurate in predicting system reliability, while in practice it should result in much better scalability than the flat model approach. We also illustrate that SHARP is useful in determining which components are more critical to a system's reliability. We have verified this property of SHARP in a number of examples.
For instance, in Figure 3.16, when we vary the failure rates of Sensor (Figure 3.16(a)) and Hub (Figure 3.16(b)) between 0.1 and 0.5, the system reliabilities obtained from SHARP change by 4% and 0.4%, respectively. Since the system's reliability is affected more by changes in the Sensor's failure rate than by the Hub's, under these conditions the Sensor is the more critical component. Note that the differences in the change in reliability estimates between SHARP and the flat model are very small (within a few percent). We have not observed significant deviations from the flat model in any of our studies. We can thus conclude that SHARP is useful in this analysis.

Figure 3.17: Sensitivity analysis of the Client-Server system ((a) Server failure; (b) Server recovery; curves: Flat Model, SHARP, SHARP-contention).

Effect of Contention Modeling

This section aims at illustrating the importance of modeling contention in SHARP. We use a very simple client-server system with a single scenario, for reasons given above. By increasing the number of clients, we can model a highly contended system. For example, our results with one server and 8 clients are depicted in Figure 3.17, where we present the results of using SHARP without contention modeling, SHARP with contention modeling (by considering the server as a FCFS callee), as well as results from the flat model (which includes contention) as a baseline for comparison. The differences between the results obtained from SHARP without contention modeling and the flat model can be as large as 12% (when the failure rate is 0.2), while the results with contention modeling are much more accurate (the error is generally about 2%, and no larger than 5%, when the failure rate is 0.2).
This occurs because, without contention modeling, SHARP includes the time spent waiting to be served as processing time, thus overestimating the processing time. In turn, this lowers the reliability, because processing a request may trigger a defect in the server that waiting for service does not. Results obtained using other systems are qualitatively similar, and are omitted for brevity.

Figure 3.18: Computational cost of SHARP with and without truncation (x-axis: Threshold, logarithmic scale; y-axis: Num ops).

Figure 3.19: Errors caused by model truncation (x-axis: Threshold; y-axis: Error).

Effect of Model Truncation

In evaluating the effect of truncation in Chapter 3.3.3, we first study the computational cost savings. In Figure 3.18, we plot the number of operations needed to solve for the reliability of MIDAS with one GUI, AC, and Hub, and vary the number of Gateways. As in Chapter 3.4.1, we assume that each Gateway connects to two Sensors, and increase the number of Sensors accordingly. The interactions of each Gateway with other components are modeled as an instance of the SensorMeasurement scenario. There are at most 100 instances of the SensorMeasurement scenario, and at most one instance of the GUIRequest scenario. In Figure 3.18, we varied the threshold (x-axis, plotted on a logarithmic scale), and plotted the number of operations needed to solve SHARP with truncation. We fixed the scenario reliability of SensorMeasurement at 0.99, and the completion time at 1. The system is considered to have failed when any instance of SensorMeasurement has failed. The cost without truncation is our baseline, and can be considered as having a threshold of 0. As we can see from Figure 3.18, the computational cost savings can be significant: when the threshold is 10^-2, the numbers of operations needed to solve SHARP without and with truncation are approximately 1 × 10^6 and 6200, respectively.
These results indicate that model truncation reduces the computational cost of generating the scenario combinations.

Next, we study the error in reliability estimates when truncation is used (i.e., we consider only a small range of possible values of the number of active instances). The results are depicted in Figure 3.19. We varied the threshold (x-axis), and plotted the error in reliability estimates as compared to the results without truncation (y-axis). When the threshold is small (i.e., we consider a wider range of values), the error is smaller, with the largest error being 0.8% (when the threshold is 10^-2).

3.5 Conclusions

We presented SHARP, a scalable framework for predicting the reliability of concurrent systems. Our main idea in modeling concurrency is to allow multiple instances of system scenarios to run simultaneously. We overcame the inherent scalability problems by leveraging scenario models and using an (approximate) hierarchical technique which allowed generation and solution of smaller parts of the overall model at a time. Our experimental evaluation showed that SHARP is more scalable than existing approaches in practice, and that its scalability is achieved without significant degradation in the accuracy of system reliability predictions.

Chapter 4
Performance Estimation of Third-Party Components

As discussed earlier, it is expensive to apply testing-based approaches to assess the quality of software systems. To address this problem in the context of performance estimation, we propose a queueing model-based framework that estimates software performance at high workloads by applying regression analysis to performance testing data collected at low workloads.
We focus on applying this framework to estimating the performance of Web services (WSs) because, as discussed in Chapter 1, software designers have to rely on testing-based approaches to evaluate the quality of software in the "Binaries Accessible" category, and a WS is one such example.

An overview of our framework is depicted in Figure 4.1. In Step 1, we collect performance data of the WS being tested using performance testing. In Step 2, we apply regression analysis to estimate response time at points that are not sampled during performance testing, using the data collected in Step 1. Our main contribution is in Step 2, where we propose the incorporation of queueing models in this process, in order to overcome the poor extrapolation results typically obtained using standard regression analysis-based techniques (as detailed below).

Figure 4.1: An overview of our WS performance prediction framework (Step 1: performance testing sends testing traffic to the black-box WS and collects performance data; Step 2: regression analysis produces a regression function used for performance prediction).

4.1 A Framework for WS Performance Prediction

We describe our framework in detail in this section. Specifically, in Step 1, we send requests to the WS being tested, and collect the corresponding average response times. This process is repeated at different workload intensities. Performance testing (Step 1, Chapter 4.1.1) is typically done to ensure that the system of interest conforms to some performance expectations. The challenge in this step is that performance testing is quite expensive, as it involves making a large number of requests to the WS being tested. This is especially the case for testing under heavy loads, where the testing process can greatly affect the normal operation of the WS being tested. The implication then is that we have limited data, particularly for the system under heavy loads, with which to predict the WS performance.
Thus, it is highly desirable to be able to extrapolate, i.e., to use data collected under lighter loads to construct predictive models that are capable of predicting well under heavier loads. In other words, Step 2 (Chapter 4.1.2) involves predicting response time beyond the arrival rates sampled in Step 1. Extrapolation is a challenging problem, and we have confirmed that standard regression approaches perform poorly at this task (refer to Chapter 4.1.2). Therefore, we propose to use queueing models for response time prediction, which, however, may give less accurate interpolation results. This motivates us to derive a hybrid approach that combines the more accurate interpolation results obtained using standard regression approaches with the more accurate extrapolation results obtained using queueing models.

4.1.1 Step 1: Performance Testing

Performance testing has been used in evaluating software performance to ensure that the system performs as expected [51]. The goal of performance testing is to understand the system's properties, such as system throughput and response time, under a controlled workload.

Performance testing may assume an open model, in which clients arrive to the system at a pre-specified arrival rate λ̂_E and leave the system once the request has been served. It may also assume a closed model, in which the number of clients is fixed. In either case, we are interested in observing the response time as we vary the arrival rate in an open model, or the number of clients in a closed model. In the remainder of this chapter, we assume the use of an open model, and generate arrivals according to a Poisson process. (We note that, as a client of a third-party WS, we can control the arrival process.) Specifically, we generate D_j requests at rate λ̂_E^j, and measure the response time of each request k, T̂_{j,k}.
We can then compute the average response time as

    T̂_j = (1/D_j) Σ_k T̂_{j,k}    (4.1)

We repeat the test at different values of λ̂_E^j, and compute the corresponding T̂_j.

A shortcoming of performance testing is the assumption that the system does not change over the duration of the test. This includes the WS being tested, any other third-party WSs involved, as well as network conditions. In real-world applications, this may not be the case. For example, making a large number of requests to a WS may be perceived as an attack. Thus, administrators may block the testing traffic, and, as a result, we would not be able to gather performance data. This again motivates the need to limit performance testing, particularly at higher workloads, and to devise approaches for accurate extrapolation.

4.1.2 Step 2: Regression Analysis

The goal of regression analysis is to model and estimate the input-output relationship between random variables based on observed data, and then use the model for prediction. In our context, we apply regression analysis to model the relationship between the arrival rate and the WS response time, and to predict WS response times at arrival rates that are not sampled during performance testing. The modeling stage is often referred to as "training". We often need to assess the effectiveness of a trained model before we deploy it in real-world environments to make predictions. The assessment stage is often referred to as "testing" (or "evaluation", to avoid confusion with performance testing). The assessment is accomplished by comparing the model's predictions against data with known arrival rates and response times. However, such data should have no overlap with the data used in the training stage, so that the model's assessment is not over-optimistic.
As noted earlier, we differentiate between two types of predictions: interpolation, when the arrival rates are within the range of those collected during performance testing, and extrapolation, when the arrival rates are outside that range. Statistical models for regression analysis can be broadly classified into parametric and nonparametric:

Parametric regression: In parametric regression, we specify a regression function with unknown parameters to capture the relationship between the arrival rate and the response time. One can leverage prior knowledge about the relationships among variables to determine a suitable regression function. An example regression function is an Nth-degree polynomial, i.e.,

    T(λ_E, α) = Σ_{i=0}^{N} α_i (λ_E)^i    (4.2)

where λ_E is the average customer arrival rate to the WS, and α is the unknown parameter vector, representing the coefficients of the polynomial. We estimate it using performance testing data. More specifically, given a regression function and data from performance testing (pairs of values of λ̂_E^j and T̂_j), we would like to find α = (α_1, α_2, ..., α_K) such that the mean squared error between the measured response times and the model's predictions is minimized. This problem can be formulated as the following optimization problem:

    minimize over α:  Σ_j (T̂_j − T(λ̂_E^j, α))^2    (4.3)

where T(λ̂_E^j, α) is the predicted response time when the external arrival rate is λ̂_E^j. This problem can be solved using standard optimization techniques [26]. In addition to fitting the data with a polynomial, we have also considered fitting the data with an exponential function, i.e.,

    T(λ_E, α) = α_1 e^{α_2 λ_E}    (4.4)

where α is estimated using regression. Once we have estimated the unknown parameters, we predict the response time by plugging in the arrival rate of interest, λ_E, and the parameters estimated from regression, α*. In this work, we consider two additional types of regression functions: splines and neural networks.
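To make Eqs. (4.2)-(4.3) concrete, the sketch below fits the polynomial regression function by solving the normal equations. This is an illustrative stand-in for the standard optimization techniques cited in [26], with made-up data, not the dissertation's actual tooling:

```python
def fit_poly(rates, times, degree):
    """Least-squares fit of the polynomial T(lambda_E, alpha) of Eq. (4.2),
    minimizing the squared error of Eq. (4.3) via the normal equations."""
    n = degree + 1
    # Normal equations A * alpha = b for polynomial least squares
    A = [[sum(x ** (r + c) for x in rates) for c in range(n)] for r in range(n)]
    b = [sum(t * x ** r for x, t in zip(rates, times)) for r in range(n)]
    # Gaussian elimination with partial pivoting
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    alpha = [0.0] * n
    for r in range(n - 1, -1, -1):
        alpha[r] = (b[r] - sum(A[r][c] * alpha[c] for c in range(r + 1, n))) / A[r][r]
    return alpha  # coefficients alpha_0 .. alpha_N

def predict(alpha, lam):
    """Evaluate T(lambda_E, alpha) per Eq. (4.2)."""
    return sum(a * lam ** i for i, a in enumerate(alpha))

# Hypothetical data generated from T = 0.5 + 0.1 * lambda^2;
# the quadratic fit recovers these coefficients exactly.
rates = [1.0, 2.0, 3.0, 4.0]
times = [0.5 + 0.1 * x * x for x in rates]
alpha = fit_poly(rates, times, degree=2)
```

The same least-squares setup applies unchanged when the regression function is the exponential of Eq. (4.4) or a queueing formula; only the function being fitted differs.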
Splines are piecewise-smooth polynomials. We used cubic splines in our experiments (a standard choice in many applications [25]). Neural networks (NNs) are another common approach for regression. Architecturally, a neural network is a set of connected linear and nonlinear "neurons". Neural networks can model highly nonlinear functions, given a sufficiently complicated network architecture. In our experiments, we used 3-layer neural networks. The first layer is the input layer, representing the arrival rate. The output layer corresponds to the response time. The hidden layer is a layer of nonlinear processing units which transform the input with tanh functions. The transformed inputs are then linearly combined to form the output [13].

Nonparametric regression: Nonparametric approaches make predictions directly using the observed data, without explicitly specifying a regression function. An example of a nonparametric approach is the Gaussian process (GP) [57]. In our work, the GP encodes similarity among data (i.e., pairs of arrival rates and response times) with kernel functions, and makes predictions by combining (nonlinear) response times from the observed data. Intuitively, training data at an arrival rate λ̂_E closer to λ_E contributes more to the final prediction at λ_E. Our experiments use the so-called "neural network tanh kernel", as it performed the best compared to a few other alternatives.

A Shortcoming of Standard Regression Analysis

To illustrate a shortcoming of applying standard regression analysis to WS performance estimation, we show how well these approaches extrapolate. A more comprehensive validation is presented in Chapter 4.2. In this experiment, we use extrapolation error as our metric. We collected performance testing data by varying the arrival rates until the system was saturated (i.e., until the system started returning errors because of resource saturation). We then divide the data into two sets: the training set and the validation set.
Data in the training set, consisting of the data points in the bottom 60% of the sampled arrival rates, was supplied to the regression algorithm. Then, we compute the extrapolation error by comparing the predicted response times against the data in the validation set, which corresponds to the data points in the upper 40% of the sampled arrival rates.

Figure 4.2: Extrapolation using standard regression methods ((a) Poly; (b) Exp; (c) Spline; (d) NN; (e) GP; x-axis: Arrival Rate, y-axis: Resp Time).

Here, we show the results using the Java Adventure Builder (AB) application [3]. This simple travel agent WS is provided by Sun to demonstrate the development and deployment of a WS. It is an atomic WS (i.e., one that does not make requests to other WSs) that makes requests to a local database server. Our system has 54 customers and 1,022 bookings. The extrapolation results are depicted in Figure 4.2. Data in the training set and the validation set are depicted as circles and squares, respectively. We depict results based on an 8th-degree polynomial in Figure 4.2(a) because, in this experiment, the results using an 8th-degree polynomial were more accurate than those using polynomials of other degrees. We observe, from Figure 4.2, that standard regression techniques are unable to predict response time when the arrival rates are outside of the data used as input to regression analysis. Specifically, all five approaches we studied predict the response time to remain flat when the arrival rate increases beyond the sampled arrival rates, instead of increasing rapidly as the system nears saturation. Indeed, the fact that standard regression approaches may give poor extrapolation results is a well-known problem in the regression literature.
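The 60/40 split by sampled arrival rate can be sketched as follows (an illustrative helper with hypothetical measurements, not the dissertation's harness):

```python
def split_by_rate(samples, train_frac=0.6):
    """Sort (arrival rate, avg response time) pairs by rate; the bottom
    train_frac of the sampled rates form the training set, the rest the
    validation set used to compute extrapolation error."""
    samples = sorted(samples)
    cut = int(round(len(samples) * train_frac))
    return samples[:cut], samples[cut:]

# Hypothetical measurements at five sampled arrival rates
data = [(6.0, 1.1), (8.0, 1.4), (10.0, 1.9), (12.0, 2.8), (14.0, 5.5)]
train, validation = split_by_rate(data)
```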
A Queueing Model-Based Framework

To address the shortcoming that standard regression approaches tend to perform poorly at extrapolation, we propose a queueing network-based framework to estimate the response time of black-box WSs. More specifically, we use queueing models to derive a function that describes the relationship between arrival rate and response time; this function is then used as the regression function in parametric regression for response time prediction. The challenge, however, is that we do not know the structure of the WS being tested. For example, we do not know if it is deployed on a server using a three-tier architecture as in [74], or if it makes use of other WSs. In the absence of structural information, we approach this problem by using a suite of queueing models, and as shown in Chapter 4.2, this provides us with insight into the performance of the WS. For example, we can determine the stability conditions of the WS using the most pessimistic model. In presenting our queueing model-based framework, we first discuss single-queue models, followed by queueing network models. We also give several instantiations of the queueing models we have considered in our evaluation in Chapter 4.2.

Single-Queue Models: A single-queue model is characterized by: (1) the arrival process, which describes the workload characteristics; (2) the service time distribution, which describes the characteristics of the servers; and (3) the number of servers, which describes the degree of concurrency. As a client of a WS, we can control the arrival process by adjusting the performance testing parameters. Parameters related to the service time distribution are estimated using regression, while the number of servers is determined by the system modelers.
Given this information, we can derive the average response time as a function of the arrival rate and other model parameters, and estimate the model parameters by applying standard regression analysis to data collected from performance testing. Since information about the WS being tested is limited, it is, in general, challenging to determine the number of servers and the service time distribution. However, in our validation in Chapter 4.2, we show that even with simple queueing models (as detailed below) one can gain valuable insight into the WS being tested. For instance, we can determine the stability conditions of the WS, which are useful in, e.g., determining how much workload one should send to that WS.

Figure 4.3: An Example Objective Function (x-axis: μ; y-axis: MSE).

M/M/1 Model: As an example, let us consider the M/M/1 model (i.e., with a Poisson arrival process and an exponential service time distribution). The corresponding average response time is then [69]

    T(λ_E, μ) = 1 / (μ − λ_E)    (4.5)

where λ_E and μ are the average customer arrival and service rates, respectively. We apply regression analysis to estimate μ using performance testing data. In applying regression analysis, we need to specify constraints to ensure that the resulting system is stable, i.e., in the case of the M/M/1 model, that μ > λ_E.

Another important consideration is the choice of a starting point for the regression problem. It has been proven that regression algorithms can find the global optimum when the objective function is convex, no matter which starting point we choose [26]. However, as opposed to using a polynomial as a regression function, using queueing models may result in a non-convex objective function. An example of our objective function is depicted in Figure 4.3, in which we apply the M/M/1 model to the AB WS.
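A minimal sketch of fitting the M/M/1 model of Eq. (4.5) under the stability constraint μ > λ_E is shown below. The synthetic data and the coarse grid search (standing in for a real optimizer) are illustrative assumptions; the search region starts just above the highest sampled arrival rate, mirroring the starting-point heuristic discussed in the text:

```python
def mm1_response(lam, mu):
    """M/M/1 average response time, Eq. (4.5); requires mu > lam."""
    return 1.0 / (mu - lam)

def fit_mm1(rates, times, steps=50000):
    """Estimate mu by minimizing the squared error of Eq. (4.3) over a
    grid of candidate values, restricted to the stable region
    mu > max sampled arrival rate."""
    lam_max = max(rates)
    lo, hi = 1.001 * lam_max, 10.0 * lam_max
    best_mu, best_err = lo, float("inf")
    for i in range(steps):
        mu = lo + (hi - lo) * i / (steps - 1)
        err = sum((t - mm1_response(x, mu)) ** 2 for x, t in zip(rates, times))
        if err < best_err:
            best_mu, best_err = mu, err
    return best_mu

# Synthetic data drawn from an M/M/1 queue with true mu = 16;
# the fit recovers mu from measurements at lower arrival rates.
rates = [6.0, 8.0, 10.0, 12.0, 14.0]
times = [mm1_response(x, 16.0) for x in rates]
mu_hat = fit_mm1(rates, times)
```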
In this example, the highest arrival rate we used in performance testing, λ̂_E^max = max_j λ̂_E^j, is 15, and we varied μ (x-axis) between 15.01 and 15.5. The MSE is plotted on the y-axis, and it can be shown that this objective function is not convex. It appears that the objective function is convex when μ < μ*, where μ* is the optimal point that gives the lowest MSE (in this example, when μ ≈ 15.03). Therefore, we set the starting point for μ slightly larger than the highest arrival rate used in the performance testing, λ̂_E^max (we set μ to 1.001 λ̂_E^max in our evaluation). We have experimented with other starting points, and our results have indicated that our choice has yielded good results. We are unsure whether our choice of a starting point is optimal; such analysis is out of the scope of this chapter.

Figure 4.4: Extrapolation using queueing models ((a) AB, with M/M/1 and M/G/1 curves; (b) TPC, with M/M/1, M/M/m, and QN curves; training and validation data shown; x-axis: Arrival Rate, y-axis: Resp Time).

We apply regression analysis to predict the response time of the AB WS using the M/M/1 model, with the results depicted in Figure 4.4(a). Even though the M/M/1 model can predict the rapid increase in response time (beyond a certain load), it does so pessimistically in this case, i.e., the increase occurs much sooner than in the actual system. One reason for this is that the exponential service time distribution assumption is unlikely to hold in a real system. Thus, the M/M/1 model illustrates the basic idea and motivates the use of more complex models, as we do next.

Figure 4.5: TPC WS Architecture (a Client calls the TPC-App WS on Host 1, which invokes the Order WS, Item WS, and Customer WS on Hosts 2-4, each with its own local database).

M/G/1 Model: The M/G/1 model allows a general service time distribution that is characterized by its mean and variance. The corresponding average response time is [69]:

    T(λ_E, μ, σ) = 1/μ + λ_E σ² / (2(1 − λ_E/μ))    (4.6)

where σ² is the second moment of the service time distribution.
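As a sanity check on Eq. (4.6), the sketch below reads σ² as the second moment of the service time; under that reading, exponential service (second moment 2/μ²) collapses the formula to the M/M/1 result of Eq. (4.5). The numeric values are illustrative:

```python
def mg1_response(lam, mu, s2):
    """M/G/1 average response time per Eq. (4.6):
    T = 1/mu + lam * s2 / (2 * (1 - lam/mu)),
    with s2 the second moment of the service time distribution."""
    return 1.0 / mu + lam * s2 / (2.0 * (1.0 - lam / mu))

# Exponential service with rate mu has second moment 2/mu^2,
# so Eq. (4.6) should reproduce the M/M/1 value 1/(mu - lam).
lam, mu = 10.0, 16.0
t_mg1 = mg1_response(lam, mu, 2.0 / mu ** 2)
t_mm1 = 1.0 / (mu - lam)
```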
We apply regression analysis to estimate both μ and σ using performance testing data. The results of predicting the response time of the AB WS using the M/G/1 model are depicted in Figure 4.4(a). The M/G/1 model is more accurate than the M/M/1 model, due to its more general model of the service time distribution.

M/M/m Model: The M/M/m model relaxes the single-server assumption of the M/M/1 model, i.e., we have a single queue with m servers. The corresponding average response time is [69]:

    T(λ_E, μ, m) = P_Q ρ / (λ_E (1 − ρ)) + 1/μ    (4.7)

where ρ = λ_E / (mμ) and P_Q, the probability of queueing, is given by [69]:

    P_Q = ((mρ)^m / m!) · p_0 / (1 − ρ)    (4.8)

    p_0 = [ Σ_{k=0}^{m−1} (mρ)^k / k!  +  ((mρ)^m / m!) · 1/(1 − ρ) ]^{−1}    (4.9)

To illustrate the use of multi-server queueing models, we present results using the TPC-App benchmark [6] we deployed, which we refer to as the TPC WS in the remainder of the chapter. This benchmark emulates a bookstore WS environment, in which customers can create an account, search for books, place an order, and retrieve the status of an order. Our deployment of TPC is depicted in Figure 4.5. The WS makes use of several internal WSs: an Order WS, an Item WS, and a Customer WS. Each of the three internal WSs runs on a separate physical machine and queries a local database. Our system has 100 customers, 500 books, and 30,000 order records.

The extrapolation results[1] using the TPC WS are depicted in Figure 4.4(b). While the results based on the M/M/m model are more accurate than those based on the M/M/1 model, they are still pessimistic. One reason is that the TPC WS was deployed on four machines, and each machine has its own queue. Therefore, a single-queue, multi-server model, such as the M/M/m model, may not be as accurate as a model with multiple queues, which motivates consideration of queueing network models.

Queueing Network Models: To simplify our discussion, we assume an open[2] QN of M/M/1 queues. We also assume there is only one class of customers: the arrival and service processes for all customers are the same. In such a QN, a queue may, e.g., represent an internal server (such as a Web server or a database server), or another WS. With these assumptions, our QN is a product-form network [11].[3]

[1] Here m = 2; different values of m gave less accurate results.
[2] A closed model can be used without significant changes to our approach.
[3] In general, more complex QNs can be used and still remain product-form [11]; we would then update Eqs. (4.12)-(4.15) to reflect that.

The first piece of information needed, in addition to that of the single-queue models, is the number of queues, which is estimated by the system modelers. This corresponds to the number of physical servers (e.g., database and application servers) that serve a client's request. One approach is to try different numbers of queues, and determine which gives the most accurate results. We use a two-queue QN to model the TPC WS in Chapter 4.2, as it generates the most accurate results among QNs with different numbers of queues. For each queue, in addition to the parameters specified in single-queue models, we need to determine its visit ratio, using regression (see below).

We now define a QN model more formally. Let K be the number of queues, p_{i,j} be the probability of going to queue j upon leaving queue i, p_{E,i} be the probability that an external arrival goes to queue i, and p_{i,E} be the probability that a customer leaves the system upon leaving queue i, where Σ_j p_{i,j} + p_{i,E} = 1. Note that in a WS, a customer always arrives at the WS being tested (e.g., a customer cannot send requests directly to an internal database server). If we assume Queue 1 is the WS being tested, then p_{E,1} = 1 and p_{E,i} = 0 for all i ≠ 1.
We also assume there is only one class of customers: the arrival and service processes for all customers are the same. In such a QN, a queue may, e.g., represent an internal server (such as a Web server or a database server), or another WS. With these assumptions, our QN is a product-form network [11]. 3 The first piece of information needed in addition to single-queue models is the num- ber of queues, which is estimated by the system modelers. This corresponds to the number of physical servers (e.g., database and application servers) that serve a client’s request. One approach is to try different number of queues, and determine which gives 1 Herem = 2; different values ofm gave less accurate results. 2 A closed model can be used without significant changes to our approach. 3 In general, more complex QNs can be used and still remain product-form [11]; We would then update Eqs. (4.12) – (4.15) to reflect that. 83 the most accurate results. We use a two-queue QN to model the TPC WS in Chapter 4.2, as it generates the most accurate results among QNs with different number of queues. For each queue, in addition to the parameters specified in single-queue models, we need to determine its visit ratio, using regression (see below). We now define a QN model more formally. LetK be the number of queues,p i,j be the probability of going to queuej upon leaving queuei;p E,i be the probability that an external arrival goes to queue i, and p i,E be the probability that a customer leaves the system upon leaving queuei, where P j p i,j +p i,E = 1. Note that in a WS, a customer always arrives at the WS being tested (e.g., a customer cannot send requests directly to an internal database server). If we assume Queue 1 is the WS being tested, then p E,1 = 1, andp E,i = 0 for alli6= 1. 
The visit ratio of queuei is given by [69] v i =p E,i + X j v j p j,i (4.10) where the total arrival rate at queuei is λ i =λ E v i (4.11) Given λ i for each M/M/1 queue i, the average number of customers in queue i, N i , is [69]: N i = λ i μ i −λ i = λ E v i μ i −λ E v i (4.12) Since the QN is product-form, the joint probability of havingn i customers in queue i, 1≤i≤K, is P(n 1 ,n 2 ,...,n K ) = Y i P(n i ) (4.13) 84 whereP(n i ) is the probability that there aren i customers in queuei. Here, the average number of customers in the system,N, is N = X i N i = X i λ E v i μ i −λ E v i (4.14) Thus, using Little’s result [69], the average response time is T = N λ E = 1 λ E X i λ E v i μ i −λ E v i = X i v i μ i −λ E v i (4.15) We can simplify our process as follows: instead of estimating the entire routing matrix (i.e., thep i,j ’s) and compute the visit ratios, we choose to estimate the visit ratio v i ’s directly. Furthermore, if we multiply Eq. (4.15) by (1/v i )/(1/v i ), we obtain: T = X i v i μ i −λ E v i × 1/v i 1/v i (4.16) = X i 1 μ i /v i −λ E = X i 1 α i −λ E (4.17) whereα i =μ i /v i . Rewriting Eq. (4.15) as Eq. (4.17) allows us to simplify the response time estimation process by using regression analysis to estimateμ i /v i directly, instead of their individual values. We apply this QN model to the data collected from the TPC WS, with the results depicted in Figure 4.4(b). We observe that the QN model is more accurate than the M/M/1 and M/M/m models, because of its more accurate description of the TPC WS’s structure. This QN model, however, is too optimistic when the arrival rate is high. This suggests that we should use a suite of queueing models to understand the behavior of a WS, rather than a single model. 
A Shortcoming of Queueing Models

While the extrapolation results using queueing models are better than those of standard approaches, their interpolation results are not as good. This can be explained as follows. System response time increases rapidly when the system is close to saturation, and hence the slope of the response time function is very steep when λ_E is high. This property causes the regression algorithm to try to fit the data at high workload intensity, because a slight error in the estimated parameters results in very large errors at these data points. Given that queueing models usually have few parameters to fit (e.g., the M/M/1 model has only one parameter), the regression algorithm cannot adjust the parameters to also fit the data at low workload intensity, and hence the response time estimates at low workload intensity are not as good when using queueing models. On the other hand, standard approaches are usually more flexible in fitting data at both low and high workload intensities, and hence are able to produce more accurate interpolation results.

As an illustrative example, consider the TPC WS we used earlier. We provided every other data point collected during performance testing as training data, and the remaining data points were used to compute interpolation errors. We show results using the QN model and NN, because these are the most accurate among queueing models and standard regression approaches, respectively. Note that we present these results here as a motivation for the hybrid approach in Chapter 4.1.2; we will present a more comprehensive validation with the other aforementioned models and WSs in Chapter 4.2. The results are depicted in Table 4.1.

Table 4.1: Comparisons of TPC WS interpolation results

    λ    | measured | QN      | error   | NN      | error
    0.7  | 1.13700  | 1.03342 | 0.10358 | 1.14992 | 0.01292
    1.1  | 1.35670  | 1.31543 | 0.04127 | 1.33618 | 0.02052
    1.5  | 1.74110  | 1.82561 | 0.08451 | 1.78227 | 0.04117
    1.9  | 2.94880  | 3.09797 | 0.14917 | 2.90909 | 0.03971
[Figure 4.6: An overview of the hybrid approach — performance data feeds Step 2a (fitting data to a queueing model); the fitted model parameters drive Step 2b (generating new data points); Step 2c applies a standard regression approach to the combined data to produce the performance prediction.]

We can see that the interpolation errors of QN are higher than those of NN, e.g., when λ_E = 0.7, the error of NN is 0.01292 (or 1.136%), while the error of QN is 0.10358 (or 9.11%). These results have motivated us to derive a hybrid approach that takes advantage of the low interpolation errors of standard regression approaches at low workload intensity, and the more accurate extrapolation results of the queueing models at high workload intensity.

A Hybrid Approach

How do we take advantage of the better interpolation accuracy of standard regression approaches at low workload intensity, and the better extrapolation results of the queueing models at high workload intensity? Figure 4.6 illustrates our proposed hybrid approach. Recall that λ̂_E^max is the highest arrival rate sampled during performance testing. The main idea is to first fit queueing models with performance testing data at the sampled arrival rates (λ_E ≤ λ̂_E^max, Step 2a), and then generate new performance data points at higher arrival rates (λ_E > λ̂_E^max) using the fitted queueing model (Step 2b). In the final Step 2c, we augment the real performance testing data with the QN-predicted performance data. We then apply standard regression approaches to the augmented data to build a new prediction model which fuses knowledge from the queueing model.

[Figure 4.7: Results using QN³ — response time vs. arrival rate, showing the training data, the newly generated training data, and the validation data.]
We hypothesize that the resulting model has low interpolation errors at low workload intensity, as compared to using queueing models alone, while being able to extrapolate response time at high workload intensity, as compared to using standard regression approaches alone. The following example supports this hypothesis; more detailed validation results are given in the next section.

As an illustrative example, consider applying this hybrid approach to the TPC WS. Since the interpolation results using NN are the most accurate among standard approaches, and the extrapolation results using QN are the most accurate among queueing models (Figure 4.4(b)), we use QN in Steps 2a and 2b, and NN in Step 2c, in the results presented here and in Chapter 4.2. We refer to this approach as QN³.

Step 2a: We fit the data collected during performance testing at the sampled arrival rates, depicted as circles in Figure 4.7, using a QN with 2 queues (introduced in Chapter 4.1.2). In this example, the parameters of the QN, obtained using regression analysis by supplying Eq. (4.17) as the regression function, are α_1 = 2.5908 and α_2 = 2.5912.

Step 2b: The next step is to generate new data points using this QN model by plugging λ_E > λ̂_E^max, α_1, and α_2 into Eq. (4.17). In our example, the new data points are depicted as triangles in Figure 4.7.

Step 2c: Finally, we take the data from Steps 2a and 2b as inputs to a standard regression approach (in our example NN), with results depicted in Table 4.2 and Figure 4.7.

Table 4.2: Errors in response time estimates using QN³

λ_E    measured    QN³        QN         NN
0.7    1.13700     0.01381    0.07936    0.00310
0.9    1.22280     0.00072    0.04008    0.00536
1.1    1.35670     0.01968    0.01533    0
1.3    1.55030     0.03610    0.00113    0
1.5    1.74110     0.04974    0.09206    0
1.7    2.20000     0.02348    0.04464    0.32503
1.9    2.94880     0.04137    0.05449    1.00066
2.1    5.05620     0.98818    0.98297    3.07344
2.3    21.17940    14.29826   14.30680   19.18136
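A compact end-to-end sketch of Steps 2a–2c follows. It is illustrative only: the measured values are the low-load rows of Table 4.2, the α values are those fitted in Step 2a above, and a simple linear spline stands in for the neural network used in Step 2c.

```python
# Hybrid-approach sketch (Steps 2a-2c). The two-queue QN of Eq. (4.17), with the
# fitted values alpha_1 = 2.5908 and alpha_2 = 2.5912, plays the role of Step 2a's
# output; a linear spline stands in for the neural network of Step 2c.
ALPHAS = (2.5908, 2.5912)

def qn_T(lam):
    """Eq. (4.17): T = sum_i 1 / (alpha_i - lambda_E)."""
    return sum(1.0 / (a - lam) for a in ALPHAS)

# Real measurements up to lambda_max_hat (the low-load rows of Table 4.2).
data = [(0.7, 1.137), (0.9, 1.2228), (1.1, 1.3567), (1.3, 1.5503), (1.5, 1.7411)]

# Step 2b: augment with QN-generated points beyond the sampled range.
data += [(lam, qn_T(lam)) for lam in (1.7, 1.9, 2.1, 2.3)]

# Step 2c: fit a standard regression model (here: a linear spline) to it all.
def fit_linear_spline(points):
    pts = sorted(points)
    def predict(x):
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= x <= x1:
                return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
        raise ValueError("outside fitted range")
    return predict

model = fit_linear_spline(data)
```

The resulting model interpolates the real measurements at low load, while inheriting the QN's rapid growth near saturation from the generated points.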
The results in Table 4.2 indicate that the interpolation errors of QN³ are comparable to using NN alone and are lower than using QN alone. At the same time, the extrapolation errors of QN³ are very close to those of QN, and are lower than those of NN alone (which produces poor extrapolation results). These results illustrate that QN³ is more accurate than using either QN or NN alone. A more comprehensive validation is presented next.

4.2 Validation

We perform an extensive evaluation and comparison of the approaches described in Chapter 4.1, i.e., standard regression techniques, queueing models (QN, M/M/1, M/M/m, and M/G/1), and QN³. Concretely, we analyzed 4 WSs with different configurations: we predict response times using the above approaches and report their errors. The 4 WSs we analyzed are the AB WS and the TPC WS that we deployed in a controlled environment (both described earlier), and the Weather WS [7] and the Geocoding WS [1], which are “live”. Analyses of other “live” WSs and “fictitious” WSs yielded similar conclusions and are thus omitted for brevity.

We report RMSE – root mean squared error – a commonly used evaluation metric in regression analysis. The errors are defined as the differences between the predicted values and the measurements (ground truth). For each WS, we sent 10000 requests at a fixed arrival rate according to a Poisson process, and computed the average response time. This process was repeated at 9-11 different arrival rate values. The data was then split into two sets (with details given below): data in the training set was supplied as input to each approach, and we computed the approach's RMSE using its predictions and the data in the validation set.

In what follows, we first report results on interpolation. In this setting, the parameters of our models are estimated on training data (i.e., different arrival rates) whose value ranges are the same as those of the validation data.
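The RMSE metric just described is straightforward to compute; a minimal helper (illustrative code, not the dissertation's):

```python
import math

def rmse(predictions, measurements):
    """Root mean squared error between predicted and measured (ground-truth) values."""
    errors = [p - m for p, m in zip(predictions, measurements)]
    return math.sqrt(sum(e * e for e in errors) / len(errors))

# e.g., predictions at three validation arrival rates vs. measured response times
example = rmse([1.14, 1.34, 1.78], [1.13700, 1.35670, 1.74110])
```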
Then, we report results on extrapolation, where the ranges of training data and validation data are disjoint. Our evaluation results show that, while other techniques perform well on either interpolation or extrapolation, QN³ performs the best in both cases.

4.2.1 Interpolation Errors

In this set of experiments, we choose an odd number of data points. An example is the data in the first two columns in Table 4.2. We sort them according to the corresponding arrival rates and then select the data points, alternating between the training and the validation data sets. Note that since the first and the last data points are always selected for training data, we are guaranteed that the arrival rates in the validation data are always within the range of the rates in the training data.

Table 4.3: TPC WS interpolation errors

λ      measured   QN        M/M/1     M/M/m     M/G/1     Poly      Exp       Splines   NN        GP        QN³
0.7    1.13700    0.10358   0.52988   0.20194   0.45465   0.10844   1.13034   0.16458   0.01292   0.08168   0.01292
1.1    1.35670    0.04127   0.55485   0.26512   0.39676   0.10039   1.30735   0.16070   0.02052   0.04575   0.02052
1.5    1.74110    0.08451   0.56063   0.30143   0.24797   0.26089   1.37532   0.55890   0.04117   0.08720   0.04117
1.9    2.94880    0.14917   0.71227   0.47936   0.01250   0.70997   0.23740   1.77780   0.03971   0.65222   0.03971

Table 4.4: Average interpolation errors

            QN       M/M/1    M/M/m    M/G/1    Poly     Exp      Spline   NN       GP       QN³
TPC         0.0946   0.5864   0.3120   0.2780   0.2949   1.1026   0.6655   0.0286   0.2167   0.0286
AB          1.7508   2.3515   2.3141   1.0578   0.4948   1.9163   0.5784   0.2451   1.2404   0.2451
Geocoding   0.1847   0.2154   0.2228   0.2417   0.0876   0.1168   0.0787   0.0513   0.0659   0.0513
Weather     0.0430   0.2340   0.0939   0.1308   0.0846   0.3587   0.1107   0.0878   0.3199   0.0878
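The alternating training/validation split described at the start of this subsection can be sketched as follows (an illustrative helper, not the dissertation's code):

```python
def alternating_split(points):
    """Sort (arrival_rate, response_time) points and alternate them between
    training and validation. With an odd number of points, the first and last
    always land in training, so validation rates stay inside the training range."""
    pts = sorted(points)
    training = pts[0::2]       # indices 0, 2, 4, ...
    validation = pts[1::2]     # indices 1, 3, 5, ...
    return training, validation

# Hypothetical five-point data set (rates deliberately out of order)
data = [(1.9, 2.9488), (0.7, 1.137), (1.5, 1.7411), (1.1, 1.3567), (0.9, 1.2228)]
train, val = alternating_split(data)
```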
[Figure 4.8: Interpolation results for each approach — (i) QN, (ii) M/M/1, (iii) M/M/m, (iv) M/G/1, (v) Polynomial, (vi) Exp, (vii) Splines, (viii) NN, (ix) GP, (x) Hybrid QN — plotted as response time vs. arrival rate for (a) the TPC WS, (b) the AB WS, (c) the Geocoding WS, and (d) the Weather WS.]
In Figure 4.8, we illustrate the fitted regression curves (drawn in blue) along with the training data (squares) and the validation data (circles). In Table 4.3, we report the errors of the TPC WS at different arrival rates, and in Table 4.4, we report the average interpolation errors across all arrival rates for each of the 4 WSs; best-performing techniques are shown in bold. Detailed results for the other 3 WSs have similar patterns to those reported in Table 4.3 and are thus omitted.

From Table 4.3, we observe that the M/M/1 and M/M/m models give higher interpolation errors than the QN model for the TPC WS. This illustrates that the QN model is a better description of the TPC WS than the M/M/1 and M/M/m models: the TPC WS was deployed on four physical servers, and hence the QN model, which assumes a multi-queue system, describes the TPC WS more accurately than the single-queue models.

From Table 4.4, we observe that while the QN model had lower interpolation errors for the TPC, Geocoding, and Weather WSs, it had higher interpolation errors than the M/G/1 model when applied to the AB WS. This is because (1) the AB WS was deployed on a single machine, which the M/G/1 model accurately describes as a single-queue system; and (2) the QN model assumes exponential service times, which is unlikely to hold in our performance testing. The M/G/1 model, on the other hand, is able to capture the service time distribution more accurately, as it assumes a general service time distribution. This illustrates that the M/G/1 model is more accurate if the WS is a single-server WS.
Since we do not know whether a WS being tested is a single-server or multi-server system, these results indicate that we should use a combination of queueing models, because no single queueing model outperformed the others.

Now let us study the accuracy of using polynomials. We experimented with polynomials of different degrees to fit the results of the 4 WSs, and present the results with the lowest interpolation errors in Figure 4.8: an 8th-degree polynomial for the TPC WS, a 12th-degree polynomial for the AB WS, a 4th-degree polynomial for the Geocoding WS, and a 3rd-degree polynomial for the Weather WS. From Table 4.4, the interpolation errors of polynomials are similar to those of the queueing models, and polynomials outperform all four queueing models for the Geocoding WS. We conclude that the use of polynomials gives similar interpolation results to the queueing models.

While the exponential model gives good predictions when the arrival rate is high, its predictions are lower than the measured response time when the arrival rate is low, which results in high errors. This is most visible for the TPC WS (Figure 4.8(a)(vi)), where the model underestimates the response time when λ_E < 1.5. This is because the exponential function increases at a different rate than the measured data. In fitting the data, the regression algorithm "sacrifices" the accuracy of the response time at low workload intensity: had it fit the data at low workload intensity instead, the exponential model's growth rate would be too low, and the resulting large errors at high workload intensity would outweigh the small errors saved at low workload intensity. Therefore, the regression algorithm chooses to sacrifice accuracy at low workload intensity.
For this reason, we conclude that the exponential function is not a good function for modeling WS response time, as it underestimates the response time when the WS is lightly loaded.

In our experiments, splines exhibited overfitting, which is characterized by decreases in response time even when the arrival rate increases (e.g., in Figure 4.8(b)(vii)). This undesirable property makes splines a poor choice for interpolation.

In general, from Table 4.4, NN and GP had lower interpolation errors than the queueing models, and NN had lower errors than GP. For example, for the Geocoding WS, the interpolation errors of NN and GP (0.0513 and 0.0659, respectively) were lower than that of the most accurate queueing model (QN, whose error is 0.1847). However, we observed that the interpolation errors of GP were noticeably higher than those of the queueing models for the TPC WS. This is because GP used a straight line to connect data points at high workload intensities, causing high interpolation errors when λ_E = 2.1 in Figure 4.8(a)(ix). Despite the possibility of overestimation at high workload intensities, these results indicate that NN and GP are better approaches than queueing models for interpolation. We consider accuracy in interpolation an advantage of standard regression approaches over queueing models.

Note that the results of QN³ were the same as those of NN in this experiment. This can be explained as follows: since we supplied data at high workload intensities (i.e., λ_E ≈ λ̂_E^max), little or no new data was generated in Step 2b. Hence, the data supplied to NN in Step 2c of QN³ was the same as the data supplied to NN when used alone.

4.2.2 Extrapolation Errors

The next experiment studies how well the models predict response times beyond the range of arrival rates used in performance testing.
As in the results presented in Chapter 4.1.2, the training set consists of data points corresponding to arrival rates in the lower 60%, and the evaluation set consists of data points corresponding to arrival rates in the upper 40%.

The results are depicted in Figure 4.9 and Table 4.5. As in the results in Chapter 4.1.2, the standard regression approaches predicted increases in response time at much slower rates in many cases. For example, the standard regression approaches predicted the response time staying flat, except for the Geocoding WS, for which polynomial and spline correctly predicted the response time increasing (Figures 4.9(c)(v) and 4.9(c)(vii)); this is because the response time had already started to increase rapidly when λ_E = 2.3. Polynomial and spline even predicted the response time to go down for the TPC, AB, and Weather WSs. This provides evidence that standard regression approaches are not effective at extrapolation, i.e., at handling inputs outside the range of their training data. This is a major shortcoming because, as discussed in Chapter 4.1.1, it is often infeasible to do performance testing at high workload intensities.

Table 4.5: Extrapolation errors

WS          λ̂          measured   QN         M/M/m      QN³
TPC         1.70000    2.20000    0.04464    0.39427    0.02348
            1.90000    2.94880    0.05449    1.67352    0.04137
            2.10000    5.05620    0.98297    30.12275   0.98818
            2.30000    21.17940   14.30680   –          14.29826
AB          13.50000   1.85298    2.66497    17.90088   3.87008
            14.00000   4.46033    –          –          58.95611
            14.50000   10.85322   –          –          676.94406
            15.00000   26.64767   –          –          2495.16183
Geocoding   2.85710    0.97300    0.14335    0.34484    0.08964
            3.07690    1.17120    0.15661    0.48525    0.16062
            3.63640    2.74360    0.39774    1.81043    0.41562
            4.44440    6.72810    –          4.22958    277.91401
Weather     3.07690    0.60660    0.13125    0.10086    0.06777
            3.63640    0.96270    0.03293    0.01556    0.14722
            4.44440    1.68640    0.19728    1.23738    0.22925
            5.00000    4.09690    1.55715    –          1.35149
However, in order for these approaches to accurately predict response time at high workload intensities, they require training data at high workload intensities, which can overload the system being tested.

The queueing models performed better than the standard regression approaches, as they predicted the rapid increase in response time when arrival rates were high. We observed that while the M/M/1 and M/G/1 models correctly predicted the rapid increase in response time, they were more pessimistic than the QN model. This is because they assume a single server, whereas WSs are typically not single-server systems. Thus, the two single-server models overestimated the utilization of the system, and hence gave pessimistic results.
[Figure 4.9: Extrapolation results for each approach — (i) QN, (ii) M/M/1, (iii) M/M/m, (iv) M/G/1, (v) Polynomial, (vi) Exp, (vii) Splines, (viii) NN, (ix) GP, (x) Hybrid QN — plotted as response time vs. arrival rate for (a) the TPC WS, (b) the AB WS, (c) the Geocoding WS, and (d) the Weather WS.]

In addition, the M/G/1 model was unable to predict response time for the TPC WS at high workload intensity (Figure 4.9(a)(iv)). Upon closer examination of the estimated parameters in this particular example, the regression algorithm estimated the service rate to be very high (μ ≈ 4000), which is much larger than in the other queueing models (e.g., μ = 2.31 in the M/M/1 model). This indicates that the flexibility of the M/G/1 model's service time distribution may cause poor extrapolation results, and therefore the M/G/1 model should be used along with other queueing models in extrapolation.

Qualitatively, the results of the M/M/m model were comparable to those of the QN model: the results were similar for the AB WS, while the M/M/m model was more optimistic for the Geocoding WS, and more pessimistic for the TPC and Weather WSs. To compare the two models more closely, we tabulate the extrapolation errors in Table 4.5.
An “–” in the table indicates that the model predicts the system to be unstable at that arrival rate. As we can see from the table, the QN model had lower extrapolation errors than the M/M/m model for all WSs except the Geocoding WS, for which the QN model was more pessimistic and considered the system unstable when λ_E = 4.44. This indicates that the QN model is a better model than the M/M/1, M/G/1, and M/M/m models.

The extrapolation results of QN³ were comparable to those of QN, as we used QN for extrapolation. These results indicate that QN³ has lower extrapolation errors than NN (which is unable to extrapolate), and that the extrapolation results of QN³ are comparable to those of using QN alone.

Summary: Combining our results in Tables 4.4 and 4.5, we observe that QN³ performs well at both interpolation and extrapolation tasks, and is better than using standard regression approaches or queueing models alone.

4.3 Conclusion

It has become more common to integrate third-party software components in the creation of new systems; hence it is important to understand the performance characteristics of third-party components. To reduce the cost of performance testing, we estimate the performance of third-party components under high workloads using data collected at low workloads, and apply our approach to performance estimation of WSs. Our hybrid approach combines the low interpolation errors of standard regression analysis with the low extrapolation errors of queueing models for response time prediction. Our validation results indicate that the hybrid technique is accurate, as compared to using standard regression approaches or queueing models alone. Thus, we believe that our technique can be used to improve systems that involve third-party components. For instance, our approach can be utilized by service selection techniques [16, 48].
In this context, a WS can be composed dynamically, where performance characteristics can be part of the selection criteria. Our approach can support such techniques by providing performance estimation information for a given WS, so that such approaches can make more informed decisions.

Chapter 5

Parameter Estimation in Quality Analysis of Software Components

As we discussed in Chapters 1 and 2, a major obstacle in design-time software quality analysis techniques is that it is difficult to reliably determine a software system's operational profile, because the implementation is not available. Existing approaches simply assume that the operational profile, which describes the system's or component's usage, is available, and have not adequately addressed this problem.

In this chapter, we focus on operational profile estimation in component reliability prediction. Estimating the operational profile for performance prediction requires integrating information about the performance of underlying firmware (e.g., operating systems, middleware, and hardware) into the analysis, which is part of our future work (see Chapter 6.2.1).

In Chapter 5.1, we describe a component-level reliability prediction framework [19], and highlight the parameters it requires. In Chapter 5.2, we describe sources of information that are available during the design stage, and describe how they can be used in generating operational profiles. Finally, in Chapter 5.3, we compare our results with results obtained from an implementation, which are used as "ground truth". While we focus on operational profile estimation at the component level, we believe that what we propose in this chapter also applies at the system level.
[Figure 5.1: Dynamic Behavior Model of the Controller Component — a statechart with states Idle, Estimating Sensor Data, Turning Left, Turning Right, Going Straight, and Updating Database, and event/action transitions: Start / Estimate Sensor Data; Turn [Deg = 0] / Walk straight; Turn [0 < Deg < 180] / Turn Right; Turn [-180 < Deg < 0] / Turn Left; Finished turning / Walk straight; Robot stopped / Update Database; Finished updating database / Estimate Sensor Data; Goal satisfied / Idle.]

5.1 Component Reliability Prediction Framework

Before we describe our operational profile modeling process, we first describe a component that we use as a running example throughout this chapter: the Controller component of the SCRover (depicted in Figure 5.1), a third-party robotic testbed based on NASA JPL's Mission Data System framework [15]. This testbed contains requirements and architectural documentation as well as a simulated robotic platform. SCRover is the implemented prototype of a robot that is capable of performing different missions, such as wall-following, turning at a given angle, moving a fixed distance in a given direction, and identifying and avoiding obstacles. Here, we focus on the behavior of the robot in a wall-following mission: it should maintain a certain distance from the wall; if it moves too far from or too close to the wall, or encounters an obstacle, it has to turn in an appropriate direction to correct this. As soon as the state of the robot changes, it has to update a database with its new state.

We now describe the component reliability prediction framework of [19], to which we apply our operational profile estimation approach. For ease of exposition, we present this framework as a three-phase process, depicted in Figure 5.2.
[Figure 5.2: Software Component Reliability Prediction Framework — architectural models are input to Phase 1 (determining states) and Phase 2 (determining transitions), producing the states of the reliability model and then the reliability model itself; Phase 3 computes component reliability. The granularity of the architectural models and the available information sources have an impact on these phases.]

In Phase 1, we determine an appropriate set of states S of the component's reliability model by leveraging architectural models. In Phase 2, we determine the values of the transition matrix P of the reliability model by leveraging information available at the architectural level. Finally, in Phase 3, we compute component reliability by applying standard techniques [70]. We briefly describe each step in the remainder of this section. We focus on Phase 2 in this chapter: it involves estimating an operational profile of a component, which is represented by the transition probabilities in the reliability model. Details of Phase 2 are given in Chapter 5.2.

Phase 1: Determining States

In Phase 1 of the component reliability prediction framework, we determine the set S by leveraging architectural models and performing standard architectural analyses [47]. There are two types of states in the set S that need to be determined: states corresponding to the component's normal behavior, B, and to its faulty behavior, F.

We leverage a component's dynamic behavior model [61] in order to determine the behavioral states (set B) of our model. A dynamic behavior model of a software component is often depicted by a state transition diagram that shows the internal states of the component, the transitions between them, and the event/action pairs that govern these transitions (e.g., as in UML's statechart diagrams).
The dynamic behavior model of the Controller component is illustrated in Figure 5.1 and consists of six states: Idle (B_1), Estimating Sensor Data (B_2), Turning Left (B_3), Turning Right (B_4), Going Straight (B_5), and Updating Database (B_6). We map the states of the dynamic behavior model directly to the behavioral states of the Markov chain reliability model (Figure 5.3).

[Figure 5.3: Reliability Model of the Controller Component — behavioral states B_1 through B_6 (Idle, Estimating Sensor Data, Turning Left, Turning Right, Going Straight, Updating Database) and failure states F_1 and F_2, connected by behavioral, failure, and recovery transitions.]

To determine the failure states (set F), we analyze the architectural models of a component. The multi-view approach to modeling a component described in [61] allows for the detection of architectural inconsistencies; standard techniques for architectural analysis [47] can be adopted to this end. The results of architectural analyses can be leveraged to represent defects, which contribute to the unreliability of the component. Once we have identified the defects, we designate a failure state for each class of defect. For example, we identified two defects in the Controller component in Figure 5.1: Defect d_1 affects the Estimating Sensor Data state, and Defect d_2 affects the Turning Left state. We model the two defects as different classes, and designate two failure states F = {F_1, F_2} to correspond to Defects d_1 and d_2, respectively.

Phase 2: Determining Transitions

Values of the transition matrix P are determined in this phase using various sources of available information. Given the states, determining the transition probabilities between these states remains a challenge. A critical difficulty here is the lack of information about the operational profile and failure information of the component.
We address this problem by (a) identifying and classifying the utility of information sources available during architectural design, and (b) combining the use of such sources with a hidden Markov model (HMM)-based approach that was proposed in [60]. The description of information sources typically available at the architecture level and the details of determining transition probabilities are given in Chapter 5.2.

Phase 3: Computing Reliability

Once the states and the transition probabilities of the Markov chain reliability model are determined, in Phase 3 of the component reliability prediction framework, the model is solved to compute a reliability prediction. Let \pi_i(t) be the probability that a component is in state i at time t, where i \in B \cup F. As t goes to infinity (i.e., as the component operates for a long time), these probabilities converge to a stationary distribution [69],

    \vec{\pi} = [\pi(F_1), ..., \pi(F_M), \pi(B_1), ..., \pi(B_N)]    (5.1)

which is uniquely determined by the following equations¹ [69]:

    \sum_{i \in S} \pi(i) = 1,    \vec{\pi} = \vec{\pi} P    (5.2)

This system of linear equations can be solved using standard numerical techniques [70]. The component's reliability can then be defined as the probability of not being in a failure state:

¹ It is not difficult to show that for our reliability model this limiting distribution exists and is a stationary one [69].
    R = 1 - \sum_{i=1}^{M} \pi(F_i)    (5.3)

As an illustrative example, let the transition matrix P, estimated using the approach to be described in Chapter 5.2, be as follows (rows and columns ordered F_1, F_2, S_1, ..., S_6):

          F_1     F_2     S_1     S_2     S_3     S_4     S_5     S_6
    F_1   0.8     0       0.2     0       0       0       0       0
    F_2   0       0.2     0.8     0       0       0       0       0
    S_1   0       0       0       1       0       0       0       0
    S_2   0.05    0       0.019   0       0.076   0.0095  0.8455  0
    S_3   0       0.04    0       0       0       0       0.96    0
    S_4   0       0       0       0       0       0       1       0
    S_5   0       0       0       0       0       0       0       1
    S_6   0       0       0       1       0       0       0       0
    (5.4)

After solving Equation (5.2), we have

    \vec{\pi} = [\pi(F_1), \pi(F_2), \pi(S_1), \pi(S_2), \pi(S_3), \pi(S_4), \pi(S_5), \pi(S_6)]
              = [0.0765, 0.0012, 0.0220, 0.3061, 0.0233, 0.0029, 0.2840, 0.2840]

Thus, the reliability of the Controller component is

    R = 1 - (0.0765 + 0.0012) = 0.9223

5.2 Operational Profile Modeling

Estimating the transitions in the reliability model involves estimating an operational profile of a component. The transitions in our reliability model, corresponding to the elements of the transition probability matrix P, can be viewed as being of three different types: (1) behavioral, (2) failure, and (3) recovery. Behavioral transitions are between two behavioral states; failure transitions are from a behavioral state to a failure state; and recovery transitions are from a failure state to a behavioral state. The process of determining the probabilities of each transition may be different, and depends on the information available to the architect.

We identify the following information sources that may be available at the architectural level.

• Domain Knowledge. Information about a component may be obtained from a domain expert. The main difficulty is that such an expert may not be available. Even when an expert is available, this information source is inherently subjective, and the information may be inaccurate, either due to the complexity of the component or to unexpected operational profiles of that component. For example, consider estimating the outgoing transitions of the Estimating Sensor Data state in the Controller component in Figure 5.3.
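The worked example above can be checked numerically. The sketch below (illustrative code, not part of the dissertation) computes the stationary distribution of Eq. (5.2) by power iteration — repeatedly applying π ← πP, which converges for this irreducible, aperiodic chain — and then evaluates Eq. (5.3):

```python
# Transition matrix (5.4) of the Controller component's reliability model;
# state order: F1, F2, S1..S6. Stationary distribution via power iteration.
P = [
    [0.8,  0,    0.2,   0, 0,     0,      0,      0],  # F1
    [0,    0.2,  0.8,   0, 0,     0,      0,      0],  # F2
    [0,    0,    0,     1, 0,     0,      0,      0],  # S1: Idle
    [0.05, 0,    0.019, 0, 0.076, 0.0095, 0.8455, 0],  # S2: Estimating Sensor Data
    [0,    0.04, 0,     0, 0,     0,      0.96,   0],  # S3: Turning Left
    [0,    0,    0,     0, 0,     0,      1,      0],  # S4: Turning Right
    [0,    0,    0,     0, 0,     0,      0,      1],  # S5: Going Straight
    [0,    0,    0,     1, 0,     0,      0,      0],  # S6: Updating Database
]

def stationary(P, iterations=10000):
    n = len(P)
    pi = [1.0 / n] * n                      # start from the uniform distribution
    for _ in range(iterations):             # pi <- pi * P until convergence
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

pi = stationary(P)
M = 2                                       # number of failure states (F1, F2)
R = 1.0 - sum(pi[:M])                       # Eq. (5.3)
```

With these numbers, the computed stationary probabilities and R agree (to the printed precision) with the values in the text, including R ≈ 0.9223.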
To determine the transitions to the Turning Left, Turning Right, and Going Straight states, we can ask the expert to estimate the probability that a robot turns. If the expert predicts the robot to be going straight most of the time, we can estimate the transition probability from Estimating Sensor Data to Going Straight to be larger than the transition probabilities going to the Turning Left and Turning Right states.

• Requirements Document. The requirements for a given component, or the overall system, will frequently contain the typical use cases for that component. Furthermore, the requirements may be explicit in terms of how a component is to respond to exceptional circumstances such as failures. This information can be leveraged to estimate at least a subset of the above transition probabilities. For example, in SCRover's requirements document [4], one of the requirements states the acceptable time to reboot SCRover in case of a software crash. This information can be used in estimating the recovery probabilities.

• Simulation. Simulation of a component's architectural models [31] has the potential of handling components with complex state spaces because the process can be automated. However, simulation techniques still require information related to a component's operational profile, which would have to come from other sources. For example, relying on a domain expert to estimate the parameters of a complex component may be error-prone because of the complexity of the component. The domain expert, on the other hand, may be able to suggest possible operational profiles at a higher level, in which each higher-level event may correspond to multiple transitions in the component's simulation model. We explore this technique in our evaluation in Chapter 5.3.1.

• Functionally Similar Component. If a functionally similar component exists, we can use its runtime behavior to estimate the operational profile of the component under consideration.
It is also possible to combine information from multiple functionally similar components. For example, if we are building a word processing component with drawing capabilities, we can leverage runtime information of an existing word processor to explore the behavior corresponding to word processing functionality, and the runtime information of an existing drawing tool to explore the behavior corresponding to drawing functionality.

We note that several of the above information sources may be available simultaneously. A strength of our approach is that we can use them in a complementary manner in order to mitigate their individual disadvantages.

Determining Behavioral Transition Probabilities. Let us define q_ij to be the probability of going from behavioral state B_i to state B_j. The central question here is how to determine the numerical value of q_ij. We address this in the context of the information sources described above and use the Controller component for illustration. Since in the Controller component the transitions out of state B_2 are the more interesting ones, we will use them in our examples.

If domain knowledge is available, we can focus on the subset of possible operational profiles corresponding to the provided domain knowledge. For instance, the expert may suggest that in the Controller example the robot moves straight most of the time. Then, we can eliminate the operational profiles corresponding to high probabilities of turning left and right.

When simulation data of a component's architectural models or from a functionally similar component is available, we can use it to obtain the behavioral transition probabilities. While a standard Markov-based approach would assume that there is a one-to-one correspondence between observed events in the simulation (or execution logs) and transitions in the model, such correspondence may not exist. This is especially true in the case of a functionally similar component.
For example, in the Controller component from Figure 5.1, when we observe the Turn event, we cannot tell whether a transition occurred to the Turning Left, Turning Right or Going Straight states from the Estimating Sensor Data state. Our work in [60] suggests that in such a case we can use hidden Markov models (HMMs) [55] to obtain behavioral transition probabilities.

An HMM is defined by a set of states S = {S_1, S_2, ..., S_N}, a transition matrix A = {a_ij} representing the probabilities of transitions between states, a set of observations O = {O_1, O_2, ..., O_M}, and an observation probability matrix E = {e_ik}, which represents the probability of observing event O_k in state S_i. The set S of the HMM comes from Phase 1. The event/action pairs of the dynamic behavior model become observations of the HMM (set O).

Once we have determined the set of states S and observations O of the HMM, we can apply the Baum-Welch algorithm [55] to estimate the transition probabilities. The inputs to the algorithm are (1) a starting point, corresponding to initializing the matrices A and E, and (2) training data for parameter estimation. The matrices A and E can be initialized with random values [55], or they may be initialized more intelligently by utilizing architectural models. Since the Baum-Welch algorithm is a local optimization technique, the starting point (given by the matrices A and E) can affect the accuracy of the output. Therefore, to obtain an accurate operational profile, it is important to start at a "good" starting point. We observe that, typically, it is unlikely that all event/action pairs can happen in all states. Thus, it is possible to determine which entries in the matrix E are zero (i.e., e_ik = 0 when event O_k cannot happen in state S_i), and fill in random values for other possible event/action pairs. The information on possible event/action pairs at the states is available from the component's architectural models.
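The "good starting point" heuristic just described can be sketched in a few lines. In the sketch below, structural zeros in the observation matrix E come from a map of which events the architectural model allows in each state; the state and event names echo the Controller example, but the possibility map itself is an assumption made purely for illustration.

```python
import random

# A sketch of initializing the HMM observation matrix E: e_ik is fixed
# to 0 where event O_k cannot occur in state S_i (read off the
# architectural model), and random row-normalized values are used for
# the remaining, possible event/action pairs. The possibility map below
# is an illustrative assumption, not the actual Controller model.
states = ["EstimatingSensorData", "TurningLeft", "TurningRight", "GoingStraight"]
events = ["Start", "Turn", "GoStraight", "SensorData"]
possible = {
    "EstimatingSensorData": {"SensorData"},
    "TurningLeft": {"Turn"},       # e.g., Start cannot occur while turning left
    "TurningRight": {"Turn"},
    "GoingStraight": {"GoStraight", "Start"},
}

def init_observation_matrix(states, events, possible, rng=random.Random(0)):
    E = []
    for s in states:
        row = [rng.random() if o in possible[s] else 0.0 for o in events]
        total = sum(row)
        E.append([v / total for v in row])   # each row is a distribution
    return E

E = init_observation_matrix(states, events, possible)
# The structural zero is preserved: Start is impossible in TurningLeft.
assert E[states.index("TurningLeft")][events.index("Start")] == 0.0
```

A matrix initialized this way keeps its structural zeros through Baum-Welch re-estimation, since zero-probability observations contribute nothing to the re-estimated parameters.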
For example, in the Controller component in Figure 5.1, the Start event is not possible when the component is in state Turning Left. Therefore, we can set the corresponding entry in the matrix E to 0.

Training data for HMMs is obtained by collecting measurements using an already built system in an existing operational environment. However, since we are doing this at the architectural level, we needed to find a novel approach to generate training data. To this end, we utilized the available information sources: a combination of expert advice, system requirements, simulation traces (when simulation of architectural models is available), or execution traces (when a functionally similar component is available). Given an initial HMM constructed as described above, the Baum-Welch algorithm converges on a Markov model that has a high probability of generating the given training data. The underlying Markov model of the HMM, with transition matrix A*, obtained after running the Baum-Welch algorithm represents the behavioral transition probabilities for the component, i.e., q_ij = a*_ij for all i and j. We note here that the training data does not include any failure or recovery behavior. This assumption enables us to focus on behavioral transition probabilities. We will incorporate failure and recovery behavior next, based on the defect classification we performed in Phase 1.

Determining Failure and Recovery Probabilities. We define f_ij to be the probability that a defect of class j occurs while the component is in state B_i. In other words, in the reliability model, f_ij is the probability of going from a behavioral state B_i to a failure state F_j. Furthermore, we define r_kl to be the probability that the component enters state B_l after recovery from a defect of class k.^2 For a given pair of behavioral and failure states, B_i and F_j, we can determine whether f_ij is non-zero: f_ij > 0 if defect d_j may occur in state B_i.
This would be determined as part of the architectural analysis process, as described in Phase 1. Also, for each defect class D_k, we can determine (e.g., from a requirements document or domain expert) what is a reasonable set of states in which the component can re-start after recovery from failure. In other words, for each behavioral state B_l, we can determine whether r_kl is non-zero: r_kl > 0 if the component restarts in state B_l after recovery from a failure caused by defect d_k.

In the Controller component from Figure 5.3, defects of classes D_1 and D_2 can occur in states B_2 and B_3, respectively. Thus, we add transitions (with non-zero probabilities) from B_2 to F_1, and from B_3 to F_2. In this example, recovery from any failure returns the component back to state B_1. The self-transitions at F_1 and F_2 represent the component being in a failure state until recovery is complete.

Knowing which failure (f_ij) and recovery (r_kl) transition probabilities are non-zero is not sufficient. To complete the reliability model, we need to assign specific values to these probabilities. Estimating such failure-related information is challenging: because software engineers most often design components for correct behavior, information related to failures is limited. One approach is to explore the design space, i.e., to vary the failure and recovery probabilities and observe the resulting effects on the component's reliability prediction. We demonstrate this approach in Chapter 5.3.

^2 We have assumed that a component will recover from a failure due to one defect before experiencing a failure due to another defect. This assumption may not be reasonable in the case of multi-threaded components. We treat such complex components as systems and apply our system-level reliability prediction technique from Chapter 3 to them.
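The way the three transition types compose into the full matrix P can be sketched as follows, using the state order of Equation (5.4) (failure states first, then behavioral states). Behavioral rows are scaled by (1 minus the total failure probability), matching the structure of Equation (5.4), where row S_2's behavioral entries sum to 0.95 alongside a 0.05 failure probability. All numbers below are illustrative placeholders, not values from the Controller model.

```python
# A minimal sketch of assembling the reliability model's transition matrix
# from behavioral (q_ij), failure (f_ij), and recovery (r_kl) probabilities.
N, M = 3, 1                      # 3 behavioral states, 1 defect class (assumed)
q = [[0.0, 0.7, 0.3],            # q[i][j]: B_i -> B_j (each row sums to 1)
     [0.5, 0.0, 0.5],
     [1.0, 0.0, 0.0]]
f = [[0.0], [0.05], [0.0]]       # f[i][k]: defect class k strikes in B_i
rec = [0.2]                      # rec[k]: recovery probability for class k

# State order as in Equation (5.4): F_1..F_M first, then B_1..B_N.
P = []
for k in range(M):               # failure rows: self-loop until recovery,
    row = [0.0] * (M + N)        # then restart in B_1 (i.e., r_k1 = rec[k])
    row[k] = 1.0 - rec[k]
    row[M + 0] = rec[k]
    P.append(row)
for i in range(N):               # behavioral rows: failure entries first,
    fail_total = sum(f[i])       # then q_ij scaled by (1 - fail_total)
    P.append(f[i] + [(1.0 - fail_total) * q[i][j] for j in range(N)])

for row in P:                    # sanity check: every row is stochastic
    assert abs(sum(row) - 1.0) < 1e-9
```

Keeping the behavioral rows scaled this way means a design-space exploration can vary f and rec freely without invalidating the estimated behavioral profile q.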
This allows us to explore how sensitive the component's reliability is to each of the defect classes and to the recovery process from each defect class. We could take advantage of the available information sources to reduce the design search space once again. For instance, a domain expert could help the reliability modeler determine how difficult it is to recover from a failure due to defect class D_k. In turn, this would indicate the value ranges for r_kl the reliability modeler should consider.

5.3 Evaluation of Operational Profile Estimation

In this section, we validate and support several claims we have made throughout this chapter. This includes (a) showing the effectiveness of our approach when different sources of information are available, and (b) showing the predictive power and resiliency to changes in the parameters identified in Chapter 5.2. Since our approach is intended to be used at design-time, a direct comparison of reliability numbers predicted by the approach and those measured at runtime would not be meaningful.^3 Design-time approaches are intended for relative comparisons between possible fault mitigation choices rather than (literally) accurate reliability predictions. Hence, a more useful measure here is one that in some manner reflects a confidence in the prediction and sensitivity to changes in the component and reliability model-related parameters.

In our evaluation, we first compare the sensitivity of our results to the different information sources (recall Chapter 5.2). Next, we show how the estimates of operational profiles affect the predicted component reliability values. Finally, we study the sensitivity of the results obtained using component models of different granularities. We have evaluated our approach in the context of a large number of components whose architectural models we were able to obtain or develop from scratch.
Examples include components from

• a cruise control system [60];

• the SCRover robotic testbed [15], developed by a separate research group at USC in collaboration with NASA's JPL;

• MIDAS [45], a large, embedded system developed as part of a separate collaboration between USC and Bosch;

• DeSi [44], an architectural design and analysis tool developed as part of a separate research project at USC; and

• a large library of systems developed in USC's undergraduate software engineering project course over the past decade [9].

In order to observe the trends in our approach's reliability predictions on sufficiently large numbers of components with controlled variations, as part of our evaluation we have also synthesized many state-based models for "dummy" components, and performed evaluations on those models.

Our approach has consistently yielded qualitatively similar results for all of the above cases. To illustrate these results and highlight the approach's key properties, particularly its sensitivity, we will use SCRover's Controller component in Chapter 5.3.1, as well as a component from the DeSi environment [44] in Chapter 5.3.2. Results from a number of other components we have evaluated are qualitatively similar, and they are available in [2].

5.3.1 Evaluation of SCRover's Controller

In this section, we present a sensitivity analysis of the Controller component we used as an example throughout this chapter. We study the sensitivity of our results to (a) the different information sources identified in Chapter 5.2, (b) changes to the operational profile, and (c) models of different granularities. To validate our results, we constructed a detailed behavioral (control-flow) model of the Controller from a prototype implementation of the component that had existed previously.

^3 For example, at implementation time, it may be appropriate to evaluate a system's reliability using the five 9's standard. However, this is not typically meaningful at design time.
This implementation-level model is based on a directed graph that represents the component's control structure. We then built a Markov model by leveraging this graph, where a node in the graph translates to a state in the Markov model. This is analogous to what existing approaches have done at the system level (e.g., [20, 35]). Based on the available component maintenance records, we injected defects into the code to simulate failure behavior. We should note that we were not interested in implementation-level faults in this model (e.g., an implementation-level defect that may cause a division by zero error), but only in architectural defects. To ensure a fair comparison with our architectural-level model, we assume there are no implementation-level defects, as these defects are not modeled at the architectural level. We then introduced failure states and transitions in the control structure to represent erroneous behavior corresponding to the injected defects. The results obtained from this model were used as "ground truth" in a large number of experiments.

As described in Chapter 5.2, our approach allows for multiple failure classes. However, for clarity of exposition of results, in what follows, experiments are performed using one active class of defect at a time. In the presented experiments, this is done by setting the probabilities of failures associated with the remaining defect classes to zero. That is, these experiments use only single-failure-state models, where the failure state corresponds to the class of defect being studied. We have also performed similar experiments where failure probabilities associated with defect classes other than the one under consideration are held constant at non-zero values; these correspond to multiple-failure-state models. The results of those experiments showed qualitatively similar trends to the results presented below and are available in [2].
Sensitivity to Information Sources

We study the sensitivity of our approach to different information sources. The following sources of component usage information were considered in this evaluation. Here, we present the parameter values we used in generating Figure 5.4. We have performed similar analyses with different inputs, and we obtained qualitatively similar results.

• Case (1) - Domain Expert - We were given the architectural models, and focus on the operational profiles that the expert suggests.

• Case (2) - Simulation - We were provided with SCRover's requirements, based on which we specified a sequence of high-level events to simulate the dynamic behavior model of Controller shown in Figure 5.1. We obtained training data by leveraging the simulation trace and applied our HMM-based approach to obtain behavioral transition probabilities (recall Chapter 5.2).

[Figure 5.4: Analysis of sensitivity to information sources of SCRover's Controller. Panels: (a) Defect d_1 and (b) Defect d_2; each panel plots reliability against recovery probability for Cases (1)-(3) and the code-level model, with curves for p = 0.05, 0.1, 0.15, and 0.2.]

• Case (3) - Functionally Similar Component - As a functionally similar component we selected a robot that walks from one point to another, and avoids obstacles along the way.
We then used an operational profile of this component in our reliability prediction of the Controller component using our HMM-based approach described in Chapter 5.2.

In one set of experiments, we were interested in the sensitivity of the component reliability when the probabilities of recovering from defects change. To this end, we fixed the failure probability, and varied the recovery probabilities from 0.2 to 1.0 in 0.2 increments. We repeated the experiments for different failure probabilities. In Figure 5.4(a) we introduced Defect d_1, affecting the reliability of the Estimating Sensor Data state, and in Figure 5.4(b) we introduced Defect d_2, affecting the reliability of the Turning Left state.

Not surprisingly, we observe that the trends conform to our expectations in all three cases: as the recovery probability increases, the reliability of the component increases, since the time taken to recover from a failure becomes shorter. Moreover, as the failure probability increases, component reliability decreases.

We note that in Figure 5.4(b) (Defect d_2), the slope of the curves in Case (1), where we have domain knowledge, is different from the other cases. The reason is that our expert incorrectly predicted the robot to be walking mostly straight: in the prototype, the robot walked at an angle most of the time, such that occasionally it was too far from, or too close to, the wall, and had to turn. As a result, the robot spends less time in the Turning Left state of the model generated based on our expert's predictions for Case (1) than it does in the Turning Left state of the actual system. Hence, Defect d_2 had less impact on the component's reliability.

Sensitivity to Operational Profile

To evaluate our reliability approach's sensitivity to changes in a component's operational profile, one approach we have taken is to fix the transition probabilities among all states of the component's reliability model (recall Figure 5.3), except for a specific set.
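The sweep described above (fix the failure probability, vary the recovery probability, and repeat for several failure probabilities) can be mimicked on a toy model. The three-state chain below is a deliberately tiny stand-in, not the Controller model; the small self-loop on B2 is there only to keep power iteration convergent for all parameter settings.

```python
# A toy version of the sensitivity sweep: fix the behavioral profile,
# then vary failure probability fp and recovery probability rp and
# watch the reliability trend. States: one failure state F and two
# behavioral states B1, B2 (illustrative, not the Controller model).
def reliability(fp, rp, iters=4000):
    P = [
        [1.0 - rp, rp, 0.0],   # F: stay failed, or recover back to B1
        [fp, 0.0, 1.0 - fp],   # B1: the defect strikes here with prob fp
        [0.0, 0.9, 0.1],       # B2: mostly return to B1 (small self-loop)
    ]
    pi = [1.0 / 3] * 3
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(3)) for j in range(3)]
    return 1.0 - pi[0]         # reliability = 1 - failure-state probability

# Reliability rises as recovery becomes more likely ...
rs = [reliability(0.10, rp) for rp in (0.2, 0.4, 0.6, 0.8, 1.0)]
assert all(a < b for a, b in zip(rs, rs[1:]))
# ... and falls as failures become more likely, matching the Figure 5.4 trends.
fs = [reliability(fp, 0.5) for fp in (0.05, 0.10, 0.15, 0.20)]
assert all(a > b for a, b in zip(fs, fs[1:]))
```

The two assertions encode exactly the qualitative trends the experiments are expected to show; on a real component model the same sweep is run over the full matrix built in Phases 1 and 2.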
By varying those remaining transition probabilities, we can observe the model's response. In this experiment, we consider the ranges of Controller's reliability values when the probability of going from state Estimating Sensor Data to state Turning Left (recall Figure 5.1) varies between 0 and 0.85, and adjust the probability of going to the Going Straight state accordingly. We fix the probabilities of going from state Estimating Sensor Data to state Turning Right and to state Idle at 0.1 and 0.05, respectively. All other parameters in the operational profile are fixed. This corresponds to estimating the probability that a robot turns left. We reiterate that the same analysis was performed by varying transition probabilities between other states, and yielded qualitatively similar results. We varied the failure and recovery probabilities (as in Chapter 5.3.2), and obtained a reliability range for each failure-recovery probability pair.

[Figure 5.5: Analysis of sensitivity to operational profiles of SCRover's Controller. Each of the twenty panels corresponds to one pair of failure probability fp ∈ {0.05, 0.10, 0.15, 0.20} and recovery probability rp ∈ {0.20, 0.40, 0.60, 0.80, 1.00}, and shows the reliability ranges as horizontal bars labeled (i) and (ii).]
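The experiment of varying one outgoing probability while renormalizing its sibling can be sketched as follows. The four-state chain is a simplified stand-in for the Controller (one failure state, a defect-prone "turning left" state, and a "going straight" state); the default fp and rp values, and the small self-loop on the safe state, are illustrative assumptions.

```python
# A toy version of the operational-profile experiment: vary the probability t
# of entering the defect-prone state from 0 to 0.85, route the remainder to a
# safe state, and record the resulting reliability range.
def reliability(t, fp=0.10, rp=0.5, iters=4000):
    P = [
        [1.0 - rp, rp, 0.0, 0.0],  # F: recover back to B1
        [0.0, 0.0, t, 1.0 - t],    # B1: "turn left" w.p. t, else "go straight"
        [fp, 1.0 - fp, 0.0, 0.0],  # B2 "Turning Left": defect strikes here
        [0.0, 0.9, 0.0, 0.1],      # B3 "Going Straight": small self-loop
    ]
    pi = [0.25] * 4
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(4)) for j in range(4)]
    return 1.0 - pi[0]

ts = [0.05 * i for i in range(18)]            # t = 0.00 .. 0.85
rs = [reliability(t) for t in ts]
assert all(a > b for a, b in zip(rs, rs[1:]))  # more turning, lower reliability

# The reliability range widens with higher fp and lower rp, as in Figure 5.5.
def rel_range(fp, rp):
    return reliability(0.0, fp, rp) - reliability(0.85, fp, rp)
assert rel_range(0.20, 0.2) > rel_range(0.05, 1.0)
```

The width of each range (max minus min over the profile sweep) is the quantity plotted as a horizontal bar in each panel of Figure 5.5.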
We did this for the two defects we introduced earlier. Figure 5.5 depicts our results. Each graph in this figure represents a case with a given failure (fp) and recovery (rp) probability. In each graph, the horizontal bars represent the range of reliability values obtained by varying the probability of going from state Estimating Sensor Data to state Turning Left between 0 and 0.85. The bars labeled (i) and (ii) represent Defects d_1 and d_2, respectively.

We observe that the reliability ranges are larger when failure probabilities increase and/or recovery probabilities are lower. This corresponds to the graphs concentrated toward the left and bottom portions of Figure 5.5. This means that, when failures occur more frequently and/or are harder to recover from, the component's reliability is more sensitive to the specifics of the operational profile.

Another observation is that Controller's reliability was more sensitive to Defect d_2. This is because d_2 directly affects the two states on which we focused in this particular scenario. More generally, by varying operational profiles, we can identify which defects most prominently affect the resulting reliability values across these operational profiles. If a defect is shown to increase the model's sensitivity to multiple operational profiles, software designers may want to focus their attention particularly on eliminating that defect in order to achieve the greatest improvement in the component's reliability.

Sensitivity to Model Granularity

Software architectural models may vary widely in terms of the amount of detail they contain. Different models are produced at different points during the system's development, and may be intended for different stakeholders. On average, it is possible to produce high-level models earlier than detailed ones during a system's development; it is also easier to discover and mitigate any design flaws in them.
On the other hand, a high-level model may not be representative of a system's or component's complexity and, as we elaborate below, it may obscure defects that can easily creep in during design refinement and implementation.

In our case, the objective is to assess the impact that the amount of detail in a component's architecture-level model has on the component's reliability calculated using our approach. To this end, we have performed sensitivity analyses on component models of varying granularity levels.

Figure 5.7 shows the results of such analysis using Defect d_1 for Case (1) discussed in Chapter 5.3.1. We obtained qualitatively similar results when we introduce Defect d_2, or use different information sources. The six-state model is the example we have used throughout this chapter (depicted in Figure 5.1). The three-state and twelve-state models are depicted in Figures 5.6(a) and 5.6(b), respectively. Note that the transition labels are omitted from Figure 5.6(b) for clarity.

[Figure 5.6: Dynamic behavior models of the Controller component at two different levels of granularity. (a) 3-state model: Init, Going Straight, and Turning, with transitions such as Reset / Initialize, Obstacle Ahead / Turn Left, Too Close / Turn Left, Too Far / Turn Right, and - / Go Straight. (b) 12-state model: Initialize, Plan Mission, Fetch Sensor Data, Estimate Sensor Data, Choose Action, Check Mission Status, Calculate Mission Feasibility, Select Turn Parameters, Turning Left, Turning Right, Going Straight, and Update Database.]

We observe that in both cases, when the recovery probability is fixed and the failure probability increases from 0.05 to 0.2, reliability values are most sensitive in the three-state model.
The other observation is that the three-state model is more sensitive than the six-state model to the recovery probability, while the twelve-state model is least sensitive.

This trend can be explained as follows. Failures corresponding to Defect d_1 only occur in the Turning Left state in the twelve-state model. On the other hand, the time spent in the Turning Left state in the six-state model also includes the time spent in the Select Turn Parameters state in the twelve-state model. As a result, the robot spends more time in the Turning Left state in the six-state model than in the twelve-state model, hence the sensitivity is higher in the six-state model. Analogously, since the time spent in the Turning state in the three-state model includes the time spent in the Estimating Sensor Data and Updating Database states in the six-state model, the sensitivity of the three-state model is higher than that of the six-state model.

[Figure 5.7: Analysis of sensitivity to models of different granularities of SCRover's Controller. Panels for the 3-state, 6-state, and 12-state models plot reliability against recovery probability, with curves for p = 0.05, 0.1, 0.15, and 0.2.]

Note that in our experiments a model with fewer states gives more pessimistic results. We argue that, in general, it is (a) desirable for an approach such as ours to provide more conservative reliability predictions given less information and (b) necessary to do so consistently. This will both sensitize engineers to the potential problems the system may eventually exhibit and provide confidence in the approach's predictive power.

5.3.2 Evaluation of DeSi

SCRover's Controller component may be too small as a representative of real-world software components.
Therefore, in order to study our approach more comprehensively, we also perform sensitivity analysis on DeSi. DeSi is an environment that supports specification, manipulation, and visualization of deployment architectures for large distributed systems. It consists of three major subsystems: a reactive DeSiModel subsystem that stores information about the current deployment; a DeSiView subsystem that visualizes information in the DeSiModel subsystem; and a DeSiController subsystem that generates deployment plans based on constraints set by the user, allows users to fine-tune parameters of a generated deployment, and invokes redeployment algorithms [44] that update the DeSiModel. To demonstrate our approach's ability to handle components of large scale and complexity, we treat each subsystem as a single component.

[Figure 5.8: Architectural models of the DeSiController component at different levels of detail. (a) 24-state model, including states such as waiting for command, validating model, starting blank model, and finished mapping (the states referenced in Table 5.1). (b) 12-state model. (c) 5-state model.]
Table 5.1: Defects injected in DeSiController

Defect  Description                                              Affected State
d_1     Mismatched signatures                                    Waiting for command
d_2     Missing model validation rules in design document        Validating model
d_3     Mismatch between the dynamic behavior model and          Finished mapping
        interaction protocol
d_4     Static behavior pre-/post-condition mismatch with        Starting blank model
        event guards in dynamic behavior model

DeSi served as a particularly useful evaluation platform because it was designed and implemented from an architecture-centric perspective: it contained clearly identifiable components, which composed hierarchically into higher-order components (i.e., DeSi subsystems), and was accompanied by existing architectural models. For consistency, we show the evaluation results of applying our approach to the DeSiController component only. A slightly abridged dynamic behavior model of DeSiController is depicted in Figure 5.8(a). To evaluate our approach in a controlled manner, we injected architectural defects into DeSi. Table 5.1 summarizes the subset of defects used in the results presented in the remainder of this chapter.

As in the evaluation of SCRover's Controller, to validate our results, we separately built a reliability model from the existing implementation of the DeSiController component, analogous to what we have done in Chapter 5.3.1. Again, we used the results obtained from this implementation-based model as the "ground truth" in our evaluations.

Sensitivity to Information Sources

As in the evaluation using SCRover's Controller component in Chapter 5.3.1, we performed sensitivity analysis on models built using different information sources. We fixed the failure probabilities, and varied the recovery probabilities from 0.1 to 1.0, at 0.1 intervals. We repeated this for different failure probabilities (from 0.05 to 0.2, at 0.05 intervals). The following information sources were considered in these experiments.
• Case (1) - Domain Expert - We relied on the information provided by DeSi's primary developer, and explored only the operational profiles suggested by him.

• Case (2) - Simulation - We were provided with DeSi's requirements [44], based on which we specified a sequence of high-level events to simulate the dynamic behavior model of DeSiController shown in Figure 5.8(a). We obtained training data by leveraging the simulation trace and applied our HMM-based approach to obtain behavioral transition probabilities (recall Chapter 5.2).

• Case (3) - Functionally Similar Component - We obtained training data from an older version of DeSi that was missing certain functionality. We again applied our HMM-based approach to obtain behavioral transition probabilities.

Our results are presented in Figure 5.9, where we plot component reliability as a function of the recovery probability corresponding to the defect class under consideration. Each curve in the figure corresponds to a different failure probability, p, again corresponding to the defect class under consideration. Specifically, we activate defect d_1 from Table 5.1 in Figure 5.9(a), defect d_2 in Figure 5.9(b), defect d_3 in Figure 5.9(c), and defect d_4 in Figure 5.9(d). As in the case of SCRover's Controller, we observe that the trends conform to our expectations in all four cases for all defects.

Although the general trends across the experiments are similar, Figure 5.9 yields some interesting observations. First, the sensitivity of the Case (1) results, and their accuracy as compared to the implementation-level model results, vary depending on the defect being studied. We have observed this situation in a number of other examples.
Figure 5.9: Analysis of sensitivity to information sources of DeSiController. (Panels (a)-(d) correspond to defects d1-d4; each plots reliability against recovery probability for Cases (1)-(3) and the implementation-level model, with curves for p = 0.05, 0.1, 0.15, and 0.2.)

As in the results in Chapter 5.3.1, this indicates that information provided by an expert may be inaccurate, or that in practice the component may not behave as expected.
Relying on expert opinion alone in estimating architecture-level reliability, as most existing approaches appear to do, can therefore be error-prone.

Another observation is that in Figure 5.9(b), reliabilities in Case (3) are very high. This is because the older, functionally similar version of DeSi does not have the functionality that generates a deployment automatically based on user constraints. As a result, defect d2 could never happen in this older version of DeSiController. Similarly, in Figure 5.9(d), Case (3) exhibits different sensitivity than the results obtained using other information sources. This is because users rely more on creating deployments manually in DeSi's older version; hence defect d4 occurs more often in the older version, ultimately resulting in lower reliability values. This illustrates the fact that a functionally similar component is only useful in predicting reliability for the functionality that is available and used in a comparable fashion in both components. Information from other sources will be required to predict the effect of newly added functionality on certain defect classes.

We also note that in the experiments of Figure 5.9, the implementation-level model exhibits higher reliability than the other cases. This occurs because the implementation-level model is finer-grained than the architectural models. As we have shown in Chapter 5.3.1, coarser-grained models give more conservative results in our approach.

In summary, the results shown above corroborate our assertion that, in order to provide a meaningful evaluation of a component's reliability, having information from multiple sources is desirable: information from certain sources may be unavailable (e.g., a functionally similar component) or inaccurate (e.g., expert opinion).
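To make the mechanics of these sensitivity sweeps concrete, the sketch below computes the reliability of a small component model treated as an absorbing Markov chain, then repeats the computation over the same grid of failure and recovery probabilities used above. The state names, topology, and failure/recovery wiring are illustrative assumptions, not DeSiController's actual model.

```python
# Illustrative sketch (not the dissertation's exact model): a component's
# dynamic behavior model as an absorbing DTMC.  Each operational state can
# fail with probability p; from the Failure state the component recovers
# with probability r (here, back to the initial state) or stops.
# Reliability is the probability of absorbing in Success rather than Stopped.

def reliability(p, r):
    # Transient states: 0, 1, 2 are operational; 3 is the Failure state.
    # Absorbing states: 'S' = Success, 'X' = Stopped.
    # trans[i] is a list of (next_state, probability) pairs.
    trans = {
        0: [(1, 1.0 - p), (3, p)],
        1: [(2, 1.0 - p), (3, p)],
        2: [('S', 1.0 - p), (3, p)],
        3: [(0, r), ('X', 1.0 - r)],   # recover to initial state, else stop
    }
    # Fixed-point iteration for the probability of eventually reaching Success.
    a = {0: 0.0, 1: 0.0, 2: 0.0, 3: 0.0, 'S': 1.0, 'X': 0.0}
    for _ in range(10000):
        for i in trans:
            a[i] = sum(prob * a[j] for j, prob in trans[i])
    return a[0]   # reliability as seen from the initial state

# Sweep recovery probability for several failure probabilities,
# mirroring the experimental setup above.
for p in (0.05, 0.10, 0.15, 0.20):
    curve = [round(reliability(p, r / 10.0), 3) for r in range(1, 11)]
    print(f"p={p}: {curve}")
```

With r = 0 no failure is ever recovered, so the sketch reduces to (1 - p)^3 for this three-step model, which is a convenient sanity check on the iteration.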
Figure 5.10: Analysis of sensitivity to operational profiles of DeSiController. (Panels (a)-(t) vary the failure probability fp from 0.05 to 0.20 and the recovery probability rp from 0.20 to 1.00; in each panel, bars (i)-(iv) correspond to defects d1-d4.)

Sensitivity to Operational Profile

We study the effect of changes in operational profiles on component reliability, similar to what we did in evaluating our approach using SCRover's Controller. We consider the ranges of DeSiController's reliability values when the probability of going from state Finished mapping to state Waiting for command (recall Figure 5.8(a)) varies between 0 and 1, while all other parameters in the operational profile are fixed. This corresponds to estimating the average number of iterations of DeSiController's deployment calculation algorithm. Figure 5.10 depicts our results.
In each graph, the horizontal bars represent the range of reliability values obtained by varying the probability of going from state Finished mapping to state Waiting for command between 0 and 1. The bars labeled (i), (ii), (iii), and (iv) represent defects d1, d2, d3, and d4, respectively. We observe that the reliability ranges are larger when failure probabilities increase and/or recovery probabilities are lower. This observation agrees with what we observed in SCRover's Controller. Another observation is that DeSiController's reliability was most sensitive to defects d1 and d3. This is because d1 and d3 directly affect the two states on which we focused in this particular scenario.

Figure 5.11: Analysis of sensitivity to models of different granularities of DeSiController. (Panels show the models at the three levels of granularity, including the 11-state and 24-state models; each plots reliability against recovery probability for p = 0.05 to 0.2.)

Sensitivity to Model Granularity

Figure 5.11 shows the results of calculating the reliability of the DeSiController component based on its models at the three levels of granularity from Figure 5.8, with injected defect d3 from Table 5.1 and its operational profile estimated by the DeSi expert. Again, we plot reliability as a function of recovery probability from d3-related failures, and the different curves correspond to failure probabilities due to d3. Performing this analysis using other information sources (functionally similar component and simulation) and other defects consistently yielded qualitatively similar results.

The detailed model of DeSiController from Figure 5.8(a) is the one we have used in all of our measurements discussed in the preceding sections. Two higher-level models of the same component, developed with the help of DeSi's designers, are depicted in Figures 5.8(b) and 5.8(c).
We observe that, when recovery probability is fixed while failure probability increases from 0.05 to 0.2, reliability values are most sensitive in the highest-level model (corresponding to Figure 5.8(c)). Another observation is that the model from Figure 5.8(c) is more sensitive to recovery probability than the model from Figure 5.8(b), while the most detailed model (Figure 5.8(a)) is least sensitive. This agrees with the results obtained from SCRover's Controller.

In this more complex component, we observe that it is easier to narrow down the exact sources of defects using a detailed model. For example, defects associated with the middleware adaptor in DeSiController (the Processing middleware command state in the 11-state model of Figure 5.8(b)) may have been overlooked in the 5-state model. This is because the processing of all user-level commands in the 5-state model is described in a single state, Processing command.

5.4 Conclusions

Meaningful architecture-level reliability prediction is critical to the cost-effective development of complex software systems. However, early efforts in this area have assumed some degree of knowledge of operational profiles. We have argued that these assumptions are not reasonable, and have presented a way to estimate operational profiles from different information sources.

We approached the challenges associated with the lack of information about a system and its components early in development by exploring the sources of information available at design time. Our evaluation and validation experiments indicate that using our approach to determine operational profiles results in accurate sensitivity trends in reliability estimates, where implementations are used as ground truth.

Chapter 6

Conclusions and Future Work

As our reliance on software systems grows, it has become more important to perform quality analysis early.
This is because if problems are discovered after the system has been implemented, it is prohibitively expensive to mitigate them. In this dissertation, we have focused on addressing the following shortcomings of existing approaches to early software quality analysis: (1) the high cost of existing design-level reliability estimation approaches, especially when applied to modeling concurrent systems; (2) the high cost of testing-based approaches for performance analysis of third-party components, especially when testing at high workload; and (3) the unreasonable assumption about the availability of the system's and its components' operational profiles, which are typically gathered during runtime. In this chapter, we summarize our contributions in Chapter 6.1, and highlight a few future work directions in Chapter 6.2.

6.1 Summary and Contributions

In Chapter 3 we proposed SHARP, an architecture-level, hierarchical framework that is capable of modeling concurrent systems in a scalable manner, without sacrificing the level of detail we can model about the system. In SHARP, to generate a system model, we first generate models of the basic scenarios by leveraging system use-case scenario models. Then, we combine the models of the basic scenarios to form a higher-level model, according to the relationships between the lower-level models. Thus, system reliability is the reliability of the highest-level scenario. This hierarchical approach is motivated by the fact that submodels are small, and that solving a number of smaller submodels is more computationally efficient than solving one huge model (as in "brute-force" approaches). Through extensive experimentation we validated the complexity and accuracy of this approach, which illustrates that the practical space and computational complexity benefits are achieved at the cost of small losses in accuracy as compared to existing techniques.
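The hierarchical composition step can be illustrated with a small sketch: submodels are solved individually, and their reliabilities are combined bottom-up according to how the scenarios relate. The scenario tree, the two combinators (sequence and probabilistic choice), and the leaf reliabilities below are hypothetical stand-ins for SHARP's actual solved submodels.

```python
# A minimal sketch of hierarchical reliability composition.  Leaf
# reliabilities stand in for solving the small per-scenario submodels;
# the tree structure and all numbers here are hypothetical.

def scenario_reliability(node):
    """Recursively compose scenario reliabilities bottom-up."""
    kind = node[0]
    if kind == "leaf":          # basic scenario: reliability of its submodel
        return node[1]
    if kind == "seq":           # scenarios executed one after another
        r = 1.0
        for child in node[1]:
            r *= scenario_reliability(child)
        return r
    if kind == "alt":           # probabilistic choice among scenarios
        return sum(p * scenario_reliability(c) for p, c in node[1])
    raise ValueError(kind)

# Hypothetical system: a login scenario followed by either a browse
# scenario (70% of sessions) or a search-then-download sequence (30%).
system = ("seq", [
    ("leaf", 0.999),
    ("alt", [(0.7, ("leaf", 0.995)),
             (0.3, ("seq", [("leaf", 0.99),
                            ("leaf", 0.98)]))]),
])
print(round(scenario_reliability(system), 4))   # → 0.9866
```

Because each submodel is evaluated once and composed with simple products and weighted sums, the cost grows with the number of scenarios rather than with the product of their state spaces, which is the scalability argument behind the hierarchical approach.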
In Chapter 4, we presented a queueing model-based framework that accurately predicts the response time of third-party WSs. Recall that performance testing is quite an expensive process, as it requires sending a large number of requests to the system being tested, which may saturate its resources when testing at high workload. The main idea behind our approach is that we avoid testing at high workload, and instead use queueing models to guide extrapolation, so as to overcome the poor extrapolation results obtained with standard regression analysis. We have shown that our approach is more accurate in extrapolation, while maintaining the accuracy of interpolation, as compared to applying standard regression analysis. Such information can be useful in WS selection (e.g., use the WS that provides the best performance), capacity planning (e.g., estimate how much traffic the system can handle), and traffic engineering (e.g., determine how much traffic should be sent to WSs that provide the same service).

In Chapter 5, we have overcome the lack of operational profile information by utilizing a variety of other available information sources. We have identified four major sources of information available during design (Chapter 5.2): (1) expert knowledge, (2) requirements documents and system specifications, (3) a functionally similar system/component, and (4) simulation of architectural models, and we apply the HMM-based technique proposed in [19] to estimate operational profiles from execution logs of a functionally similar system/component, or simulation logs. While our discussion has focused on component reliability analysis, we believe our approach is applicable to system-level reliability analysis as well. We have applied our operational profile estimation technique to the component reliability prediction process described in [19] to validate its effectiveness.
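As a simplified illustration of the trace-based estimation step just mentioned, the sketch below derives behavioral transition probabilities from an execution or simulation log by direct counting; the actual technique trains an HMM on the trace [19], and the state names and log used here are made-up examples.

```python
# Simplified, counting-based illustration of estimating an operational
# profile (behavioral transition probabilities) from an execution log.
# The full approach uses an HMM trained on the trace; the states and
# the log below are hypothetical.
from collections import Counter, defaultdict

def estimate_operational_profile(trace):
    counts = defaultdict(Counter)
    for s, t in zip(trace, trace[1:]):      # count observed transitions
        counts[s][t] += 1
    # Normalize each state's outgoing counts into probabilities.
    return {s: {t: n / sum(c.values()) for t, n in c.items()}
            for s, c in counts.items()}

trace = ["Waiting", "Validating", "Mapping", "Waiting",
         "Validating", "Mapping", "Mapping", "Waiting"]
profile = estimate_operational_profile(trace)
print(profile["Mapping"])   # leaves Mapping for Waiting 2/3 of the time
```

A counting estimator like this suffices when the log directly records the model's states; the HMM machinery is needed when the log records only lower-level observations that must be mapped onto architectural states.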
We compared reliability estimates when the operational profile is estimated from different sources of information, and the results are validated by comparisons to an implementation-level technique, which is used as the "ground truth". For instance, our results indicate that expert knowledge alone, on which existing approaches often appear to rely, may lead to inaccurate predictions. A rigorous evaluation process on a large number of software components shows that our framework has a high degree of predictive power and resiliency to changes in the identified parameters.

6.2 Future Work

This section highlights a few directions to further improve software quality analysis.

6.2.1 Integrating Firmware Properties

In Chapter 5, while we addressed the problem of estimating a component's operational profile in reliability analysis, the problem of not knowing the failure information remains unresolved. Estimating the failure information of a software component requires understanding of the underlying platform, such as the operating system, middleware, and hardware resources, which we collectively refer to as firmware. Existing software reliability analysis techniques, including [19], which we used in Chapter 5, assume the underlying firmware is reliable. Being able to determine the failure information with more certainty allows us to focus on a smaller parameter space in studying a component's reliability (recall Chapter 5.2). At the same time, as discussed in Chapter 1, estimating a component's performance also requires knowledge about the firmware, as the component's performance is highly dependent on the performance of the firmware. Thus, in order to accurately predict a component's performance, it is important to model the firmware.

However, integrating firmware properties into software quality analysis is a challenging problem. This is because of the complex interactions between software and the underlying firmware.
For example, how does one map an application-level operation (e.g., storing sensor data in a database) to a sequence of hardware-level operations (e.g., executing a sequence of CPU, memory and disk operations)? Such a mapping is typically needed in modeling the firmware, but this is a complex process as it involves going through multiple software and hardware layers.

Another challenging issue in integrating firmware properties is that they should be integrated in a composable manner, so that software designers need not generate the model from scratch when they evaluate the same software on different firmware platforms. However, different firmware platforms may map the same application-level operation differently, and are therefore not composable. Thus, it is very expensive to study how the software system performs on different firmware platforms.

6.2.2 Performability Analysis

We treated performance and reliability separately in this dissertation; yet, the dependencies between software performance and reliability should not be ignored. This unified analysis is referred to as performability analysis in [50]. For example, a software system may have performance requirements (e.g., a request has to be completed in X seconds); failure to meet such a requirement may be considered a failure. Estimating how often this failure occurs involves estimating the system's performance. Another example is modeling systems that may run in a degraded mode: the system can provide its intended functionality even when part of the system has failed, but performance is degraded. For example, when a disk drive in a disk array has crashed, the software may still be able to read data from the disk array, but the response time may be higher.

One challenge is that existing performance estimation techniques focus on mean-value-type analysis, while performability analysis typically requires knowing the distribution of performance metrics.
For example, if a system is considered unreliable when it takes more than X seconds to process a request, we need to know the distribution of the response time in order to integrate this requirement into a performability model. However, it is analytically difficult to obtain this information using existing performance analysis techniques (such as mean value analysis in QNs [69]). One can use simulation to gather this information, but simulation is more expensive. Another challenge is that a system's performance is highly dependent on the firmware it runs on, and thus we face a similar challenge as discussed above. Without accurate performance estimates, performability analysis would not be very meaningful.

6.2.3 Reliability Testing

As discussed in Chapter 1, there are few alternatives to reliability testing for evaluating software reliability when the source code is unavailable. As in performance analysis, analyzing the reliability of third-party components is important, because they contribute to the reliability of the overall system. Hence, it is vital to understand the reliability of a third-party component before it has been integrated into the system.

The challenge is that reliability testing is prohibitively expensive because, on average, it requires sending a large number of requests (on the order of 100,000 requests) to observe one error. The Seekda WS search engine [5] estimates the reliability of a WS by monitoring the responsiveness of the WS server (e.g., whether it responds to ping). While this can be used as a coarse estimate, the WS itself has never been evaluated. For example, Seekda would report a WS as reliable even if the WS returns incorrect results. The work in [80] proposes a way to estimate the reliability of third-party WSs using failure data of "similar" service users (e.g., users from the same ISP or making requests to WSs in the same administrative region).
We argue that this approach may be inaccurate because different users may have different operational profiles and/or reliability definitions (recall Chapters 3 and 5).

Bibliography

[1] http://webgis.usc.edu/Services/Geocode/WebService/GeocoderService V02 94.asmx?WSDL.
[2] http://vista.usc.edu/ lccheung/reliability.
[3] Java adventure builder reference application. http://adventurebuilder.dev.java.net.
[4] SCRover. http://cse.usc.edu/iscr/pages/ProjectDescription/home.htm.
[5] Seekda. http://webservices.seekda.com.
[6] TPC-App benchmark. http://www.tpc.org/tpc app/default.asp.
[7] Weather bug web service. http://api.wxbug.net/webservice-v1.asmx?WSDL.
[8] Web service description language (WSDL). www.w3.org/TR/wsdl.
[9] CSCI 477. Design and construction of large software systems, University of Southern California, 2003. http://sunset.usc.edu/ neno/cs477 2003/.
[10] S. Balsamo et al. Model-based performance prediction in software development: A survey. IEEE TSE, 30(5), May 2004.
[11] Forest Baskett, K. Mani Chandy, Richard R. Muntz, and Fernando G. Palacios. Open, closed, and mixed networks of queues with different classes of customers. J. ACM, 22(2):248–260, 1975.
[12] Steffen Becker, Lars Grunske, Raffaela Mirandola, and Sven Overhage. Performance prediction of component-based systems — a survey from an engineering perspective. Architecting Systems with Trustworthy Components, 2006.
[13] Christopher Bishop. Neural Networks for Pattern Recognition. Oxford University Press, 1995.
[14] Barry Boehm and Victor R. Basili. Software defect reduction top 10 list. Computer, 34:135–137, January 2001.
[15] Barry Boehm, Jesal Bhuta, David Garlan, Eric Gradman, LiGuo Huang, Alexander Lam, Ray Madachy, Nenad Medvidovic, Kenneth Meyer, Steven Meyers, Gustavo Perez, Kirk Reinholtz, Roshanak Roshandel, and Nicolas Rouquette. Using empirical testbeds to accelerate technology maturity and transition: The SCRover experience.
In Proceedings of ISESE'04, pages 117–126, 2004.
[16] Valeria Cardellini, Emiliano Casalicchio, Vincenzo Grassi, and Francesco Lo Presti. Flow-based service selection for web service composition. In IEEE ICWS, 2007.
[17] Senthilanand Chandrasekaran, John Miller, Gregory Silver, Budak Arpinar, and Amit Sheth. Performance analysis and simulation of composite web services. Electronic Markets, 13(2), 2003.
[18] Leslie Cheung, Leana Golubchik, and Nenad Medvidovic. SHARP: A scalable approach to architecture-level reliability prediction of concurrent systems. In Proceedings of the First International Workshop on Quantitative Stochastic Models in the Verification and Design of Software Systems, May 2010.
[19] Leslie Cheung, Roshanak Roshandel, Nenad Medvidovic, and Leana Golubchik. Early prediction of software component reliability. In ICSE 2008.
[20] R.C. Cheung. A user-oriented software reliability model. IEEE TSE, 6(2), 1980.
[21] Vittorio Cortellessa and Vincenzo Grassi. A modeling approach to analyze the impact of error propagation on reliability of component-based systems. In CBSE 2007, pages 140–156, 2007.
[22] Vittorio Cortellessa and Raffaela Mirandola. Deriving a queueing network based performance model from UML diagrams. In ACM Proc. Intl Workshop Software and Performance, pages 58–70, 2000.
[23] Vittorio Cortellessa, Harshinder Singh, and Bojan Cukic. Early reliability assessment of UML based software models. In Proceedings of the 3rd International Workshop on Software and Performance, pages 302–309, 2002.
[24] Andrea D'Ambrogio and Paolo Bocciarelli. A model-driven approach to describe and predict the performance of composite services. In WOSP'07, 2007.
[25] Carl de Boor. A Practical Guide to Splines. Springer, 2001.
[26] Norman Draper and Harry Smith. Applied Regression Analysis. Wiley-Interscience, 1998.
[27] Evelyn Duesterwald and Vasanth Bala. Software profiling for hot path prediction: less is more. SIGPLAN Not., 35:202–211, November 2000.
[28] Rehab El-Kharboutly, Reda A. Ammar, and Swapna S. Gokhale. UML-based methodology for reliability analysis of concurrent software applications. I. J. Comput. Appl., 14(4):250–259, 2007.
[29] S. Gokhale and K. Trivedi. Analytical models for architecture-based software reliability prediction: A unification framework. IEEE Transactions on Reliability, 55(4), Dec 2006.
[30] Swapna Gokhale. Architecture-based software reliability analysis: Overview and limitations. IEEE TDSC, 4(1), Jan 2007.
[31] Swapna Gokhale, Michael Lyu, and Kishor Trivedi. Reliability simulation of component based software systems. In Proceedings of ISSRE'98, pages 192–201, 1998.
[32] Swapna Gokhale and Kishor Trivedi. Reliability prediction and sensitivity analysis based on software architecture. In ISSRE 2002.
[33] Swapna S. Gokhale, W. Eric Wong, Kishor S. Trivedi, and J. R. Horgan. An analytical approach to architecture-based software reliability prediction. In IEEE International Computer Performance and Dependability Symposium, 1998.
[34] K. Goseva-Popstojanova and K. Trivedi. Architecture-based approaches to software reliability prediction. Intl. J. Comp. & Math. with Applications, 46(7), Oct 2003.
[35] Katerina Goseva-Popstojanova, Ahmed Hassan, Walid Abdelmoez, Diaa Eldin M. Nassar, Hany Ammar, and Ali Mili. Architectural-level risk analysis using UML. IEEE TSE, 29(3), Oct 2003.
[36] Katerina Goseva-Popstojanova and Sunil Kamavaram. Software reliability estimation under uncertainty: Generalization of the method of moments. In Proc. of HASE 2004.
[37] Katerina Goseva-Popstojanova and Kishor Trivedi. Architecture-based approach to reliability assessment of software systems. Performance Evaluation, 45:179–204, 2001.
[38] Anne Immonen and Eila Niemela. Survey of reliability and availability prediction methods from the viewpoint of software architecture. Software and Systems Modeling, Jan 2007.
[39] S. Krishnamurthy and A. Mathur.
On the estimation of reliability of a software system using reliabilities of its components. In Proceedings of ISSRE 1997.
[40] Michael Kuperberg, Klaus Krogmann, and Ralf Reussner. Performance prediction for black-box components using reengineered parametric behaviour models. In CBSE '08: Proceedings of the 11th International Symposium on Component-Based Software Engineering, pages 48–63, Berlin, Heidelberg, 2008. Springer-Verlag.
[41] B. Littlewood. A reliability model for Markov structured software. In Proceedings of the International Conference on Reliable Software, pages 204–207, 1975.
[42] B. Littlewood. Software reliability model for modular program structure. IEEE Transactions on Reliability, 28(3), 1979.
[43] J. Magee and J. Kramer. Concurrency: State Models And Java Programs. John Wiley & Sons, 2006.
[44] Sam Malek, Nels Beckman, Marija Mikic-Rakic, and Nenad Medvidovic. A framework for ensuring and improving dependability in highly distributed systems. Architecting Dependable Systems III, Oct 2005.
[45] Sam Malek, Chiyoung Seo, Sharmila Ravula, Brad Petrus, and Nenad Medvidovic. Reconceptualizing a family of heterogeneous embedded systems via explicit architectural support. In ICSE 2007.
[46] Moreno Marzolla and Raffaela Mirandola. Performance prediction of web service workflows. In QoSA'07, 2007.
[47] Nenad Medvidovic and Richard Taylor. A classification and comparison framework for software architecture description languages. IEEE Trans. on Software Engineering, 26(1):70–93, Jan 2000.
[48] Daniel Menasce, Honglei Ruan, and Hassan Gomaa. QoS management in service-oriented architectures. Performance Evaluation, 64(7-8), 2006.
[49] C. D. Meyer. Stochastic complementation, uncoupling Markov chains, and the theory of nearly reducible systems. SIAM Review, 31(2):240–271, 1989.
[50] J.F. Meyer. On evaluating the performability of degradable computing systems. IEEE Transactions on Computers, 29:720–731, 1980.
[51] Ian Molyneaux.
The Art of Application Performance Testing: Help for Programmers and Quality Assurance. O'Reilly Media, 2009.
[52] John D. Musa. Operational profiles in software-reliability engineering. IEEE Softw., 10(2):14–32, 1993.
[53] D.E. Perry and A. L. Wolf. Foundations for the study of software architecture. ACM SIGSOFT Software Engineering Notes, 17:40–52, 1992.
[54] Erik Putrycz, Murray Woodside, and Xiuping Wu. Performance techniques for COTS systems. IEEE Softw., 22(4):36–44, 2005.
[55] L.R. Rabiner. A tutorial on hidden Markov models. Proceedings of the IEEE, 77:257–286, 1989.
[56] Franco Raimondi, James Skene, and Wolfgang Emmerich. Efficient online monitoring of web-service SLAs. In Proceedings of FSE-16, 2008.
[57] Carl Edward Rasmussen and Chris Williams. Gaussian Processes for Machine Learning. MIT Press, 2006.
[58] R.R. Reussner, H.W. Schmidt, and I.H. Poernomo. Reliability prediction for component-based software architectures. J. of Sys. and Software, 66(3), 2003.
[59] Genaina Rodrigues, David S. Rosenblum, and Sebastian Uchitel. Using scenarios to predict the reliability of concurrent component-based software systems. In FASE 2005.
[60] Roshanak Roshandel, Somo Banerjee, Leslie Cheung, Nenad Medvidovic, and Leana Golubchik. Estimating software component reliability by leveraging architectural models. In Emerging Results track, ICSE 2006, pages 853–856, May 2006.
[61] Roshanak Roshandel and Nenad Medvidovic. Multi-view software component modeling for dependability. Architecting Dependable Systems II, 2004.
[62] Roshanak Roshandel, Nenad Medvidovic, and Leana Golubchik. A Bayesian model for predicting reliability of software systems at the architectural level. In QoSA 2007.
[63] Roshanak Roshandel, Bradley Schmerl, Nenad Medvidovic, David Garlan, and Dehua Zhang. Understanding tradeoffs among different architectural modeling approaches. In WICSA 2004.
[64] P. Schweitzer. Approximate analysis of multiclass closed networks of queues.
In International Conference on Stochastic Control and Optimization, Amsterdam, 1979.
[65] M. Shooman. Structural models for software reliability prediction. In Proceedings of ICSE 1976.
[66] Kyle Siegrist. Reliability of systems with Markov transfer of control. IEEE TSE, 13(7), July 1988.
[67] Connie Smith. Performance Engineering of Software Systems. Addison Wesley, 1990.
[68] Hyung Gi Song and Kangsun Lee. sPAC (web services performance analysis centre): Performance analysis and estimation tool of web services. BPM 2005, LNCS 3649, 2005.
[69] W. Stewart. Probability, Markov Chains, Queues, and Simulation. Princeton University Press, 2009.
[70] William Stewart. Introduction to the Numerical Solution of Markov Chains. Princeton University Press, 1994.
[71] Richard Taylor, Nenad Medvidovic, and Eric Dashofy. Software Architecture: Foundations, Theory, and Practice. Wiley, 2009.
[72] H.C. Tijms. Stochastic Models. John Wiley and Sons, 1994.
[73] Sebastian Uchitel, Jeff Kramer, and Jeff Magee. Detecting implied scenarios in message sequence chart specifications. SIGSOFT Softw. Eng. Notes, 26(5):74–82, 2001.
[74] Bhuvan Urgaonkar, Giovanni Pacifici, Prashant Shenoy, Mike Spreitzer, and Asser Tantawi. An analytical model for multi-tier internet services and its applications. SIGMETRICS Perform. Eval. Rev., 33(1):291–302, 2005.
[75] Kaiyu Wang and Naishio Tian. Performance modeling of composite web services. In Proceedings of the Pacific-Asia Conference on Circuits, Communications and System, 2009.
[76] W. Wang, D. Pan, and M. Chen. Architecture-based software reliability modeling. J. of Systems and Software, 79(1), 2006.
[77] M. Xie and C. Wohlin. An additive reliability model for the analysis of modular software failure data. In ISSRE 95.
[78] Sherif M. Yacoub, Bojan Cukic, and Hany H. Ammar. A scenario-based reliability analysis approach for component-based software. IEEE Transactions on Reliability, 53(4):465–480, 2004.
[79] Daniel M.
Yellin and Robert E. Strom. Protocol specifications and component adaptors. ACM TOPLAS, 19(2):292–333, 1997.
[80] Zibin Zheng and Michael R. Lyu. Collaborative reliability prediction of service-oriented systems. In Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering - Volume 1, ICSE '10, pages 35–44, New York, NY, USA, 2010. ACM.
Asset Metadata
Creator: Cheung, Leslie Chi-Keung (author)
Core Title: Design-time software quality modeling and analysis of distributed software-intensive systems
School: Viterbi School of Engineering
Degree: Doctor of Philosophy
Degree Program: Computer Science
Publication Date: 05/02/2011
Defense Date: 09/28/2010
Publisher: University of Southern California (original); University of Southern California. Libraries (digital)
Tags: modeling and analysis; OAI-PMH Harvest; software performance; software reliability
Language: English
Contributor: Electronically uploaded by the author (provenance)
Advisor: Golubchik, Leana (committee chair); Gupta, Sandeep K. (committee member); Medvidovic, Nenad (committee member)
Creator Email: lccheung@usc.edu; leslieck@gmail.com
Permanent Link (DOI): https://doi.org/10.25549/usctheses-m3831
Unique Identifier: UC1166897
Identifiers: etd-Cheung-4558 (filename); usctheses-m40 (legacy collection record id); usctheses-c127-458195 (legacy record id); usctheses-m3831 (legacy record id)
Legacy Identifier: etd-Cheung-4558.pdf
Dmrecord: 458195
Document Type: Dissertation
Rights: Cheung, Leslie Chi-Keung
Type: texts
Source: University of Southern California (contributing entity); University of Southern California Dissertations and Theses (collection)
Repository Name: Libraries, University of Southern California
Repository Location: Los Angeles, California
Repository Email: cisadmin@lib.usc.edu