VALUE-BASED, DEPENDENCY-AWARE INSPECTION AND TEST PRIORITIZATION

by Qi Li

A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(COMPUTER SCIENCE)

December 2012

Copyright 2012 Qi Li

Dedication

To my parents

Acknowledgements

My Ph.D. dissertation could not have been completed without the support of many hearts and minds. I am deeply indebted to my Ph.D. advisor, Dr. Barry Boehm, for his great and generous support of all my Ph.D. research. I am deeply honored to be one of his students and to receive direct and close advice from him all the time.

My sincere thanks are also extended to the other committee members, Dr. Stan Settles, Dr. Nenad Medvidovic, Dr. Richard Selby, Dr. William Halfond, and Dr. Sunita Chulani, for their invaluable guidance in focusing my research and their efforts in reviewing drafts of my dissertation.

Special thanks to my ISCAS advisors, Professor Mingshu Li, Professor Qing Wang, and Professor Ye Yang. They led me into the academic world, have continuously encouraged and supported my research, and have promoted in-depth collaborative research in the joint USC-CSSE & ISCAS lab.

This research effort was also realized because of the tremendous support from Dr. Jo Ann Lane and Dr. Ricardo Valerdi. In addition, this research could not have been conducted without support from the University of Southern California Center for Systems and Software Engineering courses and its corporate and academic affiliates. Special thanks to Galorath Incorporated and NFS-China for giving me the chance to apply this research to real industrial projects; to the students of the USC-CSSE graduate-level software engineering courses 577ab in 2009-2011 for their collaborative effort on the value-based inspection and testing experiments; and to all my USC and ISCAS colleagues and friends: life could not be more colorful without you.

Lastly, from the bottom of my heart, I would like to thank my family for their unconditional love and support during my study.

Table of Contents

Dedication
Acknowledgements
Chapter 1: Introduction
  1.1. Motivation
  1.2. Research Contributions
  1.3. Organization of Dissertation
Chapter 2: A Survey of Related Work
  2.1. Value-Based Software Engineering
  2.2. Software Review Techniques
  2.3. Software Testing Techniques
  2.4. Software Test Case Prioritization Techniques
  2.5. Defect Removal Techniques Comparison
Chapter 3: Framework of Value-Based, Dependency-Aware Inspection and Test Prioritization
  3.1. Value-Based Prioritization
    3.1.1. Prioritization Drivers
      3.1.1.1. Stakeholder Prioritization
      3.1.1.2. Business/Mission Value
      3.1.1.3. Defect Criticality
      3.1.1.4. Defect Proneness
      3.1.1.5. Testing or Inspection Cost
      3.1.1.6. Time-to-Market
    3.1.2. Value-Based Prioritization Strategy
  3.2. Dependency-Aware Prioritization
    3.2.1. Loose Dependencies
    3.2.2. Tight Dependencies
  3.3. The Process of Value-Based, Dependency-Aware Inspection and Testing
  3.4. Key Performance Evaluation Measures
    3.4.1. Value and Business Importance
    3.4.2. Risk Reduction Leverage
    3.4.3. Average Percentage of Business Importance Earned (APBIE)
  3.5. Hypotheses and Methods to Test
Chapter 4: Case Study I - Prioritize Artifacts to be Reviewed
  4.1. Background
  4.2. Case Study Design
  4.3. Results
Chapter 5: Case Study II - Prioritize Testing Scenarios to be Applied
  5.1. Background
  5.2. Case Study Design
    5.2.1. Maximize Testing Coverage
    5.2.2. The Step to Determine Business Value
    5.2.3. The Step to Determine Risk Probability
    5.2.4. The Step to Determine Cost
    5.2.5. The Step to Determine Testing Priority
  5.3. Results
  5.4. Lessons Learned
Chapter 6: Case Study III - Prioritize Software Features to be Functionally Tested
  6.1. Background
  6.2. Case Study Design
    6.2.1. The Step to Determine Business Value
    6.2.2. The Step to Determine Risk Probability
    6.2.3. The Step to Determine Testing Cost
    6.2.4. The Step to Determine Testing Priority
  6.3. Results
Chapter 7: Case Study IV - Prioritize Test Cases to be Executed
  7.1. Background
  7.2. Case Study Design
    7.2.1. The Step to Do Dependency Analysis
    7.2.2. The Step to Determine Business Importance
    7.2.3. The Step to Determine Criticality
    7.2.4. The Step to Determine Failure Probability
    7.2.5. The Step to Determine Test Cost
    7.2.6. The Step for Value-Based Test Case Prioritization
  7.3. Results
    7.3.1. One Example Project Results
    7.3.2. All Team Results
      7.3.2.1. A Tool for Facilitating Test Case Prioritization
      7.3.2.2. Statistical Results for All Teams via this Tool
      7.3.2.3. Lessons Learned
Chapter 8: Threats to Validity
Chapter 9: Next Steps
Chapter 10: Conclusions
Bibliography

Abbreviations

ICSM phases:
ICSM: Incremental Commitment Spiral Model
VC: Valuation Commitment
FC: Foundation Commitment
DC: Development Commitment
TRR: Transition Readiness Review
RDC: Rebaselined Development Commitment
IOC: Initial Operational Capability
TS: Transition & Support

Artifacts developed and reviewed for USC CSCI577:
OCD: Operational Concept Description
SSRD: System and Software Requirements Description
SSAD: System and Software Architecture Description
LCP: Life Cycle Plan
FED: Feasibility Evidence Description
SID: Supporting Information Document
QMP: Quality Management Plan
IP: Iteration Plan
IAR: Iteration Assessment Report
TP: Transition Plan
TPC: Test Plan and Cases
TPR: Test Procedures and Results
UM: User Manual
SP: Support Plan
TM: Training Materials

Value-Based, Dependency-Aware inspection and test prioritization:
RRL: Risk Reduction Leverage
ROI: Return On Investment
BI: Business Importance
ABI: Accumulated Business Importance
PBIE: Percentage of Business Importance Earned
APBIE: Average Percentage of Business Importance Earned
AC: Accumulated Cost
FU: Frequency of Use
RP: Risk Probability
TC: Testing Cost
TP: Test Priority
PI: Product Importance

Others:
FV&V: Formal Verification & Validation
VbV&V: Value-based Verification & Validation
Eval: Evaluation
ARB: Architecture Review Board

Abstract

As two of the most popular defect removal activities, inspection and testing are among the most labor-intensive activities in the software development life cycle, consuming between 30% and 50% of total development costs according to many studies. However, most current defect removal strategies treat all instances of software artifacts as equally important, in a value-neutral way; this becomes more risky for high-value software under limited funding and competitive pressures. In order to save software inspection and testing effort, and to further improve affordability and timeliness while achieving acceptable software quality, this research introduces a value-based, dependency-aware inspection and test prioritization strategy for improving the lifecycle cost-effectiveness of software defect removal options. The strategy allows various defect removal types, activities, and artifacts to be ranked by how well they reduce risk exposure; combining this with their relative costs enables them to be prioritized in terms of Return On Investment (ROI) or Risk Reduction Leverage (RRL). Furthermore, the strategy enables organizations to deal with two common types of dependencies among the items to be prioritized, and it helps project managers determine "how much software inspection/testing is enough?" under time and budget constraints. In addition, a new metric, Average Percentage of Business Importance Earned (APBIE), is proposed to measure how quickly testing can reduce quality uncertainty and earn the relative business importance of the System Under Test (SUT).

This Value-Based, Dependency-Aware Inspection and Testing strategy has been empirically studied and successfully applied in a series of case studies at different prioritization granularity levels: (1) prioritizing artifacts to be reviewed in 21 graduate-level, real-client software engineering course projects; (2) prioritizing testing scenarios to be applied in an industrial project at the acceptance testing phase at Galorath, Inc.; (3) prioritizing software features to be functionally tested in an industrial project at the China-NFS company; and (4)
prioritizing test cases to be executed in 18 course projects. All the comparative statistical analyses from the four case studies show positive results from applying the Value-Based, Dependency-Aware strategy.

Chapter 1: Introduction

1.1. Motivation

Traditional verification & validation and testing methodologies, such as path, branch, instruction, mutation, scenario, or requirements testing, usually treat all aspects of software as equally important [Boehm and Basili, 2001], [Boehm, 2003]. This treats testing as a purely technical issue, leaving the close relationship between testing and business decisions unlinked and the potential value contribution of testing unexploited [Ramler et al., 2005]. However, commercial experience is often that 80% of the business value is covered by 20% of the tests or defects, and that prioritizing by value produces significant payoffs [Bullock, 2000], [Gerrard and Thompson, 2002], [Persson and Yilmazturk, 2004]. Also, current "Earned Value" systems fundamentally track project progress against the plan and cannot track changes in the business value of the system being developed. Furthermore, system value-domain problems are the chief sources of software project failures, such as unrealistic expectations, unclear objectives, unrealistic time frames, lack of user input, incomplete requirements, or changing requirements [Johnson, 2006]. All of these, plus the increasing criticality of software within systems, make value-neutral software engineering methods increasingly risky.

Boehm and Basili's "Software Defect Reduction Top 10 List" [Boehm and Basili, 2001] shows that "Finding and fixing a software problem after delivery is often 100 times more expensive than finding and fixing it during the requirements and design phase. Current software projects spend about 40 to 50 percent of their effort on avoidable rework. About 80 percent of avoidable rework comes from 20 percent of the defects. About 80 percent of the defects come from 20 percent of the modules, and about half the modules are defect free. About 90 percent of the downtime comes from, at most, 10 percent of the defects. Peer reviews catch 60 percent of the defects. Perspective-based reviews catch 35 percent more defects than non-directed reviews. Disciplined personal practices can reduce defect introduction rates by up to 75 percent" [Boehm and Basili, 2001].

Figure 1. Pareto Curves [Bullock, 2000]

The upper Pareto curve in Figure 1 comes from an experience report [Bullock, 2000] for which 20% of the features provide 80% of the business value. It shows that, among the 15 customer types, the first one alone accounts for nearly 50% of the billing revenues, and that 80% of the test cases generate only 20% of the business value. So, focusing the effort on the high-payoff test cases will generate the highest ROI. The linear curve is representative of most automated test generation tools: such a tool is equally likely to test the high- and low-value types, so in general it shows a linear payoff. Value-neutral methods can do even worse than this. For example, many projects focus on reducing the number of outstanding problem reports as quickly as possible, leading to first fixing the easiest problems, such as typos or grammar mistakes. This generates a value curve much worse than the linear one.
From the perspective of VBSE, the full span of the software development lifecycle (SDLC) is a value flow, as shown in Figure 2. It begins with value objective assessment and capture through value-based requirements acquisition, business case analysis, and early design and architecting; this is followed by value implementation through detailed architecting and development, and by value realization through testing, which ensures that the value objectives are satisfied, by means of value-prioritized test cases being executed and passed, before the system is transitioned and delivered to customers. Monitoring and controlling the actual value being earned by the project's results, in terms of multiple value objectives, enables organizations to pro-actively monitor and control not only fast-breaking risks to project success in delivering the expected value, but also fast-breaking opportunities to switch to even higher-value emerging capabilities, avoiding highly efficient waste of an organization's scarce resources.

Figure 2. Value Flow vs. Software Development Lifecycle

Each of the system's value objectives corresponds to at least one test item, e.g., an operational scenario, a software feature, or a test case, that is used to measure whether this value objective is achieved in order to earn the relevant value. The whole testing process can thus be seen as a value-earning process: executing and successfully passing a test case earns the corresponding piece of value. In the Value-Based Software Engineering community, value is not limited to purely financial terms, but is extended to relative worth, utility, or importance, to help address software engineering decisions [Boehm, 2003]. Business Importance, in terms of Return On Investment (ROI), is often used to measure the relative value of functions, components, features, or even whole systems for business-domain software. So the testing process in this business-domain context can accordingly be defined as a Business Importance Earned process. To measure how quickly a testing strategy can earn the business importance, especially under time and budget constraints, a new metric, Average Percentage of Business Importance Earned (APBIE), is proposed; it will be introduced in detail in Chapter 3.

1.2. Research Contributions

This research is intended to provide the following contributions:
- an investigation and analysis of current software inspection and testing processes;
- a real "Earned Value" system to track the business value of testing and measure testing efficiency in terms of Average Percentage of Business Importance Earned (APBIE);
- a systematic strategy for Value-Based, Dependency-Aware inspection and testing processes;
- the application of this strategy to a series of empirical studies with different granularities of prioritization;
- elaborated decision criteria for testing/inspection priorities per project context, which are helpful and insightful for real industry practice;
- an automated tool for facilitating Value-Based, Dependency-Aware prioritization.

1.3. Organization of Dissertation

The organization of this dissertation is as follows. Chapter 2 presents a survey of related work on Value-Based Software Engineering, software inspection techniques, software testing process strategies, software test case prioritization techniques, and defect removal techniques.
Chapter 3 introduces the methodology of the Value-Based, Dependency-Aware inspection and testing prioritization strategy and process, proposes key performance evaluation measures, and presents the research hypotheses and the methods used to test them. Chapters 4 through 7 introduce the detailed steps and practices for applying the Value-Based, Dependency-Aware prioritization strategy to four typical inspection and testing case studies. For each case study, the project background, case study design, and implementation steps are introduced, a comparative analysis is conducted, and both qualitative and quantitative results and lessons learned are summarized:
- Chapter 4 introduces the prioritization of artifacts to be reviewed in the USC-CSSE graduate-level, real-client course projects for their formal inspection;
- Chapter 5 conducts the prioritization of operational scenarios to be applied at Galorath, Inc. for its performance testing;
- Chapter 6 illustrates the prioritization of features to be tested at a Chinese software company for its functionality testing;
- Chapter 7 presents the prioritization of test cases to be executed in the USC-CSSE graduate-level course projects at the acceptance testing phase.
Chapter 8 explains some threats to validity. Chapters 9 and 10 propose future research work and conclude the contributions of this dissertation.

Chapter 2: A Survey of Related Work

2.1. Value-Based Software Engineering

Value-Based Software Engineering (VBSE) is a discipline that addresses and integrates economic aspects and value considerations into the full range of existing and emerging software engineering principles and practices, processes, activities and tasks, technology, management, and tool decisions in the software development context [Boehm, 2003]. The engine in the center of the "4+1" VBSE structure (Figure 3) is the Success-Critical Stakeholder (SCS) Win-Win Theory W [Boehm, 1988], [Boehm et al., 2007], which addresses what values are important and how success is assured for a given software engineering organization. The four supporting theories that it draws upon are utility theory, decision theory, dependency theory, and control theory, respectively dealing with how important the values are, how stakeholders' values determine decisions, how dependencies affect value realization, and how to adapt to change and control value realization. VBSE key practices include: benefits realization analysis; stakeholder win-win negotiation; business case analysis; continuous risk and opportunity management; concurrent system and software engineering; value-based monitoring and control; and change as opportunity. This process has been integrated with the spiral model of system and software development and evolution [Boehm et al., 2007] and its next-generation system and software engineering successor, the Incremental Commitment Spiral Model [Boehm and Lane, 2007].

Figure 3. The "4+1" Theory of VBSE: overall structure [Boehm and Jain, 2005]

The Value-Based Software Engineering theory is the fundamental theory underlying the proposed value-based inspection and test prioritization strategy. Our strategy is an application of the VBSE theory to the software testing and inspection process. The strategy's mapping to VBSE's "4+1" theory and key practices is shown in Figure 4.

Figure 4. Software Testing Process-Oriented Expansion of VBSE "4+1" Theory and Key Practices

2.2. Software Review Techniques

To date, many focused review or reading methods and techniques have been proposed, practiced, and proven to be superior to unfocused reviews.
The most common one in practice is checklist-based reviewing (CBR) [Fagan, 1976]; others include perspective-based reviewing (PBR) [Basili et al., 1996], [Li et al., 2008], defect-based reading (DBR) [Porter et al., 1995], functionality-based reading (FBR) [Abdelrabi et al., 2004], and usage-based reading (UBR) [Conradi and Wang, 2003], [Thelin et al., 2003]. However, most of them are value-neutral (except UBR) and focus on a single aspect: DBR focuses on defect classification to find defects in artifacts, and a scenario is a key factor in DBR; UBR focuses on prioritizing use cases in order of importance from a user perspective; FBR is proposed to trace framework requirements in order to produce a well-constructed framework and to review the code.

An initial value-based set of peer review guidelines [Lee and Boehm, 2005] works as follows: first, a win-win negotiation among stakeholders defines the priority of each system capability; second, based on the checklists for each artifact, a domain expert determines the criticality of each issue; third, the system capabilities with high priorities are reviewed first; and, at each priority level, the high-criticality sources of risk are reviewed first, as shown in Figure 5. The experiment compared Group A, 15 IV&V personnel using the VBR procedures and checklists, with Group B, 13 IV&V personnel using the previous value-neutral checklists. The initial experiment found a factor-of-two improvement in value added per hour of peer review time, as shown in Table 1.

Figure 5. Value-based Review (VBR) Process [Lee and Boehm, 2005]

Table 1. Comparison Results of Value-based Group A and Value-neutral Group B [Lee and Boehm, 2005]

By Number                        P-value   % Gr A higher
Average of Concerns              0.202     34
Average of Problems              0.056     51
Average of Concerns per hour     0.026     55
Average of Problems per hour     0.023     61

By Impact                                  P-value   % Gr A higher
Average Impact of Concerns                 0.049     65
Average Impact of Problems                 0.012     89
Average Cost Effectiveness of Concerns     0.004     105
Average Cost Effectiveness of Problems     0.007     108
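The two-level ordering in the VBR process above, capability priority first and issue criticality within each priority level, amounts to a simple lexicographic sort. The sketch below only illustrates that ordering rule; the item names and rating values are invented assumptions, not data from the [Lee and Boehm, 2005] experiment.

```python
# A minimal sketch of the two-level VBR ordering described above: capabilities with
# higher stakeholder-negotiated priority are reviewed first, and within each priority
# level the higher-criticality sources of risk come first. Names and ratings are
# illustrative assumptions only.
from dataclasses import dataclass

@dataclass
class ReviewItem:
    name: str
    capability_priority: int  # from the stakeholder win-win negotiation (higher = more important)
    criticality: int          # from the domain expert's checklist-based rating

def vbr_order(items):
    """Sort by capability priority, then by criticality within each priority level."""
    return sorted(items, key=lambda it: (it.capability_priority, it.criticality), reverse=True)

items = [
    ReviewItem("Capability A / issue 1", capability_priority=3, criticality=1),
    ReviewItem("Capability B / issue 2", capability_priority=2, criticality=3),
    ReviewItem("Capability A / issue 3", capability_priority=3, criticality=2),
]
print([it.name for it in vbr_order(items)])
# ['Capability A / issue 3', 'Capability A / issue 1', 'Capability B / issue 2']
```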
As a new contribution to value-based V&V process development, the Value-Based, Dependency-Aware prioritization strategy was then customized into a systematic, multi-criteria process to quantitatively determine the priorities of the artifacts to be reviewed. This process adds Quality Risk Probability, Cost, and Dependency considerations into the prioritization, and it has been successfully applied to USC-CSSE graduate-level, real-client course projects with a statistically significant improvement in review cost-effectiveness, which will be introduced in Chapter 4.

2.3. Software Testing Techniques

Rudolf Ramler outlines a framework for value-based test management [Ramler et al., 2005]; it is a synthesis of the most relevant current processes and a high-level guideline without detailed implementation specifications or empirical validation. Stale Amland introduces a risk-based testing approach [Amland, 1999], which states that resources should be focused on the areas representing the highest risk exposure. However, this method does not consider testing cost, which is also an essential factor in the testing process. Boehm and Huang propose a quantitative risk analysis [Boehm et al., 2004] that helps determine when to stop testing software and release the product under different organizational contexts and different desired quality levels. However, it is a macroscopic empirical data analysis without detailed process guidance.

Other relevant work includes usage-based testing and statistical testing [Cobb and Mills, 1990], [Hao and Mendes, 2006], [Kouchakdjian and Fietkiewicz, 2000], [Musa, 1992], [Walton et al., 1995], [Whittaker and Thomason, 1994], [Williams and Paradkar, 1999]. A usage model characterizes the operational use of a software system; random test cases are then generated from the usage model, statistical testing of the software is performed, any observed failures are recorded, and the test results are analyzed using a reliability model to provide a basis for statistical inference about the reliability of the software in operational use. Statistical testing based on a software usage model ensures that the failures that would occur most frequently in operational use are found early in the testing cycle. However, it does not differentiate failures' impacts or the business importance of the operational usages.

2.4. Software Test Case Prioritization Techniques

Most current test case prioritization (TCP) techniques [Elbaum et al., 2000], [Elbaum et al., 2002], [Elbaum et al., 2004], [Rothermel et al., 1999], [Rothermel et al., 2001] are coverage-based and aim to improve a test suite's rate of fault detection, a measure of how quickly faults are detected within the testing process, in order to get earlier feedback on the System Under Test (SUT). The metric Average Percentage of Faults Detected (APFD) is used to measure how quickly the faults are identified for a given test suite. These TCP techniques are all based on coverage of statements or branches in the programs, assuming that all statements or branches are equally important, all faults have equal severity, and all test cases have equal costs. An example of coverage-based test case prioritization is shown in Figure 6.

Figure 6. Coverage-based Test Case Prioritization [Rothermel et al., 1999]

S. Elbaum proposed a new "cost-cognizant" metric, APFDc, for assessing the rate of fault detection of prioritized test cases that incorporates varying test case and fault costs [Elbaum et al., 2001], [Malishevsky et al., 2006]; it rewards test case orders proportionally to their rate of "unit-of-fault-severity-detected-per-unit-test-cost". By incorporating context and lifetime factors, improved cost-benefit models are provided for use in assessing regression testing methodologies and the effects of time constraints on the costs and benefits of prioritization techniques [Do and Rothermel, 2006], [Do et al., 2008], [Do and Rothermel, 2008]. However, he did not incorporate failure probability into the prioritization. H. Srikanth presented a requirement-based, system-level test case prioritization technique called the Prioritization of Requirements for Test (PORT), based on requirements volatility, customer priority, implementation complexity, and fault proneness of the requirements, to improve the rate of detection of severe faults, measured by the Average Severity of Faults Detected (ASFD); however, she did not consider the cost of testing in the prioritization. More recently, there has been a group of related work on fault-proneness-based test prioritization built on failure prediction; the most representative is CRANE [Czerwonka et al., 2011], a failure prediction, change risk analysis, and test prioritization system at Microsoft Corporation that leverages existing research [Bird et al., 2009], [Eaddy et al., 2008], [Nagappan et al., 2006], [Pinzger et al., 2008], [Srivastava and Thiagarajan, 2002], [Zimmermann and Nagappan, 2008] for the development and maintenance of Windows Vista.
It prioritizes the selected tests by the ratio of "changed blocks covered per test cost unit" [Czerwonka et al., 2011]. Their test prioritization is mainly based on program change analysis in order to estimate the more fault-prone parts; however, program change is only one factor that influences failure probability, and other factors, e.g., personnel qualification and module complexity, should influence the prediction of failure probability as well. Besides, it does not consider the business value from customers or the different importance levels of modules and defects. Some other fault/failure prediction work that identifies the fault-prone components in a system [58-60] is also relevant to our work. Other related work on test case prioritization can be found in recent systematic reviews [Roongruangsuwan and Daengdej, 2010], [Yoo and Harman, 2011], [Zhang et al., 2009].

In our research, a new metric, Average Percentage of Business Importance Earned (APBIE), is proposed to measure how quickly the SUT's value is realized for a given test suite, or how quickly the business importance can be earned by testing, in a VBSE environment. The definition of APBIE will be introduced in detail in Chapter 3.

Comparison among TCP techniques

Most of the current test case prioritization techniques [Elbaum et al., 2000, 2001, 2002, 2004], [Malishevsky et al., 2006], [Do and Rothermel, 2006], [Do and Rothermel, 2008], [Do et al., 2008], [Rothermel et al., 1999], [Rothermel et al., 2001], [Srikanth et al., 2005] operate under the prerequisite that which test cases will expose which faults is known, and they aim to improve the rate of fault detection. In order to predict defect proneness and support more practical test case prioritization, current research in this field tends to develop various defect prediction techniques that serve as the basis for test prioritization [Bird et al., 2009], [Czerwonka et al., 2011], [Eaddy et al., 2008], [Emam et al., 2001], [Nagappan et al., 2006], [Ostrand et al., 2005, 2007], [Pinzger et al., 2008], [Srivastava and Thiagarajan, 2002], [Zimmermann and Nagappan, 2008].

In order to call for more attention to value considerations in current test case prioritization techniques, we use a simple example, shown in Table 2, from Rothermel's paper [Rothermel et al., 1999] (which can also be taken as representative of other, similar coverage-based TCP techniques) and construct two situations for this example, displayed in Table 3. Although these two situations are emulated, they can represent most real situations.

Table 2. Test Suite and List of Faults Exposed [Rothermel et al., 1999]
The table lists five test suites, A through E, against ten faults, 1 through 10, marking which faults each suite exposes: A exposes two of the faults, B four, C seven, D one, and E three.

Rothermel's test case prioritization technique operates under the prerequisite that which test cases will expose which faults is known. Based on Rothermel's method, the testing order would be C-E-B-A-D; however, this prioritization does not differentiate the business importance of each test suite, so let us make some assumptions to show what it can result in when the business importance of each test suite is known. Let us assume that each test suite's business importance is independent of the faults seeded in Table 2. The business importance comes from the customer's value perspective on the relevant features that those test suites represent.
Table 3. Business Importance Distribution (Two Situations)

             Situation 1 (Best Case)          Situation 2 (Worst Case)
Test suite   BI        Accumulated BI         BI        Accumulated BI
C            50%       50%                    5%        5%
E            20%       70%                    10%       15%
B            15%       85%                    15%       30%
A            10%       95%                    20%       50%
D            5%        100%                   50%       100%
APBIE                  80%                              40%

Situation 1: If we are lucky enough (the possibility is very low in reality) that the business importance distribution of the five test suites is as shown under Situation 1 in Table 3, then C-E-B-A-D is also the testing order produced by value-based TCP, so the PBIE curves for our method and Rothermel's overlap, as shown in Figure 7. This testing order is optimal for both the rate of business importance earned and the rate of fault detection.

Figure 7. Comparison under Situation 1

Situation 2: If the business importance distribution of the five test suites is as shown under Situation 2 in Table 3, C-E-B-A-D is Rothermel's TCP order, with APBIE = 40%, whereas our value-based method's TCP order is D-A-B-E-C, with APBIE = 80%, as shown in Figure 8. So, in this situation, our method improves testing efficiency by a factor of two in terms of APBIE compared with Rothermel's method.

Figure 8. Comparison under Situation 2

The comparison shows that it is possible, though extremely unlikely, for Rothermel's testing order to coincide with the value-based order, and that most of the time its APBIE is lower than that of our value-based TCP technique, because the two techniques optimize different goals: our method aims to improve APBIE, while his method aims to improve the rate of fault detection.
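The APBIE values in Table 3 can be reproduced with a short calculation. The formal definition of APBIE is given in Chapter 3; the sketch below assumes the natural reading used in this example, namely that PBIE after each executed test is the accumulated business importance earned so far divided by the total, and APBIE is the mean of those PBIE values over the test order. With that assumption it reproduces the 40% and 80% figures above.

```python
# A minimal sketch, not the dissertation's formal Chapter 3 definition: PBIE after the
# i-th executed test is the accumulated business importance divided by the total BI,
# and APBIE is the mean of those PBIE values over the whole test order.
def apbie(order, bi):
    """order: test suites in execution order; bi: business importance of each suite."""
    total = sum(bi.values())
    earned, pbie_values = 0.0, []
    for test in order:
        earned += bi[test]
        pbie_values.append(earned / total)
    return sum(pbie_values) / len(pbie_values)

# Business importance percentages from Situation 2 of Table 3.
bi_situation2 = {"C": 5, "E": 10, "B": 15, "A": 20, "D": 50}
print(round(apbie(["C", "E", "B", "A", "D"], bi_situation2), 2))  # 0.4  (coverage-based order)
print(round(apbie(["D", "A", "B", "E", "C"], bi_situation2), 2))  # 0.8  (value-based order)
```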
Besides, a comprehensive comparison among the state-of-the-art TCP techniques is shown in Table 4. The prioritization algorithm is essentially the same for all of them: they use the greedy algorithm or its variants, first picking the best candidate and making the locally optimal choice at each step in order to approach the global optimum. However, the selection goals differ: for Rothermel's method, the goal is to pick the candidate that can expose the most faults, while for our method, the goal is to pick the candidate that represents the highest testing value. Rothermel's test case prioritization aims to improve the rate of fault detection, measured by the Average Percentage of Faults Detected (APFD), while our method aims to improve the rate of business importance earned, measured by the Average Percentage of Business Importance Earned (APBIE).

Table 4. Comparison of TCP techniques

Rothermel et al., 1999 (coverage-based):
  Prioritization algorithm: greedy.
  Goal: maximize the rate of faults detected.
  Measure: APFD (Average Percentage of Faults Detected).
  Assumption: which test case will expose which (deliberately seeded) faults is known.
  Practical: infrequently, because of the assumption above.
  Factors considered: risk size: no; risk probability: no; cost: no; dependency: no.

Elbaum et al., 2001 (coverage-based):
  Prioritization algorithm: greedy.
  Goal: maximize the rate of "unit-of-fault-severity-detected-per-unit-test-cost".
  Measure: APFDc (Average Percentage of Faults Detected, incorporating testing cost).
  Assumption: which test case will expose which (deliberately seeded) faults is known.
  Practical: infrequently, because of the assumption above.
  Factors considered: risk size: partial (considers defect severity); risk probability: no; cost: yes; dependency: no.

Srikanth et al., 2005:
  Prioritization algorithm: greedy.
  Goal: maximize the rate of severity of faults detected.
  Measure: ASFD (Average Severity of Faults Detected).
  Assumption: which test case will expose which (deliberately seeded) faults is known.
  Practical: infrequently, because of the assumption above.
  Factors considered: risk size: partial (considers the customer-assigned priority); risk probability: partial (considers requirement change, complexity, and fault proneness); cost: no; dependency: no.

Czerwonka et al., 2011 (defect-proneness based):
  Prioritization algorithm: NA.
  Goal: maximize the chances of finding defects in the changed code.
  Measure: FRP (Fix Regression Proneness).
  Assumption: none.
  Practical: yes.
  Factors considered: risk size: no; risk probability: partial (mainly considers code change impact via the version control system); cost: no; dependency: no.

Our method (value-based):
  Prioritization algorithm: greedy.
  Goal: maximize the rate of business importance earned.
  Measure: APBIE (Average Percentage of Business Importance Earned).
  Assumption: none.
  Practical: yes.
  Factors considered: risk size: yes; risk probability: yes; cost: yes; dependency: yes.

As an additional application of the Value-Based, Dependency-Aware strategy, we recently experimented with a more systematic value-based prioritization of a set of test cases to be executed for acceptance and regression testing on the USC-CSSE graduate-level, real-client course projects, with improved testing efficiency and effectiveness; this will be introduced in Chapter 7. Our prioritization is more systematic because it synthetically considers the business importance from the customers' perspective, the failure probability, the execution cost, and the dependencies among test cases.

2.5. Defect Removal Techniques Comparison

The efficiencies of review and testing are compared in the COnstructive QUALity MOdel (COQUALMO) [Boehm et al., 2000]. To determine the Defect Removal Fractions (DRFs) associated with each of the six levels (Very Low, Low, Nominal, High, Very High, Extra High) of the three profiles (automated analysis; people reviews; execution testing and tools) for each of the three types of defect artifacts (requirements defects, design defects, and code defects), a two-round Delphi was conducted. This study found that people reviews are the most efficient at removing requirements and design defects, and that testing is the most efficient at removing code defects. Madachy and Boehm extended their previous work on COQUALMO and assessed software quality processes with the Orthogonal Defect Classification COnstructive QUALity MOdel (ODC COQUALMO), which predicts defects introduced and removed, classified by ODC types [Chillarege et al., 1992], [Madachy and Boehm, 2008]. A comprehensive Delphi survey was used to capture more detailed efficiencies of the techniques (automated analysis; execution testing and tools; and peer reviews) against ODC defect categories as an extension of the previous work [Boehm et al., 2000].

In [Jones, 2008], Capers Jones lists the Defect Removal Efficiency of 16 combinations of 4 defect removal methods: design inspections, code inspections, quality assurance, and testing. These results show, on the one hand, that no single defect removal method is adequate and, on the other hand, imply that removal efficiency, from best to worst, would be design inspections, code inspections, testing, and quality assurance. However, all of the above defect removal technique comparison work is based on Delphi surveys and still lacks quantitative data evidence from industry.

Based on experience from the manufacturing area that has been brought to the software domain, and on software reliability models that predict future failure behavior, S. Wagner presents a model for the quality economics of defect-detection techniques [Wagner and Seifert, 2005]. This model is proposed to estimate the effects of a combination of techniques and to remove such influences when evaluating a single technique. However, it is a theoretical model without real-industry data validation. More recently, Frank Elberzhager presented an integrated two-stage inspection and testing process at the code level [Elberzhager et al., 2011].
In particular, defect results from an inspection are used in a two-stage manner: first, to prioritize the parts of the system that are defect-prone, and then to prioritize the defect types that appear often. However, this combined prioritization mainly uses defects detected in inspection to estimate failure probability in order to prioritize testing activities, without comparing the efficiency of inspection, testing, or other defect removal techniques by defect type. We plan to collect real industry project data to compare the defect removal techniques' efficiency based on RRL and to further calibrate ODC COQUALMO, and then to select or combine defect removal techniques by defect type to optimize scarce inspection and testing resources; this will be discussed in Chapter 9 as our next-step work.

Chapter 3: Framework of Value-Based, Dependency-Aware Inspection and Test Prioritization

This chapter introduces the methodology of the Value-Based, Dependency-Aware inspection and testing prioritization strategy and process, proposes key performance evaluation measures, and presents the research hypotheses and the methods used to test them.

3.1. Value-Based Prioritization

The systematic and comprehensive value-based, risk-driven inspection and testing prioritization strategy, proposed to improve their cost-effectiveness, is shown in Figure 9.

Figure 9. Overview of Value-based Software Testing Prioritization Strategy

It illustrates the methodology of value-based inspection and testing prioritization, composed of four main consecutive parts: the prioritization drivers, which deal with what the project success-critical factors are and how they influence software inspection and testing; the prioritization strategy, which deals with how to make optimal trade-offs among those drivers; the prioritization case studies, which deal with how to apply the value-based prioritization strategy in practice, especially in industry contexts, and which are introduced in detail in Chapters 4 through 7; and the prioritization evaluation, which deals with how to track the business value of inspection and testing and measure their cost-effectiveness. These four questions, one from each part, are answered and explained in turn.

3.1.1. Prioritization Drivers

Most current testing prioritization strategies focus on optimizing a single goal: for example, coverage-based test prioritization aims to maximize the testing coverage per unit of testing time, and risk-driven testing aims to detect the most fault-prone parts at the earliest time. Besides, little research work incorporates business or mission value into the prioritization. In order to build a systematic and comprehensive prioritization mechanism, the prioritization should take all project success-critical factors into consideration, i.e., business or mission value, testing cost, defect criticality, and defect proneness; for some business-critical projects, time to market should also be added to the prioritization. The value-based prioritization drivers should include the following:

3.1.1.1. Stakeholder Prioritization

The first step of value-based inspection and testing is to identify the Success-Critical Stakeholders (SCSs) and to understand the roles they play during the inspection and testing process and their respective win conditions. The direct stakeholders of testing are the testing team, especially the testing manager, and the developers and project managers who directly interact with the testing team.
In the spirit of value-based software engineering, important parties for testing are the key customers, as the source of the value objectives that set the context and scope of testing. Marketing and product managers assist in testing for planning releases, pricing, promotion, and distribution. We will look at the following factors, which must be considered when prioritizing the testing order of new features and which represent the SCSs' win conditions:

3.1.1.2. Business/Mission Value

Business or mission value is captured by business case analysis with the prioritization of success-critical stakeholder value propositions. The Business Importance of having the features indicates to what extent mutually agreed requirements are satisfied and to what extent the software meets key customers' value propositions. CRACK (Collaborative, Representative, Authorized, Committed, and Knowledgeable) [Boehm and Turner, 2003] customer representatives are the source of the features' relative business importance. Only if their most valuable propositions or requirements have been understood clearly, developed correctly, tested thoroughly, and delivered in a timely fashion can the project be considered successful. Under this situation, CRACK customer representatives are most likely to be collaborative and knowledgeable enough to provide the relative business importance information.

3.1.1.3. Defect Criticality

Defect criticality is captured by measuring the impact of the absence of an expected feature, of not achieving a performance requirement, or of the failure of a test case. Combined with the business or mission value, it serves as the other factor determining the Size of Loss, as shown in Figure 9.

3.1.1.4. Defect Proneness

Defect proneness is captured by expert estimation based on historical data or past experience, design or implementation complexity, the qualifications of the responsible personnel, code change impact analysis, etc. The quality of the software product is another success-critical factor that needs to be considered in the testing process. The focus of quality risk analysis is on identifying and eliminating risks that are potential value breakers and inhibit value achievement. Quality risk information can help the testing manager with risk management, progress estimation, and quality management. Testing managers are interested in the identification of problems, particularly the problem trends that help to estimate and control the testing process. Risk identification and analysis also provides the development manager with potential process improvement opportunities to mitigate project risks in the future. So both the testing manager and the development team are willing to collaborate on the quality risk analysis.

3.1.1.5. Testing or Inspection Cost

Testing or inspection cost is captured by expert estimation based on historical data or past experience, or by state-of-the-art testing cost estimation techniques or tools. Testing cost is an investment in software development and should also be considered seriously during the testing process. This becomes more crucial when time-critical deliverables are required, e.g., when time-to-market greatly influences market share. If most of the testing effort is put into testing features, test cases, or scenarios with relatively low business importance, the product will lose more market share, leading to decreasing customer profits, even negative profits in the worst case. Testing managers are therefore interested in making the testing process more efficient by putting more effort on the features with higher business importance.

3.1.1.6. Time-to-Market

Time-to-market can greatly influence the effort distribution of software development and project planning. Because the testing phase is the phase immediately before software product transition and delivery, it is influenced even more by market pressure [Yang et al., 2008]. Sometimes, in an intense market competition situation, sacrificing some software quality to avoid more market share erosion might be a good organizational strategy. Huang and Boehm [Huang and Boehm, 2006] propose a value-based software quality model that helps to answer the question "How much testing is enough?" in three types of organizational contexts: early start-up, commercial, and high finance. For example, an early start-up will have a much higher risk impact due to market share erosion than the other two. Thus, a better strategy for an early start-up is to deliver a lower-quality product rather than invest in quality beyond the threshold of negative returns due to market share erosion. Marketing and product managers help to provide the market information and assist in testing for planning releases, pricing, promotion, and distribution.

3.1.2. Value-Based Prioritization Strategy

The value-based inspection and testing prioritization strategy synthetically considers business importance from the client's value perspective, combined with the criticality of failure occurrence, as a measure of the size of loss at risk. For each test item (e.g., an artifact, testing feature, testing scenario, or test case), the probability of loss is the probability that the given test item would catch a defect, estimated from an experience base that indicates defect-prone components or performers. Since Size(Loss) x Probability(Loss) = Risk Exposure (RE), the test items can be ranked by how well they reduce risk exposure. Combining their risk exposures with their relative testing costs enables the test items to be prioritized in terms of Return On Investment (ROI) or Risk Reduction Leverage (RRL), where RRL is defined as follows [Selby, 2007]:

RRL = (RE_before - RE_after) / Risk Reduction Cost

where RE_before is the risk exposure before initiating the risk reduction effort and RE_after is the risk exposure afterwards. Thus, RRL serves as the engine for the testing prioritization and is a measure of the relative cost-benefit ratio of performing various candidate risk reduction activities, e.g., testing in this case study.
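To make the ranking concrete, the sketch below computes risk exposure and RRL for a few candidate test items and orders them by RRL. The item names, loss sizes, probabilities, and costs are illustrative assumptions, not data from the case studies, and the residual exposure after testing is assumed to be zero for simplicity.

```python
# A minimal sketch of RRL-based ranking with illustrative (not case-study) numbers.
# Size(Loss) combines business importance and defect criticality; Probability(Loss) is
# the estimated chance that the item would catch a defect; RE = Size(Loss) * Probability(Loss).
# RRL = (RE_before - RE_after) / cost, with RE_after assumed to be zero here.
from dataclasses import dataclass

@dataclass
class TestItem:
    name: str
    size_of_loss: float   # business importance combined with defect criticality
    prob_of_loss: float   # estimated defect-proneness, between 0 and 1
    cost: float           # relative testing/inspection cost

    def risk_exposure(self) -> float:
        return self.size_of_loss * self.prob_of_loss

    def rrl(self, re_after: float = 0.0) -> float:
        return (self.risk_exposure() - re_after) / self.cost

items = [
    TestItem("Feature X scenario", size_of_loss=9.0, prob_of_loss=0.6, cost=2.0),
    TestItem("Feature Y scenario", size_of_loss=4.0, prob_of_loss=0.9, cost=1.0),
    TestItem("Feature Z scenario", size_of_loss=7.0, prob_of_loss=0.2, cost=0.5),
]
for item in sorted(items, key=lambda it: it.rrl(), reverse=True):
    print(f"{item.name}: RE={item.risk_exposure():.2f}, RRL={item.rrl():.2f}")
```

With these assumed numbers, the cheap, moderately risky item Y outranks the high-exposure but expensive item X, which is exactly the cost-benefit trade-off RRL is meant to capture.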
Testing managers are therefore interested in making the testing process more efficient by putting more effort on the features with higher business importance.

3.1.1.6. Time-to-Market

Time-to-market can greatly influence the effort distribution of software development and project planning. Because the testing phase is the phase adjacent to software product transition and delivery, it is influenced even more by market pressure [Yang et al., 2008]. Sometimes, in an intense market competition situation, sacrificing some software quality to avoid further market share erosion might be a good organizational strategy. Huang and Boehm [Huang and Boehm, 2006] propose a value-based software quality model that helps to answer the question "How much testing is enough?" in three types of organizational contexts: early start-up, commercial, and high finance. For example, an early start-up will have a much higher risk impact due to market share erosion than the other two. Thus, a better strategy for an early start-up is to deliver a lower quality product rather than invest in quality beyond the threshold of negative returns due to market share erosion. Marketing and product managers help to provide the market information and assist in testing for planning releases, pricing, promotion, and distribution.

3.1.2. Value-Based Prioritization Strategy

The value-based inspection and testing prioritization strategy synthetically considers business importance from the client's value perspective, combined with the criticality of a failure occurrence, as a measure of the size of loss at risk. For each test item (e.g., artifact, testing feature, testing scenario, or test case), the probability of loss is the probability that the given test item would catch a defect, estimated from an experience base that indicates defect-prone components or performers. Since Size (Loss) * Probability (Loss) = Risk Exposure (RE), the test items can be ranked by how well they reduce risk exposure. Combining their risk exposures with their relative testing costs enables the test items to be prioritized in terms of Return On Investment (ROI) or Risk Reduction Leverage (RRL), where the quantity of Risk Reduction Leverage (RRL) is defined as follows [Selby, 2007]:

RRL = (RE_before - RE_after) / Risk Reduction Cost

where RE_before is the RE before initiating the risk reduction effort and RE_after is the RE afterwards. Thus, RRL serves as the engine for the testing prioritization and is a measure of the relative cost-benefit ratio of performing various candidate risk reduction activities, e.g., testing in this case study.

3.2. Dependency-Aware Prioritization

In our case studies, two types of dependencies are dealt with: "Loose Dependencies" and "Tight Dependencies". Their definitions, typical examples, and our solutions for them are introduced below.

3.2.1. Loose Dependencies

"Loose Dependencies" are defined as follows: it is acceptable to continue a task without awareness of its dependencies, but the task is done better with that awareness. The typical case is the dependencies among the artifacts to be reviewed in the inspection process. For example, Figure 10 illustrates the dependencies among four artifacts to be reviewed for CSCI577ab course projects: the System and Software Requirement Description (SSRD), the System and Software Architecture Description (SSAD), the Acceptance Testing Plan and Cases (ATPC), and the Supporting Information Description (SID). Although they are course artifacts, they also represent typical requirement, design, test, and other supporting documents in real industrial projects.
As shown in Figure 10, SSRD is the requirements document and usually can be reviewed directly. In order to review the use cases and UML diagrams in SSAD, or the test cases in ATPC, it is better to review the requirements in SSRD first, at least to check whether those use cases, UML diagrams, and test cases cover all the requirements in SSRD; so SSAD and ATPC depend on SSRD, as the arrows in Figure 10 illustrate. SID maintains the traceability matrices among the requirements in SSRD, the use cases in SSAD, and the test cases in ATPC, so it is better to have all the requirements, use cases, and test cases in hand when reviewing the traceability; SID therefore depends on all three other artifacts. However, nothing blocks a reviewer from going ahead to review SSAD or ATPC without reviewing SSRD, or to review SID without referencing all the other artifacts. We call this type of dependency "loose dependencies".

Figure 10. An Example of Loose Dependencies

Basically, the more artifacts a document depends on, the higher its Dependency rating is, and the lower its reviewing priority will be, which can be represented by the formula below (consistent with the priority calculation used in the Chapter 4 case study):

Priority = (Importance * Quality Risk) / (Review Cost * Dependency)

In order to quantify the loose dependency and add it to the review priority calculation, Table 5 displays a simple example. The number of artifacts a document depends on is counted, the qualitative ratings Low, Moderate, and High are mapped to it, and the numeric values (1, 2, 3) are used in calculating the priority. Other numeric values, e.g., (1, 5, 10) or (1, 2, 4), can also be used if necessary. The case study in Chapter 4 introduces in more detail how this type of loose dependency is incorporated into the value-based prioritization.

Table 5. An Example of Quantifying Dependency Ratings
Artifact | # of artifacts depended on | Dependency Rating | Numeric Value
SSRD | 0 | Low | 1
SSAD, ATPC | 1 | Moderate | 2
SID | 3 | High | 3

3.2.2. Tight Dependencies

"Tight Dependencies" are defined as follows: a successor task has to wait until all its precursor tasks finish, and the failure of a precursor blocks the successor. The typical case is the dependencies among the test cases to be executed during the testing process.

Figure 11. An Example of Tight Dependencies

Figure 11 illustrates a simple dependency tree among 7 test cases (T1-T7). Each node represents a test case, and the numeric value in each node represents the RRL of that test case. If T1 fails to pass, it will block all the other test cases that depend on it, e.g., T3, T4, T5, T6 and T7; we call this type of dependency "tight dependencies". A prioritization algorithm, a variant of the greedy algorithm, is proposed to deal with this type of dependency: it first selects the test case with the highest RRL and checks whether it depends on other test cases; if it has dependencies, the algorithm recursively selects, within its dependency set, the test case with the highest RRL, until it reaches one with no dependencies. The detailed algorithm and prioritization logic will be introduced in Chapter 7. For the 7 test cases in Figure 11, according to the algorithm, T2, T5 and T6 have the highest RRL with the value of 9. However, T6 depends on T3 and T1, and T5 depends on T1, while T2 has no dependencies and can be executed directly. So T2 is the first test case to be executed. Since both T5 and T6 depend on T1, T1 is tested next in order to unblock those high-payoff test cases. After T1 passes, T5, with the highest RRL, is unblocked and ready for testing. Recursively running the algorithm results in the order T2 -> T1 -> T5 -> T3 -> T6 -> T4 -> T7.
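A minimal sketch of this greedy selection logic is shown below (in Python). The RRL values of T1, T3, T4, and T7 and some of the dependency edges are not given in the text, so those used here are illustrative assumptions chosen to be consistent with the Figure 11 example; the function names are our own.

# Sketch of the dependency-aware greedy prioritization described above.
# T2/T5/T6 = 9, T5 -> T1 and T6 -> {T1, T3} come from the Figure 11 example;
# the remaining RRL values and edges are illustrative assumptions.
rrl = {"T1": 5, "T2": 9, "T3": 7, "T4": 6, "T5": 9, "T6": 9, "T7": 4}
deps = {  # test case -> set of test cases it directly depends on
    "T1": set(), "T2": set(), "T3": {"T1"}, "T4": {"T1"},
    "T5": {"T1"}, "T6": {"T1", "T3"}, "T7": {"T4"},
}

def prioritize(rrl, deps):
    executed, order = set(), []
    remaining = set(rrl)

    def unblocked(t):
        # True if all of t's precursors have already passed.
        return deps[t] <= executed

    while remaining:
        # Greedily pick the highest-RRL remaining test case, preferring unblocked ones.
        target = max(remaining, key=lambda t: (rrl[t], unblocked(t), t))
        # If it is blocked, walk down its dependency set toward the highest-RRL precursor.
        while not unblocked(target):
            target = max(deps[target] - executed, key=lambda t: (rrl[t], t))
        order.append(target)        # execute it; here we assume it passes
        executed.add(target)
        remaining.remove(target)
    return order

print(prioritize(rrl, deps))  # ['T2', 'T1', 'T5', 'T3', 'T6', 'T4', 'T7']

Under these assumptions the sketch reproduces the order given in the text, T2 -> T1 -> T5 -> T3 -> T6 -> T4 -> T7.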
More test case prioritizations for real projects will be introduced and illustrated in Chapter 7.

3.3. The Process of Value-Based, Dependency-Aware Inspection and Testing

Figure 12 displays the benefits chain for the value-based testing process implementation, including all of these SCSs' roles and their win conditions, if we consider software testing as an investment over the whole software life cycle.

Figure 12. Benefits Chain for Value-based Testing Process Implementation

Figure 13 illustrates the whole process of this value-based software testing method. This method helps the test manager consider all the win conditions from the SCSs, enact the testing plan, and adjust it during testing execution. The main steps are as follows:

Figure 13. Software Testing Process-Oriented Expansion of "4+1" VBSE Framework

Step 1: Define Utility Functions of Business Importance, Quality Risk Probability and Cost. After identifying the SCSs and their win conditions, the next step is to understand and create the single utility function for each win condition and how they influence the SCSs' value propositions. With the assistance of the key CRACK customer, the testing manager uses a method first proposed by Karl Wiegers [Wiegers, 1999] to get the relative Business Importance of each feature. The development manager and the test manager, accompanied by some experienced developers, estimate the quality risk probability of each feature. The test manager, together with the development team, estimates the testing cost of each feature. This step brings the stakeholders together to consolidate their value models and to negotiate testing objectives. It is in line with the Dependency and Utility Theory in VBSE, which helps to identify all of the SCSs and understand how the SCSs want to win.

Step 2: Testing Prioritization Decision for the Testing Plan. Business importance, quality risk, and testing cost are then put together to calculate a value priority number in terms of RRL for each item to be prioritized, e.g., artifact, scenario, feature, or test case. This is a multi-objective decision and negotiation process which follows the Decision Theory in VBSE. The features' value priorities help the test manager enact the testing plan, and resources should be focused on the areas representing the most important business value, the lowest testing cost, and the highest quality risk.

Step 3: Control the Testing Process according to Feedback. During the testing process, each item's value priority in terms of RRL is adjusted according to the feedback from quality risk indicators and updated testing cost estimates. This step assists in controlling progress toward SCS win-win realization, in accordance with the Control Theory of VBSE.

Step 4: Determine How Much Testing is Enough under Different Market Patterns. One of the strengths of the "4+1" VBSE Dependency Theory is to uncover factors that are external to the system but can impact the project's outcome. It serves to align the stakeholder values with the organizational context. Market factors influence organizations to different extents in different organizational contexts. A comparative analysis is done in Chapter 6 for different market patterns, and the result shows that the value-based software testing method is especially effective when the market pressure is very high.

3.4. Key Performance Evaluation Measures

3.4.1. Value and Business Importance
Some of the dictionary definitions of "value" (Webster, 2002) are in purely financial terms, such as "the monetary worth of something: marketable price." However, the value-based software engineering community uses the broader dictionary definition of "value" as relative worth, utility, or importance to help address software engineering decisions. In our research, we usually use relative Business Importance to capture the client's business value.

3.4.2. Risk Reduction Leverage

The quantity of Risk Exposure (RE) is defined by:

RE = Prob (Loss) * Size (Loss)

where Size (Loss) is the risk impact, i.e., the size of loss if the outcome is unsatisfactory, and Prob (Loss) is the probability of an unsatisfactory outcome. The quantity of Risk Reduction Leverage (RRL) is defined as:

RRL = (RE_before - RE_after) / Risk Reduction Cost

where RE_before is the RE before initiating the risk reduction effort and RE_after is the RE afterwards. Thus, RRL is a measure of the relative cost-benefit ratio of performing various candidate risk reduction or defect removal activities. RRL serves as the engine of the prioritization strategy for different applications to improve the cost-effectiveness of defect removal activities. How its quantities are obtained can differ per application, project context, and scenario. For example, to quantify the effectiveness of a review, Review Cost Effectiveness, defined below, is a variant of RRL under the condition that the defects detected are 100% resolved and removed, which drops Prob (Loss) from 100% to 0%:

Review Cost Effectiveness = Review Effectiveness / Review Effort

where Review Effectiveness is the total Impact of the issues found, and an issue's Impact is the product of its Severity and Priority (see Chapter 4).

3.4.3. Average Percentage of Business Importance Earned (APBIE)

This metric is defined to measure how quickly the SUT's value is realized by testing. Let T be the whole test suite for the SUT, containing m test items; let T' be a selected and prioritized subset of T, containing the n test items that will be executed; and let i denote the ith test item in the test order of T'. It is obvious that T' ⊆ T and n ≤ m. The Total Business Importance (TBI) of T is the sum of the business importance (BI) of all m test items:

TBI = BI_1 + BI_2 + ... + BI_m

After the business importance of the m test items has been rated, TBI is a constant. The Initial Business Importance Earned (IBIE) is the sum of the business importance of the test items in the set T - T'; it is 0 when T = T'. The Percentage of Business Importance Earned (PBIE_i) when the ith test item in the test order T' has passed is:

PBIE_i = (IBIE + BI_1 + BI_2 + ... + BI_i) / TBI

The Average Percentage of Business Importance Earned (APBIE) is defined as:

APBIE = (PBIE_1 + PBIE_2 + ... + PBIE_n) / n

APBIE is used to measure how quickly the SUT's value is realized: the higher it is, the more efficient the testing is, and it serves as another important metric for measuring the cost-effectiveness of testing.

3.5. Hypotheses and Methods to Test Them

A series of hypotheses are defined to be tested. For the value-based review process for prioritizing artifacts, the core hypothesis is:
H-r1: the review cost effectiveness of concerns/problems on the same artifact package does not differ between the value-based groups (2010 and 2011 teams) and the value-neutral one (2009 teams).
Other auxiliary hypotheses include:
H-r2: the number of concerns/problems reviewers found does not differ between groups;
H-r3: the impact of concerns/problems reviewers found does not differ between groups; etc.
Basically, concern/problem data based on the defined metrics are collected from the tailored Bugzilla system and consolidated. Then their means and standard deviations are compared, and the T-test and F-test are used to test whether those hypotheses can be accepted or rejected.
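Since the testing hypotheses below are stated in terms of APBIE, a minimal computational sketch of the metric defined in Section 3.4.3 may be helpful (Python; the business importance values and the selected test order are made-up illustrations rather than project data):

# Sketch of the APBIE calculation from Section 3.4.3; values are illustrative only.
bi = {"A": 30, "B": 20, "C": 10, "D": 5}   # BI of all m test items in T
selected_order = ["A", "B", "C"]           # prioritized subset T' actually executed

def apbie(bi, selected_order):
    tbi = sum(bi.values())                                            # TBI
    ibie = sum(v for k, v in bi.items() if k not in selected_order)   # BI of T - T'
    earned, pbies = ibie, []
    for item in selected_order:            # assume each executed item passes
        earned += bi[item]
        pbies.append(earned / tbi)         # PBIE_i
    return sum(pbies) / len(pbies)         # APBIE

print(round(apbie(bi, selected_order), 3))  # 0.795 for these illustrative values

Reordering selected_order so that higher-BI items come first raises APBIE, which is exactly the behavior the metric is meant to reward.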
For the value-based prioritization of scenarios/features/test cases, the core hypothesis is:
H-t1: the value-based prioritization does not increase APBIE.
Other auxiliary hypotheses include:
H-t2: the value-based prioritization does not lead high-impact defects to be detected earlier in the acceptance testing phase;
H-t3: the value-based prioritization does not increase "Delivered Value when Cost is Fixed" or does not save "Cost when Delivered Value is Fixed" under time constraints.
To test H-t1 and H-t3, we compare the value-based testing case studies with value-neutral ones. Their means and standard deviations are compared, and the T-test and F-test are used to test whether those hypotheses can be accepted or rejected. To test H-t2, we observe the issues reported in the Bugzilla system to check whether issues with high priority and high severity are reported at the early stage of the acceptance phase. Besides, the strategy's application beyond USC real-client course projects to other real industry projects can further test these hypotheses. Furthermore, qualitative methods, such as surveys and interviews, are also used in our case studies to complement the quantitative results.

The Value-Based, Dependency-Aware prioritization strategy has been empirically studied and applied to defect removal activities at different prioritization granularity levels, as summarized in Table 6:
prioritization of artifacts to be reviewed, on USC-CSSE graduate-level real-client course projects, for formal inspection;
prioritization of operational scenarios to be applied, at Galorath, Inc., for performance testing;
prioritization of features to be tested, at a Chinese software company, for functionality testing;
prioritization of test cases to be executed, on USC-CSSE graduate-level course projects, in the acceptance testing phase.

Table 6. Case Studies Overview
Case Study | Defect Removal Activity | Items to be Prioritized | Granularity | Prioritization Drivers: Business Value | Risk Probability | Testing Cost | Dependency
I: USC course projects | Inspection | Artifacts to be reviewed | High-level | Impacts to Project | Rating | Rating | Yes
II: Galorath, Inc. | Performance Testing | Operational Scenarios to be applied | High-level | Frequency of Use | Rating | Rating | No
III: ISCAS project | Functionality Testing | Features to be tested | Medium-level | Benefit + Penalty | Rating | Rating | No
IV: USC course projects | Acceptance Testing | Test Cases to be executed | Low-level | Feature BI + Testing Aspect | Rating | Assume equal | Yes

These four typical case studies cover the most commonly used defect removal activities in the software development life cycle. Although the prioritization strategies for all of them are triggered by RRL, the ways of obtaining the priorities and dependencies of the items to be prioritized differ per defect removal activity type and project context. For example, the business case analysis can be implemented with various methods, considering their ease of use and adaptation to the experiments' environments. In the case study of value-based testing scenario prioritization in Chapter 5, we use frequency of use (FU) combined with product importance as a variant of business importance for operational scenarios; in the case study of value-based feature prioritization for software testing in Chapter 6, Karl Wiegers' requirement prioritization approach [Wiegers, 1999] is adopted, which considers both the positive benefit of the presence of a feature and the negative impact of its absence.
In the case study of value-based test case prioritization in Chapter 7, the classic S-curve production function with segments of investment, high payoff, and diminishing returns [Boehm, 1981] is used to train students for their project features' business case analysis, with the Kano model [Kano] as a reference to complement their analysis for the feature business importance ratings. A test case's business importance is then determined by the importance of its corresponding function/component/feature, and by whether it tests the core function of that feature. As for the case study of determining the priority of artifacts (system capabilities) in Chapter 4, the business importance is tailored to ratings of their influence/impact on the project's success. The similarity among these different business case analyses is that all of them use well-defined, context-based relative business importance ratings.

These four case studies have practical meaning in real industry, and practitioners can gain three learning outcomes from each case study:
What are the value-based inspection and testing prioritization drivers and their trade-offs?
What are the detailed practices and steps for the value-based inspection/testing process under the project context?
How can the business value of testing be tracked and testing efficiency measured using the proposed real earned value system, with real industrial evidence?

Chapter 4: Case Study I - Prioritize Artifacts to be Reviewed

4.1. Background

This case study of prioritizing artifacts to be reviewed was implemented in the real-client projects' verification and validation activities of USC's graduate-level software engineering course. The increasing size of the software artifact packages motivates us to prioritize the artifacts to be reviewed, with the goal of improving the review cost-effectiveness. At USC, best practices from the software engineering industry are introduced to students through a two-semester graduate software engineering course (CSCI577a,b) with real-client projects. Starting in Fall 2008, the Incremental Commitment Spiral Model (ICSM) [Boehm and Lane, 2007], a value-based, risk-driven software life cycle process model, was introduced and tailored as a guideline [ICSM-Sw] for this course, as shown in Figure 14. It teaches and trains students in skills such as understanding and negotiating stakeholder needs, priorities, and shared visions; rapid prototyping; evaluating COTS and services options; business and feasibility evidence analysis; and concurrent plans, requirements, and solutions development.

In this course, students work in teams and are required to understand and apply the Incremental Commitment Spiral Model to real-world software engineering projects. In CSCI577b, student teams develop Initial Operational Capability (IOC) products based on the best results from CSCI577a. As the guideline for this course, ICSM covers the full system development life cycle based on the Exploration, Valuation, Foundations, Development, and Operations phases, as shown in Figure 14. The key to synchronizing and stabilizing all of the concurrent product and process definition activities is a set of risk-driven anchor point milestones: the Exploration Commitment Review (ECR), Valuation Commitment Review (VCR), Foundation Commitment Review (FCR), Development Commitment Review (DCR), Rebaselined Development Commitment Review (RDCR), Core Capability Drivethrough (CCD), Transition Readiness Review (TRR), and Operation Commitment Review (OCR).
At these milestones, the business, technical, and operational feasibility of the growing package of specifications and plans is evaluated by independent experts. For the course, clients, professors, and teaching assistants perform the Architecture Review Board (ARB) activities to evaluate the package of specifications and plans.

Figure 14. ICSM framework tailored for CSCI577 [ICSM-Sw]

Most off-campus students come from the IT industry with rich experience. They often take on the roles of Quality Focal Point and Integrated Independent Verification and Validation (IIV&V) to review sets of artifacts and find any issues related to completeness, consistency, feasibility, ambiguity, conformance, and risk, in order to minimize the issues found at the ARB. A series of package review assignments are given to them consecutively after the development teams submit their packages during the semester. The instructions for each assignment, together with the artifact templates in the ICSM Electronic Process Guide (EPG) [ICSM-Sw], provide the reviewing entry and exit criteria for each package review. Table 7 summarizes the content of the V&V reviews as performed in Fall 2009, Fall 2010, and Fall 2011, and Table 8 gives the definitions of the ICSM and all other acronyms used in this case study.

Table 7. V&V assignments for Fall 2009/2010/2011
V&Ver Assignment | Review Package | 2009 V&V Method | 2010/2011 V&V Method
Learn to Use Bugzilla System for Your Project Team | | |
Eval of VC Package | OCD, FED, LCP | FV&V | FV&V
Eval of Initial Prototype | PRO | FV&V | FV&V
Eval of Core FC Package | OCD, PRO, SSRD**, SSAD, LCP, FED, SID | FV&V | VbV&V
Eval of Draft FC Package | OCD, PRO, SSRD**, SSAD, LCP, FED, SID | FV&V | VbV&V
Eval of FC/DC Package | OCD, PRO, SSRD**, SSAD, LCP, FED, SID, QMP, ATPC^, IP^ | FV&V | VbV&V
Eval of Draft DC/TRR Package | OCD, PRO, SSRD**, SSAD, LCP, FED, SID, QMP, ATPC^, IP^, TP^ | VbV&V | VbV&V
Eval of DC/TRR Package | OCD, PRO, SSRD**, SSAD, LCP, FED, SID, QMP, ATPC, IP, TP, IAR^, UM^, TM^, TPR^ | VbV&V | VbV&V
**: not required by NDI/NCS team; ^: only required by one-semester team

Table 8. Acronyms
ICSM phases: VC: Valuation Commitment; FC: Foundation Commitment; DC: Development Commitment; TRR: Transition Readiness Review; RDC: Rebaselined Development Commitment; IOC: Initial Operational Capability; TS: Transition & Support
Artifacts developed and reviewed for this course: OCD: Operational Concept Description; SSRD: System and Software Requirements Description; SSAD: System and Software Architecture Description; LCP: Life Cycle Plan; FED: Feasibility Evidence Description; SID: Supporting Information Document; QMP: Quality Management Plan; IP: Iteration Plan; IAR: Iteration Assessment Report; TP: Transition Plan; TPC: Test Plan and Cases; TPR: Test Procedures and Results; UM: User Manual; SP: Support Plan; TM: Training Materials
Others: FV&V: Formal Verification & Validation; VbV&V: Value-based Verification & Validation; Eval: Evaluation; ARB: Architecture Review Board

4.2. Case Study Design

The comparative analysis is conducted between the 8 2010 teams and 13 2011 teams that adopted the value-based prioritization strategy and the 14 2009 teams that adopted a value-neutral method without prioritizing before reviewing. All three years' teams reviewed the same content of three artifact packages, as shown in Table 9.
Table 9. Documents and sections to be reviewed
Doc/Sec | CoreFCP (1&2 sem) | DraftFCP (1&2 sem) | FC/DCP (2 sem) | FC/DCP (1 sem)
OCD | 100% | 100% | 100% | 100%
FED | AA (Section 1, 5), NDI (Section 1, 3, 4.1, 4.2.1, 4.2.2) | Section 1-5 | Section 1-5 | 100%
LCP | Section 1, 3.3 | 100% | 100% | 100%
SSRD | AA (100%), NDI (N/A) | AA (100%), NDI (N/A) | AA (100%), NDI (N/A) | AA (100%), NDI (N/A)
SSAD | Section 1, 2.1.1-2.1.3 | Section 1, 2 | Section 1, 2 | 100%
PRO | Most critical/important use cases | 100% | 100% | 100%
SID | 100% | 100% | 100% | 100%
QMP | N/A | N/A | Section 1, 2 | 100%
ATPC | N/A | N/A | N/A | 100%
IP | N/A | N/A | N/A | 100%

Year 2009 teams used a value-neutral formal V&V process (FV&V), a variant of the Fagan inspection [Fagan, 1976] practice, to review the three artifact packages. The steps they followed are:

Table 10. Value-neutral Formal V&V process
Step 1: Create Exit Criteria. From the original team assignment's description and the related ICSM EPG completion criteria, generate a set of exit criteria that identify what needs to be present and the standard for acceptance of each document.
Step 2: Review and Report Concerns. Based upon the exit criteria, read (review) the documents and report concerns and issues into the Bugzilla [USC_CSSE_Bugzilla] system.
Step 3: Generate Evaluation Report.
Management Overview - List any features of the solution described in this artifact that are particularly good, of which a non-technical client should be aware.
Technical Details - List any features of the solution described in this artifact that you feel are particularly good, and of which a technical reviewer should be aware.
Major Errors & Omissions - List the top 3 errors or omissions in the solution described in this artifact that a non-technical client would care about. The description of an error (or omission) should be understandable to a non-technical client and should explain why the error is worth the client's attention.
Critical Concerns - List the top 3 concerns with the solution described in this artifact that a non-technical client would care about. The description of the concern should be understandable to a non-technical client and should explain why the client should be aware of it. You should also suggest step(s) to take that would reduce or eliminate your concern.

Year 2010 and 2011 teams applied the value-based, dependency-aware prioritization strategy to the review process, with the guidelines for inspection summarized in Table 11.

Table 11. Value-based V&V process
Step 1: Value-based V&V Artifacts Prioritization. The rating guideline for each priority factor is as follows:
Importance (5: most important, 3: normal, 1: least important): Without this document, the project cannot move forward or could even fail; such a document should be rated with high importance. Some documents serve a supporting function; without them, the project could still move on; this kind of document should be rated with lower importance.
Quality Risk (5: highly risky, 3: normal, 1: least risky): Based on previous reviews, documents with intensive defects might still be fault-prone, which indicates a high quality risk. Personnel factors, e.g., the author of the document not being proficient or motivated enough, indicate a high quality risk. A more complex document might have a high quality risk. A new document, or an old document with a large portion of newly added sections, might have a high quality risk.
Dependency (5: highly dependent, 3: normal, 1: not dependent): Sometimes some lower-priority artifacts are required to be reviewed, at least for reference, before reviewing a higher-priority one.
For example, in order to review SSAD or TPC, SSRD is required for reference. Basically, the more documents this document depends on, the higher the Dependency rating is, and the lower the reviewing priority will be Review Cost 5: need intensive effort 3: need moderate effort 1: need little effort A new document or an old document with a large portion of newly added sections usually takes more time to review and vice versa A more com plex document usually takes more time to review and vice versa Determine Weights Weights for each factor (Importance, Quality Risk, Review Cost, and Dependency) could be set according to the project context. Default values are 1.0 for each factor Priority Calculation E.g: for a document, Importance=5, Quality Risk=3, Review Cost=2, Dependency = 1, default weights are used=> Priority= (5*3)/(2*1)=7.5 A spreadsheet [USC_577a_VBV&VPS, 2010] helps to calculate the priority automatically, 5-level ratings for each factor are VH, H, M, L VL with values from 5 to 1, intermediate values 2, 4 are also allowed. Step 2: Review artifacts based on prioritization and report defects/issues The one with higher priority value should be reviewed first For each document’s review, review the core part of the document first. Report issues into the Bugzilla [USC_CSSE_Bugzilla] Step 3: List top 10 defects/ issues List top 10 highest-risk defects or issues based on issues’ priority and severity 48 A real example of artifacts prioritization in one package review by a 2010-team [USC_577a_VBV&VAPE, 2010] is displayed in Table 12. The default weight of 1.0 for each factor is used. Based on the priority calculated, reviewing order follows SSRD, OCD, PRO, SSAD, LCP, FED, SID. SSRD has the highest reviewing priority with the rationales provided: SSRD contains the requirements of the system, without this document, the project can't move forward or could even fail (Very-High Importance). This is a complex document, and needs to be consistent with win conditions negotiation, which might not be complete at this point, also, a lot of rework was required based on comments from TA (Very-High Quality Risk). SSRD depends on few other artifacts (Low Dependency). This is an old document, but it is complex with a lot of rework (Very-High Review Cost). Table 12. A n example of value-based artifact prioritization Weights: 1 1 1 1 Importance Quality Risk Dependency Review Cost Priority LCP M This document describes the lif e cycle plan of the project. This document serves as supporting f unction, without this, the project still could move on. With his document, the project could move more smoothly. L Based on previous reviews, the author of this document has a strong sense of responsibility. L M A lot of new sections added, but this document is not very complex. 1.00 OCD H This document gives the overall operational concept of the system. This document is important, but it is not critical f or this success of the system. VH This is a complex document and a lot of the sections in this document needed to be redone based on the comments received from the TA. M SSRD H Old document, but a lot of rework done. 1.67 49 FED H This document should be rated high because it provides feasibility evidence f or the project. Without this document, we don't know whether the project is f easible. H The author of this document does not have appropriate time to complete this document with quality work. H SSRD, SSAD H A lot of new section added to this version of the document. 
1.00 SSRD VH This document contains the requirements of the system. Without this document, the project can't move f orward or even fail. VH This is a complex document. This document needs to be consistent with win conditions negotiation, which might not be complete at this point. Also, a lot of rework was required based on comments f rom TA. L VH This is an old document, but it is complex with a lot of rework. 2.50 SSAD VH This document contains the architecture of the system. Without this document, the project can't move f orward or even fail. VH This is a complex document and it is a new document. The author of this document did not know that this document was due until the morning of the due date. H SSRD, OCD VH This is an old document, but it is complex with a lot of rework done f or this version. 1.25 SID VL This document serves as supporting f unction, without this document, the project still could move on, but the project could move on more smoothly with this document. L This is an old document. Only additions made to existing sections. VH OCD, SSRD, FED, LCP, SSAD, PRO VL This is an old document and this document has no technical contents. 0.40 PRO H Without this document, the project can probably move f orward, but the system might not be what the customer is expecting. This document allows the customer to have a glimpse of the system. L This is an old document with little new contents. The author has a high sense of responsibility and he f ixed bugs from the last review in reasonable time. M FED L This is an old document with little content added since last version and not much rework required. 1.33 An example of Top 10 issues made by this team for CoreFCP evaluation is displayed in Table 13. These Top 10 issues are communicated in a timely manner with artifact authors to attract enough emphasis. The interesting finding is the relations between 50 the artifact priority sequence and the top 10 issues sequence: the issues with higher impact usually exist in the artifacts with high priority, showing that the artifact prioritization enables reviewers to focus on issues with high impact at least in this context. However, it also helps avoid the potential problem of neglecting high-impact issues in lower-priority artifacts, as in Issues 8 and 10. Table 13. A n example of Top 10 Issues Summary Rationale 1 SSRD Missing important requirements. A lot of important requirements are missing. Without these requirements, the system will not succeed. 2 SSRD Requirement supporting information too generic. The output, destination, precondition, and post condition should be defined better. These description will allows the development team and the client better understand the requirements. This is important for system success. 3 SSAD Wrong cardinality in the system context diagram. The cardinality of this diagram needs to be accurate since this describes the top level of the system context. This is important for system success. 4 OCD The client and client advisor stakeholders should be concentrating on the deployment benefits. It is important for that this benefits chain diagram accurately shows the benefits of the system during deployment in order for the client to show to potential investor to gather fund to support the continuation of system development. 5 OCD The system boundary and environment missing support infrastructure. 
It is important for the System boundary and environment diagram to capture all necessary support infrastructure in order for the team to consider all risks and requirements related the system support infrastructure. 6 FED Missing use case references in the FED. Capability feasibility table proves the feasibility of all system capabilities to date. Reference to the use case is important for the important stakeholders to understand the capabilities and their feasibility. 7 FED Incorrect mitigation plan. Mitigation plans for project risks are important to overcome the risks. This is important for system success. 8 LCP Missing skills and roles The LCP did not identify the skill required and roles for next semester. This information is important for the success of the project because the team next semester can use these information and recruit new team members meeting the identified needed skills. 51 9 FED CR# in FED doesn't match with CR# in SSRD The CR numbers need to match in both FED and SSRD for correct requirement references. 10 LCP COCOMO drivers rework COCOMO driver values need to be accurate to have a better estimate for the client. The three-year experiment issue data for the evaluation of CoreFCP, DraftFCP and FC/DCP from total 35 teams is collected and extracted from the Bugzilla database. The generic term “Issue” covers both “Concerns” and “Problems”. If the IV&Vers find any issue, they report it as a “Concern” in Bugzilla and assign it to the relevant artifact author. The author determines whether the concern is a problem or not. As transformed in Table 14, Severity is rated from High (corresponding to ratings of Blocker, Critical, Major in Bugzilla ), Medium (corresponding the rating of Normal in Bugzilla), Low ( the ratings of Minor, Trivial, Enhancement in Bugzilla) with the value from 3 to 1. Priority is rated from High (Resolve Immediately), Medium (Normal Queue), Low (Not Urgent, Low Priority, Resolved Later) with the value from 3 to 1. The Impact of an issue is the product of its Severity and Priority. The impact of an issue with high severity and high priority is 9. Obviously, the impact of an issue is an element in the set {1, 2, 3, 4, 6, and 9}. 52 Table 14. Issue Severity & Priority rate mapping Rating for Measurement Rating in Bugzilla Value Severity High Blocker, Critical, Major 3 Medium Normal 2 Low Minor, Trivial, Enhancement 1 Priority High Resolve Immediately 3 Medium Normal Queue 2 Low Not Urgent, Low Priority, Resolved Later 1 The generic term “Issue” covers both “Concerns” and “Problems”. If the IV&Vers find any issue, they report it as a “Concern” in Bugzilla and assign it to the relevant artifact author. The author determines whether it needs fixing by choosing an option for “Resolution” as displayed in Table 15. Whether an issue is a problem or not is easy to be determined by querying the “Resolution” of the issue. “Fixed” and “Won’t Fix” mean the issue is a problem and the other two options mean that it is not. Table 15. 
Resolution options in Bugzilla Resolution Options Instructions in Bugzilla Fixed If the issue is a problem, after you fix the problem in the artifact, then choose “Fixed” Won’t Fix If the issue is a problem, but won’t be fixed for this time, then choose “Won’t Fix” and must provide the clear reason in “Additional Comments” why it can’t be fixed for this time Invalid If the issue is not a problem then choose “Invalid” and must provide a clear reason in “Additional Comments” WorksForMe If the issue really works fine, then choose “WorksForMe” and let the IVVer review this again 53 4.3. Results Various measures in Table 16 are used to compare the performance of 2011, 2010 years’ value-based and 2009 value-neutral review process. The main goal of the Value- based review or inspection is to increase the review cost effectiveness as defined in Chapter 3. Table 16. Review effectiveness measures Measures Details Number of Concerns The number of concerns found by reviewers Number of Problems The number of problems found by reviewers Number of Concerns per reviewing hour The number of concerns found by reviewers per reviewing hour Number of Problems per reviewing hour The number of problems found by reviewers per reviewing hour Review Effort Effort spent on all activities in the package review Review Effectiveness of total Concerns As defined in Chapter 3 but for concerns Review Effectiveness of total Problem s As defined in Chapter 3 but for problems Average of Impact per C oncern Review Effectiveness of total Concerns/ Number of Concerns Average of Impact per Problem Review Effectiveness of total Problems/ Number of Problem s Review Cost Effectiveness of Concerns As defined in Chapter 3 but for concerns Review Cost Effectiveness of Problem s As defined in Chapter 3 but for problems Table 17 to Table 22 list the three years’ 35 teams’ performances on different measures for concerns, and problems’ data is similar and is not listed here due to page limitation. Mean and Standard Deviation values are calculated at the bottom of each measure. 54 Table 17. Number of Concerns 2011 Teams 2010 Teams 2009 Teams T-1 180 T-1 141 T-1 58 T-3 82 T-2 198 T-2 45 T-4 138 T-3 53 T-3 102 T-5 211 T-4 33 T-4 87 T-6 38 T-5 60 T-5 32 T-7 78 T-6 116 T-6 58 T-8 117 T-7 98 T-7 103 T-9 163 T-8 94 T-8 119 T-10 80 T-9 157 T-11 148 T-10 61 T-12 58 T-11 108 T-13 147 T-12 41 T-14 44 T-13 34 T-14 33 Mean 114.15 Mean 99.13 Mean 74.14 Stdev 54.99 Stdev 53.28 Stdev 38.75 55 Table 18. Number of Concerns per reviewing hour 2011 Teams 2010 Teams 2009 Teams T-1 4.81 T-1 2.79 T-1 0.81 T-3 1.86 T-2 3.07 T-2 1.25 T-4 5.17 T-3 1.22 T-3 2.15 T-5 7.54 T-4 1.12 T-4 1.43 T-6 1.10 T-5 1.08 T-5 0.79 T-7 2.41 T-6 3.02 T-6 1.17 T-8 3.74 T-7 2.89 T-7 1.46 T-9 6.15 T-8 1.46 T-8 2.08 T-10 4.88 T-9 2.18 T-11 7.22 T-10 1.14 T-12 2.32 T-11 1.60 T-13 5.08 T-12 1.53 T-14 1.90 T-13 0.75 T-14 0.69 Mean 4.17 Mean 2.08 Mean 1.36 Stdev 2.12 Stdev 0.93 Stdev 0.52 56 Table 19. Review Effort 2011 Teams 2010 Teams 2009 Teams T-1 37.44 T-1 50.5 T-1 71.2 T-3 44.06 T-2 64.6 T-2 36.1 T-4 26.69 T-3 43.5 T-3 47.5 T-5 27.98 T-4 29.5 T-4 61 T-6 34.6 T-5 55.35 T-5 40.5 T-7 32.4 T-6 38.4 T-6 49.5 T-8 31.25 T-7 33.95 T-7 70.5 T-9 26.5 T-8 64.3 T-8 57.2 T-10 16.4 T-9 72 T-11 20.5 T-10 53.5 T-12 25 T-11 67.5 T-13 28.95 T-12 26.85 T-14 23.1 T-13 45.5 T-14 48 Mean 28.84 Mean 47.51 Mean 53.35 Stdev 7.30 Stdev 13.37 Stdev 13.97 57 Table 20. 
Review Effectiveness of total Concerns 2011 Teams 2010 Teams 2009 Teams T-1 888 T-1 790 T-1 242 T-3 396 T-2 872 T-2 186 T-4 527 T-3 233 T-3 334 T-5 1153 T-4 147 T-4 349 T-6 139 T-5 233 T-5 151 T-7 331 T-6 480 T-6 186 T-8 487 T-7 404 T-7 486 T-9 811 T-8 406 T-8 422 T-10 333 T-9 631 T-11 646 T-10 229 T-12 226 T-11 442 T-13 562 T-12 160 T-14 191 T-13 133 T-14 137 Mean 514.62 Mean 445.63 Mean 292 Stdev 297.92 Stdev 263.08 Stdev 155.05 58 Table 21. Average of Impact per Concern 2011 Teams 2010 Teams 2009 Teams T-1 4.93 T-1 5.60 T-1 4.17 T-3 4.83 T-2 4.40 T-2 4.13 T-4 3.82 T-3 4.40 T-3 3.27 T-5 5.46 T-4 4.45 T-4 4.01 T-6 3.66 T-5 3.88 T-5 4.72 T-7 4.24 T-6 4.14 T-6 3.21 T-8 4.16 T-7 4.12 T-7 4.72 T-9 4.98 T-8 4.32 T-8 3.55 T-10 4.16 T-9 4.02 T-11 4.36 T-10 3.75 T-12 3.90 T-11 4.09 T-13 3.82 T-12 3.90 T-14 4.34 T-13 3.91 T-14 4.15 Mean 4.36 Mean 4.42 Mean 3.97 Stdev 0.54 Stdev 0.52 Stdev 0.44 59 Table 22. Cost Effectiveness of Concerns 2011 Teams 2010 Teams 2009 Teams T-1 23.72 T-1 15.64 T-1 3.40 T-3 8.99 T-2 13.50 T-2 5.15 T-4 19.75 T-3 5.36 T-3 7.03 T-5 41.21 T-4 4.98 T-4 5.72 T-6 4.02 T-5 4.21 T-5 3.73 T-7 10.22 T-6 12.50 T-6 3.76 T-8 15.58 T-7 11.90 T-7 6.89 T-9 30.60 T-8 6.31 T-8 7.38 T-10 20.30 T-9 8.76 T-11 31.51 T-10 4.28 T-12 9.04 T-11 6.55 T-13 19.41 T-12 5.96 T-14 8.27 T-13 2.92 T-14 2.85 Mean 18.66 Mean 9.30 Mean 5.31 Stdev 10.94 Stdev 4.53 Stdev 1.86 Table 23 compares the Mean and Standard Deviation values for all the measures between the three-year teams. To determine whether the differences between years based on a measure is statistically significant or not, Table 24 compares every two years’ data using the F-test and T-test. The F-test determines whether two samples have different variances. If the significance (p-value) for F-test is 0.05 or below, the two samples have different variances. This will determine which type of T-test will be used to determine whether the two samples have the same mean. Two types of T-test are: Two-sample equal variance (homoscedastic), and Two-sample unequal variance (heteroscedastic). If the 60 significance (p-value) for T-test is 0.05 or below, the two samples have different means. For example, Table 24 shows that 2010’s value-based review teams had a 75.04% higher Review Cost Effectiveness of Concerns than 2009’s value-neutral teams. The p-value for F-test 0.0060 leads to choose “Two-sample unequal variance” type T-test. The p-value for T-test 0.0218 is strong evidence (well below 0.05) that the 75.04% improvement has statistical significance, the similar for its comparison between 2011 and 2009 (with F-test 0.0000, and T-test 0.0004), which rejects the hypothesis H-r1. Table 23. Data Summaries based on all Metrics 2011 Team 2010 Team 2009 Team Mean Stdev Mean Stdev Mean Stdev Number of Concerns 114.15 54.99 99.13 53.28 74.14 38.75 Number of Problems 108.62 52.81 93.38 52.96 68.79 35.35 Number of Concerns per reviewing hour 4.17 2.12 2.08 0.93 1.36 0.52 Number of Problems per reviewing hour 3.96 2.04 1.96 0.92 1.26 0.48 Review Effort 28.84 7.30 47.51 13.37 53.35 13.97 Review Effectiveness of total Concerns 514.62 297.92 445.63 263.08 292.00 155.05 Review Effectiveness of total Problems 491.85 287.84 416.25 254.15 272.07 141.78 Average of Impact per Concern 4.36 0.54 4.42 0.52 3.97 0.44 Average of Impact per Problem 4.37 0.57 4.37 0.52 3.99 0.45 Review Cost Effectiveness of Concerns 18.66 10.94 9.30 4.53 5.31 1.86 Review Cost Effectiveness of Problems 17.80 10.54 8.69 4.32 4.97 1.73 61 Table 24. 
Statistics Comparative Results between Years
(For each measure: the percentage by which the later year's teams are higher, with the F-test and T-test p-values.)
Number of Concerns: 2011 vs 2009 +53.96% (F p=0.225, T p=0.0187); 2010 vs 2009 +33.69% (F p=0.3049, T p=0.1093); 2011 vs 2010 +15.16% (F p=0.9752, T p=0.2729)
Number of Problems: 2011 vs 2009 +57.90% (F p=0.1656, T p=0.0144); 2010 vs 2009 +35.75% (F p=0.1976, T p=0.1026); 2011 vs 2010 +16.32% (F p=0.9454, T p=0.2644)
Number of Concerns per reviewing hour: 2011 vs 2009 +206.77% (F p=0.0000, T p=0.0002); 2010 vs 2009 +53.17% (F p=0.0636, T p=0.0142); 2011 vs 2010 +100.28% (F p=0.0372, T p=0.0031)
Number of Problems per reviewing hour: 2011 vs 2009 +213.33% (F p=0.0000, T p=0.0002); 2010 vs 2009 +55.16% (F p=0.0393, T p=0.0382); 2011 vs 2010 +101.94% (F p=0.044, T p=0.0033)
Review Effort: 2011 vs 2009 -45.95% (F p=0.0314, T p=0.0000); 2010 vs 2009 -10.94% (F p=0.9509, T p=0.1752); 2011 vs 2010 -39.31% (F p=0.064, T p=0.0003)
Review Effectiveness of total Concerns: 2011 vs 2009 +76.24% (F p=0.0268, T p=0.0136); 2010 vs 2009 +52.61% (F p=0.0949, T p=0.0489); 2011 vs 2010 +15.48% (F p=0.7673, T p=0.2985)
Review Effectiveness of total Problems: 2011 vs 2009 +80.78% (F p=0.0169, T p=0.0117); 2010 vs 2009 +52.99% (F p=0.0661, T p=0.0502); 2011 vs 2010 +18.16% (F p=0.7671, T p=0.2746)
Average of Impact per Concern: 2011 vs 2009 +9.74% (F p=0.475, T p=0.026); 2010 vs 2009 +11.14% (F p=0.5957, T p=0.023); 2011 vs 2010 -1.26% (F p=0.9358, T p=0.4095)
Average of Impact per Problem: 2011 vs 2009 +9.46% (F p=0.4398, T p=0.0333); 2010 vs 2009 +9.61% (F p=0.6307, T p=0.043); 2011 vs 2010 -0.13% (F p=0.8602, T p=0.4909)
Review Cost Effectiveness of Concerns: 2011 vs 2009 +251.23% (F p=0.0000, T p=0.0004); 2010 vs 2009 +75.04% (F p=0.006, T p=0.0218); 2011 vs 2010 +100.66% (F p=0.0271, T p=0.0071)
Review Cost Effectiveness of Problems: 2011 vs 2009 +258.34% (F p=0.0000, T p=0.0004); 2010 vs 2009 +75.01% (F p=0.0048, T p=0.0233); 2011 vs 2010 +104.75% (F p=0.0254, T p=0.0066)

In Table 24, the shaded cells mark the comparisons that are statistically significant. We can see that the 2010 teams' performance improves over the 2009 teams' on most of the measures, except the number of concerns/problems and the review effort; the 2011 teams' performance improves over the 2009 teams' on all the measures. Since the 2010 and 2011 teams adopted the same value-based inspection process, the differences between those two years on the measures were expected to be insignificant. However, we find that the review effort in 2011 decreased dramatically, which directly causes significant differences between 2010 and 2011 on the measures related to review effort, such as the review effort itself, the number of concerns/problems per reviewing hour, and the review cost effectiveness of concerns/problems. The decreased review effort in 2011 is due to the change in team size in 2011: the 2011 teams have an average size of 6.5 (6 or 7) developers with 1 reviewer per team, while the 2010 teams have an average size of 7.5 (7 or 8) developers with an average of 1.5 (1 or 2) reviewers per team, and the smaller number of reviewers per team leads to the decreased review effort. This uncontrolled factor might partially contribute to the overall factor-of-2.5 improvement from 2009 to 2011, or the overall 100% improvement from 2010 to 2011, in the review cost effectiveness of concerns/problems, which is a potential threat to the validity of our positive results. However, we also find that the comparisons of all the measures that are independent of review effort, such as the average impact per concern/problem and the number of concerns/problems, show that the two years' performances are similar. The two reviewers in each 2010 team usually both reviewed all the documents, and they tended not to report duplicate concerns if a similar one was already in the concern list; so it makes sense that 2010 and 2011 have nearly the same number of concerns (no statistically significant difference) while the review effort in 2010 was nearly doubled, since the number of reviewers was nearly twice that of 2011. This might also give us a hint that one reviewer per team might be enough for 577ab projects.
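The F-test-then-T-test procedure used for these comparisons can be sketched as follows (a minimal illustration in Python with SciPy, which we assume is available; the two samples are illustrative placeholder values rather than the per-team data in the tables above, and the two-tailed F-test is computed directly since SciPy does not provide it as a single call):

# Sketch: an F-test decides whether the variances differ, which then selects
# the equal-variance or unequal-variance (Welch) t-test, as described above.
import numpy as np
from scipy import stats

group_a = np.array([18.7, 9.3, 23.7, 41.2, 4.0, 10.2, 15.6])   # placeholder sample A
group_b = np.array([3.4, 5.2, 7.0, 5.7, 3.7, 3.8, 6.9])        # placeholder sample B

# Two-tailed F-test for equality of variances.
f_stat = np.var(group_a, ddof=1) / np.var(group_b, ddof=1)
df_a, df_b = len(group_a) - 1, len(group_b) - 1
p_f = 2 * min(stats.f.sf(f_stat, df_a, df_b), stats.f.cdf(f_stat, df_a, df_b))

# If the F-test p-value is 0.05 or below, the variances differ, so use Welch's t-test.
equal_var = p_f > 0.05
t_stat, p_t = stats.ttest_ind(group_a, group_b, equal_var=equal_var)

print(f"F p-value: {p_f:.4f}, equal variances assumed: {equal_var}")
print(f"t p-value: {p_t:.4f}")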
The similarity of the review-effort-independent measures between 2010 and 2011 indicates that, as in 2010, reviewers in 2011 tended to report issues with higher severity and priority when using the value-based inspection process; this also reduces the threat that the change in reviewer team size poses to our results. To sum up, these comparative analysis results show that the value-based review method for prioritizing artifacts can improve the cost effectiveness of review activities, enable reviewers to focus more on artifacts with high importance and high risk, and capture concerns/problems with high impact.

Besides, to complement the quantitative analysis, a survey was distributed to reviewers after introducing the value-based prioritization strategy. In their feedback, almost all of the 14 Year 2009 teams, 8 Year 2010 teams, and 13 Year 2011 teams chose the value-based reviewing process. Various advantages were identified by reviewers, such as: more streamlined, efficient, and not a waste of time; more focused on the most important documents with high quality risks; more focused on non-trivial defects and issues; and an organized, systematic way to review documents in an integrated way rather than treating them independently. Some example responses are given below:

"The value-based V&V approach holds a great appeal – a more intensive and focused V&V process. Since items are prioritized and rated as to importance and likelihood of having errors. This is meant for you to allocate your time according to how likely errors (and how much damage could be done) will occur in an artifact. By choosing to review those areas that have changed or are directly impacted by changes in the other documents I believe I can give spend more quality time in reviewing the changes and give greater emphasis on the changes and impacts."

"Top 10 issue list gives a centralized location for showing the issues as opposed to spread across several documents. Additionally, by prioritizing the significance of each issue, it gives document authors a better picture of which issues they should spend more time on resolving and let them know which ones are more important to resolve. Previously, they would have just tackled the issues in any particular order, and may not have spent the necessary time or detail to ensure proper resolution. Focusing on a top 10 list helps me to look at the bigger picture instead of worrying about as many minor problems, which will result in documents that will have fewer big problems."

"For the review of the Draft FC Package, the Value-based IIV&V Process will be used. This review process was selected because of the time constraint of this review. There is only one weekend to review all seven Draft FC Package documents. The Value-based review will allow me to prioritize the documents based on importance, quality risk, dependencies, and reviewing cost. The documents will be reviewed based on its identified priority. This allows documents more critical to the success of the project to be reviewed first and given more time to."

These responses and the unanimous choice of the value-based process show that the performers considered the value-based V&V process superior to the formal V&V process for achieving their project objectives. The combination of qualitative and quantitative evidence produced viable conclusions.

Chapter 5: Case Study II - Prioritize Testing Scenarios to be Applied

5.1. Background

This case study of prioritizing testing scenarios was implemented in the acceptance testing phase of one project at Galorath, Inc. [Galorath].
The project is designed to develop automated testing macros/scripts for the company's three main products (SEER-SEM, SEER-H, and SEER-MFG) to automate their installation/un-installation/upgrade processes. The three macros below automate the work-flows for the installation test, un-installation test, and upgrade test, respectively:

Macro 1: New Install Test integrates the steps: Install the current product version -> Check the correctness of the installed files and generate a report -> Export registry\ODBC\shortcut files -> Check the correctness of those exported files and generate a report.

Macro 2: Uninstall Test integrates the steps: Uninstall the current product version -> Check whether all installed files are deleted after un-installation and generate a report -> Export registry\ODBC\shortcut files -> Check whether the registry\ODBC\shortcut files are deleted after un-installation and generate a report.

Macro 3: Upgrade Test integrates the steps: Install one of the previous product versions -> Upgrade to the current version -> Check the correctness of the installed files and generate a report -> Export registry\ODBC\shortcut files -> Check the correctness of those exported files and generate a report -> Uninstall the current product version -> Return to the beginning (repeat until all previous product versions have been tested).

Secondly, these macros will eventually be released to the company's testers, consultants, and developers for internal testing purposes. They are supposed to run these macros on their own machines, or on virtual machines on their host machines, to do the installation testing (rather than on a dedicated testing server), and they need to deal with several variables:
The three products' (SEER-SEM, SEER-H, and SEER-MFG) installing, un-installing, and upgrading processes are different and should be recorded and replayed respectively;
The paths of the registry files vary with the OS bit width (32-bit or 64-bit);
The paths of the shortcuts differ among operating systems (WinXP, Vista, Win7, Server 2003, and Server 2008) and OS bit widths;
Different installation types (Local, Client, and Server) result in different installations, which are reflected in the registry files.

In sum, the automation is supposed to work well for the three installation types (Local, Client, Server) on various operating systems (e.g., Win7, Vista, WinXP) with 32-bit or 64-bit architectures, and on various virtual machines as well. The combination of these variables increases the number of operational scenarios to be tested in the acceptance testing phase before the fixed release time. In our case study, we define one scenario as testing that one product (SEER-MFG, SEER-H, or SEER-SEM) can be installed, uninstalled, and upgraded from its previous versions correctly, without any performance issue, on one operating system environment with one type of installation. For example, for the Server type test, three types of servers need to be tested, i.e., WinServer 2003 x32, 2008 x64, and 2008 x32; with each of the three SEER products, this results in 3*3=9 scenarios. For the Local or Client type test, the 10 workable operating system environments are listed in Table 32 and Table 33; with each of the three SEER products, this results in 10*3=30 scenarios for each type. As shown in Figure 15, the number of leaf nodes is 3*3+10*3+10*3=69, which means there are 69 paths from the root to the leaf nodes, representing 69 scenarios to be tested before the final release. The time required to test one scenario is roughly (125+185+490)/3 = 267 minutes, about 4.4 hours (Table 31).
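As a quick check of this scenario count and the total effort it implies, a minimal sketch follows (Python; the per-scenario times come from Table 31, the 8-hour working day is our assumption, and small rounding differences from the figures quoted in the text are expected):

# Scenario count: 3 products x (3 server + 10 local + 10 client environments).
products = 3
server_envs, local_envs, client_envs = 3, 10, 10
scenarios = products * (server_envs + local_envs + client_envs)

# Average time per scenario, from the three cost tiers in Table 31 (minutes).
avg_minutes = (125 + 185 + 490) / 3
total_hours = scenarios * avg_minutes / 60
working_days = total_hours / 8          # assuming an 8-hour working day

print(scenarios, round(avg_minutes), round(total_hours), round(working_days, 1))
# -> 69 scenarios, about 267 min each, about 307 hours, about 38.3 working days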
So the time required to run all 69 scenarios is roughly 69*4.4, about 306 hours or 39 working days. This does not even count the time for fixing defects and re-testing. Even if several computers run the tests in parallel, it is still impossible to finish before the fixed release time.

Figure 15. Scenarios to be tested

5.2. Case Study Design

In order to improve the cost-effectiveness of testing under the time constraint, coverage-based and value-based testing strategies are combined to serve this purpose.

5.2.1. Maximize Testing Coverage

As displayed in Table 25, Macro 3 covers all the functionalities and is supposed to catch all the defects that Macro 1 and Macro 2 would catch. So the coverage-based strategy is: first test Macro 3, according to the coverage-based testing principle. If a defect is found in Macro 3, check whether it also exists in the features shared with Macro 1 and Macro 2; if so, adapt the change to Macro 1 and Macro 2 and test them as well. Under the most optimistic situation, in which Macro 3 passes without any performance issues, the macro running time is only the time of running Macro 3. This saves the effort of testing Macro 1 and Macro 2 individually.

Table 25. Macro-feature coverage
Feature | Macro 1 | Macro 2 | Macro 3
Install process | X | | X
Uninstall process | | X | X
Upgrade process | | | X
Export installed files | X | | X
Compare files' size, date and generate report 1 | X | | X
Export ODBC registry files | X | X | X
Export Registry files | X | X | X
Export shortcuts | X | X | X
Combine files | X | X | X
Compare file's content and generate report 2 | X | X | X

Besides, the value-based testing prioritization strategy was applied to further improve the testing cost-effectiveness by focusing the scarce testing resources on the most valuable and risky parts of those macros. The project manager and the product manager helped to provide the business value of the scenarios based on their frequencies of use (FU), combined with product importance (PI), as a variant of business value. Besides, from previous testing experience and observation we know which environments tend to have more performance issues and which parts of the macros tend to be bottlenecks; all of this information helps with the estimation of the scenarios' Risk Probability (RP). With this value-based prioritization, the testing effort is put on the scenarios with higher frequency of use and higher risk probability, and testing of scenarios that are seldom or never used is avoided. The following sections introduce in detail how the testing priorities are determined step by step. Basically, Table 26 to Table 28 display the rating guidelines for FU and RP, Table 30 and Table 31 show the rating guidelines for TC, and Table 32 and Table 33 illustrate the rating results for all the scenarios. Several acronyms are used in this part:
FU: Frequency of Use
RP: Risk Probability
TC: Testing Cost
TP: Test Priority
BI: Business Importance
PI: Product Importance

5.2.2. The step to determine Business Value

In order to quantify the Frequency of Use (FU), a survey with the rating guideline in Table 26 was sent to the project manager and the product manager for rating the various scenarios' relative FU.
Table 26. FU Ratings
Rating | Guideline
1 (+) | Least frequently used; test only if there is enough time
3 (+++) | Normally used; test in the normal queue and make sure it works well
5 (+++++) | Most frequently used; must be tested first and thoroughly, and must work well

Based on the ratings they provided, WinXP and Win7 (x64) have the highest frequency of use as host machines in Galorath, Inc. For server installation tests, people in Galorath, Inc. usually use virtual machines of WinServer 2003 (x32) and WinServer 2008 (x64), so these were rated highest. Win7 (x32) host machines are used less than WinXP and Win7 (x64), but its virtual machine is frequently used for testing, so it was also rated highest. Vista (x64) has seldom been used and there is not even a virtual copy of it, so it was rated lowest, as shown in Table 32 and Table 33. The managers also provided the relative product importance ratings shown in Table 27, which are combined into the business value of a scenario as well.

Table 27. Product Importance Ratings
Product | Product Importance
SEER-MFG | 2
SEER-H | 2
SEER-SEM | 3

5.2.3. The step to determine Risk Probability
To quantify the probability that a performance issue occurs, Table 28 gives rules of thumb for rating the probability. The subjective ratings are based on past experience and observation.

Table 28. RP Ratings
Rating | Guideline
0 | Has already passed testing
0.3 | Low
0.5 | Normal
0.7 | High
0.9 | Very High

From previous random testing on different operating systems, the general performance order from low to high is Vista < WinXP (x32) < Win7 (x64). The WinXP (x32) host machine had already passed the tests while the macros were being developed, so its RP rating is 0. Although Win7 (x64) is expected to work better than WinXP (x32), it had never been thoroughly tested, so its RP was rated Low. Vista (either x32 or x64) is expected to have lower performance, so its RP was rated High. Win7 (x32) is expected to work as well as WinXP (x32) but not better than Win7 (x64), so its RP was rated Normal. In addition, previous random testing showed that a virtual machine's performance is usually lower than that of its host machine; this experience is consistent with many discussions in professional forums and technical papers, so a virtual machine's RP was rated no lower than its host's. These ratings are also shown in Table 32 and Table 33.

Furthermore, during the brainstorming of the macros' quality risks, the project manager pointed out that few defects had ever been found for the Client installation type and that it had not been modified for the upcoming release. Therefore only the Local and Server installation types needed to be tested, as shown in Table 29. This information greatly reduced the testing scope and avoided testing defect-free parts.

Table 29. Installation Type
Installation Type | Need Test?
Local | 1
Server | 1
Client | 0

5.2.4. The step to determine Cost
Table 30 shows the roughly estimated average time to run each macro; the total time to run all three macros for one scenario is their sum, 125 minutes.
Table 30. Average Time for Testing Macros 1-3
Macro | Running Time
Macro 1 | 25 mins
Macro 2 | 25 mins
Macro 3 | 75 mins

In fact, the time to run one scenario consists of more than the time to run the macros; the test preparation time is not negligible either. Setting up a testing environment includes configuring all installation prerequisites, setting up the expected results, and installing and configuring the COTS products required for macro execution. If the operating system on which the macros are to be tested is not available, installing a suitable one takes even longer. We therefore defined the three-level cost ratings shown in Table 31; the relative cost ratings are roughly 1:2:5.

Table 31. Testing Cost Ratings
Install OS (3 hours) | Setup Testing Environments (60 mins) | Run Macros (125 mins) | Time (mins) | Cost Rating
- | - | X | 125 | 1
- | X | X | 185 | 2
X | X | X | 490 | 5

As shown in Table 32 and Table 33, the WinXP and Win7 (x64) host machines, on which the macros were developed, already have testing environments set up, so their testing cost consists only of the time to run the macros and their cost rating is as low as 1. For Vista (x64) and Win7 (x32), no one in Galorath, Inc. has a host machine, so an OS installation is required in addition, and they are rated as high as 5. For all virtual machines, Galorath, Inc. has movable copies, so no OS installation is needed, but the testing environments still have to be set up on them, so they are rated 2.

5.2.5. The step to determine Testing Priority
Once a scenario passes testing, its probability of failure is reduced to 0, so the testing priority (TP) triggered by RRL is calculated as:

TP = RRL = (FU x RP) / TC

The testing priorities of all scenarios, calculated as FU*RP/TC, are shown in Table 32 and Table 33.

Table 32. Testing Priorities for 10 Local Installation Working Environments
Host Machine | FU | RP | TC | TP (RRL) | Virtual Machine working on this host | FU | RP | TC | TP (RRL)
WinXP (x32) | 5 | 0 | 1 | 0 | Vista (x32) | 3 | 0.9 | 2 | 1.35
Win7 (x64) | 5 | 0.3 | 1 | 1.5 | WinXP (x32) | 5 | 0.3 | 2 | 0.75
Win7 (x32) | 5 | 0.5 | 2 | 1.25 | Vista (x32) | 3 | 0.9 | 2 | 1.35
Vista (x32) | 3 | 0.7 | 2 | 1.05 | WinXP (x32) | 1 | 0.9 | 2 | 0.45
Vista (x64) | 1 | 0.7 | 5 | 0.14 | Win7 (x32) | 3 | 0.5 | 5 | 0.3

Table 33. Testing Priorities for 3 Server Installation Working Environments (on a Win7 (x64) VM)
Server | FU | RP | TC | TP (RRL)
WinServer 2003 x32 | 5 | 0.3 | 2 | 0.75
WinServer 2008 x64 | 5 | 0.5 | 2 | 1.25
WinServer 2008 x32 | 3 | 0.3 | 2 | 0.45

Combined with the product importance ratings in Table 27, the value-based scenario testing prioritization algorithm is: first test the scenario whose working environment has the highest TP (RRL); within each selected operating-system environment, first test SEER-SEM, which has the higher product importance, and then test SEER-H and SEER-MFG, which have lower importance.
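For illustration, the sketch below computes TP = FU x RP / TC for a subset of the environments in Tables 32-33 and sorts them into the value-based order; it is a minimal example rather than the tool actually used in the case study.

```python
# Illustrative sketch: compute TP = RRL = FU * RP / TC for a few of the
# environments in Tables 32-33 and sort them into the value-based test order.
scenarios = {
    # name: (FU, RP, TC)
    "Local Win7 x64 host":         (5, 0.3, 1),
    "Local Vista x32 VM on WinXP": (3, 0.9, 2),
    "Local Win7 x32 host":         (5, 0.5, 2),
    "Server 2008 x64 VM":          (5, 0.5, 2),
    "Local Vista x32 host":        (3, 0.7, 2),
    "Server 2003 x32 VM":          (5, 0.3, 2),
    "Local WinXP x32 VM on Vista": (1, 0.9, 2),
    "Local Vista x64 host":        (1, 0.7, 5),
}

tp = {name: fu * rp / tc for name, (fu, rp, tc) in scenarios.items()}
for name, value in sorted(tp.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:30s} TP = {value:.2f}")
# Highest first: Win7 x64 host (1.50), Vista x32 VM (1.35), ... Vista x64 host (0.14)
```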
5.3. Results
Table 34 shows the value-based testing prioritization order and the relevant metrics based on this order. Several acronyms are used below:

RRL: Risk Reduction Leverage
BI: Business Importance
ABI: Accumulated Business Importance
PBIE: Percentage of Business Importance Earned
APBIE: Average Percentage of Business Importance Earned
AC: Accumulated Cost

Table 34. Value-based Scenario Testing Order and Metrics
TP (RRL) | FU (BI) | ABI | PBIE | TC | AC | APC | ABI/AC
Passed | 39 | 39 | 48.15% | 1 | 1 | 3.33% | 39.00
1.5 | 5 | 44 | 54.32% | 1 | 2 | 6.67% | 22.00
1.35 | 3 | 47 | 58.02% | 2 | 4 | 13.33% | 11.75
1.35 | 3 | 50 | 61.73% | 2 | 6 | 20.00% | 8.33
1.25 | 5 | 55 | 67.90% | 2 | 8 | 26.67% | 6.88
1.25 | 5 | 60 | 74.07% | 2 | 10 | 33.33% | 6.00
1.05 | 3 | 63 | 77.78% | 2 | 12 | 40.00% | 5.25
0.75 | 5 | 68 | 83.95% | 2 | 14 | 46.67% | 4.86
0.75 | 5 | 73 | 90.12% | 2 | 16 | 53.33% | 4.56
0.45 | 3 | 76 | 93.83% | 2 | 18 | 60.00% | 4.22
0.45 | 1 | 77 | 95.06% | 2 | 20 | 66.67% | 3.85
0.3 | 3 | 80 | 98.77% | 5 | 25 | 83.33% | 3.20
0.14 | 1 | 81 | 100.00% | 5 | 30 | 100.00% | 2.70

The TP (RRL) column in Table 34 shows the testing order we followed, testing the scenarios with higher RRL first. This order let us focus the limited effort on the more frequently used scenarios with a higher probability of failing, and is expected to improve testing efficiency, especially when testing time and resources are limited.

The testing results obtained with the value-based prioritization strategy are shown in Table 35 and Table 36. Because of the schedule constraint and following the TP order, we did not thoroughly test the WinXP (x32) virtual machine running on the Vista (x32) host or the Vista (x64) host machine: both have the lowest frequency of use and can be skipped when time runs out. Win7 (x32) as a host was never tested, but it is expected to pass since its virtual-machine copy, which is expected to perform even worse, passed the tests; moreover, installing Win7 (x32) on a host machine would have taken extra time and prevented us from finishing other scenarios with higher TP that did not require installing a new OS. The testing strategy thus combines all the critical factors and makes the testing results as good as possible under scarce testing resources.

Table 35. Testing Results (Local Installation)
Host Machine | Result | Virtual Machine working on this host | Result
WinXP (x32) | pass | Vista (x32) | pass
Win7 (x64) | pass | WinXP (x32) | pass
Win7 (x32) | pass | Vista (x32) | pass
Vista (x32) | pass | WinXP (x32) | never tested: time ran out and its FU is the lowest, so there was no need to test it under the time limit
Vista (x64) | never tested: no VM copy exists, time ran out, and its FU is the lowest, so there was no need to test it under the time limit | Win7 (x32) | never tested: no host machine is available, but it is expected to pass since its VM passed

Table 36. Testing Results (Server Installation, on a Win7 (x64) VM)
WinServer 2003 x32 | pass
WinServer 2008 x64 | pass
WinServer 2008 x32 | pass
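As a reference for the comparison that follows, the PBIE and APBIE figures can be recomputed directly from Table 34. The sketch below assumes that APBIE is taken as the mean PBIE over the portion actually executed (testing stopped at AC = 18 units), which reproduces the Situation 1 value reported later in Table 37.

```python
# Sketch of the metrics behind Table 34: PBIE after each scenario, and APBIE
# taken here as the mean PBIE over the portion actually executed (testing
# stopped at an accumulated cost of 18 units).
bi = [39, 5, 3, 3, 5, 5, 3, 5, 5, 3, 1, 3, 1]   # FU(BI) row ("Passed" column first)
tc = [1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 5, 5]    # TC row of Table 34
total_bi = sum(bi)                               # 81

abi = cost = 0
pbie, ac = [], []
for b, c in zip(bi, tc):
    abi += b
    cost += c
    pbie.append(abi / total_bi)                  # Percentage of BI Earned so far
    ac.append(cost)                              # Accumulated Cost so far

stop = ac.index(18) + 1                          # testing stopped at AC = 18
print(f"PBIE at the stop point: {pbie[stop - 1]:.2%}")                 # 93.83%
print(f"APBIE over the executed part: {sum(pbie[:stop]) / stop:.2%}")  # 70.99%
```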
Figure 16 compares the results of the value-based testing prioritization with two other situations that are also common in test planning. The three situations compared are:

Situation 1, the value-based testing prioritization strategy: this is exactly what we did for the macro testing in Galorath, Inc., following the Testing Priority (TP) order. Since the testing time was limited, we had to stop testing when the Accumulated Cost (AC) reached 18 units, as shown in Figure 16; at that point the Percentage of Business Importance Earned (PBIE) was as high as 93.83%.

Situation 2, the reverse of the value-based, risk-driven testing strategy: the testing order is the reverse of Situation 1. When AC reaches 18 units, PBIE is only 22.22%. This is the worst case, but it can also stand for a common value-neutral situation in practice.

Situation 3, a partial value-based prioritization: the prioritization in Situation 1 takes all variables into account, prioritizing not only the operating systems but also the products and the installation types. In Situation 3 we still prioritize products and operating systems, but we assume all installation types are equally important, so the Client installation type, which had been shown to be defect-free, is also tested. The results show a significant difference: when AC reaches 18 units, PBIE is only 58.02%, because much of the testing effort is wasted on the defect-free installation type. Such "partial" value-based prioritization is common in practice: testing managers often do prioritize, but the way they prioritize is often intuitive and tends to leave some factors out, so this situation represents many real-world cases as well. Since it still treats all installation types as equally important, we consider it value-neutral, to distinguish it from the complete, systematic, comprehensive, and integrated value-based prioritization of Situation 1.

Figure 16. Comparison among the 3 Situations (PBIE versus accumulated cost for Situations 1-3, with the stop point at AC = 18)

Table 37 compares the APBIE of the three situations, and value-based testing prioritization is clearly the best in terms of APBIE. The case study at Galorath, Inc. thus validates that the added value-based prioritization can improve the scenario testing's cost-effectiveness in terms of APBIE.

Table 37. APBIE Comparison
Comparison | APBIE
Situation 1 (Value-based) | 70.99%
Situation 2 (Inverse Order) | 10.08%
Situation 3 (Value-neutral) | 32.10%

The PBIE curves of other value-neutral (or partially value-based) situations should lie between those of Situation 1 and Situation 2 in Figure 16 and are representative of the most common situations in practice. From this comparative analysis we can reject hypothesis H-t1, which means that value-based prioritization can improve testing cost-effectiveness.

5.4. Lessons Learned
Integrate and leverage the merits of state-of-the-art test prioritization techniques: in this case study we synthetically incorporated the merits of several test prioritization techniques to maximize testing cost-effectiveness, i.e., coverage-based and defect-proneness-driven prioritization, and, most importantly, we incorporated business value into the prioritization. The value-based testing strategy introduced here is not independent of other prioritization techniques; on the contrary, it is a synthesis of their merits, with a focus on bridging the gap between the business or mission value of the customers and the testing process.

Think about the trade-offs of automated testing at the same time: from our experience establishing automated testing at Galorath, Inc., we can also see that establishing automated testing is a high-risk as well as a high-investment undertaking [Bullock, 2000]. Test automation is itself software development, which can be expensive and fault-prone, and which faces its own evolution and maintenance problems. Furthermore, automated testing usually treats every scenario as equally important.
However, the combination of value-based test prioritization and automated testing can be a promising strategy that further improves testing cost-effectiveness. For example, if adopting value-based test case prioritization shrinks the testing scope by 60%, and a small initial investment in automated scripts then replaces the remaining tedious manual effort so that tests run overnight and save 90% of the human effort, the combined strategy reduces the cost to (1-60%)*(1-90%) = 4% of the original, an RRL improvement by a factor of 25. This remains a trade-off question of how much automated testing is enough, balancing its savings against the investment needed to establish it. In fact, every testing strategy has its own advantages; what matters most for testing practitioners is a strong sense of how to combine the merits of these strategies to continuously improve the testing process.

Team work is recommended for determining ratings: the prioritization factors' ratings, i.e., the ratings of business importance, risk probability, and testing cost, should not be determined by a single person, since this could introduce subjective bias and make the prioritization misleading. Ratings should be discussed and brainstormed in team meetings with more stakeholders involved, in order to gather more comprehensive information, resolve disagreements, and negotiate toward consensus. For example, if we had not sent out the questionnaire on the frequency of use of each scenario, we would have treated all scenarios as equally important and could not have finished the testing in the limited time. In the worst case we would have installed operating systems that are seldom used, tested the macros on them, and only then found out that they did not need to be tested at all. The same holds for risk probability: if we had not known that the Client installation type did not need to be tested, because it had seldom failed before and was assumed defect-free, a large amount of testing effort would have been spent on unnecessary testing. Team discussion and a shared understanding of the project under test are therefore very important for determining the testing scope and testing order.

Business case analysis depends on the project context: in these empirical studies, the most difficult, yet most flexible, part is how to determine the business importance of the testing items via business case analysis. The business case analysis can be carried out with various methods, chosen for their ease of use and fit with each experiment's environment. In this case study of value-based testing scenario prioritization, we use frequency of use (FU) combined with product importance as a surrogate for the business importance of operational scenarios. In the case study of value-based feature prioritization for software testing in Chapter 6, Karl Wiegers' requirement prioritization approach [Wiegers, 1999] is adopted, which considers both the positive benefit of the presence of a feature and the negative impact of its absence. In the case study of value-based test case prioritization in Chapter 7, the classic S-curve production function with segments of investment, high payoff, and diminishing returns [Boehm, 1981] is used to train students in business case analysis for their project features, with the Kano model [Kano] as a reference to complement their analysis of feature business importance ratings.
A test case's business importance is in turn determined by the importance of its corresponding function, component, or feature, and by the test case's usage, i.e., whether or not it tests the core function of that feature. In the case study on determining the priority of artifacts (system capabilities) in Chapter 3, business importance is tailored to ratings of their influence or impact on the project's success. What these different business case analyses have in common is that they all use well-defined, context-based relative business importance ratings.

Additional prioritization effort is a trade-off as well: prioritization can be as simple as in this case study or can be more deliberate. Spending too much effort on prioritization can bring diminishing returns in testing cost-effectiveness. "How much is enough" depends on the project context and on how easily the information required for prioritization can be obtained. It should always be kept in mind that value-based testing prioritization aims at saving effort, not increasing it. In this case study, the information required for prioritization came from expert estimation (the project manager, product manager, and project developers) at little cost, yet it generated a high payoff for the limited testing effort. However, applying this method to large-scale projects, which may have thousands of test items to prioritize, requires a consensus mechanism for collecting all the data. We have started to implement automated support for applying this method to large-scale industrial projects. This automation is designed to establish traceability among requirements, code, test cases, and defects, so that business importance ratings for requirements can be reused for test items and code-change and defect data can be used to predict risk probability. The automation will also support sensitivity analysis for judging the correctness of ratings and for assessing how rating changes affect the testing order, and it is intended to generate recommended ratings, saving effort while still providing reasonable ratings to facilitate value-based testing prioritization.

Chapter 6: Case Study III - Prioritize Software Features to be Functionally Tested

6.1. Background
This case study on prioritizing features for testing was carried out during the system and acceptance testing phase of one of the main releases of an industrial product named "Qone" [Qone] in a Chinese software organization. The release under test added nine features totaling 32.6 KLOC of Java code. The features are mostly independent amendments or patches to existing modules. The value-based prioritization strategy was applied to prioritize the nine features to be tested based on their ratings of Business Importance, Quality Risk Probability, and Testing Cost. The features' testing value priorities provide decision support for the testing manager in drawing up the testing plan and in adjusting it according to feedback from quality risk indicators, such as defect counts and defect density, and updated testing cost estimates. Defect data was collected automatically and displayed in real time by the organization's defect reporting and tracking system, providing immediate feedback for adjusting the testing priorities in the next testing round.

6.2. Case Study Design
6.2.1. The step to determine Business Value
To determine the business importance of each feature, Karl Wiegers' approach [Wiegers, 1999] is applied in this case study.
This approach considers both the positive benefit of the presence of a feature and the negative impact of its absence. Each feature is assessed in terms of the benefit it will bring if implemented and the penalty that will be incurred if it is not. The estimates of benefit and penalty are relative, on a scale of 1 to 9. For each feature, the weighted benefit and penalty are combined and entered in the Total BI (Business Importance) column of Table 38:

Total BI = benefit weight x relative benefit + penalty weight x relative penalty

where the benefit weight is 2 and the penalty weight is 1 in this case study. The sum of the Total BI column represents the total BI of delivering all the features. The relative contribution of each feature is obtained by dividing its Total BI by the sum of the Total BI column.

Table 38. Relative Business Importance Calculation
Feature | Benefit (weight 2) | Penalty (weight 1) | Total BI | BI %
F1 | 9 | 7 | 25 | 30.9%
F2 | 8 | 7 | 23 | 28.4%
F3 | 1 | 3 | 5 | 6.2%
F4 | 2 | 1 | 5 | 6.2%
F5 | 1 | 1 | 3 | 3.7%
F6 | 2 | 1 | 5 | 6.2%
F7 | 3 | 2 | 8 | 9.9%
F8 | 1 | 2 | 4 | 4.9%
F9 | 1 | 1 | 3 | 3.7%
Sum | 28 | 25 | 81 | 100%

Figure 17 shows the BI distribution of the nine features. There is an approximate Pareto distribution in which F1 and F2 make up 22.2% of the features but contribute 59.3% of the total BI.

Figure 17. Business Importance Distribution
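A minimal sketch of this calculation, using the benefit and penalty ratings from Table 38, is shown below; the feature names and values are those of the case study, while the code itself is only illustrative.

```python
# Sketch of the relative business-importance calculation behind Table 38
# (Karl Wiegers' scheme with benefit weight 2 and penalty weight 1).
benefit_weight, penalty_weight = 2, 1
features = {  # feature: (relative benefit, relative penalty), each on a 1-9 scale
    "F1": (9, 7), "F2": (8, 7), "F3": (1, 3), "F4": (2, 1), "F5": (1, 1),
    "F6": (2, 1), "F7": (3, 2), "F8": (1, 2), "F9": (1, 1),
}

total_bi = {f: benefit_weight * b + penalty_weight * p for f, (b, p) in features.items()}
grand_total = sum(total_bi.values())                 # 81

for f, bi in total_bi.items():
    print(f"{f}: total BI = {bi:2d}, relative BI = {bi / grand_total:.1%}")
# F1 and F2 alone account for 25 + 23 = 48 of 81, i.e. about 59% of the total BI.
```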
6.2.2. The step to determine Risk Probability
The risk analysis was performed before system testing started, but was continuously updated during test execution. Its aim is to calculate a risk probability for each feature. We follow four steps:

Step 1: List all risk factors based on past projects and experience: set up the n risks in the rows and columns of an n*n matrix. In our case study, according to the Chinese organization's risk data from past similar projects, the four top quality risk factors with the highest Risk Exposure are Personnel Proficiency, Size, Complexity, and Design Quality. Defects Proportion and Defects Density are commonly used as hands-on metrics for quality risk identification during the testing process, and together with the top four quality risk factors they serve as the risk factors that determine feature quality risk in this case study.

Step 2: Determine risk weights according to their degree of impact on software quality: different risk factors influence software quality to different degrees in different organizational contexts, so it is more reasonable to assign them different weights before combining them into one risk probability number per feature. The Analytic Hierarchy Process (AHP) [89], a powerful and flexible multi-criteria decision-making method that has been applied to unstructured problems in a wide range of decision situations, from simple personal decisions to complex capital-intensive ones, is used to determine the weight of each risk factor. Based on their understanding of the risk factors and their knowledge and experience of the factors' relative impact on software quality in this organization's context, the testing manager and the development manager jointly determined the weights of each quality risk using the AHP method. The calculation of the quality risk weights in this case study is illustrated in Table 39. The number in each cell represents the pairwise relative importance: a value of 1, 3, 5, 7, or 9 in row i and column j means that the factor in row i is equally, moderately, strongly, very strongly, or extremely strongly more important than the factor in column j, respectively (reciprocals denote the reverse). To calculate the weights, each cell is divided by the sum of its column, and the results are then averaged across each row. The final averaged weights are listed in the Weights column of Table 39; the weights sum to 1. If we could determine the relative value of all risks precisely, the judgments would be perfectly consistent. If, for instance, we judge that Risk 1 is much more important than Risk 2, Risk 2 somewhat more important than Risk 3, and Risk 3 slightly more important than Risk 1, an inconsistency has occurred and the accuracy of the result decreases. The redundancy of the pairwise comparisons makes AHP much less sensitive to such judgment errors; it also makes it possible to measure them by calculating the consistency index (CI) of the comparison matrix and then the consistency ratio (CR). As a general rule, a CR of 0.10 or less is considered acceptable [Saaty, 1980]. In this case study we calculated the CR following the steps in [Saaty, 1980]; it is 0.01, which means the result is acceptable.

Table 39. Risk Factors' Weights Calculation (AHP)
 | Personnel Proficiency | Size | Complexity | Design Quality | Defects Proportion | Defects Density | Weights
Personnel Proficiency | 1 | 1/3 | 3 | 3 | 1/3 | 1/5 | 0.09
Size | 3 | 1 | 3 | 3 | 1 | 1 | 0.19
Complexity | 1/3 | 1/9 | 1 | 1 | 1/7 | 1/9 | 0.03
Design Quality | 1/3 | 1/7 | 1 | 1 | 1/7 | 1/9 | 0.04
Defects Proportion | 3 | 1 | 7 | 7 | 1 | 1 | 0.27
Defects Density | 5 | 3 | 9 | 9 | 1 | 1 | 0.38

Step 3: Score each risk factor for each feature: the testing manager, in collaboration with the development manager, scores each risk factor for each feature. The estimate reflects the degree to which the risk factor is present for the feature: 1 means the factor is not present and 9 means the factor is very strong. A distinction must be made between factor strength and the action to be taken: a 9 indicates factor strength, but does not say what should be done about it. Initial Risks are the risk factors used to calculate the risk probability before system testing, while Feedback Risks such as Defects Proportion and Defects Density are risk indicators used during the testing process to monitor and control it. Risks such as Personnel Proficiency, Complexity, and Design Quality are scored by the development manager based on their understanding of each feature and on pre-defined scoring criteria. The organization has its own scoring criteria for each risk rating. For Personnel Proficiency, for example, years of experience with the application, platform, language, and tools serves as a simple surrogate measure, with the following scoring criteria:

1 - more than 6 years; 3 - more than 3 years; 5 - more than 1 year; 7 - more than 6 months; 9 - less than 2 months (intermediate scores 2, 4, 6, and 8 were allowed).

A more comprehensive measure of Personnel Proficiency could combine the COCOMO II [Boehm et al., 2000] personnel factors, e.g., ACAP (Analyst Capability), PCAP (Programmer Capability), PLEX (Platform Experience), and LTEX (Language and Tool Experience), with other outside factors that may influence proficiency, such as a reasonable workload and, from a psychological point of view, work spirit and passion.
Risks such as Size, Defects Proportion, and Defects Density are scored from collected data; for example, if a feature's size is 6 KLOC and the largest feature's size is 10 KLOC, the feature's size risk is scored as 9*(6/10), i.e., about 5.

Step 4: Calculate the risk probability for each feature: for each feature Fi, once every risk factor has been scored, the following formula is used to combine the factors into the risk probability Pi of Fi:

Pi = (1/9) * sum over j of (Wj * Ri,j)

where Ri,j is Fi's score on the jth risk factor and Wj is the weight of the jth risk factor; dividing by 9 scales the result into the range 0 to 1. Table 40 shows the probability computed from each feature's initial risks before system testing.

Table 40. Quality Risk Probability Calculation (Before System Testing)
Feature | Personnel Proficiency | Size | Complexity | Design Quality | Defects Proportion | Defects Density | Probability
Weights | 0.09 | 0.19 | 0.03 | 0.04 | 0.27 | 0.38 |
F1 | 5 | 3 | 1 | 1 | 0 | 0 | 0.13
F2 | 4 | 9 | 5 | 2 | 0 | 0 | 0.26
F3 | 3 | 3 | 5 | 5 | 0 | 0 | 0.14
F4 | 5 | 4 | 7 | 5 | 0 | 0 | 0.19
F5 | 5 | 2 | 3 | 3 | 0 | 0 | 0.12
F6 | 5 | 2 | 5 | 6 | 0 | 0 | 0.14
F7 | 5 | 4 | 5 | 2 | 0 | 0 | 0.17
F8 | 1 | 2 | 1 | 1 | 0 | 0 | 0.06
F9 | 1 | 1 | 1 | 1 | 0 | 0 | 0.04
(The first four factors are Initial Risks; Defects Proportion and Defects Density are Feedback Risks, which are 0 before system testing.)
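The combination in Step 4 can be illustrated with a short sketch. It assumes the rounded weights published in Table 39, so the results match the probabilities in Table 40 only up to rounding.

```python
# Sketch of Step 4: combine the weighted factor scores into a risk probability
# per feature, P_i = (1/9) * sum_j W_j * R_ij, using the AHP weights from
# Table 39 and a few of the initial-risk scores from Table 40.
weights = {"personnel": 0.09, "size": 0.19, "complexity": 0.03,
           "design": 0.04, "defect_proportion": 0.27, "defect_density": 0.38}

scores = {  # initial-risk scores per feature, in the same factor order
    "F1": [5, 3, 1, 1, 0, 0],
    "F2": [4, 9, 5, 2, 0, 0],
    "F4": [5, 4, 7, 5, 0, 0],
    "F9": [1, 1, 1, 1, 0, 0],
}

for feature, r in scores.items():
    p = sum(w * s for w, s in zip(weights.values(), r)) / 9   # scores run from 1 to 9
    print(f"{feature}: P = {p:.2f}")
# F1 ~0.12, F2 ~0.26, F4 ~0.18, F9 ~0.04 -- matching Table 40 up to rounding.
```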
Lessons Learned and Process Implications: the initial risk data collected revealed some potential problems in this organization.

Potential problem in task breakdown and allocation: feature F9 has the lowest risk for both Personnel Proficiency and Complexity, which implies that one of the most experienced developers is responsible for the least complex feature, while the most complex feature, F4, is developed by the least experienced developer. This points to a potential task allocation problem. In general, it is highly risky to let the least experienced staff do the most complex task, and a waste of resources to let the most experienced developer do the least complex one. In the future, the organization should consider a more reasonable and efficient task allocation strategy to mitigate this risk.

Potentially insufficient design capability: the risk factors should in principle be independent when they are combined into a risk probability, meaning they should not be strongly interrelated. Based on the data in Table 40, we performed a correlation analysis among the risk factors; almost no pair of factors is strongly correlated (correlation coefficient > 0.8). It should be noted, however, that the correlation coefficient of 0.76 between Complexity and Design Quality is high, which means that as Complexity becomes an issue, Design Quality also becomes a risky problem. This could imply that the current designers or analysts are inadequate for their work. To mitigate this risk, the project manager should consider recruiting analysts with more experience in requirements, high-level design, and detailed design.

Table 41. Correlation among Initial Risk Factors
 | Personnel Proficiency | Size | Complexity | Design Quality
Personnel Proficiency | 1
Size | 0.30 | 1
Complexity | 0.56 | 0.48 | 1
Design Quality | 0.44 | -0.05 | 0.76 | 1

From Table 39 we can see that the feedback risk factors, Defects Proportion and Defects Density, received the largest weights from the AHP calculation. This is reasonable: the initial risk factors are mainly used to estimate risk probability before system testing starts, but once system testing is under way, the testing manager is more concerned with each feature's real, evolving quality situation and with finding the most fault-prone features. Defects Proportion and Defects Density provide this real quality information and feedback during system testing. This is also why the probabilities in Table 40 are low: the initial risks carry smaller weights, and there are no feedback risk factors before system testing starts.

6.2.3. The step to determine Testing Cost
The test manager estimates the relative cost of testing each feature, again on a scale from a low of 1 to a high of 9, based on factors such as the development effort of the feature, its complexity, and its quality risks, as shown in Table 42.

Table 42. Relative Testing Cost Estimation
Feature | Cost | Cost %
F1 | 2 | 4.8%
F2 | 5 | 11.9%
F3 | 5 | 11.9%
F4 | 9 | 21.4%
F5 | 6 | 14.3%
F6 | 4 | 9.5%
F7 | 5 | 11.9%
F8 | 3 | 7.1%
F9 | 3 | 7.1%
Sum | 42 | 100%

Figure 18. Testing Cost Estimation Distribution

A correlation analysis between the nine features' business importance and their estimated testing cost is shown in Table 43. The negative correlation indicates that the features that are most costly to test may have less business importance to key customers. Testing the features with higher business importance and lower cost first will therefore improve testing efficiency and maximize ROI in the early stage of the testing phase.

Table 43. Correlation between Business Importance and Testing Cost
 | BI | Cost
BI | 1
Cost | -0.31 | 1

6.2.4. The step to determine Testing Priority
As in the scenario prioritization, once a feature passes testing its probability of failure is reduced to 0, so the testing priority (TP) triggered by RRL is calculated as:

TP = RRL = (Business Importance x Risk Probability) / Testing Cost

computed here from the relative values as BI% x Probability / Cost%. The testing priorities of the nine features are shown in Table 44; the resulting testing order is F1, F2, F7, F6, F3, F4, F8, F5, F9.

Table 44. Value Priority Calculation
Feature | BI % | Probability | Cost % | Priority
F1 | 30.9 | 0.13 | 4.8 | 0.81
F2 | 28.4 | 0.26 | 11.9 | 0.63
F7 | 9.9 | 0.17 | 11.9 | 0.14
F6 | 6.2 | 0.14 | 9.5 | 0.09
F3 | 6.2 | 0.14 | 11.9 | 0.07
F4 | 6.2 | 0.19 | 21.4 | 0.05
F8 | 4.9 | 0.06 | 7.1 | 0.04
F5 | 3.7 | 0.12 | 14.3 | 0.03
F9 | 3.7 | 0.04 | 7.1 | 0.02
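For illustration, the priorities in Table 44 can be recomputed from the rounded ratings above; small deviations from the published values come from rounding of the intermediate ratings.

```python
# Sketch of the feature-level priority computation, TP = BI% * P / Cost%,
# using the rounded values from Tables 38, 40 and 42; minor differences from
# Table 44 come from rounding of the intermediate ratings.
features = {  # feature: (BI %, risk probability, cost %)
    "F1": (30.9, 0.13, 4.8),  "F2": (28.4, 0.26, 11.9), "F3": (6.2, 0.14, 11.9),
    "F4": (6.2, 0.19, 21.4),  "F5": (3.7, 0.12, 14.3),  "F6": (6.2, 0.14, 9.5),
    "F7": (9.9, 0.17, 11.9),  "F8": (4.9, 0.06, 7.1),   "F9": (3.7, 0.04, 7.1),
}

priority = {f: bi * p / cost for f, (bi, p, cost) in features.items()}
for f, tp in sorted(priority.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{f}: TP = {tp:.2f}")
# Resulting order: F1, F2, F7, F6, F3, F4, F8, F5, F9 -- as in Table 44.
```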
6.3. Results
After applying the value-based prioritization strategy to determine the testing order of the nine features, the PBIE comparison between the value-based order and its inverse (the most inefficient order) is shown in Figure 19. The difference in APBIE between the two is 76.9% - 34.1% = 42.8%, which means the value-based testing order improves cost-effectiveness by 42.8% over the worst case. The PBIE curves of other value-neutral (or partially value-based) orders should lie between these two curves and are representative of the most common situations in practice; this further rejects hypothesis H-t1.

Figure 19. PBIE Comparison between the Value-Based and Inverse Orders (value-based: 30.8%, 59.2%, 69.1%, 75.3%, 81.4%, 87.6%, 92.5%, 96.2%, 99.9%; inverse: 3.7%, 7.4%, 12.3%, 18.5%, 24.7%, 30.9%, 40.7%, 69.1%, 99.9% after each of the nine features)

In our case study, the test manager planned four rounds of testing. In each round, the test groups focus on the two or three features with the highest current priority, while the other features are tested with automated tools. The results were as follows: after the first round, F1 and F2 satisfied the stop-test criteria; after the second round, F3, F6, and F7 satisfied them; after the third round, F4 and F8 satisfied them; and the last round covered F5 and F9. The comparison between the initially estimated and the actual testing cost is shown in Figure 20.

Figure 20. Initial Estimated and Actual Testing Cost Comparison (estimated 16.7, 33.3, 28.6, 21.4 versus actual 19.8, 25.3, 30.3, 24.6 percent of effort for Rounds 1-4)

If we regard the testing activity as an investment, its value is realized when features satisfy the stop-test criteria. The accumulated-BI-earned curve in Figure 22 resembles a production function, with a higher payoff at the early stage and diminishing returns later. From Figure 21 and Figure 22 we can see that when Round 1 was finished we had earned 59.2% of the total feature BI at a cost of only 19.8% of the whole testing effort, for an ROI as high as 1.99. In Round 2 we earned 22.2% BI at 25.3% of the effort, and the ROI turned negative at -0.12. From Round 1 to Round 4, both the BI-earned line and the ROI line descend: Rounds 3 and 4 together earn only 18.5% BI but cost 54.9% of the effort. This shows that the Round 1 testing is the most cost-effective. Testing the features with the highest value priority first is especially useful when market pressure is high; in such cases, one could stop testing once the ROI turns negative, i.e., after Round 1. In other cases, however, continuing to test may still be worthwhile in terms of customer-perceived quality.

Figure 21. BI, Cost and ROI across Testing Rounds (BI earned: 59.2, 22.2, 11.1, 7.4; Cost: 19.8, 25.3, 30.3, 24.6; Test ROI: 1.99, -0.12, -0.63, -0.70 for Rounds 1-4)

Figure 22. Accumulated BI Earned during Testing Rounds (0, 59.2, 81.4, 92.5, 99.9 from the start through Round 4)
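The per-round ROI values in Figure 21 are consistent with treating each round's earned business importance as the return and its share of the testing effort as the cost; the sketch below reproduces them under that assumption.

```python
# Sketch of the per-round ROI in Figure 21, computed as
# ROI = (BI earned - cost) / cost, with both quantities expressed as
# percentages of their respective totals.
rounds = {  # round: (BI earned %, cost %)
    "Round 1": (59.2, 19.8),
    "Round 2": (22.2, 25.3),
    "Round 3": (11.1, 30.3),
    "Round 4": (7.4, 24.6),
}

for name, (bi, cost) in rounds.items():
    roi = (bi - cost) / cost
    print(f"{name}: ROI = {roi:+.2f}")
# Round 1: +1.99, Round 2: -0.12, Round 3: -0.63, Round 4: -0.70
```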
Consideration of market factors: time to market can strongly influence the effort distribution of software development and project planning. Since the testing phase immediately precedes software product transition and delivery, it is influenced by market pressure even more [Huang and Boehm, 2006]. Under intense market competition, sacrificing some software quality to avoid further market-share erosion can be a sensible organizational strategy. In our case study we use a simple depreciation function to express the influence of market pressure on Business Importance: in every unit time cycle, a fraction of the software's initial value, given by the Pressure Rate, is depreciated. Time represents the number of unit time cycles; a unit time cycle might be a year, a month, a week, or even a day, and for simplicity it is one testing round in our case study. The Pressure Rate is estimated by the market or product managers with help from customers, and represents the percentage of the software's initial value that depreciates during one unit time cycle; the fiercer the market competition, the larger the Pressure Rate. The longer the time and the larger the Pressure Rate, the smaller the present BI and the larger the BI lost to market erosion. Because we calculate relative business importance, the initial total BI is 100(%), and when Round n of testing is over, the BI lost to market-share erosion is the portion of that initial BI depreciated over the n elapsed cycles. On the other hand, the earlier the product enters the market, the larger the loss caused by poor quality. Finally, we can find a sweet spot (the minimum) of the combined risk exposure due to unacceptable software quality and market erosion.

We assume three Pressure Rates of 1%, 4%, and 16%, standing for low, medium, and high market pressure respectively, in Figure 23 to Figure 25; these can also be seen as three types of organizational context: high finance, commercial, and early start-up [Huang and Boehm, 2006]. When market pressure is as low as 1% (Figure 23), the total loss caused by quality and market erosion reaches its lowest point (the sweet spot) at the end of Round 4. When the Pressure Rate is 4% (Figure 24), the lowest total loss occurs at the end of Round 3, which means we should stop testing and release the product even though F5 and F9 have not reached the stop-test criteria; this ensures the minimum loss. When the market pressure rate is as high as 16% (Figure 25), we should stop testing at the end of Round 1.

Figure 23. BI Loss (Pressure Rate = 1%)
Figure 24. BI Loss (Pressure Rate = 4%)
Figure 25. BI Loss (Pressure Rate = 16%)

Extension of the testing priority value function: in this case study we use a multi-objective multiplicative value function to determine the testing priority. An additive value function can be used instead:

V = W_BI x V(X_BI) + W_C x V(X_C) + W_RP x V(X_RP)

where V(X_BI), V(X_C), and V(X_RP) are single value functions for Business Importance, Cost, and Risk Probability, and W_BI, W_C, and W_RP are their relative weights. The single value functions for Business Importance and Risk Probability express increasing preference: the larger the Business Importance or Risk Probability, the higher the testing priority, as shown in the left part of Figure 26. The single value function for Testing Cost expresses decreasing preference: the larger the Cost, the lower the testing priority value, as shown in the right part of Figure 26.

Figure 26. Value Functions for "Business Importance" and "Testing Cost"

Extending the multiplicative value function to an additive one yields similar feature testing priorities [Li, 2009]. Whether the value function is multiplicative or additive, as long as it reasonably reflects the same success-critical stakeholders' win-condition preferences, it should generate similar priority results. In our extension experiment, both dynamic prioritizations made the ROI of the testing investment peak at the early stage of testing, which is especially effective when time to market is limited. This extension of the value function is also supported by value-based utility theory.
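To make the contrast concrete, the sketch below shows one possible additive form next to the multiplicative form used in this chapter. The single value functions and weights in the additive version are hypothetical placeholders, not the ones used in the extension experiment.

```python
# Illustrative sketch only: one possible additive form of the testing-priority
# value function next to the multiplicative form used in this chapter. The
# single value functions are assumed linear and the weights are hypothetical;
# the actual functions and weights of the extension experiment may differ.
def additive_priority(bi, cost, rp, w_bi=0.4, w_c=0.2, w_rp=0.4,
                      bi_max=9.0, cost_max=9.0):
    v_bi = bi / bi_max            # increasing preference: higher BI -> higher value
    v_c = 1.0 - cost / cost_max   # decreasing preference: higher cost -> lower value
    v_rp = rp                     # risk probability is already in [0, 1]
    return w_bi * v_bi + w_c * v_c + w_rp * v_rp

def multiplicative_priority(bi, cost, rp):
    return bi * rp / cost         # the form used throughout this chapter

# Both forms rank a high-BI, high-risk, cheap feature above a low-BI,
# low-risk, expensive one.
print(additive_priority(9, 2, 0.3), additive_priority(2, 8, 0.1))
print(multiplicative_priority(9, 2, 0.3), multiplicative_priority(2, 8, 0.1))
```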
Chapter 7: Case Study IV - Prioritize Test Cases to be Executed

7.1. Background
This case study of prioritizing test cases for execution with the value-based, dependency-aware prioritization strategy was carried out on 18 projects of the USC software engineering course in the 2011 spring and fall semesters. As an extension of the earlier work on prioritizing features for testing, this work prioritizes test cases at a finer granularity and adds consideration of the dependencies among test cases. It also tailors the Probability of Loss in the Risk Reduction Leverage (RRL) definition to the test case Failure Probability and uses it as a trigger to shrink the regression test suite by excluding stable features, so as to conserve scarce testing resources.

A project named "Project Paper Less" [USC_577b_Team01, 2011], with 28 test cases, is used as an example to investigate the improvement in testing efficiency. During Fall 2010 CSCI 577a, the Team01 students produced an Operational Concept Description (OCD), System and Software Requirements Description (SSRD), System and Software Architecture Description (SSAD), and Initial Prototype, together with planning documents such as the Life Cycle Plan (LCP) and Quality Management Plan (QMP). In Spring 2011 CSCI 577b, they developed the Initial Operational Capability while concurrently producing the Test Plan and Cases (TPC); the students are trained to write test cases against the requirements in the SSRD, elaborating them with Equivalence Partitioning and Boundary Value Testing techniques [Ilene, 2003]. The test cases in the TPC cover 100% of the requirements in the SSRD, and the team had already done some informal unit and integration testing before acceptance testing. They follow the Value-Based Testing Guideline [USC_577b_VBATG, 2011] to perform value-based test case prioritization (TCP), execute their acceptance testing in the order produced by the prioritization, record their testing results in the Value-Based Testing Procedure and Results (VbTPR), and report discovered defects to the Bugzilla system [USC_CSSE_Bugzilla], where the defects are tracked to closure. The following sections introduce the value-based TCP steps in the context of Team01's project.

7.2. Case Study Design
7.2.1. The step to do Dependency Analysis
Most features in the SUT are not independent of one another; they typically have precedence or coupling constraints that require some features to be implemented before others, or some to be implemented together [Maurice et al., 2005]. The same holds for test cases: some test cases must be executed and passed before others can be executed, and the failure of some test cases can block others from being executed. Understanding the dependencies among test cases benefits test case prioritization and test planning, and the dependencies are also useful information when rating business importance, failure probability, criticality, and even testing cost, as introduced in the following sections. Based on the test cases in the TPC [USC_577b_Team01, 2011], the testers were asked to generate dependency graphs for their test suites. These can be as simple as Team01's test case dependency tree in Figure 27, or much more complex, for example when one test case node has more than one parent node. In Figure 27, each test case carries a bracket with two placeholders to be filled in later: one for the Testing Value (= Business Importance * Failure Probability / Testing Cost) and the other for the Criticality. The following sections describe in detail how these factors are rated and used for prioritization.

Figure 27. Dependency Graph with Risk Analysis
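The dependency information can be captured with a very simple data structure. The sketch below uses a few test case IDs in the style of Figure 27 with simplified, partly hypothetical edges, and derives a test case's full Dependencies Set as defined later in Section 7.2.6.

```python
# Illustrative sketch: represent the direct dependencies of a few test cases
# (IDs in the style of Figure 27, with simplified, partly hypothetical edges)
# and derive a test case's full Dependencies Set, i.e. everything it depends
# on directly or indirectly.
direct_deps = {
    "TC-01-01": [],
    "TC-03-01": ["TC-01-01"],
    "TC-04-01": ["TC-03-01"],
    "TC-04-02": ["TC-04-01"],
    "TC-05-10": ["TC-03-01"],
}

def dependencies_set(tc, deps):
    """All test cases that tc depends on, directly or transitively."""
    result, stack = set(), list(deps[tc])
    while stack:
        d = stack.pop()
        if d not in result:
            result.add(d)
            stack.extend(deps[d])
    return result

print(dependencies_set("TC-04-02", direct_deps))
# {'TC-04-01', 'TC-03-01', 'TC-01-01'} (set order may vary)
```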
7.2.2. The step to determine Business Importance
For testing, the business importance of a test case is mainly determined by the importance, or value to the client, of its corresponding function, component, or feature. In addition, because of test case elaboration strategies such as Equivalence Partitioning and Boundary Value Testing, different test cases for the same feature are designed to test different aspects of that feature and therefore differ in importance as well. The first step in determining the Business Importance of a test case is to determine the BI of its relevant function or feature.

In CSCI 577a, students are educated and trained in how to do business case analysis for a software project and how to rate the relative Business Importance of the functions and features of a software system from the client's point of view, e.g., the importance of the software, product, component, or feature to the client's organization in terms of its Return on Investment [Boehm, 1981], as shown in Figure 28. A general mapping between function/feature BI rating ranges and the segments of the production function (investment, high payoff, diminishing returns) is given in the boxes in Figure 28 for the students' reference. The slope of the curve represents the ROI of the function: the steeper the slope, the higher the ROI, and so the higher the BI of the function. The BI of functions in the Investment segment is usually in the range Very Low to Normal, since the early Investment segment involves developing infrastructure and architecture that does not directly generate benefits but is necessary for realizing the benefits of the High-payoff and Diminishing-returns segments. For "Project Paper Less", the Access Control and User Management features belong to the Investment segment. The main application functions of this project, such as the Case Management and Document Management features, are the core capabilities that the client most wants; they lie in the High-payoff segment, so their BI is in the range High to Very High. Because of the scope and schedule constraints of the course projects, the projects are usually small-scale, require the students to develop only the core capabilities, and seldom contain features in the Diminishing-returns segment.

Figure 28. Typical production function for software product features [Boehm, 1981] (BI ranges annotated on the segments: Investment VL-N, High-payoff H-VH, Diminishing returns VL-N)

The business importance of a test case is determined on one side by the business importance of its corresponding feature, function, or module, and on the other side by the criticality, i.e., the magnitude of the impact if the test case fails. A guideline for rating a test case's Business Importance that considers both sides is shown in Table 45; the ratings run from VL to VH, with corresponding values from 1 to 5. For example, for the Login function in the Access Control module, the tester used the Equivalence Partitioning strategy to generate two test cases: one tests whether a valid user can log in, and the other tests whether an invalid user cannot log in. The Access Control feature belongs to the Investment segment and was rated of Normal benefit to the client. If the first test case, checking that a valid user can log in, fails, the Login function will not run, which blocks other functions such as Case Management and Document Management from being tested; that test case should therefore be rated Normal according to the guideline in Table 45. The other test case, checking that an invalid user cannot log in, should be rated Low, because even if it fails, login still works (a valid user can still log in to test the other functionality without being blocked); its criticality magnitude is smaller than that of the first test case and it deserves the lower rating. This is just one example of differentiating the Business Importance of test cases that were elaborated by Equivalence Partitioning within the same feature.
There are various other situations in which the relative importance can be differentiated by considering the criticality magnitude of a failure as well.

Table 45. Guideline for rating BI for test cases
VH: 5 | The test case tests functionality that brings Very High benefit to the client, and if it does not pass, the functionality will not run
H: 4 | The test case tests functionality that brings Very High benefit to the client, and even if it does not pass, the functionality can still run; or it tests functionality that brings High benefit to the client, and if it does not pass, the functionality will not run
N: 3 | The test case tests functionality that brings High benefit to the client, and even if it does not pass, the functionality can still run; or it tests functionality that brings Normal benefit to the client, and if it does not pass, the functionality will not run
L: 2 | The test case tests functionality that brings Normal benefit to the client, and even if it does not pass, the functionality can still run; or it tests functionality that brings Low benefit to the client, and if it does not pass, the functionality will not run
VL: 1 | The test case tests functionality that brings Low benefit to the client, and even if it does not pass, the functionality can still run; or it tests functionality that brings Very Low benefit to the client, and if it does not pass, the functionality will not run

Rating the Business Importance of all 28 test cases of "Project Paper Less" gives the distribution shown in Figure 29: test cases of High and Very High business importance make up more than half. This makes sense, because most of the implemented features are core capabilities, although some "investment" capabilities that the core ones depend on are still needed.

Figure 29. Test Case BI Distribution of Team01 Project (VL 11%, L 21%, N 14%, H 50%, VH 4%)

7.2.3. The step to determine Criticality
Criticality, as mentioned in the previous step, represents the magnitude of the impact of a failure and the influence it has on the ongoing test. Combined with the Business Importance from the client's value perspective, it helps determine the size of the loss at risk. The empirical guideline for rating it is given in Table 46; the ratings run from VL to VH with values from 1 to 5. The underlying rationale is that test cases with high Criticality should be passed as early as possible; otherwise they block other test cases from being executed and can delay the whole testing process if their defects are not resolved soon enough. Students are instructed to consult the dependency tree or graph when rating Criticality. In the "Project Paper Less" test case dependency tree in Figure 27, TC-01-01, TC-03-01, and TC-04-01 are all rated Very High, because they are on the critical path for executing all other test cases: if they fail, most of the other test cases are blocked, and most of those blocked test cases have high Business Importance. Most of the other test cases are tree leaves; if they fail, they do not block any other test cases, so their Criticality is rated Very Low.
Table 46. Guideline for rating Criticality for test cases
VH: 5 | Blocks most (70%-100%) of the test cases, AND most of those blocked test cases have High Business Importance or above
H: 4 | Blocks most (70%-100%) of the test cases, OR most of those blocked test cases have High Business Importance or above
N: 3 | Blocks some (40%-70%) of the test cases, AND most of those blocked test cases have Normal Business Importance
L: 2 | Blocks a few (0%-40%) of the test cases, OR most of those blocked test cases have Normal Business Importance or below
VL: 1 | Does not block any other test case

7.2.4. The step to determine Failure Probability
The primary goal of testing is to reduce the uncertainty about the software product's quality before it is delivered to the client. Testing without risk analysis is a waste of resources; uncertainty and risk analysis are the triggers for selecting the subset of the test suite on which to focus the testing resources, namely the most risky, fault-prone features. Table 47 provides a set of self-check questions covering the different aspects and factors that can cause a test case to fail, as a reference for students when rating a test case's failure probability. Students rated each test case's Failure Probability based on these recommended factors or on others they thought of themselves. The rating levels, with their numeric values, are: Never Fail (0), Least Likely to Fail (0.3), Have No Idea (0.5), Most Likely to Fail (0.7), and Fail for Sure (1).

Table 47. Self-check questions used for rating Failure Probability
Experience | Did the test case fail before? People tend to repeat previous mistakes, and so does software: a test case that failed before, in unit tests, at the CCD, or in informal random testing, tends to fail again. Is the test case new? A test case that has not been executed before has a higher probability of failing.
Change Impact | Does any recent code change (deletion/modification/addition) affect some features? If so, the test cases for those features have a higher probability of failing.
Personnel | Are the people responsible for this feature qualified? If not, the test cases for this feature tend to fail.
Complexity | Does the feature contain complex algorithms or I/O functions? If so, its test cases have a higher probability of failing.
Dependencies | Does this test case have many connections (either depending on or being depended on by other test cases)? If so, it has a higher probability of failing.

For "Project Paper Less", before acceptance testing the testers had already done a Core Capability Drive-through (CCD) of the core capabilities developed in the first increment, design and code reviews, unit testing, and informal random testing, and so had already gathered information and experience about the health of the software system. On this basis they rated the Failure Probability of all 28 test cases; the distribution of the rating levels is shown in Figure 30. Test cases rated Never Fail make up more than half, based on previous experience and observation. The Never Fail test cases should be deferred to the end of each testing round, to be executed only if resources are still available, or not executed at all if time and testing resources are limited. In this way, quality risk analysis shrinks the test case suite so that only the subset of test cases that carry quality risk is executed.

Figure 30. Failure Probability Distribution of Team01 Project (Never Fail: 15 test cases, 54%; Least Likely to Fail: 6, 21%; Have No Idea: 1, 4%; Most Likely to Fail: 6, 21%; Fail for Sure: 0, 0%)
7.2.5. The step to determine Test Cost
Value-Based Software Engineering treats every activity as an investment, so for test activities the cost or effort of executing each test case should also be considered in TCP. However, estimating the effort to execute each test case is challenging [Deonandan et al., 2010], [Ferreira et al., 2010]; some practices simply suggest counting the number of steps needed to execute the test case. To simplify our experiment, students are asked to write test cases at the same level of granularity, so that every test case has nearly the same number of steps as far as possible, and the cost of executing each test case is assumed to be the same.

7.2.6. The step for Value-Based Test Case Prioritization
Once the testers have rated the factors above for each test case, the Testing Value triggered by RRL is defined as:

Testing Value = (Business Importance x Failure Probability) / Testing Cost

It is obvious from this definition that the Testing Value is proportional to Business Importance and Failure Probability and inversely proportional to Testing Cost; this allows test cases to be prioritized in terms of return on investment (ROI). Students were asked to fill in each test case node with its Testing Value and its Criticality rating, as shown in Figure 27. Executing the test cases with the highest Testing Value and highest Criticality first is the basic prioritization strategy. However, because of the dependencies among test cases, testers usually cannot jump directly to the test case with the highest Testing Value without first executing and passing some test cases with lower Testing Value that lie on the path to it. For example, in Figure 27, TC-04-01 has the highest Testing Value (3.5) together with the highest Criticality rating (VH), but the testers cannot execute it until TC-01-01 and TC-03-01 on the critical path have been executed and passed. The dependency factor therefore has to be built into the value-based TCP algorithm. The following key concepts help in understanding it:

Passed: every step in the test case produces the expected output, so the feature works accordingly.
Failed: at least one step in the test case produces an unexpected output that prevents the function from working, or the failure would certainly block other test cases from being executed (minor improvement suggestions do not fall into this category).
NA: the test case cannot be executed, for example because it depends on another test case that failed, or because of external factors such as the testing environment (a precondition cannot be satisfied, required test data is missing, etc.).
Dependencies Set: the set of test cases that a given test case depends on, including all dependencies, whether direct or indirect.
Ready-to-Test: the status of a test case that either has no dependencies or whose Dependencies Set has entirely Passed.
Not-Tested-Yet: the status of a test case that has not been tested so far.
The algorithm for value-based, dependency-aware Test Case Prioritization is briefly described below and illustrated in Figure 31. It is basically a variant of a greedy algorithm whose goal is to select first the Ready-to-Test test case with the highest Testing Value and Criticality.

Value First: Test the test case with the highest Testing Value. If several test cases have the same Testing Value, test the one with the highest Criticality.

Dependency Second: If the test case selected in the first step is not Ready-to-Test, at least one of the test cases in its Dependencies Set is Not-Tested-Yet. In that situation, prioritize the Not-Tested-Yet test cases within this Dependencies Set according to "Value First" and test them until all test cases in the Dependencies Set have Passed. Then the test case with the highest value becomes Ready-to-Test.

Update the prioritization: After each round, update the Failure Probability based on the observations from previous testing rounds.

As Figure 31 shows, a Passed test case is excluded from further prioritization; if a test case Fails and the failure cannot be resolved, it is reported for resolution and it, together with the test cases that depend on it (marked NA), is excluded from prioritization for the current round.

Figure 31. In-Process Value-Based TCP Algorithm

For "Project Paper Less", the 15 Never Fail test cases are excluded from the subset selected for testing, as shaded in the dependency tree in Figure 27. It is not necessary to test them deliberately if the testing effort or resources are limited; they can still be tested at the end of the round if time is available. According to the value-based TCP algorithm, the testing order for the remaining test cases is: TC-04-01, TC-04-02, TC-04-03, TC-05-10, TC-18-01, TC-12-01, TC-11-01, TC-13-01, TC-02-01, TC-14-01, TC-03-04, TC-02-02, TC-03-02. The testers still need to walk through TC-01-01 and TC-03-01 to reach TC-04-01, but walking through costs much less than deliberate testing and its effort can be neglected.

7.3. Results

7.3.1. One Example Project Results

Average Percentage of Business Importance Earned (APBIE) is used to measure how quickly the SUT's value is realized: the higher it is, the more efficient the testing is. For the above test case prioritization for "Project Paper Less", the BI, FP, and Criticality ratings can be found at [USC_577b_Team01, 2011]. For the whole set T of 28 test cases, TBI = 88. At the start of the testing round, 15 test cases were rated Never Fail and did not need to be tested in this round; they form the set T-T'. In total they carry 45 business importance, which means IBIE = 45 and PBIE_0 = 45/88 = 51.1%. For the remaining 13 prioritized test cases to be executed in order in the set T', PBIE_1 = (45+5)/88 = 56.8% when TC-04-01 passes, PBIE_2 = (45+5+4)/88 = 61.4% when TC-04-02 passes, ..., and PBIE_13 = (45+5+4+...+1)/88 = 100% when TC-03-02 passes and all 88 business importance has been earned. The business importance is earned quickly at the beginning and more slowly toward the end, as shown in Figure 32. The APBIE = (56.8%+61.4%+...+100%)/13 = 81.9%.

Figure 32. PBIE curve according to Value-Based TCP (APBIE=81.9%)
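For readers who prefer code to the flowchart, the sketch below outlines the planned-order variant of the greedy, dependency-aware selection together with the PBIE/APBIE computation, reusing the TestCase sketch shown earlier. It is an illustrative simplification under two assumptions: every test case eventually passes (so it produces a planned order rather than the full in-process handling of Failed/NA test cases), and each test case costs the same; the function names are ours, not the tool's.

```python
def planned_order(suite: dict) -> list:
    """Greedy value-based, dependency-aware ordering (assumes all tests will pass)."""
    order, done = [], set()

    def schedule(tc_id):
        if tc_id in done:
            return
        tc = suite[tc_id]
        # Dependency Second: schedule unfinished dependencies first, highest value first.
        deps = [d for d in tc.depends_on if d not in done]
        for dep_id in sorted(deps, key=lambda d: (suite[d].testing_value(),
                                                  suite[d].criticality), reverse=True):
            schedule(dep_id)
        order.append(tc_id)
        done.add(tc_id)

    while len(done) < len(suite):
        # Value First: highest Testing Value, Criticality breaks ties.
        best = max((t for t in suite if t not in done),
                   key=lambda t: (suite[t].testing_value(), suite[t].criticality))
        schedule(best)
    return order

def pbie_curve(order: list, suite: dict, initial_bi: float = 0.0):
    """PBIE_i = (business importance earned after i executions) / TBI; APBIE = mean PBIE_i."""
    tbi = initial_bi + sum(suite[t].business_importance for t in order)
    earned, curve = initial_bi, []
    for tc_id in order:
        earned += suite[tc_id].business_importance
        curve.append(earned / tbi)
    apbie = sum(curve) / len(curve) if curve else 0.0
    return curve, apbie
```

Applied to the "Project Paper Less" data, pbie_curve would be called with the 13 prioritized test cases and initial_bi=45, so that TBI = 88 and the computed APBIE should reproduce the 81.9% figure above; the exact order produced may differ slightly from the one listed where Testing Values tie.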
As the evidence above shows, risk analysis of test cases' Failure Probability helps to select a subset of the test suite and focus effort on the most risky test cases, saving testing cost and effort. However, the risk analysis should be based on previous hands-on experience and observations of the quality of the SUT. If testers have no idea about the SUT's health status before testing, as in practice for third-party testing, outsourced testing, etc., the Testing Value should depend only on Business Importance before their first test, assuming the test cost is the same for each test case, as in the example dependency tree shown in Figure 27. In this case, all test cases should be prioritized. According to the value-based TCP algorithm, the test order for the whole test suite without risk analysis is: TC-01-01, TC-03-01, TC-04-01, TC-05-01, TC-04-02, TC-04-03, TC-05-02, TC-05-03, TC-05-05, TC-05-07, TC-05-08, TC-05-10, TC-12-01, TC-18-01, TC-11-01, TC-13-01, TC-19-01, TC-02-01, TC-14-01, TC-01-02, TC-02-02, TC-15-01, TC-16-01, TC-16-02, TC-16-03, TC-03-02, TC-03-03, TC-03-04.

This testing order's PBIE is displayed as the square curve in Figure 33, compared with a commonly used value-neutral test order shown as the diamond curve, which follows the test case ID numbers, i.e., a Breadth-First Search (BFS) of the dependency tree. It is obvious that Value-Based TCP earns business importance more quickly than the value-neutral order. The APBIE for Value-Based TCP is 52%, higher than the value-neutral order's 46%, which rejects hypothesis H-t1. This improvement would be more significant if the business importance numeric values were not on a linear range from 1 to 5 but on an exponential range from 2^1 to 2^5.

Figure 33. PBIE Comparison without risk analysis between Value-Based and Value-Neutral TCP (APBIE_value_based=52%, APBIE_value_neutral=46%)

It should also be noted that the 21.9% difference (81.9% - 60%) between prioritization with and without Failure Probability analysis is contributed by the risk analysis that selects the sub test suite and further improves testing efficiency. So Value-Based TCP can improve testing cost-effectiveness by selecting and prioritizing test cases to earn Business Importance as early as possible, which is especially useful when the testing schedule is tight and testing resources are limited.

Value-Based TCP enables early execution of test cases with high business importance and criticality; the failure of such test cases leads to defects being reported to the responsible developers, who then prioritize and fix the defects according to their severity and priority in an efficient way. In fact, a test case's business importance and criticality determine the severity and priority of the defects reported when it fails, following the mapping in Table 48.
Basically, if a test case with Very High business importance fails, the corresponding feature/function that brings the highest benefit to the customer does not work, causing a large customer benefit loss; for this reason, the relevant defect's severity should be rated "Critical". If a test case with Very High criticality fails, it blocks most of the other test cases with high business importance from being executed, so the relevant defect should be "Resolve Immediately" in order not to delay the whole testing process.

Table 48. Mapping Test Case BI & Criticality to Defect Severity & Priority
BI <-> Severity (Value-Based TCP BI rating -> Defect Severity in Bugzilla):
VH -> Critical
H -> Major
N -> Normal
L -> Minor
VL -> Trivial, Enhancement
Criticality <-> Priority (Value-Based TCP Criticality rating -> Defect Priority in Bugzilla):
VH, H -> Resolve Immediately
N -> Normal Queue
L, VL -> Not Urgent, Low Priority, Resolve Later

So if testers follow the Value-Based TCP to select and prioritize test cases, it directly leads to early detection of high-severity, high-priority defects, for the reasons above, if potential defects exist. For "Project Paper Less", after the first round of acceptance testing, 4 defects were reported to Bugzilla; their severity, priority, and corresponding test cases with business importance and criticality are shown in Table 49. From the ascending defect ID sequence (an earlier defect report results in a lower defect ID) and the related Test Case IDs, it is clear that the value-based prioritization enabled testers to detect high-severity defects as early as possible, although there were some mismatches between test case Criticality ratings and defect Priority ratings. This is mainly because we did not instruct students to report defects according to the mapping in Table 48: Bugzilla's Priority field defaults to Normal Queue and students might have felt no need to change it, or might have reasoned in a common-sense way that high-severity defects should be Resolved Immediately. Still, this in turn provides evidence that Value-Based TCP enables testers to detect high-severity faults early if such faults exist. So from the observations of defect reporting in Bugzilla for this project, defects with higher Priority and Severity were reported earlier and resolved earlier. This can reject hypothesis H-t2.

Table 49. Relations between Reported Defects and Test Cases
Defect ID in Bugzilla | Severity | Priority | Test Case ID | BI | FP | Criticality
#4444 | Critical | Resolve Immediately | TC-04-01 | VH | 0.7 | VH
#4445 | Major | Normal Queue | TC-04-03 | H | 0.7 | VL
#4460 | Major | Normal Queue | TC-05-10 | H | 0.7 | VL
#4461 | Major | Resolve Immediately | TC-18-01 | H | 0.7 | VL

7.3.2. All Team Results

After all teams had executed the acceptance testing, with several follow-on regression testing rounds, using the Value-Based TCP technique, a survey with several open questions was sent to and answered by the primary testers. The questions mainly concerned their impressions of and feedback on applying Value-Based TCP for acceptance testing, problems they encountered, and improvement suggestions. Some representative responses are shown below:

"Before doing the prioritization, I had a vague idea of which test cases are important to clients.
But after going through the Value-Based testing, I had a better picture as to which ones are of critical importance to the client."

"I prioritized test cases mainly based on the sequence of the system work flow, which is performing test cases with lower dependencies first, before using value-based testing. I like the value-based process because it can save time by letting me focus on more valuable test cases or risky ones. Therefore, it improves testing efficiency."

7.3.2.1 A Tool for Facilitating Test Case Prioritization

In the example case study above, a semi-automatic spreadsheet was developed to support the method's application on USC graduate software engineering course projects in the 2011 spring semester. To further facilitate and automate the prioritization, to save effort and minimize human error, and to support application on large-scale projects that might have thousands of test cases to prioritize, there has to be a consensus mechanism to collect all the required rating data. We implemented an automated, integrated tool to support this method on top of TestLink, an open-source, widely used test case management toolkit built on a PHP + MySQL + Apache platform. We customized this system to incorporate the value-based, dependency-aware test case prioritization technique; it is available at [USC_CSSE_TestLink] and is used for USC graduate software engineering course projects. Figure 34 illustrates an example of a test case in the customized TestLink.

Figure 34. An Example of Customized Test Case in TestLink

Basically, the tool supports the following:
- Rate Business Importance, Failure Probability, and Test Cost by selecting the ratings from the dropdown lists shown in Figure 34. It currently supports 5-level ratings for each factor (Very Low, Low, Normal, High, and Very High) with default numeric values from 1 to 5, and the Testing Value in terms of RRL for each test case is calculated automatically.
- Manage test case dependencies by entering the other test cases that a test case directly depends on in the "Dependent Test Case" text field shown in Figure 34; the dependencies are stored in the database for later prioritization.
- Prioritize test cases according to the value-based, dependency-aware prioritization algorithm in Chapter 7 to generate a planned value-based testing order, as illustrated in Figure 35, helping testers plan their testing more cost-efficiently. A value-neutral testing order, which only handles the dependencies among test cases without considering each test case's RRL, is also generated for comparison.
- Display the PBIE curves for both the value-based and value-neutral testing orders visually, and show the APBIE for both orders at the bottom of the chart in Figure 35.

Figure 35. A Tool for facilitating Value-based Test Case Prioritization in TestLink

Several feasible future features planned for incremental implementation into the tool include:
- Establish test case dependencies by dragging and dropping, and generate a visible dependency graph.
- Establish a traceability matrix between the requirement specifications (TestLink also maintains specifications) and test cases, and categorize test cases by tagging them "core" or "auxiliary" to automatically obtain test case business importance ratings.
- Establish a traceability matrix between test cases and defects (TestLink provides interfaces to integrate with commonly used defect tracking systems, such as Mantis and Bugzilla) in order to automatically predict failure probability from the collected historical defect data. Other solutions for predicting failure probability include: integrating a code change analysis tool (e.g., a diff tool) with the traceability matrix to quantitatively predict a code change's impact on test cases' failure probability; and establishing a historical database and a measurement system to predict software features' fault-proneness and personnel qualifications.
- Experiment with sensitivity analysis for reasoning about and judging the correctness of the factors' ratings.

By implementing these features, the tool is expected to automatically generate recommended ratings for business importance and failure probability, and will not require much effort from testers to input their ratings for each test case, which will greatly facilitate value-based TCP and add value to this technique.

7.3.2.2 Statistical Results for All Teams via this Tool

We imported the rating data from the test case prioritization spreadsheets of all 18 teams into the tool to facilitate comparative analysis. Three measures are used to compare the Value-Based and Value-Neutral testing strategies: "APBIE", "Delivered-Value when Cost is fixed", and "Cost when Delivered-Value is fixed". In addition, since the 18 teams were trained to use the Value-Based testing strategy, we also use a t-test to see whether there is a statistically significant improvement for the teams under experiment. It should be noted that both the value-based and the value-neutral orders are dependency-aware; the difference is that the value-based strategy adds RRL, a combination of business importance, failure probability, and cost (in this case study each test case is assumed to cost the same), into the prioritization, while the value-neutral one only considers dependencies, without the value-based factor RRL, which is typical in industry.

APBIE Comparison

APBIE is a new metric we proposed to measure how quickly a testing order earns the business or mission value. The higher it is, the more efficient the testing is. The tool automatically displays the APBIE comparison at the bottom of the chart in Figure 36.

Figure 36. APBIE Comparison

Delivered-Value Comparison when Cost is fixed (e.g., 50% of test cases executed, as shown below)

In reality, one common situation is that a version's release date is fixed. Before the fixed deadline, which features can be delivered is determined by which features have passed the quality criteria in terms of their test cases. So maximizing the delivered value under a fixed testing cost is usually the goal of a testing strategy, and "Delivered-Value Comparison when Cost is fixed" is a practical and effective testing measure under time constraints. In Figure 37 and the analysis later, it compares the delivered value when the testing cost is cut to 50%, which means only 50% of the test cases can be executed, assuming the cost of running each test case is the same.

Figure 37. Delivered-Value Comparison when Cost is fixed
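Both this fixed-cost measure and the fixed-value measure introduced next can be read directly off a PBIE curve. The short sketch below illustrates one way to compute them, reusing the pbie_curve helper from the earlier sketch; the function names, the 50% thresholds, and the equal-cost assumption are illustrative choices, not part of the TestLink tool.

```python
def delivered_value_at_fixed_cost(curve: list, cost_fraction: float = 0.5) -> float:
    """PBIE reached after executing the first cost_fraction of the test cases
    (equal cost per test case assumed)."""
    n_executed = int(len(curve) * cost_fraction)
    return curve[n_executed - 1] if n_executed > 0 else 0.0

def cost_at_fixed_value(curve: list, value_fraction: float = 0.5) -> float:
    """Fraction of test cases that must be executed before PBIE first reaches
    value_fraction of the total business importance."""
    for i, pbie in enumerate(curve, start=1):
        if pbie >= value_fraction:
            return i / len(curve)
    return 1.0

# Example usage (hypothetical names): compare value-based and value-neutral orders.
# vb_curve, _ = pbie_curve(planned_order(suite), suite)
# vn_curve, _ = pbie_curve(value_neutral_order, suite)
# print(delivered_value_at_fixed_cost(vb_curve), delivered_value_at_fixed_cost(vn_curve))
```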
Cost Comparison when Delivered Value is fixed (e.g., 50% of Business Importance, as shown below)

Another release-planning situation is that a release version requires several features in the package to achieve a certain degree of customer satisfaction; for example, a fixed percentage of the total business importance represented by all the features in the backlog, e.g., 50%, should be delivered in the upcoming version as soon as possible, in order to satisfy critical customers' needs or to enter the market at the earliest time and maximize market share. So minimizing the testing cost while achieving the required, fixed delivered value is the goal in this release situation, and "Cost Comparison when Delivered Value is fixed" is a practical and effective testing measure under such value constraints. In Figure 38 and the analysis later, it compares the testing cost while the delivered value is set to 50%.

Figure 38. Cost Comparison when Delivered Value is fixed

Comparative Analysis Results

For all 18 teams in Spring & Fall 2011, Table 50 tells us that the APBIE of the Value-Based testing order is always no less than that of the value-neutral one, with statistical significance (the p-value of the t-test is well below 0.05). More visually, this means the value-based testing curve is always on top of the value-neutral testing curve, at worst overlapping it. So this can reject hypothesis H-t1. However, the improvement for some projects (Spring teams 2, 3 and Fall teams 3, 7, 8, 12, 13), shaded in Table 50, is not obvious; explanations for this are given later.

Table 50. APBIE Comparison (all teams)
Team | # of TCs | Value-Based | Value-Neutral | Improvement
2011S_T01 | 28 | 56.41% | 46.38% | 10.03%
2011S_T02 | 29 | 54.94% | 53.80% | 1.14%
2011S_T03 | 22 | 51.76% | 50.75% | 1.01%
2011S_T05 | 31 | 54.36% | 51.87% | 2.49%
2011S_T06 | 39 | 53.07% | 50.40% | 2.67%
2011F_T01 | 19 | 51.93% | 45.98% | 5.95%
2011F_T03 | 14 | 52.15% | 50.33% | 1.82%
2011F_T04 | 24 | 61.95% | 53.62% | 8.33%
2011F_T05 | 77 | 63.21% | 42.07% | 21.14%
2011F_T06 | 31 | 59.22% | 53.31% | 5.91%
2011F_T07 | 10 | 57.25% | 56.25% | 1.00%
2011F_T08 | 7 | 55.71% | 54.76% | 0.95%
2011F_T09 | 10 | 57.27% | 51.51% | 5.76%
2011F_T10 | 18 | 62.08% | 57.23% | 4.85%
2011F_T11 | 25 | 53.16% | 51.39% | 1.77%
2011F_T12 | 6 | 58.33% | 58.33% | 0.00%
2011F_T13 | 31 | 53.64% | 53.25% | 0.39%
2011F_T14 | 29 | 57.24% | 48.17% | 9.07%
Average | | 56.32% | 51.63% | 4.68%
F-test | | | | 0.5745
T-test | | | | 0.000661
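The t-test figures reported in these tables can be obtained with standard statistical routines. The sketch below shows one plausible way to run the test on the APBIE columns of Table 50 using SciPy; since the text does not specify the exact variant used, a paired (related-samples) test is assumed here, and the resulting p-value may differ slightly from the reported 0.000661 depending on the variant (paired vs. independent samples, one- vs. two-tailed) that was actually applied.

```python
from scipy import stats

# APBIE per team (Table 50), value-based vs. value-neutral orders.
value_based   = [56.41, 54.94, 51.76, 54.36, 53.07, 51.93, 52.15, 61.95, 63.21,
                 59.22, 57.25, 55.71, 57.27, 62.08, 53.16, 58.33, 53.64, 57.24]
value_neutral = [46.38, 53.80, 50.75, 51.87, 50.40, 45.98, 50.33, 53.62, 42.07,
                 53.31, 56.25, 54.76, 51.51, 57.23, 51.39, 58.33, 53.25, 48.17]

# Paired t-test: each team contributes one value-based and one value-neutral measurement.
t_stat, p_value = stats.ttest_rel(value_based, value_neutral)
print(f"t = {t_stat:.3f}, p = {p_value:.6f}")   # a small p-value indicates a significant improvement
```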
Table 51 tells us that if the testing cost is fixed, e.g., only half of the total test cases can be run before releasing, and assuming the time for running each test case is the same, the Value-Based testing always delivers no less business value than the Value-Neutral one, with statistical significance, so this can reject hypothesis H-t3. However, there is again no obvious improvement for some projects (Spring teams 2, 3 and Fall teams 3, 7, 8, 12, 13), as above.

Table 51. Delivered Value Comparison when Cost is fixed (all teams)
Team | # of TCs executed (50%) | Value-Based PBIE | Value-Neutral PBIE | Improvement
2011S_T01 | 14 | 60% | 40% | 20.00%
2011S_T02 | 15 | 61% | 58% | 3.00%
2011S_T03 | 11 | 52% | 50% | 2.00%
2011S_T05 | 16 | 56% | 50% | 6.00%
2011S_T06 | 20 | 59% | 51% | 8.00%
2011F_T01 | 10 | 60% | 45% | 15.00%
2011F_T03 | 7 | 50% | 50% | 0.00%
2011F_T04 | 12 | 70% | 50% | 20.00%
2011F_T05 | 39 | 70% | 40% | 30.00%
2011F_T06 | 16 | 65% | 50% | 15.00%
2011F_T07 | 5 | 53% | 52% | 1.00%
2011F_T08 | 4 | 60% | 50% | 10.00%
2011F_T09 | 5 | 58% | 45% | 13.00%
2011F_T10 | 9 | 63% | 55% | 8.00%
2011F_T11 | 13 | 55% | 50% | 5.00%
2011F_T12 | 3 | 50% | 50% | 0.00%
2011F_T13 | 16 | 51% | 50% | 1.00%
2011F_T14 | 15 | 60% | 40% | 20.00%
Average | | 58.50% | 48.67% | 9.83%
F-test | | | | 0.3822
T-test | | | | 0.000083

Table 52 tells us that if the business value to be delivered is fixed for a release, e.g., 50% of the total business value is planned to be delivered as soon as possible in order to enter the market at the earliest time, and assuming the time for running each test case is the same, the Value-Based testing never spends more testing cost than the Value-Neutral one, with statistical significance, so this can reject hypothesis H-t3. However, there is again no obvious improvement for some projects (Spring teams 2, 3 and Fall teams 3, 7, 8, 12, 13), as above.

Table 52. Cost Comparison when Delivered Value is fixed (all teams)
Team | # of TCs to gain 50% BI (Value-Based) | # of TCs to gain 50% BI (Value-Neutral) | # of TCs | Value-Based Cost% | Value-Neutral Cost% | Cost saving %
2011S_T01 | 12 | 17 | 28 | 42.86% | 60.71% | 17.86%
2011S_T02 | 13 | 13 | 29 | 44.83% | 44.83% | 0.00%
2011S_T03 | 11 | 11 | 22 | 50.00% | 50.00% | 0.00%
2011S_T05 | 13 | 16 | 31 | 41.94% | 51.61% | 9.68%
2011S_T06 | 18 | 21 | 39 | 46.15% | 53.85% | 7.69%
2011F_T01 | 9 | 11 | 19 | 47.37% | 57.89% | 10.53%
2011F_T03 | 7 | 7 | 14 | 50.00% | 50.00% | 0.00%
2011F_T04 | 8 | 14 | 24 | 33.33% | 58.33% | 25.00%
2011F_T05 | 21 | 51 | 77 | 27.27% | 66.23% | 38.96%
2011F_T06 | 11 | 16 | 31 | 35.48% | 51.61% | 16.13%
2011F_T07 | 5 | 5 | 10 | 50.00% | 50.00% | 0.00%
2011F_T08 | 4 | 4 | 7 | 57.14% | 57.14% | 0.00%
2011F_T09 | 4 | 6 | 10 | 40.00% | 60.00% | 20.00%
2011F_T10 | 7 | 9 | 18 | 38.89% | 50.00% | 11.11%
2011F_T11 | 11 | 13 | 25 | 44.00% | 52.00% | 8.00%
2011F_T12 | 3 | 3 | 6 | 50.00% | 50.00% | 0.00%
2011F_T13 | 16 | 16 | 31 | 51.61% | 51.61% | 0.00%
2011F_T14 | 12 | 18 | 29 | 41.38% | 62.07% | 20.69%
Average | | | | 44.01% | 54.33% | 10.31%
F-test | | | | | | 0.2616
T-test | | | | | | 0.000517

After re-checking the rating spreadsheets and re-interviewing students from the projects with no obvious improvement, the explanations are listed below:
- Most of the course projects are small; during the two semesters, students usually only have time to focus on implementing core capabilities, and it is hard for some of them to differentiate the levels of business importance of these "equally important" capabilities. This is also a partial reason for the small percentage of overall improvement. Especially for Spring teams 02, 03 and Fall teams 07, 08, 12, 13, we discovered from their prioritizations that nearly all the test cases' business importance was rated High or above. From this perspective, these are hardly value-based teams, although they were trained to use the value-based strategy to differentiate the levels of business importance.
- Some students do not have strong capabilities or the sense to do project business analysis, resulting in nearly the same levels of business importance ratings.
- Some teams have a very small set of test cases, which makes it even harder to differentiate business importance.

Based on the explanations above, the teams with no obvious improvement are in fact value-neutral teams. If we exclude them from the comparative analysis, the performance on the three measures improves, for obvious reasons, as shown in Table 53 to Table 55.
This further rejects H-t1 and H-t3.

Table 53. APBIE Comparison (11 teams)
Team | # of TCs | Value-Based | Value-Neutral | Improvement
2011S_T01 | 28 | 56.41% | 46.38% | 10.03%
2011S_T05 | 31 | 54.36% | 51.87% | 2.49%
2011S_T06 | 39 | 53.07% | 50.40% | 2.67%
2011F_T01 | 19 | 51.93% | 45.98% | 5.95%
2011F_T04 | 24 | 61.95% | 53.62% | 8.33%
2011F_T05 | 77 | 63.21% | 42.07% | 21.14%
2011F_T06 | 31 | 59.22% | 53.31% | 5.91%
2011F_T09 | 10 | 57.27% | 51.51% | 5.76%
2011F_T10 | 18 | 62.08% | 57.23% | 4.85%
2011F_T11 | 25 | 53.16% | 51.39% | 1.77%
2011F_T14 | 29 | 57.24% | 48.17% | 9.07%
Average | | 57.26% | 50.18% | 7.09%
F-test | | | | 0.8326
T-test | | | | 0.000704

Table 54. Delivered Value Comparison when Cost is fixed (11 teams)
Team | # of TCs executed (50%) | Value-Based PBIE | Value-Neutral PBIE | Improvement
2011S_T01 | 14 | 60% | 40% | 20.00%
2011S_T05 | 16 | 56% | 50% | 6.00%
2011S_T06 | 20 | 59% | 51% | 8.00%
2011F_T01 | 10 | 60% | 45% | 15.00%
2011F_T04 | 12 | 70% | 50% | 20.00%
2011F_T05 | 39 | 70% | 40% | 30.00%
2011F_T06 | 16 | 65% | 50% | 15.00%
2011F_T09 | 5 | 58% | 45% | 13.00%
2011F_T10 | 9 | 63% | 55% | 8.00%
2011F_T11 | 13 | 55% | 50% | 5.00%
2011F_T14 | 15 | 60% | 40% | 20.00%
Average | | 61.45% | 46.91% | 14.55%
F-test | | | | 0.9339
T-test | | | | 0.000043

Table 55. Cost Comparison when Delivered Value is fixed (11 teams)
Team | # of TCs to gain 50% BI (Value-Based) | # of TCs to gain 50% BI (Value-Neutral) | # of TCs | Value-Based Cost% | Value-Neutral Cost% | Cost saving %
2011S_T01 | 12 | 17 | 28 | 42.86% | 60.71% | 17.86%
2011S_T05 | 13 | 16 | 31 | 41.94% | 51.61% | 9.68%
2011S_T06 | 18 | 21 | 39 | 46.15% | 53.85% | 7.69%
2011F_T01 | 9 | 11 | 19 | 47.37% | 57.89% | 10.53%
2011F_T04 | 8 | 14 | 24 | 33.33% | 58.33% | 25.00%
2011F_T05 | 21 | 51 | 77 | 27.27% | 66.23% | 38.96%
2011F_T06 | 11 | 16 | 31 | 35.48% | 51.61% | 16.13%
2011F_T09 | 4 | 6 | 10 | 40.00% | 60.00% | 20.00%
2011F_T10 | 7 | 9 | 18 | 38.89% | 50.00% | 11.11%
2011F_T11 | 11 | 13 | 25 | 44.00% | 52.00% | 8.00%
2011F_T14 | 12 | 18 | 29 | 41.38% | 62.07% | 20.69%
Average | | | | 39.88% | 56.76% | 16.88%
F-test | | | | | | 0.7218
T-test | | | | | | 0.000065

7.3.2.3. Lessons learned

Intuitively, the benefit of the Value-Based testing strategy only comes after the business importance levels of the test cases to be prioritized are really differentiated; value-based prioritization makes no sense if all test cases are given the same level of business importance. Small projects usually focus on core capabilities, whose differences in business importance are not obvious, which results in no obvious improvement via Value-Based testing. For medium and large projects, as the project size grows and the number of test cases increases proportionally, the benefit of prioritizing test cases to maximize business value or minimize test cost will surely become obvious and significant in terms of improvement percentages. A correlation analysis was conducted between the "Improvement" and "# of TCs" columns in Table 50, and the correlation coefficient is 0.735, which means "Improvement" and "# of test cases" have a strong positive correlation; in other words, the more test cases to be prioritized, the more improvement can potentially be achieved. Moreover, even a small percentage of effort saved at a fixed delivered value, or of additional value delivered at a fixed cost, becomes significant in monetary terms, especially for large-scale projects with investments of millions of dollars.

Chapter 8: Threats to Validity

Diversities of Projects and Subjects: For Case Study I in the USC graduate-level software engineering project course, especially for the cross-project comparative analysis of the value-based review experiments, the 35 projects cover different applications with diverse technical characteristics and different clients.
Also, reviewers' different capabilities and the non-uniform granularity of issues reported by different reviewers might affect the number of issues reported and the reviewing effectiveness displayed in this experiment. These are sources of high variability across projects and certainly contributed to the large standard deviations seen for some of the results within the same year's teams, e.g., the 2011 and 2010 teams' high standard deviations for review cost-effectiveness in Chapter 4. However, the comparative analysis is conducted on the 2011 and 2010 teams, which used value-based review, versus the 2009 teams, which used the value-neutral one, and the distributions of project application types, technical characteristics, and client/reviewer characteristics across the three years are similar. So even though the variability is high within each year's teams, the general similarity of the projects improves the three-year comparison to some degree. Meanwhile, to actively minimize the high variability, detailed guidelines and instructions were presented and distributed to reviewers before they acted, covering how to report issues to Bugzilla (the customized issue tracking system) at a consistent granularity and which attributes, e.g., Priority and Severity, must be reported correctly. Teaching Assistants periodically monitored reviewers' performance without bias, quality-checked the issues they reported, and gave additional instruction and training to those who performed poorly; reviewers were trained on issue reporting to Bugzilla during the first few package reviews, before value-based review was introduced with detailed step-by-step guidelines; and additional office hours and training sessions were provided to answer questions and resolve confusion where necessary. These measures further reduce the variability and the effects of the learning curve, and the comparative analysis is then based on the more stable later package reviews.

Non-representativeness of Projects and Subjects: Although the development teams are primarily full-time graduate students with, on average, less than 2 years of industry experience, the reviewers are almost all full-time professional employees, and their review schedule conflicts were similar to review schedule conflicts on the job. Thus the results should be reasonably representative of industrial review practices. Besides, for the value-based testing practices, we also conducted case studies on real industry projects at Galorath, Inc. and at the Chinese Software Organization, which reduces this type of threat. In addition, practitioners' voices are a good resource for further testing our research hypotheses, reducing the threats introduced by the quantitative data analysis, and providing research improvement opportunities. So in and after each empirical experiment, a series of surveys covering various aspects of the experimental prioritization process was conducted to collect feedback from practitioners. To reduce the threat of being both an experimenter and a grader, we state clearly in the survey instructions that "we do not grade on your choice, but on the rationale you provide for your choice". Also, in our general grading of issue reporting, the criterion is kept that grading is not based on how close the results are to what we expected, but on whether students report data honestly and correctly for their real project context.
References [20, 46] contain the detailed survey information and result analysis for the value-based review, while [32] includes those for value-based test case prioritization. In this way, we believe that the quantitative and qualitative evidence can complement each other in testing our research hypotheses.

Correctness of Input Factors' Values: The reviewing or testing priorities are calculated from the input factors, such as Business Importance, Risk Probability, or Cost. The correctness of those factors' ratings or values directly influences the correctness of the output priorities. In our experiments, especially for the student projects, we first provided detailed guidelines to train students on how to determine the factors' values/ratings. Students in each team determined the ratings or values by group consensus. Besides, we asked students to provide rationales for their ratings, and Teaching Assistants double-checked the correctness of those rationales and the consistency between the ratings and the provided rationales, to avoid bias and errors in the subjective inputs to the largest possible extent and thus minimize the threats to the results' validity. For the real industry projects, such as the Chinese Software Organization project and the Galorath, Inc. project, the ratings were determined and validated by professional project managers, developers, and testing managers, which also minimizes this threat.

Applicability to Large-Scale Industrial Projects: For this method's application on large-scale projects, especially for test-case-level prioritization, which might involve thousands of test cases to be prioritized, there has to be a consensus mechanism to collect all the required data. In addition to the automatic tool that we have already implemented for facilitating test case prioritization, several feasible capabilities to be explored include:
- For dependency analysis, some existing dependency analysis tools will be explored and integrated.
- For business importance, some value management systems will be explored, developed, and integrated. In this research, relative business importance in terms of ROI is captured by understanding the S-curve Production Function in Figure 28. Other Customer Value Analysis (CVA) techniques, such as the Kano Model [44], can also be applicable. Besides, a real Value Management System (VMS) to capture, manage, monitor, and control the value flow across the whole software development lifecycle, facilitating decisions on various software engineering activities based on cost/benefit analysis, business case analysis, etc., is under development and would support this.
- For failure probability prediction, to minimize the bias of subjective risk assessments, a more sophisticated quantitative solution includes: using a candidate code change analysis tool (e.g., a diff tool) and the traceability matrix to quantitatively predict a code change's impact on test cases' failure probability; establishing a historical database and a measurement system to predict software features' fault-proneness and personnel qualifications; and combining all these influencing factors with defined calculation rules to estimate test case failure probability in a more comprehensive and unbiased way.
- For reasoning about and judging the correctness of the factors' ratings, and the weights assigned to them, we can experiment with sensitivity analysis.

We are also cooperating with some software management tool vendors to integrate the candidate features above, e.g.,
Qone [45], a widely used lifecycle project management tool in China; IBM Rational Team Concert [92] is another option. Since these tools are mature, they have accumulated mechanisms to collect and share the required data, so it is easier to apply the method with these tools in real industry settings, which will certainly have large systems with possibly thousands of test cases. We believe that prioritization becomes more meaningful and efficient as the scale grows.

Chapter 9: Next Steps

Our next steps would involve more data and empirical studies applying the Value-Based, Dependency-Aware prioritization strategy across the lifecycle. A phase-based selection of cost-effective defect removal options for various defect types, in Risk Reduction Leverage (RRL) priority order, can enable the various defect removal options to be ranked or mixed by how well they reduce risk exposure for various defect types. Combining this with their relative option costs enables them to be prioritized in terms of return on investment, as initially investigated in [Madachy and Boehm, 2008] via Orthogonal Defect Classification [Chillarege et al., 1992]. The three notional yet representative examples below might provide insights for more data and empirical studies in industry settings; they are also the points on which we most want advice about the feasibility of these scenarios in industry settings, and we would like to take this opportunity to call for more cooperation from industry.

Example 1: The first example is provided by Boehm in [Selby, 2007] to compare the cost-effectiveness of two approaches for eliminating a type of error. Suppose that the loss incurred by having a particular type of interface error in a given software product is estimated at one million dollars, and that from experience we can estimate that the probability of this type of interface error being introduced into the software product is roughly 0.3. Two approaches for eliminating this type of error are a requirements and design interface checker, whose application will cost $20K and will reduce the error probability to 0.1; and an interface testing approach, whose application will cost $150K and will reduce the error probability to 0.05. The RRLs of the two approaches are compared as below:

RRL(R-D checker) = 1000K*(0.3-0.1)/20K = 10
RRL(Test) = 1000K*(0.3-0.05)/150K = 1.67

Thus, the RRL calculation confirms that V&V investments in the early phases of the software life cycle generally have high payoff ratios, and that V&V is a function that needs to begin early to be most cost-effective. Defect removal techniques have different detection efficiencies for different types of defects, and their effectiveness may vary over the lifecycle. The early defect detection activities can also provide insights on how to perform more cost-effective testing, as discussed next.
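The comparison in Example 1 reduces to one small formula. The sketch below captures it with an illustrative helper (the function name is ours, not from the cited sources); the same helper also covers the review/test mix compared in Example 2, which follows.

```python
def rrl(loss: float, p_before: float, p_after: float, cost: float) -> float:
    """Risk Reduction Leverage = (RE_before - RE_after) / Cost,
    where Risk Exposure RE = Size(Loss) * Prob(Loss)."""
    return loss * (p_before - p_after) / cost

# Example 1: loss = $1000K, initial interface-error probability 0.3
print(rrl(1000, 0.3, 0.1, 20))     # R-D interface checker ($20K)  -> 10.0
print(rrl(1000, 0.3, 0.05, 150))   # interface testing ($150K)     -> ~1.67
```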
Example 2: Similar calculations can help a software project determine a more cost-effective mix of defect removal techniques to apply across the software life cycle. For example, suppose the loss due to another type of defect is also 1000K, and that software peer review can reduce this type of error's occurrence probability from 0.6 to 0.3 with a reviewing effort of 2 PM. If this error's probability is to be reduced to 0.0 by reviewing alone, it will cost an extra 8 PM of reviewing effort; however, testing can reduce it to 0.0 with only 1 PM of extra testing effort. The RRLs of the two strategies are compared as below:

RRL(Review Only) = 1000K*(0.6-0.0)/(2+8)PM = 60K/PM
RRL(Review+Test) = 1000K*(0.6-0.0)/(2+1)PM = 200K/PM

Thus, instead of using one single defect removal strategy, a mix of defect removal options can further improve cost-effectiveness. Additionally, the techniques may have overlapping capabilities for detecting the same types of defects, and it is difficult to know how best to apply them, especially for combinations of cross-phase defect removal options: for example, deciding when to stop reviewing and start testing, or how much reviewing is enough in combination with the other options at hand, is difficult. One option that might be worth attempting is applying Indifference Curve and Budget Constraint analysis from microeconomic utility theory; the optimal combination is the point where the indifference curve and the budget line are tangent. Another solution is investigated in [Madachy and Boehm, 2008] with dynamic simulation tool support to determine the best combination of techniques and their optimal order and timing. A further source of insights can be the collection and analysis of Orthogonal Defect Classification data [Chillarege et al., 1992].

Example 3: Another option to simplify the above scenario is to combine different defect removal options within the same phase to reduce costs and in turn improve RRL. For example, at the acceptance testing phase, adopting the value-based test case prioritization strategy can shrink the testing scope by 60%, and the remaining tedious manual testing effort can be further replaced by a small initial investment in automated scripts that let the tests run overnight, saving 90% of the human effort. So by combining value-based test case prioritization and automated testing, the cost is reduced to (1-60%)*(1-90%) = 4%, an RRL improvement by a factor of 25.

To the best of our knowledge so far, Examples 1 and 3 might be more feasible to implement in industrial settings than Example 2, at least theoretically; even for Examples 1 and 3, the quantitative approach to obtaining RRL becomes difficult where the precise estimation of Size(Loss) and Prob(Loss) is concerned. As the series of empirical studies reflects, the place where we put the most effort is customizing the definition of RRL and its quantitative analysis, giving practical meaning to each prioritization driver for different applications within specific project contexts, and translating those practical meanings for practitioners through various examples and guidelines. On the other hand, even if the estimates of probabilities and losses are imprecise, the resulting approaches will be judgment-oriented strategies rather than fully quantitative optimal policies [Selby, 2007]. The cost-effectiveness assessment of ODC defect removal options can be implemented for different domains and operational scenarios in industrial settings. The ODC Delphi survey will be revisited for the extra-high usage of defect removal techniques under the more recent trends of Cloud Computing, Software as a Service (SaaS), and Brownfield development.

Chapter 10: Conclusions

In this research, we propose the Value-Based, Dependency-Aware inspection and test prioritization strategy to select and prioritize defect removal activities and artifacts by how well they reduce risk exposure, which is the product of the size of the loss and the probability of loss.
The technique considers business importance from the client’s value perspective combined with the criticality of failure occurrence as a measure of the size of loss at risk. The reduction probability of loss is the probability that a given inspection or testing item would catch the defect. This enables the inspection or testing items to be ranked by how well they reduce risk exposure. Combining this with their relative costs enables the items to be prioritized in terms of return on investment. We applied this strategy to a series of case studies that cover the most commonly used defect removal activities during the software development life cycle, such as inspection, functionality testing, performance testing, and acceptance testing. Both quantitative and qualitative evidence from these case studies shows that this strategy enables early execution for inspection and testing items with high business importance and criticality, thus improving defect removal cost-effectiveness. The detailed steps, practices, and lessons learned to design and implement this strategy in real industrial project contexts provide the practical guidelines and insights for this strategy’s application in future industrial projects. As most of the current software testing strategies are coverage-based and value- neutral with few empirical studies aiming to maximize testing cost-effectiveness in terms of APBIE or other business-value or mission-value metrics. I hope that the results here 143 will stimulate further research and practice in value-based defect identification and removal. Furthermore, the automatic tool for facilitating test case prioritization is implemented for this strategy’s future application in large-scale projects, which might have thousands of test cases to be prioritized. In the future, we will elaborate this technique for different defect types (algorithm, interface, timing etc.) and find optimal cost-effective defect removal technique options for different types of defects to further improve testing effectiveness. 144 Bibliography [Abdelrabi et al., 2004] Z. Abdelrabi, E.Cantone, M. Ciolkowski, and D. Rombach, D, “Comparing code reading techniques applied to object oriented software frameworks with regard to effectiveness and defect detection rate”, Proc, ISESE 2004, pp 239-248. [Amland, 1999] S. Amland, “Risk Based Testing and Metrics”, 5th International Conference EuroSTAR'99. 1999: Barcelona, Spain. [Basili et al., 1996] V. Basili, S. Green, O. Laitenberger, F. Lanubile, F.Shull, S.Sorumgard, and M.Zelkowitz. “The empirical investigation of perspective-based reading”, Intl. J. Empirical SW. Engr., 1(2) 1996, pp.133-164. [Bird et al., 2009] C. Bird, N. Nagappan, P. Devanbu, H. Gall, and B. Murphy. “Putting It All Together: Using Socio-technical Networks to Predict Failures”, In Proceedings of the 17th International Symposium on Software Reliability Engineering (ISSRE 2009),Mysore, India, 2009. 109-119 [Boehm, 1981] B. Boehm, “Software Engineering Economics”, Prentice Hall, 1981. [Boehm, 1988] B. Boehm, “A Spiral Model of Software Development and Enhancement”. IEEE Computer, 1988. 21(5): p. 61-72. 145 [Boehm et al., 1998] B. Boehm, et al., “Using the WinWin spiral model: a case study”. IEEE Computer, 1998; 31(7): pp. 33-44. [Boehm et al., 2000] B. Boehm, et al., “Software Cost Estimation with COCOMO II”. Prentice Hall, NY(2000) [Boehm and Basili, 2001] B. Boehm, and V. Basili, "Software Defect Reduction Top 10 List," Computer, vol. 34, no. 1, pp. 135-137, Jan. 
2001, doi:10.1109/2.962984 [Boehm, 2003] B. Boehm, “Value-Based Software Engineering”. ACM Software Engineering Notes, 2003; 28(2). [Boehm and Turner, 2003] B. Boehm, and R. Turner, “Balancing Agility and Discipline: A Guide for the Perplexed”, 2003: Addison-Wesley [Boehm et al., 2004] B. Boehm, et al., “The ROI of Software Dependability: The iDAVE Model”. IEEE Software, 2004; 21(3): pp. 54-61. [Boehm and Jain, 2005] B.Boehm, and A. Jain, “An Initial Theory of Value-Based Software Engineering”, Value-Based Software Engineering. 2005, Springer. pp. 16-37. 146 [Boehm and Lane, 2007] B.Boehm, and J. Lane, “Using the Incremental Commitment Model to Integrate System Acquisition, Systems Engineering, and Software Engineering”, CrossTalk, 2007. [Boehm et al., 2007] B. Boehm, et al., “Guidelines for Lean Model-Based (System) Architecting and Software Engineering (Lean MBASE)”, USC-CSSE, 2007. [Bullock, 2000] J. Bullock, “Calculating the Value of Testing”, Software Testing and Quality Engineering, May/June 2000, pp. 56-62. [Chillarege et al., 1992] R. Chillarege, I.S. Bhandari, J.K. Chaar, M.J. Halliday, D.S. Moebus, B.K. Ray, M.-Y. Wong, "Orthogonal Defect Classification-A Concept for In- Process Measurements," IEEE Transactions on Software Engineering, vol. 18, no. 11, pp. 943-956, Nov. 1992, doi:10.1109/32.177 [Cobb and Mills, 1990] R.H.Cobb, and H.D.Mills, "Engineering software under statistical quality control," Software, IEEE , vol.7, no.6, pp.45-54, Nov 1990 [Conradi and Wang, 2003] R.Conradi, and A.Wang. (eds.), “Empirical Methods and Studies in Software Engineering: Experiences from ESERNET”, Springer Verlag, 2003. [Czerwonka et al., 2011] J.Czerwonka, R.Das, N.Nagappan, A.Tarvo, A.Teterev. “CRANE: Failure Prediction, 147 Change Analysis and Test Prioritization in Practice - Experiences from Windows”. In Proceedings of ICST'2011. 357~366 [Deonandan et al., 2010] I. Deonandan, R. Valerdi, J. Lane, F. Macias, “Cost and Risk Considerations for Test and Evaluation of Unmanned and Autonomous Systems of Systems “, IEEE SoSE 2010 [Do et al., 2008] H Do, S. Mirarab, L. Tahvildari, and G. Rothermel. 2008. “An empirical study of the effect of time constraints on the cost-benefits of regression testing”. In Proceedings of the 16th ACM SIGSOFT International Symposium on Foundations of software engineering (SIGSOFT '08/FSE-16). ACM, New York, NY, USA, 71-82. [Do and Rothermel, 2006] H. Do and G. Rothermel. 2006. “An empirical study of regression testing techniques incorporating context and lifetime factors and improved cost-benefit models”. In Proceedings of the 14th ACM SIGSOFT international symposium on Foundations of software engineering (SIGSOFT '06/FSE-14). ACM, New York, NY, USA, 141-151. [Do and Rothermel, 2008] H. Do and G. Rothermel. 2008. “Using sensitivity analysis to create simplified economic models for regression testing”. In Proceedings of the 2008 international symposium on Software testing and analysis (ISSTA '08). ACM, New York, NY, USA, 51-62. 148 [Eaddy et al., 2008] M.Eaddy, T.Zimmermann, K.D.Sherwood, V.Garg, G.C.Murphy, N.Nagappan, A.V.Aho, "Do Crosscutting Concerns Cause Defects?," IEEE Transactions on Software Engineering , vol.34, no.4, pp.497-515, July-Aug. 2008, doi: 10.1109/TSE.2008.36 [Elbaum et al., 2000] S. Elbaum, A. G. Malishevsky, and G. Rothermel, “Prioritizing test cases for regression testing”. ISSTA 2000: 102-112 [Elbaum et al., 2001] S. Elbaum, A. G. Malishevsky, and G. Rothermel. 2001. 
“Incorporating varying test costs and fault severities into test case prioritization”. In Proceedings of the 23rd International Conference on Software Engineering (ICSE '01). IEEE Computer Society, Washington, DC, USA, 329-338. [Elbaum et al., 2002] S. Elbaum, A. G. Malishevsky, and Gregg Rothermel. 2002. “Test Case Prioritization: A Family of Empirical Studies”. IEEE Trans. Softw. Eng. 28, 2 (February 2002), pp. 159-182. [Elbaum et al., 2004] S. Elbaum, G. Rothermel, S. Kanduri, and A. G. Malishevsky. 2004. “Selecting a Cost-Effective Test Case Prioritization Technique”. Software Quality Control 12, 3 (September 2004), 185-210. [Elberzhager et al., 2011] F.Elberzhager, J.Münch, D.Rombach, and B.Freimut. 2011. “Optimizing cost and quality by integrating inspection and test processes”. 149 In Proceedings of the 2011 International Conference on on Software and Systems Process (ICSSP '11). ACM, New York, NY, USA, 3-12. DOI=10.1145/1987875.1987880 [Emam et al., 2001] K.E.Emam, W.Melo, J.C. Machado, “The prediction of faulty classes using object-oriented design metrics”, Journal of Systems and Software, Volume 56, Issue 1, 1 February 2001, Pages 63-75 [Fagan, 1976] M. Fagan, “Design and code inspections to reduce errors in program development”, IBM Sys. J IS(3), 1976, pp. 182-211 [Ferreira et al., 2010] S. Ferreira, R. Valerdi, N. Medvidovic, J. Hess, I. Deonandan, T. Mikaelian, “Gayle Shull Unmanned and Autonomous Systems of Systems Test and Evaluation: Challenges and Opportunities”, IEEE Systems Conference 2010 [Galorath] Galorath Incorporated: http://www.galorath.com/ [Gerrard and Thompson, 2002] P. Gerrard and N. Thompson, “Risk-Based E-Business Testing”, Artech House, 2002. [Hao and Mendes, 2006] J. Hao and E.Mendes. 2006. “Usage-based statistical testing of web applications”. In Proceedings of the 6th international conference on Web engineering (ICWE '06). ACM, New York, NY, USA, 17-24 150 [Huang and Boehm, 2006] L. Huang, and B. Boehm, “How Much Software Quality Investment Is Enough: A Value-Based Approach”. IEEE Software, 2006; 23(5): pp. 88- 95. [ICSM-Sw] Instructional ICSM-Sw Electronic Process Guidelines: http://greenbay.usc.edu/IICSMSw/index.html [Ilene, 2003] B. Ilene, (2003), Practical Software Testing, Springer-Verlag, p. 623, ISBN 0-387-95131-8 [Johnson, 2006] Jim Johnson. My Life Is Failure: 100 Things You Should Know to Be a Better Project Leader, Standish Group International (August 30, 2006) [Jones, 2008] C. Jones,: Applied Software Measurement: Global Analysis of Productivity and Quality, 3rd Edition. McGraw-Hill, (2008) [Kouchakdjian and Fietkiewicz, 2000] A.Kouchakdjian, R.Fietkiewicz, “Improving a product with usage-based testing”, Information and Software Technology, Volume 42, Issue 12, 1 September 2000, Pages 809-814 [Kano] Kano Model: : http://people.ucalgary.ca/~design/engg251/First%20Year%20Files/kano.pdf 151 [Lee and Boehm, 2005] K. Lee, B. Boehm, Empirical Results from an Experiment on Value-Based Review (VBR) Processes, in International Symposium on Empirical Software Engineering. 2005. [Li et al., 2008] J. Li, L. Hou, Z. Qin, Q. Wang, G.Chen, “An Empirically-Based Process to Improve the Practice of Requirement Review”. ICSP 2008: 135-146 [Li, 2009] Q. Li, “Using Additive Multiple-Objective Value Functions for Value-Based Software Testing Prioritization”, University of Southern California, Technical Report (USC-CSSE-2009-516) [Li et al., 2009] Q. Li, M. Li, Y. Yang, Q. Wang, T. Tan, B. Boehm, C. 
Abstract
As two of the most popular defect removal activities, inspection and testing are among the most labor-intensive activities in the software development life cycle, and together they consume between 30% and 50% of total development costs according to many studies. However, most current defect removal strategies treat all instances of software artifacts as equally important, in a value-neutral way.
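To make the contrast with value-neutral ordering concrete, the sketch below ranks candidate artifacts by estimated risk exposure per unit of inspection/testing cost, i.e., (business importance × failure probability) / cost. This is a minimal illustration of the value-based idea only; the field names, weights, and ranking function are assumptions introduced for the example, not the dissertation's exact prioritization algorithm.

```python
# Minimal, illustrative sketch of value-based prioritization of inspection/test targets.
# Assumption: each artifact carries a stakeholder-assessed business importance,
# an estimated failure probability, and an estimated effort cost; these names
# and numbers are hypothetical, not taken from the dissertation.
from dataclasses import dataclass


@dataclass
class Artifact:
    name: str
    business_importance: float  # relative stakeholder value, e.g., on a 1-10 scale
    failure_probability: float  # estimated probability the artifact contains defects, 0-1
    cost: float                 # estimated inspection/testing effort, e.g., person-hours

    @property
    def value_density(self) -> float:
        # Risk exposure (importance * probability) gained per unit of effort spent.
        return (self.business_importance * self.failure_probability) / self.cost


artifacts = [
    Artifact("login feature", business_importance=9, failure_probability=0.4, cost=3),
    Artifact("report export", business_importance=4, failure_probability=0.7, cost=2),
    Artifact("help pages", business_importance=2, failure_probability=0.2, cost=1),
]

# A value-neutral strategy would treat all artifacts the same (e.g., listed order);
# a value-based strategy inspects/tests the highest value density first.
for a in sorted(artifacts, key=lambda a: a.value_density, reverse=True):
    print(f"{a.name}: value density = {a.value_density:.2f}")
```

Under this illustrative ordering, the login feature is addressed before the report export even though the latter has a higher failure probability, because the expected business loss avoided per hour of effort is larger for the login feature.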