DEVELOPMENT AND EVALUATION OF VALUE-BASED REVIEW (VBR) METHODS

by

Keun Lee

A Dissertation Presented to the
FACULTY OF THE GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the Requirements for the Degree
DOCTOR OF PHILOSOPHY (COMPUTER SCIENCE)

May 2006

Copyright 2006 Keun Lee

UMI Number: 3237109. Copyright 2006 by ProQuest Information and Learning Company. All rights reserved. This microform edition is protected against unauthorized copying under Title 17, United States Code. ProQuest Information and Learning Company, 300 North Zeeb Road, P.O. Box 1346, Ann Arbor, MI 48106-1346.

ACKNOWLEDGEMENTS

First of all, I would like to dedicate this dissertation to God. Even though I had been a Christian, I did not know God before coming to USC. God gave me a chance to meet him and let me know how the grace of God had led me this far. I finished this dissertation; however, I know it would have been impossible for me to achieve the Ph.D. without God.

I would also like to thank my family and the Good Shepherds members. My parents have always supported me with encouragement and prayers. I was refreshed whenever I had a great time with the GS members, especially SeungHyun, HaeReem, YoungKyung and SaeEun.

I am also deeply grateful to my Ph.D. advisor, Dr. Barry Boehm. He has not only been a great mentor in my research but also a role model of research attitude. I really appreciate his advice and look forward to continuing our relationship in the future. My gratitude also goes to my committee members, Nenad Medvidovic and Bert Steece. They gave valuable comments on my research so that I could improve my dissertation. Last but not least, I really thank all my friends who prayed for my research, including YuJung, SungA, and SeungWon.

TABLE OF CONTENTS

ACKNOWLEDGEMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT
CHAPTER 1 OBJECTIVE AND APPROACH
  1.1 Motivation
    1.1.1 Software Quality and Cost Effectiveness
    1.1.2 Review Techniques
    1.1.3 Comparison of Value-Based Methods and Value-Neutral Methods
  1.2 Objective
  1.3 Approach
    1.3.1 Review Techniques
    1.3.2 Value Based Review
    1.3.3 Experiment
CHAPTER 2 RELATED RESEARCH ON REVIEW/READING TECHNIQUES
  2.1 Checklist Based Reading (CBR)
  2.2 Defect Based Reading (DBR)
  2.3 Perspective Based Reading (PBR)
  2.4 Functionality Based Reading (FBR)
  2.5 Usage Based Reading (UBR)
CHAPTER 3 VALUE-BASED VERIFICATION & VALIDATION (VBV&V)
  3.1 Concepts in VBV&V
  3.2 Value Based Review
  3.3 Value Based Checklist
  3.4 Example of Artifact Review Process
  3.5 Weight of Review Issues
CHAPTER 4 EXPERIMENT DESCRIPTION
  4.1 CSCI 577A Course: Real-Client Projects with Independent V&V
  4.2 Real-Client Projects in the Experiments
  4.3 Experiment Preparation
    4.3.1 Reviewers and Roles
    4.3.2 Materials
    4.3.3 Data
  4.4 Experiment Planning
    4.4.1 Variables
  4.5 Hypotheses
  4.6 Experiment Design
  4.7 Experiment Operation
  4.8 Experiment Data
CHAPTER 5 EXPERIMENT RESULTS
  5.1 Effort Comparison
  5.2 Comparative Effectiveness
    5.2.1 Overall
    5.2.2 Analysis of OCD Data
    5.2.3 Analysis of SSRD Data
    5.2.4 Analysis of SSAD Data
  5.3 Comparative Cost Effectiveness
    5.3.1 Overall
    5.3.2 Analysis of OCD Data
    5.3.3 Analysis of SSRD Data
    5.3.4 Analysis of SSAD Data
CHAPTER 6 THREATS TO VALIDITY AND LIMITATIONS
  6.1 Nonuniformity of Projects
  6.2 Nonuniformity of Subjects
  6.3 Nonuniformity of Motivation
  6.4 Treatment Leakage
  6.5 Nonrepresentativeness of Subjects
  6.6 Limitation: Individual vs. Group Reviewing
CHAPTER 7 DISCUSSION AND CONCLUSIONS
  7.1 Overall
  7.2 Detailed Analysis of Artifacts: OCD
  7.3 Detailed Analysis of Artifacts: SSRD
  7.4 Detailed Analysis of Artifacts: SSAD
CHAPTER 8 FUTURE RESEARCH DIRECTIONS
  8.1 Update Value Based Checklists for OCD, SSRD and SSAD
  8.2 Value Based Test Experimentation: Count Part of the Value-Based Review
  8.3 Value Based Checklists for Other MBASE Documents: Life Cycle Plan (LCP) and Feasibility Rationale Description (FRD)
  8.4 Address the Value-Based Fixing Process
    8.4.1 Motivation: Value-Based Fixing Process, Large Project Data Analysis
  8.5 Combinations of VBR and PBR for Group-Based Reviews
BIBLIOGRAPHY
APPENDICES

LIST OF TABLES

Table 1-1 Comparative Business Cases: ATG and Pareto Testing
Table 1-2 Comparative Business Cases: Value-neutral review and Value-based review
Table 2-1 Example of checklist
Table 4-1 Differences between the LCO and the LCA
Table 4-2 Assignments in the Experiment
Table 5-1 T-test results and percent Group A less than Group B
Table 5-2 Comparison of Group A and Group B: Average number of Concerns and Problems
Table 5-3 Comparison of Group A and Group B: Mean Impact based on Concerns and Problems
Table 5-4 Comparison of Group A and Group B: Mean Number based on Concerns and Problems of OCD
Table 5-5 Comparison of Group A and Group B: Mean Impact based on Concerns and Problems of OCD
Table 5-6 Comparison of Group A and Group B: Mean Number based on Concerns and Problems of SSRD
Table 5-7 Comparison of Group A and Group B: Mean Impact based on Concerns and Problems of SSRD
Table 5-8 Comparison of Group A and Group B
Table 5-9 Comparison of Group A and Group B: Mean Impact based on Concerns and Problems of SSAD
Table 5-10 Comparison of Group A and Group B: Mean number of concerns and problems per hour
Table 5-11 Comparison of Group A and Group B: Mean Cost Effectiveness of concerns and problems per hour
Table 5-12 Comparison of Group A and Group B: Mean Number per hour based on Concerns and Problems of OCD
Table 5-13 Comparison of Group A and Group B: Mean Cost Effectiveness based on Concerns and Problems of OCD
Table 5-14 Comparison of Group A and Group B: Mean Number per hour based on Concerns and Problems of SSRD
Table 5-15 Comparison of Group A and Group B: Mean Cost Effectiveness based on Concerns and Problems of SSRD
Table 5-16 Comparison of Group A and Group B: Mean Number per hour based on Concerns and Problems of SSAD
Table 5-17 Comparison of Group A and Group B: Mean Cost Effectiveness based on Concerns and Problems of SSAD
Table 7-1 The P-values from the T-tests
Table 7-2 The P-values from the T-tests of OCD
Table 7-3 The P-values from the T-tests of SSRD
Table 7-4 The P-values from the T-tests of SSAD
Table 8-1 Open_Days_IDFix on defects (Accumulated number of defects)
Table 8-2 Open_Days_IDFix on the defect by priority

LIST OF FIGURES

Figure 1-1 Pareto 80-20 distribution of test value
Figure 1-2 ROI: Value-Neutral vs. Pareto Analysis
Figure 1-3 Value-neutral review vs. value-based review (Pareto Analysis)
Figure 1-4 Result chain of VBR
Figure 1-5 Research Approach
Figure 2-1 General idea of DBR
Figure 2-2 General idea of PBR
Figure 2-3 The process of developing scenarios for PBR
Figure 2-4 UBR processes
Figure 3-1 VBR processes
Figure 3-2 The general Value-Based checklist
Figure 3-3 Example of artifact-oriented checklists
Figure 3-4 OCD section 4.3 System Capabilities
Figure 3-5 Effectiveness metric: issue metrics and optimality guidelines
Figure 4-1 Experiment Process
Figure 4-2 Independent Verification and Validation in CSCI 577A
Figure 4-3 The Process to collect Data
Figure 4-4 AAR form (Concern Log)
Figure 4-5 Example of collected data in AAR form (Concern Log)
Figure 4-6 AAR form (Problem List)
Figure 4-7 Example of AAR form (Problem List)
Figure 5-1 Number of Concerns and Problems by IV&Vers
Figure 5-2 Impact of Concerns and Problems by IV&Vers
Figure 5-3 Effort Comparison
Figure 5-4 Number of concerns and problems found by IV&Vers in both groups
Figure 5-5 Impacts of Concerns and Problems in both groups
Figure 5-6 Average number of issues of OCD
Figure 5-7 Average impact of issues of OCD
Figure 5-8 Accumulated Average number of OCD issues
Figure 5-9 Accumulated Average impact of OCD issues
Figure 5-10 Average number of issues of SSRD
Figure 5-11 Average impact of issues of SSRD
Figure 5-12 Accumulated Average number of SSRD issues
Figure 5-13 Accumulated Average impact of SSRD issues
Figure 5-14 Average number of issues of SSAD
Figure 5-15 Average impact of issues of SSAD
Figure 5-16 Average number of SSAD issues
Figure 5-17 Accumulated Average impact of SSAD issues
Figure 5-18 Number of concerns and problems found by IV&Vers per hour in both groups
Figure 5-19 Cost Effectiveness of concerns and problems per hour in both groups
Figure 5-20 Average number of issues per hour of OCD
Figure 5-21 Average impacts of issues per hour (Cost Effectiveness) of OCD
Figure 5-22 Average number of issues per hour of SSRD
Figure 5-23 Average impacts of issues per hour (Cost Effectiveness) of SSRD
Figure 5-24 Average number of issues per hour of SSAD
Figure 5-25 Average impacts of issues per hour (Cost Effectiveness) of SSAD
Figure 8-1 Definitions of Open_Days
Figure 8-2 The percentage of accumulated number of defects by their severity

ABSTRACT

Reviewing is a key activity that can find defects at an early stage of system and software development. Since it is often cheaper to fix defects at an early stage, reviewing is a good technique for improving both product quality and project cost effectiveness. Many review techniques have been proposed, and many experiments have been performed to compare them. However, to date there have been no review techniques or experiments that have focused explicitly on the relative business value or mission value of the artifacts being reviewed.
In this dissertation, I provide value-based review (VBR) techniques that add cost effectiveness and the value of each issue to the review process, and report on an experiment on value-based review. I developed a set of VBR checklists with issues ranked by success-criticality, and a set of VBR processes prioritized by issue criticality and stakeholder-negotiated product capability priorities. The experiment involved 28 independent verification and validation (IV&V) subjects (full-time working professionals taking a distance learning course) reviewing specifications produced by 18 real-client, full-time student e-services projects. The IV&V subjects were randomly assigned to use either the VBR approach or the previous value-neutral checklist-based reading (CBR) approach that had been used in the course. The difference between groups was not statistically significant for the number of issues reported, but was statistically significant for the number of issues per review hour, total issue impact in terms of criticality and priority, and cost effectiveness in terms of total issue impact per review hour. For the latter, the VBRs were roughly twice as cost-effective as the CBRs.

The dissertation also covers threats to validity and limitations of the experiment. Threats to validity were present but appear to have been adequately addressed. The main limitation of the experiment was its coverage of reviews by individuals as compared to groups. For reviews by groups, it is likely that combinations of VBR and risk-driven forms of perspective-based review (PBR) approaches would be most cost-effective. This and other topics are attractive candidates for further research.

CHAPTER 1 OBJECTIVE AND APPROACH

1.1 Motivation

1.1.1 Software Quality and Cost Effectiveness

Increasing software quality is a common objective for software engineers; however, the goal is not easy to achieve, and there are still many research efforts addressing how best to decrease defects and increase quality in software. There are three main strategies for decreasing defects in software: preventing defects, detecting defects and fixing them, and reducing the impacts of defects. Automated analysis, human reviewing, and execution testing are the main methods used to detect and fix defects. Each method has its own characteristics, and based on the project situation, they can be used selectively or together.

Another issue for software engineers is cost effectiveness. There are four factors that always have to be considered in developing software: functionality, cost, quality and schedule [BIFF01]. Cost effectiveness is a primary concern of organizations in finding the best combination of the four factors. Extensive research has been done on techniques to assess and reduce cost, but less has been done on techniques to assess and increase the benefits or cost effectiveness. However, when business cases or stakeholder negotiations of software capabilities have been performed, information is available to assess benefits and cost effectiveness.

1.1.2 Review Techniques

Reviewing is one of the main processes for finding defects from the initial development stage and increasing the quality of the deliverables [BOEH01, FAGA76, GILB93]. Peer reviews have been used in requirements analysis, architecture, design and coding. Many research efforts have focused on formulating effective review processes to find defects [SHUL02, AURU02].
Studies have addressed review team composition and procedures, review preparation and duration, and criteria for focusing reviewers on sources of defects. Initial approaches for focusing reviewers involved adding checklists to the review process [FAGA76]. Checklist-based reviewing (CBR) is easy to understand, and it has become the most common review-focusing technique in current practice. Another approach to reviewing artifacts is perspective-based reading (PBR), which focuses on different reviewers' perspectives, such as the designer perspective and the tester perspective [BASI96]. Different review perspectives help to find more defects with fewer overlaps. Another reading technique is defect-based reading (DBR), which focuses on different defect classes [PORT95]. Other review techniques proposed are functionality-based reading (FBR) [ABDE04] and usage-based reading (UBR) [CONR03, THEL03A].

A number of studies have compared the effectiveness of these techniques [SHUL02, AURU02, ABDE04, DENG04, BERL04]. They agree in finding that focused review methods do better than unfocused reviews; that a method's cost effectiveness varies with the nature of the artifacts being reviewed; and, generally, that the preferred method to use is situation-dependent. However, with the exception of some uses of PBR and UBR, the cost-effectiveness metrics used for the methods and their evaluation have been value-neutral, in that each defect is considered to be equally important. This causes much review effort to be spent on trivial issues such as obvious typos and grammatical errors.

1.1.3 Comparison of Value-Based Methods and Value-Neutral Methods

Chapter 1 of the book "Value-Based Software Engineering" [ARUE05] gives an example comparing the use of value-based and value-neutral methods for defect detection. The example describes a $2 million software project to develop a large customer billing system. The proposal of an automated test generation (ATG) tool from a vendor claims that:

1. ATG will cut the test cost in half.
2. Typical test cost is 50% of total development cost (or $1M).
3. The cost of using the ATG is 30% of typical test cost (or $300K).
4. The benefit of using the test tool is $500K - $300K = $200K.

The experience paper [PERS04] shows that ATG might not reduce the cost as much as the vendor proposed. The reasons for the failure are unrepresentative test coverage, too much output data, lack of management commitment, and lack of preparation and experience. An additional serious concern is that ATG tools, along with many other software engineering methods, processes, and tools, are value-neutral. However, in many practical applications, the relative value of requirements, test cases, and defects has a Pareto distribution: 80% of the mission value comes from 20% of the software components. Figure 1-1 shows the Pareto curve from the experience report (Bullock, 2000). In this application, each customer billing type tested improved initial billing revenues from 75% to 90% and reduced customer complaint rates. However, just one of the 15 customer types accounted for 50% of the business revenue, and the rest roughly followed a Pareto 80-20 distribution. The straight line in Figure 1-1 is the result of value-neutral ATG-driven testing, in which each test is equally likely to affect business value.
[Figure 1-1 Pareto 80-20 distribution of test value: cumulative business value (%) versus customer billing type; the straight line shows ATG testing with all tests of equal value, the curve shows the actual business value.]

Table 1-1 Comparative Business Cases: ATG and Pareto Testing

               ATG Testing                              Pareto Testing
% of Tests   Cost(K)  Value(K)  Net Value(K)  ROI     Cost(K)  Value(K)  Net Value(K)  ROI
0            1300     0         -1300         -1      1000     0         -1000         -1
10           1350     400       -950          -0.7    1100     2560      1460          1.33
20           1400     800       -600          -0.43   1200     3200      2000          1.67
40           1500     1600      100           0.07    1400     3840      2440          1.74
60           1600     2400      800           0.5     1600     3968      2368          1.48
80           1700     3200      1500          0.88    1800     3994      2194          1.21
100          1800     4000      2200          1.22    2000     4000      2000          1

Table 1-1 shows the relative levels of investment costs, business benefits, and return on investment, ROI = (benefits - costs)/costs, for the value-neutral ATG testing and value-based Pareto testing strategies. Figure 1-2 compares the ROIs.

[Figure 1-2 ROI: Value-Neutral vs. Pareto Analysis; ROI versus percentage of tests run for the ATG and Pareto testing strategies.]

The assumptions of the analysis are as follows:
- $1M of the development costs has been invested in the customer billing system by the beginning of testing.
- The ATG tool will cost $300K and will reduce test cost by 50% as promised.
- The business case for the system will produce $4M in business value in return for the $2M investment cost.
- The business case will provide a similar 80:20 distribution for the remaining 14 customer types.

Table 1-1 and Figure 1-2 show the comparative results of using the ATG tool and Pareto testing. As seen in Table 1-1 and Figure 1-2, the highest ROI is at the 40% tested point for Pareto testing, whereas it is at the 100% tested point for ATG. If testing focuses on the higher-value tests first, the highest ROI is much better and is achieved at an earlier stage.

This example compares the value-based method and the value-neutral method. Even though it is an example for a testing tool, testing is another primary process for increasing software quality, and if the results of the example are applied to a reading/review technique, similar results are produced. I will now compare a value-neutral review technique and a value-based review technique. The assumptions are as follows:
- $1M of the development costs has been invested in the customer billing system by the beginning of reviewing.
- The value-neutral and the value-based review techniques will cost $200K each.
- The business case for the system will produce $4M in business value in return for the $1.2M investment cost.
- The business case will provide a similar 80:20 distribution for the value-based review.

Other assumptions are the same as for the testing comparison. Table 1-2 and Figure 1-3 show the ROIs for value-neutral and value-based reviews.

Table 1-2 Comparative Business Cases: Value-neutral review and Value-based review

               Value-neutral review                     Value-based review
% reviewed   Cost(K)  Value(K)  Net Value(K)  ROI     Cost(K)  Value(K)  Net Value(K)  ROI
0            1000     0         -1000         -1      1000     0         -1000         -1
20           1040     800       -240          -0.23   1040     3200      2160          2.08
40           1080     1600      520           0.48    1080     3840      2760          2.56
60           1120     2400      1280          1.14    1120     3968      2848          2.54
80           1160     3200      2040          1.76    1160     3994      2834          2.44
100          1200     4000      2800          2.33    1200     4000      2800          2.33

Table 1-2 and Figure 1-3 show the comparative results of using a value-neutral review and a value-based review. For the value-based review, the highest ROI, 2.56, is achieved between the 40% and 60% reviewed points, whereas the highest ROI for the value-neutral review, 2.33, is achieved at the 100% reviewed point. For the value-based review, beyond the 60% reviewed point, $80K of review investment produces only $52K in business value. This result shows that a value-based review can be worthwhile at an early stage of the review process.

[Figure 1-3 Value-neutral review vs. value-based review (Pareto Analysis); ROI versus percentage of the artifacts reviewed for the value-neutral and value-based review strategies.]
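The arithmetic behind Table 1-2 is simple enough to reproduce directly. The Python sketch below recomputes the value-neutral vs. value-based review business case from the stated assumptions ($1M sunk development cost, a $200K review budget spent in proportion to the fraction reviewed, and $4M total business value). The value-capture fractions are back-calculated from the Value column of Table 1-2; the function and variable names are illustrative, not taken from the dissertation.

```python
# Sketch of the Table 1-2 business case: value-neutral vs. value-based review ROI.
SUNK_COST_K = 1000       # development cost already invested ($K)
REVIEW_BUDGET_K = 200    # total review cost ($K), spent in proportion to % reviewed
TOTAL_VALUE_K = 4000     # total business value of the system ($K)

# Fraction of total business value captured after reviewing x% of the artifacts
# (derived from Table 1-2 by dividing its Value column by $4M).
value_neutral_fraction = {0: 0.0, 20: 0.2, 40: 0.4, 60: 0.6, 80: 0.8, 100: 1.0}
value_based_fraction = {0: 0.0, 20: 0.8, 40: 0.96, 60: 0.992, 80: 0.9985, 100: 1.0}

def roi(percent_reviewed, value_fraction):
    """Return (cost, value, net value, ROI) in $K for a given review coverage."""
    cost = SUNK_COST_K + REVIEW_BUDGET_K * percent_reviewed / 100
    value = TOTAL_VALUE_K * value_fraction[percent_reviewed]
    return cost, value, value - cost, (value - cost) / cost

for pct in (0, 20, 40, 60, 80, 100):
    vn = roi(pct, value_neutral_fraction)
    vb = roi(pct, value_based_fraction)
    print(f"{pct:3d}%  value-neutral ROI {vn[3]:5.2f}   value-based ROI {vb[3]:5.2f}")
```

Running the loop reproduces the ROI columns of Table 1-2 (for example, 2.08 versus -0.23 at the 20% reviewed point) and makes the early crossover of the value-based strategy easy to inspect.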
1.2 Objective

The primary objective of the research was to develop a value-based review technique and evaluate its relative cost effectiveness with respect to current value-neutral review techniques. An experiment was performed to evaluate the VBR against the value-neutral checklist-based review (CBR) technique that had been used previously.

[Figure 1-4 Result chain of VBR: developing the VBR process and VBR checklists leads to more effort spent on higher-value issues and a better understanding of the higher-criticality defects, which together increase cost effectiveness.]

Figure 1-4 shows the result chain of VBR. The desired outcome is increased cost effectiveness of the VBR reading technique. The contribution of the initiative to develop a VBR process is to focus effort on higher-value issues. The contribution of investing in value-based review checklists is to increase understanding of which defects are most critical to identify. In order to validate that investing in these initiatives would indeed lead to the desired outcome, an experiment was designed to evaluate VBR by comparing it with the CBR guidelines that had been used in previous years.

1.3 Approach

[Figure 1-5 Research Approach: related research (checklist, defect, perspective, functionality, and usage based reading) feeds the design of the Value-Based Review (VBR concepts, VBR process, VBR checklist, effectiveness metrics), which is then evaluated in the experiment (experiment design, results, discussion, and conclusion).]

The next few sections describe the research approach and results. In section 2, several common and popular review/reading techniques are surveyed, including their reading processes, characteristics, strengths, pitfalls, and costs. In section 3, based on the survey, a VBR process and associated VBR checklists are developed; new concepts, main processes, and critical success factors are considered in this step. In section 4, the experiment that evaluates the VBR is described, and aspects of the results are discussed in sections 5-7. A summary of the research approach is shown in Figure 1-5.

1.3.1 Review Techniques

To improve upon ad hoc reviewing, several review/reading techniques have been developed. Checklist-based reading (CBR) is a common reading technique that is used in many industrial fields. Perspective-based reading (PBR) is organized around multiple review perspectives; several empirical reports comparing CBR and PBR have been published. Some particular perspectives include defect-based reading (DBR), functionality-based reading (FBR), and usage-based reading (UBR). These review/reading techniques are compared with each other and with value-based reviews in section 2.

1.3.2 Value Based Review

Based on the research on review/reading techniques, a value-based review method is developed. It includes a basic VBR process and a set of value-based checklists. Its general concepts include priority and criticality, which are the basis used to calculate the value of artifacts.
There are two kinds of checklists for VBR: the general value-based checklist covers general issues, and the artifact-oriented checklists are prepared for reviewing particular artifacts.

1.3.3 Experiment

The experiment was designed to compare the use of VBR, with its value-based procedures and checklists, against a value-neutral CBR approach. The two approaches were performed by 28 randomly-selected professional software engineers taking a graduate software engineering project course by distance learning at USC. Their project assignment was to independently verify and validate (IV&V) the artifacts from one of 18 real-client e-services project applications that were being developed by the on-campus teams in the course [BOEH98].

CHAPTER 2 RELATED RESEARCH ON REVIEW/READING TECHNIQUES

2.1 Checklist Based Reading (CBR)

Checklist-based reading (CBR) is a reading technique that uses checklists to read artifacts. The checklist is provided as part of the evaluation process for Quality Assurance and helps ensure that the appropriate items have been covered for the software elements. Usually, the checklist is applied to requirements, design, and code artifacts. Questions in a checklist address completeness, consistency, feasibility and testability issues. A checklist tells reviewers what to look for, but it does not tell them how to read the artifacts. Table 2-1 shows an example of a checklist used as part of a value-neutral CBR in the experiment.

Table 2-1 Example of checklist

Architecture Design Checklist - Completeness
Questions:
- Have all TBDs been resolved in requirements and specifications?
- Are all of the assumptions, constraints, decisions and dependencies for this design documented?
- Is the logic correct and complete?
- Is the conceptual view of all composite data elements, parameters, and objects documented?
- Is there any data structure needed that has not been defined, and vice versa?

The strength of CBR is that it is easy to understand, so additional effort to train reviewers in CBR is not necessary. In addition, it is considered to be superior to an ad hoc reading technique. Today, CBR is a common reading technique in industrial fields, so there are many experiments comparing other reading techniques with CBR. The weakness of CBR is that it encourages a passive reading attitude. Generally, there are no formal steps or procedures to follow: reviewers read the artifacts following the questions in the checklists, and when a reviewer finds an issue that does not pass a checklist question, (s)he reports it. Reviewers therefore do not need to read the artifacts actively to find issues.
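Because CBR treats every checklist question alike, it can be pictured as a flat loop over questions with no ordering or weighting. The sketch below is a minimal illustration of that value-neutral style, not code from the dissertation; the checklist text is taken from Table 2-1, while the `Issue` record and `cbr_review` function are hypothetical.

```python
# Minimal sketch of value-neutral checklist-based reading (CBR):
# every question is treated as equally important and simply walked in order.
from dataclasses import dataclass

@dataclass
class Issue:
    artifact_section: str
    question: str
    note: str

COMPLETENESS_CHECKLIST = [
    "Have all TBDs been resolved in requirements and specifications?",
    "Are all assumptions, constraints, decisions and dependencies documented?",
    "Is the logic correct and complete?",
    "Is the conceptual view of all composite data elements, parameters, and objects documented?",
    "Is there any data structure needed that has not been defined, and vice versa?",
]

def cbr_review(section_name: str, reviewer_findings: dict) -> list:
    """Walk the checklist top to bottom; report an issue wherever the reviewer noted one."""
    issues = []
    for question in COMPLETENESS_CHECKLIST:
        note = reviewer_findings.get(question)
        if note:  # no ordering and no weighting of questions
            issues.append(Issue(section_name, question, note))
    return issues
```

Contrast this with the VBR process in Chapter 3, where each question carries a criticality value and higher-value items are reviewed first.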
2.2 Defect Based Reading (DBR)

The primary objective of defect-based reading (DBR) is to develop a strategy for identifying defects in requirements documents. DBR focuses on defect classification to find defects in artifacts, and the scenario is a key factor in DBR. The DBR scenario is derived from the defect classification: based on each defect class, a set of questions is derived from that class. The questions characterize the defect class and also guide the reviewers in reading the artifacts. The difference between a scenario and a checklist is that a scenario provides more specific information and guidelines on how to read the artifacts. Figure 2-1 shows the basic idea of DBR.

[Figure 2-1 General idea of DBR: defect classes are turned into questions and a reading model, yielding scenarios that guide defect-based reading, in contrast to a plain checklist.]

There have been empirical reports comparing DBR with the other two reading techniques, CBR and ad hoc reading [FUSA97, MILL98, PORT95, PORT98, SAND98]. Some reports concluded that DBR performs better than the CBR and ad hoc reading techniques [PORT95, PORT98]. Some results from the study by Porter et al. [PORT95] include:

1. The defect detection rate for DBR is higher than for the other two reading techniques.
2. Scenarios lead reviewers to focus specifically on defect classes.
3. On average, collection meetings contributed nothing to defect detection effectiveness.

Other experiments could not find statistical evidence that DBR is better at finding issues than the others [FUSA97, SAND98]. However, it is clear that the CBR and ad hoc reading techniques give no description of how the document should be read, while DBR is more structured, distinct, and less overlapping than these other two techniques.

2.3 Perspective Based Reading (PBR)

Even when many observers look at the same object, each observer's point of view is different: their interests differ, and so does the way they look at it. These different interests and attitudes lead them to focus on different characteristics of the object. This idea is the basis of perspective-based reading (PBR). The assumption of PBR is that each reviewer looks at the artifacts from a particular point of view. There are several roles in reviewing, such as tester, designer, and user. Different roles cause different points of view and lead to a focus on different concerns in the artifacts. A scenario is provided for each role, and the reviewer follows the scenario corresponding to his or her role. The scenario includes a set of questions and activities that guide the reviewers on what to perform and how to read the artifacts. While following the scenario, the reviewer reads the artifacts and finds issues. There are many empirical reports that PBR finds more issues than CBR, with fewer overlaps. The general idea of PBR is shown in Figure 2-2.

[Figure 2-2 General idea of PBR: defect classes and roles are developed into questions and activities, yielding role-specific scenarios that guide perspective-based reading.]

PBR has the following characteristics:
- Systematic: the reader knows how he should read the document.
- Specific: the reader is only responsible for his role and for the defects that can be detected from his particular point of view.
- Distinct: each reader has his own role, so broader coverage across readers is assumed as far as defects from different defect classes are concerned.

The key success factor of PBR is the scenario. More practical and precise scenarios lead to more efficient PBR, so the process used to build scenarios is important. The process for developing a scenario is shown in Figure 2-3. First, the main activities of each role are abstracted. Next, based on this abstraction, practical activities for the roles are identified. Last, detailed questions and guidelines for each activity are defined.

[Figure 2-3 The process of developing scenarios for PBR: abstract the responsibility of each role; identify and describe the expected activities for each role; provide questions for each activity based on different defect classes.]
Dr. Basili, the developer of PBR, said in his paper [BASI96]: "PBR teams were seen to have an improved (statistically significant at the 0.05-level) coverage of defects for both the generic and NASA documents. It was also shown that individuals using PBR performed better on the generic documents than non-PBR reviewers. There was, however, no significant effect on individual detection rates due to reading technique for the generic documents."

There are some other empirical studies comparing PBR with CBR and ad hoc reading [CIOL97, SORU97, SHUL98, REGN00, LANU00, BIFF01, HALL01]. Some studies found, with statistical significance, that PBR performs better at finding issues than the other reading techniques [CIOL97, SHUL98]; other studies could not confirm this [SORU97, REGN00, LANU00, BIFF01, HALL01].

DBR is usually compared to PBR because of its similar characteristics. DBR reads artifacts based on defect classes, whereas PBR is based on different perspectives on the artifacts; otherwise, the reading processes are almost the same.

2.4 Functionality Based Reading (FBR)

Functionality-based reading (FBR) has a different starting point than the other reading techniques. FBR was developed for reading frameworks. A framework is an artifact with a particular objective; for example, object-oriented frameworks are artifacts for reuse and extensibility. Many frameworks have now been developed for many other domains. There are empirical reports comparing reading techniques for frameworks: three reading techniques, CBR, use case-driven reading (UCDR), and systematic order-based reading (SOBR), are compared in the experiment reported in [ABDE04]. The results show that there is no single superior reading technique, and the best solution is to use the three reading techniques together based on the practical situation.

FBR is proposed to trace framework requirements in order to produce a well-constructed framework and to review the code. Functionality types and rules are defined to extract the functionalities from the framework. Functionality types are provided as input to FBR by the top-down phase of "framework understanding." Functionality rules describe the objective, the description of the functionality, the location of the functionality, and correlated functionalities. The FBR process is first to understand the framework based on the functionality types, and then to review the functionality rules to find defects. An experiment was performed to compare FBR with two other OO reading techniques, CBR and SOBR; the results show that FBR appears to be significantly more effective and productive than CBR and SOBR. The strength of FBR is that it is very applicable to function-based artifacts: it gives formal steps for understanding functions and key points to review in an artifact. However, due to these characteristics, it is of limited applicability to other artifact types, such as object-oriented artifacts.
2.5 Usage Based Reading (UBR)

The idea behind usage-based reading (UBR) is to spend reading effort on finding the most critical usage-based defects in the reviewed artifacts. Defect criticality in UBR is prioritized in terms of the users' perception of system quality: potential users identify their perception of system quality based on prioritized use cases. Other procedures such as resource scheduling, meetings, and follow-up are also important steps related to UBR. UBR uses the use cases to focus review effort. In addition, use cases can give reviewers a practical review plan, just as test cases help a tester plan testing.

There are three steps in UBR. The first step is prioritizing the use cases; this step is called "Before Inspection." One of the key ideas of UBR is to prioritize use cases as a vehicle for the review [THEL03A]: based on potential users' opinions, the use cases are prioritized to capture the users' perception of system quality. The next step is to understand the design artifacts. Since UBR is performed based on use cases, understanding the design artifacts is important; in this step the reviewers use the use cases to guide their reading, with the requirements documents as references to verify the use cases. The last step is inspection, which is inspecting the design artifact; it consists of four smaller steps. Figure 2-4 shows all the steps of UBR.

[Figure 2-4 UBR processes: prioritize the use cases; understand the design artifacts; then inspect the design artifacts by repeatedly selecting the highest-priority use case, tracing and executing it, and reviewing the artifacts against the exit criteria, until all use cases are inspected or time is unavailable.]

UBR has some characteristics in common with PBR and some that differ. The common point is the use of the user perspective. While PBR uses several perspectives, such as tester, designer, and user, UBR uses only the user perspective and prioritized use cases. Another difference is the applicable artifacts: PBR is applicable to all artifacts if scenarios for the artifacts are developed, whereas the UBR scenario is developed based on use cases, and that scenario can then be used for requirements artifacts and code inspection in that project. There have been experiments comparing CBR and UBR, and the conclusion is that there are differences between the two reading techniques: UBR performed better than CBR. Based on these experiments, reviewers applying usage-based reading are more efficient and effective in detecting the most critical faults from a user's point of view than reviewers using checklist-based reading [THEL03B, THEL04].
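The inspection step of UBR in Figure 2-4 is essentially a priority-driven loop over use cases. The following Python sketch illustrates that loop under stated assumptions; the `UseCase` structure, the time budget, and the `review_against` callback are hypothetical conveniences for the illustration, not notation from the UBR literature.

```python
# Illustrative sketch of the UBR inspection loop from Figure 2-4 (assumed data model).
from dataclasses import dataclass

@dataclass
class UseCase:
    name: str
    user_priority: int   # higher = more important to the users' perception of quality
    inspected: bool = False

def ubr_inspect(use_cases, design_artifact, review_against, time_budget_hours):
    """Inspect the design artifact use case by use case, highest user priority first,
    until every use case is inspected or the review time budget runs out."""
    findings = []
    ordered = sorted(use_cases, key=lambda uc: uc.user_priority, reverse=True)
    hours_left = time_budget_hours
    for use_case in ordered:
        if hours_left <= 0:
            break  # time is unavailable: stop early, lowest-priority use cases uninspected
        # Trace and execute the use case against the design, then review the artifact.
        issues, hours_spent = review_against(use_case, design_artifact)
        findings.extend(issues)
        use_case.inspected = True
        hours_left -= hours_spent
    return findings
```

The loop captures the defining property of UBR discussed above: review effort is spent first on the use cases that matter most to users, so the most critical usage-based defects are found earliest.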
The numerical values of priority and criticality are used to guide V&V activities and evaluate their effectiveness. 3.2 Value Based Review With the aid of other MBASE experts, I have developed an experimental set of value-based checklists for reviewing general and MBASE-specific specifications. These are used to test the hypothesis that review activities will be more cost- effective if review effort is focused on the higher-priority system capabilities and the higher-criticality sources of implementation risk [LEE05]. The USC Center for Software Engineering (CSE) spiral approach to systems and software engineering used in MBASE emphasizes risk-driven activities, but CSE previous review guidelines made no distinctions in focusing limited reviewing resources on high vs. low priority and criticality review artifacts. The revised approach to overcome the weaknesses in the previous approach includes the priority of system capabilities and criticality of sources of risk. Low- priority, low-criticality items are optional to review based on time availability. The basic process for the Value-based review can be divided into four steps. 1. First the reviewers need to determine which system capability will be reviewed first. A negotiation among stakeholders defines the priority of each system capability. 2. The high-priority system capabilities will be reviewed first. If there are three levels of priorities like high, medium, and low, the system capabilities with high priorities will be reviewed first. 3. At each priority level, high-criticality sources of risk will be reviewed first, basically going down the columns in Figure 3-1 below. 4. After reviewing the issues with higher criticality, the reviewers can address the next lower criticality issues, as time is available. As shown in Figure 3-1, it is optional to go all the way down the columns for the lower-priority artifacts. Negotiation Meetings Developers Customers Users Other stakeholders Priorities of system capabilities Artifact-oriented checklist Criticalities of issues General Value- based checklist Domain Expert Priority High Medi um Low Critic ality High Medi um Low 1 2 3 4 5 optio nal 6 optio nal optio nal Reviewing Artifacts Number indicates the usual ordering of review* * May be more cost-effective to review highly-coupled mixed-priority artifacts. 27 Figure 3-1 VBR processes 28 Figure 3-1 presents general processes of VBR. The priorities of system capabilities are determined through stakeholders’ negotiations and meetings. Domain experts decide the values of criticality of issues. The combination of two values of priority and criticality determines an order of review. The numbers in the table (right side) in the figure 3-1 shows orders in a review. Higher valued issues are reviewed first, lower one is reviewed later, lower one is optional if review time is not available. 3.3 Value Based Checklist There are two checklists for VBR. The first one is a general value-based checklist shown in Figure 3-2 and the other is Artifact-oriented checklists. The main difference of Value Based checklists to normal checklists is that Value Based checklists include priority and criticality values in each question or statements in the checklists. General Value-based checklist The general checklist covers general issues for reviewing any set of artifacts for completeness, consistency, feasibility, ambiguity, conformance, and risk. Each category has three levels of issue criticality: high, medium, and low. 
3.3 Value Based Checklist

There are two kinds of checklists for VBR. The first is the general value-based checklist shown in Figure 3-2; the other is the set of artifact-oriented checklists. The main difference between value-based checklists and normal checklists is that value-based checklists include priority and criticality values for each question or statement in the checklist.

General Value-based checklist

The general checklist covers general issues for reviewing any set of artifacts for completeness, consistency, feasibility, ambiguity, conformance, and risk. Each category has three levels of issue criticality: high, medium, and low.

Figure 3-2 The general Value-Based checklist

Completeness
  High-Criticality Issues:
  - Critical missing elements: backup/recovery, external interfaces, success-critical stakeholders; critical exception handling, missing priorities
  - Critical missing processes and tools; planning and preparation for major downstream tasks (development, integration, test, transition)
  - Critical missing project assumptions (client responsiveness, COTS adequacy, needed resources)
  Medium-Criticality Issues:
  - Medium-criticality missing elements, processes and tools: maintenance and diagnostic support; user help
  - Medium-criticality exceptions and off-nominal conditions; smaller tasks (review, client demos), missing desired growth capabilities, workload characterization
  Low-Criticality Issues:
  - Easily-deferrable, low-impact missing elements: straightforward error messages, help messages, GUI details doable via GUI builder, project task sequence details

Consistency/Feasibility
  High-Criticality Issues:
  - Critical elements in OCD, SSRD, SSAD, LCP not traceable to each other
  - Critical inter-artifact inconsistencies: priorities, assumptions, input/output, preconditions/post-conditions
  - Missing evidence of critical consistency/feasibility assurance in FRD
  Medium-Criticality Issues:
  - Medium-criticality shortfalls in traceability, inter-artifact inconsistencies, evidence of consistency/feasibility in FRD
  Low-Criticality Issues:
  - Easily-deferrable, low-impact inconsistencies or inexplicit traceability: GUI details, report details, error messages, help messages, grammatical errors

Ambiguity
  High-Criticality Issues:
  - Vaguely defined critical dependability capabilities: fault tolerance, graceful degradation, interoperability, safety, security, survivability
  - Critical misleading ambiguities: stakeholder intent, acceptance criteria, critical user decision support, terminology
  Medium-Criticality Issues:
  - Vaguely defined medium-criticality capabilities, test criteria
  - Medium-criticality misleading ambiguities
  Low-Criticality Issues:
  - Non-misleading, easily deferrable, low-impact ambiguities: GUI details, report details, error messages, help messages, grammatical errors

Conformance
  High-Criticality Issues:
  - Lack of conformance with critical operational standards, external interfaces
  - Misleading lack of conformance with document formatting standards, method and tool conventions
  Medium-Criticality Issues:
  - Lack of conformance with medium-criticality operational standards, external interfaces
  Low-Criticality Issues:
  - Non-misleading lack of conformance with document formatting standards, method and tool conventions, optional or low-impact operational standards

Risk
  High-Criticality Issues:
  - Critical risks in top-10 risk checklist: personnel, budgets and schedules, requirements, COTS, architecture, technology
  - Missing FRD evidence of critical capability feasibility: high-priority features, levels of service, budgets and schedules
  Medium-Criticality Issues:
  - Missing FRD evidence of mitigation strategies for low-probability high-impact or high-probability, low-impact risks: unlikely disasters, off-line service delays, missing but easily-available information
  Low-Criticality Issues:
  - Missing FRD evidence of mitigation strategies for low-probability, low-impact risks

Artifact-oriented checklists

The set of value-based checklists also includes artifact-oriented checklists. Since the experiment was performed using Model-Based (System) Architecting and Software Engineering (MBASE), the MBASE-oriented checklists are provided to the reviewers for reviewing the various sections of the following MBASE artifacts: the Operational Concept Description (OCD), Requirements Description (SSRD), and Architecture Description (SSAD) specifications.

Figure 3-3 Example of artifact-oriented checklists (OCD 4.3 System Capability)

  Criticality   Question
  3             Are the system capabilities consistent with the system services provided as described in OCD 2.3?
  3             Are there critical missing capabilities needed to perform the system services?
  3             Are capabilities prioritized as High, Medium, or Low?
  3             Are capability priorities consistent with current system shortcoming priorities (OCD 3.3.5)?
  3             Are capabilities traced back to corresponding project goals and constraints (OCD 4.2)?
  2             Are simple lower-priority capabilities (e.g., login) described in less detail?
  2             Are there no levels of service goals (OCD 4.4) included as system capabilities?
3.4 Example of Artifact Review Process
Below is part of an adapted OCD artifact from a 2003-04 CS 577 project. Two main capabilities are described in OCD section 4.3; their priorities were determined through the project's WinWin negotiation and the MBASE guidelines. The example shows how to review an artifact using the Value-based checklists: the general Value-Based checklist in Figure 3-2 and the artifact-oriented checklist in Figure 3-3.

<Start of Example>
Figure 3-4 shows part of the artifact's OCD section 4.3, "System Capabilities." It describes two system capabilities, "Convert Word to XML" and "Error Check," along with further details such as their descriptions and priorities.

OCD 4.3 System Capabilities

OCD 4.3.1 Convert Word to XML
Table 3-3. Convert Word to XML capability (CAP-01)
Name: Convert word 2000 SSRD to XML.
Description: The application shell be able to convert SSRD word document to XML file.
Priority: High
Used In: Process-03

OCD 4.3.2 Error Check
Table 3-4. Error Check capability (CAP-02)
Name: Error Check
Description: The application should be able to check errors during conversion.
Priority: Medium
Used In: Process-03

Figure 3-4 OCD section 4.3 System Capabilities

When a reviewer reads section 4.3.1, (s)he sees that the priority of the capability "Convert Word to XML" is high and reviews the section now. The next step is to check the issue criticalities for this system capability. Figure 3-3 shows the part of the Value-Based checklist for section 4.3. It contains seven potential issues, called Questions. Five of them have high criticality (criticality value 3), while the remaining two have medium criticality (criticality value 2). When the reviewer reads section 4.3.1, (s)he reviews the high-criticality questions first. These five questions concern consistency with OCD 2.3 and OCD 3.3.5, missing capabilities, and prioritization, so the reviewer checks consistency with other sections, missing capabilities, and prioritization.
In this case the reviewer finds that OCD 2.3 refers to conversion of SSRD Word files to HTML, and registers an inconsistency concern. When the reviewer has reviewed all the high-criticality questions, (s)he moves on to the next criticality level. The remaining questions have medium criticality, so the reviewer reads them next. They concern the description of lower-priority capabilities and levels of service goals; the reviewer finds nothing wrong and registers no concerns. The reviewer also notices several minor inconsistencies and grammatical shortfalls (Word 2000 vs. word, shall vs. shell, awkward English). Consulting the general Value-Based checklist in Figure 3-2, the reviewer determines that these are low-criticality, non-misleading Ambiguity and Conformance issues, and decides not to register them. Such concerns are often registered when using value-neutral guidelines; they are generally not cost-effective to process. The reviewer finishes reviewing the system capability "Convert Word to XML" (section 4.3.1) and moves on to the next system capability, "Error Check," in section 4.3.2. The reviewer sees that the priority of this capability is medium. The reviewer could defer section 4.3.2 and come back later, but decides that it is easy to review and that it makes sense to maintain review continuity and context. No issues are found with it.
<End of Example>

3.5 Weight of review issues
The main goal of Value-based review is to increase the cost effectiveness of the review. Review effectiveness metrics involve weighted sums of the distinct issues reported, using the impact metrics in Figure 3-5. The impact of an issue is the product of its artifact priority and its issue criticality, with High = 3, Medium = 2, and Low = 1:

Artifact Priority High (3): impact 9 for High criticality, 6 for Medium, 3 for Low
Artifact Priority Medium (2): impact 6 for High criticality, 4 for Medium, 2 for Low
Artifact Priority Low (1): impact 3 for High criticality, 2 for Medium, 1 for Low; generally considered optional to review

Figure 3-5 Effectiveness metric; issue metrics and optimality guidelines

Each issue reported has a priority value and a criticality value, and its impact is the product of the two. For example, an issue with medium priority and high criticality has an impact of six: two (medium) times three (high). An issue with high priority and high criticality has an impact of 9, nine times the value of an issue with low priority and low criticality. The overall review effectiveness metric is the sum of all the issue impacts: the more high-valued issues a reviewer finds, the higher the review's total impact.
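As a worked illustration of this metric, the computation can be written out as a minimal sketch (Python is assumed, and the function names are hypothetical; the experiment used the Figure 3-5 table directly rather than any code):

```python
# Illustrative sketch of the Figure 3-5 effectiveness metric:
# impact = (artifact priority) x (issue criticality), with H=3, M=2, L=1.

LEVEL = {"High": 3, "Medium": 2, "Low": 1}

def issue_impact(priority: str, criticality: str) -> int:
    """Impact of a single reported issue."""
    return LEVEL[priority] * LEVEL[criticality]

def review_effectiveness(issues) -> int:
    """Overall effectiveness metric: the sum of the impacts of all distinct issues."""
    return sum(issue_impact(p, c) for p, c in issues)

# Example: one Medium-priority/High-criticality issue (impact 6) and one
# High-priority/High-criticality issue (impact 9) give a total of 15.
issues_found = [("Medium", "High"), ("High", "High")]
print(review_effectiveness(issues_found))  # 15
```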
CHAPTER 4 EXPERIMENT DESCRIPTION

The experiment was performed during the fall 2004 semester and is described in this chapter. Figure 4-1 shows the experiment process that was followed: Experiment Preparation (reviewers, materials, roles and activities), Experiment Planning (data, variables, hypotheses, design), Experiment Operation, and Experiment Data (extraction, analysis).

Figure 4-1 Experiment Process

There are four steps in the experiment process. The preparation step covers the general preparation for the experiment: I identified which artifacts were to be reviewed and what data were to be collected, and the materials for the review techniques were prepared before the experiment. After the preparation step, I defined the hypotheses to test and identified the variables in the experiment; an experiment operation plan was built in this planning step. The operation was then performed in the course CSCI 577A Software Engineering, with five data collections, and the collected data were vetted before being stored in a database. The final step is the analysis of the data, in which the hypotheses were tested using the collected data.

4.1 CSCI 577A Course: Real-Client Projects with Independent V&V
The CSCI 577A course is the first half of the USC Software Engineering project course, which focuses on software plans, processes, requirements, and architectures. The course covers the application of software engineering process models and management approaches for developing the plans, requirements, and architecture of large software systems. Students work in teams and apply the WinWin spiral model [BOEH98] and the Model-Based System Architecting and Software Engineering (MBASE) guidelines to real-client projects [BOEH99]. The three critical project milestones in MBASE are the Life Cycle Objectives (LCO), the Life Cycle Architecture (LCA), and the Initial Operating Capability (IOC). In the LCO and LCA, the focus is on requirements analysis and system design, whereas coding is the main task in the IOC. The LCO and the LCA are done in CSCI 577A, and the IOC is performed in CSCI 577B. The main differences between the LCO and the LCA are shown in Table 4-1.

At the LCO milestone:
• Less structured, with information moving around.
• Focus on the strategy or "vision" (e.g., for the Operational Concept Description and Life Cycle Plan), as opposed to the details.
• Many artifacts have some mismatches (indicating unresolved issues or items).
• No need for complete forward and backward traceability.
• Many artifacts still include "possible" or "potential" elements (e.g., Entities, Components, etc.).
• Some sections may be left as TBD, particularly the Construction, Transition, and Support plans.

At the LCA milestone:
• More formal, with solid tracing upward and downward.
• No major unsolved issues or items, and closure mechanisms identified for any unsolved issues or items.
• No more TBDs, except possibly within the Construction, Transition, and Support plans.
• Basic elements from the Life Cycle Plan are indicated within the Construction, Transition, and Support plans.
• There should no longer be any "possible" or "potential" elements.
• No more superfluous, unreferenced items: each element should either reference or be referenced by another element. Items that are not referenced should be eliminated or documented as irrelevant.

Table 4-1 Differences between the LCO and the LCA

The review guidelines in CSCI 577 have historically been value-neutral: every artifact and issue was treated as equally important and was reviewed from beginning to end with the same weight.

4.2 Real-Client Projects in the Experiment
This section describes the 18 real-client projects in the experiment.

Project: Quality Management Information System Enhancement
There are many techniques for detecting defects in a system; however, QMIS currently manages only the defect information produced by the Agile Review technique. The goal of this project is to design and implement an enhanced QMIS that provides full quality management capability. First, the system shall collect the defects detected by the Agile Review technique, testing, and bug reporting. Second, the system shall manage and extract the metadata for future analysis. Moreover, the proposed system not only organizes the data for future analysis, but also provides action guidelines for quality assessment techniques and a data tracking system for current projects.
Project: Interactive USC Maternal, Child and Adolescent Center for Infectious Disease and Virology
The MCA program, part of the Keck School of Medicine's Department of Pediatrics, provides comprehensive HIV services to approximately 900 patients per year in the Los Angeles area and is the largest center for services to pregnant women with HIV in the area. The program has grown dramatically in the last few years and has been very successful in the treatment of vertical transmission of HIV, meaning that women with HIV disease do not pass their disease to their children. The Clinic's successes culminated in a visit by UN Secretary General Kofi Annan in December 2003, where he heralded the model of care as one that should be duplicated around the world. The primary objective of the project is to develop a website with tools for program development. The website should be interactive and outcome-based and include a component for prevention of HIV disease. It will include successful program strategies, papers written by experts in the field of infectious disease, and clinical care guidelines. The goal is to help programs around the world successfully care for those suffering with HIV.

Project: Online Bibliographies on Chinese Religions in Western Languages
The main objective of the project is to develop a system for managing the bibliography online, so that the entire corpus is easily available and updating can be done electronically. The main features of the system are:
• A Chinese character entry system.
• Author submission of entries and abstracts of content.
• Graphics capability.
• Multilingual indexes.
• Cross-referencing to the topic-based table of contents, which is now the primary matrix of presentation and should be much more thorough than in the print-based versions.

Project: Hunter-Gatherer Anthropological Database Search Engine
The objective of the project is to develop a Google-type search engine for the specialized needs of scholars in evolutionary biology, psychology, and anthropology, who will use this database when it is eventually made public. The search engine will have to handle several thousand coding categories and will need to do specialized searches involving combinations of categories. It also will need to be searchable either for full texts falling under a given code, for brief summaries of these texts, or for both together, and it will need to be searchable with or without pictures.

Project: Unified Cost Model
The challenge is to integrate the COCOMO suite of models and provide a single graphical user interface that allows the user to select which models should be included in the estimate. When these models are used together, there needs to be a way to track the scope of the estimate (which WBS activities are being estimated) and the life cycle stages covered by the estimate.

Project: Bar-coding for Maternal-Child Virology Research Laboratory, LAC/USC
The project is developing a barcode reader system suitable for a laboratory involved in research on infectious agents such as HIV and Hepatitis. The barcode reader will enable laboratory personnel to avoid error-prone manual data entry. The data format generated by the barcode reader has to be compatible with a proprietary laboratory data management system (LIS) and allow for a wide variety of outputs (text files, Excel files, scanned documents).
Project: Data Mining of Digital Library Usage Data
The project involves developing automated methods for the collection and normalization of DL usage data, the evaluation of clickstream mining methods, and the automated construction and prototyping of recommender services. We are presently exploring the application of log analysis to the formation of networks of complex objects (Buckets) within the framework of data preservation efforts.

Project: Development of Open Source Technologies for Creating a Correspondence Document Data Center
A user-friendly Correspondence Document Data Center to provide access to pieces of correspondence from many departments within one organization (intranet). The client is extremely interested in creative, innovative approaches (to an otherwise boring database/data center project!) by the students. The main objective of the project is developing a correspondence data system. The system should be 1) an innovative back-end data exchange architecture and related applications, and 2) a user-friendly

Project: Requirements Tracking
Currently, while a project is underway there is no way to track requirements, their inter-dependencies, and any changes to them besides the documentation. The primary objective of the project is therefore to develop a program that is able to track any requirements that are part of a project. The program will need to be able to track dependencies between requirements at system-of-systems levels of projects, as well as changes to those requirements and the reasons behind the requirements. It must also be able to interface through the web so that it can be accessed remotely.

Project: NSF Database
Over time, the number of video clips has increased, creating the need for some type of database structure that will allow users to name, locate, retrieve, use, and analyze movement data stored in digital video form.
1. A video archiving system needs to be designed and created with the understanding that the end user needs simple ways of retrieving files for student use, maintaining the database over time with minimal knowledge, and mechanisms for adding additional video clips over time.
2. A video searching system is needed that will allow end users (faculty and students) to search for video clips on the basis of pre-defined search criteria.
3. A video viewing system needs to be designed and created that will allow students to view a subset of the video clips and email their analysis using an e-journaling system set up by the USC Center for Scholarly Technology. Students need to be able to synchronize video clips, view them side by side, advance images frame by frame, and draw on selected images as a form of communicating analysis results.

Project: Total Quality Management Application
Through the Intranet, a variety of action items are created for individuals within the Institute or at the University. The primary objective is to develop a "Total Quality Management" module that tracks action items and people's performance against tasks, helping them and management understand their efficiency, encouraging improvement, and identifying bottlenecks in the system. The system could potentially be developed within ASP, integrated tightly with our current code, or alternatively as a stand-alone application/COM object(s) with a clear API for interfacing with our business systems.

Project: Automated Reconciliation
Currently, all research administrators across campus are required to reconcile every transaction against the source paperwork supporting it.
This is an extremely labor-intensive process, as it is common for research units to have over 100 accounts that must be reconciled every month. By creating a system that compares the data downloaded from the University's financial system with the known transactions (documentation for which would be retrieved from several different sources, including user input for paper supporting documentation) and provides an exception listing (known transactions that have not yet hit the accounts, and transactions hitting the accounts for which we do not have documentation), the available manpower would be optimized by focusing on researching the exceptions. The system could potentially be developed within ASP, integrated tightly with our current code, or alternatively as a stand-alone application/COM object(s) with a clear API for interfacing with our business systems.

Project: STS Database Evaluation
The clients of this project recently wanted to change the vendor who maintains their database, which is based on the Society of Thoracic Surgeons (STS) criteria. The main objective of the project is to obtain an evaluation of the current setup and suggestions for improvement. The clients also want an evaluation of the organization between STS and its vendors with regard to maintaining the integrity of the software.

Project: Data Mining from Report Files
The clients are looking for a data-mining-from-reports application that allows for a wide variety of inputs (text files, Excel files, scanned documents). The application should be able to understand the relational nature of reports, as opposed to a flat file, and needs to handle common formatting techniques unique to reports, such as column and row wrapping. It needs to be easily customized for use with new reports (defining keywords, column and row wrapping) so that it can be adapted to a variety of existing and future reports. The application creates techniques for data mining from a wide variety of reports and should work with many formats of data, including text files and scanned documents.

Project: Proposal for an Ada to AADL Translator
The project will develop software to perform the complete translation of Ada 83 source code to AADL, including but not limited to:
• Automated identification of the main software application.
• Automated identification of each individual software thread.
• Determination of each thread's priority.
• Identification of all linkages between threads, including all control/feedback loops.
• Alert messages warning of potential priority inversion of threads and tasks.
It will also facilitate system definition by providing pre- and/or post-processing to allow operator entry of system parameters not defined in the Ada source code.

Project: CSE Website Enhancement - Student View using Extreme Programming
The goal is to make any sections that students would use more user-friendly, such as the courses section. The group will need to define an XML schema for the courses section and develop an attractive front-end to interface with it. We would like the ability to search by course, by semester, etc. Students will be using a combination of the Extreme Programming and MBASE methods to do this project.

Project: CSE Website Enhancement - Student View
The goal is to make any sections that students would use more user-friendly, such as the courses section. This group will need to define an XML schema for the courses section and develop an attractive front-end to interface with it.
We would like the ability to search by course, by semester, etc.

4.3 Experiment Preparation

4.3.1 Reviewers and roles
The remote students (off-campus students) in the course are responsible for Independent Verification and Validation (IV&V). V&V is a collection of analysis and testing activities across the full life cycle that complements the efforts of other quality-engineering functions. It determines that the software performs its intended functions correctly, confirms that it performs no unintended functions, and measures the quality and reliability of the software. Figure 4-2 shows the overall steps of developing and reviewing the artifacts. Developers (on-campus students) produce the artifacts: the OCD, SSRD, and SSAD. The artifacts are sent to the reviewers (off-campus IV&Vers), who review them. In this experiment, the off-campus IV&Vers were divided into two groups, A and B. The randomly selected Group A performed verification and validation with the Value-based checklists (VBR in Figure 4-2) discussed in Chapter 3, while Group B used the traditional CS 577 value-neutral checklists (CBR). The two groups otherwise had equal education on MBASE and the IV&V process through the lectures and assignments. If the IV&Vers find any issue, they register it as a potential Problem, called a Concern, in their Concern logs. The Concern logs made by the IV&Vers are sent back to the developers, and the author determines which Concerns need fixing and registers them as Problems.

Figure 4-2 Independent Verification and Validation in CSCI 577A (developers produce the OCD, SSRD, and SSAD; after checklist training, the IV&Vers in Group A review with VBR and those in Group B with CBR, identifying Concerns that the developers filter into Problems to fix)

4.3.2 Materials
The Value-based checklists used in the experiment covered the three MBASE documents that are directly related to product development.

• Operational Concept Description (OCD): The OCD describes how the proposed system operates. It gives stakeholders a general understanding of the proposed system and its operational concepts, defines the objectives and scope of the proposed system, and describes the system's key operational stakeholders and scenarios, along with prototype summaries. The main contents of the OCD are as follows:
- Shared Vision: the system capability description, key stakeholders, system boundary and environment, major project constraints, top-level business case, inception phase plan and required resources, and initial spiral objectives, constraints, alternatives, and risks.
- Domain/Organization Description: the organization background, organization goals, and current organization environment, together with a description of the organization's domain.
- Proposed System: the project goals and constraints, system capabilities, levels of service (L.O.S.) goals, changes in the organization environment due to the proposed system, and effects on the organization's support operations.
- Prototyping: the approach and the initial results of the prototype.

• System and Software Requirement Description (SSRD): The SSRD describes all requirements of the project. It includes capability requirements, level of service requirements, and global constraints (interface requirements and project requirements).
The main contents of the SSRD are as follows:
- Project Requirements: budget and schedule, development requirements, deployment requirements, transition requirements, and support environment requirements.
- Capability Requirements: the system definition and the system requirements.
- System Interface Requirements: user interface standards requirements, hardware interface requirements, communications interface requirements, and other software interface requirements.
- Level of Service (L.O.S.) Requirements.
- Evolution Requirements: capability evolution requirements, interface evolution requirements, technology evolution requirements, environment and workload evolution requirements, and level of service evolution requirements.

• System and Software Architecture Description (SSAD): The SSAD describes the results of analyzing the requirements and concept of operation for the system, designing the architecture, and designing an implementation. The SSAD serves as a development blueprint for the Construction phase. The main contents of the SSAD are as follows:
- System Analysis: the system structure, artifacts and information, system behavior, L.O.S. goals, and rules.
- Architecture Design and Analysis: the structure (architecture analysis classes), behavior, L.O.S., and architectural styles, patterns, and frameworks.
- Implementation Design: the detailed, practical implementation design, with the product structure, behavior, L.O.S., patterns and frameworks, and project artifacts.

4.3.3 Data
Issues. There are three priority levels and three criticality levels, and every issue the IV&Vers found has a value for each. Based on its priority and criticality, the value of each issue is defined, and the total values (the impact of the issues) are then analyzed. The generic term "issue" covers both Concerns and Problems. Issues are the key data collected in the experiment.

Effort. The effort the IV&Vers spent is the other type of data collected for analysis. Two kinds of effort are collected: preparation effort and review effort. Preparation effort covers all the effort an IV&Ver spent preparing for a review, including communication with the development team members, understanding the artifacts to review, preparing the checklist for a review, and other preparation-related activities. Review effort covers only the effort of reading the artifact and finding issues.

4.4 Experiment Planning

4.4.1 Variables
Independent variables: Two different review techniques, CBR and VBR, are used in the experiment. VBR is assigned to Group A, and the traditional value-neutral CBR is assigned to Group B.

Dependent variables:
- Number of issues: the numbers of concerns and problems found by the IV&Vers are compared between the groups.
- Impact of issues: the effectiveness of each issue is calculated using the effectiveness metric (Figure 3-5), and the total impacts (the sums of the issue effectiveness values) are compared between the groups.
- Effort: the cost of performing a software review generally includes each reviewer's individual preparation before the review and the reviewers' effort during the review activity. The total effort an IV&Ver spent is the sum of the preparation time and the review time. Preparation for a review includes understanding the background of the artifacts, understanding the checklist to be used, communication with the development team related to the review, and other preparation work.
- Cost-effectiveness: the number or impact of issues found per hour of effort, as sketched below.
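To make these dependent variables concrete, the following minimal sketch (Python is assumed; the record layout and function names are illustrative, not the analysis scripts actually used in the experiment) shows how the four measures could be computed for a single reviewer:

```python
# Illustrative computation of the dependent variables for a single IV&Ver.
# Each issue carries an artifact priority and an issue criticality (H/M/L);
# effort is the sum of preparation and review hours.

LEVEL = {"High": 3, "Medium": 2, "Low": 1}

def reviewer_measures(issues, prep_hours, review_hours):
    """Return (number of issues, total impact, total effort, cost-effectiveness)."""
    number = len(issues)
    impact = sum(LEVEL[p] * LEVEL[c] for p, c in issues)
    effort = prep_hours + review_hours
    cost_effectiveness = impact / effort if effort else 0.0
    return number, impact, effort, cost_effectiveness

# Example: three issues found, with 5 preparation hours and 10 review hours.
issues = [("High", "High"), ("Medium", "High"), ("Low", "Low")]
print(reviewer_measures(issues, prep_hours=5.0, review_hours=10.0))
# (3, 16, 15.0, 1.0666...)
```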
4.5 Hypotheses
I defined the hypotheses for the experiment as follows.

Overall hypothesis: The mean numbers, rates, impacts, and cost-effectiveness of concerns and problems will not differ between the Value-based review group and the traditional review group.

Individual hypotheses:
1. The number of concerns the IV&Vers found does not differ between the groups.
2. The number of problems the IV&Vers found does not differ between the groups.
3. The impact of concerns does not differ between the groups.
4. The impact of problems does not differ between the groups.
5. The number of concerns the IV&Vers found per hour does not differ between the groups.
6. The number of problems the IV&Vers found per hour does not differ between the groups.
7. The cost effectiveness (impact per hour) of concerns does not differ between the groups.
8. The cost effectiveness (impact per hour) of problems does not differ between the groups.

4.6 Experiment Design
As described in Section 4.1, the IV&Vers were randomly divided into two groups. The original number of IV&Vers was 29, with 15 assigned to Group A (the VBR group) and 14 to Group B (the CBR group). However, because one project (in Group B) did not follow the MBASE guidelines, its data were discarded; hence the total data set collected is 28 (15 from Group A, 13 from Group B). The IV&Vers used a review report form called the Agile Artifact Review (AAR) form, which had been used for several years in CSCI 577A; the IV&Vers and developers reported their issues on this form, along with all the effort they spent on a review. Video lectures and VBR guidelines were prepared to explain VBR to Group A. Group B used the CBR materials that had been used in the fall of 2003, including the checklist, guidelines, and forms they needed to use.

4.7 Experiment Operation
The experiment ran over the course of a full semester (fall 2004). There are two milestones in CSCI 577A, the Life Cycle Objectives (LCO) and the Life Cycle Architecture (LCA), and there were five assignments for which data were collected. After reviewing the artifacts, the IV&Vers produced Concern logs, which were sent to the development teams to identify problems and were also collected for the research. Each development team produced a Problem list for fixing the problems, and the Problem lists were collected for the research as well.

1. Review Early OCD (9.29.2004): the Early OCD package. Artifacts: OCD.
2. Review LCO Core (10.11.2004): the core parts of the LCO package: the LCO version of the OCD, at least sections 1-4 of the SSRD, and at least sections 1-2 of the SSAD. Artifacts: OCD, SSRD, SSAD.
3. Review LCO Draft (10.18.2004): not the final artifacts; the LCO version in MBASE. Artifacts: OCD, SSRD, SSAD.
4. Review LCO Package (11.5.2004): all the artifacts in the LCO version in MBASE. Artifacts: OCD, SSRD, SSAD.
5. Review LCA Draft (12.1.2004): not the final LCA version, but significant progress on all the artifacts. Artifacts: OCD, SSRD, SSAD.
Table 4-2 Assignments in the Experiment

4.8 Experiment Data
There are two types of data to be collected. The first is the Concern data from the IV&Vers: after reviewing the artifacts, the IV&Vers produce Concern Logs in which their concerns and all the necessary supporting information are recorded. The other type is the Problem List produced by each development team, which records which concerns are problems that need to be fixed and, for concerns that are not problems, the reason why.
Figure 4-3 The process to collect data (Concern Logs are collected from the IV&Vers and vetted, and Problem Lists are collected from the development teams and vetted; if information is missing, the IV&Vers or the development team are contacted to recollect it, and the resulting Concern and Problem data feed the data analysis)

All the collected data were verified so that no necessary information was missing. If information was missing, the IV&Vers or the development teams were asked to resubmit the data with the missing information added. The key information contained in the data set is: 1) a description of each issue, 2) the effort (preparation hours and review hours), and 3) the priority and criticality of each issue. Based on the collected data, the data analysis was performed and statistical significance was tested; detailed results are presented in the next chapter.

Figure 4-4 shows the AAR form (Concern Log) that I used to collect concern data, and Figure 4-5 is an example of concern data collected from an IV&Ver. Figure 4-6 shows the AAR form (Problem List) that a development team produces based on a Concern Log from an IV&Ver, and Figure 4-7 is one of the Problem Lists collected from a development team.

The Concern Log form records header information for the review (project name, review number, artifact, module, MBASE phase/level, activity, exit criteria, reviewer and author contact information, review dates, and total preparation and review time) and, for each area of concern, its location(s), a Missing/Wrong/Extra (M/W/E) classification, its priority and criticality (High, Medium, or Low), and a technical description for the developer/author. Problems are things the reviewer believes the author of the artifact can and should fix; "open issues" are things that cannot be corrected solely in this artifact or at this time. When an area of concern requires corrective action and is placed on the artifact's Problem List, the number of the corresponding defect/issue is noted in the Concern Log.

Figure 4-4 AAR form (Concern Log)

Figure 4-5 Example of collected data in AAR form (Concern Log)

The Problem List form records similar header information plus, for each defect/issue, its location(s), description, classification (Missing, Wrong, Extra, or open issue), priority and criticality (High = 3, Medium = 2, Low = 1), the activity in which the defect was injected (requirements, design, code, etc.), the location of the correction(s), the date of the fix, and comments.

Figure 4-6 AAR form (Problem List)

Figure 4-7 Example of AAR form (Problem List): a Problem List from Team 15 for the OCD (LCA version 1.3), listing 14 defects/issues with their priorities and criticalities, including incorrect page numbers in the table of contents, table names that do not match their captions, and a training initiative discussed in the text but missing from the Results Chain diagram
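For illustration only, a vetted concern-log entry such as those in Figures 4-4 and 4-5 could be represented as a small record. This is a Python sketch with assumed field names mirroring the AAR form; the experiment collected the data in the spreadsheet forms themselves, and the example hour values are invented:

```python
from dataclasses import dataclass

# Illustrative records mirroring the AAR Concern Log fields described above.
# Field names are assumptions made for this sketch, not the actual schema.

@dataclass
class Concern:
    location: str        # artifact section, e.g. "OCD 2.1.2"
    classification: str  # "Missing", "Wrong", or "Extra"
    priority: str        # artifact priority: "High", "Medium", or "Low"
    criticality: str     # issue criticality: "High", "Medium", or "Low"
    description: str     # technical description of the area of concern

@dataclass
class ConcernLog:
    project: str
    review_number: str
    preparation_hours: float
    review_hours: float
    concerns: list

example_log = ConcernLog(
    project="Team 15", review_number="OCD-5",
    preparation_hours=4.0, review_hours=8.0,  # placeholder effort values
    concerns=[Concern("OCD 2.1.2", "Wrong", "Medium", "Medium",
                      "Training is discussed in the text but not shown "
                      "in the Results Chain diagram")],
)
print(len(example_log.concerns))  # 1
```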
CHAPTER 5 EXPERIMENT RESULTS

The total number of IV&Vers involved in the experiment was 29, but one did not follow the MBASE guidelines, so the number of samples collected was 28: 15 in Group A and 13 in Group B. The total number of concerns the IV&Vers found is 4,641, and the total number of problems is 2,765. The average number of concerns found per IV&Ver is 165.75 (Group A 189.13, Group B 138.77), and the average number of problems is 98.75 (Group A 111.80, Group B 83.69). The average preparation time for a review is 23.36 hours (Group A 19.57 hrs, Group B 27.73 hrs), and the average review time is 43.29 hours (Group A 40.97 hrs, Group B 45.97 hrs).

Figure 5-1 Number of Concerns and Problems by IV&Vers (Group A: Value-Based; Group B: Traditional)

The results show that Group A reviewers tended to find more concerns and problems than Group B (Figure 5-1). Figure 5-2 shows that Group A subjects also tended to have a higher impact for concerns and problems, though not uniformly across all subjects in both groups. Below, I test the results for statistical significance.

Figure 5-2 Impact of Concerns and Problems by IV&Vers (Group A: Value-Based; Group B: Traditional)

5.1 Effort Comparison

Figure 5-3 Effort Comparison (Group A: preparation 15.04 hrs, review 40.04 hrs, total 56.46 hrs; Group B: preparation 22.45 hrs, review 34.93 hrs, total 57.39 hrs)

Figure 5-3 shows the effort the IV&Vers spent using VBR and CBR. The effort consists of two parts: preparation effort, which includes understanding and preparing the checklists and artifacts and other work in preparation for a review, and review effort, which is the direct effort of reading the artifact and finding issues. The total effort is the sum of the two. There is no difference between the two groups in total effort.
Group A spent 1.61% less total effort than Group B, a difference small enough to say that both groups spent the same effort. For review effort, Group B spent less than Group A. This may be because VBR uses two different checklists (the general Value-Based checklist and the artifact-oriented checklist) while CBR uses only one; however, the difference is not large (40.04 hours vs. 34.93 hours). Group A did spend 33.03% less preparation effort than Group B. Even though VBR has two types of checklists, the checklists may help reviewers understand the artifacts, so that the IV&Vers spend less effort preparing for a review.

Preparation Effort: p-value 0.060; Group A 33.03% less
Review Effort: p-value 0.446; Group A -14.62% less (i.e., Group A spent more)
Total Effort: p-value 0.911; Group A 1.61% less
Table 5-1 T-test results and percent Group A less than Group B

Table 5-1 shows the t-test results comparing effort in the two groups. Because all p-values are higher than 0.05, the differences in effort between the two review techniques are not statistically significant. CBR is one of the review techniques that requires minimal effort: it only adds checklists to the review process, with no formal guidelines or processes to follow, and reviewers simply read the artifacts guided by the questions and statements in the checklists. Given the nature of CBR, the results above indicate that VBR is likewise a review technique that can be performed with minimal effort.
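The comparisons in this table and in the following sections use standard two-sample t-tests on the per-reviewer measures. A minimal sketch of such a test follows (Python with SciPy is assumed, and the effort values shown are placeholders rather than the experiment's raw data):

```python
from scipy import stats

# Minimal sketch of the two-sample t-test used to compare Group A (VBR) and
# Group B (CBR) on a per-reviewer measure such as total effort in hours.
# The numbers below are placeholders for illustration only.

group_a_total_effort = [52.0, 61.5, 48.0, 55.0, 58.5]
group_b_total_effort = [60.0, 54.5, 59.0, 62.5, 51.0]

t_stat, p_value = stats.ttest_ind(group_a_total_effort, group_b_total_effort)
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
# A p-value above 0.05 means the difference in means is not statistically
# significant at the 0.05 level.
```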
Fifteen data (9 in Group A, 6 in Group B) report that the numbers of concerns the IV&Vers found range from 100 to 250, which is 62.5% of the total data (15 of 24 total data). For problems, 16 data (8 in Group A, 8 in Group B) show the number of problems the IV&Vers found, which range from 50 to 150 and is 66.7% (16 of 24 total data) of the total data. More than 60% of the data overlapped in the same range for concerns and problems. Issue Group Mean Std. Deviation % Group A higher p-value A(VBR) 992.23 552.58 concerns B(CBR) 601.91 302.51 65% 0.049 A(VBR) 604.92 334.43 problems B(CBR) 319.55 133.27 89% 0.012 Table 5-3 Comparison of Group A and Group B: Mean Impact based on Concerns and Problems Table 5-3 shows the comparative effectiveness in terms of concern and problem impacts found by IV&Vers in Groups A and B. The difference between the means of Group A and Group B is considerably higher for impacts of concerns and problems than it was for numbers of concerns and problems. For concerns, Group A’s mean is 65% higher for impact vs. 34% for number. For problems, Group A is 89% higher for impact vs. 51% for number. In addition, the results of the t-test of Impact of Concerns and Problems show that the two groups differ. The p-value is 0.049 for Impact of Concerns and 0.012 for Impact of Problems. This result is enough to reject the hypothesis that there is no difference in the impact of concerns and problems between the two groups. 1500 1200 900 600 300 0 2490 463 992.23 1057 209 601.91 1324 187 604.92 560 145 319.55 13 13 11 11 Impact of Concerns Impact of Problems Group A Group B 1800 Mean Figure 5-5 Impacts of Concerns and Problems in both groups Figure 5-5 shows the individual impacts of the concerns and problems in both groups of boxplots. The figure illustrates that the differences between the two groups are much greater than they were for the number of concerns and problems in Figure 5-6. For the impact of concerns, the number of data in Group A over the mean impact (992.23) is 5 (of 13 data, 38.5%), while the number of data in Group 72 B over Group A’s mean impact is 1. Most data in Group B range from 200 to 900 (8 of 11 data, or 72.7%), which is less than Group A’s mean impact. For the impact of problems, the number of data in Group A over the average impact of Group B (319.55) is 10 of 13 in Group A or 76.9%. The differences in impact also were clear from examining the review results. Group B reported many more low- criticality issues (e.g. non-misleading typos and grammatical errors as seen in Figure 3-2) than Group A. 5.2.2 Analysis of OCD data 93.92 57.36 56.38 29.45 0 20 40 60 80 100 concerns problems Group A Group B Figure 5-6 Average number of issues of OCD In the previous section, the differences between the two groups for all artifacts were analyzed. This section describes a detailed analysis of the data only for 73 74 OCD. Figure 5-6 shows the number of issues of OCD IV&Vers found in reviews. There are considerable differences between these groups. The number of concerns Group A found is 93.92, whereas Group B only found 57.36. Group A found 63.74% more numbers of concerns than group B. For problems, Group A detected almost twice as (91.44%) many numbers of problems than Group B (56.36 vs. 29.45). 
Issue Group Mean P-value % Group A higher Group A (VBR) 93.92 concerns Group B (CBR) 57.36 0.059 63.74% Group A (VBR) 56.38 problems Group B (CBR) 29.45 0.009 91.44% Table 5-4 Comparison of Group A and Group B: Mean Number based on Concerns and Problems of OCD T-test results are presented in Table 5-4. The p-value for number of concerns is 0.059, which cannot prove a difference between the two groups statistically at the 0.05 level. However, Group A found 63.74% more concerns than Group B, and the p-value is close to 0.005. The difference is noticeable even though it is insufficient to prove statistically. For the number of problems, the difference between both groups is clear as the p-value is 0.009. 505.46 263.73 318.54 119.64 0 100 200 300 400 500 600 Effectiveness (concerns) Effectiveness (problems) Group A Group B Figure 5-7 Average impact of issues of OCD Figure 5-7 shows the average impacts of issues of OCD. The differences between the groups are greater for impacts compared to the numbers of issues of OCD. The percentage of effectiveness, where Group A is higher than Group B for concern impacts, is 91.66%; while the percentage of effectiveness, where Group A is higher than Group B, for the number of concerns is 63.74%. This means that Group A has more high-valued issues than Group B. The t-test result is shown in Table 5-5. The p-value of 0.025 proves that there is a difference between the groups for impacts of concerns. 75 76 The difference of problem impacts is larger than the difference of concern impacts. The impact of the problems of Group A is 318.54, while the impact of the problems of Group B is 119.64. Group A has 166.25% more impacts of problems than Group B. The t-test p-value is 0.003, which is small enough to say that there is a difference between Group A and Group B with statistical significance. Impact Group Mean p-value % Group A higher Group A (VBR) 505.46 concerns Group B (CBR) 263.73 0.025 91.66% Group A (VBR) 318.54 problems Group B (CBR) 119.64 0.003 166.25% Table 5-5 Comparison of Group A and Group B: Mean Impact based on Concerns and Problems of OCD Figure 5-8 shows the accumulated numbers of issues for both groups of OCD. For number of concerns, the difference between the two groups becomes greater as the project progresses. At the Early OCD data collection, the number of concerns of both groups is 22.82 for Group A vs. 18.17 for Group B. The number of concerns becomes 43.15 vs. 31.67 at the LCO Core collection. This is a larger difference than at the Early OCD data collection. After the LCO Core collection, the difference becomes 51.08 vs. 33.82 and 79.77 vs. 50.36, and the final number of concerns is 93.92 vs. 57.36. This implies that Group A found more numbers of concerns at each review than Group B. 93.92 29.45 79.77 51.08 43.15 22.82 57.36 50.36 33.82 31.67 18.17 56.38 49.54 33.08 25.69 11.09 25.64 18.73 16.56 10.67 0 10 20 30 40 50 60 70 80 90 100 Early OCD LCO Core LCO Draft LCO LCA Draft concern(A) concern(B) problem(A) problem(B) Figure 5-8 Accumulated Average number of OCD issues This phenomenon is found in a number of problems of OCD for both groups. The difference between the two groups for numbers of problems is small at the beginning (11.09 vs. 10.67). This difference becomes bigger as the project progresses, and the difference is 56.38 vs. 29.45 at the end of data collection. 
Figure 5-9 Accumulated average impact of OCD issues (impacts of concerns and problems for Groups A and B at each of the five data collections)

Figure 5-9 shows the accumulated average impacts of OCD issues. As with the accumulated average number of OCD issues, the differences are not large at the beginning but grow as the data collection progresses. One interesting point is that the impact of Group A's problems is larger than the impact of Group B's concerns. The number of problems Group A found is 56.38, whereas the number of concerns Group B found is 57.36; the numbers are similar, but the impacts differ. This makes it even more evident that Group A focused on higher-value issues first, so that its issue impacts are larger than Group B's.

5.2.3 Analysis of SSRD data

Figure 5-10 Average number of issues of SSRD (Group A: 33.15 concerns, 17.85 problems; Group B: 32.18 concerns, 17.45 problems)

Unlike the OCD results, the differences between the two groups are not considerable for the SSRD. Both groups found almost the same number of issues in the SSRD artifacts, as Figure 5-10 shows. Table 5-6 shows the t-test results; the p-values indicate no difference between the two groups in the number of issues. These results may imply that VBR performs the same as CBR in finding issues in requirements artifacts.

Concerns: Group A (VBR) mean 33.15, Group B (CBR) mean 32.18; Group A 3.01% higher; p-value 0.886
Problems: Group A (VBR) mean 17.85, Group B (CBR) mean 17.45; Group A 2.29% higher; p-value 0.922
Table 5-6 Comparison of Group A and Group B: Mean Number based on Concerns and Problems of SSRD

Figure 5-11 Average impact of issues of SSRD (Group A: 189.77 for concerns, 101.77 for problems; Group B: 150.27 for concerns, 79.09 for problems)

However, the impacts achieved with the two review techniques differ. Figure 5-11 shows the impact of the issues both groups found in the SSRD. Although the numbers of issues found do not differ, the impacts do: each IV&Ver in Group A averaged 189.77 impact points for concerns, while the IV&Vers in Group B averaged 150.27. For problems, the IV&Vers in Group A had 101.77 impact points, whereas those in Group B had only 79.09.

Concerns: Group A (VBR) mean 189.77, Group B (CBR) mean 150.27; Group A 26.29% higher; p-value 0.289
Problems: Group A (VBR) mean 101.77, Group B (CBR) mean 79.09; Group A 28.68% higher; p-value 0.217
Table 5-7 Comparison of Group A and Group B: Mean Impact based on Concerns and Problems of SSRD

However, a statistical test found no significant differences between the two groups. Table 5-7 shows the t-test results: although Group A had 26.29% higher impacts for concerns and 28.68% higher impacts for problems, the p-values of 0.289 (impact of concerns) and 0.217 (impact of problems) are not small enough to establish differences between the groups with statistical significance.

Figure 5-12 Accumulated average number of SSRD issues (concerns and problems for Groups A and B at the LCO Core, LCO Draft, LCO, and LCA Draft collections)

Figure 5-12 shows the accumulated average number of SSRD issues.
As seen in the figure, the number of issues found by the IV&Vers in Group B at the beginning of data collection is higher than in Group A. This difference remained until the LCO Draft data collection; the numbers of issues of the two groups are then the same at the LCO data collection, and the similarity remains through the end of data collection. This means that CBR is better than VBR at finding issues at the beginning of the project; however, by the end of data collection VBR shows the same performance as CBR in detecting issues in a requirements artifact.

Figure 5-13 Accumulated average impact of SSRD issues (impacts of concerns and problems for Groups A and B at the LCO Core, LCO Draft, LCO, and LCA Draft collections)

Figure 5-13 shows the accumulated average impacts of SSRD issues. The figure shows results that are dissimilar to the numbers of SSRD issues seen in Figure 5-12. At the beginning of data collection the results match Figure 5-12, but the impact of Group A's issues caught up with Group B's at the LCO Draft data collection. From the LCO data collection onward, the impact of Group A's issues is higher than Group B's, and the difference is significant by the end of data collection. This result implies that even though the IV&Vers in Group A found fewer issues than Group B at the beginning of data collection, they focused on higher-value issues first. As a result, Group A's impacts are higher than Group B's in the later data collections.

5.2.4 Analysis of SSAD data

Figure 5-14 Average number of issues of SSAD (Group A: 49.92 concerns, 30.69 problems; Group B: 40.64 concerns, 23.45 problems)

There are some gaps between the two groups in the numbers of SSAD issues found by the IV&Vers. For concerns, the IV&Vers in Group A found 49.92 SSAD issues on average and the IV&Vers in Group B found 40.64, so Group A found 22.83% more concerns than Group B. The gap is even greater for problems: Group A found 30.69 problems
The percentage that Group A is higher than Group B for impacts of problems is 38.09%, which is larger than 30.87% (the percentage that Group A is higher than Group B for numbers of problems). Again, this result implies that VBR detected more numbers of higher- value issues than CBR. 295.54 198.64 177.38 128.45 0 50 100 150 200 250 300 Effectiveness (concerns) Effectiveness (problems) Group A Group B Figure 5-15 Average impact of issues of SSAD Even though Group A has a greater difference in impacts of issues between the groups than the difference in numbers of issues, the p-values of the t-test do not agree that there are serious differences in the impacts of issues between the two groups with statistical significance. The p-values of the cases are shown in Table 5-9. The p-value is 0.168 for impact of concerns and 0.207 for impact of 86 problems. However, the percentage (48.78% for concerns impacts and 38.09% for problems impacts) that Group A is higher than Group B for impacts of issues is noticeable. % Group A higher Impact Group Mean p-value Group A (VBR) 295.54 concerns 0.168 48.78% Group B (CBR) 198.64 Group A (VBR) 177.38 problems 0.207 38.09% Group B (CBR) 128.45 Table 5-9 Comparison of Group A and Group B: Mean Impact based on Concerns and Problems of SSAD 49.92 36.23 14.85 8.54 40.64 32.36 16.91 10.57 30.69 24.31 10.85 6.23 23.45 17.91 11.45 8.43 0 10 20 30 40 50 60 LCO Core LCO Draft LCO LCA Draft concern(A) concern(B) problem(A) problem(B) Figure 5-16 Average number of SSAD issues 87 88 Figure 5-16 shows the accumulated average number of SSAD issues. The number of issues Group B found is more than the number of issues Group A found for SSAD at the beginning of the data collection. This phenomenon remained through the second data collection (LCO Draft). However, the number of issues Group A found is greater than the number Group B found at the LCO data collection, 36.23 vs. 32.36. This difference becomes larger at the last data collection. The results imply that CBR can find slightly more numbers of issues for the initial SSAD artifact than VBR. However, as the SSAD artifact upgrades, VBR is more effective in finding issues than CBR. Figure 5-17 shows the accumulated average impact of SSAD issues. The results are similar to the results in Figure 5-16 except there are greater differences between the two groups for impacts of SSAD issues. 295.54 196.31 76.62 49.5 198.64 161.27 79.45 49 177.38 130.85 35.85 56.38 128.45 102.73 43.13 63.18 0 50 100 150 200 250 300 350 LCO Core LCO Draft LCO LCA Draft CE_con(A) CE_con(B) CE_pro(A) CE_pro(B) Figure 5-17 Accumulated Average impact of SSAD issues 5.3 Comparative Cost Effectiveness 5.3.1 Overall 89 Issue Group Mean Std. Deviation % Group A higher p-value A(VBR) 2.9 1.23 concerns 55% 0.026 B(CBR) 1.88 0.77 A(VBR) 1.78 0.85 problems 61% 0.023 B(CBR) 1.11 0.43 Table 5-10 Comparison of Group A and Group B: Mean number of concerns and problems per hour Table 5-10 shows the comparative average numbers of concerns per hour and problems per hour found by IV&Vers. The differences clearly are higher than the differences in the mean numbers of concerns and problems found. For concerns, Group A’s mean is 51% higher for concerns/hour compared to 34% higher for numbers of concerns. For problems, Group A’s mean is 61% higher for problems/hour compared to 51% for numbers of problems. Also, the p-values (0.026 for concerns/hour, 0.023 for problems/hour) in Table 5-27 show that there are differences between the two groups with statistical significance. 
[Figure 5-18 Number of concerns and problems found by IV&Vers per hour in both groups: boxplots for Group A and Group B, with group means of 2.9 vs. 1.88 concerns per hour and 1.78 vs. 1.11 problems per hour]

In Figure 5-18, boxplots of the numbers of concerns and problems found by the IV&Vers per hour are shown. The figure also shows that the IV&Vers in Group A found more concerns and problems per hour. Eleven of the 13 Group A data points (84.6%) are above Group B's average of 1.88 concerns found per hour. For problems per hour, seven of the 13 Group A data points (53.8%) are above Group B's maximum of 1.78. These results show that VBR generally found more issues per hour than CBR.

The primary objective of VBR is to increase the cost effectiveness of reading techniques by focusing on higher-value issues first. As described in the previous section, cost effectiveness is defined as impact per hour. Table 5-11 compares the mean impacts per hour for concerns and problems between Groups A and B. Again, these differences are clearly larger than the differences in mean impact. For concern impact per hour, Group A's mean is 105% higher than Group B's, vs. 65% higher for the concern impact mean. For problem impact per hour, Group A's mean is 108% higher, vs. 89% higher for the problem impact mean. The results show that Group A had roughly twice the cost effectiveness of Group B. The p-values for the cost effectiveness of concerns and problems are 0.004 (concerns) and 0.007 (problems), as seen in Table 5-11; they show that the two groups differ in cost effectiveness with statistical significance.

Issue     Group     Mean    Std. Deviation   % Group A higher   p-value
concerns  A (VBR)   16.75   8.46             105%               0.004
          B (CBR)   8.16    3.27
problems  A (VBR)   10.14   5.67             108%               0.007
          B (CBR)   4.87    1.93
Table 5-11 Comparison of Group A and Group B: Mean Cost Effectiveness of concerns and problems per hour

[Figure 5-19 Cost Effectiveness of concerns and problems per hour in both groups: boxplots of impact of concerns per hour and impact of problems per hour for Group A and Group B, with group means of 16.75 vs. 8.16 and 10.14 vs. 4.87]

Figure 5-19 shows the cost effectiveness of the two groups with boxplots. The differences in the cost effectiveness of issues between the two groups are clearly visible in the figure. For both concern and problem cost effectiveness, no data point in Group B exceeds Group A's mean (16.75 for concerns, 10.14 for problems).
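Because the per-hour metrics recur throughout this section, a small computational sketch may help make them concrete. It assumes a hypothetical issue log for a single IV&Ver and applies the definitions used in this dissertation (issue impact = priority × criticality; cost effectiveness = total impact per review hour); it is illustrative only, not the instrumentation used in the experiment.

# Hypothetical issue log for one IV&Ver; priority and criticality are rated 1 (Low) to 3 (High).
issues = [
    {"priority": 3, "criticality": 3},
    {"priority": 3, "criticality": 2},
    {"priority": 2, "criticality": 3},
    {"priority": 1, "criticality": 2},
]
review_hours = 2.5  # hypothetical effort spent on the review

total_impact = sum(i["priority"] * i["criticality"] for i in issues)  # review effectiveness
issues_per_hour = len(issues) / review_hours                          # e.g., concerns per hour
cost_effectiveness = total_impact / review_hours                      # impact per hour

print(issues_per_hour, total_impact, cost_effectiveness)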
5.3.2 Analysis of OCD data

[Figure 5-20 Average number of issues per hour of OCD: average concerns/Hr and problems/Hr for Groups A and B]

As with the numbers of OCD issues, there are differences between the two groups in the numbers of issues per hour. The IV&Vers in Group A found 4.59 concerns per hour on average, while the IV&Vers in Group B found 3.02. Group A had 52.26% higher numbers of concerns per hour and 75.39% higher numbers of problems per hour than Group B. The percentages by which Group A is higher for the numbers of OCD issues are 63.74% for concerns and 91.44% for problems. Because the per-hour advantage is smaller than the raw-count advantage, these results imply that Group A spent relatively more effort reviewing the OCD artifact than Group B.

Issue     Group           Mean   p-value   % Group A higher
concerns  Group A (VBR)   4.59   0.031     52.26%
          Group B (CBR)   3.02
problems  Group A (VBR)   2.8    0.023     75.39%
          Group B (CBR)   1.6
Table 5-12 Comparison of Group A and Group B: Mean Number per hour based on Concerns and Problems of OCD

T-test results are shown in Table 5-12. The p-values are 0.031 for the number of concerns per hour and 0.023 for the number of problems per hour for the OCD. The t-tests show that the two groups differ with statistical significance in the numbers of concerns and problems per hour for the OCD. The fact that Group A achieved 52.26% (concerns) and 75.39% (problems) higher numbers of issues per hour for the OCD than Group B, with statistical significance, is impressive.

Figure 5-21 shows the average impact of issues per hour, i.e., the cost effectiveness, for the OCD. As seen in the figure, there are large gaps between the two groups. Group A's impact of concerns per hour is 25.16, while Group B's is only 11.44; Group A's value is more than twice Group B's, which shows that VBR is clearly a more effective technique than CBR for finding higher-value concerns in the OCD. A similar pattern appears for the cost effectiveness of problems: Group A achieved 15.36 impact values per hour, while Group B achieved only 6.6, so Group A is 132.58% higher than Group B.

[Figure 5-21 Average impacts of issues per hour (Cost Effectiveness) of OCD: CE/Hr for concerns and problems for Groups A and B]

The differences are confirmed by the t-test results. The p-values for the cost effectiveness of issues are 0.002 for concerns and 0.007 for problems, as seen in Table 5-13. These p-values are small enough to indicate significant differences between the two groups in the cost effectiveness of issues.

Impact    Group           Mean    p-value   % Group A higher
concerns  Group A (VBR)   25.16   0.002     119.98%
          Group B (CBR)   11.77
problems  Group A (VBR)   15.36   0.007     132.58%
          Group B (CBR)   6.79
Table 5-13 Comparison of Group A and Group B: Mean Cost Effectiveness based on Concerns and Problems of OCD

5.3.3 Analysis of SSRD data

[Figure 5-22 Average number of issues per hour of SSRD: average concerns/Hr and problems/Hr for Groups A and B]

As shown in Section 5.2.3, there are no statistically significant differences between the two groups in the numbers of issues for the SSRD artifact. Similar results appear for the numbers of issues per hour. Figure 5-22 shows the numbers of issues per hour for the SSRD: Group A found almost the same number of issues per hour as Group B, and the t-test results in Table 5-14 likewise show no significant differences between the two groups.

Issue     Group           Mean   p-value   % Group A higher
concerns  Group A (VBR)   2      0.774     -6.07%
          Group B (CBR)   2.21
problems  Group A (VBR)   1.16   0.831     5.2%
          Group B (CBR)   1.11
Table 5-14 Comparison of Group A and Group B: Mean Number per hour based on Concerns and Problems of SSRD

Figure 5-23 shows the average impact of issues per hour (cost effectiveness) for the SSRD, and here the picture is somewhat different from the preceding results. Group A achieved 11.78 impact values per hour for concerns, while Group B achieved 8.32. For the cost effectiveness of problems, Group A achieved 6.3, while Group B achieved 4.62.
[Figure 5-23 Average impacts of issues per hour (Cost Effectiveness) of SSRD: CE/Hr for concerns and problems for Groups A and B]

Some gaps appear to exist between the two groups; however, the t-test results in Table 5-15 do not show statistically significant differences between the two groups in the cost effectiveness of issues. The p-value is 0.121 for concern cost effectiveness and 0.176 for problem cost effectiveness, which is not small enough to claim a statistical difference. However, the percentages by which Group A is higher than Group B (41.54% for concerns, 36.37% for problems) are noticeable.

Impact    Group           Mean    p-value   % Group A higher
concerns  Group A (VBR)   11.78   0.121     41.54%
          Group B (CBR)   8.32
problems  Group A (VBR)   6.3     0.176     36.37%
          Group B (CBR)   4.62
Table 5-15 Comparison of Group A and Group B: Mean Cost Effectiveness based on Concerns and Problems of SSRD

5.3.4 Analysis of SSAD data

[Figure 5-24 Average number of issues per hour of SSAD: average concerns/Hr and problems/Hr for Groups A and B]

There are some gaps between the two groups in the numbers of issues for the SSAD artifact (49.92 vs. 40.64 for concerns, 30.69 vs. 23.45 for problems, as seen in Figure 5-14), even though the gaps are not statistically significant. For the numbers of issues per hour for the SSAD, the differences between the two groups are larger than the differences in the numbers of issues. The IV&Vers in Group A found 2.6 concerns per hour on average, while the IV&Vers in Group B found 1.79, so Group A found 44.78% more concerns per hour than Group B. For problems per hour, Group A found 1.63 on average, while Group B found 1.02, so Group A's mean is 59.57% higher. The t-test results (p-value = 0.045, as seen in Table 5-16) show a statistically significant difference between the two groups in the numbers of problems per hour.

Issue     Group           Mean   p-value   % Group A higher
concerns  Group A (VBR)   2.6    0.083     44.78%
          Group B (CBR)   1.79
problems  Group A (VBR)   1.63   0.045     59.57%
          Group B (CBR)   1.02
Table 5-16 Comparison of Group A and Group B: Mean Number per hour based on Concerns and Problems of SSAD

[Figure 5-25 Average impacts of issues per hour (Cost Effectiveness) of SSAD: CE/Hr for concerns and problems for Groups A and B]

Figure 5-25 shows the cost effectiveness of issues for the SSAD. The cost-effectiveness gap is even larger than the gap in the numbers of issues per hour. Group A achieved a cost-effectiveness value of 15.64 for concerns, while Group B achieved only 7.95; Group A is 96.79% higher than Group B, a substantial difference compared with the 44.78% advantage in numbers of concerns per hour. These results mean that the IV&Vers in Group A focused on higher-value issues compared to the IV&Vers in Group B. The t-test p-value for concern cost effectiveness is 0.014, which is small enough to show a statistically significant difference between the two groups in cost effectiveness for the SSAD as well (Table 5-17).
Impact    Group           Mean    p-value   % Group A higher
concerns  Group A (VBR)   15.64   0.014     96.79%
          Group B (CBR)   7.95
problems  Group A (VBR)   9.68    0.045     73.28%
          Group B (CBR)   5.59
Table 5-17 Comparison of Group A and Group B: Mean Cost Effectiveness based on Concerns and Problems of SSAD

The results for problem cost effectiveness are similar to those for concern cost effectiveness. Group A achieved a cost-effectiveness value of 9.68 for problems, while Group B achieved 5.59. Group A's value is 73.28% higher than Group B's, which provides strong evidence of a difference. The t-test results in Table 5-17 show the difference as well: the p-value for problem cost effectiveness is 0.045, which confirms the difference with statistical significance.

CHAPTER 6 THREATS TO VALIDITY AND LIMITATIONS

6.1 Nonuniformity of Projects

Each project covered a different application with different client and technical characteristics. Some were highly COTS-intensive, with fewer artifacts to review. Some were done very well, with fewer problems to find. Some were highly dynamic, leading to version mismatches between developers and IV&Vers. Some had relatively good communication between on-campus developers and off-campus IV&Vers, while other projects had significant communication problems. These are all sources of high variability across projects, which lead to higher standard deviations for each group, but they are not sources of bias between groups. In situations with two IV&Vers on a project, I randomly assigned one to Group A and one to Group B, which tended to reduce some of the variability across projects.

6.2 Nonuniformity of Subjects

In both Groups A and B, there were outlier performers who were either tremendously effective or in over their heads. In some cases, this also led to unreliable data reporting. As a result, I excluded the highest and lowest performers in Groups A and B. Subjects also exhibited considerable variability across reviews, particularly if their job duties were heavy at review times. Again, these differences led to higher standard deviations but not to bias between groups.

6.3 Nonuniformity of Motivation

I presented the experiment as a comparison of tried and untried review methods. I provided equal training time to both groups. The course instructors and I indicated that we were not sure which method would work better. Grading criteria (primarily number of problems) were uniform across the two groups, but the two groups were graded on separate curves with very similar grade distributions. Again, this tended to minimize bias between groups.

6.4 Treatment Leakage

I tried to avoid leakage of the value-based guidelines to the Group B IV&Vers, and I do not know of any complete access to the value-based guidelines by Group B subjects. But as the USC software engineering program teaches value-based methods elsewhere, some concept leakage was inevitable. Some of the better Group B subjects showed a more value-based orientation, but that could have come from innate value orientation or job experience.

6.5 Nonrepresentativeness of Subjects

Although the on-campus development teams are primarily full-time MS-level students, the IV&Vers are almost all full-time professional employees taking the course via distance learning. Their review schedule conflicts were similar to review schedule conflicts on the job. Thus the results should be reasonably representative of industrial review practices.

6.6 Limitation: Individual vs. Group Reviewing
One additional limitation of VBR is that the reviews were performed individually. Each reviewer reviewed the artifacts with the value-based checklists, ordering the work by the issues' priorities and criticalities. However, some other reading techniques, such as PBR and DBR, provide organized group-based review processes. Reviewers in PBR and DBR take different roles in reading artifacts so that they can focus on different defect types, which is intended to provide full coverage of the defects in the artifacts.

CHAPTER 7 DISCUSSION AND CONCLUSIONS

7.1 Overall

By Number                      p-value   % Group A higher
Average of Concerns            0.202     34%
Average of Problems            0.056     51%
Average of Concerns per hour   0.026     55%
Average of Problems per hour   0.023     61%

By Impact                                 p-value   % Group A higher
Average Impact of Concerns                0.049     65%
Average Impact of Problems                0.012     89%
Average Cost Effectiveness of Concerns    0.004     105%
Average Cost Effectiveness of Problems    0.007     108%
Table 7-1 The p-values from the t-tests

The t-test was performed to test the hypothesis that the two groups' means are equal. The t-test p-values, discussed in Chapter 5, are summarized in Table 7-1. The results show that the equal-means hypothesis can be rejected at the p = 0.05 level in six cases: average numbers of concerns and problems per hour, impact of concerns, impact of problems, concern cost effectiveness, and problem cost effectiveness. The t-test results in Table 7-1 also show that one cannot reject the hypothesis that there are no differences in the numbers of concerns and problems found by the IV&Vers between the value-based review group and the traditional review group, even though I found some gaps between the two groups' means. However, I can reject the hypothesis that the numbers of concerns and problems per hour found by the IV&Vers do not differ between the groups (p = 0.026 for concerns, 0.023 for problems). The mean effort the IV&Vers spent on reviewing was lower for Group A, which caused the difference. There are also statistically significant differences between the two groups in the impact and cost effectiveness of concerns and problems.

The overall 65% to 89% higher mean effectiveness metrics of the value-based review group over the value-neutral review group, and the overall 105% to 108% higher mean cost effectiveness of the value-based review group, provide strongly indicative evidence that value-based review techniques were roughly twice as cost effective as value-neutral techniques.

Another major benefit of using a value-based checklist is an educational one. Qualitative IV&Ver feedback indicated that the value-based checklists provided stronger support for understanding review objectives and for prioritizing review effort. As a result, the USC software engineering program committed to the value-based checklists for both intra-project and IV&V reviews for current and future projects.

7.2 Detailed Analysis of artifacts: OCD

There are more differences between the two groups for the OCD than for the other artifacts. Table 7-2 shows the t-test results and the percentages by which Group A is higher than Group B. Except for the case of "Number of concerns," the differences between the two groups are statistically significant. For the "Cost Effectiveness of issues" cases in particular, the p-values are very small, clearly verifying large differences between the groups, with Group A achieving more than twice Group B's cost-effectiveness values.
For the case of "Number of concerns," even though the difference between the groups is not statistically significant, Group A's mean is 63.74% higher than Group B's, a strong sign that the VBR advantage is noticeable.

                   Case                             p-value   % Group A higher
No differences in  Number of concerns               0.059     63.74%
Differences in     Number of problems               0.009     91.44%
                   Effectiveness of concerns        0.025     91.66%
                   Effectiveness of problems        0.003     166.25%
                   Number of concerns per hour      0.031     52.26%
                   Number of problems per hour      0.023     75.39%
                   Cost Effectiveness of concerns   0.002     119.98%
                   Cost Effectiveness of problems   0.007     132.58%
Table 7-2 The p-values from the t-tests of OCD

7.3 Detailed Analysis of artifacts: SSRD

                   Case                             p-value   % Group A higher
No differences in  Number of concerns               0.886     3.01%
                   Number of problems               0.992     2.29%
                   Effectiveness of concerns        0.289     26.29%
                   Effectiveness of problems        0.217     28.68%
                   Number of concerns per hour      0.774     -6.07%
                   Number of problems per hour      0.831     5.2%
                   Cost Effectiveness of concerns   0.121     41.54%
                   Cost Effectiveness of problems   0.176     36.37%
Table 7-3 The p-values from the t-tests of SSRD

Unlike the OCD results, no case shows a statistically significant difference between the groups for the SSRD; all of the p-values are too high to reject the equal-means hypothesis. However, there are some gaps between the two groups for "Effectiveness of issues." These gaps mean that even though the numbers of issues the IV&Vers found do not differ, the IV&Vers in Group A focused on higher-value issues more than Group B. The differences between the two groups are larger for "Cost Effectiveness of issues," a strong sign that VBR is a good reading technique for focusing on higher-value issues.

7.4 Detailed Analysis of artifacts: SSAD

Overall, the SSAD comparisons show that VBR is more effective than CBR. One of the main objectives of VBR is to increase the cost effectiveness of the reading process. The p-values for "Cost Effectiveness of issues" are small enough to show statistically significant differences between the two groups. For concern cost effectiveness, Group A had almost twice the value of Group B; for problem cost effectiveness, Group A's value was more than 70% higher than Group B's. There is also a difference between the two groups in the numbers of problems per hour: the p-value is 0.045, and Group A achieved about a 60% higher value than Group B.

                   Case                             p-value   % Group A higher
No differences in  Number of concerns               0.393     22.83%
                   Number of problems               0.248     30.87%
                   Effectiveness of concerns        0.168     48.78%
                   Effectiveness of problems        0.207     38.09%
                   Number of concerns per hour      0.083     44.78%
Differences in     Number of problems per hour      0.045     59.57%
                   Cost Effectiveness of concerns   0.014     96.79%
                   Cost Effectiveness of problems   0.045     73.28%
Table 7-4 The p-values from the t-tests of SSAD

For the remaining cases, the t-tests cannot confirm differences between the two groups. However, VBR still shows somewhat higher performance for the numbers of issues and for effectiveness, as the percentages by which Group A is higher than Group B indicate.

CHAPTER 8 FUTURE RESEARCH DIRECTIONS

8.1 Update Value Based Checklists for OCD, SSRD and SSAD

The value-based checklists for the OCD, SSRD, and SSAD have been shown to be useful both for the review process and for cost effectiveness. However, in their feedback the IV&Vers provided several helpful suggestions for the current value-based checklists.
Based on this feedback and on additional research on the checklists, the value-based checklists for the OCD, SSRD, and SSAD will be updated.

8.2 Value Based Test Experimentation: Count Part of the Value-Based Review

The current VBR addresses only the review process. Future research will develop a test process that adds values to test cases. As in VBR, requirement priorities and criticalities will be assigned, and higher-value items will be tested first to increase cost effectiveness. A value-based test process is needed to perform testing with these values attached, and it will count as part of the VBR.

8.3 Value Based Checklists for Other MBASE Documents: Life Cycle Plan (LCP) and Feasibility Rationale Description (FRD)

The LCP and FRD are also key documents in MBASE. Because these two documents are not directly related to development, value-based checklists for them are not included in the current VBR package. Based on the experience with the existing value-based checklists, new value-based checklists for the LCP and FRD will be developed and added to the VBR package.

8.4 Address the Value-based fixing process

The current VBR focuses only on detecting issues. However, some feedback indicates that a value-based fixing process could also increase the cost effectiveness of development. The basic idea is to add ease of fixing to the process: along with priority and criticality, ease of fix will be another main factor considered when fixing issues.

8.4.1 Motivation: Value-based fixing process — Large project data analysis

CSE at USC had the opportunity to analyze Center/TRACON Automation System (CTAS) defect data. Through the analysis, CSE was able to make some suggestions for increasing process quality on the CTAS project. The analysis was performed directly on the defect data in the Distributed Defect Tracking System (DDTS), and it focused on open days. Open days are the days consumed by the sequential processes leading to some objective. For example, the open days for IDFix (Identification plus Fix, or Rework) are the days from the submission date of a defect to its resolve date. Open_Days_IDFix therefore covers the first two processes, "Identification" and "Rework." The "Identification" process covers the open days from the detection of a defect to its identification; a plan to fix the defect is set up in this process. The "Rework" step covers only the open days during which developers fix the defect. The open days were analyzed by defect priority, which shows how the CTAS project handled defects according to their priority. Figure 8-1 shows the definitions of open days. The "Verification & Validation" process was not covered in the analysis because detailed data for that process were not available at the time.

[Figure 8-1 Definitions of Open_Days: Open_Days_ID spans the Identification process, Open_Days_Fix spans the Rework process, and Open_Days_IDFix spans both; the Verification & Validation process is not included]
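The open-days measures just defined can be computed directly from defect timestamps. The following is a minimal sketch, assuming a hypothetical defect record with submission, identification, and resolve dates; the field names are illustrative and not the actual DDTS schema.

from datetime import date

# Hypothetical defect record; the field names are illustrative only, not the DDTS schema.
defect = {
    "submitted":  date(2005, 3, 1),   # defect detected and submitted
    "identified": date(2005, 3, 10),  # end of Identification (fix plan established)
    "resolved":   date(2005, 4, 2),   # end of Rework (developers finished the fix)
}

open_days_id    = (defect["identified"] - defect["submitted"]).days    # Identification only
open_days_fix   = (defect["resolved"]   - defect["identified"]).days   # Rework only
open_days_idfix = (defect["resolved"]   - defect["submitted"]).days    # Identification + Rework

print(open_days_id, open_days_fix, open_days_idfix)   # 9 23 32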
The results from the CTAS data analysis

The total number of defects analyzed was 922. More detailed data are shown in Table 8-1. More than 60% of the defects were removed within one month, while fewer than 8% took more than six months to fix. Since Open_Days_IDFix does not include V&V days, closing a defect completely actually takes even longer. Six months can severely impact a project schedule, and if the defect is major it could cause a crucial problem for the product. The table shows that fixing defects is not easy work.

                         <10 days   <1 month   <2 months   <3 months   <6 months   <1 year   >1 year   Total
Number of defects        347        566        715         775         854         895       922       922
Accumulated percentage   37.64%     61.39%     77.55%      84.06%      92.62%      97.07%    100%      100%
Table 8-1 Open_Days_IDFix on defects (accumulated number of defects)

Next, I examine the open days by priority. There are three levels of priority: high, medium, and low. In addition, 128 defects have no priority value; this group is given the priority value N/A. Table 8-2 shows Open_Days_IDFix by defect priority. The number of defects with priority medium is 462 of 922, which is about 50%.

                         <10 days   <1 month   <2 months   <3 months   <6 months   <1 year   >1 year   Total
Priority low             33         60         70          78          85          88        95        95
  Accumulated %          34.74%     63.16%     73.68%      82.11%      89.47%      92.63%    100%
Priority medium          91         147        189         197         224         231       237       237
  Accumulated %          38.40%     62.03%     79.75%      83.12%      94.51%      97.47%    100%
Priority high            192        300        368         396         429         451       462       462
  Accumulated %          41.56%     64.94%     79.65%      85.71%      92.86%      97.62%    100%
Priority N/A             31         59         88          104         116         125       128       128
  Accumulated %          24.22%     46.09%     68.75%      81.25%      90.63%      97.66%    100%
Total                    347        566        715         775         854         895       922       922
  Accumulated %          37.64%     61.39%     77.55%      84.06%      92.62%      97.07%    100%
Table 8-2 Open_Days_IDFix on the defects by priority

Figure 8-2 shows that all of the lines are concentrated near the center. Because all of the lines have almost the same slope, some overlap occurs (the defects with priority N/A were not considered).

[Figure 8-2 The percentage of accumulated number of defects by their severity: accumulated closure percentages over the open-day intervals for all defects and for the Low, Medium, High, and N/A priority groups]

There are several possible causes for these results:
• The defect closure process operates somewhat dependently on priority, but not by much.
• Higher-priority defects receive more and earlier attention but take longer to close.
• Data uniformity may be induced by artifacts, such as batched forms processing.
• Priorities may change, as may other contributing factors.

From these analysis results, a value-based process appears necessary to increase software quality and cost effectiveness. Fixing or managing higher-priority defects first could increase customer satisfaction.

8.5 Combinations of VBR and PBR for Group-Based Reviews

The key strength of PBR is that each reviewer takes a different perspective when reading artifacts, so that reviewers can detect different defect types. This encourages a more active attitude among reviewers in finding defects, and the union of all reviewers' readings is intended to cover all of the defects in the artifacts. A combination of VBR and PBR in a group-based review can provide more thorough coverage of the defects in artifacts through value-based processes and value-based checklists.

BIBLIOGRAPHY

[ABDE04] Z. Abdelrabi, E. Cantone, M. Ciolkowski, and D. Rombach, "Comparing code reading techniques applied to object-oriented software frameworks with regard to effectiveness and defect detection rate," Proc. ISESE, pp. 239-248, 2004.

[AURU02] A. Aurum, H. Petersson, and C. Wohlin, "State-of-the-Art: software inspections after 25 years," Software Testing, Verification, and Reliability, vol. 12, pp. 133-154, 2002.

[AURU05] A. Aurum, S. Biffl, B. Boehm, H. Erdogmus, and P.
Gruenbacher, (eds.), Value-Based Software Engineering, Springer Verlag, 2005 [BASI96] V. Basili, S. Green, O. Laitenberger, F. Lanubile, F. Shull, S. Sorumgard, and M. Zelkowitz, “The empirical investigation of perspective-based reading,” Intl. J. Empirical SW. Engr., 1(2), pp.133-164, 1996. [BIFF01] S. Biffl, Software Inspection Techniques to Support Project and Quality Management. Austria: Habilitationsschrift, Shaker Verlag, 2001. [BOEH81] B. Boehm, Software Engineering Economics, Prentice Hall, 1981 [BOEH98] B. Boehm, A. Egyed, J. Kwan, D. Port, A. Shah, and R. Madachy, “Using the WinWin Spiral Model: A Case Study”, Computer, pp. 33-44, July 1998. [BOEH99] B. Boehm, D. Port, M. Abi-Antoun, and A. Egyed, “Guidelines for the Life Cycle Objectives (LCO) and the Life Cycle Architecture (LCA) deliverables for Model-Based Architecting and Software Engineering (MBASE)”, USC Technical Report USC-CSE-98-519, February 1999. [BOEH01] B. Boehm, and V. Basili, “Software Defect Reduction Top 10 List,” Computer, pp.135-137, January 2001. 120 [BERL04] T. Berling, and T. Thelin, “A case study of reading techniques in a software company”, Proc. ISESE, pp. 229-238, 2004. [CIOL97] M. Ciolkowski, D. Differding, O. Laitenberger, and J. Münch, “Empirical Investigation of Perspective-Based Reading: A Replicated Experiment,” ISERN Report no. 97-13, 1997. [CONR03] R. Conradi, and A. Wang, (eds.), Empirical Methods and Studies in Software Engineering: Experiences from ESERNET, Springer Verlag, 2003. [DENG04] C. Denger, M. Ciolkowski, and F. Lanubile, “Investing the active guidance factor in reading techniques for defect detection,” Proc, ISESE, pp. 219- 228, 2004. [FAGA76] M. Fagan, M., “Design and code inspections to reduce errors in program development”, IBM Sys. J IS(3), pp. 182-211, 1976. [FUSA97] P. Fusaro, F. Lanubile, and G. Visaggio, “A Replicated Experiment to Assess Requirements Inspection Techniques,” Empirical Software Eng.: An Int’l J., vol. 2, no. 1, pp. 39-57, 1997. [GILB93] T. Gilb, and D. Graham, Software Inspection. Addison-Wesley, 1993. [HALL01] M. Halling, S. Biffl, T. Grechenig, and M. Köhle, “Using Reading Techniques to Focus Inspection Performance,” Proc. 27th Euromicro Workshop Software Process and Product Improvement, pp. 248- 257, 2001. [JACO92] I. Jacobson, M. Christerson, P. Jonsson, P., and G. Övergaard, Object- Oriented Software Engineering: A Use Case Driven Approach. Addison-Wesley, 1992. [LANU00] F. Lanubile, and G. Visaggio, “Evaluating Defect Detection Techniques for Software Requirements Inspections,” Technical Report no. 00-08, ISERN, 2000. 121 [LEE05] K. Lee, M. Phongpaibul, and B. Boehm, “Value-based verification and validation guidelines,” Technical Report USC-CSE-2005-502, February 2005. [MILL98] J. Miller, M. Wood, and M. Roper, “Further Experiences with Scenarios and Checklists,” Empirical Software Eng.: An Int’l J., vol. 3, no. 3, pp. 37-64, 1998. [MUSA98] J.D. Musa, Software Reliability Engineering: More Reliable Software, Faster Development and Testing. McGraw-Hill, 1998. [PERS04] C. Persson, and N. Yilmazturk, “Establishment of Automated Regression Testing at ABB: Industrial Experience Report on 'Avoiding the Pitfalls'”, 19 th IEEE International Conference on Automated Software Engineering, pp. 112-121, Sep 2004. [PORT95] A. Porter, L. Votta, and V. Basili, “Comparing Detection Methods for software Requirement Inspection: a Replicated Experiment,” IEEE Trans. Software Eng., vol 21, no 6, pp. 563-575, june 1995. [PORT98] A. Porter, and L. 
Votta, “Comparing Detection Methods for Software Requirements Inspection: A Replication Using Professional Subjects,” Empirical Software Eng.: An Int’l J., vol. 3, no. 4, pp. 355-380, 1998. [REGN00] B. Regnell, P. Runeson, and T. Thelin, “Are the Perspectives Really Different?—Further Experimentation on Scenario-Based Reading of Requirements,” Empirical Software Eng.: An Int’l J., vol. 5, no. 4, pp. 331-356, 2000. [SAND98] K. Sandahl, O. Blomkvist, J. Karlsson, C. Krysander, M. Lindvall, and N. Ohlsson, “An Extended Replication of an Experiment for Assessing Methods for Software Requirements,” Empirical Software Eng.: An Int’l J., vol. 3, no. 4, pp. 381-406, 1998. [SHUL98] F. Shull, “Developing Techniques for Using Software Documents: A Series of Empirical Studies,” PhD thesis, Computer Science Dept., Univ. of Maryland, 1998. 122 [SHUL02] F. Shull, V. Basili, M. Zelkowitz, B. Boehm, A.W. Brown, D. Port, I. Rus, and R. Tesoreiro, “What we have learned about fighting defects,” Proc, Intl. Conf. SW Metrics, June 2002. [SORU97] S. Sorumgard, “Verification of Process Conformance in Empirical Studies of Software Development,” PhD thesis, Dept. of Computer and Information Science, Norwegian Univ. of Science and Technology, 1997. [THEL03A] T. Thelin, P. Runeson, and C. Wohlin, “Prioritized use cases as a vehicle for software inspections,” Software, pp. 30-33, July/Aug 2003. [THEL03B] T. Thelin, P. Runeson, and C. Wohlin, “An experimental comparison of usage-based and checklist-based reading” Software Engineering, IEEE Transactions, vol. 29, issue 8, pp.687 – 704, Aug. 2003. [THEL04] T. Thelin, C. Andersson, P. Runeson, and N. Dzamashvili-Fogelstrom, “A replicated experiment of usage-based and checklist-based reading”, Software Metrics, 2004. Proceedings. 10th International Symposium on 14-16, pp. 246 – 256, Sep. 2004. 123 APPENDICES Appendix A: Value-Based verification and Validation Guideline 0. Overview This Appendix provides the value-based review processes and checklists used in the team projects, IV&V reviews, and the experiment. The detailed checklists are related to specific sections of the USC MBASE Guidelines, Version 3.4.1 [USCSE04]. However, it should be relatively straight forward to modify the processes and checklists to other software project guidelines such as there of the Rational Unified Process [KRUC98]. I would like to acknowledge the contributions of Monvorath Phongpaibul and Prof. Barry Boehm in reviewing and iterating the process and checklists. 1. Background and Motivation The USC Center for Software Engineering’s Value-Based Software Engineering agenda involves experimentation with value-based reformulations of traditional value-neutral software engineering methods. The experimentation explores conditions under which value-based methods lead to more cost-effective project outcomes, and assesses the degree of impact that value-based methods have on the various dimensions of project outcomes. Examples of areas in which value- based technical have shown improvements in cost-effectiveness have included stakeholder win-win requirements determination, use of value-based anchor point milestones, use of prioritized requirements to support schedule-as-independent 124 variable development processes, and the use of risk management and business case analysis to support value-based project monitoring and control. The value-based V&V guidelines presented here are an initial experiment in value-based inspection and test. 
They use the relative priority of artifacts as determined from stakeholder negotiations, and the relative criticality of defect types, to focus effort on high-impact (=priority * criticality) defects Section 2 provides the initial revised guidelines and new checklists for value- based inspections. Section 3 provides counterpart initial guidelines and review checklists for value-based test planning, execution, and reviews. The guidelines were prepared to be used by off-campus students performing independent V&V functions on USC’s real-client team-project-course projects, but can be easily adapted to support intra-project peer reviews or other IV&V modes. 2. Value-Based Guidelines and Checklists for Agile Artifact Reviews 2.1 Overview In a peer review, co-workers of a person who created an artifact examine that product to identify defects and correct shortcomings. A review: • verifies whether the work product correctly satisfies the specifications found in any predecessor work product, such as requirements or design documents • identifies any deviation from standards • suggests improvement opportunities to the author • promotes the exchange of techniques and education of the participants. 125 All interim and final development work products are candidates for review, including: • requirements specifications • user interface specifications and designs • architecture, high-level design, and detailed designs and models • source code • test plans, designs, cases, and procedures • software development plans, including project management plan, configuration management plan, and quality assurance plan The value-based checklists here cover three specification documents: the Operational Concept Description (OCD), System and Software Requirements Description (SSRD), and System and Software Architecture Description (SSAD). 2.2 Agile Artifact Review Procedures The Agile Artifact Review is a lightweight review technique, in which artifacts are reviewed in detail by the reviewer. This technique is designed for a single reviewer, independent from the author, but can be used by more than one reviewer. Objective To identify defects as closely as possible to the point of occurrence in order to facilitate corrective action. 126 Participants • The author (or project manager) • IV&Ver (Reviewer) • IV&V Coordinator (for USC CS 577 projects, a 577 staff person) Review Forms • AgileArtifactReview_Form.xls: Concern Log and Problem List • AgileArtifactReview_Form-FieldDesc.xls Entry Criteria • The author or project manager has stated his or her objectives for the review, including any guidance on required vs. optional impact levels to be reviewed. • Each capability’s priority attribute has a High, Medium, or Low rating. • The document has been spell-checked. • Reviewers understand the General Value-Based checklist and procedures • Reviewers understand the MBASE Value-Based checklist 127 Tasks Responsible 1. Either distribute a physical or electronic copy of the work product to a reviewer, or simply notify a reviewer that work product is available. Author informs a reviewer that the artifact is ready to be reviewed. Author 2. Either distribute a physical or electronic copy of the review form to a reviewer. (Only for first review) IV&V Coordinator 3. If necessary, the author/team leader explains the document background, purpose, design rationale, etc. (i.e. a fast walkthrough) Author and Team Leader 4. Obtain the artifact to be reviewed including review forms. IV&Vers 5. Examine the artifact to understand it. 
Focus on finding defects and evaluating the artifact. 5-1. Understand the general idea of the General Value-based checklist. IV&Vers need to keep in mind the General Value-based checklist in reviewing the artifact. 5-2. Understand the priorities of system capabilities through MBASE guidelines and WinWin Negotiation. Contact the development team to understand the project system capabilities. 5-3. Start to review the artifact. 5-4. If the priority of capability being read is high, this will be review at first. If priority is medium or low, defer reviewing it until later. 5-5. Go to the appropriate matching section in the checklist to review the issues. 5-6. Review the high criticality issues first in the checklist. After finishing review on the high critical issues, go to the next level of critical issues. 5-7. After finishing the high prioritized capabilities, review the next level capabilities if time is available. 5-8. Either hand-write comments in ‘Areas of Concern Log’, or enter comments into the file. Deliver the ‘Areas of Concern Log’ with comments to the author and IV&V coordinator (via DEN at USC) after completing the review. Keep a copy for yourself. IV&Vers 6. Perform any necessary rework of the work product (including any other project artifacts affected by defects identified). z Mark ‘Area of Concern Log’ to indicate action taken and send copy to originating IV&V’er and IV&V coordinator. z Record defects and resolve issues raised on ‘Problem List’. Author 7. After finish rework, check in work product. Author 128 Deliverables • Mark up of artifact (optional) • Filled Agile Artifact Review Forms Verification Verification of rework is performed by the project manager or his/her designate. The author is responsible for making appropriate decisions on issues and for correctly performing any rework. Exit Criteria • The reviewer has addressed all required impact-level review artifacts and questions. • The reviewer has complete the review forms (Figure A-1 and A-2) in Appendix A.. 2.2.1 Value-Based Checklist Definitions Priority • The priority of the system capability in the artifact. In MBASE, the priority is determined from WinWin negotiations, meetings with clients and prioritizes indicated in the MBASE Guidelines. • There are fields in the AAR form for priority. The values of priority are High, Medium, Low or 3, 2, 1. Higher priority capabilities will be reviewed first. The value will be used to calculate effectiveness metrics. 129 Criticality • Generally, the values of criticalities are given by SE experts, but IV&Vers can determine the values in special circumstances. • There are fields in the AAR form for criticality. The values of criticality are High, Medium, Low or 3, 2, 1. Higher criticality issues will be reviewed first a given level of prioritized capabilities. The value will be used to calculate effectiveness metrics. Effectiveness Metrics • The impact of each issue-identified can be calculated by multiplying the values of the priority and criticality. The total effectiveness of the review is the total sum of issue impacts. Concerns, Problems, Issues, and Questions • A concern is a potential problem identified by a reviewer. • A problem is a concern accepted as something that needs fixing by the author. • An open problem is a concern identified as someone else’s problem by the author. • An issue is a generic term including concerns, problems, and open problems. • A question is a potential issue suggested for review in the checklists. 130 2.2.2. 
Detail on the process to review an artifact with the Value-based checklists 1) Understand the importance of each system capability Check and understand the importance of each system capability. Most artifacts to be reviewed have priorities determined through WinWin negotiation results and the MBASE Guidelines. 2) Understand the General Value-Based Checklist The General Value-Based checklist is a basic checklist you should look during the review always. It is a good cross-check for determining the relative criticality of defects. It is best to use it in concert with the table-of contents checklist. 3) Perform a prioritized review Read the artifacts in order. If the capability’s priority is high, review it with its Value-Based checklist and the General Value-Based checklist. If the priority is other (Medium or Low), then you can defer it unless it makes sense to maintain review continuity and context. 4) Review each artifact based on issue criticalities. When you review the requirement or system capability, you need to identify the relative criticality of the issue given in Value-Based checklist. Based on criticalities, you can determine which issues you need to review first. Review high criticality issues first. After reviewing for the high-criticality issue, you can proceed to the lower criticality issues. In general, it is best to continue reviewing all criticality levels for High priority artifacts, to make Low-criticality issues optional for Medium priority artifacts, and Medium and Low-criticality issues optional for Low priority artifacts, as shown in Figure 1. 131 5) Fill out the AAR from provided in Appendix A. Write down each concern and its priority, criticality. For more details on AAR form, refer file AgileArtifactReview_Form-FieldDesc.xls. There are descriptions on all the fields in the form. Reference [KRUC98] P. Kruchten. Rational Unified Process. Addison-Wesley, 1998 [USCCSE04] USC-CSE. Guidelines for Model-Based (System) Architecture and Software Engineering. ver. 3.4.1, 2004 132 Appendix B: Value Based Checklists Value Based Checklist for OCD 1. Introduction 1.1. Purpose 1.2. References Question Criticality ‰ Have all citations and sources used in the preparation of this document been provided? 2 1.3. Change Summary Question Criticality ‰ Have all key changes been logged? 3 ‰ Are there SSAD change counterparts for all OCD and SSRD changes with SSAD implications? 3 2. Shared Vision 2.1 System Capability Description Question Criticality ‰ Does the description state clearly: - Who is the customer? - What is the proposed system? - What is the key benefit to the customer of using the proposed system? - What makes the proposed system better than other alternatives? 3 133 2.1.1. Benefits Realized 2.1.2 Results Chain Question Criticality ‰ Are initiatives identified as necessary for conversion, installation, and training? 3 ‰ Are barriers to successful implementation (e.g. making losers of some operational stakeholders) addressed by initiatives? 3 ‰ Are there major logical gaps between initiatives, contributions, and outcomes? 3 ‰ Where there are doubts about adequacy of continuing management support and key resource availability, are these stated as assumptions? 3 ‰ Are the outcomes consistent with those in OCD 2.1? 3 ‰ If UML is used as the representation, is the UML usage proper? 1 2.2 Key Stakeholders Question Criticality ‰ Have all key stakeholders identified in the Results Chains clearly been described with their role? 
3 ‰ Are the stakeholder representatives collaborative, representative, authorized, committed, and knowledgeable? 3 2.3 System Boundary and Environment Question Criticality ‰ Are the key operational stakeholders (OCD 2.2) included in the system environment? 3 ‰ Are key operational stakeholders who are not included in OCD 2.2 represented (e.g. administrators, interfaces)? 3 ‰ Are all project initiative contributions in the results chains (OCD 2.1.2) identified as services provided? 3 ‰ Are low-impact services excluded (e.g. login)? 2 ‰ If the diagram is using UML, is its UML usage proper? 1 134 2.4 Major Project Constraints Question Criticality ‰ Have all non-negotiable project constraints (schedule, budget, infrastructure compatibility) been listed and clearly been described? 3 ‰ Have all negotiable project constraints been identified as such? 2 ‰ Have all low-impact project constraints (e.g. detailed naming conventions) been excluded? 2 2.5 Top-Level Business Case Question Criticality ‰ Do the costs in the business case reflect the initiatives in the Results Chain? 3 ‰ Do the benefits in the business case reflect the outcomes in the Results Chain? 3 ‰ Are low-impact business case details avoided? 2 2.6 Inception Phase Plan and Required Resources Question Criticality ‰ Are inception phase roles, responsibilities, and required resources of critical stakeholders committing to the inception phase been clearly identified? 3 2.7 Initial Spiral Objectives, Constraints, Alternatives, and Risks Question Criticality ‰ Are the spiral objectives consistent with the project initiative contributions in the Results Chain? 3 ‰ Are the spiral constraints consistent with OCD 2.4, Major Project Constraints? 3 ‰ Are there missing critical risks with respect to the Top-10 Risk List in FRD 4? 3 135 3 Domain/Organization Description 3.1 Organization Background Question Criticality ‰ Is a simple organization chart relating the sponsor, user, and maintainer organizations included? 2 ‰ Are low-level details (e.g. work hours) avoided? 2 3.2 Organization Goals Question Criticality ‰ Are the organization goals consistent with the benefits realized and desired outcomes (OCD 2.1)? 3 ‰ Do any goals violate the constraints in OCD 2.4? 3 ‰ Are there no system services documented as organization goals? 2 ‰ Do the goals include measurement and relevance indicators? 2 ‰ Are there no organization goals that are never referenced by organization processes (OCD3.3.3 and 4.5.3), project goals (OCD 4.2) or capabilities (OCD 4.3), or system requirements (SSRD 3.2)? 2 ‰ Are low-level details (e.g. work hours) avoided? 2 3.3 Current Organization Environment 3.3.1 Structure Question Criticality ‰ Have all workers and outside actors relevant to the system services in OCD 2.3 and the shortcomings in OCD 3.3.5 been addressed? 2 ‰ Are workers and outside actors irrelevant to the system services in OCD 2.3 excluded? 2 ‰ Are unnecessary worker/actor distinctions avoided (e.g. students and faculty using the same services)? 2 ‰ Are structural relationships consistent? 2 ‰ Have the operational stakeholders (OCD 2.2, 4.6.1) that are part of the current organization been represented as a worker? 2 ‰ Are UML notations properly used? 1 136 3.3.2 Artifacts Question Criticality ‰ Have all artifacts relevant to the system services in OCD 2.3 and the shortcomings in OCD 3.3.5 been addressed? 2 ‰ Are artifacts irrelevant to the system services in OCD 2.3 excluded? 
2 ‰ Are the relevant structure elements in OCD 3.3.1 consistent with the business-artifact model? 2 ‰ Has each artifact, described in this section, been used or produced by one or more of the organization’s process (OCD 3.3.3)? 1 ‰ Are UML notations properly used? 1 3.3.3 Processes Question Criticality ‰ Have all processes relevant to the system services in OCD 2.3 and the shortcomings in OCD 3.3.5 been addressed as Use-Cases? 2 ‰ Are processes irrelevant to the system services in OCD 2.3 excluded? 2 ‰ Have all relevant Use-Cases been described with respect to priorities, pre- conditions and post-condition? 2 ‰ Are UML notations properly used? 1 3.3.4 Rules Question Criticality ‰ Have all rules or high-priority standards relevant to the system services in OCD 2.3 and the shortcomings in OCD 3.3.5 been addressed and clearly described? 3 3.3.5 Shortcomings Question Criticality ‰ Are the current system shortcomings sufficiently critical to the organization goals in OCD 3.2 to justify the project initiatives in OCD 2.1.2 and the development of the system services in OCD 2.13? 3 ‰ Are the shortcomings prioritized (High, Medium, Lows)? 3 137 4 Proposed System 4.1 Statement of Purpose Question Criticality ‰ Is the statement of purpose consistent with the organization goals (OCD 3.2) and the system shared vision (OCD 2)? 3 ‰ Does the statement of system purpose resolve the current system shortfalls (OCD 3.3.5)? 3 ‰ Are there no architectural decisions or implications described as a statement of purpose? 2 ‰ Is the statement of purpose consistent with the organization background (OCD 3.1)? 1 4.2 Project Goals and Constraints Question Criticality ‰ Are all project goals and constraints consistent with the system purpose (OCD 4.1) and shared vision (OCD 2)? 3 ‰ Do the project goals and constraints relate to one or more organization goals (OCD 3.2), organization processes (OCD3.3.3), or major project constraints (OCD 2.4)? 3 ‰ Are the goals sufficiently measurable, relevant, and specific? 3 ‰ Are there no system capabilities (OCD 4.2.1) or level of service goals (OCD 4.4) included as project goals and constraints? 2 4.3 System Capabilities Question Criticality ‰ Are the system capabilities consistent with the system services provided as described in OCD 2.3? 3 ‰ Are there critical missing capabilities needed to perform the system services? 3 ‰ Are capabilities prioritized as High, Medium, or Low? 3 ‰ Are capability priorities consistent with current system shortcoming priorities (OCD 3.3.5)? 3 ‰ Are capabilities trace back to corresponding project goals and constraints (OCD 4.2)? 3 ‰ Are simple lower-priority capabilities (e.g., login) described in less detail? 2 ‰ Are there no levels of service goals included as system capabilities? 2 138 4.4 Levels of Service (L.O.S.) Goals Question Criticality ‰ Are levels of service goals prioritized as High, Medium, or Low? 3 ‰ Do all the levels of service goals trace back to corresponding organization goals (OCD 3.2)? 3 ‰ Are all the levels of service goals consistent with corresponding level of service requirements (SSRD 5)? 3 ‰ Is each level of service measurable, relevant and specific? 3 ‰ Are desired and acceptable levels of service specified? 3 ‰ Are there no project goals (OCD 4.2) or system capabilities (OCD 4.3) included as levels of service goals? 2 4.5 Changes in the Organization Environment Due to Proposed System 4.5.1 Structure Question Criticality ‰ Are the structural changes clear and easy to understand? 3 ‰ Are UML notations properly used? 
1 4.5.2 Artifacts Question Criticality ‰ Are the artifact changes clear and easy to understand? 3 ‰ Are UML notations properly used? 1 4.5.3 Processes Question Criticality ‰ Are the process changes clear and easy to understand? 3 ‰ Are UML notations properly used? 1 139 4.5.4 Rules Question Criticality ‰ Are the rule changes clear and easy to understand? 3 ‰ Are UML notations properly used? 1 4.5.5 How New System Cures Current Shortcomings Question Criticality ‰ Are the structural, artifact, process, and rules changes consistent with each other? 3 ‰ Do they adequately address the High priority shortcomings? 3 ‰ Do they adequately address the Medium priority shortcomings? 2 ‰ Do they adequately address the Low priority shortcomings? 1 4.6 Effect on Organizations’ Support Operation Question Criticality ‰ Do the proposed changes move roles and responsibilities from one organization to another in mutually satisfactory (win-win) ways? 3 ‰ Do they require extensive retraining and cause obsolescence of previous skills? 3 ‰ Do they increasingly rather than decreasingly empower operational stakeholders? 3 ‰ Do they increase rather than decrease security and privacy safeguards? 3 ‰ Are the effects consistent with the desired outcomes in the results chain (OCD 2.1.2)? 3 5 Prototyping 5.1 Objectives Question Criticality ‰ Are prototypes focused on high-risk (IKIWISI, decision support) rather than low-risk (login) issues? 3 ‰ Do they address high-risk non-user interface issues as well? 3 ‰ Is their complexity commensurate with the level of risk and the available schedule? 2 ‰ Are prototype names realistic rather than abstract? 2 140 5.2 Approach Question Criticality ‰ Is the prototyping approach incremental rather than one-shot? 3 ‰ Is enough time allocated for prototype exercise and iteration? 3 ‰ Are prototype users representative of the full spectrum of users? 3 ‰ Are the tools adequately representative of operational practice? 2 ‰ Are prototypes appropriately named and numbered? 1 5.3 Initial Results Question Criticality ‰ Do the initial results adequately resolve the risks they were developed to address? 3 5.4 Conclusions Question Criticality ‰ Did the current results identify the most significant risks remaining to address via prototyping 3 ‰ Did they suggest more cost-effective ways of prototyping? 2 Value Based Checklists for SSRD Question Criticality ‰ Have all the requirements been prioritized (High, Medium, Low)? 3 ‰ Do the priorities agree with these that were negotiated between the stakeholders in the win-win sessions? 3 ‰ Do all the project requirements trace back to corresponding project goals and constraints (OCD 4.2)? 3 141 1. Introduction 1.1. Purpose 1.2. References Question Criticality ‰ Have all citations and sources used in the preparation of this document been provided? 2 1.3. Change Summary Question Criticality ‰ Have all key changes been logged? 3 ‰ Are there SSAD change counterparts for all OCD and SSRD changes with SSAD implications? 3 2. Project Requirements 2.1. Budget and Schedule Question Criticality ‰ Are budget and schedule requirements/constraints realistic with respect to project scope? 3 ‰ Are custom-code maintenance costs considered when specifying zero budget for COTS products? 3 2.2. Development Requirements Question Criticality ‰ Are development requirements focused on post-development needs (familiarity, compatibility) versus development preferences? 3 2.3. 
Deployment Requirements Question Criticality ‰ Do deployment requirements cover the full range of platforms and networks to be used in operations? 3 142 2.4. Transition Requirements Question Criticality ‰ Are the transition requirements achievable within a 2-week transition schedule? 3 ‰ Do transition requirements cover the full range of platforms and networks to be used in operations? 3 2.5. Support Environment Requirements Question Criticality ‰ Are requirements for delivery of development tools and test suites included? 2 3. Capability Requirements 3.1. System Definition Question Criticality ‰ Does the system definition trace back to the corresponding system capability description (OCD 2.1)? 3 3.2. System Requirements Note: Use the general Value-Based Checklist in reviewing for completeness, consistency/feasibility, ambiguity, conformance, and risk. Question Criticality ‰ Have all system requirements been prioritized (High, Medium, Low)? 3 ‰ Are the priorities consistent with the system capabilities priorities in OCD 4.3? 3 ‰ If there are many capabilities, are their priorities balanced among High, Medium, and Low? 3 ‰ Are requirements for multiple operational modes addressed? 3 ‰ Have any levels of service requirements been included as capability requirements? 2 ‰ Are all requirements actually requirements, and not design or implementation solutions which belong to SSAD? 2 143 4. System Interface Requirements Note: Use the general Value-Based Checklist in reviewing for completeness, consistency/feasibility, ambiguity, conformance, and risk. Question Criticality ‰ Have both user interface requirements and inter operating system interface requirements been described in this section? 3 ‰ Are interfaces to all interoperating systems in the system context diagram in OCD 2.3 defined? 3 ‰ Do interface definitions include not only message content (data order, type, units, frequency, etc.) but also protocols (initialization, operation, termination)? 3 4.1. User Interface Standards Requirements Note: Use the general Value-Based Checklist in reviewing for completeness, consistency/feasibility, ambiguity, conformance, and risk. 4.2. Hardware Interface Requirements Question Criticality ‰ Are specific constraints (e.g., screen size) included? 3 4.3. Communications Interface Requirements Question Criticality ‰ Are network security constraints (e.g., no CGI scripts) included? 3 4.4 Other Software Interface Requirements Question Criticality ‰ Are appropriate interfaces to common campus services (directories, catalogs) included? 3 144 5. Level of Service (L.O.S.) Requirements Question Criticality ‰ Are the level of service requirements prioritized (High, Medium, Low) 3 ‰ Are the prioritizes consistent with those in OCD 4.3 and OCD 4.4 3 ‰ Do the levels of service requirements specify more detail than the levels of service in OCD 3.3? In particular, do they reference workload characteristics (normal load, peak load)? 1-3 ‰ Are there no system requirements (SSRD 3.2) included as level of service requirements? 2 ‰ Do all the level of service requirements trace back to the corresponding levels of service in OCD 4.4 and organization goals in OCD 3.2? 1-3 ‰ Is each required level of service requirement appropriately measurable achievable, relevant, and specific? 1-3 6. Evolution Requirements Question Criticality ‰ Are the evolution requirement priorities generally (but not necessarily) lower than the SSRD 1-5 priorities? 2 Note: - Use the counterpart checklists for SSRD 1-5. 
Value-Based Checklists for SSAD

SSAD artifacts inherit their priorities from the requirements they satisfy directly or help to satisfy. Middleware and system software SSAD artifacts should be High priority unless their role is more off-line, such as recording transaction statistics or supporting off-line administrative updates.

In some cases below, SSAD review questions are given criticality ranges; for example, a 1-3 range means that the review item's criticality is High when applied to a High-priority SSAD artifact, Low when applied to a Low-priority SSAD artifact, and so on. The general Value-Based Checklist should also be used for prioritizing the review of SSAD completeness, consistency/feasibility, ambiguity, compliance, and risk issues. Integrating COTS products on tight schedules and accepting vendor claims about COTS products are frequently high sources of risk that should be given a high priority in SSAD reviews, unless there is strong evidence of low risk.
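One plausible reading of the criticality-range rule above is that the effective criticality of a ranged item is the reviewed artifact's priority level, clamped to the item's stated range. The sketch below is illustrative only; the function resolve_criticality and the numeric mapping High=3, Medium=2, Low=1 are assumptions, not part of the original checklists.

    PRIORITY_LEVEL = {"High": 3, "Medium": 2, "Low": 1}

    def resolve_criticality(item_range, artifact_priority):
        """Map an item's criticality range (e.g., (1, 3) or (2, 3)) and the
        priority of the SSAD artifact under review to a single criticality,
        by clamping the artifact's priority level into the item's range."""
        low, high = item_range
        level = PRIORITY_LEVEL[artifact_priority]
        return max(low, min(high, level))

    # A 1-3 item is High-criticality for a High-priority artifact
    # and Low-criticality for a Low-priority one.
    assert resolve_criticality((1, 3), "High") == 3
    assert resolve_criticality((1, 3), "Low") == 1
    # A 2-3 item never drops below Medium criticality.
    assert resolve_criticality((2, 3), "Low") == 2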
1. Introduction

1.1. Purpose of the SSAD Document

1.2. Standards and Conventions
Question (Criticality)
- Have all architecture standards been listed? (2)
- Have the naming conventions been addressed? (2)
- Have the naming conventions been followed? (2)
- Have all architecture standards been followed? (2)
- Are the client's currently used components being evaluated as alternatives? (2)

1.3. References
Question (Criticality)
- Have all citations and sources used in the preparation of this document been provided? (2)

1.4. Change Summary
Question (Criticality)
- Have all key changes been logged? (3)
- Are there SSAD change counterparts for all OCD and SSRD changes with SSAD implications? (3)

2. System Analysis
Question (Criticality)
- Is the system analysis model consistent with the current versions of the OCD and SSRD that it summarizes? (3)

2.1. Structure
Question (Criticality)
- Are there no artifacts of the system (SSAD 2.2) included in this section? (2)
- Are there no components of the system (SSAD 3.1) included in this section? (2)
- Are there no organization workers or outside actors included in this section that do not interact with the system according to the system's behavior description (SSAD 2.3)? (2)
- Is each actor needed to implement a system capability or interface requirement (SSRD 3 and 4)? (2)
- Is the structure consistent with its counterpart in OCD 4.5.1? (1-3)

2.1.1. System
Question (Criticality)
- Have all system services that are used in the system's behavior (SSAD 2.3) been listed? (1-3)
- Have all system services shown in the system boundary and environment (OCD 2.3) been listed? (1-3)
- Has each system service been represented as a system capability (OCD 4.3)? (1-3)

2.1.2. Actor X
Question (Criticality)
- Have all system actors been addressed and clearly described by role, purpose, and responsibilities? (1-3)
- Have all attributes of each system actor that are known to be used by the system's processes (SSAD 2.3.1) been listed? (1-3)
- Have all services provided by each system actor that are used in system processes (SSAD 2.3.1) been listed? (1-3)
- Have all the processes in which each system actor participates been listed? (1-3)

2.2. Artifacts & Information
Question (Criticality)
- Have all system artifacts and information classes been addressed and clearly described by role, purpose, and responsibilities? (1-3)
- Are there no structural elements (SSAD 2.1) included in the Business-Artifact Model? (2)
- Are there no system components included? (1)
- Are there no artifacts included that are not among the artifacts of the proposed system (OCD 4.5.2)? (2)
- Are there no detailed designs of artifacts or information included? (1)
- Are there no artifacts or information included that are not represented in both the architecture and the implementation design (SSAD 3 or 4)? (2)
- Are all artifacts or information inspected, manipulated, produced, or maintained according to the system's behavior (SSAD 2.3)? (2)

2.3. Behavior

2.3.1. Processes
Question (Criticality)
- Have all the processes that represent system capabilities (OCD 4.3) been defined in Use-Cases? (1-3)
- For LCO, have all Use-Cases been described by pre-conditions, post-conditions, and included Use-Cases? (1-3)
- For LCA, have all risks for each Use-Case been identified and prioritized? (1-3)
- For LCA, have all Use-Cases added inclusions and extensions as appropriate? (1-3)
- Does each process's name express the behavior that it represents? (2)

2.3.2. Modes of Operation
Question (Criticality)
- Have all the system modes been defined in terms of system states and state-transition events? (3)
- Is the state model's level of detail risk-driven (generally simple, but detailed for safety-critical systems)? (3)
- Have all the modes in the system requirements (SSRD 3.2.1) been represented in this section? (1-3)
- Have all system capabilities (OCD 4.3) and system processes (SSAD 2.3.1) been listed in one or more modes? (1-3)
- Is the state model's initial state defined? (2)

2.4. L.O.S. Goals
Question (Criticality)
- Is there a model showing how satisfaction of each SSAD-element L.O.S. goal is combined to ensure satisfaction of the system L.O.S. requirement? (3)
- Have all level of service requirements (SSRD 5) been covered by SSAD-element level of service goals? (1-3)

2.5. Rules
Question (Criticality)
- Are all the rules that the system must implement listed? (2-3)
- Can each rule be traced back to the proposed organization's rules (OCD 4.5.4)? (1-3)
- Has each rule process been defined in the system's processes (SSAD 2.3.1)? (1-3)
- Has each rule mode been defined in the system's modes (SSAD 2.3.2)? (1-3)

3. Architecture Design & Analysis

3.1 Structure

3.1.1 Topology
Question (Criticality)
- Are layers organized in a logical progression from low-layer hardware services to high-layer user services? (3)
- Are partitions, subsystems, and components contained compatibly within individual layers? (3)
- Are there approaches for dealing with COTS components that span multiple layers? (3)
- Are the layers, partitions, subsystems, and components modeled in different UML diagrams? (2-3)
- Have all the dependency relations between layers been addressed? (1-3)
- Have all the dependency relations between partitions been addressed? (1-3)
- Have all the dependency relations between subsystems been addressed? (1-3)
- Have all the dependency relations between components been addressed? (1-3)

3.1.1.1 Layer X
Question (Criticality)
- Does the layer structure accurately represent COTS products? (3)
- Have all the layers been represented? (1-3)
- If a layer contains components, have all the components been addressed and referred to the corresponding diagrams in the hardware classifier model (SSAD 3.1.2), the software classifier model (SSAD 3.1.3), and the deployment model (SSAD 3.1.4) that describe the components and classes belonging to this layer? (1-3)

3.1.1.1.1 Partition X
Question (Criticality)
- Does the partition structure accurately represent COTS products? (3)
- For each of a layer's partitions, has a partition section been generated with its subsystems, if any? (1-3)
- If a partition in a layer contains components, have all the components been addressed and referred to the corresponding diagrams in the hardware classifier model, the software classifier model, and the deployment model that describe the components and classes belonging to this layer? (1-3)

3.1.1.1.1.1 Subsystem X
Question (Criticality)
- Does the subsystem structure accurately represent COTS products? (3)
- Has each subsystem in a partition addressed all its components and referred them to the corresponding diagrams in the hardware classifier model (SSAD 3.1.2), the software classifier model (SSAD 3.1.3), and the deployment model (SSAD 3.1.4) that describe the components and classes belonging to this layer? (1-3)

3.1.2 Hardware Classifier Model
Question (Criticality)
- At LCO, have all the expected hardware components been described with respect to the expected actors with which the components interact and all the connectors that will be used? (3)
- At LCA, have all the selected hardware components been completely described with respect to their selected actors and all the selected connectors? (3)
- Have all the actors in each hardware component been described in the system structure (SSAD 2.1)? (1-3)
- Are there no software component classifiers described in this section? (2)

3.1.3 Software Classifier Model
Question (Criticality)
- Does the deployment model accurately represent the deployment of COTS products? (3)
- At LCO, have all the expected software components, including COTS components, been described with respect to the expected actors with which the components interact and all the connectors that will be used? (3)
- At LCA, have all the selected software components been completely described with respect to their selected actors and all the selected connectors? (3)
- Have all the actors in each software component been described in the system structure (SSAD 2.1)? (1-3)
- Are there no hardware component classifiers described in this section? (2)

3.1.4 Deployment Model
Question (Criticality)
- At LCO, has the expected configuration of hardware and software components, whose classifiers are described in the hardware classifier model and the software classifier model, been described? (3)
- At LCA, has the specified configuration of required hardware and required software components, whose classifiers are described in the hardware classifier model and the software classifier model, been described? (3)
- At IOC, has the realized configuration of optional hardware and optional software components, whose classifiers are described in the hardware classifier model and the software classifier model, been described? (3)
- Has each hardware component classifier in this section been described in the hardware classifier model (SSAD 3.1.2) and in the hardware component classifiers (SSAD 3.1.5)? (2)
- Has each hardware connector classifier in this section been described in the hardware classifier model (SSAD 3.1.2) and in the hardware connector classifiers (SSAD 3.1.6)? (2)
- Has each software component classifier in this section been described in the software classifier model (SSAD 3.1.3) and in the software component classifiers (SSAD 3.1.7)? (2)
- Has each software connector classifier in this section been described in the software classifier model (SSAD 3.1.3) and in the software connector classifiers (SSAD 3.1.8)? (2)
- Has each actor in this section been described in the system structure (SSAD 2.1)? (2)
3.1.5 Hardware Component Classifiers
Note: Hardware components for most applications will be relatively straightforward. For more complex hardware configurations and behavior, use a hardware version of the software classifier checklist.

3.1.6 Hardware Connector Classifiers
Note: Hardware connectors for most applications will be relatively straightforward. For more complex hardware configurations and behavior, use a hardware version of the software classifier checklist.

3.1.7 Software Component Classifiers

3.1.7.1 Component Classifier X

3.1.7.1.1 Purpose

3.1.7.1.2 Interface(s)
Question (Criticality)
- Have all the visible features of each component classifier been described with their correct feature set? (1-3)
- Has each interface feature described in this section been used in the component's behavior description (SSAD 3.1.7.1.4)? (2)

3.1.7.1.3 Parameters
Question (Criticality)
- Have all the parameters of each component classifier been described? (1-3)

3.1.7.1.4 Behavior

3.1.7.1.4.1 Processes
Question (Criticality)
- Have all the processes of each component classifier been described? (1-3)
- Does each component process description identify which other components and actors participate in the process, and which artifacts are inspected, manipulated, or produced by the process? (1-3)
- Does each component process description identify at a high level the actions performed? (1-3)
- Does each Use-Case description in the component classifier identify all the pre-conditions and all the post-conditions? (1-3)
- Has each actor in this section been described in the system structure (SSAD 2.1)? (1-3)
- Has each artifact or information classifier used or produced in a process description been defined in the analysis classes for the system's architecture (SSAD 3.2)? (1-3)
- Has each component in a process description been defined in the architecture of the system (SSAD 3.1)? (1-3)
- Has each action represented in this section been defined in the component's interface (SSAD 3.1.7.1.2)? (1-3)

3.1.7.1.4.2 Modes of Operation
Question (Criticality)
- Have all of the state transitions between modes and their defining events been consistently described? (3)
- Have all the modes of each component classifier been described? (1-3)

3.1.7.1.5 L.O.S. Goals
Question (Criticality)
- Have all the level of service goals (OCD 4.4 and SSAD 2.4) and level of service requirements (SSRD 5) been related to component level of service goals? (1-3)

3.1.7.1.6 Constraints
Question (Criticality)
- Have all the constraints of each component been described? (1-3)
- Has each system rule described in this section been described in the proposed system's rules (SSAD 2.5)? (1-3)
- Has each actor described in the rule been described in the system context (SSAD 2.1)? (1-3)
- Has each process listed in a constraint been described in the component classifier's processes (SSAD 3.1.7.1.4.1)? (1-3)
- Has each mode listed in this section been described in the component classifier's modes of operation (SSAD 3.1.7.1.4.5)? (1-3)
- Has each artifact or information classifier listed here been described in the architecture's analysis classes (SSAD 3.2)? (1-3)
- For IOC, has each component rule been implemented by one or more elements described in either the component's internal architecture (SSAD 3.1.7.1.7) or its implementation design (SSAD 4)? (1-3)

3.1.7.1.7 Internal Architecture
Question (Criticality)
- Are internal architecture descriptions limited to High-priority, complex components? (3)
- Is their level of detail risk-driven? (3)
3.1.8 Software Connector Classifiers

3.1.8.1 Connector Classifier X

3.1.8.1.1 Purpose

3.1.8.1.2 Interface(s)
Question (Criticality)
- Have all the visible features of each connector classifier been described with their correct feature set? (1-3)

3.1.8.1.3 Parameters
Question (Criticality)
- Have all the parameters of each connector classifier been described? (1-3)

3.1.8.1.4 Behavior
Question (Criticality)
- Are each connector's behaviors consistent with the behaviors of the components connected to it? (3)
- Have all the behaviors of each connector classifier been described? (1-3)

3.1.8.1.5 L.O.S. Goals
Question (Criticality)
- Have all the system level of service goals (OCD 4.4 and SSAD 2.4) and level of service requirements (SSRD 5) been related to the connectors' level of service goals? (1-3)

3.1.8.1.6 Constraints
Question (Criticality)
- Have all the constraints of each connector classifier been described? (1-3)
- Has each system rule described in this section been described in the proposed system's rules (SSAD 2.5)? (1-3)
- Has each actor described in the rule been described in the system context (SSAD 2.1)? (1-3)
- Has each artifact or information classifier listed here been described in the architecture's analysis classes (SSAD 3.2)? (1-3)

3.1.8.1.7 Internal Architecture
Question (Criticality)
- Are internal architecture descriptions limited to High-priority, complex connectors? (3)
- Is their level of detail risk-driven? (3)

3.1.9 Hardware Components
Question (Criticality)
- Is each hardware component X defined in the hardware classifier model (SSAD 3.1.2) described in terms of its hardware classifier? (1-3)
- Is each level of service applied to component X defined in terms of its hardware component classifier (SSAD 3.1.5.1.5) level of service description? (1-3)

3.1.10 Hardware Connectors
Question (Criticality)
- Is each hardware connector X defined in the hardware classifier model (SSAD 3.1.2) described in terms of its hardware classifier? (1-3)
- Is each level of service applied to connector X defined in terms of its hardware connector classifier (SSAD 3.1.6.1.5) level of service description? (1-3)

3.1.11 Software Components
Question (Criticality)
- Is there a model showing how satisfaction of the component and connector level of service goals is combined to ensure satisfaction of the system level of service requirement? (3)
- Is each software component X defined in the software classifier model (SSAD 3.1.3) described in terms of its software classifier? (1-3)
- Is each level of service applied to component X defined in terms of its software component classifier (SSAD 3.1.7.1.5) level of service description? (1-3)

3.1.12 Software Connectors
Question (Criticality)
- Is each software connector X defined in the software classifier model (SSAD 3.1.2) described in terms of its software classifier? (1-3)
- Is each level of service applied to connector X defined in terms of its software connector classifier (SSAD 3.1.8.1.5) level of service description? (1-3)

3.2 Analysis Classes
Note: High-priority information classes are the classes representing High-priority capability and interface requirements (SSRD 3, 4), critical forms defined in the current prototype (OCD 5), required information needed for communication components in the architectural structure (SSAD 3.1), required control behavior specific to one or a few processes performed by architectural units, and level-of-service-critical classes.
Medium-priority information classes are the classes representing Medium-priority capability and interface requirements (SSRD 3, 4), medium-critical forms defined in the current prototype (OCD 5), desired information needed for communication components in the architectural structure (SSAD 3.1), desired control behavior specific to one or a few processes performed by architectural units, and level-of-service-critical classes.

Low-priority information classes are the classes representing Low-priority capability and interface requirements (SSRD 3, 4), low-impact forms defined in the current prototype (OCD 5), optional information needed for communication components in the architectural structure (SSAD 3.1), optional control behavior specific to one or a few processes performed by architectural units, and level-of-service-critical classes.

Question (Criticality)
- Have all the High-priority information classes needed to satisfy the requirements been created? (3)
- Have all the Medium-priority information classes needed to satisfy the requirements been created? (2)
- Have all the Low-priority information classes needed to satisfy the requirements been created? (1)
- Are there no classes that are not needed for satisfying the requirements or for specifying the structure (SSAD 3.1) or behavior (SSAD 3.3) of the architecture? (2)
- Are COTS application services adequately represented in the class model? (3)
- At LCO, have the most critical artifacts and information defined for the system (SSAD 2.2) been represented in the class model? (1-3)
- At LCA, have all artifacts and information defined for the system (SSAD 2.2) been represented in the class model? (1-3)
- At LCO, have the most critical structure (SSAD 3.1) or behavior (SSAD 3.3) for the architecture been represented in the class model? (1-3)
- At LCA, have all the structure (SSAD 3.1) or behavior (SSAD 3.3) for the architecture been represented in the class model? (1-3)
- At IOC, has each class been realized in the implementation design (SSAD 4)? (3)
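The note at the beginning of SSAD 3.2 effectively assigns an analysis class its review priority from the most important source it serves: the priority of the requirements it represents, the criticality of the prototype forms it backs, and whether the information or control behavior it carries is required, desired, or optional. The sketch below is an illustrative reading only and is not part of the original checklist; the function classify_info_class and its argument names are hypothetical.

    def classify_info_class(requirement_priorities=(), form_criticality=None, info_need=None):
        """Return 'High', 'Medium', or 'Low' for an analysis (information) class,
        taking the strongest of the sources it serves, per the note above.
        requirement_priorities: priorities of the SSRD 3/4 requirements the class represents
        form_criticality: 'critical', 'medium-critical', or 'low-impact' prototype form (OCD 5)
        info_need: 'required', 'desired', or 'optional' communication/control need (SSAD 3.1)
        """
        rank = {"High": 3, "Medium": 2, "Low": 1}
        scores = [rank[p] for p in requirement_priorities]
        scores.append({"critical": 3, "medium-critical": 2, "low-impact": 1}.get(form_criticality, 0))
        scores.append({"required": 3, "desired": 2, "optional": 1}.get(info_need, 0))
        return {3: "High", 2: "Medium", 1: "Low"}.get(max(scores), "Low")

    # A class backing only a Medium-priority requirement but carrying required
    # inter-component information would be reviewed as High priority.
    assert classify_info_class(("Medium",), info_need="required") == "High"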
3.3 Behavior
Question (Criticality)
- Have all the High-priority processes in SSAD 2.3 been described with their corresponding components in the deployment model (SSAD 3.1.4), the other processes in which those components participate, and the analysis-class instances (SSAD 3.2) that are inspected, manipulated, or produced in the process? (3)
- Have all the Medium-priority processes in SSAD 2.3 been described with their corresponding components in the deployment model (SSAD 3.1.4), the other processes in which those components participate, and the analysis-class instances (SSAD 3.2) that are inspected, manipulated, or produced in the process? (2)
- Have all the Low-priority processes in SSAD 2.3 been described with their corresponding components in the deployment model (SSAD 3.1.4), the other processes in which those components participate, and the analysis-class instances (SSAD 3.2) that are inspected, manipulated, or produced in the process? (1)
- Has each High-priority process identified all the corresponding risks in its Use-Case Realization Description? (3)
- Has each Medium-priority process identified all the corresponding risks in its Use-Case Realization Description? (2)
- Has each Low-priority process identified all the corresponding risks in its Use-Case Realization Description? (1)
- Has each High-priority process identified all the corresponding pre-conditions and post-conditions in its Use-Case Realization Description? (3)
- Has each Medium-priority process identified all the corresponding pre-conditions and post-conditions in its Use-Case Realization Description? (2)
- Has each Low-priority process identified all the corresponding pre-conditions and post-conditions in its Use-Case Realization Description? (1)

3.4 L.O.S., Projected
Question (Criticality)
- Have all the High-priority level of service goals (SSAD 2.4) been satisfactorily addressed by the classes' level of service capabilities? (3)
- Have all the Medium-priority level of service goals (SSAD 2.4) been satisfactorily addressed by the classes' level of service capabilities? (2)
- Have all the Low-priority level of service goals (SSAD 2.4) been satisfactorily addressed by the classes' level of service capabilities? (1)

3.5 Architectural Styles, Patterns & Frameworks
Question (Criticality)
- Have all the strongest candidate architectural styles, patterns, and frameworks been identified? (3)
- Has each candidate architectural style, pattern, and framework been analyzed for benefits, costs, and limitations? (3)
- Has each benefit and cost been related to the system's level of service goals (SSAD 2.4) and factored into the projected levels of service for the architecture (SSAD 3.4)? (3)
- Are COTS product architectural mismatches identified and satisfactorily addressed? (3)
- Are treatments of implementation-level styles, patterns, and frameworks deferred to SSAD 4, or only addressed if they involve high-risk system-level issues? (3)
- Has each limitation been captured in the architecture's structure (SSAD 3.1) and its analysis classes (SSAD 3.2)? (1-3)