EXPERIMENTAL AND ANALYTICAL COMPARISON BETWEEN PAIR DEVELOPMENT AND SOFTWARE DEVELOPMENT WITH FAGAN’S INSPECTION by Monvorath Phongpaibul A Dissertation Presented to the FACULTY OF THE GRADUATE SCHOOL UNIVERSITY OF SOUTHERN CALIFORNIA In Partial Fulfillment of the Requirements for the Degree DOCTOR OF PHILOSOPHY (COMPUTER SCIENCE) December 2007 Copyright 2007 Monvorath Phongpaibul ii Acknowledgments I do not know where to begin thanking the people involved. My Ph.D. study and this dissertation would not have been successful without the guidance, encouragement and love from the following people: My advisor, Dr. Barry Boehm: He is the best advisor I have ever known. He has never told me exactly what I had to do. Instead he gave me advice to point me in the right direction. His guidance is invaluable. He still looks after me after my graduation. Prof. Robert Neches: His advice from his system engineer and philosophy standpoints made the dissertation even more complete. Prof. Steve Jacobs: He is the one who gave me the opportunity to continue my Ph.D study. He is not only my teacher, my boss and part of my PhD committee, but he is also my wonderful friend. The students in his class called him the “Professor with Heart” because he has the strong intention to see the students succeed. Prof. Bert Steece: Thank you for the advice on statistics. Winsor Brown: Thank you for seeding the idea of this dissertation in my brain. Office –mate: Alex Lam, Jesal Bhuta, Apuva Jain, thank you for advice and suggestions on my presentation. CSSE Friends: Ye Yang, Supannika Koolmanojwong, Yue Chen, Gustavo Perez, Steve Meyers, Di Wu, thank you for your support. iii All people who participated in my experiments: This research could not have published without them. My family and friends in Thailand: Even though, they are living thousands of miles from LA, I can always feel their love, their understanding and their support. My Boyfriend, Jessy Lu and his family: Thank you for giving me a home, comfort and love when I am apart from my family. Without all of you, I could not call LA as my home. Last but not least, USC: The place where I strengthen my knowledge and experience. The place where I met all the wonderful professors and friends. iv Table of Contents Acknowledgments ii List of Tables vii List of Figures x Abstract xii Chapter 1. Introduction 1 1.1. Motivation 1 1.2. Problem Statement 6 1.3. Research Approach 7 Chapter 2. Background 10 2.1. Software Inspection 10 2.1.1. Previous Research 13 2.1.2. Inspection Cost 19 2.2. Pair Programming(PP) 21 2.3. Cost of Software Quality (CoSQ) 27 2.4. Cultural Differences 31 2.4.1. Edward T. Hall’s Model [Hall, 1976] 33 2.4.2. Geert Hofstede’s Model [Hofstede, 2001] 35 Chapter 3. Research Methodology 38 3.1. Subjects 41 3.2. Experiment Design 44 3.2.1. Undergraduate Experiment in Thailand 44 3.2.2. Graduate Experiment in Thailand 45 3.2.3. Industry Experiment in Thailand 47 3.2.4. Directed Research Graduate Experiment in USA 48 3.2.5. Software Engineering Graduate Student Experiment in USA 51 3.3. Data Collection and Data Analysis 55 3.4. Hypotheses 58 3.5. Validity threats and control 60 3.5.1. Non-equality of Team Experiences 60 3.5.2. Non-representativeness of Subjects 60 3.5.3. Non-representativeness of Team Size 61 3.5.4. Non-representativeness of the Size of Project 61 v Chapter 4. Quantitative Results 62 4.1. Undergraduate Experiment Results 62 4.1.1. Total Development Cost 62 4.1.2. Cost of Software Quality (CoSQ) 65 4.1.3. Number of Problems Found by TA 68 4.1.4. 
Product Quality 72 4.1.5. Final Project Score 73 4.2. Graduate Experiment Results 74 4.2.1. Total Development Cost and Cost of Software Quality 74 4.2.2. Product Quality 76 4.2.3. Final Project Score 77 4.3. Industry Experiment Results 78 4.3.1. Total Development Cost and Cost of Software Quality 79 4.3.2. Product Quality 82 4.4. Directed Research (DR) Graduate Classroom Experiment Results 83 4.4.1. Total Development Cost (TDC) 83 4.4.2. Development Costs per Phase 85 4.4.3. Cost of Software Quality (CoSQ) 89 4.4.4. Product Quality 92 4.5. Software Engineering (SE) Graduate Classroom Experiment Results 93 4.5.1. Total Development Cost and Cost of Software Quality 93 4.5.2. Product Quality 96 4.5.3. Software Development Spending Profile Comparison 97 4.6. Efficiency and Effectiveness Comparison 102 4.6.1. Production Efficiency (PE) 102 4.6.2. CoSQ Efficiency (CE) 103 4.6.3. Effectiveness (E Eff ) 104 4.6.4. Efficiency vs. Effectiveness Comparison 105 4.7. Thailand and USA Experimental Results Comparison 107 4.7.1. Geert Hofstede’s Model Analysis 107 4.7.2. Impact of Cultural Differences 111 4.7.3. The Experimental Comparison 112 4.8. Conclusion 114 4.8.1. Total Development Costs 115 4.8.2. Production Costs 116 4.8.3. Appraisal Costs 117 4.8.4. Rework Costs 118 4.8.5. Product Quality 118 vi Chapter 5. Defect Type Analysis 120 5.1. Defect Type Results from Defects Data 121 5.1.1. Requirement Defects 121 5.1.2. Design/Code Defects 122 5.2. Defect Type Results from Post-survey 123 5.2.1. Requirement Defects 123 5.2.2. Design/Code Defects 124 5.3. Conclusion 125 Chapter 6. Risk-Based Decision Framework 126 6.1. Critical Decision Factors 126 6.2. Risk-based Decision Framework 132 6.2.1. Pair Development Risks 135 6.2.2. Inspection Risks 137 6.3. Example of Risk-based Decision 138 Chapter 7. Conclusion 144 Chapter 8. Future Work 148 8.1. Replicate Experiment in Industry Environment 148 8.2. The Experiment to Compare the Impact of Each Technique in the Maintenance Phase 148 8.3. Simulation of Pair Development Model 148 8.4. Extend COCOMO II to Support Pair Development 149 8.5. 
Automatic Decision Tool 150 References 151 Appendices 160 Appendix A: Defect Type Definition 160 Requirement Defect Type 160 Design/Code Defect Type 160 Appendix B: Post-Questionnaire 164 vii List of Tables Table 1: Activities Typically Included in Different Types of Peer Review 3 Table 2: Peer Review Objective 4 Table 3: Function of Fagan’s Inspection Roles 11 Table 4: Objective of Fagan’s Inspection Phases [Fagan, 1976] 13 Table 5: Experience Reports from Fagan Inspection 16 Table 6: Error Removal Cost and Effectiveness 18 Table 7: Strengths and Weaknesses of Inspection 18 Table 8: Experience Reports from Pair Programming 23 Table 9: Inspection, Pair Programming and Pair Development Activities 27 Table 10: Inspection, Pair Programming and Pair Development Objectives 27 Table 11: Activities of Cost of Software Quality 30 Table 12: Hofstede’s index scores and ranks for countries from 34 Table 13: Details of the Experiments 40 Table 14: Thai Undergraduate Experiment Schedule 45 Table 15: Thai Graduate Experiment Schedule 46 Table 16: Team’s Average GPA and Experiences 49 Table 17: Directed Research Graduate Experiment Schedule 50 Table 18: Software Engineering Graduate Experiment Schedule 53 Table 19: List of Real-Client Project in Software Engineering Course 54 Table 20: Research Hypotheses 59 Table 21: Results of Total Development Cost from Undergraduate Experiment 64 viii Table 22: Results of CoSQ from Undergraduate Experiment 66 Table 23: Results of Costs Distribution from Undergraduate Experiment 67 Table 24: Results of Number of Problems Found by TA from Undergraduate Experiment 70 Table 25: Results of Product Quality from Undergraduate Experiment 72 Table 26: Results of Project Score from Undergraduate Experiment 73 Table 27: TDC and CoSQ from Graduate Experiment 75 Table 28: Results of Project Score from Graduate Experiment 78 Table 29: Results of TDC and CoSQ from the Industrial Industry Experiment 79 Table 30: Number of Problems found during UAT 83 Table 31: Results of TDC from DR Graduate Classroom Experiment 85 Table 32: Development Costs per Phase from DR Graduate Experiment 87 Table 33: Results of Development Cost per Phase from DR Graduate Experiment 88 Table 34: TDC and CoSQ from DR Graduate Classroom Experiment 90 Table 35: Results of CoSQ from DR Graduate Classroom Experiment 91 Table 36: Results of Un-passed Test Cases from DR Graduate Classroom Experiment 92 Table 37: TDC and CoSQ from SE Graduate Classroom Experiment 94 Table 38: Results of TDC and CoSQ from SE Graduate Classroom Experiment 95 Table 39: Results of LCO and LCA packages from SE Graduate Classroom Experiment 96 Table 40: Country Index Score Comparison 110 Table 41: CoSQ Efficiency Comparison 114 Table 42: Results of Requirement Defects Data Analysis 121 Table 43: Results of Design/ Code Defects Data Analysis 122 ix Table 44: Summary of Critical Decision Factors 127 Table 45: Pair Development Risk Rating for Industry Experiment Project 142 Table 46: Inspection Risk Rating for Industry Experiment Project 143 Table 47: Summary of Quantitative Results 145 x List of Figures Figure 1: Relative Cost of Software Fault Propagation 3 Figure 2: Research Approach 9 Figure 3: Inspection Process 12 Figure 4:Fagan’s Study of Coding Productivity 15 Figure 5: Software Development Spending Profiles 20 Figure 6: Error Generation Rate Effects 21 Figure 7: Pair Development Process 26 Figure 8: Model of Cost of Software Quality 28 Figure 9: Development Cost 29 Figure 10: Cost of Software Quality Balance by Quality Level 31 
Figure 11: Research Framework 39 Figure 12: Total Development Cost from Undergraduate Experiment 64 Figure 13: Distribution of Cost as a Percentage Development from Undergraduate Experiment 68 Figure 14: Box Plot of Project Score from Undergraduate Experiment 74 Figure 15: Distribution of Cost as a Percentage of Development from Graduate Experiment 76 Figure 16: Distribution of Cost as a Percentage of Development from 1 st Industry Experiment 81 Figure 17: Total Development Cost from DR Graduate Classroom Experiment 84 Figure 18: Development Cost by Phase 86 Figure 19: Effect of the calendar time 88 Figure 20: Production Cost Profile from SE Graduate Classroom Experiments 98 Figure 21: Cumulative Production Cost from SE Graduate Classroom Experiments 99 xi Figure 22: Appraisal Cost Profile from SE Graduate Classroom Experiment 99 Figure 23: Cumulative Appraisal Costs from SE Graduate Classroom Experiment 100 Figure 24: Rework Costs Profile from SE Graduate Classroom Experiment 101 Figure 25: Cumulative Rework Costs from SE Graduate Classroom Experiment 101 Figure 26: Production Efficiency vs. Effectiveness Comparison 106 Figure 27: CoSQ Efficiency vs. Effectiveness Comparison 107 Figure 28: Total Development Cost Comparison 115 Figure 29: Production Cost Comparison 116 Figure 30: Appraisal Costs Comparison 117 Figure 31: Rework Costs Comparison 118 Figure 32: Product Quality Comparison 119 Figure 33: Requirements Defect Type Analysis from Survey Data 123 Figure 34: Design/ Code Defects Type Analysis from Survey Data 125 Figure 35: Dimensions Affecting Verification Technique Selection 129 Figure 36: Software Market Value-Utility Function [Huang, 2006] 130 Figure 37: Example of Pair Development Scenario 131 Figure 38: Example of Inspection Scenario 131 Figure 39: Risk-Based Decision Framework 133 Figure 40: Industry Experiment Project's Profile 140 Figure 41: Software Development Spending Profiles with PD 149 xii Abstract Peer review is one of the essential activities in software quality assurance since peer reviews can detect and remove defects in the early stages of the software development life cycle. Removing defects early reduces the cost of defect rework later. Selecting a peer review methodology (e.g., inspection, walkthrough, checklist- based, defect-based, function-based, perspective-based, usage-based, value-based) to execute in a software project is difficult. The developers have to understand the commonalities and differences of each methodology. They need to know the relative strengths and weaknesses of these practices. However, very few studies have compared the commonalities and differences of each peer review methodology and none of the studies have shown an empirical comparison between pair programming and software inspection. Software inspection and pair programming are effective verification techniques. Software inspection is one of the best practices in traditional software development while pair programming is one of the best practices in agile development. Numerous studies have shown the success of software inspection in large-scale software development over the past three decades. 
Although Pair Programming (PP) is a newer approach and less structured, it has had a strong impact on the success of agile software development projects over the past five years This dissertation aims to identify the critical factors that impact the cost- effectiveness of either pair programming/development or inspection and provide the decision framework to help the developers select the most effective technique under xiii given conditions. To compare both techniques, four classroom experiments and one industry experiment were conducted in Thailand and US. The development effort and effect of quality were investigated with some additional calendar time comparisons. 1 Chapter 1. Introduction Quality is free. It's not a gift. What costs money are the un-quality things -- all the actions that involve not doing jobs right the first time. . .Quality is not only free, it is an honest-to-everything profit maker. -- Philip Crosby – 1.1. Motivation Software nonperformance and failure are expensive. The National Institute of Standards and Technology (NIST) reported that while the total sales of U.S. software was approximately $180 billion in 2000, low quality software costs the U.S. economy one third of total sales per year [Research Triangle Institute, 2002]. Reducing the cost of software development and improving software quality are main objectives of the successful software development project. However, in the software industry where time and budget are tight, development teams are more focused on reducing the cost of development. Assuring software quality can be easily neglected since such assurance is usually equated with extra cost. Removing software defects in the early stages of the software development life cycle not only improves software quality but also reduces the development cost and shortens the development life cycle. The costs of fixing defects are lower when the defects are removed as soon as they are injected. According to [Dabney, 2003], as in Figure 1, the cost of fixing a defect from requirement phase can cost 5 times more in design phase, 10 times more in code phase, or 50 times more in test phase and as high as 368 times more after product delivery (in operation phase). For 2 example, if a defect due to conflict among requirements is detected and corrected during the planning or requirements phases, its correction is relatively simple and inexpensive. The teams may simply re-negotiate with key stakeholders. If the same kind of defect is detected at the coding phase, the teams not only re-negotiate but they also must redesign and redevelop that part of the system or even the whole system. Peer review is one of the essential activities in software quality assurance to remove defects at the early stages of development. Peer review can be performed to verify almost all software artifacts in every phase of the software development life cycle such as requirement, design, code and test. According to [Fagan, 1976], software inspection, the most formal structure of peer review, can remove as much as 80% of total defects. The effectiveness of peer review depends on the formality of its process. Peer review methodology varies from formal processes such as software inspection, formal review and walkthrough to less formal processes such as pair programming, peer desk check, buddy check, and ad-hoc review [Wiegers, 2001]. 
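To make the escalation figures cited above from [Dabney, 2003] concrete, the short sketch below turns them into a back-of-the-envelope rework estimate. It is illustrative only: the `RELATIVE_FIX_COST` table and the `rework_cost` function are hypothetical names, and the multipliers are the ones quoted in the text for a defect injected during the requirements phase (1x if fixed in requirements, 5x in design, 10x in code, 50x in test, 368x in operation).

```python
# Illustrative only: relative cost of fixing a requirements-phase defect,
# using the multipliers cited from [Dabney, 2003] in the text above.
RELATIVE_FIX_COST = {
    "requirements": 1,
    "design": 5,
    "code": 10,
    "test": 50,
    "operation": 368,  # after product delivery
}

def rework_cost(base_fix_hours, phase_found):
    """Estimated hours to fix a requirements defect found in `phase_found`,
    given the hours it would have cost to fix during requirements."""
    return base_fix_hours * RELATIVE_FIX_COST[phase_found]

# A defect that would take 2 hours to fix during the requirements phase:
for phase in RELATIVE_FIX_COST:
    print(f"{phase:>12}: {rework_cost(2.0, phase):7.1f} hours")
```

Even rough multipliers of this size make the economic case for removing defects in, or close to, the phase in which they are injected.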
Selecting a peer review methodology (e.g., inspection, walkthrough, checklist-based, defect-based, function based, perspective-based, usage-based, value- based) [Aurum, 2002], [Lee, 2005], [Shull, 2002] to execute in a software project can be difficult. The developers must understand the commonalities and the differences of each method and their relative strengths and weaknesses. They also need to understand the cost effectiveness of each methodology and balance the development cost and product quality to avoid excess overhead work in producing quality product. 3 Requirements Design Code Test Integration Requirements Design Code Test Integration Operation 368 64 37 7 3 130 26 13 3 1 50 10 5 1 10 2 1 5 1 1 0 50 100 150 200 250 300 350 400 Phase Defect Introduced Phase Repaired Relative Cost of Software Fault Propagation Figure 1: Relative Cost of Software Fault Propagation Table 1: Activities Typically Included in Different Types of Peer Review Activity Review Type Planning Preparation Meeting Correction Verification Inspection Yes Yes Yes Yes Yes Team Review Yes Yes Yes Yes No Walkthrough Yes No Yes Yes No Pair Programming Yes No Continuous Yes Yes Peer Deskcheck, Passaround No Yes Possibly Yes No Ad-Hoc Review No No Yes Yes No Very few studies have compared the commonalities and differences of each peer review methodology. In [Wheeler, 1996] and [Wiegers, 2001], the different 4 perspectives of peer reviews are shown. In [Fagan, 1976], [Fagan, 1986], and [Gilb, 1993], the comparisons between software inspection and walkthrough are shown. Wiegers [Wiegers, 2001] compared the activities and objectives of inspection, team review and walkthrough in his book, as shown in Table 1 and Table 2. However, to the best of our knowledge, none of the studies have shown an empirical comparison between pair programming and software inspection. Table 2: Peer Review Objective Review Objective Inspection Team Review Walkthrough Pair Programming Peer Deskcheck Passaround Find product Defects X X X X X X Check conformance to specifications X X X X Check conformance to standards X X X Verify product completeness and correctness X X Assess understandability and maintainability X X X X Demonstrate quality of critical or high-risk components X Collect data for process improvement X X Measure document quality X Educate other team members about the product X X X X X X Reach consensus on an approach X X X Ensure that changes or bug fixes were made correctly X X X X Explore alternative approaches X X Simulate execution of a program X Minimize review cost X 5 Both software inspection and pair programming are effective verification techniques. Software inspection is one of the practices in traditional software development while pair programming is one of the practices in agile development. Numerous studies have shown the success of software inspection in large-scale software development over the past three decades [Ackerman, 1989], [Dion, 1993], [Kelly, 1992], [Myers, 1988], [Russell, 1991], [Weller, 1993]. Since software inspection requires discipline and structure, the cost of achieving quality product for smaller and less critical software may be too high. Although Pair Programming (PP) is a newer approach and less structured, it has had a strong impact on the success of agile software development projects over the past five years [Cockburn, 2000], [Nagappan, 2003], [Nawrocki, 2001], [Mcdowell, 2002], [Muller, 2001], [Succi, 2002], [Williams, 2000,2002,2003]. 
Many studies have reported success in delivering a quality product within a limited time frame, and agile practitioners credit PP as a major contributor to the success of agile projects. The motivation behind this study is to investigate the cost-effectiveness of pair programming and to evaluate its ability to perform not only in agile development but also in plan-driven development. The study also evaluates how well pair development, used as a peer review technique, can increase the agility of plan-driven development.

1.2. Problem Statement

Although in many ways dissimilar, both practices have the common aim of supporting the development of quality software, with minimal defects, through structured collaboration among developers / reviewers. ... We were very interested in investigating whether pair programming succeeds in its goals of providing the same or improved benefits as inspections, with what cost, and in general whether the two practices were complementary and under what circumstances they each made the most sense. [CEBASE, 2003]

The primary objective of this research is to study the commonalities and differences between software inspection and PP. We want to understand the advantages and disadvantages of each practice in different environments. The goal is to create a decision framework that helps a developer or project manager choose the most cost-effective practice under specific circumstances. To achieve this objective, three controlled experiments were conducted in Thailand and two in the US: one Thai undergraduate experiment, one Thai graduate experiment, one Thai industry experiment, and two US graduate experiments. The studies were analyzed in both qualitative and quantitative terms to answer the following questions:

o What are the differences in the level of product quality between Fagan's inspection and pair development?
o What are the differences in the cost of producing a high-quality product between Fagan's inspection and pair development?
o What factors influence the success of Fagan's inspection or pair development?
o Under what circumstances does Fagan's inspection perform more cost-effectively?
o Under what circumstances does pair development perform more cost-effectively?
o On what types of quality does Fagan's inspection have a positive effect?
o On what types of quality does pair development have a positive effect?
o What types of defects is Fagan's inspection better at detecting?
o What types of defects is pair development better at detecting?
o What are the risks of using Fagan's inspection or pair development?
o Can the results be used to create a decision framework that lets a developer or project manager make these decisions dynamically?

1.3. Research Approach

Figure 2 illustrates the research approach used to answer these questions. It started with a review of prior studies of both software inspection and pair programming; the results of this step are described in Chapter 2. To understand the differences between the two approaches, their commonalities and differences were compared across previous empirical studies. Because pair programming is a relatively informal practice, the pair development process and the data collection instruments used in the study were designed in the next step. The pair development process was first developed in the Fall of 2003, starting from a tailoring of the Collaborative Software Process (CSP) by Laurie A. Williams [Williams, 2000].
The process covered requirement, design, coding and 8 test phases. However, it also included following pair development with inspection / peer review in the requirement phase. In the Fall of 2003, the pair development process was used by 50 directed research students to maintain the existing USC CodeCount and develop the USC CodeCount for new languages [CSSE-USC, 2006]. The data collection and feedback were collected for process improvement. The improved process was used later by 50 directed research students in the Spring of 2004 and 5 students from the CS577a agile development team in the Fall of 2004. Next, the control experiment was designed and conducted. The pilot experiment was conducted in the Fall of 2004. Ten directed research students participated to develop the “Quality Management Information System (QMIS)”. The objective of this pilot study was to improve the experiment design. To assist the data collection during the experiment, the defect tracking system, QMIS, was developed parallel with experimental design steps. To test the hypotheses, five sets of experiments were conducted in both an academic environment and an industry environment to find the commonalities and differences between software inspection and pair development. Then, the data were analyzed in both qualitative and quantitative aspects. Lastly, the decision framework to facilitate the developer when selecting which peer review technique to be executed was developed. 9 Figure 2: Research Approach 10 Chapter 2. Background In this chapter, the researches related to the study are discussed. Section 2.1 gives the background of inspection and its empirical studies. Section 2.2 provides pair programming background and its previous works. Section 2.3 explains the cost of software quality, which is the key technique to compare cost-effectiveness between inspection and pair development. Section 2.4 discusses the model of cultural differences. 2.1. Software Inspection Software inspection is the most structured process for detecting and removing defects in software artifacts. Software inspection is considered the best practice in software static verification. Described by Fagan in [Wheeler, 1996], “The two main objectives of inspection are (1) to find and fix all defects in the product and (2) to find and fix all defects in the development process that cause product defects.” Most researchers agree that software inspection is the most effective approach of peer review. Although a great deal of “buddy checking” of software artifacts has been done as early as the 1940’s, the formalization of “software inspections” began in the 1970s. Software inspection was originally developed by Michael Fagan at IBM in the early 1970s and called Fagan’s Inspection [Fagan, 1976]. Fagan defined four significant roles in Fagan’s inspection: moderator, author, reader and tester. Each role has a different function. The moderator, who leads the inspection, requires 11 special training and inspection experience to conduct an effective inspection meeting. As suggested by Fagan, the moderator should not be part of the development team. Instead, he/she should come from another similar project. The author is the owner of the artifact being inspected. He/she verifies the understanding of the artifacts and confirms the validity of the reader’s or tester’s defects. The reader paraphrases and interprets the artifact based on his/her understanding. The tester considers testability, traceability and interface of the artifacts. 
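The role constraints just described can be expressed as a small validity check. The sketch below is hypothetical (the function name and team data are invented); it only encodes what the text states: the four roles exist, the moderator should come from another, similar project rather than the development team, and the author, as owner of the artifact, takes no other role.

```python
# Hypothetical sketch of the Fagan inspection role constraints described above.
ROLES = ("moderator", "author", "reader", "tester")

def validate_inspection_team(assignment, dev_team):
    """Return a list of violations for a role -> person assignment."""
    problems = []
    missing = [r for r in ROLES if r not in assignment]
    if missing:
        problems.append(f"unassigned roles: {missing}")
    if assignment.get("moderator") in dev_team:
        problems.append("moderator should come from another, similar project")
    author = assignment.get("author")
    extra = [r for r in ROLES if r != "author" and assignment.get(r) == author]
    if extra:
        problems.append(f"author also holds role(s): {extra}")
    return problems

team = {"Ann", "Bob", "Chai"}
print(validate_inspection_team(
    {"moderator": "Ann", "author": "Bob", "reader": "Chai", "tester": "Chai"},
    dev_team=team))
# ['moderator should come from another, similar project']
```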
Table 3 describes the function of each role [Wheeler, 1996]. Table 3: Function of Fagan’s Inspection Roles Inspection Roles Function Moderator Chief planner, meeting manager, and reporter Author Developer of work product. May not assume any other role. Reader Present work product Tester Examine for test issues According to Fagan [Fagan, 1976], there are six well-defined phases in the inspection process: planning, overview, preparation, inspection, rework and follow- up. Figure 3 illustrates Fagan’s inspection process. The inspection process starts with the planning phase where the moderator checks the quality of the artifact being inspected against the entry criteria. He/she selects the inspection team and determines the role for each team inspector. The time and place for the overview and the inspection meeting are scheduled in the planning phases. Next, the overview meeting is held to obtain a shared understanding among inspectors. After the overview, each inspector individually reviews and studies the artifacts based on their 12 role. Possible defects are revealed and prepared during individual preparation for discussion at the inspection meeting. Figure 3: Inspection Process During the inspection meeting, all inspectors participate in defect discovery. All potential defects are collected but not corrected during this phase. 13 Recommended by Fagan, inspection meeting should not exceed two hours since the inspector is getting tired after two hours. The inspection may be not as effective as the first two hours. However, if there is not enough time to discuss, the moderator can extend an extra hour called the third hour meeting. Next is the reworking phase where the author fixes the defects and informs the moderator when all the defects are fixed. Lastly, the follow-up is the phase where the moderator reviews corrected defects with the author to ensure that all identified defects are corrected and no new defects are introduced. Table 4 shows the objective of each phase in Fagan’s inspection. Table 4: Objective of Fagan’s Inspection Phases [Fagan, 1976] Phase Objectives Planning Materials to be inspected must meet inspection entry criteria. Arrange the availability of the right participants Arrange suitable meeting place and time. Overview Group education of participants in what is to be inspected. Assign inspection roles to participants. Preparation Participants learn the material and prepare to fulfill their assigned roles. Inspection Find defects. (Solution hunting and discussion of design alternatives is discouraged.) Rework The author reworks all defects. Follow-Up Verification by the inspection moderator or the entire inspection team to assure that all fixes are effective and that no secondary defects have been introduced. 2.1.1. Previous Research Over the last three decades, there have been numerous studies showing the success and benefits of Fagan’s inspection. In 1976, Fagan first reported the benefits of Fagan’s inspection in improving product quality and reducing development cost 14 [Fagan, 1976]. His study showed that by doing design and code inspection, coding productivity is increased 23 percent. Only in the coding phase, the net savings including inspection and rework time in programmer hours per 100 Non- Commentary Source Statements (K.NCSS) were 94 for design inspection, 51 for code inspection and –20 for inspection after unit test. Consequently, the inspection after unit test is no longer in effect. 
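As a quick illustration of what those per-K.NCSS savings rates mean in absolute terms, the hypothetical sketch below applies them to a component of a given size. The module size and function name are invented, and K.NCSS is taken here to mean 1,000 non-commentary source statements; the point is the order of magnitude, not a precise estimate.

```python
# Hypothetical arithmetic using the coding-phase net-savings rates reported by
# Fagan (1976), in programmer-hours per K.NCSS (assumed: 1,000 non-commentary
# source statements). A negative rate means a net cost rather than a saving.
NET_SAVINGS_PER_KNCSS = {
    "design inspection": 94,
    "code inspection": 51,
    "inspection after unit test": -20,
}

def net_savings_hours(ncss, practices=("design inspection", "code inspection")):
    """Net programmer-hours saved for a component of `ncss` source statements."""
    return sum(NET_SAVINGS_PER_KNCSS[p] for p in practices) * ncss / 1000.0

# e.g. a 2,000-NCSS component with design and code inspections:
print(f"{net_savings_hours(2000):.0f} programmer-hours saved")  # 290
```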
Time for error rework in programmer hours per K.NCSS found in this study was 24 hours for design error and 12 hours for code errors. For quality, the results showed that there is a 38% reduction in defects detected after unit test. Figure 4 depicts the coding productivity from the Fagan study. 15 Figure 4:Fagan’s Study of Coding Productivity 16 Table 5: Experience Reports from Fagan Inspection Company Product Type of Work Products Inspected Reported Results Reference American Telephone & Telegraph (AT&T) Telecom- municati- ons systems Requirements, design, code, test plans Inspections largely attributed to 14% increase in productivity and tenfold increase in quality. 20 times more effective than testing. Fowler 1986 Empirical research shows that functionality and modules with errors per inspection different from the process mean have the highest fault densities. Graden 1986 Hewlett- Packard (HP) N/A Designs, code, test plans, documentation Audit reveals unhealthy inspection process. Systemic problem are discussed Shirey 1992 Code .2 defects detected per hour, 80% of defects unlikely to be detected by other tests Blakely 1991 Bell- Northern Research (BNR) Telecom m- unications systems Code 1 defect detected per man-hour, 150 LOC/hour ideal inspection rate, inspection 20 times more efficient than testing. Russell 1991 Bull HN Information Systems Operating System Requirements, design, code, test plans and cases, documentation 4 person inspection team twice as efficient and effective than 3-person team. Weller 1993 International Business Machines (IBM) Operation System Design and code 23% increase in coding productivity, 38% reduction in defects detected after unit test, 1.1 hours per defect detected. Fagan 1976 ICL Operating System Design Design inspections revealed 40 – 50% of all recorded defects 1.2 – 1.6 hours per defect detected by inspection compared to 8.47 per defect detected by execution testing. Ketchenham 1986 Jet Propulsion Laboratory (JPL) Space systems Requirements, design, code, test plans .5 hours to fix defected defects vs. 5 – 17 using other techniques. Kelly 1992 17 Table 5: Continued Company Product Type of Work Products Inspected Reported Results Reference MEL N/A Design, code 8:1 return on investment, 75 inspections resulted in estimated 7000 hours of time saved. Reeve 1991 Shell Research Geophysi cal software Requirements 1 defect detected per 3 minutes, 30:1 return on investment calculated for inspections. Doolan 1992 The study from Fagan has shown that inspection can identify 60 - 90 percent of all software defects. The studies in [Ackerman, 1989], [Kelly, 1992], [Myers, 1988], [Weller, 1993] also presented positive results for inspection over large-scale software development. In [Dion, 1993] and [Russell, 1991], the studies showed the success of software inspection in reducing the cost of software development by decreasing the amount of rework. Table 5 shows the examples of experience reports from software inspection [Wheeler, 1996]. Boehm [Boehm, 1981] reported the summary of removal cost and effectiveness of code inspection as shown in Table 6. Boehm [Boehm, 1981] also suggested the type of defects which inspection can be able to detect well and the type of defects, which are the weakness of inspection. Table 7 shows the strength and weaknesses of inspection [Boehm, 1981]. 
18 Table 6: Error Removal Cost and Effectiveness Source Effort Error Removal Jones, 1977 10-48 DSI/MH 70 % Myers, 1978 10 DSI/MH 38 % Boehm, 1980 20 DSI/MH 89 % Project A 120 DSI/MH 30 DSI/MH 41 % 64 % Crossman 1977 25 DSI/MH 50-60 % Note: DSI/MH: Delivered Source Instructions per Man-Hour In addition to improving product quality and reducing cost of development, inspection also gives indirect benefits when it is implemented correctly [Gilb, 1993]. Inspection facilitates the manager by identify problems early. The manager understands the problems and the payoff for dealing with those problems. Inspection gives early warning of impending problems, which can reduce the shock when things go wrong during the integration test. Moreover, the organization and people also benefit from inspection. Inspection can increase team synergy. During an inspection meeting, the team members are also educated by learning from mistakes. Table 7: Strengths and Weaknesses of Inspection Strengths Weaknesses Simple programming blunders, logic errors Developer blind spots Interface errors Missing portions Specification errors Numerical approximations Program dynamics errors 19 2.1.2. Inspection Cost The Cost of performing inspection has been studied and reported [Boehm, 1987], [Fagan 1986]. The studies showed that there is slightly extra effort during design and coding phase, while significantly reducing the effort of software development during testing and integration. To be able to demonstrate the effect of performing inspection on effort and schedule, Madachy developed a system dynamics model of an inspection-based process [Madachy, 1994]. He integrated the previous system dynamics modeling [Madachy 1996] with the knowledge-based method. By providing expert knowledge, the knowledge-based technique assisted the model to identify the risks and calculate the cost and schedule of a project. He also used the simulation technique to display the effects of inspection practices on cost, schedule and quality throughout the software development lifecycle. Quality of the project refers to the absence of defects during the development lifecycle. The results of the model facilitated the project manager for project planning by showing the effects of changed processes and management policies. The major result of the model is a dynamic comparison of manpower utilization with and without inspections. The results demonstrate the effort during design, coding and testing phase, which corresponds to Fagan’s results. As in Figure 5, inspection increases approximately 10% extra effort in the design and coding phases but reduces testing and integrating effort by about 50%. 20 Figure 5: Software Development Spending Profiles Another result from the model is error generation rates analysis. The effect of varying error generation rates aids the project manager to decide when to use inspection. For example, from Figure 6, we can see that if the defect density of inspection work product is low (lower than 20 defects/KLOC) the inspection take more effort. In this case, the inspection may not be cost-effective. 21 3000 3500 4000 4500 5000 5500 0 20 40 60 80 100 Defects/KLOC Total Effort (Person-days) Without Inspection With Inspection Figure 6: Error Generation Rate Effects 2.2. Pair Programming (PP) Pair programming (PP) is another effective peer review technique. PP is one of the practice areas in Extreme Programming (XP) methodology to improve the quality of the system. XP people consider pair programming as a continuous review. 
As defined by Laurie Williams: Pair programming is a style of programming in which two programmers work side-by-side at one computer, continuously collaborating on the same design, algorithm, code, or test. One of the pair, called the driver, types at the computer or writes down a design. The other partner, called the navigator, has many jobs. One is to observe the work of the driver looking for defects in the work of the driver. The navigator has a much more objective point of view and is the strategic, long-range thinker. Additionally, the driver and the navigator can brainstorm on-demand at any time. An effective pair programming relationship is very active. The driver and the navigator communicate, if only through utterances, at least every 45 to 60 seconds. Periodically, it’s also very important to switch roles between the driver and the navigator. [Williams, 2002] 22 Although PP is a new approach, there are anecdotes about using pair programming since 1953 [Williams, 2003. Fred Brooks, author of “The Mythical Man Month”, claimed that he used pair programming to produce 1500 lines of defect-free code in 1953 to 1956. Dick Gabriel, who conceived of Common Lisp and introduced the concepts of software patterns and software pattern languages, reported the practice of pair programming in the 1970s at the M.I.T. Artificial Intelligence Laboratory and the use of pair programming in 1984 to implement Common Lisp in nine months. In the early 1980s, Larry Constantine reported the observation of “Dynamic Duos” at Whitesmiths Ltd.,. Constantine stated that code was produced faster and with fewer defects than ever. The pair benefit from the thinking of two bright minds and the steady dialog between two trusted programmers [Constantine, 1995]. In 1995, James Copline, one of the most influential people in the software patterns movement, published the “Developing in Pair” Organization Pattern. This research was based on the findings of 50 highly effective software development organizations [Copline, 1995]. A number of empirical data shows that pair programming improves the quality of the product, reduces the time spent in the development life cycle, and increases the enjoyment of developers [Ackerman, 1989], [Dion, 1993], [Kelly, 1992], [Myers, 1988], [Russell, 1991], [Weller, 1993]. Research in Pair Programming was started in 1998 by Nosek [Nosek, 1998]. Nosek compared pair programming with solo programming. His study reported that the pairs produce better quality code than the individual with less elapsed time. The pair programmers spent 43% more total time on the task than the solo programmers. However, the 23 pairs completed the task in 29% less time than the solos. The average completion time of the pairs was 30 minutes while the average completion time of the solos was 42 minutes. Table 8: Experience Reports from Pair Programming Study Effort Schedule Defect Rate Satisfaction Baseline for Comparison Length (Hrs.) Nosek 1998 + 43% - 29% N/A High Experiment vs. non-pairs 0.75 Arisholm 2002 + 96% - 2% -30% N/A Experiment vs. non-pairs 1 Nawrocki 2001 + 82% - 9% N/A N/A Experiment vs. non-pairs 1 – 4 Rostaher 2002 + 98% - 1% N/A High Experiment: mixed PP-test 6 Williams 2000 + 15% - 43% -60% High Experiment vs. non-pairs > 10 Baheti 2002 + 2% - 49% N/A N/A Experiment: collocated 12 – 16 Baheti 2002 + 14% - 43% N/A N/A Experiment vs. non-pairs 12 – 16 Ciolkowski 2002 + 9% - 46% N/A High Experiment vs. 
non-pairs 14 24 In 1999, Williams conducted a classroom experiment to investigate the benefit of pair programming [Williams, 2002, 2003]. Her study reported similar results to Nosek that the pairs produce the better quality code than the individual with less elapsed time. The results showed that the pair spent approximately 15% more working hours to complete their assignments. However, the final artifacts of the pair had 15% fewer defects than artifacts done by an individual. Cockburn and Williams [Cockburn, 2000] also reported that more than 90% of the developers enjoyed the work and were more confident in their work because of pair programming. Table 8 shows the summary of experience reports from PP in small modules [Boehm, 2003]. The four experiences at the bottom of Table 8 show the relatively large schedule reduction with little extra effort. However, the four experiences at the top of Table 8 show relatively little improvement due to PP. This difference is because PP needs the pair-jelling period. For the pairs to work effectively, they need at least 10 hours to synchronize. After the pair already jell, PP can be effective as inspection to remove about 60 percent of the defects with 10 to 15 percent of extra effort [Boehm, 2003]. However, because the risk of getting products to market is high, reducing 45% of the schedule by using PP is valuable for the product. In [Wernick, 2004], the source suggests that pair programming practices might successfully be applied in a traditional software development process. Boehm and Turner [Boehm, 2003] also recommended using PP to balance the agility and the discipline of the development process. Thus, our study adopted pair programming as a peer review process for a traditional (plan-driven) software development process 25 which we call “Pair Development”. We used conventional software development process such as “Rational Unified Process (RUP)”, and iterative waterfall as a model of traditional software development for this study [Kruchten, 2003]. Pair programming was executed in the development of almost every artifact during the development life cycle such as project plan, vision document, system requirement definition, system architecture definition, source code, test plan and test cases. Pair development was not applied for eliciting requirements. Pair development, in this study, is composed of three phases as shown in Figure 7. The first phase is the preparation. A pair studies the entry criteria and goes through the checklist that they will use later during pair execution to find defects. For this phase, it is not required to work in pairs. The second phase is the planning. The pair discusses how the problem can be solved and what communication protocol(s) they are using during pair execution. They also list the task(s) that they want to achieve during pair execution and create the plan when roles will be switched. The purpose of this phase is to plan for the execution task. Next, the pair executes the task. A driver controls the keyboard or the mouse to produce the artifacts. A navigator actively examines the driver’s work while identifying the defects, which are recorded in the defects recording log. The pair brainstorms during their collaboration. During pair execution, the pair switches roles according to the plan. The driver becomes the navigator and the navigator becomes the driver. After the execution, the pair reviews the artifacts against the exit criteria to assure the quality of the product. 
In this phase, the pair can request the development team to inspect or review their artifacts to increase the level of quality. After each phase, the pair records its effort in a time recording log. For process improvement purposes, developers can use the defect list to create or update the defect checklist for the next pair session.

Figure 7: Pair Development Process

By defining the pair development process, the processes of pair programming and inspection become more directly comparable, as illustrated in Table 9 and Table 10. Pair development adds the planning phase to increase understanding of the entry criteria, such as the requirement specification and design specification, before pair work actually starts. It also provides data collection instruments, such as the time recording log and the defect recording log, for process and product measurement.

Table 9: Inspection, Pair Programming and Pair Development Activities

Review Type         Planning   Preparation   Meeting      Correction   Verification
Inspection          Yes        Yes           Yes          Yes          Yes
Pair Programming    Yes        No            Continuous   Yes          Yes
Pair Development    Yes        Yes           Continuous   Yes          Yes

Table 10: Inspection, Pair Programming and Pair Development Objectives
(Review Objective / Inspection / Pair Programming / Pair Development)

Find product defects X X X
Check conformance to specifications X X
Check conformance to standards X X
Verify product completeness and correctness X X X
Assess understandability and maintainability X X X
Demonstrate quality of critical or high-risk components X
Collect data for process improvement X X
Measure document quality X X
Educate other team members about the product X X X
Reach consensus on an approach X X
Ensure that changes or bug fixes were made correctly X X
Explore alternative approaches X X
Simulate execution of a program
Minimize review cost

2.3. Cost of Software Quality (CoSQ)

Cost of quality (CoQ), a cost analysis technique, is commonly used in manufacturing to present the economic trade-offs involved in delivering quality products. Basically, it is a framework for discussing how much good quality and poor quality cost. It is an accounting technique that has been adapted to software development; we call the adapted technique the Cost of Software Quality (CoSQ). In this study, we used CoSQ to evaluate the cost of executing software inspection versus the cost of executing pair development.

Figure 8: Model of Cost of Software Quality

By definition [Slaughter, 1998], CoSQ is a measure of the costs specifically associated with the non-achievement of software product quality, encompassing all requirements established by the company, its customer contracts, and society. Figure 8 presents the model of Cost of Software Quality (CoSQ) [Krasner, 1998]. Total Development Cost (TDC) is the sum of the cost the team spends on producing the system, called the production cost, and the cost the team spends on assuring the quality of the system, called CoSQ. CoSQ is composed of four categories of cost: prevention costs, appraisal costs, internal failure costs and external failure costs.

TDC = C_production + C_quality
C_quality = C_prevention + C_appraisal + C_rework

Figure 9: Development Cost

Prevention costs and appraisal costs are "conformance costs", amounts spent on conforming to requirements. Prevention costs are related to the effort needed to prevent defects before they happen. Examples of prevention costs are the cost of training, the cost of process improvement, data collection costs, and the cost of defining standards.
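The decomposition above, with TDC split into production cost and CoSQ, and CoSQ split into prevention, appraisal and rework (internal plus external failure) costs, can be captured in a few lines. The sketch below is hypothetical: the `cosq_summary` function and the log format are invented, and it simply aggregates logged developer effort into those buckets in the spirit of the time recording logs used in this study.

```python
# Hypothetical sketch: aggregate logged effort (hours) into CoSQ buckets, where
# TDC = production + CoSQ and CoSQ = prevention + appraisal + rework.
from collections import defaultdict

CATEGORIES = ("production", "prevention", "appraisal", "rework")

def cosq_summary(effort_log):
    """effort_log: iterable of (category, hours) pairs from a time recording log."""
    totals = defaultdict(float)
    for category, hours in effort_log:
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category}")
        totals[category] += hours
    cosq = totals["prevention"] + totals["appraisal"] + totals["rework"]
    tdc = totals["production"] + cosq
    return {"TDC": tdc, "CoSQ": cosq, "CoSQ share": cosq / tdc if tdc else 0.0}

log = [("production", 120.0), ("appraisal", 18.5),   # e.g. inspection meetings
       ("prevention", 6.0),   ("rework", 22.0)]      # e.g. fixing logged defects
print(cosq_summary(log))  # {'TDC': 166.5, 'CoSQ': 46.5, 'CoSQ share': 0.279...}
```

Distributions of this kind, expressed as a percentage of total development cost, are what the later experimental chapters use to compare the inspection and pair development teams.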
Appraisal costs include expenses associated with the cost of assuring conformance to quality standards. Examples of appraisal costs are inspection or peer review costs, auditing products, and costs of testing. Internal failure costs and external failure costs are “non-conformance costs” or “rework costs”. They include all expenses incurred when things go wrong. Internal failures occur before the product is delivered or shipped to the customer. In software development, internal failure costs can be the cost of rework defects, re- inspection, re-review, and re-testing. External failure arises after the product is delivered or shipped to the customer. External failure costs include post release technical support costs, maintenance costs, recall costs, and litigation costs. In our 30 study, only efforts of developers are taken into account in our CoSQ analysis. Table 11 shows examples of activities for each category [Krasner, 1998]. Table 11: Activities of Cost of Software Quality Conformance Costs Non-Conformance Costs Prevention Costs Appraisal Costs Internal Failure Costs External Failure Costs Prototyping User requirement reviews Quality planning Training Reuse library Process improvements Metrics collection and analysis Quality standards Inspection / peer review Testing Software quality assurance V&V activities Quality audits Field performance trails Fixing defects Corrective rework Re-inspection, re-review, re- testing Updating documents Integration Technical support for defect complaints Maintenance and release Upgrade due to defect Penalties Product returned / recalled due to defects Service agreements claims Warrantee work According to Gryna [Gryna, 1988], the cost of achieving quality and costs due to lack of quality have an inverse relationship as depicted in Figure 10 [Galin, 2004]. The increase of prevention costs and appraisal costs reduces the failure cost exponentially and vice versa. As a result, the level of quality also impacts by the changes of prevention costs and appraisal cost. This relationship yields a minimal total cost of software quality [Galin, 2004]. The experience reports showed that the software industry also has similar trend (more investment in achieving quality decreases total cost of quality) [Dion, 1993]. 31 Figure 10: Cost of Software Quality Balance by Quality Level 2.4. Cultural Differences Techniques that work well in one culture may not work well in another culture. Fagan’s Inspection and Pair Programming have had success in the US; success in Thai culture is uninvestigated. Consequently, it is important to consider the impact of cultural differences on the results of this study. Empirical studies have shown that cultural differences impact the adoption of a software process in each culture [Jirachiefpattana, 1996],[Phongpaibul, 2005],[Thanasankit, 1999, 2000]. In 1999, Mr. Theerasak Thanasankit addressed the problem of these cultural differences in requirements engineering process in Thailand [Thanasankit, 1999, 2000]. Theerasak’s study used a case study of a Thai software house to reconstruct the management process used in requirements engineering. His paper showed the 32 relationship between the Thai culture and the requirements engineering process, developed by Loucopolous and Karakostas. Thanasankit described that in Thailand the requirements engineering process apparently lacks iterations and order, compared to Loucopolous and Karakostas’s requirements engineering process. 
Moreover, Thai management of the requirements engineering processes is based on either the relationship with the clients or the power of each stakeholder. In [Phongpaibul, 2005], the impact of cultural differences between the United States and Thailand in the domain of applying software process models and improvements is described. Most current software process models or improvements are developed and provided by either the US or European standards committee such as the SEI at Carnegie Mellon University which provides the CMM and CMMI Level frameworks. These models are generally tailored for western cultures. The paper asserted that cultural differences are a key factor to problems of adopting software process models and improvement in Thai companies. In attempts to understand human behavior in different societies, the model of cultural differences has been extensively studied in social science. Edward T. Hall and Geert Hofstede are two prominent researchers whose models have been accepted by many researchers in this area. Hall [Hall, 1976] has introduced the way cultures discriminate by looking at language and time. He also distinguished cultures between spaces of individuals, named proxemics. However, this dimension is not useful in software development organization. Hofstede [Hofstede, 1997, 2001] studied the differences in thinking and social actions that exist amongst inhabitants of the countries studied. At first, Hofstede revealed four main cultural dimensions, 33 namely power distance, uncertainty avoidance, individualism and masculinity, when he conducted the research at IBM in 53 countries. Later he applied the cultural dimensions to unrelated IBM populations, and added long-term versus short-term orientation as the fifth dimension. Table 12 shows the results of Hofstede study [Hofstede, 2001]. 2.4.1. Edward T. Hall’s Model [Hall, 1976] 2.4.1.1.Monochronic (M-time) versus Polychronic (P-time) Time From Hall’s Model [1], Monochronic (M-time) and Polychronic (P-time) are the distinction of culture by time. According to Hall, “M-time is one-thing-at-a- time”. M-time people are committed to the job and take time commitments seriously. They are concerned about not disturbing others and dislike interruptions when they are working. “Many-things-at-a-time”, P-time members do not adhere to schedules and appointments. The appointment can be postponed easily since they can be involved with many people and many things at the same time. Hall’s study showed that Americans are mostly monochromic. French and Thai are polychronic. 2.4.1.2.High-Low Context Languages High-Low context languages refer to the amount of information conveyed during communication including voice, gestures and facial expression. "High context transactions feature pre-programmed information that is in the receiver and in the setting, with only minimal information in the transmitted message. Low 34 context transactions are the reverse. Most of the information must be in the transmitted message in order to make up for what is missing in the context.” [Hall, 1976] According to Hall, Thailand is one of the high context language countries. In [Hall, 1976], Hall showed an example of a difference between context in two countries, American and French. “American contracts are about 10 times longer than French contracts. Americans like to have a lot of content. The French do not care as much, as long as the context of the agreement is understood. The French are a high context culture; America is low context / high content. 
Table 12: Hofstede’s index scores and ranks for countries from Country Power Distance Uncertainty Avoidance Individua- lism Masculinity Long term Orientation Index Rank Index Rank Index Rank Index Rank Index Rank US 40 38 46 43 91 1 62 15 29 27 Thailand 64 21- 23 64 30 20 39- 41 34 44 56 8 China n/a n/a n/a n/a n/a n/a n/a n/a 118 1 Japan 54 33 92 7 46 22- 23 95 1 80 4 India 77 10- 11 40 45 48 21 56 20- 21 61 7 Canada 39 39 48 41- 42 80 4-5 52 24 23 30 Australia 36 41 51 37 90 2 61 16 31 22- 24 35 2.4.2. Geert Hofstede’s Model [Hofstede, 2001] 2.4.2.1.Power Distance Power distance is the degree of inequality in prestige, wealth and power. Different societies tend to weight these areas differently. Power distance can determine the degree of dominance between boss and subordinate. Hofstede describes the concept of power distance in [Hofstede, 2001] as “the relationship between a boss B and a subordinate S. Power distance is a measure of the interpersonal power or influence between B and S as perceived by the less powerful of the two, S”. In high power distance society, subordinates are dominated and depend on their bosses. Leaders are expected to be authoritative and subordinates are expected to be told what to do. In [Hofstede, 2001], Hofstede says that the Thai culture is one with high power distance, which tells us that the power distance gap between bosses and subordinates is large. In Thailand, the bosses dominate subordinates and subordinates respect the bosses meaning the subordinates will generally not contradict the bosses directly. In US culture, which has a lower power distance, the bosses and subordinates relationship tends to be more equal. The bosses respect the subordinate’s opinions and subordinates do not hesitate to disagree with the decisions of their bosses. 36 2.4.2.2.Uncertainty Avoidance Uncertainty Avoidance is the degree to which people in societies tolerate uncertain situations. It is not risk avoidance but rather, how one deals with ambiguity. In the high uncertainty avoidance societies, members of societies are likely to fear ambiguous situations and try to avoid them by establishing formal rules. Hofstede states in his book that the different societies have adapted to uncertainty in different ways [Hofstede, 2001]. Ways to cope with uncertain futures belong to technologies, rules and religion. For example, many religions believe in life after death, which helps the believers to face uncertainties in their life. From [Jirachiefpattana, 1996], Thailand has a high uncertainty avoidance society. As a result, Thai people accept technologies and its abilities more easily than most other developing countries to prevent ambiguous environments such as political and economic disruption. Furthermore, Thai employees tend to be loyal to their employer and have a long duration of employment to avoid uncertainties in the work place. 2.4.2.3.Individualism and Collectivism Hofstede labeled individualism as a third dimension of his model. Individualism describes the relationship between the individual and the cohesive in- group. The societies with strong individualism believe that an individual is important. Members in this society are self-oriented. They make decisions based on their achievements and respect their private lives. In contrast, collectivism relies on a group structure and decisions are made as a group. Hiring and promotion are not 37 based on individual skills. 
As studied by Hofstede [Hofstede, 1997, 2001], US is the first individualism country in contrast to Thailand, which is much more collective. 2.4.2.4.Masculinity and Femininity Masculinity and femininity indicate the extent to which a culture favors dominance, assertiveness, achievement and acquisition of wealth versus a culture that favors people, social supports and quality of life. Masculine culture tends to distinguish expectations of male and female roles in society. On the contrary, the roles of male and female are equal in feminine culture. According to [Hofstede, 2001], the US is ranked fifteenth out of 53 countries in masculine countries and Thailand is ranked tenth in feminine countries. 2.4.2.5.Long-term Orientation The last culture dimension from Hofstede’s model is called long-term orientation. Long-term oriented behavior refers to the people whose values are oriented towards long-term commitments. People in this society are planning for the future. If they work in a company, they tend to expect a long-term reward. Table 12 shows that Thailand is a long-term oriented country contrary to the US, which is a short-term oriented country. 38 Chapter 3. Research Methodology In this chapter, the research methodology and research design are discussed. Section 3.1 explains the subjects of each experiment. Section 3.2 illustrates the experimental design. Section 3.4 discussed the data collection and analysis. Section 3.4 shows the hypotheses. To understand the differences between software inspection and pair development (PD), the control experiments were design and conducted. The research framework on pair programming as illustrated in Figure 11 was used as a framework to design the experiment [Gallis, 2003]. All of the experiments compared software development with Fagan’s inspection and PD. The Fagan’s inspection teams were the control group and the PD teams were the experimental group. For each experiment, the finding is different depending on the situation and environment of the participants. Table 13 summarizes the details of each experiment. All context variables were controlled for both control group and experiment group. 39 Figure 11: Research Framework 40 Table 13: Details of the Experiments Exp # Subject (Sample size) Size/ Complexity of Project / Length Application Technology / Development Process Dependent Variables E1 Thai undergraduate students (95 students; 8 inspection team: 9 pair development team) Small / Low / 12 Weeks Resource Access Control System - Java - JavaScript - File Management - Rational Unified Process (RUP) - Cost - Quality E2 Thai graduate students (35 students; 2 inspection team: 3 pair development team) Medium / Medium / 10 Weeks “Special Topic” Information Management System - Java - JavaScript - JSP - MySQL - Rational Unified Process (RUP) - Cost - Quality E3 Thai professional (9 developers; 1 inspection team: 1 pair development team) Medium / High/ 14 Weeks Business Application on Web Portal - Java - Java Servlet - MySQL - C# - Organization Process - Cost - Quality E4 USC Computer Science graduate students (56 students; 7 inspection team: 7 pair development team) Small/ Low/ 13 Weeks CodeCount for Visual Basic - C/C++ - Iterative Waterfall - Cost - Quality - Defect Type - Cultural Differences E5 USC Software Engineering graduate students (126 Students; 10 inspection team: 11 pair development team) Medium/ Medium/ 13 Weeks 21 different E-service application - LeanMBASE - Cost - Quality - Defect Type - Cultural Differences 41 3.1. 
Subjects Three sets of experiments were conducted in Thailand and two sets of experiments were conducted in USA. Thailand experiments were composed of one undergraduate classroom experiment, one graduate classroom experiment and one industry experiment. The undergraduate classroom experiment was conducted at Thammasat University (TU), Thailand, during the first semester of the 2005 academic year (June – September 2005). All the student participants were computer science majors. The Thai graduate classroom experiment was also conducted at TU at the same time as the undergraduate experiment. All of participants were master’s degree students in computer science. Fourteen students were full-time students and twenty-eight students were part-time students who work full time. All of the students have had industry experience in software development. Hence, the graduate students can be the representatives of the industry developers. The Thai industry experiment was conducted at a multi-international company located in Thailand. The experiment was conducted from December 2004 to April 2005. The post- experiment such as collecting defects or system failures after system delivery and post-interview was conducted from June 2005 to September 2005. The participants in both classroom experiments were 95 Thai undergraduate students taking a software engineering class and 35 Thai graduate students taking an advanced software engineering class. The experiment was part of a team project, which was a mandatory part of the class. The class included 3 hours of lecture per 42 week. In addition to the lectures, student subjects were trained in both inspection and pair development techniques. All of the students in the undergraduate experiment were university juniors who do not have industry experience. However, all of them have experience in developing software from previous computer science classes. To substitute for development experience, we provided an extensive weekly lab following the lecture. For example, if the instructor lectures about software design this week, there will be a software design lab in the same week. To assist the student during defect detection, design and code checklists were generated and used during inspection and pair development. The subjects in the third experiment were nine Thai developers from a multi- international company located in Thailand. Nine developers composed of one project manager, two designers and six programmers. The company business is creating a Thai web portal. Even though it is a multi-international organization, only 5% of the developers are foreigners. There are fifty developers in the IT department. The experiment was part of promoting the quality assurance program in the department. All the participants had positive attitudes about the experiment because they would like to establish the standard V&V process for their departments. Experiments in USA were two graduate classroom experiments at University of Southern California (USC). The experiments were conducted as part of a directed research course or a software engineering course at USC. Both experiments were conducted in Fall 2006 (August – December 2006). The participants from the directed research course were 56 graduate students in computer science. The 43 experiment was part of a team project, which was the main part of the course. The project’s objective was to provide additional features to an existing system. 
In the first month of the course, the students are required to participate in 2 hours weekly meeting where they learned how to use the existing system, how the system was designed and learned what new features they had to develop. In addition, the students were trained in how to perform their verification technique, either inspection or pair development. All students were informed about the experiment at the beginning of the course. For the experiment in the Software Engineering course, the participants were 126 graduate students in either computer science or software engineering majors. The experiment was part of a team based, real-client project. There were 21 different projects. A list of the projects is shown in Table 19. The primary sources of project were USC library information services division, USC bookstores, California Science Center and Center for System and Software Engineering’s affiliates. Most of the projects were web-based applications. For all of the classroom experiments, all students were informed about the experiment. To avoid bias, we clarified to the students that the objective of the experiment was not to explore which technique was better, but to understand the differences between both techniques. All students were informed that the number of defects found during the project and effort spent on developing the project, were not part of the grading criteria. We based the grading on quality of delivered product and compliance process. 44 3.2. Experiment Design 3.2.1. Undergraduate Experiment in Thailand Students were divided into 5-person team based on their GPA. Students were required to work in teams to develop “TU research resources access control system” as a class project. “TU research resources access control system” is the system that checked the authentication of the users who want to access the research resources (e.g. research article, research data) from the computer science department. It provides the basic capabilities such as logging-in and privilege checking for restricted areas. Moreover, the system protects from the hacker by spoofing the suspicious user. Teams used Rational Unified Process (RUP) [Kruchten, 2003] as a development process and used either inspection or pair development with checklists as the V&V technique. The experiment was conducted over a period of 12 weeks. The development life cycle was composed of 5 phases: 3-week inception, 4-week elaboration, 2-week construction #1, 2-week construction #2 and 1-week transition. At the end of each phase, students submitted the package for grading. Table 14 shows the experiment schedule and list of artifacts in each package. In addition to the project package, the students were required to submit the weekly report. The weekly report was composed of project planning, progress report, and individual effort report. There were a total of nineteen teams. Ten teams were randomly assigned to the pair development group and nine teams were randomly assigned to the inspection 45 group. One team from the pair development group and one team from the inspection group dropped in the middle of the semester due to the heavy workload of the class. During the experiment, one team from the pair development group and one team from the inspection group did not follow both RUP and assigned V&V technique. As a result, we dropped them from our study. 
At the end of the experiment, one team from the pair development group could not complete the system using pair development due to pair incompatibility among the members in the teams. We also dropped them from our analysis. Table 14: Thai Undergraduate Experiment Schedule Phase Schedule Package Due Artifacts Due Inception Jun 21 – Jul 11 Jul 12 Vision Document, System Requirement Definition, Inspection / Pair Development Report Elaboration Jul 12 – Aug 8 Aug 9 Vision Document, System Requirement Definition, System Architect Definition, Inspection / Pair Development Report Construction #1 Aug 9 – Aug 22 Aug 23 System Architect Definition, Source Code, Inspection / Pair Development Report Construction #2 Aug 23 – Sep 5 Sep 6 System Architect Definition, Source Code, Test cases, Inspection / Pair Development Report Transition Sep 6 – Sep 14 Sep 15 Final Delivery 3.2.2. Graduate Experiment in Thailand Thirty-five Students were divided into 7-person team based on their undergraduate GPA and work experience. The full-time students and part-time students were not allowed to be on the same team. Students were required to work in teams to develop “TU Special Topic management system” as a class project. 46 “Special topic” is the name of the senior project for undergraduate students in TU. The system is a web-based application to manage the information related to the special topic such as student information and professor information. The main capability of the system is automatic scheduling. After professors and students enter their available schedule, the system allocates the presentation time. The system also provides the grade calculation and grade report. Another capability is the calculation of workload and wage of the professor who served as project advisors or on project committees. Table 15: Thai Graduate Experiment Schedule Phase Schedule Package Due Artifacts Due Inception Jun 19 – Jul 9 Jul 10 Vision Document, System Requirement Definition, Inspection / Pair Development Report, Quality Plan Elaboration Jul 10 – Aug 6 Aug 9 Vision Document, System Requirement Definition, System Architect Definition, Inspection / Pair Development Report, Quality Plan Construction Aug 7 – Aug 28 Aug 28 System Architect Definition, Source Code, Inspection / Pair Development Report, Quality Plan Like the undergraduate experiment, the teams used Rational Unified Process (RUP) [Kruchten, 2003] as a development process and used either inspection or pair development with RUP checklists as the V&V technique. Since the semester for the graduate class is shorter than the undergraduate class, the experiment was conducted over a period of 10 weeks. The development life cycle was composed of 3 phases: 3-week inception, 4-week elaboration and 3-week construction. At the end of each 47 phase, students submitted the package for grading. Table 15 shows the experiment schedule and list of artifacts in each package. In addition to project package, the students were required to submit a weekly report. The weekly report was composed of project planning, a progress report, and an individual effort report. Since most of the students were part time students, they only came to school twice a week and on weekends. The experiment for the whole system was not possible to conduct due to time constraints. We asked students to prioritize their requirements and set up their quality plan. We did not require students to complete the system for this class but one high priority use case had to be fully developed. 
The inspection or PD was required during the design for all of the high and medium priority use cases and during coding for the high priority use cases. 3.2.3. Industry Experiment in Thailand Eight developers were divided into 2 teams and one project manager managed both teams. The pre-tests were distributed prior to the experiment to help team formation. Both teams used the company process standard as a development process and used either inspection or pair development with organization’s checklists as the V&V technique. The pre-test was composed of questions about application domain, design notation, data base design, mySQL, Java, and Java Servlet. The project manager selected two similar projects (considering the domain, technology, and size of project) for the study. Both teams developed different but similar web applications. Both applications were composed of registration, web service, and 48 SMS modules. The details of the applications and development process cannot be described in this thesis due to a confidential agreement. 3.2.4. Directed Research Graduate Experiment in USA 56 graduate students were divided into 4-person team. To avoid schedule clashes, we allowed students to set up their own team with other students who had compatible schedules. Otherwise, by randomizing teams, this increases the probability of teams having schedule mismatches. With a schedule clash, teams would not be able to meet and thus no work would get done. There were a total of 14 teams. Seven teams were randomly assigned to pair development group (PD group) and seven teams were randomly assigned to inspection group (FI group). After validating the data, we dropped five teams from our experiment due to three main reasons: invalid data, outlier data or the team violated academic integrity. At the beginning of semester, the students are required to fill out a background survey to measure their experiences. Table 16 summarizes the average GPA and experiences from the teams. The average GPA from the pair development group is 3.37 and the average GPA from the inspection group is 3.45. The average level of experience is measured by the number of years the students have been working in the industry. The average industry experience in the pair development group is 0.85 years and the average industry experience in the inspection group is 0.81 years. There is one team from pair development group that had the lowest GPA of the teams and another team from inspection group that had 49 the highest GPA but have no industry experience. We initially thought these teams would be outlier data points. However, since these two teams did not perform either the best or the worst in the experiment, we did not drop them from the experiment. In addition, the experiments required C and C++ knowledge. The average level of C and C++ knowledge is measured by the familiarity of the language. The students rated themselves based on a scale from 1 (never heard of it) to 10 (expert). All of the students rated themselves from 7 to 9, thus all students had a similar background in C/C++. We can conclude that there is no difference between the level of knowledge and experiences between two groups. 
Table 16: Team’s Average GPA and Experiences Team # Average GPA Average years of experience Average level of C and C++ knowledge Pair Development Group P1 3.19 0.5 7.25 P2 3.3 0.75 7.25 P3 3.5 0.25 7.75 P4 3.44 1.25 8.25 P5 3.43 1.5 7.25 Inspection Group I1 3.31 0.5 7.5 I2 3.44 1.25 7.5 I3 3.44 1.5 7.5 I4 3.6 0 7.75 P-Value 0.4172 0.923 0.967 Students were required to work in teams to develop “CodeCount for Visual Basic (VB CodeCount)”. VB CodeCount is the new CodeCount tool, which will be added to the USC CodeCount toolset. The USC CodeCount toolset is a collection of tools designed to automate the collection of source code sizing information [CSSE- 50 USC, 2006]. The USC CodeCount toolset spans multiple programming languages such as Ada, JavaScript, C and C++, Perl, and SQL. It provides the information on two possible Source Lines of Code (SLOC): physical SLOC and logical SLOC. USC CodeCount Toolset is the tool that our center provides to the affiliates from the aerospace and defense industries to use in their projects. Table 17: Directed Research Graduate Experiment Schedule Phase Schedule Major Activities Meeting / Training Aug 23 – Sep 12 Meeting, team formation, training on either inspection or pair development, training on CodeCount suite Requirement Sep 13 – Sep 26 Identify requirement, develop share vision, develop use case specification, plan project, verify major document Design Sep 27 – Oct 10 Define VB physical and logical SLOC definitions, VB keyword list, design the system, verify major document Implementation Oct 11 - Nov 14 Implementation, code verification, unit test Testing Nov 15 – Dec 13 System test, test case generation, verify test cases Delivery Dec 13 Final Delivery UAT Dec 14 – Dec 21 User Acceptance Test (UAT) The physical SLOC is programming language syntax independent, which enables it to collect other useful information such as comments, blank lines and overall size, all independent of information content. The logical SLOC definition depends on the programming language. The logical SLOC definition is compatible with the SEI’s Code Counting Standard [Humphrey, 1995]. In our experiment, the teams were using SEI’s Code Counting Standard as the template to develop the 51 physical and logical SLOC definitions. The students were required to develop the VB CodeCount in the C/C++ language and to follow the USC CodeCount’s architecture. To avoid the threat of validity due to the knowledge of development process, all teams were required to follow the course schedule. Table 17 shows the experiment schedule and major activities in each phase. However, there were some teams that deviated from the plan since they had to go back and rework their artifacts in the previous phase. The experiment was conducted over a period of 13 weeks (exclude training and UAT phases). Development life cycle composed of 4 phases: 2-week requirement, 2-week design and 5-week implementation (breaking into 2 iterations) and 4-week testing. Every other week the teams were required to meet with the instructor to track their progress. At the end of each phase, the teams were required to meet with the instructor to review the major artifacts in each phase. If there are defects in the artifacts, the teams were required to fix the defects before they could enter the next phase. After the deliverable, the instructor generated the test cases for UAT phase. The final products from every team were tested with these test cases and the results were recorded to compare the level of quality. 3.2.5. 
Software Engineering Graduate Student Experiment in USA The student participants were composed of 126 graduate students who were taking a software engineering course. The Software Engineering course at USC is 2- 52 semester, team based, real-client project course. The experiment was conducted during the first semester of the course. The participants were divided into 6-persons team to develop the project using our process guideline called Lean Model Based [System] Architecting and Software Engineering (LeanMBASE) [Boehm, 1998a], [Boehm, 1998b]. LeanMBASE is a light-weight version of MBASE which is a set of guidelines that describe software engineering’s best practices for the creation of software development project. MBASE integrates 4 major development models (product development model, process model, property model and success model) to avoid model clashes during the software production [Boehm 1998]. Both MBASE and LeanMBASE are highly compatible with Rational Unified Process (RUP) in term of anchor point milestones, phases and activities between the milestones. In the first semester of the course, there were 2 major milestones: Life Cycle Objective (LCO) and Life Cycle Architecture (LCA). At the end of each milestone, all the teams needed to present their project in the review board, which is called ARB to the instructors, TAs and clients. After the ARB, the teams made change on the project’s artifacts before submitting the milestone’s package. Table 18 shows the schedule of the experiment. 53 Table 18: Software Engineering Graduate Experiment Schedule Phase / Milestone Schedule Major Activities / Artifacts Inception Sep 6 – Oct 15 Define operational concept Define top-level system objectives and scope Exercise key usage scenarios Define top-level functions, interfaces, quality attribute levels Identify top-level definition of at least one feasible architecture Identify life cycle process model LCO ARB Oct 16 – Oct 20 LCO Package Oct 23 OCD, SSRD, SSAD, LCP, FRD, Prototype, Quality Report Elaboration Oct 23 – Nov 26 Elaborate of system objectives and scope by increment Elaborate of operational concept by increment Exercise range of usage scenarios Resolve major outstanding risks Elaborate of functions, interfaces, quality attribute levels by increment Stakeholders’ concurrence on their priority concerns Identify choice of architecture and elaboration by increment LCA ARB Nov 27 – Dec 1 LCA Package Dec 4 OCD, SSRD, SSAD, LCP, FRD, Prototype, Quality Report Both MBASE and LeanMBASE contain 5 major artifacts: Operational Concept Definition (OCD), System and Software Requirements Definition (SSRD), System and Software Architecture Definition (SSAD), Life Cycle Plan (LCP), and Feasibility Rationale Description (FRD). To model the system, students were required to use UML (unified model language) diagram. In our experiments, the 54 teams were required to verify OCD, SSAD and UML diagrams using either pair development or Fagan’s inspection. 
Table 19: List of Real-Client Project in Software Engineering Course Team # Project Name Type of Application 1 California Science Center Newsletter System Web Application 2 California Science Center Event RSVP System Web Application 3 California Science Center Volunteer Tracking System Web Application 4 Credit Card Theft Monitoring Project Business Application 5 Value-based Software Engineering (VBSE) Game Game 6 USC Diploma Order/ Tracking Database System Web Application 7 USC Civic and Community Relations (CCR) Web Application Web Application 8 Student’s Academic Progress Web Application Web Application 9 Personal Care Technology Help Line Web Application 10 Video Uploading and Conversion System Tool 11 New Economics for Woman (NEW) Web Application 12 Eclipse COCOMO II Tool 13 Web Portal for USC Electronic Resources Web Application 14 Early Medieval East Asian Tombs Business Application 15 LANI Database Management System Web Application 16 USC CONIPMO Tool 17 UAV Sensor Planning Web Application 18 Electronic Data Discovery Tool 19 An Eclipse Plug-in for Use Case Authoring Tool 20 Online Requirements negotiation Support System Web Application 21 African Millennium Foundation Web Application There were a total of 21 teams. Each team selected the project to develop based on their interest. List of the projects is showed in Table 19. Eleven teams were randomly assigned to pair development group (PD group) and ten teams were randomly assigned to inspection group (FI group). However, three teams were not 55 included in our experiments since their projects were COTS-based project, which was using the different set of guidelines. After validating the data, we dropped eight teams from our experiment due to two main reasons: invalid data or outlier data. 3.3. Data Collection and Data Analysis The data were collected and analyzed in both qualitative and quantitative methods. The quantitative data are analyzed with descriptive analysis and statistical tests. Since the population is small, the student t-test was used to investigate the hypotheses [Siegel, 1988]. The significance value of rejecting the hypotheses is 0.05 for all tests. We developed inspection data sheets and pair development data sheets for data collection for quantitative analysis. Inspection data sheets are composed of a planning record, an individual preparation log, a defects list, a defect detail report and an inspection summary report. Pair development data sheets are composed of a planning record, a time recording log, an individual defects list and a pair development summary report. Besides the data sheet, the teams from all experiments were required to submit progress report, an individual effort report, and project planning every week. Project planning contained planed and actual effort. For validation purposes, the data was checked for consistency. For all of the classroom experiments, after the first package submission, all teams were required to meet with the instructor to discuss the V&V technique that they were performing. All questions and concerns were raised and solved. To test that the students followed the process, questions regarding project, RUP and V&V 56 techniques were part of either the midterm exam or the pre-lecture quiz. After the experiment, a post survey was distributed to students, instructor and TA for qualitative analysis. Follow-up interviews were arranged with selected students, TA and instructor for the detailed information. 
For the industrial experiments in Thailand, there was a weekly meeting to discuss the product and process issues. The meeting began with the developers reporting the progress of the previous week. Then, the project manager updated the project plan and the assigned tasks of the current week. Previous issues and concerns were tracked and new issues and concerns were raised and assigned. For qualitative data, the participant observation, post-survey and post- interview were used as data collection techniques. The post-survey was conducted for all experiments after the development to better understand the behavior of the developer. For classroom experiments, the survey was distributed after the project presentation. The students needed to submit it before they left the presentation room. For the first industry experiments, the survey was distributed after the project signed off and was collected the same day. The survey primarily consisted of close-ended questions because such questions provide the same frame of reference when analyzing the data. We provide the option for the subjects to specify their own answer if the answer was not available in the list of choices provided. Semi-structured interviews were used as an interview technique. According to [Caver, 2004], both techniques are suitable for eliciting information from small groups. The set of open-ended questions and data gathering forms were prepared before the interview sessions. The researcher asked the same set of questions during 57 all the interviews. The set of questions was related to the familiarity with inspection or pair development process, the behavior of the teams during the experiments and the experiences after the experiments. The interviewees had the freedom to answer all the questions based on their experiences. To avoid bias during interviews, the researcher clarified that the objective of the interviews were not to evaluate the capability of the member in the team, but to understand the differences between both practices. The objective of post-survey and follow-up interview is to help researcher understand the opinion of the participants toward the verification technique, which they were using. The following are the example of questions in the post-survey. What are the problems that you found when followed either inspection or pair development process? In your opinion after this experiment, what are the strengths and weaknesses of the verification technique that you were assigned? In your opinion after this experiment, in which environment the assigned technique has the advantages of performing as a verification technique and in which environment the assigned technique has the disadvantages of performing as a verification technique, why? The qualitative data were analyzed in the following steps. Step 1 Data Preparation: All the data were read and re-read for understanding. The researcher checks the quality of data and reviews the data for the correctness and completeness. The data were sorted and summarized into the easiest way to translate. 58 Step 2 Coding: The data were re-read and the important data were assigned abbreviated codes. The codes were represented with which each part of the data are associated [Hewitt, 2001]. The codes were categorized into each category by question as a list or tree structure. Step 3 Analysis: We used the “Constant Comparative Method” (CCM) to analyze the data. CCM is originally developed for use in the grounded theory methodology [Glaser, 1967]. 
It is the strategy, which takes one piece of data and compares with all others that may be similar or different in order to develop the relationship between various pieces of data. In our study, this technique is useful in order to compare the data among the same experiment and the data between the different experiments. 3.4. Hypotheses As stated in the previous section, the quantitative study is focused on the difference between PD and inspection in areas of the cost of process and the effects of quality. Total Development Cost (TDC) and component of Cost of Software Quality (CoSQ) are used for comparing the cost of process. The effects of quality is determined by the number of problems found by TA, number of un-passed test cases, number of incomplete requirements, and total project score. The lists of hypotheses are shown in Table 20 59 Table 20: Research Hypotheses Hypotheses E1 E2 E3 E4 E5 Cost H1: There is no difference in Total Development Cost (TDC) in man-hours between pair development group and software development with Fagan’s inspection group. * * * * * H2: There is no difference in internal rework costs between pair development group and software development with Fagan’s inspection. * * * * * H3: There is no difference in appraisal costs between pair development group and software development with Fagan’s inspection. * * * * * H4: There is no difference in production costs between pair development group and software development with Fagan’s inspection. * * * * * Hypotheses E1 E2 E3 E4 E5 Quality H5: There is no difference in number of major problems found by TA between pair development group and software development with Fagan’s inspection. * H6: There is no difference in number of minor problems found by TA between pair development group and software development with Fagan’s inspection. * H7: There is no difference in number of total problems found by TA between pair development group and software development with Fagan’s inspection. * H8: There is no difference in number of un-passed test cases between pair development group and software development with Fagan’s inspection. * * * * H9: There is no difference in number of problem found by customer after product delivered between pair development group and software development with Fagan’s inspection. * * H10: There is no difference in number incomplete requirements between pair development group and software development with Fagan’s inspection. * H11: There is no difference in project score between pair development group and software development with Fagan’s inspection. * * * Note: * indicates that the hypotheses have been tested 60 3.5. Validity threats and control 3.5.1. Non-equality of Team Experiences Since the subjects from inspection group and pair development group are not the same population, we needed to make sure that the subjects from both groups have the same level of knowledge and experiences. For classroom experiments in Thailand, we used the background survey to divide the students based on their GPA and programming experiences. For industrial experiment in Thailand, we used the pre-test score to set up the project team. For graduate experiment in directed research course at USC, we could not use the background survey to assist in the team formation since most of the students have different class schedules. We allowed the students to set up their own team with other students who had compatible schedules. 
However, the data on the students from both groups indicated that they had essentially the same level of C and C++ knowledge. For the team projects, students only needed to know C and C++ to design and implement the system. From our data, the level of GPA and industry experience did not have an effect on team success.

3.5.2. Non-representativeness of Subjects

The participants in the classroom experiments are not representative of people who work in the software industry: the undergraduate students had no industry experience, most of the graduate students had at most two years of industry experience, and none of them had experience in software inspection, peer review or pair programming. To mitigate this threat, we provided extensive training in the practices and the software verification techniques (either Fagan's inspection or pair development) prior to the start of the experiment.

3.5.3. Non-representativeness of Team Size

The team sizes in the projects are smaller than in typical industry projects, and four-person teams are not representative of the projects where most US software effort is spent: about 60% of US software effort goes to projects with over 50 people. However, about 60% of US projects consist of fewer than 10 people [Boehm, 2003], so small teams do represent a large share of projects.

3.5.4. Non-representativeness of the Size of Project

The size and complexity of the projects are small compared to those of industry projects. As mentioned above, the USC CodeCount Toolset is a sizing tool that many industry developers, including the defense industry, use in their projects. Since this project needed a quick development time, it is representative of the three-to-four-month rapid development projects found in industry.

Chapter 4. Quantitative Results

In this chapter, the results of the experiments are discussed. Section 4.1 gives the results from the undergraduate experiment and the hypothesis test results. Section 4.2 explains the results from the graduate experiment. The first industry experiment results are discussed in Section 4.3. The results of the experiments conducted at USC are discussed in Sections 4.4 and 4.5. Since the samples in the graduate experiment and the first industry experiment are small, hypothesis testing cannot be applied; we only correlate those results with the undergraduate results. The results of the second industry experiment are not shown in this section because the experiment is still in progress.

4.1. Undergraduate Experiment Results

In this section, the hypotheses and their results for the undergraduate experiment are discussed. Since the number of subjects (7 teams in the pair development group and 7 teams in the inspection group) is small, the Student t-test was used to analyze the difference between the means of the pair development group and the inspection group [Siegel, 1988]. The 5% significance level was used in hypothesis testing.

4.1.1. Total Development Cost

H1: There is no difference in Total Development Cost (TDC) in man-hours between pair development group and software development with Fagan's inspection group.

Total Development Cost (TDC) is the total number of man-hours all team members spent on developing the project. Each week, students reported their effort in man-hours. In addition to the individual effort reports, the project manager of each team was required to submit a weekly project plan containing planned and actual data. Both the project plan and the individual effort reports were checked for consistency.
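As an illustration of this bookkeeping, the sketch below sums weekly individual effort logs into a team's TDC and flags weeks where the members' reported hours drift from the project manager's actuals. The field layout, sample values and tolerance are illustrative assumptions, not the actual report format used in the experiments.

```python
# Minimal sketch of the TDC bookkeeping described above (illustrative data,
# not the actual effort-report format used in the experiments).

weekly_individual_logs = {            # week -> man-hours reported by each member
    1: [6.5, 7.0, 5.5, 6.0, 7.5],
    2: [8.0, 6.5, 7.0, 6.0, 8.5],
}
weekly_plan_actuals = {1: 32.5, 2: 34.0}   # actual man-hours from the PM's weekly plan

def total_development_cost(logs):
    """TDC = sum of all members' reported man-hours over all weeks."""
    return sum(sum(hours) for hours in logs.values())

def consistency_issues(logs, plan_actuals, tolerance=1.0):
    """Weeks where the individual logs and the PM's actuals disagree."""
    issues = []
    for week, hours in logs.items():
        reported = sum(hours)
        actual = plan_actuals.get(week)
        if actual is None or abs(reported - actual) > tolerance:
            issues.append((week, reported, actual))
    return issues

print(total_development_cost(weekly_individual_logs))                   # 68.5
print(consistency_issues(weekly_individual_logs, weekly_plan_actuals))  # [(2, 36.0, 34.0)]
```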
Figure 12 shows the TDC for the seven pair development teams and the seven inspection teams; the y-axis is TDC in man-hours. The highest TDC in the pair development group is 650.27 man-hours and the highest TDC in the inspection group is 769.70 man-hours. The lowest TDC in the pair development group is 433.29 man-hours and the lowest TDC in the inspection group is 657.13 man-hours. The average TDC is 526.73 man-hours for the pair development group and 695.11 man-hours for the inspection group. It is clear that every team in the pair development group spent less effort developing the project than every team in the inspection group. The reason the pair development teams' TDC is lower than the inspection teams' TDC is investigated in the CoSQ analysis.

Figure 12: Total Development Cost from Undergraduate Experiment

Table 21 shows the mean, standard deviation, standard error and p-value of TDC for both groups. The TDC mean of the pair development group is 168.37 man-hours lower than the mean of the inspection group. The p-value between the two groups is 0.0004, so we can reject the hypothesis that there is no difference between the two groups in TDC.

Table 21: Results of Total Development Cost from Undergraduate Experiment
Total Development Cost (man-hours)   Mean     Std. Dev.   Std. Error   P-Value
Pair Development (PD) Group          526.73   72.09       33.53        0.0004
Fagan's Inspection (FI) Group        695.11   51.74

In this experiment, since the team sizes were equal, the effect on calendar time is the same as the effect on effort. The results show that the pair development group spent 24% less effort than the inspection group, which implies that the pair development group also required 24% less calendar time. Hence, pair development offers the option of reducing calendar time.
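The hypothesis tests reported in this chapter can be reproduced directly from the summary statistics in the tables. As a minimal sketch (not the original analysis script), the following recomputes the two-sample Student t-test behind Table 21 from the reported means, standard deviations, and group size of seven teams, using SciPy.

```python
# Minimal sketch: reproduce the equal-variance, two-tailed Student t-test
# behind Table 21 from its reported summary statistics (not the original
# analysis script used for the dissertation).
from scipy import stats

t, p = stats.ttest_ind_from_stats(
    mean1=526.73, std1=72.09, nobs1=7,   # pair development group TDC
    mean2=695.11, std2=51.74, nobs2=7,   # Fagan's inspection group TDC
    equal_var=True,
)

print(round(t, 2), round(p, 4))
# t is about -5.02 and p about 0.0003, consistent with the 0.0004 reported
# in Table 21; the small difference comes from rounding of the summary values.
```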
4.1.2. Cost of Software Quality (CoSQ)

H2: There is no difference in rework costs between pair development group and software development with Fagan's inspection group.
H3: There is no difference in appraisal costs between pair development group and software development with Fagan's inspection group.
H4: There is no difference in production costs between pair development group and software development with Fagan's inspection group.

In this study, we use CoSQ as the measurement to compare the cost of software development with inspection against pair development. Cost of software quality represents the distribution of the actual cost across each category (see the CoSQ discussion in Chapter 2) needed to produce a quality product. For example, a high internal failure cost implies that the development process is ineffective or the product is of poor quality, since a large amount of rework wastes effort and signals an unacceptable software product.

From Table 22, we can see that the teams in both the pair development group and the inspection group spent approximately the same amount on production costs (around 300 – 350 man-hours). However, the appraisal costs and internal failure costs of the pair development group tended to be much lower than those of the inspection group. As a result, the inspection teams took more effort to develop the system. The mean, standard deviation, standard error and p-value of the internal failure costs, appraisal costs and production costs are shown in Table 23.

As indicated, the pair development group on average spent significantly less effort on rework (internal failure costs) and review (appraisal costs). On average, the pair development group spent 207.16 man-hours less than the inspection group on fixing the product and 35.69 man-hours less on review effort. However, there is only a small difference between the production costs of the two groups.

Table 22: Results of CoSQ from Undergraduate Experiment (man-hours)
Team #   TDC      Production Costs   Appraisal Costs   Rework Costs   Prevention Costs
Pair Development Group
P1       650.27   437.92             104.63            1.92           105.8
P2       515.58   292.08             115.00            2.00           106.5
P3       515.39   269.58             136.80            7.76           101.25
P4       589.57   303.87             154.20            26.50          105
P5       507.50   330.35             64.65             3.50           109
P6       475.56   296.01             78.30             4.50           96.75
P7       433.29   268.31             60.93             10.00          94.05
Inspection Group
I1       621.60   207.60             269.00            36.00          109
I2       749.00   312.50             282.00            47.50          107
I3       680.50   272.93             249.07            56.50          102
I4       677.60   301.80             223.00            42.80          110
I5       769.70   445.95             172.50            52.25          99
I6       710.24   325.72             232.78            37.49          114.25
I7       657.13   298.13             216.50            33.50          109

The results show significant differences in internal failure costs and appraisal costs between the pair development group and the inspection group, but no significant difference in production costs. Because of the lower appraisal and internal failure costs, the TDC of the pair development teams is lower than the TDC of the inspection teams. The students claimed that these lower appraisal and internal failure costs are an advantage of pair development, because most of the defects were found while they were developing the artifacts. However, there are some situations in which students believe it is not necessary to work in pairs, for example when programming modules for which the low-level design already exists.

Table 23: Results of Cost Distribution from Undergraduate Experiment
Cost Category                Sample     Activities                                         Mean     Std. Dev.   Std. Error   P-Value
Rework Costs (man-hour)      PD Group   Rework (pair and individual)                       102.07   35.95       19.34        0.0001
                             FI Group   Rework                                             309.23   36.42
Appraisal Costs (man-hour)   PD Group   Continuous Review (Pair Navigation)                8.03     8.68        4.64         0.0001
                             FI Group   Inspection                                         43.72    8.70
Production Costs (man-hour)  PD Group   Pair Driving, Management, Individual Development   314.02   58.58       34.98        0.89
                             FI Group   Individual Development, Management                 309.23   71.65

The claim was also confirmed by the industry subjects. They agreed that pair development offers the advantage of reducing rework cost because of the early feedback from a partner. In situations where they were programming simple modules, or modules whose design was straightforward, they suggested that working alone may be more effective.

Next, we also analyze the distribution of cost as a percentage of development effort for the pair development groups and the inspection groups. Figure 13 shows that 50% to 65% of TDC for the pair development teams is production cost, which is the effort spent on pair driving, managing the project, meetings, or individual development of simple modules. In contrast, only 30% to 45% of TDC for the inspection teams is production cost. The inspection teams tend to allocate over 50% of TDC to the total cost of quality (the sum of the costs associated with appraisal, prevention, and rework from internal failures).
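The percentages plotted in Figure 13 are simply each CoSQ category's share of a team's TDC. As a minimal sketch, the following computes that breakdown for team P2 using its figures from Table 22.

```python
# Minimal sketch of the per-team cost distribution plotted in Figure 13,
# using team P2's CoSQ figures from Table 22 (man-hours).
p2 = {
    "production": 292.08,
    "appraisal":  115.00,
    "rework":       2.00,   # internal failure cost
    "prevention": 106.50,
}

tdc = sum(p2.values())                                    # 515.58, matching Table 22
shares = {k: round(100 * v / tdc, 1) for k, v in p2.items()}

print(tdc)      # 515.58
print(shares)   # {'production': 56.7, 'appraisal': 22.3, 'rework': 0.4, 'prevention': 20.7}
```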
Figure 13: Distribution of Cost as a Percentage of Development from Undergraduate Experiment (two stacked-bar panels, one per group, showing each team's production, prevention, appraisal, and internal failure/rework costs as a percentage of TDC)

4.1.3. Number of Problems Found by TA

H5: There is no difference in number of major problems found by TA between pair development group and software development with Fagan's inspection group.
H6: There is no difference in number of minor problems found by TA between pair development group and software development with Fagan's inspection group.
H7: There is no difference in number of total problems found by TA between pair development group and software development with Fagan's inspection group.

At the end of each phase, all teams were required to submit a package containing the artifacts listed in Table 14. For instance, at the end of the inception phase, the teams submitted an inception package composed of a vision document, a system requirement specification and a quality report. The package was graded by the TA, who recorded all the problems found. Table 24 reports the comparative average number of problems found by the TA for both groups. On average, the TA found 4.143 fewer problems in the pair development groups than in the inspection groups. The TA also reported that the pair development groups had, on average, 3.857 fewer major problems than the inspection groups. The Student t-test analysis shows statistically significant differences between the two groups in the number of total problems and major problems (p-value = 0.0434 for total problems and p-value = 0.0033 for major problems). However, the results indicate that we cannot reject the hypothesis that there is no difference between the two groups in the number of minor problems (p-value = 0.8224).

Students from the inspection teams said that the reason for the lower quality of their packages was that they did not have enough time to meet to inspect all the artifacts. An inspection meeting for 4 to 5 people was hard to set up due to schedule conflicts, and the meeting could not be held if the author was not available. This situation also holds in industry environments, especially in Thailand, where one person often works on more than one project at a time [Phongpaibul, 2005].

Table 24: Results of Number of Problems Found by TA from Undergraduate Experiment
Type of Problem   Sample                   Mean     Std. Dev.   Std. Error   P-Value
Major             Pair Development Group   4.714    1.8898      1.0594       0.0033
                  Inspection Group         8.571    2.0701
Minor             Pair Development Group   3.428    2.6367      1.2062       0.8224
                  Inspection Group         3.714    1.7994
Total             Pair Development Group   8.143    3.9339      1.8273       0.0434
                  Inspection Group         12.286   2.8102

Moreover, for the inspection group, the inspection meeting took more time than recommended by Fagan, even though the inspectors tried to find the defects during preparation. Many times the team got into arguments during the meeting, and the moderator could not stop the argument because the moderator was one of their classmates and did not have enough authority to make the team follow the process. The industry subjects also agreed with this observation; they preferred to have the project manager present during the inspection meeting.
The project manager has more power than the moderator to stop an argument and move the inspection forward. [Fagan, 1976] and [Gilb, 1993] suggest that during the inspection meeting the inspectors are not allowed to identify final solutions for the problems; the defects are corrected by the author after the inspection meeting. This suggestion is based on studies in the US, where people believe in individualism. Americans believe in the potential of the individual: the author can solve the problem without help from the team. This differs from Thai culture, which is collectivist and believes that the best solution comes from the team rather than from an individual. Furthermore, Fagan also recommended that the project manager should not participate in the inspection meeting in order to avoid superficial defects. But because Thai culture has a high power distance, not having a project manager at the inspection meeting caused the meeting to lack energy, and the inspectors did not prepare well for it. In addition, the manager plays an important role in keeping the meetings productive by motivating the team to focus on finding defects and by stopping arguments (if any).

4.1.4. Product Quality

H8: There is no difference in number of un-passed test cases between pair development group and software development with Fagan's inspection group.
H10: There is no difference in number of incomplete requirements between pair development group and software development with Fagan's inspection group.

At the end of the semester, all teams were required to give a presentation and demonstrate the product. The products were tested by the instructor and the TA. The test cases were prepared by the instructor before the presentation. No team knew what the test cases were, and no changes were allowed after the project due date; hence, the order of presentation did not affect the validity of the experiment.

Table 25: Results of Product Quality from Undergraduate Experiment
Number of                 Sample                   Mean    Std. Dev.   Std. Error   P-Value
Un-passed Test Cases      Pair Development Group   4.429   1.7182      0.9826       0.4779
                          Inspection Group         5.142   1.9518
Incomplete Requirements   Pair Development Group   1.286   0.951       0.5408       0.44
                          Inspection Group         0.857   1.069

The product quality was determined by the number of un-passed test cases and the number of incomplete requirements; a product with fewer un-passed test cases and incomplete requirements has better quality. Table 25 indicates that the pair development group has, on average, about one fewer un-passed test case than the inspection group. The same table shows that the pair development group has a slightly higher number of incomplete requirements than the inspection group. However, there is no evidence to reject the hypothesis that there is no difference between the two groups. This implies that, on average, both groups produced final products of the same quality.

4.1.5. Final Project Score

H11: There is no difference in project score between pair development group and software development with Fagan's inspection group.

The mean, standard deviation, standard error and significance value of the project score for both groups are shown in Table 26.

Table 26: Results of Project Score from Undergraduate Experiment
Project Score (Total Score 100)   Mean    Std. Dev.   Std. Error   P-Value
Pair Development Group            81.43   8.9974      3.4247       0.6948
Inspection Group                  80      7.0711
The average project score of the pair development group is slightly higher than that of the inspection group. However, there is no statistical significance to reject the hypothesis that there is no difference between the pair development group and the inspection group in project score. As illustrated in Figure 14, both the highest and the lowest project scores are in the pair development group. We do not include the group with the lowest project score in the analysis because that team could not complete the project due to pair incompatibility among its members. Pair compatibility is out of scope in this study; we will explore the effect of pair compatibility in the future.

Figure 14: Box Plot of Project Score from Undergraduate Experiment (project scores for the pair development group and the inspection group, with the incomplete pair development team shown as an outlier)

4.2. Graduate Experiment Results

The results from the graduate experiment are discussed in this section. Due to the small sample size (3 teams in the pair development group and 2 teams in the inspection group), no statistical test was applied to the results; we only use these results to observe how they correlate with the undergraduate results. Since all of the graduate students have had experience in the software industry, their results should be a better representation of the industry environment.

4.2.1. Total Development Cost and Cost of Software Quality

H1: There is no difference in Total Development Cost (TDC) in man-hours between pair development group and software development with Fagan's inspection group.
H2: There is no difference in rework costs in man-hours between pair development group and software development with Fagan's inspection group.
H3: There is no difference in appraisal costs in man-hours between pair development group and software development with Fagan's inspection group.
H4: There is no difference in production costs in man-hours between pair development group and software development with Fagan's inspection group.

The results from the graduate experiment, shown in Table 27, are similar to the undergraduate results. Both the pair development group and the inspection group spent approximately the same effort on production (about 190 man-hours). The highest production cost is 278 man-hours, from the inspection group, and the lowest production cost is 181 man-hours, from the pair development group.

Table 27: TDC and CoSQ from Graduate Experiment (man-hours)
Team #   TDC   Production Costs   Appraisal Costs   Rework Costs   Prevention Costs
Pair Development Group
P1       324   193                57                10             64
P2       364   186                98                16             64
P3       322   181                65                12             64
Inspection Group
I1       457   199                158               36             64
I2       508   218                172               54             64

Again, the production costs and appraisal costs of the pair development group are lower than those of the inspection group. The inspection group spent over 50% more effort on both review and rework than the pair development group. As a result, the TDC of every inspection team is higher than that of every pair development team. These results confirm the findings of the undergraduate experiment.

The distribution of cost as a percentage of development is depicted in Figure 15. The results are similar to the undergraduate experiment: the pair development group allocated about 50% of its effort to creating the product, while the inspection group allocated almost 50% of its effort to assuring the quality of the product.
Figure 15: Distribution of Cost as a Percentage of Development from Graduate Experiment (stacked bars for teams P1–P3 and I1–I2 showing production, appraisal, internal failure, and prevention costs as a percentage of TDC)

4.2.2. Product Quality

H8: There is no difference in number of un-passed test cases between pair development group and software development with Fagan's inspection group.

As in the undergraduate experiment, all the teams presented and demonstrated the final product at the end of the semester. The instructor tested the final products using test cases prepared before the presentation; the teams were not informed about the test cases and test scenarios before the presentation time. Surprisingly, the final products from all the teams passed all test cases. The students claimed this was due to two main reasons. First, the implementation was small enough for them to cover all the possible test scenarios, since the teams were only required to implement one capability due to time constraints. Second, the test effort was reduced because defects were removed at earlier stages by the peer review technique they applied.

4.2.3. Final Project Score

H11: There is no difference in project score between pair development group and software development with Fagan's inspection group.

The project scores of the graduate experiment are shown in Table 28. The project score was evaluated by the instructor based on four criteria: the completeness, the correctness, the consistency and the design solution of the final product. The teams from both groups tended to have the same level of project score except team P3, which had a slightly higher score. The instructor said that he gave the team extra credit for its design solution. One of the requirements for the project was to apply one of the design patterns taught during the semester, and team P3's design solution and implementation showed that they understood the use of the design pattern well. However, we cannot conclude that pair development is the reason the team understood the design pattern. Team P3's better design solution may be caused by one or more of the following reasons. The first is individual capability: one of the team members may have been familiar with the design pattern and helped the team come up with the design idea. The second is that the team gained the benefit of collaborative design from using pair development, which helped the team discuss and brainstorm the solution while designing. Last, the team may have learned better by using pair development; Williams [Williams, 2003] suggested that in pair programming the partners take turns being the teacher and the student, so knowledge is passed between the partners. We will investigate the effect of pair design in the future.

Table 28: Results of Project Score from Graduate Experiment (Total Score 40)
Sample                   Team #   Score
Pair Development Group   P1       31.50
                         P2       30.50
                         P3       34.50
Inspection Group         I1       31.50
                         I2       30.50

4.3. Industry Experiment Results

The results from the industry experiment are discussed in this section. There is no statistical test for these results either, because there is only one team in the pair development group and one team in the inspection group. We only compare the results with the classroom experiment results.
4.3.1. Total Development Cost and Cost of Software Quality

H1: There is no difference in Total Development Cost (TDC) in man-hours between the pair development group and the software development with Fagan's inspection group.
H2: There is no difference in rework costs in man-hours between the pair development group and the software development with Fagan's inspection group.
H3: There is no difference in appraisal costs in man-hours between the pair development group and the software development with Fagan's inspection group.
H4: There is no difference in production costs in man-hours between the pair development group and the software development with Fagan's inspection group.

Table 29: Results of TDC and CoSQ from the Industry Experiment (all values in man-hours)
Team                      TDC       Production Costs    Appraisal Costs    Rework Costs    Prevention Costs
Pair Development Team     1392.9    654.2               325.7              233             180
Inspection Team           1342      429                 436                317             160

Table 29 shows the results of TDC and CoSQ from the industry experiment. The result for the total development cost (TDC) is different from the undergraduate students' results. The pair development team spent about 50 man-hours more than the inspection team to develop the system: TDC for the pair development team is 1392.9 man-hours, while TDC for the inspection team is 1342 man-hours. The difference may be due to the complexity of the project in the industry environment.

The production costs (the effort of creating new artifacts, managing the team and meetings) of the pair development team were over 200 man-hours higher than those of the inspection team. This result differs from the graduate experiment, in which the production cost of both groups was about the same. The project manager said that the developers on the pair development team had difficulty working in pairs at the beginning; they took some time to adjust, since they were used to working by themselves, and the productivity of the pairs increased later. Moreover, the project manager claimed that the pair development team was a lot harder to plan and manage. The effort to plan and manage the pairs (assigning tasks and rotating the pairs) was much greater than the effort to assign tasks to individuals and set up the inspection times. Sometimes a pair disagreed about a solution; they took a while to resolve the disagreement, or even called a meeting to ask for team consensus. When the problem could not be solved, the manager would simply break up the pair and rotate the members to different partners.

However, as shown in Table 29, the appraisal and internal failure costs of the pair development team were less than those of the inspection team. These results correlate with the classroom experiments. To achieve a high quality product, inspection also required higher appraisal and internal failure costs. Consequently, for a small project, or a project for which time to market is a higher risk than system failure, pair development has more potential to become an effective V&V activity for the project.

Figure 16 illustrates the distribution of cost as a percentage of development for the pair development team and the inspection team. Similar to the undergraduate experiment results, the pair development team spent more effort on creating software and continuous review than on reworking defects. Production cost for the pair development team is 46.97% of TDC, and production cost for the inspection team is 31.97% of TDC. Again, the developers on the pair development team attributed this to the advantage of the early feedback cycle of pair development.

Figure 16: Distribution of Cost as a Percentage of Development from 1st Industry Experiment

The effort of working in a pair for the pair development team is 46.76% of TDC. For appraisal cost, the effort of continuous review (pair navigation) for the pair development team is 23.38% of TDC, while the inspection effort for the inspection team is 32.46% of TDC. Internal failure costs for the pair development team and the inspection team are 16.73% and 23.62% of TDC, respectively.
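These percentages follow directly from the Table 29 effort data. The following minimal sketch reproduces the calculation; it is illustrative only, and the function and variable names are not taken from the study's tooling.

    # Illustrative sketch: deriving the Figure 16 cost distribution from Table 29.
    # All inputs are man-hours; the names used here are illustrative, not from the study.

    def cost_distribution(production, appraisal, rework, prevention):
        """Return each cost component as a percentage of total development cost (TDC)."""
        tdc = production + appraisal + rework + prevention
        return {name: 100.0 * value / tdc
                for name, value in [("production", production), ("appraisal", appraisal),
                                    ("rework", rework), ("prevention", prevention)]}

    pair = cost_distribution(production=654.2, appraisal=325.7, rework=233, prevention=180)
    inspection = cost_distribution(production=429, appraisal=436, rework=317, prevention=160)

    print(round(pair["production"], 2))        # 46.97 (% of TDC), as quoted above
    print(round(inspection["production"], 2))  # 31.97
    print(round(pair["appraisal"], 2))         # 23.38
    print(round(pair["rework"], 2))            # 16.73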
We can see that the pair development team in the industry experiment had a higher appraisal cost (23.38% of TDC) and internal failure cost (16.73% of TDC) than the pair development teams in the undergraduate student experiment, due to the higher complexity of the project.

4.3.2. Product Quality

H8: There is no difference in the number of un-passed test cases between the pair development group and the software development with Fagan's inspection group.
H10: There is no difference in the number of problems found by the customer after product delivery between the pair development group and the software development with Fagan's inspection group.

To compare the quality of the products, the problems found during the user acceptance test (UAT) were recorded and classified by their phase of injection. Table 30 shows the number of problems found during UAT, distributed by phase of injection. We can see that the pair development team tends to have fewer problems, and in particular 39% fewer major problems.

The project manager said that the lower quality product from the inspection team was not an unambiguous indicator that pair development can reduce the number of defects in a product more than inspection. Because of time constraints that could not be negotiated with the phone company, and because time to market was also important for the success of the product (delaying the product meant it could not launch for another six months), only the highly critical modules were inspected.

Although the pair development team had fewer defects, there were two serious problems caused by the interface between the system and another system, which were not found during the design phase but instead during the testing phase. Fixing the problems this late cost the company more, since the company had to pay extra to set up another test with the phone company to verify the interaction with the phone company's system. Hence, for modules that require extra care, such as high-dependability and high-criticality modules, inspection may be required after pair development to ensure the quality of the product.

Table 30: Number of Problems Found during UAT
Phase of Injection     Pair Development Team    Inspection Team
Project Definition     2 (1 major)              3 (1 major)
Requirement            1 (1 major)              3 (2 major)
Design                 5 (3 major)              4 (3 major)
Code                   13 (6 major)             19 (12 major)
Total                  21 (11 major)            29 (18 major)

4.4. Directed Research (DR) Graduate Classroom Experiment Results

4.4.1. Total Development Cost (TDC)

H1: There is no difference in Total Development Cost (TDC) in man-hours between the pair development and inspection groups.

Figure 17 shows the TDC for the five pair development teams and the four inspection teams. The y-axis shows TDC in man-hours.
It is notable that all the teams in the pair development group spent less effort to develop the project than all the teams in the inspection group. The average TDC for the pair development group is 187.54 man-hours and the average TDC for the inspection group is 237.93 man-hours. It is interesting to note that team P4, which has the highest level of C and C++ knowledge (8.25), is also the team with the lowest TDC (179.43 man-hours).

Figure 17: Total Development Cost from DR Graduate Classroom Experiment

Table 31 shows the mean, standard deviation and p-value of TDC for both groups. The mean TDC of the pair development group is 50.39 man-hours lower than that of the inspection group. The p-value between the two groups is 0.0001; thus we can reject the hypothesis that there is no difference between the two groups in TDC. The result from this experiment is similar to the classroom experiments in Thailand, in which the pair development group spent less development effort than the inspection group. We will discuss the reasons for this result when we analyze the components of CoSQ in the next section.

Table 31: Results of TDC from DR Graduate Experiment
TDC (man-hours)             Mean      Standard Deviation    P-Value
Pair Development Group      187.54    7.05                  0.0001
Inspection Group            237.93    2.22

4.4.2. Development Costs per Phase

H1.1: There is no difference in development costs (man-hours) in the requirements phase (DCR) between the pair development and inspection groups.
H1.2: There is no difference in development costs (man-hours) in the design phase (DCD) between the pair development and inspection groups.
H1.3: There is no difference in development costs (man-hours) in the implementation phase (DCI) between the pair development and inspection groups.
H1.4: There is no difference in development costs (man-hours) in the testing phase (DCT) between the pair development and inspection groups.

In addition to Total Development Cost (TDC), we analyzed the distribution of TDC across the development phases (requirements, design, implementation and testing) to demonstrate the effect on schedule. The development cost in the requirements phase (DCR) is the number of man-hours to identify requirements, develop the vision document, develop the use case specification, plan the project, review the major artifacts, meet with the client, and fix the defects found in the requirements phase. The development cost in the design phase (DCD) includes the man-hours to define the VB physical and logical SLOC definitions, define the VB keyword list, design the system, review the major artifacts, discuss or research design issues, and fix the design defects. The development cost in the implementation phase (DCI) consists of the man-hours to code the system, review the code, discuss system issues, unit test, and fix the defects found in the implementation phase. The development cost in the testing phase (DCT) is the number of man-hours to generate test cases (test description, test input), run test cases, record the test log, review the test artifacts, and fix the test defects. Figure 18 summarizes the calculation of development cost by phase.

Figure 18: Development Cost by Phase
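To make this roll-up concrete, the sketch below shows one way the logged activity hours could be summed into DCR, DCD, DCI and DCT as defined above. It is only an illustration under assumed activity names; these are not the actual logging categories or scripts used in the experiment.

    # Illustrative sketch: rolling logged activity hours up into per-phase costs.
    PHASE_ACTIVITIES = {
        "DCR": {"requirements", "review_requirements", "fix_requirements", "client_meeting"},
        "DCD": {"design", "review_design", "fix_design", "design_meeting"},
        "DCI": {"coding", "review_code", "unit_test", "fix_code", "implementation_meeting"},
        "DCT": {"test_generation", "testing", "review_test", "fix_test", "test_meeting"},
    }

    def development_cost_per_phase(effort_log):
        """effort_log: iterable of (activity, hours) pairs taken from a team's effort reports."""
        totals = {phase: 0.0 for phase in PHASE_ACTIVITIES}
        for activity, hours in effort_log:
            for phase, activities in PHASE_ACTIVITIES.items():
                if activity in activities:
                    totals[phase] += hours
        return totals

    # Hypothetical excerpt of one team's effort log (activity name, man-hours).
    log = [("requirements", 6.0), ("client_meeting", 2.5), ("design", 10.0),
           ("review_design", 3.0), ("coding", 20.0), ("unit_test", 4.0), ("testing", 5.5)]
    print(development_cost_per_phase(log))   # {'DCR': 8.5, 'DCD': 13.0, 'DCI': 24.0, 'DCT': 5.5}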
From Table 32, we can see that in every phase the pair development group spent less effort than the inspection group, except during the design phase, where development costs from both groups were about the same. This is due to the fact that all of the students were new to the system. For the pair development group, this is the impact of pairing a non-expert with another non-expert [Williams, 2003]. Due to their limited knowledge of the system, the pairs had to spend a lot of time on designing, and even then the students would not know whether their designs were correct. For the inspection group, the time spent on inspection was also not effective, since few design defects were removed. At the end of the design phase, most of the teams were asked by the instructor to go back and fix all the defects: the pair development group was required to do pair rework, and the inspection group was required to re-inspect the design.

Table 32: Development Costs per Phase from DR Graduate Experiment (man-hours)
Team #     DCR      DCD      DCI      DCT
PD Group
P1         16.05    41.75    63.73    26.65
P2         13.95    41.42    64.37    25.33
P3         16.20    35.70    66.53    27.12
P4         16.77    34.30    67.03    27.68
P5         12.40    37.92    73.03    32.05
FI Group
I1         26.83    40.78    96.33    21.33
I2         22.65    41.50    90.27    35.30
I3         23.25    38.00    92.30    31.40
I4         21.70    39.37    84.83    43.98

As mentioned in the beginning, since the team sizes were equal, the effect on calendar time is the same as the effect on effort. The experiment results showed that the pair development group spent about 36% less effort in the requirements phase, 4% less in the design phase, 26% less in the implementation phase and 16% less in the testing phase than the inspection group, which implies that the pair development group required 36%, 4%, 26% and 16% less calendar time in each of these phases. This means pair development offers the option of starting each phase earlier and reducing the calendar time. As a result, the system is delivered to the market sooner and the return on investment is increased. Figure 19 illustrates the difference in calendar time between the pair development group and the inspection group.

Figure 19: Effect of the calendar time

Table 33: Results of Development Cost per Phase from DR Graduate Classroom Experiment (man-hours)
Development Costs    Group       Mean     Standard Deviation    P-Value
DCR                  PD Group    15.07    1.84                  0.0001
                     FI Group    23.61    2.24
DCD                  PD Group    38.22    3.33                  0.35
                     FI Group    39.91    1.55
DCI                  PD Group    66.94    3.68                  0.0001
                     FI Group    90.93    4.78
DCT                  PD Group    27.77    2.55                  0.02
                     FI Group    33.00    9.39

Table 33 reports the comparative average development costs per phase. On average, the pair development group took less effort per phase than the inspection group. The p-values show that there are significant differences between the pair development group and the inspection group in the development costs of the requirements, implementation and testing phases. However, there is no difference between the two groups in the development costs of the design phase (p-value = 0.35).

4.4.3. Cost of Software Quality (CoSQ)

H2: There is no difference in failure costs in man-hours between the pair development and inspection groups.
H3: There is no difference in appraisal costs in man-hours between the pair development and inspection groups.
H4: There is no difference in production costs in man-hours between the pair development and inspection groups.

From Table 34, the teams in the pair development group spent slightly more on production cost than the inspection group (around 5 man-hours more). However, the appraisal costs and failure costs of the pair development group tend to be less than those of the inspection group.
As a result, the inspection teams took more effort to develop the system than the pair development teams, as described in Section 4.4.1. These results are consistent with the experiments conducted in Thailand. We do not show the prevention cost, since every team spent the same training time (10.5 man-hours per team).

As mentioned for the previous experiments, there are two main reasons that the pair development group has lower appraisal costs and failure costs than the inspection group. First, continuous review is an invariant of pair development: while the driver is working on the product, the observer is actively reviewing it at the same time, so possible defects can be removed as quickly as they are generated. Hence, there are lower rework costs, which means lower failure costs. Second, the cost of performing Fagan's inspection is high due to the structure of the inspection process, especially for relatively small projects. Nevertheless, Fagan's inspection has been well documented as an effective verification technique.

Table 34: TDC and CoSQ from DR Graduate Classroom Experiment (man-hours)
Team #     TDC       Production Cost    Appraisal Cost    Rework Cost
Pair Development Group
P1         193.82    64.33              97.92             21.07
P2         188.57    67.17              93.67             17.23
P3         181.15    71.85              81.25             17.55
P4         179.43    69.00              81.00             18.93
P5         194.73    68.45              90.33             25.45
Inspection Group
I1         239.62    63.83              127.17            38.12
I2         235.80    61.42              119.92            43.97
I3         236.25    64.82              120.63            40.30
I4         240.07    61.20              120.68            47.68

Table 35 shows the mean, standard deviation, difference between means and p-value, in man-hours, for the production costs, appraisal costs and failure costs. As indicated, on average the pair development group spent more effort on creating software than the inspection group. This result differs from the classroom experiments in Thailand, in which, on average, the pair development group and the inspection group spent approximately the same number of man-hours on production cost. However, it is consistent with the industry experiment in Thailand, in which the pair development group spent more effort on production cost than the inspection group. This can be due to the overhead of the pair session, in which the pair needed to prepare and plan before the pair execution began.

Appraisal costs are the costs of assuring the quality of the product. In this study, appraisal costs refer to two major costs: the cost of reviewing and the cost of testing. On average, the pair development group spent 59.79 man-hours on continuous review and 29.05 man-hours on testing. The inspection group spent on average 82.13 man-hours on inspecting and 39.98 man-hours on testing. The results in Table 35 show that there are significant differences in production costs, appraisal costs and failure costs between the pair development group and the inspection group.

Table 35: Results of CoSQ from DR Graduate Classroom Experiment (man-hours)
Production Costs:
  PD Group (pair driving, meeting, individual development)    Mean 68.16     SD 2.74
  FI Group (individual development, meeting)                  Mean 62.82     SD 1.79
  Difference between means 5.34, P-value 0.0017
Appraisal Costs:
  PD Group (continuous review (pair navigation/observer) and testing)    Mean 88.83     SD 7.53
  FI Group (inspection and testing)                                      Mean 122.10    SD 3.40
  Difference between means -33.27, P-value 0.0001
Rework Costs:
  PD Group (rework, pair and individual)    Mean 20.05     SD 3.38
  FI Group (rework)                         Mean 42.52     SD 4.21
  Difference between means -22.47, P-value 0.0001
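A two-sample Student t-test of the kind used for the group comparisons in this study reproduces the Table 35 result for rework costs. A minimal sketch, using the per-team values from Table 34 and assuming SciPy is available, is:

    from scipy import stats

    # Per-team rework costs in man-hours, taken from Table 34.
    pair_rework = [21.07, 17.23, 17.55, 18.93, 25.45]      # pair development teams P1-P5
    inspection_rework = [38.12, 43.97, 40.30, 47.68]       # inspection teams I1-I4

    t_stat, p_value = stats.ttest_ind(pair_rework, inspection_rework)
    print(f"t = {t_stat:.2f}, p = {p_value:.2e}")   # p is well below 0.05, consistent with Table 35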
4.4.4. Product Quality

H8: There is no difference in the number of un-passed test cases in the testing phase between the pair development and inspection groups.
H9: There is no difference in the number of un-passed test cases in UAT between the pair development and inspection groups.

Table 36: Results of Un-passed Test Cases from DR Graduate Classroom Experiment
Un-passed Test Cases in Testing Phase:   PD Group Mean 6.8, SD 1.48;   FI Group Mean 6.5, SD 1.73;   P-value 0.79
Un-passed Test Cases in UAT:             PD Group Mean 1.4, SD 1.14;   FI Group Mean 2.0, SD 0.82;   P-value 0.80

The product quality was determined by the number of un-passed test cases in the testing phase and in UAT. During the testing phase, all teams generated test cases and recorded the results. All of the defects found during this phase were required to be fixed before product delivery at the end of the semester. After product delivery, the instructor tested the final product from every team with the same set of test cases that had been prepared during the semester. We call this phase the User Acceptance Test (UAT). The students had no knowledge of the test cases. All the un-passed test cases were recorded and classified by defect type. The Student t-test analysis in Table 36 indicates that there is no evidence of a difference in product quality (number of un-passed test cases in the testing phase and in UAT) between the two groups.

4.5. Software Engineering (SE) Graduate Classroom Experiment Results

4.5.1. Total Development Cost and Cost of Software Quality

H1: There is no difference in Total Development Cost (TDC) in man-hours between the pair development group and the software development with Fagan's inspection group.
H2: There is no difference in rework costs in man-hours between the pair development group and the software development with Fagan's inspection group.
H3: There is no difference in appraisal costs in man-hours between the pair development group and the software development with Fagan's inspection group.
H4: There is no difference in production costs in man-hours between the pair development group and the software development with Fagan's inspection group.

Table 37 shows the development costs from all of the participating teams. We can see that the pair development group tended to have lower costs than the inspection group, especially in rework cost. On average, the pair development group spent 68.45 fewer man-hours in total development cost than the inspection group, and about 45 fewer man-hours in the cost of assuring software quality (CoSQ). However, the results in Table 38 show that there is no statistical evidence that the pair development group spent less effort in total development cost, production costs or appraisal costs. As a result, we can only conclude that the pair development group had lower rework costs than the inspection group.
Table 37: TDC and CoSQ from SE Graduate Classroom Experiment Team # TDC (man-hour) Production Costs (man-hour) Appraisal Costs (man-hour) Rework Costs (man-hour) Pair Development Group P1 325.8 204 95 26.8 P2 266.5 156 88.5 22 P3 311 188 109 14 P4 226.5 136.5 78 12 Inspection Group I1 316.7 173.2 80.5 63 I2 321.2 148.5 118.7 54 I3 354.5 184 98 72.5 I4 294.5 148 92.5 54 I5 402.9 236 93 73.9 I6 420 255 90 75 I7 346.5 212.5 87 47 95 Table 38: Results of TDC and CoSQ from SE Graduate Classroom Experiment Group Mean Standard Deviation Difference between means P-Value TDC (man-hour) Pair Development Group 282.45 45.02 -68.45 0.0603 Inspection Group 350.90 46.09 Production Costs (man-hour) Pair Development Group 171.13 30.51 -22.76 0.3365 Inspection Group 193.89 41.92 Appraisal Costs (man-hour) Pair Development Group 92.63 12.97 -1.61 0.8456 Inspection Group 94.24 12.08 Rework Costs (man-hour) Pair Development Group 18.70 6.92 -44.07 0.0001 Inspection Group 62.77 11.33 96 4.5.2. Product Quality H11: There is no difference in project score between pair development group and software development with Fagan’s inspection. Table 39: Results of LCO and LCA packages from SE Graduate Classroom Experiment Team # LCO Package (point) LCA Package (point) Pair Development Group P1 136.75 142 P2 131.125 146 P3 133.25 145.25 P4 124.5 132.13 STDEV 5.16 6.38 AVG 131.41 141.35 Inspection Group I1 130 142.625 I2 132.25 142.375 I3 119.4 142.5 I4 121.875 139.25 I5 130 145.635 I6 132.25 143 I7 133.875 139.25 STDEV 5.60 2.24 AVG 128.52 142.09 Difference between means 2.89 -0.74 P-value 0.4102 0.8798 In this experiment, we cannot measure the quality of product by number of defect since the experiment was only conducted in first semester of the course, which students focused on defining the operational concept, system requirements and system architectures. Hence, we used the score from the LCO package and the LCA package to determine the quality of the product. The LCO and LCA package were graded by TAs. The TAs graded the packages based on the correctness and 97 consistency of the artifacts. From Table 39, there is no statistical difference between the pair development group and the inspection group in term of product quality. This result is similar to all of the previous experiments which the pair development group produced the same or higher quality level as the inspection group 4.5.3. Software Development Spending Profile Comparison In this section we compared the development spending profile by week between the pair development group and the inspection group. Figure 20 shows the production cost by week from both pair development group and inspection group. We can clearly see that both the pair development group and the inspection groups have a similar trend. This is because all the teams were developing the system based on the course schedule. The effort from both groups went up dramatically before the LCO ARB and LCA ARB and dropped dramatically after the milestone’s package due. The cumulative production cost in Figure 21 also shows that the pair development group spent less production effort than the inspection group. Appraisal costs are costs that relate to finding the defects in software product. In this experiment it included effort spent on pair observation for pair development group and inspection (planning, overview meeting, individual preparation, inspection meeting, and follow up). From Figure 22, the pair development group starts finding the defects as soon as the project started. 
The inspection group, in contrast, did not start inspecting the artifacts until the third week of the experiment, because the teams had to wait until the artifacts had been developed before they could be inspected. Figure 22 also shows that the pair development group distributed its appraisal costs evenly across the weeks, whereas the inspection group's appraisal costs were dramatically high in the weeks in which an inspection took place and dramatically low in the weeks in which there was none. The cumulative appraisal costs in Figure 23 show that there is no significant difference in appraisal costs between the two groups.

Figure 20: Production Cost Profile from SE Graduate Classroom Experiments
Figure 21: Cumulative Production Cost from SE Graduate Classroom Experiments
Figure 22: Appraisal Cost Profile from SE Graduate Classroom Experiment
Figure 23: Cumulative Appraisal Costs from SE Graduate Classroom Experiment

The rework cost profile, presented in Figure 24 and Figure 25, is particularly interesting. We can see that in each week the pair development group spent almost 50% less effort on rework than the inspection group. This is the result of the early feedback loop of pair development, which provides continuous review: most of the defects are likely to be detected during the pair session. Inspection, in contrast, can only take place after the artifacts have been developed; empirical data indicate that the inspection meeting occurs about 8-10 days after the artifacts have been developed [Madachy, 1994]. As a result, it costs the inspection group more effort to fix the same defect than it costs the pair development group.

Figure 24: Rework Costs Profile from SE Graduate Classroom Experiment
Figure 25: Cumulative Rework Costs from SE Graduate Classroom Experiment

4.6. Efficiency and Effectiveness Comparison

In this section, we analyze the efficiency and effectiveness of each technique using the data from all of the experiments. The metrics are explained in sections 4.6.1, 4.6.2 and 4.6.3; the comparison of the metrics is described in section 4.6.4.

4.6.1. Production Efficiency (PE)

Production Efficiency (PE) is defined as the percentage of effort spent on developing or creating the new piece of software relative to the total development effort. In the context of all the experiments, the effort spent on developing or creating the new piece of software is the production cost and the total development effort is the total development cost:

PE = (PC / TDC) x 100

where PE = Production Efficiency, PC = Production Cost, and TDC = Total Development Cost.

For example, Table 31 shows that, on average, the pair development group from the directed research graduate classroom experiment in the US (E4) has a total development cost of 187.54 man-hours. From Table 35, the pair development group spent on average 68.16 man-hours creating the new artifacts or code. Hence, the production efficiency of the pair development group is 36.34%. Tables 31 and 35 also show that the inspection group spent on average 62.82 man-hours out of a total of 237.93 man-hours to produce the new artifacts or code. As a result, the production efficiency of the inspection group is 26.40%. This means that the pair development group has higher efficiency in producing the software than the inspection group.
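A minimal sketch of this calculation, reproducing the worked example above (the function name is illustrative, not taken from the study), is:

    def production_efficiency(production_cost, total_development_cost):
        """PE = (PC / TDC) x 100, with both costs in man-hours."""
        return 100.0 * production_cost / total_development_cost

    # DR graduate classroom experiment (E4) averages from Tables 31 and 35.
    print(round(production_efficiency(68.16, 187.54), 2))   # pair development group: 36.34
    print(round(production_efficiency(62.82, 237.93), 2))   # inspection group: 26.4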
4.6.2. CoSQ Efficiency (CE)

CoSQ Efficiency (CE) is defined as the percentage of effort spent on detecting and reworking the defects found during software production relative to the total development effort. In the experiments, the effort spent on detecting defects is the appraisal cost and the effort spent on reworking defects is the rework cost:

CE = ((AC + RC) / TDC) x 100

where CE = CoSQ Efficiency, AC = Appraisal Costs, RC = Rework Costs, and TDC = Total Development Cost.

From Table 35, the pair development group spent 88.83 man-hours on appraisal and 20.05 man-hours on rework. The inspection group spent 122.10 man-hours finding defects and 42.52 man-hours removing them. Hence, the CoSQ efficiency of the pair development group is 63.66% and the CoSQ efficiency of the inspection group is 73.60%. This result shows that the pair development group has higher CoSQ efficiency than the inspection group.

4.6.3. Effectiveness (E_Eff)

Effectiveness (E_Eff) is defined as the number of defects that escape after software production relative to the total development effort. The number of escaped defects after software production is determined by the number of un-passed test cases:

E_Eff = (d_1 + d_2 + ... + d_n) / TDC

where E_Eff = Effectiveness, d_i = the escaped defects detected in each phase after the pair session or inspection session, and TDC = Total Development Cost.

To calculate the effectiveness of each technique (in this study, either pair development or inspection), the model measures the number of defects that escaped from each verification technique. For example, in the DR graduate classroom experiment, we use the defects found in the testing phase and in UAT to measure the effectiveness of the verification technique. Table 36 shows that the pair development group has on average 6.8 defects in the testing phase and 1.4 defects in the UAT phase, for a total of 8.2 escaped defects. The inspection group has on average 6.5 defects in the testing phase and 2 defects in the UAT phase, for a total of 8.5 escaped defects. Hence, the pair development group and the inspection group have an effectiveness of 0.044 and 0.036 escaped defects per man-hour of development, respectively. At this point we cannot conclude that the inspection group is more effective than the pair development group; we have to compare the effectiveness with the cost that each technique spent to achieve that quality level. We discuss this in the next section.

4.6.4. Efficiency vs. Effectiveness Comparison

As mentioned above, to be able to compare the two techniques, we need to analyze two important factors: spending costs and quality level. A smaller number of escaped defects does not necessarily mean that a technique removes defects better. For example, if both techniques let three defects escape, but one technique spent 10 man-hours removing defects and the other spent 7 man-hours, which technique is more cost-effective? In this section, the cost-effectiveness is analyzed.

Figure 26 illustrates the production efficiency and effectiveness of pair development and inspection from all of the experiments. At the same quality level, the pair development group has a higher production efficiency level than the inspection group. This analysis confirms the earlier finding that the pair development group spent a greater share of its effort producing the new piece of software.

Figure 26: Production Efficiency vs. Effectiveness Comparison
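A companion sketch for the CoSQ efficiency and effectiveness metrics, again using the DR experiment averages, is given below. To reproduce the 63.66% and 73.60% figures quoted above and reused in Table 41, the sketch also includes the 10.5 man-hours of prevention (training) effort in the CoSQ numerator; that inclusion is an assumption on my part, since the formula as stated uses only the appraisal and rework costs.

    def cosq_efficiency(appraisal, rework, prevention, tdc):
        """CoSQ as a percentage of TDC; prevention is included here (see the note above)."""
        return 100.0 * (appraisal + rework + prevention) / tdc

    def effectiveness(escaped_defects, tdc):
        """E_Eff = (sum of escaped defects) / TDC, in escaped defects per man-hour."""
        return sum(escaped_defects) / tdc

    # DR graduate classroom experiment averages (Tables 35 and 36); 10.5 man-hours of training.
    print(round(cosq_efficiency(88.83, 20.05, 10.5, 187.54), 2))    # pair development group: 63.66
    print(round(cosq_efficiency(122.10, 42.52, 10.5, 237.93), 2))   # inspection group: 73.6
    print(round(effectiveness([6.8, 1.4], 187.54), 3))              # pair development group: 0.044
    print(round(effectiveness([6.5, 2.0], 237.93), 3))              # inspection group: 0.036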
Figure 27 depicts the CoSQ efficiency and the effectiveness of the pair development group and the inspection group. It is notable that, at the same level of effectiveness, the pair development group spent less CoSQ than the inspection group. CoSQ efficiency is the complement of production efficiency: a lower CoSQ percentage corresponds to a higher production efficiency. CoSQ is the cost spent to achieve the quality of the software product; in this study, it is composed of the cost of finding defects and the cost of removing them. From the economic point of view, the development team needs to reduce rework costs in order to reduce the total cost of software quality. In this study, the primary reason that pair development yields a lower CoSQ is the reduction in rework costs.

Figure 27: CoSQ Efficiency vs. Effectiveness Comparison

4.7. Thailand and USA Experimental Results Comparison

4.7.1. Geert Hofstede's Model Analysis

As mentioned in Section 2.4.2, Hofstede's model of cultural differences is composed of five dimensions: power distance, uncertainty avoidance, individualism, masculinity and long-term orientation [Hofstede, 2001]. Hofstede measured the differences using country scores. The country scores were calculated as an index based on specific questions from his questionnaire. To investigate whether the participants from different countries had different cultures, the participants in the 2006 experiments were asked to submit the survey in Appendix B, from which the country scores/indices were computed. The survey was reproduced with slight modifications from Hofstede's survey [Hofstede, 2001]. The index score for each dimension was calculated using the following formulas [Hofstede, 2001].

Power Distance Index (PDI) was computed based on three questions: B17, B18 and B22.
PDI = 135 – (% answer 3 in B17) + (% answer 1 or 2 in B18) – [25 * (mean score B22)]

Uncertainty Avoidance Index (UAI) was computed based on questions C5, C6 and B34.
UAI = 300 – [40 * (mean score C5)] – (% answer 1 or 2 in C6) – [30 * (mean score B34)]

Individualism Index (IDV) was calculated based on questions B2, B4, B8 and B15.
IDV = [-27 * (mean score B2)] + [30 * (mean score B4)] + [76 * (mean score B8)] – [43 * (mean score B15)] – 29

Masculinity Index (MAS) was calculated based on questions B3, B8, B10 and B11.
MAS = [-66 * (mean score B3)] + [60 * (mean score B4)] + [30 * (mean score B10)] – [39 * (mean score B11)] + 76

In the study, there were 32 Indian students from the DR graduate classroom experiment, and 18 Indian students, 7 Thai students and 12 US students from the SE graduate classroom experiment. However, the results on cultural differences from this study may not be as accurate as Hofstede's, due to two main factors. First, the sample size of this study was small compared to Hofstede's study; for example, 7 Thai students from the SE graduate classroom experiment cannot represent the Thai population, which is over 60 million. Second, we have no index scores from the 2005 experiments to compare with. Hence, in this study I only want to show that there are cultural differences among the participants from different countries and that the results are similar to Hofstede's study. Moreover, I want to investigate whether the cultural differences impact the experimental results. Table 40 shows the comparison of country scores from Hofstede's study [Hofstede, 2001], the participants from the DR graduate classroom experiment, and the participants from the SE graduate classroom experiment.
In this table, the rank numbers from Hofstede’s study are different from the original ranking. In this table, I ranked the country based on only 3 countries that were studied to make the comparison easily to understand. From Table 40, there are cultural differences among the participants from India, Thailand and USA. The index scores from each country were different from Hofstede’s index score. However, we used the ranking to observe the differences between countries. For power distance index, all three countries had higher index score but the ranking of the country was the same. This means that the pattern of 110 power distance is similar to Hofstede’s study which the participants from India has higher power distance index than Thailand and USA. Table 40: Country Index Score Comparison Power Distance Uncertainty Avoidance Individualism Masculinity Index Rank Index Rank Index Rank Index Rank India from Hofstede's Study 77 1 40 3 48 2 56 2 India from DR Graduate Classroom Experiment 99 1 24 3 47 3 57 3 India from SE Graduate Classroom Experiment 120 1 21 3 46 3 55 3 Thailand from Hofstede's Study 64 2 64 1 20 3 34 3 Thailand from SE Graduate Classroom Experiment 95 2 40 1 98 2 85 2 USA from Hofstede's Study 40 3 46 2 91 1 62 1 USA from SE Graduate Classroom Experiment 44 3 27 2 117 1 113 1 There was a decrease of uncertainty avoidance index for all three countries. This is because 90% of the participants were in the age of 25 – 30. The UAI score was computed based mainly on question C6 (How long do you think you will continue working in one company). Most of the answers to this question were 1 to 5 years since the participants were young graduate students who will start working in a year. They were not ready to settle in one company. Moreover, the participants from India and Thailand were foreign students. They only plan to work in US to gain the experiences and want to move back to their country later. However, the ranking of uncertainty avoidance index is similar to Hofstede’s study, which shows 111 that Thai people is more likely to fear ambiguous situations than people from India and US. There is a different in both index scores and ranking for individualism and masculinity index from Thailand. The individualism and masculinity index scores were a significant increase. However, as mentioned in the beginning, the sample size for Thailand was very small (7 Thai students). I cannot conclude from the results that there is a difference. Moreover, all the Thai students were male. This can explain the increase of masculinity index score. 4.7.2. Impact of Cultural Differences This section describes the potential cultural dimension that can impact the effectiveness of either pair development or inspection. Both pair development and inspection require the high degree of collaboration and communication among the team members. The cultural dimensions, which can influence the collaboration and communication skills, are power distance and individualism. Power distance can determine the degree of dominance between boss and subordinate. In high power distance, the subordinates hardly disagree or argue with their boss. Hence in the high power distance society, paring senior with junior can create un-effective pair development situation since the junior do not want to disagree with senior. Or the junior can feel uncomfortable having senior watching over the shoulder. Power distance can also impact the effectiveness of inspection. 
During the inspection meeting, if the reader and tester are junior members and the other is senior 112 member, the reader and tester may not be able to effectively detect the defects since they do not want to make an argument with the senior member. On the other hand, having senior as a tester or reader and junior as an author, the junior member do not disagree with senior when there are fault defects. Individualism has an impact on pair development more than inspection. The societies with strong individualism believe that an individual is important. Members in this society are self-oriented. They make decisions based on their achievements and respect their private lives. They have a personal space. This type of members is less likely to work in a pair. Hence, the members in high individualism societies such as US are likely to perform less effective pair development compare to Thai members. 4.7.3. The Experimental Comparison The comparison of differences between the experiments in Thailand and US are described. As explained in the previous section that, there are two cultural dimensions that impact the cost-effectiveness of pair development and inspection: power distance and individualism. Power distance index can have the impact on both pair development and inspection since it can create the distance between the members while collaborating or communicating. On the other hand, individualism only has an impact on pair development since two people are required to work in the same time continuously. In this experiment, the impact of power distance cannot be discussed since each experiment was required to develop different tasks. 113 To show the impact of individualism on pair development, the differences of CoSQ Efficiency from all experiments were compared. Table 41 shows the comparison of CoSQ Efficiency from all of the experiments. The impact of individualism was analyzed using the difference of CoSQ Efficiency between the pair development group and the inspection group. CoSQ Efficiency illustrates the percentage of effort spent on achieve the software quality. The lower percentage of CoSQ means more efficiency. As discussed in the previous section, higher individualism score can reduce the efficiency of pair development but no impact to inspection. Hence, the differences between CoSQ Efficiency of pair development and inspection shall reduce in the higher individualism score experiment. Table 41 shows that in all experiments in Thailand have almost the same different percentage of CoSQ Efficiency. The difference is smaller in the DR graduate classroom experiment in US in which all but one of the students came from India and smallest in the SE graduate classroom experiment in US. From the Hofstede’s model analysis, India has higher individualism than Thailand but less than the individualism scores from participants in the SE graduate classroom experiment. From this analysis, the individualism has an impact on the effectiveness of pair development. Pair development is less effective in the country where people are more self-oriented. 
114 Table 41: CoSQ Efficiency Comparison Experiment # Group CoSQ Efficiency (%) Individualism Index Score Differences (%) E1: Undergraduate classroom experiment in Thailand Pair Development 40.39 20 -15.13 Inspection 55.52 E2: Graduate classroom experiment in Thailand Pair Development 25.54 20 -17.92 Inspection 43.52 E3: Industry experiment in Thailand Pair Development 53.03 20 -15 Inspection 68.03 E4: DR graduate classroom experiment in US Pair Development 63.65 46 -9.95 Inspection 73.60 E5: SE graduate classroom experiment in US Pair Development 39.42 62 -5.33 Inspection 44.75 Note: the DR graduate classroom experiment considered as India experiment since all but one of the students came from India. 4.8. Conclusion The common result from all of the experiments is that the pair development group had less development cost to produce the same quality level of the system. Pair development spent a greater percentage of effort in producing the artifacts and less effort in reworking defects. This can be the advantage of an early feedback cycle from pair programming, which the defects are found while the developers are producing the artifacts. Reducing the appraisal and failure costs during the design and construction phase in the software industry is significantly valuable since the product will be tested and integrated earlier. The following sections summarize the results from all of the experiments. 115 4.8.1. Total Development Costs From Figure 28, the results from all of the experiments except industry experiment showed that the pair development group had about 20% - 30% less total development cost than the inspection group. Although, pair development team from industry experiment spent 4% more development effort than inspection team, the quality of final product was higher than inspection team. As a result, the inspection team may spend more TDC to produce the same quality level of product. In the experiments, since the size of the teams was equal, reducing development effort means reducing the calendar time. Hence, pair development offers the option of reducing 20% to 30% of calendar time. Figure 28: Total Development Cost Comparison 116 4.8.2. Production Costs From all of the experiments, there is no statistical difference for production costs between the pair development group and the inspection group. Hence, both pair development group and inspection group had the same production cost to produce the same product. Figure 29 shows the production cost comparison from all of the experiments. Note: As mentioned in the previous section, the industry experiment spent higher cost with the higher quality product. Figure 29: Production Cost Comparison 117 4.8.3. Appraisal Costs Figure 30 summarizes the appraisal costs comparison between the pair development group and the inspection group from all of the experiments. Results showed that all of the experiments except the SE graduate classroom experiment (E5) the pair development group spent from 30% to 55% less appraisal costs than the inspection group. As explained in the previous section, the impact of cultural differences and the diversity of participants are two possible reasons that there is no difference in appraisal costs between two groups for the SE graduate classroom experiment. Figure 30: Appraisal Costs Comparison 118 4.8.4. Rework Costs Due to the impact of the early feedback loop from pair development, all of the experiments showed that there is a greater number of rework costs reduction in the pair development group. 
Pair development spent at least 30% to 60% less effort in reworking the product than inspection group. Figure 31 illustrates the rework costs from all of the experiments. Figure 31: Rework Costs Comparison 4.8.5. Product Quality Figure 32 shows the product quality of pair development group and inspection group from all of the experiments. The results from the experiments showed that there is no significant difference in product quality between two groups except in the industry experiment, which inspection team produced product with more number of defects. 119 Although, there was no number of defects measured in the SE graduate classroom experiment (E5), the quality of product was measured by the final score of the projects. The result is similar to the other experiments. There is no difference in the projects score between two groups in the SE graduate classroom experiment. Figure 32: Product Quality Comparison 120 Chapter 5. Defect Type Analysis The defects, which were escaped from verification technique (pair development or inspection) and post-survey results, were used to determine the type of defects. By analyzing the defect type, we can determine which defect types are eliminated better by pair development and which defect types are better detect by inspection. At first, we classified the type of defects at the test phase and UAT using Orthogonal Defect Classification (ODC) v.5.11 from IBM [IBM Research, 2002]. Detail of defects type description is provided in Appendix A. The defect data were analyzed by the activity of injection and its root cause. Then, they were separated into two categories: requirement defects and design/code defects. The requirement defects are the defects that inject during writing requirements due to the incorrect requirements, the ambiguity of requirements, conflict of requirements or missing requirements. The design/code defects are the defects that inject during either design or code. The cause can be from a simple problem such as incorrect validation of parameters or data in conditional statements until more complex problem such as incorrect external interface. At the end of semester, the participants were asked to fill out the survey to weight the difficulty of detection based on their experiences from the experiment. The students rated the type of defects from the easiest to the hardest to detect by the verification technique, which they were using during the semester. The results from both defect data and post-survey results were compared and analyzed. 121 5.1. Defect Type Results from Defects Data 5.1.1. Requirement Defects Table 42 shows the percentage number of defect for requirement defects from the experiment categorized by ODC defect type. Performing pair development in requirement document gave an option of increasing the clarity and correctness of the requirements. Requirements were written in the way that can easily determine the test criteria. The requirements were described in the detail, which made the developer understand the need of clients. There were a small number of incorrect requirements. However, pair development tended to have weaknesses in the traceability and consistency of the requirements. On the other hand, inspected requirement document reduced the number of inconsistency and incompliance among the requirements. There were no significant results to show the weaknesses of inspection for requirement defects. 
Table 42: Results of Requirement Defects Data Analysis Defect Type Pair Development Inspection Clarity 6.25% 23.07% Compliance 18.75% 11.538% Consistency 31.25% 11.538% Correctness 6.25% 19.23% Level of Detail 0 23.07% Traceability 37.5% 11.538% 122 5.1.2. Design/Code Defects Table 43 shows the number of design/code defects from the experiment categorized by ODC defect type. It is obvious that pair development has a weakness on the interface between both internal and external module. Especially from the industry experiment in Thailand, there were two severe defects due to the interaction with the external system. These defects cost the company extra to set up another test with the phone company to test the interaction with the phone company system. Pair development is better in detecting the incorrectness of algorithm, method, function, class and object. The defects from inspection were allocated evenly in almost all of the defect types. This is due to the limited time that the teams did not inspect all artifacts. However, it is obviously that inspection is better in detecting the interface type defects and inspection has a weakness on detecting timing/serialization defect type. Table 43: Results of Design/ Code Defects Data Analysis Defect Type Pair Development Inspection Assignment/ Initialization 14.12 % 19.35% Algorithm/ Method 7.06 % 14.51% Checking 8.23 % 14.51% Function/ Class/ Object 9.41 % 12.90% Interface/ O-O Message 27.06 6.45% Relationship 22.35 9.68% Timing/ Serialization 11.76 22.58% 123 5.2. Defect Type Results from Post-survey 5.2.1. Requirement Defects In Figure 33, the post-survey results showed similar results with the defects data analysis for pair development group. Students voted traceability defects as the most difficult defects type to detect by pair development. Compliance and consistency have been voted as the second and third difficulty to detect. The participants rated correctness as the easiest to find by pair development. Figure 33: Requirements Defect Type Analysis from Survey Data For inspection, there is the difference between the defect data and survey results. The defects data showed that inspection has weaknesses on detecting clarity, level of detail and correctness but is better in detecting compliance, consistency and traceability. The survey results showed that the participants voted traceability as the 124 hardest to detect by inspection. As a result, I need more data to conclude whether the inspection has weakness on traceability. 5.2.2. Design/Code Defects Figure 34 illustrates the results of design / code defect types analysis from the survey. The results show that pair development has the weaknesses in interface/OO message and relationship between modules. Inspection has a weakness in timing and serialization. These results are similar to defect data analysis. From survey results, pair development is better in detecting assignment/ initialization and function/ class/ object defect type. This result is different from defect data analysis in which pair development has strengths on algorithm/ method, checking and function/ class object. Inspection is better in detecting interface/ OO message and relationship defect type. This is similar to defects data analysis. However, for the others defects type, there is not much different for each type. 125 Figure 34: Design/ Code Defects Type Analysis from Survey Data 5.3. Conclusion The analysis showed the strengths and weaknesses of pair development and inspection. 
This analysis can assist developers to select the different type of techniques when they know what type of defects they need to remove. For requirement defect type, pair development is better than inspection on making the requirements clear, easily to read and provide complete detail but it has weaknesses on detecting consistency, compliance and traceability. On the other hand, inspection can detect consistency and compliance better than pair development. Hence, after performing pair development at requirement document, the developers can perform inspection to remove inconsistency and incompliance. For code/design defects type, pair development has the weaknesses on the interaction between modules or systems. Again, we can use inspection to detect these types of defects after pair development. 126 Chapter 6. Risk-Based Decision Framework This chapter discusses the decision framework to select the verification technique. The decision framework is based on the results from all of the experiments and literature review. First I defined the three critical factors which involve in the decision making. Next section explains the method to make the decision. This method is the same method that is used in balancing the agility and discipline [Boehm, 2003]. This method is relied heavily on risk identification. Hence, I identified the potential risks for both pair development and inspection in section 6.2.1and 6.2.2. The last section shows the example of applying the decision method. 6.1. Critical Decision Factors From the empirical data, we identified three critical factors that impact the cost-effectiveness of either pair development or inspection. These critical factors are based on the characteristic of each technique. The first factor is the size of development team. Pair development is suitable for small to medium size team and inspection is suitable for medium to large size team. Pair development is originated from agile development, which is created for small size team [Boehm, 2003]. The success of pair development also relied on pair rotation [Williams, 2003]. The team members need to constantly rotate pair to transfer knowledge around the team. This required a good management to keep assigning the pair. For the large size team, the manager may not have enough time to manage pair rotation. For inspection, the 127 inspection meeting required at least 4 members to involve in the inspection. In the small size team, there may be not enough developers to participate in the inspection or the production may be interrupted since majority of the team involving in the inspection. The second factor is safety and criticality of the project. Inspection is better in handling the high criticality project since inspection is more predictable than the pair development. Inspection is over 30 years old unlike pair development, which is only practiced for about 10 years. There are numerous empirical studies on inspection [Ackerman, 1989], [Dion, 1993], [Kelly, 1992], [Myers, 1988], [Russell, 1991], [Weller, 1993]. The developer can predict how many defects escaped per LOC and what type of defects escaped in which phase. In addition, pair development has the weaknesses of detecting the interface between the systems and between the modules. Most of high criticality is required extensive system integration. Table 44: Summary of Critical Decision Factors Factor Pair Development Inspection Size Small to medium Medium to Large Criticality Untested on safety-critical products. 
Handle highly critical product Time to market Rapid development or high pressure time to market Low pressure time to market The last factor is time to market. It is clear from our experiments that pair development reduces the life cycle of software development due to the early 128 feedback loop. There is a dramatically reduction on rework costs. The product sends to the testing phase earlier as a result the product gets delivery earlier. In contrast, inspection has a longer feedback loop. The inspection cannot occur if the artifact has not been developed. From empirical data, it takes about 10 days after the artifacts have been developed to conduct the inspection meeting [Madachy, 1994]. This prolongs the software development lifecycle. Table 44 summarizes the critical decision factors associated with pair development and inspection character. The critical decision factors are depicted in a graphical view in Figure 35. This representation is originally used to represent the five critical factors to balance agility and plan-driven by Barry Boehm and Richard Turner [Boehm, 2003]. In the graph, each axe represents each critical factor. Size and Criticality’s scale are similar to the factors used to distinguish between the lightweight Crystal methods and heavier weight Crystal methods by Alistair Cockburn [Cockburn, 2002]. 129 Figure 35: Dimensions Affecting Verification Technique Selection The time to market’s axe is scaled using the market value-utility function. There are three value-utility function’s shapes [Huang, 2006]. The typical value- utility function’s shape is S-shape (shown in Figure 36a). This shape represents most of the commercial product that early system delivery capture market share ahead of the competition. The market share diminishes when the significant competitors enter the marketplace. The system value loose goes up dramatically at the critical region until reaching the diminishing points where there is little market share left. The second shape is step shape (shown in Figure 36b). This shape generally represents the value of software for an event. The system value becomes zero if the system cannot release on time, for example, the system for the Olympic game. If the system cannot deliver before the game starts, the system has no more value. 130 The last shape is low-shape linear (shown in Figure 36c). This shape represents the system that the users value small loss versus delivery time, for example, a low-criticality in house application, in which the user can use the existing system until the new one is delivered. Figure 36: Software Market Value-Utility Function [Huang, 2006] Figure 37 shows the example of the project, which should consider apply pair development as the verification technique. This example shows that there are about 5 members in the team. The project is not in the high critical level and the project is an event-based. Hence, there is a huge loss if the team cannot deliver the project on time. In other hands, Figure 38 shows the example of the inspection project. The team size considers large (> 100 members). The system causes many lives if there is a failure. There is no competitor in the market so the system has small loss if there is a delay in delivering the system. 131 Figure 37: Example of Pair Development Scenario Figure 38: Example of Inspection Scenario 132 6.2. 
6.2. Risk-Based Decision Framework

To assist developers and project managers in choosing a verification technique (either pair development or inspection), the risk-based framework defined by Boehm and Turner for balancing agility and plan-driven methods [Boehm, 2003] was adopted. The framework is driven by the risks of applying either pair development or inspection: the developer or project manager must be able to identify the risks of using each technique and their exposure. For example, if the risks of using inspection have the higher impact, the team should consider applying pair development instead of inspection. Candidate risks for pair development and inspection are identified in sections 6.2.1 and 6.2.2. The risk-based decision framework consists of four steps, illustrated in Figure 39.

Figure 39: Risk-Based Decision Framework

Step 1: Identify the Risks
Identify the risks that will occur when applying pair development and when applying inspection. The following sections list candidate risks for both techniques, identified from all of the experimental results, the post-experiment surveys, and the literature review. The lists are not necessarily complete or applicable to every project; new research data may reveal other potential risks.

Step 2: Rate and Compare the Risk Impacts
Evaluate each risk by rating the severity of its impact if it occurs. If the impact of each risk can be expressed as the dollars the project would lose, the impacts can be compared directly. If not, rate each risk on a 0-to-4 scale, where zero means a minimal risk and four means a serious, unmanageable risk.

Step 3: Select the Technique
If the inspection risks dominate, choose pair development. If the pair development risks dominate, apply inspection. If neither dominates, perform pair development on the parts of the project where the pair development risks have low impact and cover the remaining parts with inspection.

Step 4: Monitor and Control the Life Cycle
It is important to monitor and evaluate the impact of using either technique and to adjust if the risk impacts change. For example, if inspections are delaying the project schedule, the team should consider using inspection only on the highly critical modules and switching to pair development elsewhere.

6.2.1. Pair Development Risks

This section describes the risks that are specific to the use of pair development.

Pair compatibility
One success factor for effective pair programming is "pair-jelling" [Williams, 2000]. Williams stated that:

The pair must cease considering themselves as a two-programmer team and must start considering themselves as ONE coherent, intelligent organism working with ONE mind. [Williams, 2000]

To create pair-jelling, the two programmers must have similar behaviors, referred to as "pair compatibility": they think about similar things, talk in similar ways, and come from similar cultural backgrounds. In her pair compatibility study, Katira found that students are more compatible when working with partners of a similar skill level, and that pairs of different genders are likely to be less compatible [Katira, 2004]. In the post-survey, many programmers raised pair-compatibility issues.
One project manager from the SE graduate classroom experiment stated that:

"Avoiding the personal differences can speed up the pair development process."

Not enough skilled people to participate
As mentioned above, pairing two programmers of similar skill level increases pair compatibility. In some cases, however, the project requires many skill sets and only one programmer is an expert in each of them. In that case there are not enough skilled people to participate in pairing.

No third-party review
Developer blind spots occur when a developer has worked on a project for a long time and is familiar with the system. Defects caused by developer blind spots can be detected by having the artifacts reviewed by someone who does not own the task (a third party). By the nature of pair development, it is difficult to obtain such a third-party review: all members of the team rotate around to work on different parts of the project, so most members are already familiar with the system.

Active and dynamic team players
The success of pair development relies heavily on the team players. Performing pair development requires team players who are active and dynamic enough to carry out collaborative tasks. During a pair session there is continuous interaction within the pair: asking questions, explaining design details, and discussing alternative solutions. The pair also constantly switches roles between driver and observer, and each partner must be able to take the driver role whenever possible. Moreover, because of pair rotation, team members need the flexibility to keep working on different tasks.

Ego-less programmers
In her dissertation, Williams stated that "ego-less programming" is essential for effective pair programming. According to her study, a high ego can damage the collaborative relationship in two ways: first, it can block a programmer from considering other alternatives or solutions; second, a high-ego programmer becomes defensive when criticized [Williams, 2000].

6.2.2. Inspection Risks

This section describes the risks that are specific to the use of inspection.

Scheduling conflicts for the inspection meeting
The effectiveness of inspection relies heavily on two important meetings: the overview meeting and the inspection meeting. The objective of the overview meeting is to educate the inspectors about the artifacts to be inspected; the inspection meeting is held to identify possible defects, issues, or conflicting agreements. It is therefore important to have the key stakeholders participate in the inspection. Unfortunately, the key stakeholders are sometimes responsible for more than one project at a time, and scheduling four or five stakeholders for the same inspection meeting can cause conflicts. This leads to postponed meetings, which can delay the project schedule.

Not enough people to participate in an inspection
One inspection session requires four to six inspectors. In a small project, having four to six programmers participate in an inspection can interrupt the progress of the project.

No people skilled in doing inspection
Effective inspection requires inspection skill: the inspectors must understand the process well and understand the objective of each inspection role. The reader needs the ability to translate or paraphrase the artifacts. The tester needs a strong testing perspective: how will the artifacts be tested or traced back?
The moderator keeps the inspection meeting moving, avoids discussion of solutions, and must be able to measure the quality of the artifacts.

High cost due to the long feedback loop and overhead costs
Rework costs increase because of the long feedback loop, and overhead costs arise from the formal, structured process. For a small to medium project, this overhead can add as much as one-third to the development cost.

6.3. Example of a Risk-Based Decision

This section shows how to apply the risk-based decision framework by applying it to the project in the industry experiment.

The project in the industry experiment was a small project with a seven-person team. It was a commercial web application that allowed registered users to predict soccer scores, with the winner receiving money based on the winning points. The system provided a feature to interact with the telephone company's system so that users could be informed of updated scores and winning points. Since there were banking transactions, the system can be considered highly critical. Soccer is popular in Thailand, where people bet on scores by having a person record each prediction; there was no online system yet. The market value is very high if the system can be delivered before the soccer season, but the system has no value if it is delivered when there is no soccer season. Hence, the market value-utility function for this system has the step shape.

Figure 40 shows that the project's profile is a mixture between pair development and inspection. Because the system is highly critical, the project manager cannot use pair development alone to verify it: there is a risk that some critical defect could escape to the delivery phase. Hence, the risk-based framework is used to assist the selection.

Figure 40: Industry Experiment Project's Profile

Step 1: Identify the Risks
All of the pair development and inspection risks listed above were applicable to this project, and no other risks were identified, so all of the risks above are used.

Step 2: Rate and Compare the Risk Impacts
Tables 45 and 46 show the risk ratings and risk analysis for the industry experiment project. Neither the pair development risks nor the inspection risks dominate.

Step 3: Select the Technique
Since neither the pair development risks nor the inspection risks dominate, one technique cannot be chosen over the other. The project manager can choose a mixture of the two techniques by applying pair development in most parts of the project and using inspection in the critical modules, such as the banking system.

Step 4: Monitor and Control the Life Cycle
It is important for the project manager to monitor and evaluate the risks. For example, if the project manager finds that the pair compatibility risk rises to 2 or 3 because of the contractors, the contractors might not participate in pair development, and inspections might be scheduled for their tasks instead.

This example shows that the risk-based decision framework can be applied to a real-life project to help developers and project managers decide on the most cost-effective model for their projects. The risk analysis helps them determine how risky it is to perform pair development or inspection in their projects.
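As a small illustration of how steps 2 and 3 might be mechanized, the sketch below compares the mean 0-to-4 ratings of the two risk lists, using the ratings from Tables 45 and 46 below. The comparison rule (mean ratings and a dominance margin) and the function names are assumptions made for this sketch; the framework itself leaves the comparison to the project manager's judgment.

from statistics import mean

# Ratings copied from Tables 45 and 46 (industry experiment project).
pair_development_risks = {
    "pair compatibility": 1,
    "not enough skilled people": 2,
    "no third-party review": 3,
    "active and dynamic team players": 2,
    "ego-less programmers": 1,
}
inspection_risks = {
    "schedule conflict": 2,
    "not enough people": 2,
    "no inspection skill": 1,
    "high cost": 2,
}

def select_technique(pd_risks, insp_risks, margin=1.0):
    """Step 3: if one technique's risks clearly dominate, choose the other;
    otherwise suggest mixing the two techniques."""
    pd_score = mean(pd_risks.values())
    insp_score = mean(insp_risks.values())
    if pd_score - insp_score >= margin:
        return "inspection", pd_score, insp_score
    if insp_score - pd_score >= margin:
        return "pair development", pd_score, insp_score
    return ("mixed: pair development in most parts, inspection on critical modules",
            pd_score, insp_score)

if __name__ == "__main__":
    decision, pd_score, insp_score = select_technique(pair_development_risks, inspection_risks)
    print(f"mean PD risk = {pd_score:.2f}, mean inspection risk = {insp_score:.2f} -> {decision}")

With the ratings of Tables 45 and 46, the mean pair development risk (1.80) and the mean inspection risk (1.75) are nearly equal, so the sketch reaches the same conclusion as step 3 above: neither set of risks dominates, and a mixture of the two techniques is suggested.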
Table 45: Pair Development Risk Rating for the Industry Experiment Project

Risk item: Pair compatibility -- Rating: 1
Reason: The team members were compatible, since most of them had been working closely together for a couple of years, except for the two members who were contractors.

Risk item: Not enough skilled people -- Rating: 2
Reason: The developers were familiar with the system except for the banking module and the module that interacted with the telephone system; however, the company hired two contractors who were experts in those systems.

Risk item: No third-party review -- Rating: 3
Reason: The product and development departments needed to review the requirements and capabilities of the system to make sure it provided enough capability for the users. In addition to the completeness of the capabilities, the score-calculation algorithm needed to be reviewed to ensure the correctness of the transactions; since the scoring system was tied to banking transactions, a failure of the system could cost a great deal of money.

Risk item: Active and dynamic team players -- Rating: 2
Reason: The team members were used to working on more than one job at a time, which shows that they had the flexibility to work on different kinds of tasks. However, pair development was a new technique for all of them, even with training at the beginning, and it was unknown how they would cope with constantly rotating to different tasks.

Risk item: Ego-less programmers -- Rating: 1
Reason: The team members did not have high egos, and there was constant collaboration among the members of the project. However, the team had no prior working experience with the two contractors.

Table 46: Inspection Risk Rating for the Industry Experiment Project

Risk item: Schedule conflict -- Rating: 2
Reason: As mentioned above, all the team members were working on more than one project at a time, so it was difficult to schedule four or five people at the same time.

Risk item: Not enough people -- Rating: 2
Reason: Since there were only seven members on the team, performing an inspection could interrupt the progress of the project; half of the team would need to participate.

Risk item: No inspection skill -- Rating: 1
Reason: The team members were familiar with peer review and knew what types of defects to look for.

Risk item: High cost -- Rating: 2
Reason: Since the market pressure was extremely high, the longer feedback loop and the overhead of the formal, structured inspection process could lead to schedule delays.

Chapter 7. Conclusion

In this research, five experiments comparing pair development and software inspection are discussed. Each experiment controlled variables such as the environment and the project type. Three experiments were conducted in Thailand and two in the US to investigate the impact of cultural differences. The results show that pair development has the potential to become an effective V&V technique, especially in countries whose culture is similar to Thailand's.

The common result across the five experiments is that the pair development groups spent a greater percentage of their effort producing the artifacts and less effort reworking defects. This can be attributed to the early feedback cycle of pair programming, in which defects are found while the developers are producing the artifacts. Reducing the appraisal and failure costs during the design and construction phases is especially valuable for the software industry, since the product can be tested and integrated earlier. Table 47 summarizes the experiment results.
For the undergraduate experiment, the TDC of the pair development group was 24% less than that of the inspection group, with the same quality. Similarly, for the graduate experiment, the TDC of the pair development group was 30% less than that of the inspection group, with the same quality. For the industry experiment, the TDC of the pair development group was 4% more than that of the inspection group, with 39% fewer major defects. This difference may be due to the project in the industry environment being more complex than the classroom projects. However, for the inspection team in the industry experiment to achieve the same quality as the pair development team might have required more TDC than the pair development team spent.

Table 47: Summary of Quantitative Results
(costs in man-hours; PD = pair development group, FI = Fagan's inspection group)

Experiment                 Team        TDC      Production  Appraisal  Rework   # Test Defects
Undergraduate              PD (mean)   526.73   314.02      102.07     8.03     4.429
                           FI (mean)   695.11   309.23      234.97     43.72    5.142
Graduate                   PD (mean)   336.66   186.67      73.33      13.67    0
                           FI (mean)   482.50   208.50      165.00     45.00    0
First Industry             PD          1392.90  654.20      325.70     233.00   11 (major)
                           FI          1342.00  429.00      436.00     317.00   18 (major)
DR Graduate                PD (mean)   187.54   68.16       88.83      20.05    8.2
                           FI (mean)   237.93   62.82       122.10     42.52    8.5
SE Graduate                PD (mean)   282.45   171.13      92.63      18.70    N/A
                           FI (mean)   350.90   193.89      94.24      62.77    N/A

Since the team sizes were equal, the effect on calendar time is the same as the effect on effort. For example, in the student experiment results, reducing effort by 24% for the pair development team implies reducing calendar time by 24%. Pair development therefore offers the option of reducing the calendar time.
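The percentage figures quoted above follow directly from the TDC means in Table 47. The short Python sketch below reproduces the calculation; it is a minimal check, assuming the relative difference is computed as (FI - PD) / FI.

# Reproduces the TDC comparisons quoted above from the Table 47 means.
table_47_tdc = {
    "undergraduate experiment": {"PD": 526.73, "FI": 695.11},
    "graduate experiment":      {"PD": 336.66, "FI": 482.50},
    "industry experiment":      {"PD": 1392.90, "FI": 1342.00},
}

for experiment, tdc in table_47_tdc.items():
    diff = (tdc["FI"] - tdc["PD"]) / tdc["FI"] * 100.0
    if diff >= 0:
        print(f"{experiment}: pair development TDC is {diff:.0f}% less than inspection")
    else:
        print(f"{experiment}: pair development TDC is {-diff:.0f}% more than inspection")

Running the sketch yields approximately 24% less, 30% less, and 4% more, matching the figures reported above.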
The results of the experiments in Thailand and the US also show an impact of cultural differences attributable to the individualism index. Individualism describes the relationship between the individual and the cohesive in-group. Societies with strong individualism believe that the individual is important, and their members are self-oriented. Pair development, which requires continuous interaction between the developers, is more efficient in Thailand, which has a lower individualism index score, than in the US, where people generally require more personal space [Phongpaibul, 2005].

The experiment results also revealed factors that affect the cost-effectiveness of pair development versus inspection. For example, when the team is small and there is a time constraint, inspection may not be cost-effective; it is more cost-effective for the project manager to choose pair development as the peer review technique. Team size, safety criticality, and time to market were defined as the critical factors in the decision making. Pair development works well in low-criticality projects with small teams and high market pressure, while inspection works better in highly critical projects with large teams and low market pressure. The two practices can also be combined in a project whose profile is a mixture of the two. Boehm's risk-based decision framework [Boehm, 2003] was adopted as the framework for project managers to select the practice to use in a project. Because the framework relies heavily on the ability to identify risks, the pair development risks and inspection risks were identified from all of the experimental results, the survey results, and the literature review as a guideline. However, these lists are not always complete or applicable, and new research data may reveal other potential risks.

At this point, we can conclude that pair development can generally perform verification more cost-effectively than software inspection in small-project environments. To extend this conclusion to larger projects, further empirical assessment is necessary to fully understand the commonalities and differences between pair development and software inspection under given conditions. More replicated experiments in industrial environments are required to create the best guidelines for developers to follow.

Chapter 8. Future Work

This research can be extended in the following directions.

8.1. Replicate the Experiment in an Industrial Environment

To improve the decision framework, further empirical assessment is necessary to fully understand the commonalities and differences between pair development and software inspection under given conditions. More replicated experiments in industrial environments are required to create better guidelines for developers to follow.

8.2. Experiment Comparing the Impact of Each Technique in the Maintenance Phase

Williams suggested that pair programming reduces the risk of losing a key person on the project over time [Williams, 2000]. A new experiment comparing the impact of both practices in the maintenance phase would therefore also be interesting.

8.3. Simulation of a Pair Development Model

An extension of Madachy's system dynamics model of inspections [Madachy, 1994] would be useful for comparing the dynamics of pair development and inspection. The data from all of the experiments would be calibrated to the model. In addition, the effort per month and the actual schedules from previous and new experiments would be analyzed to study the relationship between effort and schedule for both the inspection and pair development groups. Figure 41 shows the expected results of the simulation: software development with pair development may spend more total effort, but the system is delivered to market faster than software development with inspection.

Figure 41: Software Development Spending Profiles with PD

8.4. Extend COCOMO II to Support Pair Development

COCOMO (Constructive Cost Model) is a tool for estimating the cost, effort, and schedule of a software development project. COCOMO II estimation is based on a calibration of 161 software development projects [Boehm, 2000], none of which performed pair development. In the future, it would be beneficial to extend COCOMO to support pair development projects by including empirical data from pair development projects in the calibration.

8.5. Automatic Decision Tool

When there are enough data to fully understand the commonalities and differences between pair development and software inspection under given conditions, researchers can perform a sensitivity analysis to extract all the factors that influence the effectiveness of each practice under given circumstances. A tool can then be created to suggest the best practice to perform under the given conditions. Such a tool would benefit developers and project managers.

References

[Ackerman, 1989] Ackerman, A.F., Buchwald, L.S., and Lewski, F.H. "Software Inspection: An Effective Verification Process," IEEE Software, Vol. 6, No. 3, May 1989, pp. 31-36.

[Arisholm, 2002] Arisholm, E.
“Design of Controlled Experiment on Pair Programming,” Proceedings, ISERN 2002, 2002. [Aurum, 2002] Aurum, A., Petersson, H., and Wohlin, C. “State-of-the-Art: Software Inspections after 25 Years,” Software Testing, Verification, and Reliability, Vol. 12, 2002, pp. 133-154. [Baheti, 2002] Baheti, P., Gehringer, E. and Stotts, D. “Exploring the Efficacy of Distributed Pair Programming,” Proceedings, XP/Agile Universe 2002, New York: Springer; 2002, pp. 208-220 [Blakely, 1991] Blakely, F.W. and Boles, M.E. “A Case Study of Code Inspection,” Hewlett-Packard Journal, Vol 42, No. 4, Oct.1991, pp.58-63. [Boehm, 1981] Boehm, B.W. “Software Engineering Economics,” Prentice Hall, New Jersey, 1981. [Boehm, 1987] Boehm, B.W. “Improving Software Productivity,” Computer, Vol. 20, No. 9, Sept. 1987, pp. 43-47. [Boehm, 1998a] Boehm, B., Port, D., Abi-Antoun, M., and Egyed, A., “Guidelines for the Life Cycle Objectives (LCO) and the Life Cycle Architecture (LCA) deliverables for Model- Based Architecting and Software Engineering (MBASE)”. USC Technical Report, 1998 [Boehm, 1998b] 152 Boehm, B., Port, D., Egyed, A., Abi-Antoun, M., “The MBASE Life Cycle Architecture Milestone Package: No Architecture is An Island”. In First Working IFIP Conference on Software Architecture (WICSA'1), 1998 [Boehm, 2000] Boehm, BW., Horowitz, E., Madachy, R., and et al, “Software Cost Estimation with COCOMO II,” Prentice Hall: January 2000. [Boehm, 2003] Boehm, B.W., and Turner, R. “Balancing Agility and Discipline,” Addison-Wesley: 2003. [Caver, 2004] Carver J. “Using Qualitative Methods in Software Engineering,” Presentation, International Advanced School of Empirical Software Engineering (IASESE), August 18th, Los Angeles, CA, 2004. [CEBASE, 2003] CEBASE, “eWorkshop on Software Inspections and Pair Programming Report,” December 2003, <http://www.cebase.org/www/home/index.htm>. [Cockburn, 2000] Cockburn, A., and Williams, L. “The Costs and Benefits of Pair Programming,” eXtreme Programming and Flexible Processes in Software Engineering XP2000, 2000. [Cockburn, 2002] Cockburn, A., “Agile Software Development,” Boston: Addison-Wesley; 1995. [Constantine, 1995] Constantine, L.L. “Constantine of Peopleware,” Yourdon Press. [Coplien, 1995] Coplien, J.O. and Schmidt D.C. “A Development Process Generative Pattern Language in Pattern Languages of Program Design,” MA: Addison-Wesley; 1995, pp. 183-237. [Ciolkowski, 2002] 153 Ciolkowski, M. and Hericko, M. “Study the Effect of Pair Programming,” Proceedings, ISERN, 2002. [Crosby, 1979] Crosby, P. “Quality Is Free: The Art of Making Quality Certain,” McGraw-Hill, New York, 1989. [CSSE-USC, 2006] Center for System and Software Engineering (CSSE) – University of Southern California, “USC CodeCount Toolset”, http://csse.usc.edu/research/CODECOUNT/ [Dabney, 2003] Dabney, J.B. “Return on Investment of Independent Verification and Validation Study Preliminary Phase 2B Report,” Fairmont, W.V.: NASA IV&V Facility, 2003. <http://sarpresults.ivv.nasa.gov/ViewResearch/289/24.jsp>. [Dion, 1993] Dion, R. “Process Improvement and the Corporate Balance Sheet,” IEEE Software, Vol. 10, No. 4, July 1993, pp. 28-35. [Doolan, 1992] Doolan, E.P. “Experience with Fagan’s Inspection Method,” Software – Practice and Experience, Vol. 22, No. 2, Feb. 1992, pp. 173-182. [Fagan, 1976] Fagan, M.E. “Design and Code Inspections to Reduce Errors in Program Development,” IBM Syst. J., Vol. 15, No. 3, 1976, pp. 181-211. [Fagan, 1986] Fagan, M.E. “Advances in Software Inspections,” IEEE Trans. Software Eng., Vol. 
12, No. 7, July 1986, pp. 744-751. [Fowler 1986] Fowler, P.J. “In-Process Inspections of Work Products at AT&T,” AT&T Technical Journal, Vol 65, No.2, Mar./Apr. 1986, pp. 102-112. 154 [Galin, 2004] Galin, D. “Software Quality Assurance from Theory to Implementation,” Addison- Wesley, 2004. [Gallis, 2003] Gallis, H., Arisholm, E. and Dyba, T. “An Initial Framework for Research on Pair Programming,” Proceedings, ISESE, 2003. [Gilb, 1993] Gilb, T. and Graham, D. “Software Inspection,” Addison-Wesley, Mass., 1993. [Glaser, 1967] Glaser, B.G., Strauss, A.L. “The discovery of grounded theory,” Hawthorne, NY: Aldine, 1967. [Graden, 1986] Garden, M.E., Horsley, P.S. and Pingel, T.C. “The Effects of Software Inspections on a Major Telecommunications Project,” AT&T Technical Journal, Vol 65, No.3, May/June 1986, pp. 32-40. [Gryna, 1998] Gryna, F.M. “Quality Costs,” Juran’s Quality Control Handbook 4 th ed., New York: McGrawHill, 1998. [Hall, 1976] Hall, E.T. “Beyond Culture,” New York, Anchor Books/Doubleday, 1976. [Hewitt, 2001] Hewitt-Taylor, J. “Use of Constant Comparative Analysis in Qualitative Research”, Nursing Standard, March 2001, pp.39-42, [Hofstede, 1997] Hofstede, G. “Culture and Organizations – Software of the Mind,” McGraw-Hill, 1997. 155 [Hofstede, 2001] Hofstede, G. “Culture’s Consequences -- Comparing Balues, Behaviors, Institutions and Organizations Across Nations,” Thousand Oaks, CA, Sage, 2001. [Humphrey, 1995] Humphrey, W.S., “A Discipline for Software Engineering,” Addison-Wesley, 1995. [IBM Research, 2002] IBM Research, “Orthogonal Defect Classification,” version 5.11, Center for Software Engineering, 2002. < http://www.research.ibm.com/softeng/ODC/DETODC.HTM > [Jirachiefpattana, 1996] Jirachiefpattana, W. “The Impact of Thai Culture on Executive Information Systems Development,” Proceedings, the 6th International Conference Theme 1, Globalizaion: Impact on and Coping Strategies in Thai Society, 14-17 October, Chiang Mai, Thailand, 1996, pp 97-110. [Katira, 2004] Katira, N.A., “Understanding The Compatibility of Pair Programmers,” Master Thesis, Department of Computer Science, North Carolina state University, 2004. [Kelly, 1992] Kelly, J.C., Sherif, J.S. and Hops, J. “An Analysis of Defect Densities Found During Software Inspections,” Journal, Systems and Software, Vol. 17, No. 2, Feb. 1992, pp. 111-117. [Kitchenham, 1986] Kitchenham, B.A., Kitchenham, A.P. and Fellows, J.P. “The Effects of Inspections on Software Quality and Productivity,” ICL Technical Journal, Vol. 5, No. 1, May 1986, pp. 112-122 [Krasner, 1998] Krasner, H. “Using The Cost of Quality Approach for Software Development,” Crosstalk, Nov. 1998. 156 [Kruchten, 2003] Kruchten, P. “The Rational Unified Process: An Introduction,” Addison-Wesley, 2003. [Lee, 2005] Lee, K., and Boehm, B. “Empirical Results from an Experiment on Value-Based Review (VBR) Processes,” Proceedings, ISESE, 17 – 18 November, 2005. [Madachy, 1994] Madachy, R.J. “A Software Project Dynamics Model for Process Cost and Risk Assessment,” Ph.D. Dissertation, Department of Industrial and Systems Engineering, University of Southern California, December 1994. [Madachy, 1996] Madachy, R.J. “System Dynamics Modeling of an Inspection-Based Process,” Proceeding of ICSE-18, 1996, pp. 376-386. [Mcdowell, 2002] Mcdowell, C., Werner, L., Bullock, H. and Fernald, F. “The Effects of Pair- Programming on Performance in an Introductory Programming Course,” Proceedings, the 33 rd SIGCSE technical symposium on computer science education, 2002, pp. 38-42. 
[Muller, 2001] Muller, M.M. and Tichy, W.F. “Case Study:Extreme Programming in a University Environment,” Proceedings, ICSE, 2001, pp. 537-544. [Myers, 1988] Myers, W. “Shuttle Code Achieves Very Low Error Rate,” IEEE Software, Vol. 5, No. 5, Sept. 1988, pp. 93-95. [Nagappan, 2003] Nagappan, N., Williams, L., Wiebe, E., Miller, C., Balik, S., Ferzli, M., Petlick, M. “Pair Learning: With an Eye Toward Future Success,” XP/Agile Universe 2003, 2003. 157 [Nawrocki, 2001] Nawrocki, J. and Wojciechowski, A. “Experimental Evaluation of Pair Programming,” Proceedings, ESCOM, 2001, pp 269-276. [Nosek, 1998] Nosek, J.T. “The Case for Collaborative Programming,” Communications of the ACM, 1998, pp. 105-108. [Phongpaibul, 2005] Phongpaibul, M. “Improving Quality through Software Process Improvement in Thailand: Initial Analysis,” Proceeding, 3-WoSQ, ICSE 2005, May 17th, 2005. [Phongpaibul, 2006] Phongpaibul, M., and Boehm, B.W. “An Empirical Comparison Between Pair Development and Software Inspection in Thailand,” Proceeding, ISESE 2006, Sept. 2006. [Phongpaibul, 2007] Phongpaibul, M, and Boehm, B.W. “A Replicate Empirical Comparison Between Software Development with Inspection and Pair Development,” Proceeding, ESEM 2007, Sept 2007. [Reeve, 1991] Reeve, J.T., “Applying the Fagan Inspection Technique,’ Quality Forum, Vol. 17, No 1, Mar. 1991, pp. 40-47. [Research Triangle Institute, 2002] Research Triangle Institute, “The Economic Impacts of Inadequate Infrastructure for Software Testing,” Ed. Dr. Gregory Tassey. RTI Project No. 7007.011. Washington, D>C.:National Institute of Standards and Technology, May 2002. <www.mel.nist.gov/msid/sima/sw_testing_rpt.pdf>. [Rostaher, 2002] Rostaher, M. and Hericko, M. “Tracking Test-First Pair Programming An Experiment,” Proceedings, XP/Agile Universe 2002. New York: Springer; 2002, pp. 174-184. 158 [Russell, 1991] Russell, G.W. “Experience with Inspection in Ultralarge-Scale Development,” IEEE Software, Vol. 8, No. 1, Jan. 1991, pp. 25-31. [Shull, 2002] Shull, F., Basili, V., Zelkowitz, M., Boehm, B., Brown, A.W., Port, D., Rus, I., and Tesoreiro, R. “What we have Learned about Fighting Defects,” Proceeding, International Conference on SW Metrics, June 2002. [Shirey, 1992] Shirey, G.C. “how Inspections Fail,” Proceedings, the 9 th International Conference on Testing Computer Software, 1992, pp. 151-159. [Siegel, 1988] Siegel, A.F. “Statistic and Data Analysis,” John Wiley & Sons, Singapore, 1988. [Slaughter, 1998] Slaughter, S.A., Harter, D.E., and Krishnan M.S. “Evaluating the Cost of Software Quality,” Communications of ACM, Vol. 41, No. 8, August 1998, pp. 67-73. [Succi, 2002] Succi, G., Marchesi, M., Pedrycz,W., Williams, L. “Preliminary Analysis of the Effects of Pair Programming on Job Satisfaction,” Fourth International Conference on eXtreme Programming and Agile Processes in Software Engineering (XP2002). [Thanasankit, 1999] Thanasankit, T., and Corbitt B. “Towards Understanding Managing Requirements Engineering - A Case Study of a Thai Software House,” Proceedings of Conference on Computers and Information Technology in Asia 99, September, Sarawak, East Malaysia, pp 993-1013, 1999. [Thanasankit, 2000] Thanasankit, T., and Corbitt B. “Cultural Context and its Impact on Requirements Elicitation in Thailand,” The Electronic Journal on Information Systems in Developing Countries, <http://www.ejisdc.org>, 2000. 159 [Weller, 1993] Weller, E.F. “Lessons from Three Years of Inspection Data,” IEEE Software, Vol. 10, No. 5, Sept. 1993, pp. 38-45. 
[Wernick, 2004] Wernick, P., and Hall, T. “The Impact of Using Pair Programming on System Evolution: a Simulation-Based Study,” Proceedings, ICSM’ 04, IEEE, 2004. [Wheeler, 1996] Wheeler, D.A., Brykczynski, B., and Meeson, R.N.Jr. “Software Inspection: An Industry Best Practice,” IEEE CS Press, Los Alamitos, CA, 1996. [Wiegers, 2001] Wiegers, K.E. “Peer Reviews in Software: A Practical Guide,” Addison-Wesley, 2001 [Williams, 2000] Williams, L. “The Collaborative Software Process,” PhD Dissertation, Department of Computer Science, University of Utah, 2000. [Williams, 2002] Williams, L., Wiebe, E., Yang, K., Ferzli, M., and Miller, C. “In Support of Pair Programming in the Introductory Computer Science Course,” Computer Science Education, September 2002. [Williams, 2003] Williams, L., and Kessler, R.R. “Pair Programming Illuminated,” Addison-Wesley, 2003. 160 Appendices Appendix A: Defect Type Definition Requirement Defect Type A requirements defect is an error in the definition of system functionality Correctness Wrongly stated requirements Examples: 1) An incorrect equation, parameter value or unit specification 2) A requirement not feasible with respect to cost, schedule and technology Completeness Necessary information is missing Examples: 1) Missing attributes, assumptions, and constraints of the software system. 2) No priority assigned for requirements and constraints. 3) Requirements are not stated for each iteration or delivery Consistency A requirements that is inconsistent or mismatched with other requirements Examples: 1) requirements conflict with each other 2) Requirements are not consistent with the actual operation environment (eg. Test, demonstration, analysis, or inspection) have not been stated. Traceability A requirement that is not traceable to or mismatched with the user needs, project goals, organization goals Design/Code Defect Type A design/code defect is an error in the design definition or code [IBM, 2002]. 161 Assignment/Initialization Value(s) assigned incorrectly or not assigned at all; but note that a fix involving multiple assignment corrections may be of type Algorithm. Examples: 1) Internal variable or variable within a control block did not have correct value, or did not have any value at all. 2) Initialization of parameters 3) Resetting a variable's value. 4) The instance variable capturing a characteristic of an object (e.g., the color of a car) is omitted. 5) The instance variables that capture the state of an object are not correctly initialized. Checking Errors caused by missing or incorrect validation of parameters or data in conditional statements. It might be expected that a consequence of checking for a value would require additional code such as a do while loop or branch. If the missing or incorrect check is the critical error, checking would still be the type chosen. Examples: 1) Value greater than 100 is not valid, but the check to make sure that the value was less than 100 was missing. 2) The conditional loop should have stopped on the ninth iteration. But it kept looping while the counter was <= 10. Algorithm/Method Efficiency or correctness problems that affect the task and can be fixed by (re)implementing an algorithm or local data structure without the need for requesting a design change. Problem in the procedure, template, or overloaded function that describes a service offered by an object. 
Examples: 1) The low-level design called for the use of an algorithm that improves throughput over the link by delaying transmission of some messages, but the implementation transmitted all messages as soon as they arrived. The algorithm that delayed transmission was missing. 2) The algorithm for searching a chain of control blocks was corrected to use a linear-linked list instead of a circular-linked list. 3) The number and/or types of parameters of a method or an operation are incorrectly specified. 4) A method or an operation is not made public in the specification of a class. Function/Class/Object 162 The error should require a formal design change, as it affects significant capability, end-user interfaces, product interfaces, interface with hardware architecture, or global data structure(s); the error occurred when implementing the state and capabilities of a real or an abstract entity. Examples: 1) A database did not include a field for street address, although the requirements specified it. 2) A database included a field for postal zip code, but it was too small to contain international postal codes as specified in the requirements. 3) A C++ or SmallTalk class was omitted during system design. Timing/Serialization Necessary serialization of shared resource was missing, the wrong resource was serialized, or the wrong serialization technique was employed. Examples: 1) Serialization is missing when making updates to a shared control block. 2) A hierarchical locking scheme is in use, but the defective code failed to acquire the locks in the prescribed sequence. Interface/O-O Messages Communication problems between: 1) modules 2) components 3) device drivers 4) objects 5) functions via 1) macros 2) call statements 3) control blocks 4) parameter lists Examples: 1) A database implements both insertion and deletion functions, but the deletion interface was not made callable. 2) The interfaces specify a pointer to a number, but the implementation is expecting a pointer to a character. 3) The OO-message incorrectly specifies the name of a service. 163 4) The number and/or types of parameters of the OO-message do not conform to the signature of the requested service. Relationship Problems related to associations among procedures, data structures and objects. Such associations may be conditional. Examples: 1) The structure of code/data in one place assumes a certain structure of code/data in another. Without appropriate consideration of their relationship, program will not execute or it executes incorrectly. 2) The inheritance relationship between two classes is missing or incorrectly specified. 3) The limit on the number of objects that may be instantiated from a given class is incorrect and causes performance degradation of the system. 164 Appendix B: Post-Questionnaire Part A: General Information A1. Are you: ____ Male (married) ____ Male (unmarried) ____ Female (married) ____ Female (unmarried) A2. How old are you: _____ Under 20 _____ 20 - 24 _____ 25 - 29 _____ 30 – 34 _____ 35 – 49 _____ 40 or over A3. How many years of formal school education (or their equivalent) did you complete (starting with primary school): _____ 10 years or less _____ 11 years _____ 12 years _____ 13 years _____ 14 years _____ 15 years _____ 16 years _____ 17 years _____ 18 years or over 165 A4. What is your nationality? _________________________________ A5. And what was your nationality at birth (if different)? 
_______________________ 166 Part B: Work Belief Information Question A1 – A16 About your goals: People differ in what is important to them in a job. In this section, we have listed a number of factors which people might want in their work. We are asking you to indicate how important each of these is to you. In completing the following section, try to think of those factors which would be important to you is an ideal job. Note: Although you may consider many of the factors listed as important, you should use the rating “of utmost importance” only for those items which are of the most importance to you. With regard to each item, you will be answering the general question: “HOW IMPORTANT IS IT TO YOU TO …” (mark one of each line across) Of utmost importance to me Very importance Of moderate importance Of little importance Of very little or no importance B1. Have Challenging work to do- work from which you can get a personal sense of accomplishment? 1 2 3 4 5 B2. Live in an area desirable to you and your family? 1 2 3 4 5 B3. Have an opportunity for high earning? 1 2 3 4 5 B4. Work with people who cooperate well with one another? 1 2 3 4 5 B5. Have training opportunities (to improve your skills or to learn new skills)? 1 2 3 4 5 167 Of utmost importance to me Very importance Of moderate importance Of little importance Of very little or no importance B6. Get the recognition you deserve when you do a good job? 1 2 3 4 5 B7. Have good physical working conditions (good ventilation and lighting, adequate work space, etc.)? 1 2 3 4 5 B8. Have considerable freedom to adopt your own approach to the job? 1 2 3 4 5 B9. Have the security that you will be able to work for your company as long as you want to? 1 2 3 4 5 B10. Have an opportunity for advancement to higher level jobs? 1 2 3 4 5 B11. Have an element of variety and adventure in the job? 1 2 3 4 5 B12. Have a good working relationship with your manager or your direct superior? 1 2 3 4 5 B13. Fully use your skills and abilities on the job? 1 2 3 4 5 B14. Have a job which leaves you sufficient time for your personal or family life? 1 2 3 4 5 B15. Be consulted by your manager or your direct superior in his/her decisions? 1 2 3 4 5 168 Question B17: The descriptions below apply to four different types of managers. First, please read through these descriptions: Manager 1 : Usually makes his/her decisions promptly and communicates them to his/her subordinates clearly and firmly. Expects them to carry out the decisions loyally and without raising difficulties. Manager 2 : Usually makes his/her decisions promptly, but, before going ahead, tires to explain them fully to his/her subordinates. Gives them the reasons for the decisions and answers whatever questions they may have. Manager 3 : Usually consults with his/her subordinates before he/she reaches his/her decisions. Listens to their advice, considers it, and then announces his/her decision. He/she then expects all to work loyally to implement it whether or not it is in accordance with the advice they gave. Manager 4 : Usually calls a meeting of his/her subordinates when there is an important decision to be made. Puts the problem before the group and tries to obtain consensus. If he/she obtains consensus, he/she accepts this as the decision. If consensus is impossible, he/she usually makes the decision him/herself. B16. Now for the above type of manager, please mark the one which you would prefer to work under? _____ Manager 1 _____ Manager 2 _____ Manager 3 _____ Manager 4 169 B17. 
If you had a choice of promotion to either a managerial or a specialist position and these jobs were at the same salary level, which would appeal to you most? _____ I would have a strong preference for being a specialist _____ I would have some preference for being a specialist _____ It des not make any difference _____ I would have some preference for being a manager _____ I would have a strong preference for being a manager B18. All in all, what is your personal feeling about working for a company which a primarily foreign-owned? _____ All in all, I prefer it this way _____ It makes no difference to me one way or the other _____ I would prefer that it was not this way B19. How do you feel or think you would feel about working for a manager who is from a country other than your own? _____ In general, I would prefer to work for a manager of my nationality _____ Nationality would make no difference to me _____ In general, I would prefer to work for a manager of a different nationality 170 Question B21 – B24 : In your country, how frequently, in your experience, do the following problems occur (mark one of each line across) ? Very frequently Frequently Sometimes Seldom Very seldom B20. Employees begin afraid to express disagreement with their managers 1 2 3 4 5 B21. Being unclear on what your duties and responsibilities are 1 2 3 4 5 B22. People above you getting involved in details of your job which should be left to you 1 2 3 4 5 B23. Some groups of employees looking down upon other groups of employees 1 2 3 4 5 Question B25 – B38: About General Beliefs: Indicates the extent to which you personally agree or disagree with each of these statements (mark one of each line across). Strongly agree Agree Undecided Disagree Strongly disagree B24. A corporation should have a major responsibility for the health and welfare of its employees and their immediate families 1 2 3 4 5 B25. Having interesting work to do is just as important to most people as having high earnings 1 2 3 4 5 B26. Competition among employees usually does more harm than good 1 2 3 4 5 171 Strongly agree Agree Undecided Disagree Strongly disagree B27. Employees lose respect for a manager who asks them for their advice before he makes a final decision 1 2 3 4 5 B28. Employees in industry should participate more in the decisions made by management 1 2 3 4 5 B29. Decisions made by individuals are usually of higher quality than decisions made by groups 1 2 3 4 5 B30. A corporation should do as much as it can to help solve society’s problems (poverty, discrimination, pollution, etc.) 1 2 3 4 5 B31. Staying with one company for a long time is usually the best way to get ahead in business 1 2 3 4 5 B32. Company rules should not be broken even when the employee thinks it is in the company’s best interests 1 2 3 4 5 B33. Most employees in industry prefer to avoid responsibility, have little ambition, and want security above all 1 2 3 4 5 B34. Most people can be trusted 1 2 3 4 5 B35. One can be a good manager without having precise answers to most questions that subordinates may raise about their work 1 2 3 4 5 B36. An organization structure in which certain subordinates have two bosses should be avoided at all cost 1 2 3 4 5 B37. When people have failed in life it is often their own fault 1 2 3 4 5 172 Part C: Personal Belief Information Question C1 – C4 In your private life, how important is each of the following to you (mark one of each line across)? 
of utmost importance to me Very important Of moderate importance Of little importance Of very little or no importance C1. Personal steadiness and stability 1 2 3 4 5 C2. Thrift 1 2 3 4 5 C3. Persistence (perseverance) 1 2 3 4 5 C4. Respect for tradition 1 2 3 4 5 C5. How often do you feel nervous or tense at work? _____ Never _____ Seldom _____ Sometimes _____ Usually _____ Always 173 Part D: Project Experience D1. How important do you feel it is that a defect is found early, in any moment, compared with found but at a later moment? _____ Not at all _____ Little _____ Fairly _____ Much _____ Absolute D2. What technique did you used to verification your project? _________________________ (Pair Programming or Inspection) D3. Compare to your previous experience, how efficient do you believe that Pair Programming or Inspection can be to find/prevent early defects producing documents, code etc.? _____ Not at all _____ Little _____ Fairly _____ Much _____ Absolute D4. Compare to your previous experience, how cost-effective do you believe that Pair Programming or Inspection can be (compare quality of product vs. time spend)? _____ Not at all _____ Little _____ Fairly _____ Much _____ Absolute 174 Question D5 – D7: About Pair Programming Experience. (Inspection group skip to Q11) From your experience with Pair Programming, D5. Rank (1 – 6) the requirement’s defect types which Pair Development can easily detect. 1 is the easiest to detect using Pair Development 6 is the hardest to detect using Pair Development _____ Clarity _____ Compliance _____ Consistency _____ Correctness _____ Level of Detail _____ Traceability D6. Rank (1 – 7) the design/code’s defect types which Pair Design/ Programming can easily detect. 1 is the easiest to detect using Pair Design/ Programming 7 is the hardest to detect using Pair Design/ Programming Note: Make sure you understand the definition of each defect type before rating. _____ Assignment/ Initialization _____ Algorithm/ Method _____ Checking _____ Function/ Class/ Object _____ Interface/ O-O Mess _____ Relationship _____ Timing/ Serialization 175 D7. Mark three options you think are most important to use Pair Development into get the highest benefit/profit in a project in total. _____ Pre-study phase _____ Requirement engineer _____ All documentation _____ Architecture _____ Low-level design _____ In coding _____ Unit-tests creation _____ System-tests creation _____ Run unit tests _____ Run system tests _____ Defect code correction Question D8 – D10: About Inspection Experience. (Pair Development group skip to Q11) From your experience with Inspection, D8. Rank (1 – 6) the requirement’s defect types, which Inspection can easily detect. 1 is the easiest to detect using Inspection 6 is the hardest to detect using Inspection _____ Clarity _____ Compliance _____ Consistency _____ Correctness _____ Level of Detail _____ Traceability 176 D9. Rank (1 – 7) the design/code’s defect types which Inspection can easily detect. 1 is the easiest to detect using Inspection 7 is the hardest to detect using Inspection Note: Make sure you understand the definition of each defect type before rating. _____ Assignment/ Initialization _____ Algorithm/ Method _____ Checking _____ Function/ Class/ Object _____ Interface/ O-O Mess _____ Relationship _____ Timing/ Serialization D10. Mark three options you think are most important to use Inspection in to get the highest benefit/profit in a project in total. 
_____ Pre-study phase _____ Requirement engineer _____ All documentation _____ Architecture _____ Low-level design _____ In coding _____ Unit-tests creation _____ System-tests creation _____ Run unit tests _____ Run system tests _____ Defect code correction
Abstract
Peer review is one of the essential activities in software quality assurance since peer reviews can detect and remove defects in the early stages of the software development life cycle. Removing defects early reduces the cost of defect rework later. Selecting a peer review methodology (e.g., inspection, walkthrough, checklist-based, defect-based, function-based, perspective-based, usage-based, value-based) to execute in a software project is difficult. The developers have to understand the commonalities and differences of each methodology. They need to know the relative strengths and weaknesses of these practices. However, very few studies have compared the commonalities and differences of each peer review methodology and none of the studies have shown an empirical comparison between pair programming and software inspection.