ARCHITECTURE AND APPLICATION OF AN AUTONOMOUS ROBOTIC SOFTWARE ENGINEERING TECHNOLOGY TESTBED (SETT)

by

Alexander K. Lam

A Dissertation Presented to the
FACULTY OF THE GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(COMPUTER SCIENCE)

December 2007

Copyright 2007 Alexander K. Lam

Dedication

To my parents and sister, who have always supported and loved me; to my grandmother, whom I lost this year – I wish you could have been here for my graduation.

Acknowledgements

First and foremost, I would like to thank my parents for providing me with support and love and for teaching me to always work hard and never give up on my dreams. The way you live your lives is the model I always strive to achieve. I thank my sister for helping me get settled in my life here in Los Angeles. Without you, I wouldn't have been able to survive the jungle of Los Angeles.

I deeply thank Dr. Barry Boehm for guiding me in my research studies. I value all the advice you have given me during my years here at USC, and I thank you for teaching me so much about software engineering. I thank my dissertation committee for providing me guidance in my research. I am thankful for the people who helped me on the SCRover testbed project, including the NASA JPL-MDS and HDC-NSF technologists groups. Without your help, I would never have been able to do the SCRover project.

Finally, I would like to thank my friends at USC-CSSE who went through the Ph.D. program with me, especially the ones in SAL 329. You provided me with support and friendship as we worked towards our degrees.

Table of Contents

Dedication ii
Acknowledgements iii
Table of Contents iv
List of Tables x
List of Figures xii
Abstract xvi
Chapter 1 Introduction 1
1.1 Motivation 1
1.2 Research Statement 2
1.3 Research Strategy 3
Chapter 2 Background - Nature and Benefits of Technology Testbeds 6
2.1 Current Testbed Definition 6
2.2 Software Engineering Technology Testbed Definition 6
2.3 What distinguishes software engineering technology testbeds (SETTs) from other classes of testbeds and software-intensive systems? 8
2.4 How do the distinguishing objectives and constraints determine a SETTs' content and architecture? 10
2.5 A Look at Testbeds 12
2.5.1 Ada Compiler Evaluation System (ACES) Testbed 12
2.5.2 Testbed competitions in the software multi-agent field 13
2.5.3 Software Engineering Competitions 14
2.5.4 NASA Software Dependability Testbeds 15
2.6 Benefits of testbeds 16
2.7 Hazards of testbeds 18
2.8 Difficulties of getting new technologies adopted 20
2.9 How researchers develop and mature their technologies 21
2.9.1 Why measure maturity? 21
2.10 How practitioners choose technology? 22
2.10.1 Technology Maturity Measurement: TRLs 23
2.10.2 What are the advantages and disadvantages of using TRL? 24
2.11 NASA HDCP Testbeds 25
2.11.1 What was HDCP? 25
2.11.2 HDCP Testbed Objectives 25
2.11.3 HDCP Testbed Milestone 28
2.11.4 HDCP Golden Gate Testbed 29
2.11.5 HDCP Dependable Automated Air Traffic Management Testbed 30
2.11.6 SCRover Testbed 32
Chapter 3 Objectives of the Software Engineering Technology Testbed 34
3.1 Software Engineering Technology Testbed Operational Concepts 36
3.2 Technologies Evaluated 40
3.3 Testbed Stakeholders' Objectives 42
3.4 What do the stakeholders need to meet their objectives?
42 3.5 Literature Review of Requirements 44 Chapter 4 Requirements and Architecture of the Software Engineering Technology Testbed 48 4.1 CBSP Approach 48 4.1.1 CBSP Introduction 48 4.1.2 Results of CBSP Process 49 4.2 DSSA Introduction 55 4.2.1 Domain Model 55 4.2.2 Reference Requirements 68 4.2.3 Reference Architecture 70 4.3 Results of CBSP vs. DSSA 106 4.4 Requirements Mapped to Published References 106 vi Chapter 5 Software Engineering Technology Testbed Analysis 109 5.1 Comparison of Software Engineering Technology Testbeds to Other Testbeds 109 5.2 Were testbed users’ needs met by the architecture? 119 5.3 How do software engineering technology testbeds answers the challenges of technology adaptation 121 5.4 How do software engineering technology testbeds answers the challenges the hazards of competitions 122 5.5 How do software engineering technology testbeds answers the obstacles of other testbeds 124 5.6 How to Configure a SETT 126 Chapter 6 The SCRover Testbed Life Cycle Architecture Package 128 6.1 SCRover results chain, system boundary, and operational concept 128 6.2 Explanation of MDS technology 131 6.2.1 State Analysis Engineering Process 132 6.2.2 MDS Framework 132 6.2.3 Operational Tools 132 6.3 USC Mission Adaptation Code 133 6.3.1 Software - Player/Gazebo 133 6.3.2 Hardware – Pioneer 2-AT 134 6.3.3 SCRover testbed architecture, specification, and code 134 6.3.4 Robot Missions 135 6.3.5 State knowledge 136 6.3.6 State control 136 6.3.7 Hardware proxy 137 6.3.8 State determination 137 6.4 SCRover Testbed Architecture Tradeoffs and Decisions 138 6.5 SCRover Testbed Architecture 141 6.5.1 Code 143 6.5.2 Specifications and Data 144 6.5.3 Predefined Packages 146 6.5.4 Guidelines - Manuals 146 6.5.5 Mission scenarios 147 6.5.6 Seeded Defects 148 6.5.7 Instrumentation 151 6.5.8 Software Simulator 152 6.5.9 Visual Query Interface (VQI) 153 6.5.10 Past Experiments and their results 154 6.5.11 System and Component Properties 155 6.6 Resulting SCRover testbed features 155 6.7 Summary of Testbed Operational Concepts mapped to SCRover Testbed Features 157 Chapter 7 Fault-Tree Analysis 158 vii Chapter 8 Extending Other Testbeds to Software Engineering Technology Testbeds 163 8.1 Comparing Testbed’s Architecture 163 8.2 Extending Other Testbeds to be Software Engineering Technology Testbeds 164 8.2.1 Conditions under which a system can be transformed to a dependability testbed: 164 8.3 Extending RoboCup 165 8.3.1 RoboCup Scenarios 166 8.3.2 Seeded Defects 166 8.3.3 How to assess 167 8.3.4 RoboCup Code and Platforms 168 8.3.5 RoboCup Specifications 168 8.3.6 RoboCup Defects 168 8.3.7 RoboCup Instrumentation 168 8.4 Extending TAC 169 8.4.1 TAC Scenario 169 8.4.2 Seeded Defects 169 8.4.3 How to assess 170 8.4.4 TAC Code and Platforms 170 8.4.5 TAC Specifications 171 8.4.6 TAC Defects 171 8.4.7 TAC Instrumentation 171 8.5 Extending DARPA and ISPW 171 viii Chapter 9 SCRover Testbed Architecture and other Technologies Evaluation 173 9.1 Mae Evaluation 173 9.1.1 Mae technology summary 173 9.1.2 Experimental application of Mae technology to SCRover 174 9.1.3 Mae Experimental results 175 9.1.4 Conclusion of using SCRover Testbed with Mae 178 9.2 AcmeStudio Evaluation 179 9.2.1 AcmeStudio Technology Summary 179 9.2.2 Experimental application of AcmeStudio technology to SCRover 180 9.2.3 AcmeStudio Experiment Results 181 9.3 Maude Evaluation 183 9.3.1 Maude Technology Summary 183 9.3.2 Experimental application of Maude technology to SCRover 183 9.3.3 Maude Experiment Results 184 9.4 ROPE 
Evaluation 186 9.4.1 ROPE Technology Summary 186 9.4.2 Experimental application of ROPE technology to SCRover 186 9.4.3 ROPE Lessons Learned 189 9.5 STRESS Evaluation 191 9.5.1 STRESS Technology Summary 191 9.5.2 Experimental application of STRESS technology to SCRover 191 9.6 STAR Evaluation 193 9.6.1 STAR Technology Summary 193 9.6.2 Experimental application of STAR technology to SCRover 193 9.6.3 How to Extend SCRover to meet STAR’s needs 194 9.7 Synergy between Evaluations 195 9.8 SCRover Limitations 196 9.9 Researchers’ Summary Results 203 Chapter 10 SCRover Testbed Implementation and Performance Analysis 205 10.1 Cost and Defect Analysis 205 10.1.1 Cost Analysis 206 10.1.2 Analysis of Defects Found vs. Effort 207 10.2 How well do SETTs work across a range of technologies? 211 10.2.1 Software Architecture Defect Analysis 212 10.2.2 Software Source Code Defect Analysis 213 10.2.3 Software Requirements Defect Analysis 216 ix Chapter 11 Conclusion 219 11.1 Did users benefit from the testbed’s architecture and capabilities? 219 11.2 Were testbed operational concepts met? 221 11.3 Principles and Practices Revisited 225 11.4 How well do SETTs answer challenges of technology adaptation 226 11.5 How well do SETTs meet the hazards of competitions? 227 11.6 Do SETTs help increase technology maturity? 228 11.7 Results to Date 228 11.8 Conclusions and Lessons Learned 230 Bibliography 235 Appendices 243 Appendix A: Technology Readiness Levels Summary 244 Appendix B: JPL-MDS Export Control Clearance Form 245 Appendix C: Testbed Survey 246 Appendix D: Technologies Examined for the HDCP Program 248 x List of Tables Table 1: Technology families to be evaluated 41 Table 2: Mapping between User' s Needs and Testbed Concepts 44 Table 3: Mapping btw. Operational Concepts, Architecture, and Requirements 50 Table 4: Object Model 66 Table 5: Specifications Component 83 Table 6: Code Component 85 Table 7: Seeded Defect Engine 87 Table 8: Defect Pool 90 Table 9: Scenario/Mission Generator 91 Table 10: Platform 93 Table 11: Instrumentation 94 Table 12: Project Data 96 Table 13: Guidelines-Manuals 97 Table 14: Technology Evaluation Result 99 Table 15: Experience Base 100 Table 16: Defect-Reducing Technology 102 Table 17: User-Interface 103 Table 18: Requirements Mapped to Published References 107 Table 19: Evaluation Context and Criteria, and Problem Scope Comparison 110 Table 20: Cost, Researchers’ and Practitioners’ Feedback Comparison 111 Table 21: Specifications, Code, and Seeded Defects Comparison 112 Table 22: Instrumentation and Mission Generator Comparison 113 xi Table 23: Feedback, Multiple Goals and Contributors Comparison 113 Table 24: Searchable and Combinable Results and Low Cost Comparison 114 Table 25: Mapping between User' s Needs and Architecture 119 Table 26: System and Component Properties 155 Table 27: Testbed Operational Concept mapped to Testbed Features 157 Table 28: Seeded defect estimate of remaining defect distribution 177 Table 29: Technology Evaluation Cost 205 Table 30: HDCP Technologies to be Evaluated 248 xii List of Figures Figure 1: TSAFE Architecture 31 Figure 2: Software Engineering Technology Testbed Reference Architecture 52 Figure 3: Researchers' Operational Scenario 58 Figure 4: Practitioners' Operational Scenario 60 Figure 5: Testbed System Context Diagram 62 Figure 6: Practitioners' ER Diagram 63 Figure 7: Researchers' ER Diagram 63 Figure 8: Specifications' ER Diagram 64 Figure 9: Technology Evaluation Result ER Diagram 64 Figure 10: Data Flow Diagram involving 
Practitioner and Researcher 65 Figure 11: Data Flow Diagram involving Researching Setting Up an Evaluation 66 Figure 12: Mapping between Problem Space and Solution Space 70 Figure 13: Use-Case Diagram 71 Figure 14: Sequence Diagram for Adopt Technology from Researcher 72 Figure 15: Sequence Diagram for Adopt Off-the-Shelf Technology 73 Figure 16: Sequence Diagram for Evaluate Technology 74 Figure 17: Sequence Diagram for Configure Testbed (1) 75 Figure 18: Sequence Diagram for Configure Testbed (2) 76 Figure 19: Sequence Diagram for Mature Technology 77 Figure 20: Sequence Diagram for Mature Technology Faster 78 Figure 21: Simple Layered Reference Architecture Model 79 Figure 22: Reference Architecture of the Presentation and Business Layers 81 xiii Figure 23: Persistence Layer Artifacts 82 Figure 24: Internal Class Diagram for Specifications 84 Figure 25: Sequence Diagram for Specifications 84 Figure 26: Internal Class Diagram for Code 86 Figure 27: Sequence Diagram for Code 86 Figure 28: Internal Class Diagram for Seeded Defect Engine 87 Figure 29: Sequence Diagram for Seeding Defect Engine (1) 88 Figure 30: Sequence Diagram for Seeding Defect Engine (2) 89 Figure 31: Internal Class Diagram for Defect Pool 90 Figure 32: Sequence Diagram for Defect Pool 91 Figure 33: Internal Class Diagram for Scenario Generator 92 Figure 34: Sequence Diagram for Scenario Generator 92 Figure 35: Internal Class Diagram for Platform 93 Figure 36: Sequence Diagram for Platform 94 Figure 37: Internal Class Diagram for Instrumentation 95 Figure 38: Sequence Diagram for Instrumentation 95 Figure 39: Internal Class Diagram for Project Data 96 Figure 40: Sequence Diagram for Project Data 97 Figure 41: Internal Class Diagram for Guidelines-Manual 98 Figure 42: Sequence Diagram for Reading Guidelines 98 Figure 43: Sequence Diagram for Reading Manuals 98 Figure 44: Sequence Diagram for Reading FAQ’s 99 Figure 45: Internal Class Diagram for Technology Evaluation Result 100 xiv Figure 46: Internal Class Diagram for Experience Base 101 Figure 47: Sequence Diagram for Practitioner' s Use of Experience Base 101 Figure 48: Sequence Diagram for Researcher' s Use of Experience Base 102 Figure 49: Internal Class Diagram for Defect-Reducing Technology 102 Figure 50: Sequence Diagram for Defect-Reducing Technology 103 Figure 51: Internal Class Diagram for User Interface 104 Figure 52: Sequence Diagram for Practitioners’ User Interface 104 Figure 53: Sequence Diagram for Researchers’ User Interface (1) 105 Figure 54: Sequence Diagram for Researchers’ User Interface (2) 105 Figure 55: Sequence Diagram for Researchers’ User Interface (3) 105 Figure 56: Results Chain Diagram for SCRover System (Increment 3) 128 Figure 57: System Boundary and Environment Context Diagram of SCRover 129 Figure 58: SCRover Operational Concept 131 Figure 59: SCRover architecture 135 Figure 60: SCRover Testbed Reference Architecture 142 Figure 61: Defect Data for SCRover artifacts 151 Figure 62: Gazebo Simulator 153 Figure 63: VQI 154 Figure 64: Fault-Tree Analysis 159 Figure 65: Testbed Comparison by Components 163 Figure 66: Current RoboCup and TAC architecture 165 Figure 67: Mae defect detection yield by type 176 Figure 68: ER1 Rover 187 xv Figure 69: ROPE experiment 188 Figure 70: Failure Scenario detected by STRESS 192 Figure 71: Rover Stuck in a Loop 193 Figure 72: Project Overview 198 Figure 73: Homeground for Architecture Technologies 200 Figure 74: Homeground for Requirement Engineering Technologies 201 Figure 75: Mae/AcmeStudio/Peer Review 
Results 207 Figure 76: Architecture Defects Found vs. Effort 208 Figure 77: Major Defects Found vs. Effort 209 Figure 78: Minor Defects Found vs. Effort 209 Figure 79: Mae/Acme/Peer Review Results 223 xvi Abstract This research provides a new way to develop and apply a new form of software: software engineering technology testbeds designed to evaluate alternative software engineering technologies, and to accelerate their maturation and transition into project use. Software engineering technology testbeds include not only the specifications and code, but also the package of instrumentation, scenario drivers, seeded defects, experimentation guidelines, and comparative effort and defect data needed to facilitate technology evaluation experiments. The requirements and architecture to build a software engineering technology testbed has been developed and applied to evaluate a wide range of technologies. The technologies evaluated came from the fields of architecture, testing, state-model checking, and operational envelopes. The testbed evaluation showed (1) that certain technologies were complementary and cost-effective to apply to mission-critical systems; (2) that the testbed was cost-effective to use by researchers; (3) that collaboration in testbed use by researchers and the practitioners resulted in actions to accelerate technology maturity and transition into project use; and (4) that the software engineering technology testbed’s requirements and architecture were suitable for evaluating technologies and to accelerate their maturation and transition into project use. 1 Chapter 1 Introduction 1.1 Motivation When an organization like NASA has to send its software to space, the software has to be dependable. If the software does the wrong thing, it could mean the end of the mission, which would be a huge loss to the organization. However, developing defect-free software is a complex problem. There are many technologies available to help software engineers identify defects, but choosing the right one can be difficult. There are many questions to be asked of the technology such as “How does one know if the technology does as it states”, “Is the technology mature enough for use”, and “Will the technology work on my system?” Furthermore, technology researchers face their own problems in getting their technology adopted. For dependability researchers, they have to prove that their technology can help improve software dependability (decrease defects in the system), that their technology is the right one for them, and that the technology will work for the end user as promised. For research sponsors, they would like to see the technologies they invest in, mature as soon as possible for a faster return on their investment. However, according to Redwine and Riddle, the current process of having researchers develop new technologies and getting it adopted by users normally takes around 18 years [Redwine and Riddle 1985]. A faster process to mature and evaluate technologies is needed for the new technologies being developed today since 2 software needs to be dependable today, like in the case of NASA, which needs its flight software to be dependable. 1.2 Research Statement The goal of my research is to formulate and evaluate a set of principles and practices for developing software engineering technology testbeds that organizations can use to identify good technologies to use in their system development. 
Researchers will be able to use a testbed developed with these principles and practices to demonstrate that their technology is a good fit for an organization’s software system development. In addition, using such a testbed will help an organization evaluate a whole range of technologies and allow them to compare the results of the technology evaluations across common mission scenarios. Furthermore, such a testbed will help users determine under what conditions the technology works best in. For example, does it works well for a small program but not scale up to a large software system. The results of this evaluation can be written into a technology evaluation report that will be part of an experience base. The organization of the experience base will not be part of my research problem, but it will have an effect in terms of what data a researcher will need to collect during the evaluation. I have developed, evaluated, and evolved the principles and practices by using the Mission Data System (MDS) [Dvorak and et al. 2000] technology created by NASA-JPL to develop an instance of a software engineering technology testbed, SCRover. Users of the SCRover testbed include Dr. Roshanak Roshandel and Dr. Sandeep Gupta from USC, Dr. David Garlan of 3 Carnegie Mellon University, Dr. Carolyn Talcott of SRI, and Dr. Steve Fickas of the University of Oregon. The software engineering technology testbed will be different from other “testbeds” in terms of the added features other classes of testbeds do not have, such as detailed specifications, seeded defects and a repository of prior experiences. Other differences include how technologists will use the testbed and the objectives of the testbed such as buying down the risks of new technologies and acceleration in technology maturation. In addition, the testbed has been created to test a wide range of dependability technologies. Currently, researchers have performed evaluations on the testbed in the areas of software architecture, operational envelopes, testing, and state model checking. Through collaborations with researchers from the University of Oregon, Stanford University, Carnegie Mellon University, and the University of Southern California, I will discuss the extent to which the researchers’ technologies improved after being applied to the SCRover testbed and the extent to which the testbed was able to demonstrate to organizations that the researchers’ technology would be applicable to their software systems and help the organization increase its software system’s dependability. 1.3 Research Strategy In order to determine what are the principles and practices for developing software technology evaluation testbeds, I began by interviewing software engineers and technology researchers to find out what their objectives are when they use a testbed to evaluate technologies. Next, these objectives were 4 referenced against other researchers’ published experiences in order to validate the set of requirements. In addition, I found out what kinds of technologies were to be evaluated. Next, I used the user objectives to generate the testbed’s operational concepts. From the operational concepts, the requirements of the software technology evaluation testbed are generated. Afterwards, I use the requirements and the CBSP (Component, Bus, System, Property) approach [Medvidovic and et al. 2003] to create the testbed’s architecture to ensure traceability between the requirements and architecture. 
A second approach called Domain-Specific Software Architecture (DSSA) [Tracz 1995] was then applied to create the testbed requirements and architecture. Next, I developed an instance of the testbed to be called SCRover. I had researchers apply and evaluate their technologies to the SCRover testbed. A practitioner from JPL also used the testbed to identify how technologies would behave on his software system. Next, a summary of the principles and practices for developing software engineering technology testbeds was performed. As part of my validation and verification process, I developed an instance of SETT called SCRover, performed a fault-tree analysis to show the importance of each testbed component, showed how the principles and practices of software engineering technology testbeds could be applied to other systems, used techniques to show the SETT’s requirements and architecture traceability, did a comparative analysis of SCRover SETT capabilities vs. other testbed SETT capabilities, and performed an evaluation of SCRover SETT cost-effectiveness across a range of software 5 engineering technologies, and used published references to validate the testbed requirements. 6 Chapter 2 Background - Nature and Benefits of Technology Testbeds 2.1 Current Testbed Definition There are numerous definitions of what a testbed should or should not be. Some examples of testbed definitions include “Testing facilities for the validation and verification of software applications” [Test 2006] “Virtual test environment” [Testbed 2007] Most current definitions of testbed refer to the testing aspects of one or more software applications but not about how to evaluate alternative software engineering technologies with respect to their cost-effectiveness. Since there is no formal definition for a testbed, testbed developers decide for themselves what features their testbed should have and how it should be used. For my research problem, I provided a definition of a particular class of testbed called software engineering evaluation testbed in the next section. 2.2 Software Engineering Technology Testbed Definition A software engineering technology evaluation testbed is an instrumented testbed made for the evaluation of technologies. Most existing testbeds validate software applications by executing them and telling practitioners if the application passed a series of tests or how it worked under a specific circumstance, such as the number of goals scored in the RoboCup testbed. However, this kind of 7 evaluation criterion may not be of much use to the practitioner trying to choose among alternative software engineering technologies. However, a software engineering technology evaluation testbed will evaluate how well a technology works in a representative setting. By evaluate, the testbed will not just tell practitioners if a technology works or not on the system, but it will help the researchers examine how much the system’s performance (e.g. dependability measured by number of defects left in the system) will increase and examine what classes of defects the technology is able to detect and not detect. Knowing the classes of defects a technology can detect will allow the practitioner to know what kind of defects remain in the system so that in the future, they will know what kind of technologies to look for. For this dissertation, the dimension of dependability examined will be the number of defects in a software system. Thus, to increase dependability, the number of defects has to be reduced in a software system. 
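The idea of reporting results by defect class, rather than as a single pass/fail verdict, can be made concrete with a small sketch. The following Python fragment is illustrative only and is not part of the SCRover testbed; the defect classes, identifiers, and record layout are hypothetical stand-ins for whatever taxonomy a testbed adopts.

```python
from collections import defaultdict

# Hypothetical evaluation data: each seeded or known defect has an id and a
# class drawn from the testbed's defect taxonomy (names here are invented).
known_defects = {
    "D01": "interface-mismatch",
    "D02": "interface-mismatch",
    "D03": "missing-state-check",
    "D04": "timing",
    "D05": "missing-state-check",
}

# Defect ids that a technology under evaluation actually reported.
reported_by_technology = {"D01", "D03", "D05"}

def coverage_by_class(known, reported):
    """Return {defect_class: (found, total)} so a practitioner can see which
    classes a technology detects and which classes remain in the system."""
    summary = defaultdict(lambda: [0, 0])
    for defect_id, defect_class in known.items():
        summary[defect_class][1] += 1
        if defect_id in reported:
            summary[defect_class][0] += 1
    return {cls: tuple(counts) for cls, counts in summary.items()}

for cls, (found, total) in coverage_by_class(known_defects, reported_by_technology).items():
    print(f"{cls}: detected {found} of {total}")
```

A per-class summary like this is what lets a practitioner conclude, for example, that a technology catches interface mismatches but leaves timing defects in the system.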
Usually testbeds are developed to work in a single problem domain, thus limiting the number of researchers who can use it. A software engineering technology testbed is developed to evaluate a wide range of software engineering technologies, thereby allowing multiple researchers in multiple fields to use it. In addition, positive and negative experiences with the technology can be stored in a software engineering technology evaluation database that allows practitioners to see how well the technology has worked in the past. Furthermore, by looking at the results, practitioners and researchers can see how technologies 8 compare against each other and if technologies are combinable in order to increase the number of defects found in the system. The database could also contain process data on how long it took to adapt the technology to the system giving practitioners an idea of how easy or difficult it might be for them to use the technology. Furthermore, the results of the evaluations would be comparable against other results performed on the testbed. Finally, software engineering technology evaluation testbeds will provide support tools that will help researchers gather performance data during their evaluation. Most of the testbeds that exist today provide very little tool support for the researcher. 2.3 What distinguishes software engineering technology testbeds (SETTs) from other classes of testbeds and software-intensive systems? Users are technology developers, research sponsors, and potential adopters. They have some common and some different needs. Potential adopters of the software engineering technology testbeds want to be able to evaluate how well a technology will work on a representative software system while technology developers want to gauge how well their technology works on a representative software system in order to show to an organization that it can be applied to their systems successfully. Research sponsors would want to see the technologies they invest in mature faster as well. In other classes of testbeds, technology developers 9 use the testbed to demonstrate how well their technology works, but it may not be in a setting representative of what the adopter is looking for. In addition, other classes of testbeds do not evaluate the technology, as a SETT will. Most other classes of testbeds will just validate that an application technology can work in a particular setting/situation but provides no data such as how well it took to learn the technology and how easy was it to adopt the technology. Objectives of a SETT are to determine relative technology capability and maturity vs. range of applicability for various technologies and mission situations. These impose additional needs, such as the need to cost- effectively configure comparable evaluations across a wide range of technologies. Current application testbeds only allow for the evaluation of execution testing technologies and are not set up to show how well technologies will be at finding defects throughout the life-cycle process. Special constraints may arise in special situations, such as the need to make portions of the SETT capabilities available for some user classes but not for others. There is a need to collect and store technology results in a SETT database for future software engineers to use in their evaluation of technologies that current testbeds do not do well now. 
A SETT should allow the practitioner to determine how well a technology performed in a 10 representative system as well as determine how long it will take to learn and adapt the technology. A better way to choose technologies. Current application execution testbeds use either head-to-head competitions or a field of alternatives to determine which technology is the overall best one. In SETTs, there is not necessarily a technology that can be rated the best one, but instead we must look at which combination of technologies best fits the needs of the software developer. Few, if any application execution testbeds, offer process data gathered from project development. The SETT offers defect and effort data. Ideally, the SETT will need to be available to all researchers from around the world. 2.4 How do the distinguishing objectives and constraints determine a SETTs’ content and architecture? Since the users of a SETT are technology developers, research sponsors, and potential adopters, this requires the testbed to be representative of the users’ software systems. For example, if the sponsoring organization was NASA, the testbed software will need to be representative of a NASA software system. A second objective of a SETT is to be able to determine the range of applicability for various technologies and mission situations in SETTs. Thus, the testbed will require more than just code to do these various evaluations. Other artifacts such as seeded defects, detailed project specifications, representative mission scenarios, and instrumentation will be needed to conduct experiments on 11 a wide range of technologies and applications. Once the artifacts are in place, the testbed will need to be configured for use for each individual researcher since a researcher will not necessarily need all the testbed artifacts to conduct their evaluation. Another objective of the SETTs is that it needs to store technology results from each of the evaluations performed on the testbed. This will mean that the SETT will need an experience base. Once the experience base is set up, a user can use it to search for technologies that will suit their dependability needs. Setting up the experience base will be the research project of another doctoral student. The testbed should provide a representative software system allowing practitioners and technology researchers to evaluate how well the technology will work on the sponsoring organizations’ software systems. If the technology will not work on a representative system, then chances are it will not work on the actual system. A software engineering technology testbed will need to help an end-user figure out if the technology is appropriate for their needs, which includes how easy it will be to use and adopt the technology to their software system. Thus, during the evaluation process, the researchers will need to collect process data such as effort needed to apply the technology to a software system and training time to learn the technology. In order for a testbed to help researchers evaluate the dependability aspects of a software system, there has to be some way to measure how much 12 dependability would improve by using the technology. Thus, the testbed’s architecture should contain seeded defects that can be used to determine how well a technology finds defects in the system. 
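As a hedged illustration of how seeded defects support this kind of measurement, the sketch below computes a technology's detection yield against the seeded set, together with a classic defect-seeding estimate of indigenous defects remaining. The numbers and function names are assumptions for illustration and are not taken from the SCRover experiments.

```python
def detection_yield(seeded_found: int, seeded_total: int) -> float:
    """Fraction of the seeded defects that the technology detected."""
    return seeded_found / seeded_total

def estimated_indigenous_total(seeded_found: int, seeded_total: int,
                               indigenous_found: int) -> float:
    """Classic defect-seeding estimate: assume the technology finds seeded and
    indigenous defects at roughly the same rate, so
    indigenous_total ~= indigenous_found * seeded_total / seeded_found."""
    if seeded_found == 0:
        raise ValueError("no seeded defects found; estimate is undefined")
    return indigenous_found * seeded_total / seeded_found

# Hypothetical experiment: 20 defects seeded, 14 of them found, plus 7 real
# (indigenous) defects found along the way.
yield_ = detection_yield(14, 20)                       # 0.70
remaining = estimated_indigenous_total(14, 20, 7) - 7  # ~3 indigenous defects left
print(f"detection yield: {yield_:.0%}, estimated indigenous defects remaining: {remaining:.1f}")
```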
Finally, to meet the objectives and constraints of having the testbed evaluate rather than merely validate, and of providing a better way of doing technology comparisons, there will be a need to store technology results in the testbed. Thus, the creation of an experience base will be necessary. The experience base will store the results of each technology evaluation that was performed on the testbed. The evaluation will contain data such as the effort needed to apply the technology to the software system and how many of the seeded defects the technology found.

2.5 A Look at Testbeds

This section provides a brief overview of other testbeds that are currently being used. In section 5.1, the disadvantages of using each of the following testbeds will be discussed.

2.5.1 Ada Compiler Evaluation System (ACES) Testbed

The Ada community has developed a testbed called the Ada Compiler Evaluation System (ACES). The purpose of ACES is to perform an analysis of Ada compilation and execution systems [Ada]. An Ada compiler vendor can use the ACES testbed to evaluate their product's performance and use it to determine the product's strong and weak points. ACES is used to determine the performance of an Ada compiler, specifically looking at time, code size, compilation speed, the symbolic debugger, diagnostic messages, the program library system, and system capacities. Support tools provided by ACES include tools to obtain measurement data from logs and to perform comparative analysis of several implementations.

2.5.2 Testbed competitions in the software multi-agent field

Each year, researchers in the field of software agents hold competitions to gauge how strong and mature their technologies are. Two such competitions are the Robot World Soccer Cup (RoboCup) [RoboCup 2007] and the Trading Agent Competition (TAC) [TAC 2007]. Each of these competitions draws many researchers to participate and gauge how their research compares to others' in the same field by working against a common benchmark problem [Stone 2003].

RoboCup is an international project that uses the game of soccer as the underlying testbed. One of RoboCup's goals is to promote artificial intelligence and robotics research by providing a standard problem where a wide range of technologies can be examined. The ultimate goal of RoboCup is to develop a team of autonomous robots that can win against a team of humans in soccer [RoboCup 2007]. RoboCup does hold other competitions at its venue, such as search-and-rescue operations and dance challenges, but for my research the focus will be primarily on the soccer aspects of the competition.

The Trading Agent Competition is another international competition whose goal is to provide a common problem in the field of using automated agents in the e-marketplace. Software agents, while competing against each other, buy and sell multiple interacting goods in auctions [Stone 2003]. For example, each agent may act as an automated travel agent whose goal is to assemble a travel package consisting of flights, hotel reservations, and entertainment tickets. The travel agent's objective is to maximize its client's satisfaction [TAC 2007].

2.5.3 Software Engineering Competitions

2.5.3.1 International Software Process Workshop

During the 5th International Software Process Workshop (ISPW-5) [Kellner and et al. 1991], a recommendation was made to create at least one common problem in the area of software process modeling.
The motivation for a common problem was that in the past, disparate examples were used for each type of modeling language, thus making comparisons between the modeling languages difficult. With a common benchmark problem, much more information would be gained from the resulting solutions. In August 1990, a modeling problem was sent to 18 researchers. The problem dealt with a hypothetical problem to a software system. The problem is that there is a rather localized change to the software system and a method has to be proposed to describe this problem. The solutions returned and discussed at ISPW-6 varied significantly in terms of their coverage of the problem. Solutions ranged from being abstract to very detailed, as well as to covering a few aspects of the problem to the whole problem. Overall, using a common benchmark problem proved to be fruitful. Many agreed that having common problem allowed to see the advantages and disadvantages of each modeling language and that it was possible that there is more than one correct way to model a problem. 15 One lesson learned is that the results of the solution should be defined in a framework that should be used by each technologist. 2.5.3.2 DARPA Speech, Image, Message Understanding In previous years, DARPA has held workshops to gather researchers together for speech and image recognition contests. The purpose of the workshops was to identify which technologies were capable of identifying objects correctly in images or recognizing certain keywords in speech. Once a technology is able to recognize objects in images, DARPA will then utilize the technology for object recognition in real life. Likewise for speech, DARPA will utilize speech recognition technology to translate speech into text or to detect certain keywords in live text such as phone conversations. [DARPA 2000] Other examples of technology testbeds have included computer chess competitions and computer performance benchmarking suites. 2.5.4 NASA Software Dependability Testbeds 2.5.4.1 Verification and Validation (V&V) Testbed In 2002, NASA performed a study whose purpose was to assess the maturity of different V&V technologies. The study simulated four independent V&V teams debugging Martian rover software with various V&V tools such as traditional testing, static analysis, model checking, and runtime analysis technologies [Brat and et al. 2004]. Defects, which occurred in the actual development of the software, were seeded back in the software for the teams to find. For the experiment, each team was equivalent in terms of equipment, training of the Martian rover software, number of hours (16) to conduct the 16 experiment, and number of people on the team. The only difference among the teams was the V&V technology being applied to the Martian rover software. During the experiment, teams had to submit reports every hour to indicate their progress allowing the researchers to monitor and track how the team’s time was spent. After the experiment concluded, vast amounts of data was collected and allowed the researchers to see which V&V technologies were effective in terms of number of defects found, time spent in finding defects, and time spent using the tools. In the end of the experiment, the following lessons were learned: 1. Conducting an experiment that has external validity is hard and/or expensive 2. Utilizing real code and seeding it with real errors are important for the validity of the results. 3. 
To classify results obtained from the experiment, clear guidelines needs to be posted on what to report 4. During the experiment, as much information should be gathered as it helps in the evaluation of the technology [Brat and et al. 2004]. 2.6 Benefits of testbeds According to [Stone 2003], there are several reasons why competitions such as the TAC and RoboCup competition are beneficial to researchers. One benefit is that the games are complex enough to prevent researchers from solving them in “game theoretic perspective”. Thereby, this allows researchers to focus on developing their technology and not on winning a game. 17 [Stone 2003] also states that competitions inspire research amongst the academic and research communities. Competitions provide interesting problems for researchers to solve and thus force them to create innovative technologies they may not have thought of before. In addition, the ideas and technology being developed will hopefully improve with each new competition thereby leading to a mature technology or to general solutions that will apply to an abundance of future applications. In addition, another benefit touted by [Stone 2003] is that competitions encourage the design of flexible software and hardware. Many researchers only test their technologies under the conditions of their own labs. However, in competitions, researchers have to adapt their technologies to work under different conditions, which encourage researchers to develop technologies that can be used in many different circumstances. In turn, this can lead to the technology being used in many fields of applications. According to [Stone 2003], competitions create excitement for students to become more involved in research via classes. There are many classes offered where the class project involves entering a competition such as RoboCup and TAC. Competitions allow students and professors to work together to come up with a solution and to test their solutions at competitions. With more people involved now, this can help technologies mature faster as well. Furthermore, [Stone 2003] indicated that competitions provide a common platform for exchanging ideas as evidenced by the software engineering competition held at ISPW-6. Researchers from around the world can work on the 18 same problem and then at the competition exchange their approaches to how they solved the problem, which will hopefully lead to collaborations amongst the researchers and new ideas each researcher would not have thought on their own. In recent years, RoboCup introduced simulation leagues where technology researchers could simulate the soccer field and 11 player agents. With the simulation program, this allowed researchers to test their technologies/algorithms at a low cost. [Lima and et al. 2005] 2.7 Hazards of testbeds According to [Stone 2003], hazards of competition include technologies being developed just to win the competition, thereby having no applications in other areas of research. Also, domain-dependent solutions do not help advance the field of research. Furthermore, some of the solutions at these competitions may have a high cost. For example, at the RoboCup competition, a research group could design a very expensive robot that wins the competition, but due to the high costs, the robot may not prove cost-effective for future applications, thus the winning solution will have very little value. 
One solution to prevent researchers developing expensive solutions is to give them a budget or to require the use of a common hardware platform, thus allowing everyone to play on a level field. Another hazard of competitions is that for researchers who are participating for the first time, they may find that they are at a disadvantage to the seasoned veterans. For example, there are researchers who have participated in 19 the same competition for years, thereby spending a large amount of time and money developing their technology for the competition. Each year, those researchers would ideally improve their technology for the competition, giving them an advantage over newcomers who may still be in the early stages of their technology development and their familiarity with the competition conditions. Another hazard to competition is for researchers to design their solution to achieve success in a particular environment instead of designing their solution for general purposes. Thus, competitions like RoboCup change its rules or its field environment each year. For example, RoboCup may add shadows to various parts of the soccer field or competitions like RoboCup Rescue may hide their victims in more difficult places. The challenge for all testbed competition is to ensure that a researcher will not design a solution to achieve success in a particular setting. Another benefit to changing the rules is that it makes the competition more difficult thus hopefully leading to advancement in research. [Lima and et al. 2005] Finally, from these competitions, [Stone 2003] said there may be the potential to draw incorrect conclusions based on the results of the competition. For example, someone may conclude that if team A beats team B then team A’s techniques are generally better than team B’s techniques, which would be a flawed conclusion to make. There is no way to make a solid general conclusion as to which technology will be better in the long run. 20 2.8 Difficulties of getting new technologies adopted Many kinds of technologies are created every year, but researchers generally have a difficult time proving that their technologies can be used by industry in a successful manner. Many times, researchers come into a company, introduce their technology, write a white paper about it, and leave the company resulting in the company’s doubt on whether or not the technology is feasible for use. According to [Redwine and Riddle 1985], the following lists several critical factors needed to have a successful technology transition: · Conceptual Integrity: the technology must be well developed · Clear Recognition of Need: the technology must fill a well-defined and well-recognized need · Tuneability: the technology can be tuned to fit the user’s needs · Prior Positive Experience: experience showing demonstrable cost/benefit indicating prior positive experiences · Training: training must be provided and include a large number of examples. This is particularly true when new and modern concepts are involved Factors that slow down the transition process include: · Diversity of technology usage · High cost needed to understand the technology · Large number of technology alternatives with different strengths and weaknesses 21 2.9 How researchers develop and mature their technologies Currently, researchers use other testbeds that do not have the requirements/architecture proposed in my research or create non-realistic systems from scratch in their labs in order to evaluate how well their technologies work. 
Then they write a paper documenting their results and present the paper and results to practitioners, hoping they will adopt the technology. They don't show how it can work for a specific organization, leaving those details to the developers/practitioners to figure out. Another way researchers get organizations to adopt their technology is to work with an organization, learn how the organization's software systems work, and apply the technology to those systems. However, both ways impose a high cost on the researchers [Lindvall and et al. 2005]. In addition, researchers face many obstacles when trying to get practitioners to adopt their technologies, as outlined by [Redwine and Riddle 1985] in Section 2.8. By using the software technology evaluation testbed, researchers will be able to present a stronger case to practitioners as to why their technologies are suited for the practitioner's organization.

2.9.1 Why measure maturity?

There are several reasons why the maturity of a product needs to be measured. Two of them are risk management and government mandates when building systems. Others include readiness for marketplace entry and suitability for technology investment.

2.9.1.1 Risk Management

Measuring a technology's maturity can be an indicator of program risk. As the [US GAO 1999] report indicates, when the readiness level of a technology is established, the risk of using that technology in a system can be assessed. The [US GAO 1999] report also states that a low level of readiness, i.e. a low TRL, represents a high risk, since there are many unknowns that still need to be resolved in developing the technology. These unknowns create risk because it is uncertain whether a program can meet its cost, schedule, and performance goals [Nolte and et al. 2004].

2.9.1.2 Government Mandates

There are both legislative and regulatory mandates that require the Department of Defense (DoD) to measure technology maturity before developing a software system. For example, both DoD Directive 5000.1 and Section 804 of the FY2003 Defense Authorization Act require the government to consider software product maturity as a criterion during source selection [Nolte and et al. 2004].

2.10 How practitioners choose technology?

According to [Redwine and Riddle 1985], when practitioners are choosing technologies to use in their system development, several factors should be examined. The first factor is that the technology must be able to address the practitioner's dependability (defect-reduction) needs. After identifying the technology, the practitioners will want to know several things, such as whether the technology has been successfully applied to other systems before. Prior positive experiences will indicate to the practitioner how well developed the technology is. In addition, the practitioner would examine the technology's tuneability. The user should be able to tune the technology to the needs of their development process, since technologies will rarely work for all systems as-is. Finally, training in the use of the technology should be provided so that practitioners can use it effectively.

Currently, some practitioners do not follow the method outlined by Redwine and Riddle. Many times, practitioners will read about a technology in a research paper, see the technology demonstrated on a laboratory software system that may not be representative of their organization's software systems, and try to adopt the technology on their own.
However, this can lead to the practitioners not using the technology correctly or not being able to get the desired results, which would then give the practitioner a bad experience with the technology. This could then result in the technology not being chosen for future use, which would be bad for the technology maturation process since, in order for technologies to get better, they must be used actively.

2.10.1 Technology Maturity Measurement: TRLs

Another way practitioners choose technologies is with Technology Readiness Levels (TRLs) [Mankins 1995b]. Technology Readiness Levels were developed by NASA to measure the maturity of a technology so that it can be decided whether the technology should be used to build or support mission software. There are nine different technology levels, ranging from new research technology (TRL 1) through technology demonstration to being "flight-proven" (TRL 9). TRLs 1-3 demonstrate that a technology is feasible from a scientific point of view, TRLs 4-6 demonstrate that the technology can be implemented, and TRLs 7-9 deal with the usage of the technology in an actual system [Mankins 1995a]. Another benefit of TRLs, as touted by [Nolte and et al. 2004], is that the program manager or practitioner can use them to compare the current maturity of the technology being used to the required maturity that is needed by the software system. Knowing this difference can provide the program manager with a plan describing what activities need to be done for the software system to achieve its required maturity. Appendix A provides TRL rating scales for general technology and for software technology.

2.10.2 What are the advantages and disadvantages of using TRL?

According to a report by the Army CECOM, TRL helps to address 30% of the factors engineers should pay attention to when evaluating technologies, thus serving as a way to reduce risk. In addition, according to the Army CECOM report, the use of TRLs encouraged its research department to build and evolve from existing, though immature, technologies from other research labs and universities instead of developing new ones from scratch [Graettinger and et al. 2002]. TRL does not address the managerial aspects of a program such as identifying customers; it addresses only the technical side of a project [Nolte and et al. 2004]. In addition, TRLs do not help software engineers compare different technologies or determine which technology works best for them.

2.11 NASA HDCP Testbeds

2.11.1 What was HDCP?

The NASA High Dependability Computing Program (HDCP) [HDCP 2001] was a major investment in new research and technology to improve the dependability of NASA mission software, and of software-intensive systems in general. The research program addressed new capabilities in such areas as lightweight formal methods, model checking, architecture analysis, human factors, code analysis, and testing. The HDCP strategy for accelerating the usual 18-year transition time for software engineering technologies employed common-use technology evaluation testbeds representative of NASA mission software. The goal of the HDCP program was to help test how well new or existing dependability techniques apply to complex systems that researchers normally do not have access to.

2.11.2 HDCP Testbed Objectives

One of the objectives of the HDCP testbed was to buy down the risks of using new HDCP technologies. As reported by [US GAO 1999], using new technologies in a system can be risky.

One risk is a poor technology match to mission needs in terms of, but not limited to, real-time performance, scalability, and exception handling.
One risk is poor technology match to mission needs in terms of but not limited to real-time performance, scalability, and exception handling. A technology may work great on a small project, but when applied to a NASA mission where the project is typically large, the technology may not scale up to 26 meet the mission needs. The technology may not be as fast or doesn’t do as stated when applied to a mission context. A new technology may have undesired side effects that even the researcher does not know about. If the technology has never been applied or used on a large system, undesired side effects may occur and could hamper the software system’s performance. In addition, it could be that applying the new technology to a mission context takes longer than expected, thus affecting the software development cost and schedule. Another risk is that due to the lack of relevant mission experience, NASA may be reluctant to apply the new technology to their software systems since they have no idea how well it will perform on their large software systems. A new technology may not be mature enough to be used in industry. Finally, one more risk is that the new technology may have poorly- understood operational behavior and usage. The researcher can describe how well it works in an academic laboratory setting, but it may not be well understood on how to apply the technology to a mission context. The technology is codified in computer science terms whereas NASA needs the technology to be described in mission terms. Thus, due to all of the above risks, one of the objectives of the HDCP testbed was to pre-qualify new technologies in mission context. Applying the technology to a representative NASA system setting would ideally allow NASA to determine how well the technology will work in an actual mission context. 27 A second objective of the HDCP program was to enable cost-effective HDCP technology integration. As [Redwine and Riddle 1985] states, there is a high cost associated to understanding the new technology, thus inhibiting technology transition from academia to industry. As stated above, it is also a risk to try new technologies on a system, as it may not be scalable when taken out of the laboratory environment, thus having an impact on project cost and schedule. Due to these issues, many companies may find it too risky and costly to use new technologies. In SCRover’s case, researchers applied their technology to the SCRover system and determined how well it will work on a representative NASA software system. As the technology evaluations in chapter 9 will demonstrate, using the SCRover system helped researchers better adapt their technologies to a large software system via lessons learned. If the technology worked well on SCRover, this gave program developers more confidence in using the new technology on their projects. In addition, the SCRover application will lower the costs of technology integration to a company, as the researcher will have worked out many (if any) problems in integrating the technology to a NASA-like system. Furthermore, using SETTs, a researcher can estimate how much it costs to apply a technology to a project, estimate how much benefits are gained from using the technology, and if the technology can be applied to the organization’s software systems. This data would then be stored on the testbed. 
Afterwards, program developers can go to the testbed and get mission-relevant cost- effectiveness data for a particular technology and determine if it is beneficial for them to use it on their project. 28 Finally, the last objective of the HDCP program was to accelerate the pace of HDCP technology maturity via early and accurate feedback from the testbed. According to [Redwine and Riddle 1985], it takes about 18 years for a technology to be transitioned to industry. With the testbed concept, the time to transition would be cut to 5 years. For example, researchers who apply their technology to the SCRover testbed will get faster feedback on how well the technology performs on a NASA-like system. Since the SCRover testbed is open most of the public, researchers do not have to wait on NASA to get their data; they can gather the data themselves. With faster feedback on how well their technology performs on a large software system, researchers will be able to improve their technology faster, thereby accelerating the pace of technology transition. 2.11.3 HDCP Testbed Milestone Risk considerations and economic considerations make it generally impractical to experimentally apply immature research artifacts to operational mission software. In response, the HDCP testbed approach has defined a series of four testbed stages that enable a highly cost-effective progression through increasingly challenging and realistic experimental applications of HDC technology. The four stages are: 1. Experimental scenarios roughly representative of NASA missions developed or adapted by researchers to fit their technology capabilities. 2. Tailorable testbed suites with hardware, software, and specifications representative of NASA missions; provided to researchers along with representative mission scenarios, instrumentation, seeded defects, 29 installation and experimentation guidelines, and baseline data for comparative evaluation of a technology’s cost-effectiveness; 3. Application of more mature technologies to NASA operational mission software in a NASA simulation environment; 4. Application of matured technologies to NASA operational mission software in a NASA operational environment. The use of these testbeds and complementary approaches such as Technology Maturation Teams enables NASA and HDCP to accelerate the maturation of an emerging HDC technology, using criteria such as the NASA Technology Readiness Levels [Mankins 1995b] [George and Powers 2003]. Experimentation at each stage can be done with relatively low marginal expenditure of effort, based on the feedback received in earlier stages. Involving NASA mission personnel in defining representative mission scenarios and evaluating experimental results on a technology’s cost-effectiveness will also enable mission personnel to adopt new technology earlier. Researchers will also be able to concurrently pipeline more advanced and more mature versions of their technology through the testbed stages. A major hypothesis to be tested by the HDCP is that the staged testbed approach will be able to compress the traditional 18-year interval from concept emergence to regular mission usage to 5-7 years. 2.11.4 HDCP Golden Gate Testbed The Golden Gate Testbed [Dvorak and et al. 2004] is another NASA HDCP testbed developed by CMU-West. For its testbed, it also uses the MDS 30 technology with one major difference. Instead of C++, MDS is written in Java. 
Its goals are the same as the SCRover testbed's: to evaluate technologies in the area of defect reduction and to help mature HDCP technologies more quickly. Like the SCRover testbed, the Golden Gate testbed provides complex software and a realistic setting, and it is representative of a NASA mission. However, unlike the SCRover testbed, no experiments by outsiders have been conducted on it, so it is hard to evaluate the testbed's usefulness. In addition, the Golden Gate testbed is not ITAR-safe, and no ITAR-safe adaptation code is available to the public. Furthermore, only limited specifications detailing requirements and design are available for the Golden Gate project, making the testbed difficult for outsiders to use.

2.11.5 HDCP Dependable Automated Air Traffic Management Testbed

TSAFE (Tactical Separation Assisted Flight Environment) [Lindvall 2004] is another NASA HDCP testbed, developed by the Fraunhofer Center at the University of Maryland. Its goals are the same as the SCRover testbed's: to evaluate technologies in the area of defect reduction and to help mature HDCP technologies more quickly. Currently, TSAFE supports only the evaluation of architecture technologies. Like the SCRover testbed, the TSAFE testbed provides complex software and a realistic setting, but it is not representative of a NASA mission. However, unlike the SCRover testbed, no experiments by researchers outside FC-MD have been conducted on it, so it is hard to evaluate the testbed's usefulness. Furthermore, only limited specifications detailing requirements and design are available for the TSAFE project, making the testbed difficult for outsiders to use. TSAFE does provide a set of seeded defects; however, the defects were created by the researchers and may not be realistic. TSAFE runs on a simulator and does provide instrumentation tools [Lindvall 2004]. The architecture of the TSAFE testbed is outlined in Figure 1 below:

Figure 1: TSAFE Architecture

In summary, for the two HDCP testbeds identified above, the operational concepts are similar to those of the software technology evaluation testbed. However, these HDCP testbeds do not give representative feedback or allow researchers from multiple fields to use the testbed, as in the case of the TSAFE testbed. In addition, they do not provide detailed specifications or developer/researcher effort data.

2.11.6 SCRover Testbed

The objective of the SCRover HDCP testbed is to help researchers in multiple fields evaluate their technologies on a representative NASA software system. Unlike the other testbeds mentioned, there is no restriction on the type of research that can be done on it, as long as the research has the potential to be used by NASA in building software systems. For the HDCP testbed, researchers specializing in architecture, operational envelopes, and modeling languages have participated in the project. One of the evaluation criteria for the technologies is the number of defects avoided, found, and diagnosed. Using defects as an evaluation criterion allows NASA to see how well a technology detects certain kinds of defects. Like the DARPA Speech and Image testbed, end users will find the results of the technology application to be direct. End users will be able to estimate how well the technology will work on their system by seeing how well it worked on SCRover.
In addition, users can directly apply the technology to their own systems and get immediate feedback on how well it works for their software. Please refer to Chapter 6 for more information about the SCRover testbed.

Chapter 3 Objectives of the Software Engineering Technology Testbed

Based on lessons learned from previous testbeds, the following were used as objectives for the software engineering technology testbed. The research described and evaluated here addresses the testbed architectural tradeoffs needed to satisfy all of the objectives as well as possible.

As evidenced by the TSAFE testbed [Lindvall 2004], good design and requirements documents are needed by researchers wishing to use the testbed. Without a good set of artifacts, researchers will have a difficult time figuring out what the software system does and how they can interact with it.

Having the testbed be publicly available to all researchers, i.e., ITAR-safe (ITAR, the International Traffic in Arms Regulations, is a set of rules restricting the export of information to other countries), is an important criterion. Because many researchers have international students as their research assistants, they need the testbed to be ITAR-safe so that all of their students will be able to use it. For the TSAFE testbed, the Fraunhofer Center found it difficult to persuade researchers to use the testbed due to the ITAR restrictions. Likewise, the Golden Gate testbed faced similar problems with ITAR restrictions.

As indicated by [Stone 2003] and by [Brat and et al. 2004], a low cost is needed to encourage usage of the testbed. If the cost to perform an experiment on the testbed (or to enter a competition) is too high, researchers will be reluctant to use it. This is especially true for researchers with a limited budget. The artifacts in a testbed should also be easy for researchers to use and modify. Artifacts that are difficult to understand and use will discourage researchers from using the testbed.

As evidenced by the NASA Verification and Validation Tools testbed, using real code helps validate the experiment results to other program developers [Brat and et al. 2004]. The developers will have first-hand evidence of how well the technology can work on a non-academic project. Due to ITAR restrictions, it may not be possible to get code from actual missions, so the code should at least be representative of a NASA mission in order to help validate the experiment.

Another lesson learned from the NASA Verification & Validation Tools testbed is that clear guidelines need to be established on how to conduct the experiment, how to collect data, and what to report. Within the time allowed, researchers should ideally gather as much data as possible from the experiment. As shown by [Brat and et al. 2004], this data can prove valuable for researchers in their technology evaluation. In addition, some researchers may not know how to conduct an experiment or collect data, so guidelines will be helpful to them. Finally, guidelines establishing a standard for what data needs to be collected make it easier to compare technologies.

An experience base of technologies evaluated on the testbed is needed if the testbed is to be used successfully by software engineers. The experience base should have a clear and easy way to display and search the information collected about the technologies evaluated.
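As one way to picture such an experience base, the sketch below stores each evaluation as a simple structured record that can be filtered by technology family and seeded-defect detection rate; the entries, field names, and query are hypothetical and are not part of the SCRover implementation.

```python
# Hypothetical searchable experience-base entries (illustrative data only).
experience_base = [
    {"technology": "Architecture analysis tool A", "family": "Software Architecture",
     "seeded_defects_found": 9, "seeded_defects_total": 12, "result": "positive"},
    {"technology": "Code analysis tool B", "family": "Code Analysis",
     "seeded_defects_found": 4, "seeded_defects_total": 12, "result": "negative"},
]

def search(entries, family=None, min_detection_rate=0.0):
    """Return (technology, detection rate, overall result) for matching entries."""
    hits = []
    for entry in entries:
        rate = entry["seeded_defects_found"] / entry["seeded_defects_total"]
        if (family is None or entry["family"] == family) and rate >= min_detection_rate:
            hits.append((entry["technology"], round(rate, 2), entry["result"]))
    return hits

print(search(experience_base, family="Software Architecture", min_detection_rate=0.5))
```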
If the information is collected in a disorganized manner, this will discourage people from using the testbed. The information collected should show both the positive and negative results of a technology. As [Redwine and Riddle 1985] indicates, a lack of an experience base causes software engineers to be hesitant to use new technologies since they will have no idea how well it really works. For my research, I will not be creating the experience base, but will be providing the data to populate it. Finally, one of the last lessons learned from the previous testbeds is that there needs to be an effective way to evaluate technologies against each other. Choosing between two or more technologies can be difficult if there is no standard way to judge them. One standard the software engineering technology testbed proposes to use is seeded defects. The number of defects a technology finds in the software system will play a factor (amongst many others) in how well a technology works. Chapter 6.5.6 elaborates more on using seeded defects in the testbed. 3.1 Software Engineering Technology Testbed Operational Concepts To meet the operational concepts of the software engineering technology testbed, the testbed will require certain features of which I will describe below. 37 These concepts were derived from the stakeholders’ win conditions, best practices of current testbeds, and the failings of current testbeds. Representative Feedback - For the testbed to provide representative feedback to the sponsoring organization, the software engineering technology testbed must have a software system that would be representative to what the organization uses/develops. In my instance of the testbed, the sponsoring organization was NASA. Thus, we decided to use Mission Data System (MDS) to build our code. MDS is a framework that was developed by NASA-JPL engineers. Originally, it was designed for use on a 2009 Mars mission. (NASA has decided not to use MDS in their 2009 Mars mission.) The capabilities of the rover include movement and using the camera, all capabilities that the Mars rovers has as well. Multiple Goals/Fields - An organization usually finds defects at all levels of software development. Thus, a software engineering technology testbed should test a wide range of technologies that looks for defects at any place in the software development cycle. Since the goals of the testbed are many, this will require the testbed to have many features/requirements to fulfill those goals. Thus, the testbed provides many test artifacts such as the Operational Capabilities, Requirements specifications, Architecture specifications, a Life- Cycle Plan [Boehm and Port 2001], effort logs, defect logs, and code. These diverse artifacts allow researchers to test many aspects of software engineering. For example, in the SCRover testbed, an architecture researcher can use the Architecture specifications to study SCRover’s architecture, a cost expert can 38 look at the Life-Cycle Plan, and effort logs to get a sense of how much time and effort was spent on the SCRover project, and a requirements expert can read the Requirements specifications to understand what the requirements of the system are. Multiple Contributors - The software engineering technology testbed will have many contributors, possibly working in the same area of research. Since there will be many contributors, each with their own unique experiment, the testbed will need artifacts that the researcher can tailor to his/her need. 
For example, both Neno Medvidovic and David Garlan will conduct architecture experiments on the SCRover testbed. To do this, both researchers will need the SSAD in order to understand the SCRover architecture. However, Medvidovic uses the Mae [Roshandel and et al., 2004] language in his experiment and Garlan uses the ACME [Garlan et al. 2000] language to describe architecture while the SSAD is written in the UML language. Thus, Medvidovic will need to tailor the architecture models in the SSAD to use Mae and Garlan will need to tailor the UML models to ACME models. Searchable Results - The results of applying technologies to the testbed will be made available to interested software engineers who wish to use them and researchers who wish to compare their technologies to technologies previously experimented on the testbed. Since the number of experiments performed on the testbed is unlimited, the results will have to be in a searchable format to allow others to find it quickly and easily. Visual Query Interface (VQI) 39 [Seaman and et al. 1999] is one method users can use to find information from the testbed. Combinable Results/Integrated Assessment - Since different researchers will be using the testbed, there arises a need to have a common data definition so that the researchers’ results will be easily combinable in case two or more different researchers choose to collaborate and need a common format to express their results. We cannot have each researcher come up with their own format to express their results, else it makes it complicated to combine and compare results. Also, for a software engineer who wishes to use multiple technologies in their project, having combinable results allows the engineer to estimate how well using multiple technologies will work on the project. In some instances, using multiple technologies on a software system will not increase the software’s dependability more than using a single technology, especially if there are two or more technologies that look for the same type of defects. The SCRover testbed uses the Orthogonal Defect Classification [Chillarege 1992] to classify the defects found by a technology. Also, since each of the researchers will be looking for the same seeded (and unseeded) defects, it will be easy to compare one technology against another, and also to determine which technologies should be combined to produce a more effective way of finding defects. Low Cost - Since academic institutions do much of the research, budget for experiments could be limited. Thus, usage of the software engineering technology testbed should be at a low cost so that any researcher can use the 40 testbed. In addition, as [Stone 2003] points out, a low cost provides a low entry barrier for any researcher to participate. 3.2 Technologies Evaluated The types of technologies that will be evaluated by the testbed are indicated in Table 1. An analysis was performed on the technologies to be evaluated by NASA in the HDCP program. Table 1 summarizes what those technologies are and what the researchers need for their evaluation. The complete list of technologies examined can be found in Appendix D: Technologies Examined for the HDCP Program. 
Table 1: Technology families to be evaluated

Software Architecture
What artifacts the technology needs: Architectural and requirements description, code, defects, various missions, a way to collect information
What the testbed framework provides: Architecture and Requirements Specifications, code, seeded defects, instrumentation, mission generator
What SCRover provides: SSAD, SSRD, SCRover/MDS code, seeded defects, Instrumentation class, various rover missions

Code Analysis
What artifacts the technology needs: Code, defects, various missions, a way to collect information
What the testbed framework provides: Code, a platform to run the code, seeded defects, instrumentation, mission generator
What SCRover provides: SCRover/MDS code, Gazebo and Pioneer rover, seeded defects, Instrumentation class, various rover missions

Testing and operational envelopes
What artifacts the technology needs: Code and test cases, defects, various missions, a way to collect information
What the testbed framework provides: Code, test document, seeded defects, instrumentation, mission generator
What SCRover provides: SCRover/MDS code, test document, seeded defects, Instrumentation class, various rover missions

Requirements Engineering
What artifacts the technology needs: Requirements description, code, defects, various missions, a way to collect information
What the testbed framework provides: Requirements Specifications, code, seeded defects, instrumentation, mission generator
What SCRover provides: SSRD, SCRover/MDS code, seeded defects, Instrumentation class, various rover missions

Based on the above table summarizing the families of technologies being evaluated, the specifications needed by researchers include Architecture Specifications, Requirements Specifications, Code, a Test Document, and a platform to run the code. In addition, each of the technologies will require seeded defects in order to indicate how well the technology works, an instrumentation class in order to collect the data needed for technology analysis, and a mission generator in order to run the testbed system under different circumstances and to make sure the technology doesn't just work under one specific scenario.

3.3 Testbed Stakeholders' Objectives

The overall objectives of a software engineering technology testbed are to help mature technologies faster and to help practitioners identify technologies that will make their systems more dependable. Overall, there are three groups to analyze for my research. The information was gathered by interviewing NASA employees and researchers working with the High Dependability Computing Program.

The first group of stakeholders is the researchers. They need to evaluate their technology more cost-effectively in a representative setting that will convince practitioners to use it. Researchers want to show practitioners that their technology can create more dependable systems. They also want to use the testbed to accelerate the technology maturation process. The second group is the practitioners, or technology users. This group needs a way to find defect-reducing technologies more easily and quickly. The third group is the sponsors, who need defect-reducing technologies to mature faster. They want a quicker return on their investment.

3.4 What do the stakeholders need to meet their objectives?

Researchers need the following things to meet their objectives:
· A common platform to show how well their technology works
· The platform should be low cost and easily accessible to them
· Specifications they can configure for their technology.
Specifications can be code, architecture specifications, or other similar artifacts giving information about the platform
· A way to easily gather information during the evaluation for their results and analysis
· A way to judge how well their technology performed in a representative system

Practitioners need the following things to meet their objectives:
· Positive and negative experiences regarding how well the technology performed in a representative system
o Results should show how many defects were found with the usage of the technology
· A way to compare different technologies

Sponsors need the following things to meet their objectives:
· A platform that will evaluate technologies cost-effectively

The above users' needs match the testbed objectives described earlier, as shown in the table below:

Table 2: Mapping between Users' Needs and Testbed Concepts (user's need – corresponding testbed concepts)
· Low-cost platform – Low Cost
· Representative and common platform – Representative Feedback, Combinable Results
· Configurable specifications – Multiple Goals and Multiple Contributors
· Way to gather information used for analyzing the performance of the technology – Searchable Results
· Way to evaluate technology in a representative system – Representative Feedback
· Positive and negative experiences – Representative Feedback, Searchable Results
· Able to compare different technologies – Representative Feedback, Searchable Results, Combinable Results

3.5 Literature Review of Requirements

In addition to looking at the objectives and needs of the stakeholders involved, a literature review was conducted to evaluate how existing testbeds work and what other requirements could be gathered from the literature.

As indicated by Tichelaar, a lack of good documentation increases the amount of work researchers have to do to use the testbed or software system they are investigating or trying to use. Without accurate documentation, understanding the system is harder [Tichelaar and et al. 1997]. Thus, a requirement for a software engineering technology testbed would be to provide good, representative specifications about the system being analyzed. In addition, with so many researchers providing different technologies to solve the same problem, the specifications will have to be tailorable to meet each researcher's needs.

A limitation of other testbeds and test suites is that they tend to work for a single technology family. Defects can be found throughout the development lifecycle, so the software engineering technology testbed should be able to support a wide range of technologies that can find defects throughout the lifecycle. To support this goal, the software engineering technology testbed should provide specifications that work for a wide range of technologies.

Another requirement that would be good for testbeds to have comes from studies such as the one conducted by Nokia [Metz and Lencevicius 2003]. Nokia documented that it used instrumentation to collect the data it needed to form its analysis of a system. Thus, a good software engineering technology testbed should have instrumentation to support researchers in their evaluation.

According to [Stone 2003], one hazard of competitions is technologies being developed just to win the competition, and thereby having no application in other areas of research. One solution to this problem is the use of seeded defects in the system.
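For illustration only, the sketch below shows one way a testbed developer might score a researcher's defect report against the seeded-defect list; the defect identifiers are hypothetical and this is not the testbed's actual tooling. The surrounding process is described in the paragraph that follows.

```python
# Hypothetical scoring of a researcher's defect report against the seeded defects.
seeded_defects = {"SD-01", "SD-02", "SD-03", "SD-04", "SD-05"}  # planted by the testbed developer
reported_defects = {"SD-01", "SD-03", "SD-05", "NEW-17"}        # reported by the researcher

found_seeded = seeded_defects & reported_defects    # seeded defects the technology caught
missed_seeded = seeded_defects - reported_defects   # seeded defects it missed
unseeded_finds = reported_defects - seeded_defects  # additional (unseeded) defects reported

print(f"Seeded defects found: {len(found_seeded)} of {len(seeded_defects)} "
      f"({len(found_seeded) / len(seeded_defects):.0%})")
print(f"Missed: {sorted(missed_seeded)}; unseeded finds: {sorted(unseeded_finds)}")
```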
Researchers would not know ahead of time what defects are planted in the system, thus preventing researchers from developing technologies to work on a single system. In their evaluation, the researcher would have to generate a report indicating what defects their technology discovered and the testbed developer (or practitioner) would analyze the report to see how many of the seeded defects were found by the researcher. According to [Boehm and Port 2002], the seeded defects would come from defect change histories captured during the development of the testbed system. 46 According to [Stone 2003], domain-dependent solutions are bad. Repeating the same competition or scenarios several times encourages high- level, generalizable solutions. Thus, there is a need to have the testbed have several different missions to discourage domain-dependent solutions. Another observation from [Stone 2003] is that finances do have an effect on who can participate in a competition or technology evaluation. Thus, having a low cost or zero-cost testbed would allow most, if not all, researchers to evaluate their technologies on the testbed. Furthermore, providing every researcher the same low or zero-cost platform/testbed would create a level playing field and allow for an easier comparison between technologies. [Redwine and Riddle 1985] indicated that one of the barriers to technology adoptions is the lack of prior experiences demonstrating how well a technology will work. In addition, Basili states that experience bases can be used to share knowledge [Basili, Tesoriero, and et al. 2001]. Thus, if a testbed has a set of positive (or negative in some cases) experiences about a technology, this will help a practitioner decide whether or not to adopt the technology. Ideally, the experiences of using the technology would be representative in how the technology would be used by the organization. If a researcher were to submit an experience that was non-representative, the practitioners would have a difficult time to assess whether or not a technology would work in their organization. Since practitioners often will use more than one kind of technology in their system development, practitioners will need to know how well technologies work together. For example, “Do two technologies clash with each other?” or “Does 47 two technologies find the same set of defects?” According to Basili, having combinable results will help in seeing how two solutions may work together and allows for easier collaboration amongst researchers [Basili, Zelkowitz, and et al. 2007]. Finally, a good software engineering technology testbed should provide guidance to the testbed users. Guidance helps users become more familiar with the tool [Basili, Tesoriero, and et al. 2001]. Guidance can be of many things, including but not limited to, a set of frequently asked questions or instructions. 48 Chapter 4 Requirements and Architecture of the Software Engineering Technology Testbed To determine the testbed architecture, I applied two different methods to find the requirements and architecture of a software engineering technology testbed. Using two methods allows me to check that the right set of requirements and the right set of architecture is being conveyed. The two methods chosen are: DSSA (Domain Specific Software Architecture) [Tracz 1995] and CBSP (Component, Bus, System, Property) [Medvidovic and et al. 2003]. 
4.1 CBSP Approach 4.1.1 CBSP Introduction Now that the testbed objectives and users’ needs have been established, we will use the testbed objectives and users’ needs to formulate the testbed requirements and architecture. I describe the architecture of the testbed using the CBSP (Component, Bus, System, Property) approach [Medvidovic and et al. 2003] that is used to bridge requirements and architectures. The stakeholders who participated in the process were Dr. Barry Boehm, Leslie Cheung, Somo Banerjee, and myself. Cheung and Banerjee are doctoral students at the University of Southern California. Their research focus is estimating reliability in systems. They will be participating in the CBSP approach as users. More information about the CBSP process can be found in [Medvidovic and et al. 2003]. 49 4.1.2 Results of CBSP Process After working with Banerjee, Cheung, and Boehm, the following requirements and architecture were developed for the testbed. Later in 4.3, I map the requirements to other researchers who have published similar problems when they have evaluated technologies. 50 Table 3: Mapping btw. Operational Concepts, Architecture, and Requirements Testbed Operational Concepts Requirements Architecture Representative Feedback, Multiple Goals, Multiple Contributors Testbed should provide tailorable, representative specifications and code to support a wide range of technologies and multiple contributors Cp – Specifications Component CP – Specifications Component should reflect a wide range of technologies CP – Specifications component should be tailorable CP – Specifications developed with good software engineering practices Cp – Code Component Cp – Effort Data Component Representative Feedback, Searchable Results Testbed should allow a researcher to instrument the system in order to collect data during the experiment Cp – Instrumentation component B- Connector to Instrumentation component Representative Feedback, Combinable Results Testbed system should have a library of seeded defects that researchers can use to seed defects into the specifications/code. The defect data classification used should be comparable to classifications used by the researchers. In addition, researchers should be able to classify the defects according to their own defect classification system. Cp – Seeded Defects Component B- Connector to Seeded Defects component CP – A classification should be used to classify/group the defects Representative Feedback The defect pool should come from actual defects incurred while implementing the testbed system Cp – Defect Pool Component SP – The defects should come from a pool of defects actually incurred during development of testbed 51 Table 3: Mapping btw. 
Operational Concepts, Architecture, and Requirements, Continued Representative Feedback Testbed should be able to allow a researcher to use multiple representative missions in their evaluation Cp – Mission generator component B- Connector to mission generator component Low Cost Testbed should run on a low-cost platform and require a low amount of effort to set up and use, including costs to inject defects Cp – Simulator Component Cp – Rover Component CP – The simulator component should be low cost CP – The rover component should be low cost Searchable Results Researchers should capture technology results for practitioners to review Cp – Technology Evaluation Results Component Combinable results Researchers should use a common platform to evaluate their technologies allowing a fair comparison of technologies [and allowing results to be combinable] S – The system should work on a common platform Cp – Technology Evaluation Results Component Cp – Researcher’s Effort Data Component CP – Technology Evaluation Results Component should be compatible with other results Cp – Simulator Component Cp – Rover Component Representative Feedback Testbed should generate results that is representative to the end user SP – The system shall provide representative feedback Multiple Goals, Multiple Contributors Testbed should be available for public use SP – The system should be available to all Multiple Goals, Multiple Contributors Testbed should provide guidance to the researcher on how to use the testbed Cp –Guidelines - Manuals Component Combinable results Testbed should generate results that is combinable with other researchers’ results CP – Technology Evaluation Results Component should be compatible with other results With the architecture components defined by the CBSP process, we will now present an architecture that realizes the CBSP view. 52 Figure 2: Software Engineering Technology Testbed Reference Architecture A brief description of each component in the testbed’s architecture is given below along with why the component is important to the architecture. Instrumentation – an instrumentation class will help the researchers collect data for their evaluation report. Without this feature, a researcher will have to spend more time in figuring out how to collect the data they need. Seeded Defects – seeded defects will help the researchers estimate how well their technology finds defects and it prevents researchers from building their technology to just pass a certain test. Without this feature, researchers can design their technology to pass the testbed’s criteria. In 53 addition, seeded defects give a measurement to researchers on how well their technologies can find defects and what kind of defects the technology can and cannot find. In addition, there is a wide variety of defects in the defect pool. The defects are classified according to the Orthogonal Defect Classification. Code – A system is needed to evaluate the technology on. Without code, researchers whose technology requires code will not be able to use the testbed. This will limit the scope and usefulness of the testbed. Along with specifications, code provides the basis for the evaluation. Specifications – Specifications are needed if the organization wishes to test more than just technologies that work on code. Specifications provide the basis for the evaluation, along with the code. For example, specifications are needed for evaluations on technologies dealing with architecture and/or requirements. 
Without specifications, researchers whose technology requires specifications will not be able to use the testbed. This will limit the scope and usefulness of the testbed. Mission Generator –allows researchers to do a more thorough evaluation of their technology since it will be operating under several circumstances and not just one. Also, prevents a researcher from developing their technology to work under a specific circumstance. Simulator – a simulator is needed to do low-cost runs of the system. Running the system in simulation mode will help the researcher acquire data and do runs of the system at a low cost. For those technologies that 54 require code to be run, a simulator provides a free way to evaluate the system as many times as they want. Without a simulator, researchers will have to buy hardware, which may not be possible due to limited budgets. Robot Hardware – a robot is provided in case the researcher wishes to test their technology on an actual platform. Usually the researcher would use the robot after getting good results on the simulator and want to be sure the simulated data will match the actual data obtained from the robot. The rover is for researchers who have the funds to test their results on the actual hardware. Guidelines-Manuals – manuals are needed to help guide the researchers in how to use the testbed. Without manuals, researchers will spend more time learning the testbed. Guidelines provide help to researchers on how to conduct a technology evaluation. Effort Data – effort data that went into the development of the testbed and effort data of how long it took a researcher to perform a technology evaluation on the testbed. Researchers who study costs of software systems can use developer effort data in their studies. Researcher effort data is useful for practitioners who wish to know how long it took to apply the technology to the testbed. Technology Evaluation Results - useful for practitioners who wish to know how well the technology did on the testbed before applying the technology on their own systems. Without the technology evaluation results being 55 captured, practitioners will have a difficult time determining how well the technology worked. 4.2 DSSA Introduction According to [Tracz 1995], a domain-specific software architecture (DSSA) is “a process and infrastructure that supports the development of a Domain Model, Reference Requirements, and Reference Architecture for a family of applications within a particular problem domain.” A DSSA also provides a process to instantiate the requirements and architecture. Domain model represents the elements and the relationships between the elements of a domain. Its purpose is to provide an “ambiguous understanding” [Tichelaar and et al. 1997] of the various elements of the system. Creating a domain model involves experts in the domain. [Tichelaar and et al. 1997] [Tracz 1995], Reference requirements are used to steer the reference architecture’s design. It describes the behavior requirements of the system [Tracz 1994] [Tracz 1995]. Reference architecture is an architectural design that satisfies the capabilities described by the reference requirements [Tracz 1994] [Tracz 1995]. 4.2.1 Domain Model The first step to creating a DSSA is to start with the domain model, which will provide individuals with an understanding of the system. 
The domain model is composed of the customers’ needs statements, scenarios, domain dictionary, 56 context diagrams, entity/relationship diagrams, data flow models, state transition models, and object models. [Tracz 1995] 4.2.1.1 Customers’ Needs Statement There are three customers for a software engineering technology testbed: the researcher, the practitioner, and the sponsor. Researchers need to evaluate their technology more cost-effectively in a representative setting that will convince practitioners to use their technologies. Researchers want to show to practitioners their technology can create more dependable systems. They also want to use the testbed to increase the technology maturation process. Practitioners need a way to find defect-reducing technologies easier and faster. Sponsors need technologies to mature faster. They want a quicker return on their investment. 4.2.1.2 Scenarios Researcher’s scenario - Researchers wish to entice an organization into adopting their technology that the researchers claim will increase software dependability (by finding defects faster) in the organization’s software systems. The organization may be hesitant to use a technology they are unfamiliar with. Thus, the researchers have to prove to the organization that their technology can help improve software dependability, that their technology is the right one for them, and that the technology will work for the end user as promised. The 57 researchers will use a software engineering technology testbed to help demonstrate their technology is right for the organization. When using the testbed, the researchers will go through the following steps as summarized in Figure 3. If a technology researcher has developed a new technology, the researcher will begin by exploring the experience base for technologies similar to the new technology. If no existing technology appears to provide the new technology’s’ capabilities, then the researcher will configure the software engineering technology testbed to evaluate the new technology. Once the evaluation is complete, the researcher will prepare a technology evaluation report and submit the report to the experience base where practitioners looking for dependability help will view it. If the evaluation results are positive and a practitioner has been identified, the researcher may then proceed to collaborate with the practitioner to configure the technology for use on a live project. Once the configuration and application of the technology is performed on the live project, the experience base should be updated to reflect the new results. If the testbed evaluation results are negative, the researcher should update the experience base with the current results to indicate to practitioners that the technology may not be ready for use yet and continue to refine the technology until the evaluation results are positive. If the researcher finds there already exists technologies that provide the new technology’s capabilities at the same or at lower costs, then the researcher should update their research goals and once the new goals are implemented, the researcher can begin the evaluation again. 58 Explore Experience Base for similar technologies Configure testbed; Evaluate Technology Collaborate with practitioner to configure technology for use on project. 
Apply, evaluate, and iterate technology Prepare technology evaluation report, Compare evaluation with evaluations from similar technologies, and Update Experience Base Positive Results Technology Researchers New Technology Refine Technology Negative Results If no current technology provides new technology’s capabilities Evaluate until Positive Results Found Update research goals and continue with research Good candidates exist Update technology needs and capabiltiies Figure 3: Researchers' Operational Scenario Practitioner’s scenario - When an organization like NASA has to send its software to space, the software has to be dependable. If the software does the wrong thing, it could mean the end of the mission, which would be a huge loss to the organization. However, developing dependable software is a complex problem. There are many technologies available to help software engineers create dependable software, but choosing the right one can be difficult. There are many questions to be asked of the technology such as “How does one know if the technology does as it states”, “Is the technology mature enough for use”, and “Will the technology work on my system”. 59 Practitioners will go through the following steps to identify the right technology for them as summarized in Figure 4. For practitioners having dependability problems, they will first explore the experience base looking for technologies that fit their search criteria. The practitioner will read the technology evaluation reports to determine how well a technology performed on the testbed system or on past projects done for the organization. Once a technology has been identified as a promising candidate, the practitioner will explore using the technology on their project. This could involve collaborating with the technology provider to configure the technology for use. After applying the technology to their project, the practitioner will submit a technology evaluation report to the experience base detailing how well the technology worked for the project. If no proven candidates are identified, then the practitioner will contact the researcher of the technology and ask the researcher to perform an evaluation of the technology on the testbed. The researcher will configure the testbed, perform and evaluation, and update the experience base with the results of the evaluation that will be viewed by the practitioner. 60 Explore Experience Base for relevant technology. Filter candidates Explore technology usage considerations on user project Collaborate with technology provider to configure technology for use. Apply and evaluate technology Prepare technology evaluation report and update Experience Base Formulate/revise mission approach, dependability issues Good candidate(s) Unproven candidate(s) Practitioner Contact Researcher to perform evaluation on testbed Researcher configures testbed and performs evaluation Figure 4: Practitioners' Operational Scenario Sponsors’ scenario - Technology maturation process goes from 18 years to 5 years, i.e. technology is being adopted faster. 4.2.1.3 Domain Dictionary · Researcher – Developer of the technology · Practitioner (Technology user) – User of a technology 61 · Sponsor – The organization or person paying for the testbed to be developed in hopes of having new technologies mature faster. 
· Technology – A method/software that will help practitioners develop dependable software · Software Engineering Technology Testbed - an instrumented testbed made for the evaluation of technologies · Instrumentation – A way for researchers to collect information when using their technology with the testbed · Seeded Defects – Defects planted in the testbed for researchers to find. Used as a way to measure the effectiveness of a technology · Robot/Rover – hardware platform to execute the code on · Simulator – software platform to execute the code on · Specifications – artifacts describing the software system used in the testbed. Artifacts include architecture models and requirements of the system · Evaluate – determine how well and easy to use a technology is. Also determine if the technology is right for the organization to use. · Technology Result – a report indicating how well the technology performed on the testbed. Results include, but are not limited to, such information as how many seeded defects the technology found and how easy it was to modify the technology to be used on the testbed. · Experience Base – a database-like system storing technology results that a practitioner can use to find technologies. 62 4.2.1.4 Context Diagram A context (block) diagram shows the data flow between the system’s major components [Tracz 1995] Testbed Artifacts (Specifications, Code, Instrumention), Comparable Technology Results, Technology Results Technology Search Parameters Technology Results Practitioner Testbed Researcher Figure 5: Testbed System Context Diagram Technology Results /Technology Search Parameters include such information as technology type, seeded defects found, and ease of use. 4.2.1.5 Entity/Relationship Diagrams Figure 6 details the type of practitioners that will use the testbed in an ER diagram. There are primarily two kinds of practitioners: developers and program managers. 63 Developer Program Manager Practitioner (from Logical View) Figure 6: Practitioners' ER Diagram Figure 7 details the type of researchers (technology providers) that will use the testbed in an ER diagram. There are primarily two kinds of researchers: researchers from industry and researchers from academia. Academic Researcher Industry Researcher Researcher ( from L ogical View) Figure 7: Researchers' ER Diagram 64 Figure 8 provides the type of specification artifacts that would be included as part of the testbed in an ER diagram. Note: this diagram provides only a sample of possible artifacts. The organization would decide what artifacts would be needed. Figure 8: Specifications' ER Diagram Figure 9 provides the ER diagram for the technology evaluation results. Figure 9: Technology Evaluation Result ER Diagram The seeded defects of a system should be classified according to a classification that would be useful to the practitioner. Usually an organization 65 would collect defects and catalog them based on a classification the organization believes is appropriate for their needs. Thus, the organization should produce a set of seeded defects described with their classification, as opposed to another classification schema, since this will make the technology evaluation more representative to the organization. However, researchers should be allowed some liberty to amend the classification if they feel the practitioner’s classification does not adequately represent the defect types their technology finds. 
Under this scenario, the researcher should also explain in their evaluation report why they amended the classification and how the practitioners should interpret it. 4.2.1.6 Data Flow Model Figure 10: Data Flow Diagram involving Practitioner and Researcher The data flow model in Figure 10 shows a simple scenario involving practitioners looking for technologies and researchers contributing technologies to the experience base (technology info). 66 Figure 11: Data Flow Diagram involving Researching Setting Up an Evaluation Figure 11 shows the data flow diagram for a researcher preparing to do a technology evaluation on a testbed. 4.2.1.7 Object Models Table 4: Object Model Object Attributes Operations Specifications · Specification Type/Name (Requirements, Architecture, Test) · Model Types · Life-Cycle Phase (LCO, LCA, IOC) · Select Scenario Specifications · Get specifications Code · Class names · Component names · Code complexity · Code language · Select Scenario code · Get code Platform · Name · Platform type (Simulator/Hardware) · Run experiment · Add Instrumentation 67 Table 4: Object Model, Continued Instrumentation Class · Name · Data to Collect · Collect data · Get/Display data · Configure Instrumentation Scenario/Mission Generator · Mission Type · Choose Scenario · List Scenario Guidelines- Manual · Guideline Type · Read Guidelines, FAQ, Manual Defect-Reducing Technology · Technology Name · Technology Type · Defects Technology Finds · Configure Technology · Execute Technology Technology Evaluation Results · Technology Adoption Data (information on how long it took to adopt technology) · Technology Evaluation Results (how well technology performed) · Technology Data (Information about technology and its researcher(s)) · Read Technology Evaluation Result Experience Base · Technologies Evaluated · Explore Technologies · Submit Technology Evaluation Result · Get Technology Evaluation Result Seeded Defect Engine · Defect Type · Populate specifications with seeded defects · Populate code with seeded defects Defect Pool · Defect Classification · Get Defects Project Data · Effort Data · Schedule Data · Cost Data · Get Project Data User Interface None · Explore Technologies · Get Technology Report · Configure Testbed 68 4.2.2 Reference Requirements The following is a list of the SETT’s functional requirements. · F1 - Testbed should provide specifications, code and effort data · F2 - Testbed should allow a researcher to instrument the system in order to collect data during the experiment · F3 - Testbed system should have a library of seeded defects that researchers can use to seed defects into the specifications/code. · F4 - Testbed should provide guidance to the researcher on how to use the testbed · F5 - Testbed should have a mechanism for researchers to submit their technology evaluation results (including technology adoption data) for practitioners to review · F6 – Researchers should be able to run the testbed system on a hardware platform or simulator to obtain or verify the technology evaluation results · F7 – Testbed should provide a mission generator that will multiple missions from which the researcher can select from for their evaluation. This will allow the researcher to test their technology under various scenarios. · F8 – Testbed should provide project data such as effort as part of the testbed system · F9 – Testbed should have an user interface that will allow an evaluator to access the testbed. The following is a list of the SETT’s non-functional requirements. 
69 · NF1 - The testbed, including specifications and code, should be representative of the systems the sponsoring organization develops and support a wide range of technologies. In addition, the specifications should follow good software engineering practices. · NF2 - Specifications/Code of the testbed should be tailorable to meet the needs of multiple contributors · NF3 - Testbed should be available for public use · NF4 - Testbed should generate results that is combinable with other researchers’ results and representative to the end user · NF5 - The defect library should be composed of actual defects incurred while implementing the testbed system · NF6 - Researchers should use a common (ideally low-cost) platform to evaluate their technologies allowing a fair comparison of technologies and allowing results to be combinable · NF7 - The defect data classification used should be comparable to classifications used by the researchers. In addition, researchers should be able to classify the defects according to their own defect classification system. SETTs have no design requirements, thus there are no architectural style requirements or user interface style requirements for SETTs. The implementation requirements should be representative to the sponsoring organization. For example, if the sponsoring organization is JPL: · Programming Language: Language that can support real-time systems 70 · Operating systems: a real time operating system such as VxWorks or Real-time Linux · Hardware Platform: robotic hardware platform · Public use of testbed: ITAR-safe or freeware · Seeded defects: defects are actual defects found in development of testbed The following figure shows the problem and solution space for SETTs. Figure 12: Mapping between Problem Space and Solution Space 4.2.3 Reference Architecture 4.2.3.1 System Analysis The first thing I did to establish the SETT architecture was to describe the system behavior via use-case diagrams as provided in Figure 13. The use cases were derived by looking at the user scenarios provided in Figure 3 and Figure 4. The use-case diagram describes the behavior for each of the three stakeholders. 71 The practitioners would like to be able to adopt defect-reducing technologies, the researchers would like to be able to evaluate and mature their technologies on the testbed, and the sponsors would like the technologies to mature faster, from 18 years to 5 years. Figure 13: Use-Case Diagram Next, sequence diagrams are provided for each of the use-cases identified in Figure 13. Figure 14 and Figure 15 provide the details of the use-cases for the researcher. 72 Figure 14: Sequence Diagram for Adopt Technology from Researcher 73 Figure 15: Sequence Diagram for Adopt Off-the-Shelf Technology 74 Figure 16 provides the sequence diagram for the Evaluate Technology Use-Case. In this diagram, there is a sub sequence diagram called “Configure Testbed” as shown in Figure 17 and Figure 18(The sequence diagram is split into 2 parts to better view it). Figure 16: Sequence Diagram for Evaluate Technology 75 Figure 17: Sequence Diagram for Configure Testbed (1) 76 Figure 18: Sequence Diagram for Configure Testbed (2) Figure 19 provides the sequence diagram for the Practitioner’s Mature Technology use-case while Figure 20 provides the sequence diagram for the Sponsor’s Mature Technology Faster use-case. 
Figure 19: Sequence Diagram for Mature Technology

Figure 20: Sequence Diagram for Mature Technology Faster

4.2.3.2 Architecture Design and Analysis

Since the behavior of the system has been established, I will now provide the architecture of the SETT. First, Figure 21 provides the deployment diagram for the software engineering technology testbed. This figure indicates which layer each of the components will be placed in. Components dealing with the user interface will be placed in the Presentation Layer, components dealing with the logic of the SETT will be placed in the Business Layer, and all data/artifacts will be stored in the Persistence Layer.

Figure 21: Simple Layered Reference Architecture Model

Next, the reference architecture diagram is provided in Figure 22. The diagram indicates what components make up the SETT's architecture in the Presentation and Business Layers. In the figure, a solid-line box indicates that the component is required, while a dashed-line box indicates that the component is optional. Three boxes grouped together indicate that the component may have variances, i.e., multiple versions of the component may exist in the same instance of the architecture. Figure 22 applies to the scenario in which an organization wishes to build a testbed for a single project. If the organization desires to build a testbed that can be used across multiple projects in the organization, then a slight adjustment would be made to the figure: each of the components could have multiple versions, depending on the situation. For example, two projects may or may not share the same defect pool. If they do share the same defect pool, then only one version exists. However, if each project has its own defect pool, then the component would have multiple versions. Figure 23 indicates what artifacts belong in the Persistence Layer.
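As a rough illustration of this layering, the sketch below lists one possible reading of which components and artifacts Figures 21 through 23 place in each layer; the data structure itself is hypothetical, and the figures remain the authoritative assignment.

```python
# One possible reading of the layered reference architecture (illustrative only).
layers = {
    "Presentation": ["User Interface"],
    "Business": ["Specifications", "Code", "Seeded Defect Engine", "Defect Pool",
                 "Scenario/Mission Generator", "Platform (Robot/Simulator)",
                 "Instrumentation", "Project Data", "Guidelines-Manuals",
                 "Technology Evaluation Result", "Experience Base",
                 "Defect-Reducing Technology"],
    "Persistence": ["Specifications Package", "Code", "Scenarios", "Defects",
                    "Project Data", "Guidelines/FAQ/Manual",
                    "Technology Evaluation Result"],
}

for layer, members in layers.items():
    print(f"{layer} Layer: {', '.join(members)}")
```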
Figure 22: Reference Architecture of the Presentation and Business Layers

Figure 23: Persistence Layer Artifacts

For each of the components identified in Figure 22, the following items are provided: a description of the component, the internal class diagram for the component, and a sequence diagram detailing how the internal classes and artifacts work together. In addition, properties such as the requirements satisfied, whether or not the component is required, whether it has multiple versions, and good rules of thumb developers should follow when building the component are provided as well.

4.3.2.3.1 Specifications Component

Table 5: Specifications Component
Name: Specifications
Purpose: Component to provide specifications to the evaluators. Specifications are needed if the organization wishes to test more than just technologies that work on code. Specifications provide the basis for the evaluation, along with the code. For example, specifications are needed for evaluations of technologies dealing with architecture and/or requirements.
Requirements Satisfied: F1, NF1, NF2, NF3
Required: Yes – however, full life-cycle specifications may not be needed; it will depend on what technologies are being evaluated. For example, if no technologies require requirements specifications, the requirements can be optional.
Multiple Version on a Single Project: Artifacts within these components can have multiple versions.
Rules of Thumb Following good software engineering techniques, like MBASE [Boehm and Port 2001] [USC-CSSE 2003] or RUP [Krichten 2001], will help ensure such things as the architecture meets the needs of the requirements and that the project is feasible within constraints such as schedule and cost. Should be representative of specifications organization develops Figure 24 indicates that the Specifications component has a Specifications class with a “Specifications Package” artifact. The Specifications class is in charge of returning the correct specifications package to whoever calls for it. When creating the “Specifications Package” artifact, it should be representative of the type of specifications the organization typically creates. 84 Figure 24: Internal Class Diagram for Specifications Figure 25 represents how the Specifications class interacts with the ”Specifications Package” artifact. The GetSpecifications function will be used by the Seeded Defect Engine component. Figure 25: Sequence Diagram for Specifications 85 4.3.2.3.2 Code Table 6: Code Component Name Code Purpose Component to provide code to the researchers. A software system is needed to evaluate the technology on. Requirements Satisfied F1, NF1, NF2, NF3 Required Yes Multiple Version on a Single Project Artifacts within these components can have multiple versions. Rules of Thumb Code artifact should satisfy all requirements and the code should match the architecture. A way to do this is to use good software engineering techniques like MBASE [Boehm and Port 2001] [USC-CSSE 2003] or MDS [Dvorak and et al. 2000]. Should be representative of code organization develops Figure 26 indicates that the Code component has a Code class with the Code artifact. The Code class is in charge of returning the correct code artifact to whoever calls for it. When creating the Code artifact, it should be representative of the type of code the organization typically creates. 86 Figure 26: Internal Class Diagram for Code Figure 27 shows the sequence diagram for how the Code class interacts with the Code artifact. The GetCode function will be used by the Seeded Defect Engine component. Figure 27: Sequence Diagram for Code 87 4.3.2.3.3 Seeded Defect Engine Table 7: Seeded Defect Engine Name Seeded Defect Engine Purpose Component to seeds defects into the specifications and code that the technology will attempt to detect. Requirements Satisfied F3, NF3 Required Yes Multiple Version on a Single Project No Rules of Thumb N/A Figure 28 indicates that the Seeded Defect Engine component has a SeededDefectEngine class that interacts with two artifacts: SeededSpecifications and SeededCode. The SeededDefectEngine class is in charge of creating the seeded artifacts and returning the artifacts to whoever calls for it. Figure 28: Internal Class Diagram for Seeded Defect Engine Figure 29 and Figure 30 represents how the SeededDefectEngine class will respectively get the specifications and code artifacts and seeds defects into them. 88 Figure 29: Sequence Diagram for Seeding Defect Engine (1) 89 Figure 30: Sequence Diagram for Seeding Defect Engine (2) 90 4.3.2.3.4 Defect Pool Table 8: Defect Pool Name Defect Pool Purpose Component that stores the database of defects the Seeded Defect Engine can seed into the specifications/code. 
Requirements Satisfied F3, NF3, NF5, NF7 Required Yes Multiple Version on a Single Project No Rules of Thumb Good defects come from actual defects occurred Figure 31 indicates that the Defect Pool component has a Defect class that interacts with the Defect artifact. The Defect class is in charge of storing the seeded defects. When creating the Defect artifact, it should be representative of the type of defects the organization usually encounters in its software development and representative of the defects the organization would like the Defect-Reducing technologies to find. Figure 31: Internal Class Diagram for Defect Pool 91 Figure 32 represents how the Defect class interacts with the Defect artifacts. Typically, a call is made to get a certain defect from the defect database and that defect is returned to the caller. Figure 32: Sequence Diagram for Defect Pool 4.3.2.3.5 Scenario/Mission Generator Table 9: Scenario/Mission Generator Name Scenario/Mission Generator Purpose Component to select which scenario/mission to run and chooses the corresponding set of specifications/code that goes with the scenario/mission. Having multiple scenarios allows researchers to do a more thorough evaluation of their technology since it will be operating under several scenarios and not just one. Also, prevents a researcher from developing their technology to work for only a specific scenario. Requirements Satisfied F7, NF3 Required Yes Multiple Version on a Single Project Scenario artifacts within the component may have multiple versions. Rules of Thumb Missions should be representative of organization Figure 33 indicates that the Scenario/Mission Generator component has a Scenario Generator class that interacts with various Scenario artifacts. When 92 creating the Scenario artifact, it should be representative of the type of scenarios the organizations’ software systems usually runs. Figure 33: Internal Class Diagram for Scenario Generator Figure 34 represents how the Scenario Generator class interacts with the Scenario artifacts. Typically, a call is made to list the possible scenarios the user can choose from, the user is asked to choose a scenario, and that scenario is returned to the caller. Figure 34: Sequence Diagram for Scenario Generator 93 4.3.2.3.6 Platform Table 10: Platform Name Platform Purpose Component to run the code. Can be either simulator or robotic hardware. Requirements Satisfied F6, NF3, NF6 Required If the technology doesn’t need a platform to run code/execute the experiment, this may be optional. Multiple Version on a Single Project Yes – Can run on multiple platforms. Rules of Thumb Should run on a platform that the organization’s system runs on Figure 35 indicates that the Platform component has a Platform class inside its internal structure. When developing the platform, the platform can either be a hardware system or a simulator. However, it should be representative of the type of platforms the organization usually builds. Figure 35: Internal Class Diagram for Platform 94 Figure 36 represents how the Platform class interacts with the software engineering technology testbed. The platform class will add instrumentation to itself and then the evaluator can perform the evaluation on the platform. Figure 36: Sequence Diagram for Platform 4.3.2.3.7 Instrumentation Table 11: Instrumentation Name Instrumentation Purpose Component that provides a way for researchers to collect the information they need to perform an analysis of how well their technology performed. 
Requirements Satisfied F2, NF3 Required Yes Multiple Version on a Single Project No Rules of Thumb N/A Figure 37 indicates that the Instrumentation component has an Instrumentation class inside its internal structure and that the class collects and stores the data in the “Researcher’s Data” artifact. 95 Figure 37: Internal Class Diagram for Instrumentation Figure 38 represents how the Instrumentation class interacts with the Platform class. The platform class will add instrumentation to itself and then the Instrumentation class will collect the data as requested by the researcher when the evaluation is running. This data then gets returned to the researcher. Figure 38: Sequence Diagram for Instrumentation 96 4.3.2.3.8 Project Data Table 12: Project Data Name Project Data Purpose Component that stores the data on how long it took to develop the specifications and code of the system. Good for technologies evaluating costs. Requirements Satisfied F8, NF1, NF3 Required If no studies on project cost are being conducted, this is optional to have. Multiple Version on a Single Project No Rules of Thumb N/A Figure 39 indicates that the “Project Data” component has a ProjectData class inside its internal structure and that the class collects and stores the project data in the ProjectEffortData artifact. Figure 39: Internal Class Diagram for Project Data .Figure 40 represents how the ProjectData class interacts with the ProjectEffortData artifact. A researcher will ask the ProjectData class for the project data and that gets returned to the researcher for his/her use. 97 Figure 40: Sequence Diagram for Project Data 4.3.2.3.9 Guidelines-Manuals Table 13: Guidelines-Manuals Name Guidelines-Manuals Purpose Component that provides users help/information about the testbed such as how to use it or how to conduct an experiment. Requirements Satisfied F4, NF3 Required If researcher/practitioner can understand testbed well enough, this is not needed, but will be helpful to have according to studies by [Basili, Tesoriero, and et al. 2001]. Multiple Version on a Single Project Yes Rules of Thumb Good and descriptive manuals/FAQ/guidelines will mean researchers will rely less on the organization when it comes to understanding and using the testbed. Figure 41 indicates that the Guidelines-Manual component has Guidelines-Manual class inside its internal structure and that the class interacts with three artifacts: the Experimentation Guidelines, the FAQ, and the Manual. 98 Figure 41: Internal Class Diagram for Guidelines-Manual Figure 42, Figure 43, and Figure 44 represent how the researcher will access each of the artifacts. Figure 42: Sequence Diagram for Reading Guidelines Figure 43: Sequence Diagram for Reading Manuals 99 Figure 44: Sequence Diagram for Reading FAQ’s 4.3.2.3.10 Technology Evaluation Result Table 14: Technology Evaluation Result Name Technology Evaluation Result Purpose A report created by the researcher or a technology user detailing how well the technology worked for them. Requirements Satisfied F5, NF4, NF3 Required Yes Multiple Version on a Single Project Artifacts within these components can have multiple versions. Rules of Thumb N/A Figure 41 indicates that the Technology Evaluation Result component is composed of several artifacts. Since the Technology Evaluation Result artifact is a part of the experience base, I have shown the sequence diagram for this component as part of the sequence diagram in Figure 47 and Figure 48. 
The diagrams indicate how a practitioner or researcher could use the Technology Evaluation Results. Figure 45: Internal Class Diagram for Technology Evaluation Result 4.3.2.3.11 Experience Base Table 15: Experience Base Name Experience Base Purpose Component used to store the Technology Evaluation Reports that users can search to find a technology. Requirements Satisfied F5 Required Yes Multiple Version on a Single Project No Rules of Thumb N/A Figure 46 indicates that the "Experience Base" component has an ExperienceBase class inside its internal structure and that the class interacts with a database of Technology Evaluation Results. Figure 46: Internal Class Diagram for Experience Base Figure 47 represents how a practitioner would use the experience base to search for a technology evaluation result, while Figure 48 indicates how a researcher would create a technology evaluation result report and submit it to the experience base. Figure 47: Sequence Diagram for Practitioner's Use of Experience Base Figure 48: Sequence Diagram for Researcher's Use of Experience Base 4.3.2.3.12 Defect-Reducing Technology Table 16: Defect-Reducing Technology Name Defect-Reducing Technology Purpose The technology that is provided by the researcher and that will be used to find defects. Requirements Satisfied F5 Required Yes Multiple Version on a Single Project Yes Rules of Thumb N/A The Defect-Reducing Technology component will not be developed by the sponsoring organization, but will instead be provided by the researchers themselves. However, users of the technology should be able to configure the technology to their software systems and execute the technology to find defects in the system, as shown in Figure 50. Figure 49: Internal Class Diagram for Defect-Reducing Technology Figure 50: Sequence Diagram for Defect-Reducing Technology 4.3.2.3.13 User Interface Table 17: User-Interface Name User Interface Purpose Component that allows users to access the testbed and its functionalities. Requirements Satisfied F9 Required Yes Multiple Version on a Single Project No Rules of Thumb N/A Figure 51 indicates that the "User Interface" component is composed of two classes: the Practitioner User Interface and the Researcher User Interface classes. Each of the classes is able to interact with the other components of the SETT as shown in Figure 52, Figure 53, Figure 54, and Figure 55. Figure 51: Internal Class Diagram for User Interface Figure 52: Sequence Diagram for Practitioners' User Interface Figure 53: Sequence Diagram for Researchers' User Interface (1) Figure 54: Sequence Diagram for Researchers' User Interface (2) Figure 55: Sequence Diagram for Researchers' User Interface (3) In section 6.5, an instance of the testbed architecture called SCRover is provided, along with an in-depth description of each of the architecture's components. 4.3 Results of CBSP vs. DSSA Using both methods generated the same set of requirements and a similar architecture. 4.4 Requirements Mapped to Published References From the CBSP approach, I obtained a list of requirements for the software engineering technology testbed. In this section, I will indicate which reference provided the origin of each testbed requirement.
107 Table 18: Requirements Mapped to Published References Requirement Published Reference: Testbed should provide tailorable, representative specifications and code to support a wide range of technologies and multiple contributors Lack of good documentation increased the amount of work researchers had to do to use the testbed or software system they were investigating. Without accurate documentation, it makes understanding the system harder [Tichelaar and et al. 1997] Testbed should allow a researcher to instrument the system in order to collect data during the experiment Many computer systems use instrumentation to collect data (Example: Nokia uses instrumentation for performance profiling [Metz and Lencevicius 2003]) Testbed should allow a researcher to seed defects into the system Pseudo defects were seeded into a program, which was then tested. The tester reported the number of defects found and these would include pseudo defects and real defects allowing us to estimate how many defects were in the system [Mills 1972]. According to [Stone 2003], domain- dependent solutions are bad. With the use of seeded defects, this will discourage researchers from developing technologies to work on a single system The defect pool should come from actual defects incurred while implementing the testbed system Use Change histories to seed defects [Boehm and Port 2002] Testbed should be able to allow a researcher to use multiple representative missions in their evaluation According to [Stone 2003], domain- dependent solutions are bad. Repeating same competition several times encourages high-level, generalizable solutions. Thus, there is a need to have the testbed have several different missions to discourage domain-dependent solutions Testbed should run on a low-cost platform According to [Stone 2003],, one disadvantage of competitions is that there is a huge barrier, in terms of finances, for newcomers to overcome if they wish to participate in the competition. 108 Table 18: Requirements Mapped to Published References, Continued Researchers should capture technology results for practitioners to review According [Basili, Tesoriero, and et al. 2001], experience bases can be used to share knowledge. [Redwine and Riddle 1985] indicates a difficulty in getting technology adopted is a lack of positive experiences in the technology. Researchers should use a common platform to evaluate their technologies allowing a fair comparison of technologies [and allowing results to be combinable] A second disadvantage to competitions according to [Stone 2003] is that different solutions use different hardware platforms, giving teams with the more expensive hardware an unfair advantage in some cases. Testbed should generate results that is representative to the end user According to [Stone 2003], another disadvantage is that researchers can build technology solutions that will work only for the competition Testbed should require a low amount of effort to set up and use, including costs to inject defects According to [Stone 2003], one disadvantage of competitions is that there is a huge barrier, in terms of finances, for newcomers to overcome if they wish to participate in the competition. 
Testbed should be available for public use Other testbeds were not made for public use, thus limiting the success of those testbeds, example: NASA’s own testbed systems that had ITAR restrictions Testbed should provide guidance to the researcher on how to use the testbed Guidance helps users become more familiar with the tool [Basili, Tesoriero, and et al. 2001] Testbed should generate results that is combinable with other researchers’ results Helps in seeing how two solutions may work together and allows for easier collaboration amongst researchers [Basili, Zelkowitz, and et al. 2007]. 109 Chapter 5 Software Engineering Technology Testbed Analysis 5.1 Comparison of Software Engineering Technology Testbeds to Other Testbeds While each of the testbeds mentioned in chapter 2 are good and have helped foster growth in technology, each testbed has its limitations as shown in Table 19-Table 24. I do a comparison of the testbeds based on how it is used, its architecture, and its operational concepts. Table 19 provides the comparison based on the Evaluation Context, Evaluation Criteria, and the Problem Scope while Table 20 provides the comparison based on the Time/Cost Considerations, Researchers’ Feedback, and the Practitioners’ Feedback. 110 Table 19: Evaluation Context and Criteria, and Problem Scope Comparison Evaluation Context Evaluation Criteria Problem Scope RoboCup Head to Head Point Score Few TAC Field of Alternatives Lowest Cost Single ISPW 5/6 Field Imprecise Vector Single DARPA Speech, Image, Message Understanding Field Correctness Score Single, Multiple ACES Field Correctness Score Single HDCP Testbeds (GG – Golden Gate Testbed) Multiple Fields Defects avoided, found, and diagnosed Many contributors; single goal SETT Multiple Fields Defects avoided, found, and diagnosed; Time to adapt technology Multiple Goals, many contributors 111 Table 20: Cost, Researchers’ and Practitioners’ Feedback Comparison Cost (Time) Considerations Researchers’ Feedback - Practitioners’ (Users) Feedback - RoboCup Implicit Strengths, weaknesses Indirect TAC Added criterion Strengths, weaknesses Representative ISPW 5/6 Added criterion Strengths, weaknesses Representative DARPA Speech, Image, Message Understanding Added criterion; ROI Strengths, weaknesses Representative or Indirect ACES Added criterion; ROI Strengths, weaknesses Representative or Indirect HDCP Testbeds (GG – Golden Gate Testbed) Added criterion; ROI Strengths, weaknesses Representative or Indirect SETT Added criterion; ROI Strengths, weaknesses Representative or Indirect Table 21 provides the comparison based on several components: Specifications, Code, and Seeded Defects while Table 22 provides the comparison based on the Mission Generator and Instrumentation Components. 
112 Table 21: Specifications, Code, and Seeded Defects Comparison Specifications Code/ System Seeded Defects RoboCup None Provided rules of the game; Framework provided to build your own robot None TAC None Provided rules of the game; Framework provided to build your own agent None ISPW 5/6 None Provided benchmark problem None DARPA Speech, Image, Message Understanding None Provided rules None ACES None Test harness provided Provided HDCP Testbed None or Limited Specifications System provided, may not be representative Defects may not be realistic SETT Architecture, Requirements, and Testing Specifications Representative system provided Provided 113 Table 22: Instrumentation and Mission Generator Comparison Instrumentation Mission Generator RoboCup None Modifiable rules to create new scenarios TAC None Modifiable rules to create new scenarios ISPW 5/6 None None DARPA Speech, Image, Message Understanding None Provide different images, speech ACES None Provide many test cases HDCP Testbed Provided for TSAFE only Limited SETT Provided Provided Table 23 provides the comparison based on several operational concepts: Representative Feedback, Multiple Goals and Multiple Contributors while Table 24 provides the comparison based on the operational concepts of Searchable Results, Combinable Results and Low Cost. Table 23: Feedback, Multiple Goals and Contributors Comparison Representative Feedback Multiple Goals Multiple Contributors RoboCup No Yes Yes TAC Yes No Yes ISPW 5/6 Yes No Yes DARPA Speech, Image, Message Understanding Yes No Yes ACES Yes No Yes HDCP Testbed Yes for GoldenGate; No for TSAFE Yes for GoldenGate; No for TSAFE Yes SETT Yes Yes Yes 114 Table 24: Searchable and Combinable Results and Low Cost Comparison Searchable Results Combinable Results Low Cost RoboCup No No Free to use TAC No No Free to use ISPW 5/6 No No Free to use DARPA Speech, Image, Message Understanding No No Free to use ACES No No Free to use HDCP Testbed No for GG, Yes for TSAFE No for GG, Yes for TSAFE Free to use; GG – not ITAR-safe SETT Yes Yes Free to use In RoboCup [RoboCup 2007], there is a head-to-head competition between two research teams to determine who is the winner. The team that scores the most goals wins. The disadvantages to using RoboCup as a testbed are numerous. First, it can be difficult to judge how technologies compare against each other. In its soccer competition, it uses a playoff type format to determine the winner of the competition. As [Stone 2003] indicates, this can be an erroneous way to judge the value of one’s research, and it is a bad way to judge whose technology is better. Perhaps the technology is not suited for soccer, but it will work better in other fields of application. RoboCup needs an additional way to help researchers gauge how well their technologies performed besides the number of soccer goals scored. Another limitation of RoboCup is that the problem scope of the competition is limited to just a few applications: soccer, search and rescue missions, and dance challenges. With its limited scope, this could deter many researchers from joining the competition. Finally, after the 115 competition, its application towards other fields of interest is indirect. Unless the end user is interested in dance or soccer applications, the user will have to judge for themselves whether or not the technology can be applied to their applications. 
If the end user does decide to adapt the technology, it may not be clear as to how that application will take place and the additional time it would take to apply the technology to the new field. Requirements for the RoboCup competition are few. All RoboCup provides are a set of rules to play the game of soccer under a set of environmental conditions. Each year, the environmental conditions could be changed to prevent researchers from programming their robots to win under a given set of conditions. An example of an environmental change is adding more shadows to the field. RoboCup provides a simulator to researchers who wish to evaluate their software technologies such as AI algorithms. However, this is all that RoboCup provides. The RoboCup testbed architecture does not provide much other than rules and a simulator as indicated in Table 21 and Table 22. As indicated in Table 23 and Table 24, RoboCup does not meet all the operational concepts needed by the stakeholders. The feedback will not be representative for the practitioners and results of the competition are not combinable or searchable. For researchers wishing to use RoboCup, they will be able to use RoboCup to determine which of the technologies being used is superior, but it will be difficult for them to use RoboCup to evaluate how well a technology can increase the robot’s dependability by finding as many defects as possible in the software system. 116 In the TAC competition [TAC 2007], the competition is amongst a field of software agents. The winner of the competition is the agent who finds the best deal for its customer. Unlike RoboCup, each agent competes with all the other agents. Thus, it makes it easier to compare technologies against each other. A limitation of the TAC competition is the problem scope of the testbed is limited to the auction field, and adapting the technology to other domains may be difficult. However, if an end user is interested in using agents to help them find the best deal on a travel package, then the user can get excellent representative feedback from the TAC competition to determine how well the agent will perform for them. The requirements for the TAC competition are few as well. TAC provides a set of rules for the competition and a simulator for researchers to use. The architecture for the TAC testbed would be minimal as it provides only a simulator and some rules as indicated in Table 21 and Table 22.As indicated in Table 23 and Table 24, TAC does not meet the operational concepts needed by the stakeholders. Due to its limited problem scope, TAC can only be used in the software agents’ community, and results of the competition are not combinable or searchable. In the ISPW 5/6 competition [Kellner and et al. 1991], multiple researchers give solutions to the same problem and using an imprecise vector of evaluation criteria, a panel decides which solution is the best one for the given problem. Due to the imprecise nature of the judging, there is no way for sure to say which 117 solution is really best. It is up to the judges’ opinions, which can vary amongst the different judges. As with many other competitions, the ISPW 5/6 competition is restricted to only a single specialized problem scope, which can make applications to other problems difficult or impossible. Many of the languages created for the competition may not be suitable for use by other domains. 
If a user wishes to use the language in another domain than the one stated in the contest, then the end user will have to configure the language to work for their domain, thus making the feedback to the end user indirect. It will be up to the end user to decide if the language will work for them or not, based on the contest’s results and based on their own opinion of the language. Requirements for the ISPW competition are few as it only provides a common problem for the researchers to solve. There is no architecture for the ISPW testbed as indicated in Table 21and Table 22.As indicated in Table 23 and Table 24, the ISPW competition does not meet much of the operational concepts needed by the stakeholders. In the DARPA Speech and Image Workshops [DARPA 2000], multiple researchers give solutions to the same problem. DARPA will then determine how well each technology did in identifying the speech or image. The criteria will be based on a correctness score, which is determined by the number of correct images and texts identified and the number of false positives made by the technology. Thus, it will be easy for a user to determine how well the technology did in this problem scope. For DARPA to evaluate the technology in live scenarios DARPA would just need to replace the pre-recorded images and 118 conversations with a live camera shot and an actual live conversation between two or more people. Thus, the workshop can be a representative or a direct application of the technology for DARPA. Due to the limited scope of the workshop, only a handful of researchers are able to participate which can be a drawback. Just like RoboCup and TAC, the DARPA testbed provides a set of rules for the competition and a set of pictures and sounds for researchers to correctly identify. However, the DARPA testbed gives no technical support for its researchers to evaluate their technology, as its architecture is sparse. As indicated in Table 23 and Table 24, the DARPA workshop does not meet much of the operational concepts needed by the stakeholders. In the ACES test suite, a field of Ada compiler developers use the ACES test suite to evaluate their Ada compiler. The test suite will put the Ada compiler through a series of tests to determine how well the compiler works. The result is a test score indicating how many tests the compiler passed. However, the tests are known ahead of time to the Ada compiler developers, thus there can be a tendency to develop the compiler to just pass a series of tests, possibly resulting in a compiler that may not work outside ACES environment. Furthermore, a test suite like ACES can only validate a technology, i.e. indicates how well it works. No information is collected on how long it took to learn and adapt the technology. The ACES test suite is restricted to a single specialized problem, Ada compilers. It cannot evaluate a wide range of technologies. For example, if someone in the architecture domain wanted to use ACES to analyze architecture technologies, 119 this would not be feasible. Furthermore, the ACES test suite doesn’t provide an experience base for practitioners to search for the best ACES compiler. Another component test suites don’t generally provide is instrumentation. All of the above testbeds do have one feature in common between them. Each testbed provides feedback to the researchers detailing the technology’s strengths and weaknesses. 5.2 Were testbed users’ needs met by the architecture? 
Table 25: Mapping between User's Needs and Architecture User's Needs SETT Architecture Low cost common platform System that is free or low-cost to use Configurable Specifications Configurable specifications Way to gather information used for analyzing the performance of the technology Configurable instrumentation class Way to evaluate technology Seeded Defects, Researcher's Effort Positive and negative experiences Instrumentation, Technology Results Evaluations Able to compare different technologies Combinable Technology Results Evaluations Table 25 lists the users' needs, based on their objectives, mapped to the part of the SETT's architecture used to fulfill those needs. The researcher needed a low-cost common platform to evaluate their technology, and a SETT provides a simulator that is free to use. The researcher needed configurable specifications they could use with their technology, and the testbed provides configurable specifications. There was a need to gather information to be used for analyzing the performance of the technology. The SETT's architecture fulfilled that need with a configurable instrumentation class that runs during the execution of the system. The researchers and practitioners needed a set of criteria on which to base their evaluation. Using seeded defects will allow the researchers to tell practitioners how well their technologies can increase system dependability. Using process information, such as the effort to apply the technology to the representative system, will allow the researcher to give an estimate of how much effort it will take to apply the technology to the practitioners' software system. Practitioners need positive and negative experiences about the technologies in their evaluation of them. Each researcher will give a technology evaluation report detailing how many defects the technology found and how hard it was to apply the technology to the system. To create the technology evaluation report, the researcher will need to create an instrumentation class to collect the data needed for the report. Practitioners will sometimes choose more than one technology to fulfill their dependability needs. Thus, they need a way to compare different technologies and figure out whether using two or more technologies will be of benefit to them. If the technology evaluation results are combinable, practitioners can see how many unique defects the combination of technologies will find. For combinable results, this assumes that the researchers will be using the same set of seeded defects. 5.3 How do software engineering technology testbeds answer the challenges of technology adaptation According to [Redwine and Riddle 1985], one of the difficulties of getting users to use new technologies is that there is no collection of prior experiences demonstrating positive feedback on a technology. This lack of an experience base causes software engineers to be hesitant to use new technologies, since they have no idea how well the technologies really work. The software engineering technology testbed will have an experience base of prior experiences, both positive and negative, for each technology, allowing software engineers to see how well it worked on a representative software system. The information going into the experience base would include, but not be limited to, the effectiveness of the technology at finding defects, what types of defects it found, the training time to learn the technology, and a description of the technology.
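To make the shape of this information concrete, the sketch below shows one way an experience-base record could be represented in C++; the class name, field names, and the recall function are illustrative assumptions, not the actual SCRover or VQI data model.

// Illustrative sketch of an experience-base record (assumed field names).
#include <iostream>
#include <map>
#include <string>

struct TechnologyEvaluationResult {
    std::string technologyName;                           // e.g., an architecture analysis tool
    std::string description;                              // what the technology does
    std::map<std::string, int> seededDefectsFoundByType;  // ODC type -> defects found
    std::map<std::string, int> seededDefectsTotalByType;  // ODC type -> defects seeded
    int unseededDefectsFound = 0;                         // previously undiscovered defects
    double trainingHours = 0.0;                           // time to learn the technology
    double adaptationEffortHours = 0.0;                   // time to apply it to the testbed
};

// Effectiveness expressed as the fraction of seeded defects the technology detected.
double seededDefectRecall(const TechnologyEvaluationResult& r) {
    int found = 0, total = 0;
    for (const auto& kv : r.seededDefectsTotalByType) {
        total += kv.second;
        auto it = r.seededDefectsFoundByType.find(kv.first);
        if (it != r.seededDefectsFoundByType.end()) found += it->second;
    }
    return total == 0 ? 0.0 : static_cast<double>(found) / total;
}

int main() {
    TechnologyEvaluationResult r;
    r.technologyName = "Example static analysis tool";
    r.seededDefectsTotalByType = {{"Interface", 4}, {"Timing", 2}};
    r.seededDefectsFoundByType = {{"Interface", 3}};
    r.trainingHours = 8.0;
    std::cout << "Seeded-defect recall: " << seededDefectRecall(r) << "\n";
}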
By looking at this information, a software engineer would be able to gauge how well the technology will work on their project. Another challenge according to Redwine and Riddle was conceptual integrity. By using the software engineering technology testbed, the researcher will be able to demonstrate that the technology is well developed by applying the technology to a representative software system and being able to find the (seeded) defects. If the researcher is unable to find the seeded defects in the representative system, then the researcher will need to develop the technology further before it can be used by the technical community. A third challenge was showing a clear recognition of the need for the technology. For the software engineering technology testbed, the need of the testbed users would be to increase system dependability via technologies. By using the software engineering technology testbed, a researcher can demonstrate that their technology is able to increase software dependability by showing how well their technology can identify defects in a system. Finally, Redwine and Riddle mentioned that lack of training for the new technology is an impediment to technology transition. The software engineering technology testbed will contain information about the technology from the researcher, information on how the technology was applied to the testbed, and how long it took to apply the technology. 5.4 How do software engineering technology testbeds answer the hazards of competitions According to [Stone 2003], one disadvantage of competitions is that there is a huge barrier, in terms of finances, for newcomers to overcome if they wish to participate in the competition. With the software engineering technology testbed, the hope is that the financial constraints will be easy to overcome for newcomers. For example, to use the SCRover system in simulation mode, all a researcher needs is a Pentium 4 desktop with a recommended 512 MB of RAM and 10 GB of hard drive space, which is fairly inexpensive nowadays. All the software used in the testbed, including the Linux operating system and the MDS software, is low cost. If the researcher wishes to run the code on an actual robot, the costs can vary between $2,000 and $25,000, depending on the robotic platform chosen. A second disadvantage to competitions according to Stone is that different solutions use different hardware platforms, giving teams with the more expensive hardware an unfair advantage in some cases. For example, with the SCRover testbed, every researcher can use the same simulator, Gazebo, and the same hardware, a Pioneer 3-AT rover, allowing every team to be on the same footing. Using a common hardware platform (or simulator) also makes comparisons amongst the different technologies easier. Another disadvantage to competitions according to Stone is that there is a potential to conclude that if team A beats team B, then team A's solution is better than team B's. Of course, this one-dimensional way of comparison is flawed since it only looks at the overall success of the technology. For example, with the SCRover experience base and all the information it holds, engineers should get a better sense of what each technology can do (for example, what types of defects it can find) and what each technology's advantages and disadvantages are, eliminating the one-dimensional comparisons made at competitions.
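As a rough illustration of how combinable, multi-dimensional results support this kind of comparison, the sketch below (with hypothetical defect identifiers) computes how many unique seeded defects two technologies find individually and together, assuming both were evaluated against the same seeded-defect pool.

// Illustrative combination of two technologies' results over one seeded-defect pool.
#include <iostream>
#include <set>
#include <string>

using DefectIds = std::set<std::string>;

int main() {
    // Hypothetical seeded-defect identifiers shared by both evaluations.
    DefectIds seeded   = {"SD-01", "SD-02", "SD-03", "SD-04", "SD-05"};
    DefectIds foundByA = {"SD-01", "SD-02"};             // e.g., an architecture analysis tool
    DefectIds foundByB = {"SD-02", "SD-04", "SD-05"};    // e.g., a code analysis tool

    // Union of the two result sets gives the unique defects found by the pair.
    DefectIds combined = foundByA;
    combined.insert(foundByB.begin(), foundByB.end());

    std::cout << "A alone: " << foundByA.size() << " of " << seeded.size() << "\n";
    std::cout << "B alone: " << foundByB.size() << " of " << seeded.size() << "\n";
    std::cout << "A and B together: " << combined.size() << " of " << seeded.size()
              << " unique seeded defects\n";
}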
Another disadvantage is that researchers can build technology solutions that will work only for the competition. For example, with the SCRover testbed, it is possible for researchers to build technologies that will only work on the SCRover system; however, since the SCRover system is a representative NASA software system, the solution should be able to work on another NASA system as well. In addition, the SCRover system has seeded defects in it that are unknown to the researcher. The researcher will not be able to configure their technology to find the seeded defects, thus allowing for a better technology evaluation process. 5.5 How do software engineering technology testbeds answer the obstacles of other testbeds The following list describes the obstacles of past testbeds and how the SCRover testbed has overcome those obstacles: The first obstacle was that testbeds needed good design and requirements documents. Unlike the TSAFE and Golden Gate testbeds, the SCRover testbed provides specification documents such as architecture and requirements documents to help other researchers understand the SCRover system. In addition, effort and defect data were collected during the development of the project. A FAQ list is also provided, answering questions from previous researchers who used the SCRover system. The second obstacle was that testbeds needed to be publicly available to researchers, i.e., ITAR-safe or open-source. The MDS Framework code that the SCRover testbed uses has been approved as ITAR-safe, along with the USC adaptation code. In addition, all the specifications produced by the SCRover team are publicly available on the SCRover website (http://cse.usc.edu/iscr). The third obstacle was that testbeds needed a low cost to encourage usage. The SCRover testbed provides a free simulator program, courtesy of the USC Robotics department, that allows researchers to run the SCRover system on their workstations. The only significant cost to a researcher is the time they might need to spend understanding what the SCRover system is and how to incorporate their experiment into the system. The fourth obstacle was that testbeds needed to be easy to use and modify. For the SCRover testbed, manuals and guidelines are provided to researchers to help them understand and modify the SCRover system. The fifth obstacle was that testbeds needed to be representative of a complex mission. SCRover is a mobile rover that can roam around the hallways of USC buildings looking for specific objects. This scenario resembles an existing Mars mission where the Mars rovers are looking for rocks and evidence of water on the planet. The sixth obstacle was that testbeds needed clear guidelines on how to conduct an evaluation. For the SCRover testbed, the Fraunhofer Center provided guidelines on how to collect data and conduct an experiment. The next obstacle was a need to display the results and information collected about technologies. This can be accomplished by setting up an experience base of prior positive experiences [Basili, Tesoriero, and et al. 2001]. In addition, VQI [Seaman and et al. 1999] could be used to display the results in the experience base. The last obstacle was a need to evaluate the effectiveness of the technologies and compare them to each other. In the SCRover testbed, seeded defects in the code and documents are used as the basis for comparison to determine how well a technology performed against other similar technologies.
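A minimal sketch of what a seeded-defect record in such a defect pool might look like is shown below; the fields and the sample values are assumptions for illustration, not the actual SCRover defect pool format.

// Assumed record describing one seeded defect: where it was injected,
// its Orthogonal Defect Classification (ODC) type, and a short description.
#include <iostream>
#include <string>
#include <vector>

struct SeededDefect {
    std::string id;           // e.g., "SD-07" (hypothetical identifier)
    std::string artifact;     // "requirements", "architecture", or "code"
    std::string odcType;      // e.g., "Interface", "Algorithm", "Timing"
    std::string description;  // what was changed when the defect was seeded
};

// Select the defects relevant to an evaluation of a given artifact type,
// e.g., only code defects for a code-analysis technology.
std::vector<SeededDefect> defectsFor(const std::vector<SeededDefect>& pool,
                                     const std::string& artifact) {
    std::vector<SeededDefect> out;
    for (const auto& d : pool)
        if (d.artifact == artifact) out.push_back(d);
    return out;
}

int main() {
    std::vector<SeededDefect> pool = {
        {"SD-01", "code", "Interface", "wrong parameter passed to drive command"},
        {"SD-02", "architecture", "Algorithm", "missing state update connector"},
    };
    std::cout << "Code defects available for seeding: "
              << defectsFor(pool, "code").size() << "\n";
}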
126 5.6 How to Configure a SETT Figure 3 provided an overview on how an evaluator would use a SETT but this section will go into more detail on how one would configure a testbed to evaluate a technology. To use the testbed, evaluators start by reading the experimentation guidelines that provide them a framework for conducting their evaluation and provide instructions on how to use the testbed. Next, they will define the type of defects the technology is expected to detect and what kind of data the instrumentation class should be collecting. These features will help gather the necessary data used to evaluate the performance of the technology. The next step is to define the appropriate operational scenario under which the technology will be evaluated. Then, based on the criteria defined, appropriate instrumentation and seeded defects are applied to the project artifacts associated with the selected operational scenario. Once the appropriate set of project specifications and code has been obtained, the evaluator will apply the technology to the set of project artifacts. Next, the technology is then executed. This step does not necessarily require code to be run. For example, if a new type of requirements peer reviews were being performed, then no code is necessary. After the execution of the technology, the evaluators use the data provided by the instrumentation to determine the percentage of seeded and unseeded defects of each type that were found. This enables an analysis of how well the technology performs in detecting, avoiding, or compensating for various classes of seeded and previously undiscovered defects, in comparison to alternative technologies. 127 The data and the analysis are then stored in an experience base to be accessed by project managers interested in technology to increase the dependability of their delivered systems. Chapter 9 provides examples of how evaluators used the software engineering technology testbeds. 128 Chapter 6 The SCRover Testbed Life Cycle Architecture Package 6.1 SCRover results chain, system boundary, and operational concept The section below describes the results chain [Thorp and DMR 1998] diagram for the SCRover testbed project. Figure 56: Results Chain Diagram for SCRover System (Increment 3) 129 The square-cornered boxes represent initiatives while the round – cornered boxes represent outcomes. The arrows represent contributions. The system boundary indicates the services the SCRover testbed will be responsible for developing and delivering and how it interacts with the system stakeholders. Figure 57: System Boundary and Environment Context Diagram of SCRover The SCRover testbed provides an ITAR-safe, experimental framework that allows researchers to evaluate the efficiency of their technology on a NASA- like project. The testbed contains software, supporting information such as specifications, metrics, instrumentation, seeded defects, and guidelines, a robotic 130 platform (both real and simulated), and a development environment. As shown in Figure 58, to use the testbed, researchers start by applying their technology to the SCRover specification and code. Then, based on the evaluation criteria defined by them, appropriate instrumentation and seeded defects are applied to the project artifacts. These features will help gather the necessary data used to evaluate the performance of the technology. The next step is to define the appropriate operational scenarios under which the technology will be evaluated. 
These operational scenarios are represented by goal networks that are transmitted to the system in the form of GEL files. The code is then executed. After the execution of the system, the researchers use the data provided by the instrumentation to determine the percentage of seeded and unseeded defects of each type that were found. This enables an analysis of how well the technology performs in detecting, avoiding, or compensating for various classes of seeded and previously undiscovered defects, in comparison to alternative technologies. The data and the analysis are then stored in an experience base to be accessed by project managers interested in technologies to reduce the number of defects in their delivered systems. Figure 58: SCRover Operational Concept. The figure depicts the operational flow: configure specifications and code; add instrumentation and seeded defects; compile the code and expand the goal scenarios; execute the code and create MDS hardware proxy commands; command the robot or simulator; monitor and record performance; estimate state versus goals, ending or aborting the mission as goals are satisfied or unsatisfied; and analyze performance with respect to the evaluation criteria. The technology intervention, evaluation criteria, operational scenarios, MDS Framework, robot or simulator, and experience base all feed into this flow. 6.2 Explanation of MDS technology In 1998, the NASA Jet Propulsion Laboratory (JPL) initiated a project called the Mission Data System (MDS) [Dvorak and et al. 2000] to develop a core system engineering methodology and software toolset for the next generation of deep space missions. The MDS goal has been to develop a set of closely matched tools and techniques to reduce development and debugging costs, promote reusability, and increase reliability throughout a project's lifecycle. The principal MDS products include a system engineering methodology called the State Analysis Process, a software framework, a goal-based operational methodology, and a cost estimation model based on COCOMO II. 6.2.1 State Analysis Engineering Process MDS provides a collaborative engineering methodology and tools for systems engineers to capture requirements in terms of familiar concepts: states, commands, measurements, estimators, controllers, and hardware devices. Requirements are captured in a database that can be checked for validity and completeness. The resulting requirements are organized into a state-oriented model of the system's behavior, which maps directly into the software framework (discussed below), eliminating errors in translation and reducing cycle time. 6.2.2 MDS Framework The MDS Framework consists of over 35 reusable packages for common functionality such as state-oriented control, event logging, time services, data management, visualization, units of measurement, state variables, and an interface to real or simulated hardware called the hardware proxy. The entire set is organized into a modular architecture supportive of state-oriented real-time control systems. 6.2.3 Operational Tools Operators of MDS-based systems specify activities in terms of "what" rather than "how," or, in MDS parlance, in goals rather than commands. Goal-driven operation provides a level of control that can vary from purely time-scripted to fully autonomous operations. A goal is simply a constraint on the value of a state variable over a time interval. Goals are assembled into goal networks that prescribe timing and prerequisites (or preconditions) for goals.
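To make the goal concept concrete, the sketch below models a goal as a constraint on a state variable's value over a time interval and a goal network as a set of goals with prerequisite links; this is a simplified illustration under assumed names, not the MDS Framework's actual classes or the GEL syntax.

// Simplified illustration of goals and a goal network (not MDS classes).
#include <cstddef>
#include <functional>
#include <iostream>
#include <string>
#include <vector>

// A goal constrains the value of one state variable over a time interval.
struct Goal {
    std::string stateVariable;               // e.g., "Heading"
    std::function<bool(double)> constraint;  // predicate on the estimated value
    double startTime;                        // interval start (seconds)
    double endTime;                          // interval end (seconds)
    std::vector<int> prerequisites;          // indices of goals that must complete first
};

int main() {
    std::vector<Goal> network;
    // Goal 0: heading must reach roughly 90 degrees within the first 30 seconds.
    network.push_back(Goal{"Heading",
        [](double deg) { return deg >= 89.0 && deg <= 91.0; }, 0.0, 30.0, {}});
    // Goal 1: after goal 0, distance travelled must reach 3 meters by t = 120 s.
    network.push_back(Goal{"DistanceTravelled",
        [](double m) { return m >= 3.0; }, 30.0, 120.0, {0}});

    // A real elaborator/scheduler would expand and order these; here we just list them.
    for (std::size_t i = 0; i < network.size(); ++i)
        std::cout << "Goal " << i << " constrains " << network[i].stateVariable << "\n";
}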
133 Goal networks are scripted in a Goal Elaboration Language (GEL) that provides an unambiguous expression of operational intent. 6.3 USC Mission Adaptation Code 6.3.1 Software - Player/Gazebo One of the disadvantages of other testbeds is the cost involved in using the testbed. Testbeds like RoboCup require a high cost due to expensive hardware involved. The SCRover testbed provides a low-cost solution for researchers who wish to use the testbed. The SCRover testbed utilizes software called Player/Gazebo [Gerkey and et al. 2003] to run the system in simulation mode. Thus, all that is required to use the SCRover system is an inexpensive desktop system. Player is developed at USC to communicate with the Pioneer family of robots. Player supports driving the robot’s wheel motors, controlling the camera’s pan/tilt unit, and querying a variety of on-board sensors. Access to these functions is provided through a client API, which communicates with a server process running on the rover itself. The client-server interaction can be conducted over a TCP/IP link, allowing us to execute the MDS Framework on a machine separate from the rover’s on-board PC. Our MDS Hardware Proxy makes calls to a Player client shared library, various functions of which allow us to operate the drive motors, operate the camera, read the rover’s position (maintained by the Player server process), and obtain a profile of the 134 environment generated by the laser rangefinder. Figure 59 details this interaction. 6.3.2 Hardware – Pioneer 2-AT For researchers who wish to run the SCRover system on an actual robot, instead of in simulation, the researchers can purchase a Pioneer 2-AT rover [ActivMedia 2003]. The Pioneer 2-AT is an all-terrain robotic platform that has the capabilities of laser sensing, camera vision, Ethernet-based communications, and other autonomous functionalities. The cost of the Pioneer 2-AT rover is approximately $25,000. For researchers who want a cheaper robot, the students at the University of Oregon were able to run the SCRover system on an ER1 robot [ER1 2004]. I will talk about this topic more in chapter 9.4. 6.3.3 SCRover testbed architecture, specification, and code Development within the MDS Framework to operate our robotic platform (Pioneer 2-AT) has focused on the Hardware Proxy, State Knowledge, State Determination, and State Control components of the framework’s four- component cycle as expressed on the left side of Figure 59. In the following sections, we describe our efforts to enable MDS to communicate with the robot (Hardware Proxy) and our implementation of three top-level components (State Knowledge, State Control, and State Determination). 135 6.3.4 Robot Missions We have successfully implemented three separate high-level missions for the SCRover as a proof-of-concept, and to provide a baseline for our ongoing development of more complex behaviors. In Increment 1, we duplicated the functionality of JPL’s MRE4 (Mars Rover Example 4). This demonstration required the rover to turn 90 degrees and drive three meters. The simple scenario enabled us to establish the basic interoperability preconditions and protocols between the MDS Framework, the robot, and its simulator. Figure 59: SCRover architecture In Increment 2, we implemented reactive “wall-following” behavior. In this mode, the rover uses the laser rangefinder to determine the distance to the wall, drives forward while maintaining a fixed distance from that wall, and turns both inside and outside corners when it encounters them. 
An additional state in this behavior is that of the laser rangefinder’s profile of obstacles (walls) in its surroundings. This scenario, involving both sensing and controlled locomotion (including reducing speed when approaching obstacles), provided an initial representative capability for technology evaluation. 136 For Increment 3, the rover performed target sensing with its camera and target rendezvous. Given a list of targets identifiable by colors, the rover will find each target in a given area and then proceed to map a course to the target and then drive to the target. If an obstacle blocks the target, then the rover will try to get as close to the target as possible. MDS architecture has 4 main components: State Knowledge, State Determination, State Control, and the Hardware Proxy. Each of these components will be explained in the sections below. 6.3.5 State knowledge State Knowledge is used to maintain the current state of the rover. For the two behaviors implemented, we adapted two State Knowledge components. One was called the PositionAndHeading state variable and holds the estimated position of the rover. The other is called the Obstacle State Variable and holds the estimated position of the nearest wall(s) in its frontal 180-degree view. 6.3.6 State control The purpose of the State Controller is to collect the robot' s current state from the State Knowledge components and to generate the proper commands for the robot to achieve the goal being executed. The commands generated then get submitted to the Hardware Proxy. For the two behaviors described, we built a controller that subscribed to the Obstacle State Variable and the PositionAndHeading State Variable. The controller would use this state information to generate the correct movement commands. 137 6.3.7 Hardware proxy Our implementation of the Hardware Proxy in the MDS Framework is a stub for the Player rover API. Player [Gerkey and et al. 2003] is developed at USC to communicate with the Pioneer family of robots. Player supports driving the robot’s wheel motors, controlling the camera’s pan/tilt unit, and querying a variety of on-board sensors. Access to these functions is provided through a client API, which communicates with a server process running on the rover itself. The client-server interaction can be conducted over a TCP/IP link, allowing us to execute the MDS Framework on a machine separate from the rover’s on-board PC. Our MDS Hardware Proxy makes calls to a Player client shared library, various functions of which allow us to operate the drive motors, operate the camera, read the rover’s position (maintained by the Player server process), and obtain a profile of the environment generated by the laser rangefinder. Figure 59 details this interaction. 6.3.8 State determination Another component that we adapted is the State Determination Component. This component takes the sensor readings from the Hardware Proxy and uses this information to estimate current state of the robot. Once a state has been estimated, this information gets stored in the State Knowledge component. For the two behaviors described, we adapted two State Determination components. One component estimates the position and heading of the robot using the wheel sensors as its data while the other component estimates distance to the nearest wall with its laser rangefinder’s values. 
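A stripped-down sketch of how these four kinds of components could interact for the wall-following behavior appears below; the class names echo the components just described, but the code is only an illustration of the estimate-then-control cycle under assumed interfaces, not the SCRover adaptation code itself.

// Illustrative estimate-then-control cycle for wall following (assumed interfaces).
#include <iostream>

// State knowledge: estimated distance to the nearest wall.
struct ObstacleStateVariable {
    double wallDistanceMeters = 0.0;
};

// Hardware proxy: stands in for the real robot or the simulator interface.
class HardwareProxy {
public:
    double readLaserMinRange() const { return 1.3; }  // stubbed sensor reading
    void drive(double speed, double turnRate) {
        std::cout << "drive speed=" << speed << " turn=" << turnRate << "\n";
    }
};

// State determination: turns raw sensor readings into state estimates.
class ObstacleEstimator {
public:
    void estimate(const HardwareProxy& hw, ObstacleStateVariable& sv) const {
        sv.wallDistanceMeters = hw.readLaserMinRange();
    }
};

// State control: issues commands that keep the rover a fixed distance from the wall.
class WallFollowController {
public:
    void control(const ObstacleStateVariable& sv, HardwareProxy& hw) const {
        const double target = 1.0;                               // desired wall distance (m)
        double turn = (sv.wallDistanceMeters - target) * 0.5;    // steer toward the target
        double speed = sv.wallDistanceMeters < 0.5 ? 0.1 : 0.4;  // slow near obstacles
        hw.drive(speed, turn);
    }
};

int main() {
    HardwareProxy hw;
    ObstacleStateVariable obstacle;
    ObstacleEstimator estimator;
    WallFollowController controller;

    // One pass of the cycle: estimate state, then control based on the estimate.
    estimator.estimate(hw, obstacle);
    controller.control(obstacle, hw);
}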
138 6.4 SCRover Testbed Architecture Tradeoffs and Decisions At the start of the SCRover project, the SCRover developers realized that having good specifications and data would be necessary for the long-term success of the project. For any researcher not associated with the development of the SCRover testbed, they would need a good set of specifications to learn about the SCRover system. Thus, having good specifications was made into one of the SCRover objectives. However, creating documents specifying the project requirements, design, and life cycle would take lots of time. But, we felt creating the documents were necessary and would be needed in the long term, and thus decided to spend approximately 6 months writing the required specifications. A benefit to writing the specifications was that it allowed engineers from the JPL- MDS program to analyze the SCRover system during Architecture Review Board (ARB) meetings. During the 6 months, the SCRover team conducted two ARB meetings where the MDS engineers evaluated the requirements, design, and life cycle of the system to ensure that the SCRover system was feasible. After getting their evaluation, the SCRover team then proceeded to build the system. In the end, writing the documents and having ARB became a good decision for the project, despite the fact that it is a time-consuming process. The SCRover team used the MBASE Guidelines [USC-CSSE 2003] as a template to write the SCRover specifications. For the architecture modeling, we decided to represent the architecture models with UML since it is a standard many programmers and researchers 139 understand. If we had gone with a modeling language that only a few people understood, we felt this would not help the testbed users. Likewise, for the defect classification, we used Orthogonal Defect Classification to classify our defects since this is a standard many researchers know about. Furthermore, during the development process, the USC SCRover team made an effort to collect as much effort and defect data as possible even though it can be a time-consuming process. The SCRover team realizes that this data will be valuable to future researchers. It was a trade-off the SCRover team deemed a good one. Another decision the SCRover team faced in the early stages of the project was to determine how to make the SCRover a low-cost testbed for researchers to use. As [Stone 2003] states, a low cost platform would be needed to encourage researchers to evaluate their technology on a testbed. After discussions with the USC Robotics department, a way was discovered on how to meet the low-cost objectives. SCRover would be developed with Player and Stage [Gerkey and et al, 2003], both of which are open-source tools available to the public. Thus, for researchers with a limited budget, they will be able to run the SCRover system on their computer at no cost to them. Since Player and Stage provide a good simulation of the rover in action, their results should be similar to any results the researchers would obtain from running the SCRover system on an actual rover. 140 While the software would be free, the hardware would not be. Thus, another decision would have to be made on how much money to spend on robotic hardware. USC had two choices: the Pioneer [ActivMedia 2003] rover or the ER1 [ER1 2004] rover. The Pioneer rover would be more expensive than the ER1 rover, but from talking to the JPL engineers, they felt the Pioneer rover was more sturdy and reliable than the ER1 rover and would be worth the extra money. 
Capability-wise, the Pioneer rover could also do more than the ER1 rover. Future experiments conducted by Steve Fickas and his graduate students demonstrated that the SCRover system could have been run on an ER1 rover. Another decision that had to be made was what type of system should be incorporated into the testbed. If USC developed its own software system for the testbed, NASA may deem the system to be too simple and not representative of a NASA software system. If USC wished to use a NASA or other government- built system as the testbed, such as the TSAFE system, then we face ITAR issues, which will then restrict the researchers to be U.S. citizens only. In the end, we decided to use the MDS system. Early in the project, the MDS group was sure that the MDS Framework system would be declared ITAR-safe by the government. The only part of the system that would not be ITAR-safe would be the mission adaptation code, thus born the USC SCRover team. USC would take the MDS Framework and use it to develop our own mission adaptation code, which would be ITAR-safe. The capabilities of the SCRover system would be similar to capabilities that the Mars rovers would perform. In the end, the SCRover system would be representative of a NASA mission satisfying all 141 stakeholders involved. NASA gets a testbed system that is representative of its own systems and researchers get an ITAR-safe system that they can use to evaluate technologies on. 6.5 SCRover Testbed Architecture In chapter 4, the architecture of a software engineering technology testbed was given. However, for researchers and practitioners to use the testbed, an instance of the testbed must be created which I will call the SCRover testbed. The SCRover testbed is based on the JPL-MDS technology and using MBASE Guidelines [USC-CSSE 2003] to write the specifications. A COTS product called Visual Query Interface (VQI) [Seaman and et al. 1999] was also used to view the data in the experience base. Figure 60 shows the architecture for the SCRover testbed. 142 Figure 60: SCRover Testbed Reference Architecture 143 6.5.1 Code In order to get representative feedback from the usage of the testbed and to meet the objectives of the technology developers, research sponsors, and potential adopters, the SCRover system has to have code that is representative of a NASA software system. Thus, the SCRover system uses the MDS Framework code that was developed by NASA-JPL. The MDS Framework code is approximately 300,000 lines of C++ code. The MDS Framework code contains many libraries and performs much functionality that a researcher will find interesting. For example, the MDS Framework has code for scheduling goals, planning mission resources, event logging, time services, and data management. All of these libraries and functionalities provide many opportunities for researchers to test their technology on. In addition to the MDS Framework code, SCRover currently has about 5,000 lines of C++ adaptation code, written by myself, that is written to support Increments 1-2. The code is seeded with defects and has comments to explain what each function does. During development, we used Concurrent Versions System (CVS) to store the SCRover code. The MDS Framework was chosen because it helps in the mission representativeness as NASA developed it. It provides a framework for creating dependable system, the framework is reusable to create many types of systems, and MDS encourages the use of good software engineering practices. 
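Purely as an illustration of the kind of small change a seeded defect can be (the actual seeded defects are intentionally not disclosed to researchers), the hypothetical helper below shows a safety check in adaptation-style code where flipping a single comparison would silently break obstacle handling.

// Hypothetical helper, not taken from the SCRover adaptation code.
#include <iostream>

// Decide whether the rover must slow down because an obstacle is within
// the safety threshold. A seeded defect of the kind described here could
// flip the comparison (e.g., '>' instead of '<'), so the rover would slow
// down only when the obstacle is far away.
bool mustSlowDown(double obstacleDistanceMeters, double safetyThresholdMeters) {
    return obstacleDistanceMeters < safetyThresholdMeters;  // correct version
}

int main() {
    std::cout << std::boolalpha
              << mustSlowDown(0.4, 0.5) << "\n"   // true: obstacle is too close
              << mustSlowDown(2.0, 0.5) << "\n";  // false: safe distance
}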
144 6.5.2 Specifications and Data As stated earlier, to meet the objectives of the testbed having multiple goals and applicable to many fields, many diverse test artifacts will be needed. In addition, for them to be used by multiple researchers, the test artifacts should be tailorable. Thus, the SCRover team used a well-instrumented version of the Win-Win Spiral model called Model-Based (System) Architecting and Software Engineering (MBASE) [Boehm and Port 2001] [USC-CSSE 2003] for system and software development. MBASE involves the concurrent development of the system’s operational concept, prototypes, requirements, architecture, and life- cycle plans, plus a feasibility rationale ensuring that the artifact definitions are compatible, achievable, and satisfactory to the system’s success-critical stakeholders. MBASE shares many aspects with the Rational Unified Process (RUP) [Krichten 2001], including the use of the Unified Modeling Language (UML) [Booch and et al. 1999] and the spiral model anchor point milestones [Boehm 1996]. The SCRover team was experienced in its use. While executing the development strategy, the team was able to collect data about the development process using various instrumentation techniques. In addition to the aforementioned defects found in the SCRover artifacts, the SCRover team kept track of its effort spent on the project. The effort data covered all the tasks performed by the SCRover team which includes writing each MBASE document, the system engineering aspects of the project, tool support, the defect reviews, coding, and testing. Currently, the data is kept in 145 Excel format. In addition, the developers used a tool called Hackystat developed by NSF-HDCP researcher Philip Johnson to collect the effort spent in coding the system [Johnson 2001]. Furthermore, each of the SCRover team members also submitted weekly progress reports letting the SCRover project manager know the status of their work and what each had planned for the following weeks. These reports helped the manager keep track of the team’s overall activities, allowed the manager to make changes to the schedule as necessary, and serve as a record of the project’s progress versus plans. 6.5.2.1 Architecture Specifications To provide guidance to researchers who develop technologies to detect defects at the architecture level, a software engineering technology testbed should therefore contain several views of the architecture that will give them a good, if not complete, understanding of the system. In addition, the architecture diagrams should be described using a language that many researchers would know about. One modeling language that is popular and provides various types of models is the Unified Modeling Language (UML) [Booch and et al. 1999]. UML provides 13 types of diagrams that can be categorized into 3 types: structure, behavior, and interaction diagrams. Using these three types of UML diagrams allows a system to be modeled as completely as possible. Structure diagrams indicate what is the system composed of. There are 6 types of structure diagrams: class diagram, component diagram, composite structure diagram, deployment diagram, object diagram, and package diagram. 146 Behavior diagrams indicate what the system does. There are 3 types of behavior diagrams: activity diagram, state machine diagram, and use case diagram. Interaction diagrams indicate the control flow of the system. 
There are 4 types of interaction diagrams: collaboration diagram, interaction overview diagram, sequence diagram, and UML timing diagram. 6.5.3 Predefined Packages Based on past experiments with USC, University of Oregon, and CMU, we have constructed predefined packages that a researcher may find useful, thus saving them the time of having to search through the SCRover testbed for the artifacts they need. For example, for a researcher wishing to examine SCRover’s architecture, we have a pre-defined architecture package that contains all of the architecture documents the researcher will need for their experiment. 6.5.4 Guidelines - Manuals Additional experimentation guidelines are being developed by the Fraunhofer Center at the University of Maryland to provide guidance on designing sound experimental evaluations, on which experimental technique is best for a given situation, and on most appropriate statistical data analysis techniques. Manuals and instructions are provided to show how to install the MDS Framework code/SCRover adaptation code as well as any third-party software that is needed. In addition, guides on how users can perform their own robotic 147 adaptation are included. With these manuals, researchers should be able to modify the code as needed for their experiment. The SCRover testbed contains a list of questions past researchers have had about using the SCRover testbed and about the MDS Framework architecture. By publishing the questions I received from researchers, I hope that it will help future researchers use the SCRover testbed more easily and thus rely less on the SCRover developers for help in setting up their experiments, understanding the SCRover testbed capabilities/artifacts, and understanding the MDS architecture. 6.5.5 Mission scenarios A mission/scenario is specified in MDS by using the Goal Elaboration Language (GEL). At the beginning of a mission, a scientist passes the GEL file to the rover and the rover executes the mission as stated in the file. Researchers who wish to create their own scenarios with the SCRover system may create their own GEL files. To create a goal for the rover to execute, researchers fill in the goal statement located in the GEL file with the appropriate values and the interval during which the goal should be achieved. Guidelines on how to create a GEL file are included in the SCRover testbed. Researchers can also execute the provided scenario drivers by simply executing the right command with the right GEL file. Currently, the SCRover system offers two GEL files for researchers to execute: the Increment 1 (MRE4) and Increment 2 (wall-following) scenarios. We are currently working on Increment 3 that will have the rover find specific targets 148 in a room with its camera. After locating the target, the rover will traverse to its location. The mission scenarios for SCRover are representative of missions that NASA performs. MRE4 was the first mission that the MDS group conducted when developing the MDS code. For Increment 3, the rover uses its camera for vision and it traverses to a location, just as the Mars rovers, Spirit and Opportunity [NASA 2003], currently do. Thus, having representative mission scenarios will allow testbed users to get representative feedback. 
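Since the GEL syntax itself is not reproduced in this chapter, the sketch below illustrates only the underlying idea that a GEL entry expresses: a constraint on a state variable that must be achieved over a time interval. All of the names and values are hypothetical and do not come from the actual GEL files shipped with the testbed.

// Hypothetical illustration of what a mission goal expresses: a constraint on
// a state variable over a time interval. This mirrors the idea behind a GEL
// entry (fill in the goal statement and the interval during which it must be
// achieved), but it is not the GEL syntax or the MDS goal network itself.
#include <iostream>
#include <string>
#include <vector>

struct TimeInterval {
    double startSeconds;
    double endSeconds;
};

struct Goal {
    std::string stateVariable;  // e.g. "RoverPosition"
    std::string constraint;     // e.g. "reach waypoint (4.0, 2.0)"
    TimeInterval interval;      // when the constraint must be achieved
};

int main() {
    // A two-goal mission in the spirit of the wall-following increment.
    std::vector<Goal> mission = {
        {"WallDistance",  "maintain 0.5 m +/- 0.05 m", {0.0, 120.0}},
        {"RoverPosition", "reach waypoint (4.0, 2.0)",  {120.0, 300.0}},
    };
    for (const Goal& g : mission) {
        std::cout << g.stateVariable << ": " << g.constraint
                  << " during [" << g.interval.startSeconds << ", "
                  << g.interval.endSeconds << "] s\n";
    }
}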
6.5.6 Seeded Defects

Seeded defects are important in a technology evaluation because they help determine how effective a technology is at increasing system dependability, and they also prevent researchers from configuring their technology to win a competition, or in this case, to make their technology look good to NASA. With seeded defects, NASA and other testbed users get a fairer and more impartial evaluation of how well the technology worked on the SCRover system. One of the objectives of a technology evaluation testbed is to be able to compare technologies, and seeded defects provide a common basis for doing so. Suppose an experiment shows that in a given situation, the technology being evaluated finds 3 defects. How can we tell whether this is 100% of 3 defects or 3% of 100 defects? The best technique found to date is the seeded defect technique, adapted from the application of earlier statistical techniques to software testing [Mills 1972]. If we insert 10 representative defects into the software, and the technology being evaluated finds 6 of them, the maximum likelihood estimate is that the technology has found 60% of both the seeded and the unseeded defects. In general, if we insert I seeded defects, and the technology finds S seeded defects and U unseeded defects, the maximum likelihood estimate of the total number T of unseeded defects is T = I*(U/S). Of course, this estimate is only as good as the assumption that the seeded defects are representative of the remaining defects [Voas and McGraw 1998]. We have tried to avoid the known shortfall of people's inability to invent sets of representative defects by using as our pool of seeded defects the defects actually found in the specifications and code through peer reviews and a formal architecture review by JPL personnel. Researchers conducting their experiments simply modify a configuration file to insert selected defects into the code, without the need to recompile it. Once the configuration file is changed, the researcher can run the code with the defects and try to detect them with his or her technology. Currently, when a researcher downloads the artifacts, the seeded defects are known to the researcher ahead of time, and the researcher has the option to seed as many of them as desired into the experiment. However, this can lead to researchers setting up the experiment specifically to find those defects, in which case the results may be deceiving. On the other hand, using a common set of seeded defects allows testbed users to compare how well different technologies fared in finding specific types of defects.

6.5.6.1 Defect classification used

Orthogonal Defect Classification (ODC) [Chillarege and et al. 1992] was chosen as the classification used to organize the defects found in the SCRover artifacts. ODC was chosen because the classification allows researchers to determine in what part of the software development process a defect was introduced. Once the cause of a defect has been identified, researchers can then work on developing technologies to help improve that part of the process.

6.5.6.2 Number of defects seeded in SCRover

The SCRover adaptation C++ code has 5 seeded defects. The 5 defects cause undesired behavior in the rover, caused either by a bad algorithm, a component that was never initialized, or misused SI units. Currently, we allow the researcher to decide which defects they would like to seed into the code. This is accomplished by modifying a control file in the system that indicates which defects are active and which are inactive; a minimal sketch of this mechanism follows.
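The file format shown below (one defect name per line, marked on or off) and all class and defect names are hypothetical stand-ins for the actual SCRover control file; the sketch only illustrates how defects can stay in the code permanently yet be activated without recompilation.

// Hypothetical sketch of the seeded-defect control file idea: defects are
// compiled into the code but only take effect when the control file turns
// them on. File format and names are illustrative only.
#include <fstream>
#include <iostream>
#include <map>
#include <sstream>
#include <string>

class SeededDefects {
public:
    // Reads lines of the form "defect_name=on" or "defect_name=off".
    static SeededDefects loadFrom(const std::string& path) {
        SeededDefects defects;
        std::ifstream in(path);
        std::string line;
        while (std::getline(in, line)) {
            std::istringstream fields(line);
            std::string name, state;
            if (std::getline(fields, name, '=') && std::getline(fields, state)) {
                defects.active_[name] = (state == "on");
            }
        }
        return defects;
    }

    bool isActive(const std::string& name) const {
        auto it = active_.find(name);
        return it != active_.end() && it->second;
    }

private:
    std::map<std::string, bool> active_;
};

double wallDistanceInMeters(double rawSensorMillimeters, const SeededDefects& defects) {
    // Seeded defect: misuse SI units by skipping the millimeter-to-meter
    // conversion, one of the defect categories in the adaptation code.
    if (defects.isActive("si_unit_conversion")) return rawSensorMillimeters;
    return rawSensorMillimeters / 1000.0;
}

int main() {
    SeededDefects defects = SeededDefects::loadFrom("seeded_defects.cfg");
    std::cout << wallDistanceInMeters(500.0, defects) << " m\n";
}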
These defects are from a pool of defects that the programmer found while debugging and testing the code. In addition, the SSAD has 38 defects ranging from typographical errors to errors that could potentially cause harmful behaviors. Some of the more harmful defects violated the MDS architecture rules. The OCD and SSRD documents also underwent peer reviews and defects were recorded for those artifacts as well. Figure 61 shows the defect data captured from the peer reviews of the OCD, SSRD, and SSAD. 151 Figure 61: Defect Data for SCRover artifacts 6.5.7 Instrumentation The SCRover testbed provides guidelines to the researchers on how to instrument the code for collecting the statistics they wish to track. For the first set of analysis performed with the SCRover testbed, the development team implemented an instrumentation class on top of one of the features offered by the MDS Framework that allows programmers and/or researchers to report events that occur in the code. The instrumentation class generates an output file containing a list of the events that occurred in the system and when each one of them happened. A researcher can then analyze this file. 152 Instrumentation is an important feature to have in the testbed since it allows a researcher to collect data, which can be used for their analysis when they have to demonstrate the effectiveness of their technology. The researcher would determine the data collected, as each technology is unique in what they are solving. 6.5.8 Software Simulator A free or low-cost similar is needed by the testbed since the testbed has to provide a low-cost barrier for researchers. The simulator should provide realistic sensor data and rover behavior, able to do multiple missions in a configurable environment, and help the researcher with their experiment. For the simulator, we chose Gazebo[Gerkey and et al, 2003], which is free, that was developed by the USC Robotic group. Gazebo is a multi-robot simulator that is capable of generating realistic sensor feedback, such as camera and laser data. But unlike Stage, Gazebo provided a 3-D simulation of the world. Figure 62 is a picture of the Gazebo simulator in action: 153 Figure 62: Gazebo Simulator 6.5.9 Visual Query Interface (VQI) Visual Query Interface (VQI) [Seaman and et al. 1999] is a search and visualization tool that helps testbed users locate artifacts they are interested in. In this case, researchers can use VQI to search the SCRover testbed for documents and artifacts they feel would be needed by their experiment. For example, an architecture researcher can indicate they want documents/artifacts pertaining to architecture in the SCRover testbed. By searching the appropriate attributes, the researchers can find the relevant architecture documents, such as the Software System Architecture Document (SSAD) and the UML models. Figure 63 shows an example of how VQI looks when used for searching. 154 Figure 63: VQI 6.5.10 Past Experiments and their results One of the objectives of a technology evaluation testbed is the need for an experience base to store the technology results as explained previously. One way to accomplish this objective is with the above-mentioned VQI tool. The VQI tool can be used to display attributes of a technology evaluation, such as the number of seeded defects found and the time it took to adopt the technology to the SCRover system. In addition, project managers can use VQI to search for results of how well technologies performed on the SCRover testbed. 
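Returning to the instrumentation class of Section 6.5.7, its role can be summarized with the sketch below: components report named events, and a timestamped list is written to an output file that the researcher can analyze afterward. The class and file names are hypothetical; this is not the actual instrumentation class layered on the MDS event-reporting feature.

// Hypothetical sketch of the testbed instrumentation idea: components report
// events, and a timestamped log is written out for later analysis.
#include <chrono>
#include <fstream>
#include <string>

class EventLog {
public:
    explicit EventLog(const std::string& path)
        : out_(path), start_(std::chrono::steady_clock::now()) {}

    // Records that an event occurred and how long after startup it happened.
    void report(const std::string& component, const std::string& event) {
        using namespace std::chrono;
        auto elapsed = duration_cast<milliseconds>(steady_clock::now() - start_);
        out_ << elapsed.count() << " ms\t" << component << '\t' << event << '\n';
    }

private:
    std::ofstream out_;
    std::chrono::steady_clock::time_point start_;
};

int main() {
    EventLog log("scrover_events.log");
    log.report("WallFollowController", "goal accepted");
    log.report("RangeEstimator", "laser scan processed");
    log.report("WallFollowController", "goal achieved");
}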
155 6.5.11 System and Component Properties From the CBSP architecture, the following is a list of the system and component properties and how each property is addressed in the architecture. Table 26: System and Component Properties System/Component Property How property is addressed Specifications Component should reflect a wide range of technologies Using the MDS Framework and MBASE lifecycle process encourages good software engineering techniques such as building models for architecture, requirements, and testing, which are specifications needed to evaluate a wide range of technologies. System should be low cost The testbed was built with relatively low cost software, a free simulator, and a low-to-mid cost rover Specifications component should be tailorable The specifications were developed with MBASE GL, which are tailorable to the researchers’ needs. In addition, the specifications use standard modeling language (UML) that can be transformed to other specification languages. Technology Evaluation Results Component should be compatible with other results Using seeded defects as a baseline allows for different technologies to be compared in its effectiveness. The system shall provide representative feedback The testbed is a rover system that is representative to organizations building embedded systems. The system should be made available to the public The system was built with ITAR-safe code. Seeded defects should help in giving representative feedback The seeded defects are actual defects the developers came across in their system development 6.6 Resulting SCRover testbed features SCRover is a good instance of a software technology evaluation testbed for several reasons. The first reason is that the SCRover system is created using 156 the MDS framework, MDS process, and the MBASE lifecycle process. The MDS framework and the MBASE process enforce a set of good software engineering practices such as the use of good requirement models, architecture models, and state modeling. Good specifications and a good software engineering process help ensure the quality of the software system is high. In addition, good software engineering processes and specifications allow other researchers and developers to learn about the software system without relying on the developers who built the system. Thus, researchers who wish to evaluate their technologies on the SCRover testbed will need minimal help to use the testbed and understand the system. Due to the following of processes, the SCRover testbed provides matching documentation and structured code which will be useful for researchers who wish to study architecture and code or requirement and code. The SCRover testbed provides a quick way to gather performance data and a set of seeded defects that will help the researchers with their performance analysis. SCRover is representative of an embedded system, thus allowing for good representative feedback to organizations developing embedded systems such as NASA. SCRover has demonstrated to be capable of being applied to a wide range of technologies thus allowing the testbed to be cost-effective and useful for many kinds of researchers. Finally, the SCRover testbed is a good testbed for researchers to use because it requires relatively low software cost and is available to most researchers for use. SCRover has no ITAR restrictions. 
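The seeded defects mentioned above support this kind of performance analysis through the maximum likelihood estimate introduced in Section 6.5.6. The small helper below simply puts that formula into code to make the arithmetic explicit; it is an illustration, not a tool shipped with the testbed, and the example numbers beyond the 10-seeded/6-found case from Section 6.5.6 are hypothetical.

// Illustration of the seeded-defect estimate from Section 6.5.6: if I defects
// are seeded and a technology finds S of them plus U unseeded defects, the
// maximum likelihood estimate of the total number of unseeded defects is
// T = I * (U / S).
#include <iostream>

double estimatedUnseededTotal(int seeded, int seededFound, int unseededFound) {
    if (seededFound == 0) return 0.0;  // no basis for an estimate
    return seeded * (static_cast<double>(unseededFound) / seededFound);
}

int main() {
    // 10 seeded defects, 6 found (60% detection, as in Section 6.5.6),
    // plus a hypothetical 2 unseeded finds.
    int seeded = 10, seededFound = 6, unseededFound = 2;
    double total = estimatedUnseededTotal(seeded, seededFound, unseededFound);
    std::cout << "estimated total unseeded defects: " << total << '\n';
    std::cout << "estimated unseeded defects remaining: "
              << total - unseededFound << '\n';
}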
157 6.7 Summary of Testbed Operational Concepts mapped to SCRover Testbed Features Table 27 provides a summary of each of the testbed operational concepts and which testbed feature(s) are needed to support that concept. Table 27: Testbed Operational Concept mapped to Testbed Features Testbed Operational Concept Testbed Feature Representative Feedback SCRover Code, Mission scenarios, (Seeded defects) Multiple Goals Specifications and Code, Predefined packages, Guidelines-Manuals, Instrumentation, FAQ Multiple Contributors Specifications and Code Searchable Results VQI, eBase Combinable Results/ Integrated Assessment Seeded defects, Past Experiments and their Results Low Costs Free robot simulator 158 Chapter 7 Fault-Tree Analysis To verify/demonstrate the importance of each testbed component, a fault- tree analysis [Pfleeger and et al. 2002] was performed. The fault tree shows what would happen if a component were not provided. In a fault-tree, circles represent basic events that could happen, rectangles represent failures in the system if an event were to happen, and the OR gate represents that any of the events can occur to trigger a failure or event. The fault tree was developed with the help of researchers, such as Steve Fickas’s graduate students and Carolyn Talcott, who provided feedback on their SCRover experiences and from published references [Basili, Tesoriero, and et al. 2001] ][Basili, Zelkowitz, and et al. 2007] [Boehm and Port 2002][Metz and Lencevicius 2003] [Mills 1972] [Stone 2003] [Tichelaar and et al. 1997]. 159 Figure 64: Fault-Tree Analysis From the fault-tree analysis, if the testbed had no specifications (as indicated by the circle on the left side of the figure), then that would cause researchers to have to build their own models of the software system based on the code. This could then lead to them building the incorrect models, which would mean they had to spend more time performing their evaluations, and an increased possibility that they would yield incorrect and non-representative results because of the incorrect models they built. The incorrect (and bad) results could then lead to the technology not being adapted by practitioners. 160 If the testbed had no code, but still had the specifications available, the researchers could still apply their technology to the specifications, but may not be able to verify their results without a running software system. Without a way to confirm their results, practitioners may be unwilling to adapt the technology. Likewise, if the testbed had no simulator or platform to run the code, researchers may not be able to verify the results they obtained from applying their technology to the specifications. If the testbed had no mission scenario, then two possibilities may occur. The first one is that researchers would have to generate their own missions/scenarios for the testbed system, which may not be representative of the organization’s missions. If the missions are not representative, this will provide a less convincing case for a practitioner to adapt the technology. Another possibility that could occur if the testbed had no mission generator would be that the researcher could adapt his/her technology to a specific mission, thus making it possible the technology will only work well under a specific set of instances which may not be what the practitioner is looking for. Typically, most practitioners would prefer technologies that are designed for general purposes and not a specific purpose. 
Furthermore, developing their own missions could be a time-consuming effort for researchers, so a testbed that provides missions saves them time. In addition, providing a set of common missions for all researchers allows for easier comparisons between technologies, since practitioners can evaluate the technologies under a common set of variables.

If the testbed had no instrumentation, then either no data would be collected during the evaluation or the researcher would have to collect data on their own. If no data is collected, the researcher may have a difficult time gauging the effectiveness of the technology, which could then lead to the technology not being adopted. If the researchers have to collect data on their own, they would spend more time on the technology evaluation. Thus, an instrumentation class provided by the testbed saves the researchers time and effort.

If the testbed provided no manuals, guidelines, FAQs, or instructions on how to use the testbed system, then the researchers would have to spend more time conducting their evaluation on the testbed. Furthermore, if the researchers deem the testbed too hard to use without instructions, they may decide to evaluate their technology on another system.

If the testbed had no seeded defects, then two possibilities may occur. The first is that researchers would have to generate their own defects for the testbed system, which may not be representative of the organization's defects. If the defects are not representative, this provides a less convincing case for a practitioner to adopt the technology. The other possibility is that the researcher could tune his or her technology to the specific defects he or she created, making it possible that the technology will only work well under a specific set of instances, which may not be what the practitioner is looking for. Typically, most practitioners prefer technologies that are designed for general purposes rather than a specific purpose. Furthermore, developing their own defects could be a time-consuming effort for researchers, so a testbed that provides seeded defects saves them time. In addition, providing a set of common seeded defects for all researchers allows for easier comparisons between technologies, since practitioners can evaluate the technologies under a common set of variables.

Chapter 8 Extending Other Testbeds to Software Engineering Technology Testbeds

In addition to the fault-tree analysis, and as part of the V&V process for validating the process of building a software engineering technology testbed, this chapter describes how other testbeds such as RoboCup can be extended to fit the definition of a software engineering technology testbed. First, however, I compare the testbeds to identify which components each one lacks relative to that definition.
8.1 Comparing Testbed Architectures

Figure 65 compares the architectures of five existing testbeds (T1: RoboCup, T2: TAC, T3: DARPA, T4: TSAFE, T5: ISPW) by which software engineering technology testbed components each one provides. The notes in the figure mark components that may not be representative, that are limited, or that use rules to generate missions.

[Figure 65: Testbed Comparison by Components]

As one can see from Figure 65, most testbeds do not provide a seeded defect engine, an instrumentation class, specifications, or guidelines and manuals to help the researcher use the testbed. Most testbeds do, however, have some sort of simulator and code, though without any specifications. Most testbeds also generate technology evaluation results, but the results may not be in a format that is useful to a practitioner. For example, RoboCup may say in its report how many soccer goals each team scored, but it won't indicate how long it took to use or adapt the technology. In a software engineering technology testbed's technology evaluation result, the report would provide the practitioner much more useful data on how to adapt the technology to their project.

8.2 Extending Other Testbeds to be Software Engineering Technology Testbeds

In this section, I will indicate the conditions under which an existing testbed may be converted to a software engineering technology testbed and give an example of that transformation.

8.2.1 Conditions under which a system can be transformed to a software engineering technology testbed

Listed below are the required and recommended conditions that an existing testbed should meet before being extended into a software engineering technology testbed.

Required Conditions:
· Has a software system
· Someone who can create the specifications for the system, if they have not been created already
· Someone who can plant defects into the system
· Someone who can create various missions/states of the system
· Someone who can develop instrumentation for the system

Recommended Conditions:
· A defect database containing defects encountered during the construction of the system; otherwise, the seeded defects cannot be representative
· If hardware is required, a simulator can be substituted

8.3 Extending RoboCup

[Figure 66: Current RoboCup and TAC architecture (rules and simulator)]

When extending existing testbeds, we ideally want to keep the spirit of the competition the same and not change its goals. For example, in extending the RoboCup testbed, we would like it to maintain its emphasis on soccer. Thus, I have created scenarios under which the testbed could work that would meet the needs of a software engineering technology testbed while maintaining the original spirit of the testbed.

8.3.1 RoboCup Scenarios

Scenario 1: Each team has 2 robots on the field and 3 broken robots on the sidelines, with each robot having different problems. The objective is to fix the robots as fast as possible so they can play in the game. The robots can come from past participants, with seeded defects in them.

Scenario 2: Give each team a robot to fix, and the robot will have to demonstrate specific soccer skills like passing the ball to a teammate and shooting the ball.

8.3.2 Seeded Defects

The defects can be placed in several places depending on what type of technologies are being evaluated:
· Defects in the architecture that have been passed all the way to code
· Defects in the requirements that have been passed all the way to code
· Defects in the code only (i.e.
developers misread the specifications when writing the code.) Examples of defects the robot will have: · Doesn’t start due to various reasons ranging from variables not initialized to software components not interfacing correctly. · Robot does run but is kicking the ball to the wrong team · Robot does run but is shooting erratically 167 · Robot does run but is going in the wrong direction Examples of defects found in the specifications (and possibly in the code as well): · Software components connected incorrectly · Incorrect pre- and post-conditions · Variables not initialized · Architecture doesn’t match requirements · Incorrect algorithm for passing/kicking the ball 8.3.3 How to assess The following is an example list of criteria on how each technology can be evaluated: · Time to fix defects · What defects researchers found · Time to apply technology Divide competition up by defect types, i.e. architecture, code, requirements, etc. Some technologies will not require any code, just the specifications. For those technologies, RoboCup should just provide the specifications and have researchers look for defects there only. 168 8.3.4 RoboCup Code and Platforms RoboCup will need to additionally provide working robots (or use their existing simulators) and their source code. The robots and source code can be obtained from teams who participated in RoboCup in past years. 8.3.5 RoboCup Specifications For RoboCup, we will need to ask teams to submit architecture specifications on how each robot is developed. Either that or we can try to use reverse engineering technologies to obtain an architecture model of the code. As for requirements specifications, those can be obtained from the RoboCup sponsors and the rules of the competition. Each robot has to perform a set of given tasks as specified in the rules. 8.3.6 RoboCup Defects As for finding a library of seeded defects, ideally it would be nice if each developer kept a defect-tracking system when they did development. If no defect- tracking system was used, then interviews will have to be conducted with researchers about what defects they encountered in their development and efforts will have to be take to duplicate them. 8.3.7 RoboCup Instrumentation Instrumentation can be built as part of the simulator or an instrumentation class can be built that will have to be integrated with the robot code. 169 8.4 Extending TAC 8.4.1 TAC Scenario Agents are misbehaving. Instead of booking the cheapest flight, the agents are booking the most expensive flights or booking travelers to the wrong destination. Objective is to fix the agents so they will act properly. Agents can come from past participants with seeded defects in them. 8.4.2 Seeded Defects The defects can be placed in several places depending on what type of technologies are being evaluated: · Defects in the architecture that have been passed all the way to code · Defects in the requirements that have been passed all the way to code · Defects in the code only (i.e. developers misread the specifications when writing the code.) 
Examples of defects found in the specifications (and possibly in the code as well):
· Software components connected incorrectly
· Incorrect pre- and post-conditions
· Variables not initialized
· Architecture doesn't match requirements
· Incorrect algorithm for choosing the lowest ticket price

8.4.3 How to assess

The following is an example list of criteria on which each technology can be evaluated:
· Time to fix defects
· What defects researchers found
· Time to apply technology

The competition can be divided up by defect types, i.e. architecture, code, requirements, etc. Some technologies will not require any code, just the specifications. For those technologies, TAC should just provide the specifications and have researchers look for defects there only. TAC will need to additionally provide working agents (or use their existing simulators), their source code, their specifications, seeded defects, and instrumentation. The issues to consider are that time to fix will become an important factor, that for some technologies time may not be a good factor to use when looking for the best technologies, and how to determine the best technology from the field.

8.4.4 TAC Code and Platforms

TAC will need to additionally provide working agents (or use their existing simulators) and their source code. The agents and source code can be obtained from teams who participated in TAC in past years.

8.4.5 TAC Specifications

For TAC, we will need to ask teams to submit architecture specifications describing how each agent is developed. Failing that, we can try to use reverse engineering technologies to obtain an architecture model of the code. As for requirements specifications, those can be obtained from the TAC sponsors and the rules of the competition. Each agent has to perform a set of given tasks as specified in the rules.

8.4.6 TAC Defects

As for building a library of seeded defects, ideally each developer would have kept a defect-tracking system during development. If no defect-tracking system was used, then interviews will have to be conducted with researchers about what defects they encountered in their development, and efforts will have to be made to duplicate them.

8.4.7 TAC Instrumentation

Instrumentation can be built as part of the simulator, or an instrumentation class can be built that will have to be integrated with the agent code.

8.5 Extending DARPA and ISPW

DARPA and ISPW do not meet the conditions under which they can be extended to be software engineering technology testbeds. Both ISPW and DARPA lack a software system to which dependability technologies can be applied, so it would be too much effort to create a software dependability testbed around them.

Chapter 9 SCRover Testbed Architecture and Other Technologies Evaluation

9.1 Mae Evaluation

9.1.1 Mae technology summary

Roshanak Roshandel performed an architecture analysis of the SCRover testbed using the Mae technology [Roshandel and et al., 2004]. The SCRover development team used UML diagrams, such as class and sequence diagrams, to specify the architecture of the SCRover system. However, UML's lack of a precise semantic underpinning prevents reliable detection of inconsistencies, mismatches, and other classes of defects. The only mechanism for detecting such errors was peer review of the UML diagrams, which required reviewers to know about the MDS Framework architecture. According to Roshandel, USC's Mae technology serves as an intermediate step between the UML diagrams and the implemented system.
Mae is an extensible architectural evolution environment developed on top of xADL 2.0 [Dashofy and et al. 2002] that provides functionality for capturing, evolving, and analyzing functional architectural specification. A set of XML extensions were developed to model specific characteristics of MDS architectures. Consequently, Mae-MDS models of SCRover capture all functional properties of MDS style architectures [Roshandel and et al., 2006] 174 9.1.2 Experimental application of Mae technology to SCRover In her evaluation of Mae on the SCRover testbed, Roshandel used the architecture specifications, the SCRover code executing on a simulator, the instrumentation class, and the seeded defects. With her customized instrumentation class running in the code, Roshandel was able to collect information she needed for her evaluation in a quick manner. The simulator provided Roshandel with a no-cost way to execute the SCRover code. Roshandel also showed the architecture specifications is configurable for a researcher’s needs. The Mae-MDS specification of SCRover architecture was built by refining the existing UML diagrams. The UML class and sequence diagrams were used in determining the architectural configuration of SCRover in terms of components and connectors, and their interaction. The analysis provided by Mae revealed several inconsistencies in the SCRover architecture as dictated by the MDS Framework rules. These inconsistencies correspond to mismatches in the interface and behavioral specification of components’ services. As part of the design and implementation process, a set of defects in the requirements and UML specifications was found and identified. These defects were classified under a categorization schema similar to Orthogonal Defect Classification [Chillarege and et al. 1992] and their severity was identified. The defects were reviewed and seeded into the Mae-MDS specification of SCRover to identify and track the class of defects that Mae analysis can detect. With the 175 seeded defects, she was able to include in her report, what kind of defects Mae can detect in the design and the success rate of finding them. The seeded defects also allowed her to show the type of defects her tool could not find. This could have been due to the tool’s intended purpose or a failure in the tool if the tool was designed to find those type of defects but didn’t. 9.1.3 Mae Experimental results The peer-review process of the UML design documentation identified 38 defects. Each of the defects is classified under one of the predefined defect types of interface, object/class/function, method/logic/algorithm, ambiguity, data value, and other to each defect. The nature of the 38 defects varied from typographical errors to errors that could potentially cause harmful behaviors. Some of the defects were architectural in nature while others were conceptual. A subset of architectural defects concerned functional behaviors that Mae-MDS models capture, while other defects relate to behaviors not captured by Mae models. Re- seeding these defects into the Mae-MDS models helped identify and further classify the defects that Mae can and cannot detect. The defects Mae cannot detect are valuable in identifying complementary technologies necessary to detect additional classes of architectural defects. From the pool of 38 defects in the requirements and UML specifications, 24 (63%) of them were seeded into the Mae-MDS specification. 
The remaining 14 were conceptual defects that do not directly translate to the functional specification of the system or its behavioral properties. Examples of this type of defect, which Mae models do not capture, are "Inaccurate purpose for a given component X" or "Class Y should be split into classes Y1 and Y2". The results of Mae analysis on the models containing the 24 seeded defects are as follows:
· Mae analysis revealed 15 errors (62.5%).
· Mae analysis revealed 6 additional defects that were previously undetected by the peer review process. These defects primarily concerned inconsistency in the specification. Specifically, Mae detected inconsistent specification of interfaces and behaviors among interacting components, resulting in possibly harmful interactions in the system.

[Figure 67: Mae defect detection yield by type. For each defect type (Interface, Class/Object, Logic/Algorithm, Ambiguity, Data Values, Other, Inconsistency), the chart shows the number of defects, the number represented in Mae-MDS models, and the number Mae detected.]

Figure 67 summarizes the original number of defects (left column) against the subset that can be captured in Mae-MDS models (middle column), and those detected by Mae (right column). Incorporating defect seeding analysis into these results also demonstrates that, since Mae detected 15 of 38 seeded defects as well as 6 unseeded defects, the maximum likelihood estimate of the total number of remaining defects is T = 38*(6/15) = 15. Since Mae found 6 unseeded defects, this leaves an estimate of 9 remaining defects. As a rough estimate of where to look for these defects, we can posit that their distribution is similar to the distribution of defects not found by Mae. This is often but not always true, as with other defect-proneness metrics such as module complexity metrics [McCabe 1976] [Halstead 1977]. Table 28 shows the results.

Table 28: Seeded defect estimate of remaining defect distribution

Defect Class                 Total  Interface  Class/Object  Logic/Algorithm  Ambiguity  Data Values  Other
Unfound Seeded Defects        23       2           4              10              3           1          3
Remaining Unseeded Defects     9       0.8         1.6            3.9             1.2         0.4        1.2

9.1.4 Conclusion of using SCRover Testbed with Mae

The availability of the testbed support capabilities kept the effort to perform the translation from UML to Mae-MDS and the Mae tool runs relatively low. The total effort was roughly 160 hours, of which about 50 hours was spent adapting the tool to model MDS architectures, 80 hours was spent building Mae-MDS models out of the UML models, and the remaining 30 hours was spent building the model, using the tool, and performing the analyses. The testbed evaluation showed that Mae could perform stylistic constraint analysis that checks for specific defects related to the MDS architectural style, and that Mae could perform protocol matching to ensure proper dynamic behaviors of components. In addition, using Mae on a NASA-representative project helped the Mae researchers prove to NASA-JPL that the tool could be used on an actual NASA mission to help reduce software defects. However, the SCRover experiment did identify a few improvements needed in the tool before NASA-JPL could adopt it. Unfortunately, due to budget and time constraints, Roshandel was unable to implement the Mae improvements. Another benefit the experiment provided to Roshandel was that it helped her work out how to measure reliability correctly in a software system. Before using SCRover, she did not know how she would calculate a software system's reliability.
After seeing how the Mae tool worked on the SCRover system and analyzing the data collected from the experiment (via the instrumentation class written for the Mae tool), it gave her the idea of how to measure reliability. 179 An improvement in the testbed that Roshandel recommends is in the area of defect classification. Roshandel indicates that the defect classification system is too general for her work in architecture. Currently, the SCRover team uses a high-level defect classification, i.e. interface, logic/algorithm, class/object, and etc. The SCRover testbed does not classify deeper beyond those levels. However, Roshandel was interested only in the architecture design defects and needed a more specific classification for the defects, in order to determine what type of design defects the Mae tool could identify. Having a more specific defect classification system for the design defects would allow her to differentiate between the types of defects the Mae tool found as opposed to the type of architecture design defects David Garlan’s ACME tool would find. Overall, the testbed usage resulted in insights and plans for maturing and extending Mae’s defect detection capabilities. 9.2 AcmeStudio Evaluation 9.2.1 AcmeStudio Technology Summary AcmeStudio is an editing environment and visualization tool for software architecture designs based on the Acme architectural description language (ADL). AcmeStudio is particularly good at verifying whether a system' s architectural specifications are in appropriate compliance with the relationships and constraints imposed by the architectural style. [Schmerl and Garlan 2004] 180 Before applying the AcmeStudio technology to the SCRover architecture, researchers at CMU first had to create an ADL profile for the MDS architecture framework. At the time, MDS had 13 architecture rules an architecture design had to follow for it to be deemed valid. Thus, the purpose of the experiment is to determine if the SCRover architecture breaks any of the rules set by the MDS framework. 9.2.2 Experimental application of AcmeStudio technology to SCRover In the evaluation of AcmeStudio, David Garlan’s students used the architecture specifications and the seeded defects. Using the SSAD and the State Analysis documents, CMU was able to obtain the architectural information it needed for its AcmeStudio evaluation. However, when analyzing the UML diagrams provided in the SSAD, Dehua Zhang felt that there was a mismatch in the information provided in the SSAD and the architectural information needed to create a MDS design. He felt that some information was missing from the SSAD such as the description of the interaction type for an association relationship. Other information missing is detailed in his report, SCRover Architecture Checking in AcmeStudio [Zhang, Garlan, and Schmerl 2004]. According to [Zhang, Garlan, and Schmerl 2004], human interpretation was needed when analyzing the SSAD in order to use the UML diagrams to create the Component and Connector (C&C) architecture needed for the AcmeStudio tool. Zhang felt the interpretation between UML and C&C was a 181 difficult transition due to the incompleteness of the UML diagrams provided in the SSAD. Furthermore, Zhang felt there could be an ambiguous interpretation of the UML diagrams that could lead to an error in the Acme ADL diagrams. In the end, Zhang summarizes that there is a large semantic gap between the object modeling language SCRover uses and architecture description language used by AcmeStudio. 
In the end, with an understanding of how MDS works, Zhang was able to interpret the SCRover architecture and use it to create a model of the SCRover system in AcmeStudio. However, during the mapping process, Zhang found in addition to the seeded defects, some errors in the UML diagrams that the SCRover team did not detect in their peer reviews of the SSAD. The seeded defects allowed Dr. Garlan’s students to determine what kind of defects AcmeStudio can detect in the design and the success rate of finding them. In addition, since Garlan and Roshandel used the same set of architecture specifications and seeded defects, this allowed for their results to be combinable. In combining their results, Garlan and Roshandel discovered their respective technologies are actually complimentary and can be used together to find a greater number of defects. In addition, using SCRover allowed David Garlan to present representative feedback to the JPL-MDS group in how well AcmeStudio could find defects, which in turn led to the adoption of his technology in the MDS group. 9.2.3 AcmeStudio Experiment Results 182 After producing the model, the results of the experiment are as follows [Zhang, Garlan, and Schmerl 2004] [Roshandel, Schmerl, and et al. 2004]: · AcmeStudio was able to find 3 previously detected interface defects and 8 previously undetected defects involving compliance of the SCRover architectural specifications with the MDS architectural style. · Although the full capabilities of AcmeStudio were not exercised, there were some defects found by Mae that would not be found by AcmeStudio and vice-versa. · As with Mae, a number of ambiguities were found in translating the UML specs into Acme that represented potential defects that would be avoided by using Acme. · Approximately 120 hours were required to perform the UML-Acme translation and AcmeStudio analysis. Of those 120 hours, 80 hours was spent developing the architectural style, independent of the SCRover development, 30 hours was spent transforming the SCRover UML documentation to an architectural model in that style, and 10 hours was spent tailoring the environment, modeling the system, and conducting the analysis. · The AcmeStudio researchers identified several improvements in the SCRover testbed package that could have reduced the experimental effort, such as the organization of and access capabilities for the testbed artifacts. These improvements are being made to the testbed package. 183 · A review of the results by Mae and AcmeStudio researchers indicated that combining their representations and tool capabilities was both feasible and advantageous. · The objectives of creating an exportable and externally usable SCRover testbed were reasonably well met on this first attempt with valuable feedback on how to improve subsequent external usage. ACME has been adapted by NASA-JPL. Using the SCRover testbed helped prove that ACME could work on an actual NASA mission. 9.3 Maude Evaluation 9.3.1 Maude Technology Summary Maude is a multi-paradigm executable specification language based on rewriting logic that provides model checking capabilities. Dr. Carolyn Talcott and Grit Denker from SRI International have been using the Maude language to model the MDS Framework. Specifically, Talcott and Denker are using Maude to help MDS develop a framework with methods and tools that will increase software systems built with the MDS Framework. 
They plan to achieve better predictability and dependability of MDS systems by developing formal executable specifications of the MDS framework and its mission-specific adaptations and providing a set of formal checklists. [Denker and Talcott 2004] 9.3.2 Experimental application of Maude technology to SCRover To demonstrate that Maude could be potentially applied to space systems, Talcott and Denker used the SCRover testbed as their case study. They modeled 184 various components of the MDS Framework and then developed formal checklists for the SCRover system. They modeled the SCRover systems’ state variables, controller and estimator based on information they obtained from the architecture specifications. The goal (mission) they modeled was giving the rover a specific location on a grid for the rover to traverse to. They specified the scenarios and expected outcomes in their Maude language. Next, they ran their model through a model-checker to make sure the goals specified in the mission were achievable given the mission constraints. 9.3.3 Maude Experiment Results While executing the mission, Dr. Talcott and Denker expected two kinds of outcomes for their missions (that has two goals): 1. The rover finishes one goal, reports back to ground systems, and proceeds to process the other goal, or 2. The rover is processing the first goal and in the middle of the process receives a request to process the other goal and reports back that it is not achievable (since the controller is not ready for a new goal). Due to the symmetry of the 2 above outcomes, Dr. Talcott and Denker expected 4 solutions when searching possible configurations of all outcomes. However, after running the experiment, they got 16 solutions instead. During their investigation as to where the extra solutions came from, Dr. Talcott and Denker discovered they had incorrectly modeled the state variable. This mistake illustrated “the importance of analyzing the model for validation of the model itself 185 and it emphasizes the importance of making formal connections between the model and the informal designs and implementation.” [Denker and Talcott 2004] One positive effect of using the SCRover testbed for Dr. Talcott’s project was that it allowed her to gain knowledge of the MDS project, which was originally suppose to be her end client. Since Dr. Talcott had limited access to the MDS developers, she found it difficult to obtain information on how the MDS Framework works. She had to rely on SCRover specifications and conversations with the SCRover developers to gather information about the MDS Framework. With the MDS knowledge she gained from SCRover and using the SCRover missions as her model, she was able to “illustrate the importance of analyzing the model for validation of the model itself. It also emphasizes the importance of making formal connections between the model and the informal designs and implementation.” [Talcott and et al. 2004] Dr. Talcott and Dr. Denker found the testbed artifacts to be quite useful in helping them gain an understanding of the SCRover system with minimal help from the SCRover developers. In particular, the SSAD helped them understand what function calls the MDS Framework made, the definition of each function, what components comprised the rover, and how each component interacted with each other. Also, the GEL files allowed them to see how goals were set up during a mission. Another lesson learned from this experiment is that Dr. Talcott and Dr. 
Denker needs a more advanced version of the SCRover testbed to determine if Maude can be applied to a more complex system with lots of state variables and 186 multiple goals. Currently, a simple version of SCRover can find small problems. [Denker and Talcott 2004] Unfortunately, due to cuts in their funding, Dr. Talcott and Dr. Denker will not be able to apply Maude to a more advanced version of SCRover. 9.4 ROPE Evaluation 9.4.1 ROPE Technology Summary The ROPE [Fickas and et al, 2004b] project’s purpose is to determine the operational envelope of the environment for a given technology during runtime. In this case, ROPE will be used to discover the operational envelope of the SCRover system. The researchers of ROPE believe that no engineered artifact can be developed to meet all possible environmental conditions. Instead, cost/benefit studies should be done to determine what the "operational envelope" would be for the artifact. Once the analysis is done, the artifact should work dependably inside the envelope. However, if the environment becomes extreme, i.e. the artifact is outside of the envelope, the artifact can be said to be in an undependable state, one it is not designed to handle. [Fickas and et al, 2004b] 9.4.2 Experimental application of ROPE technology to SCRover Led by Dr. Steve Fickas who created the ROPE technology, graduate students at the University of Oregon were able to take the MDS Framework code and the SCRover adaptation code and became the first team to successfully run the SCRover system on their own machines. Instead of using a Pioneer rover, 187 Fickas and his students ran the system on an ER1 rover [ER1 2004] made by Evolution Robotics. Likewise, Fickas and his students replaced Player with iSIM [iSIM 2004], which is a simulator created by the University of Oregon. However, to use the ER1 rover and the iSIM program, the researchers first had to modify the SCRover code, in particular the hardware adapters, in order for it to work with their hardware and software systems. Instead of issuing robot commands that a Pioneer rover would understand, the SCRover system had to now issue robot commands that the ER1 rover would understand. After consulting with the USC SCRover team and receiving a tutorial of how the SCRover code worked, Fickas’s students were able to adapt the code to fit their hardware and software systems. The result of their experiment was that Fickas was able to run the wall- following scenario on an ER1 rover. A picture of the ER1 robot in action is shown in Figure 68. Videos of the ER1 rover in action can be found at the following website: http://www.cs.uoregon.edu/research/mds/roper. Figure 68: ER1 Rover 188 After getting the wall-following mission to work, Dr. Fickas then started to conduct experiments to determine under what conditions the wall-following scenario would work and not work well in. The researchers determined that the wall-following algorithm the SCRover team used had at least one error. SCRover would not work well in tight places, i.e. an environment where 2 walls would form an acute angle. (See Figure 69 for an example of a tight place.) The rover would follow one of the walls until it reaches a point where the width of the robot is the same as the width between the two walls. At this point, the robot gets stuck and does not have enough room to turn or move, thus causing the wall-following mission to fail. [Fickas and et al. 
2004a] Figure 69: ROPE experiment Another defect the ROPE researchers found was that the robot would not follow the wall correctly if the wall has a curvature. The curvature of the robot’s path would not match the wall’s curvature exactly. The SCRover developers knew about this defect already. 189 The whole experiment, including the time to install the MDS Framework code and modify the SCRover adaptation code, took approximately 80 hours between 2 programmers. 9.4.3 ROPE Lessons Learned After applying ROPE to the SCRover testbed, many lessons were learned about the SCRover testbed and about ROPE. Like the Mae application, Dr. Fickas and his students did not have to spend a lot of time in getting their experiment up and running. They spent an estimated 80 hours to set up the experiment. Many of those hours were spent trying to install the SCRover system on their computers; they had to install about 10 software packages and configure the SCRover system to work on their local computer environment. Unfortunately, they did not have the expertise of an MDS personnel working for them and had to rely on themselves (with help from USC) to install all the software and configure the MDS code in order to run the SCRover system. The University of Oregon team found that the SCRover specifications were of great help to them when they were trying to understand the SCRover system. The documents explained what the functionality of SCRover was, the architecture of the system, and how they could adapt the code to their own missions. Due to their far location from JPL, the Oregon team found it difficult to learn about the MDS Framework. Thus, they had to rely on SCRover specifications and conversations with the SCRover developers to answer a lot of 190 their MDS questions. With the MDS knowledge they gained from SCRover, they were able to get a basic understanding of how the MDS Framework works. Another lesson learned from this experiment is that Dr. Fickas and his students need a more advanced version of the SCRover testbed to perform better experiments. More advanced capabilities are needed to determine how well ROPE can help identify complex operational envelopes. For example, adapting camera capabilities into the SCRover system would have provided them more interesting scenarios to work with other than the wall-following scenario. They could test how well a camera can detect objects in an environment. Another positive effect of the SCRover application was that the research team used SCRover to help them set up their tool, debug the tool, and determine what was possible for the tool to do. Having another software system to test their technology on helped them determine what features were lacking in their tool and how to apply their technology to external systems they did not develop such as NASA software systems. The experiment showed that MDS system could run on an inexpensive robot. At USC, we spent approximately $25,000 for the rover while Oregon was able to run the SCRover system for approximately $1,000. SCRover helped prove that ROPE could be used by a NASA software system like MDS. However, due to a cut in NASA funding, ROPE was never used by JPL. 191 9.5 STRESS Evaluation 9.5.1 STRESS Technology Summary STRESS stands for Systematic Testing of Robustness by Evaluation of Synthesized Scenarios. Sandeep Gupta and Ahmed Helmy developed STRESS to increase a software system’s robustness by generating test cases to help expose a software system’s defects. [Helmy and et al. 
2004]

9.5.2 Experimental application of STRESS technology to SCRover

In their experiment, Dr. Gupta and Ganesha Bhaskara [Bhaskara 2003] examined the algorithm used for wall following. Once they identified the variables and factors involved in the algorithm, they began looking for test cases under which the wall-following algorithm would fail. They discovered one such test case. The wall-following algorithm uses only two laser beams to detect walls: one beam at an angle of 20 degrees and the other at an angle of 75 degrees. If there is an object between the two beams, the rover will not detect it, causing the rover to collide with the object as shown in Figure 70.

Figure 70: Failure Scenario detected by STRESS

The defect found by the STRESS researchers can be fixed by making a small change to the algorithm. By examining all the laser range data, SCRover will be able to detect almost any object in front of it, unless the laser beam bounces off a reflective surface or the object is shorter than the height of the laser beam.

STRESS also identified two defects that ROPE found as well. The first defect is that the robot would not follow the wall correctly if the wall has a curvature; the curvature of the robot's path would not match the wall's curvature exactly. The second defect is that the rover could get stuck in certain wall configurations, like the one shown in Figure 69. Finally, in certain wall configurations, the rover could get stuck in a loop as shown in Figure 71. The rover would enter the enclosure shown in the figure, but would not be able to exit it.

Figure 71: Rover Stuck in a Loop (picture taken from [Bhaskara 2003])

The STRESS researchers spent approximately 5 hours to perform their analysis of the SCRover system.

9.6 STAR Evaluation

9.6.1 STAR Technology Summary

STAR is a technology created by Somo Banerjee and Leslie Cheung to evaluate the reliability of a system from the design point of view. [Banerjee, Chung, and et al. 2006] [Roshandel and et al., 2006]

9.6.2 Experimental application of STAR technology to SCRover

After meeting with the developers of the STAR technology, it was determined that SCRover would not be able to meet the needs of the technology. STAR needed a system that provides a larger and more heterogeneous set of software components, ranging from simple to complex. Thus, for the STAR evaluation, Cheung chose to model the DeSi system [Mikic-Rakic and et al. 2004], a tool which models a system's deployment architecture. DeSi has approximately 30 states, a low number of components, and 50,000 lines of code. For Cheung's evaluation, after he obtained the code from Sam Malek, DeSi's developer, Cheung then had to create state diagrams of the system since the code did not come with a set of architecture diagrams. To do the modeling, Cheung had to ask Malek for help; Malek provided Cheung guidance on how the code worked and how to model the system. After the modeling is finished, Cheung will then have to seed defects into the system to determine what the reliability impact of each defect will be. However, any defects he seeds into the system may not be authentic, as he has to create the defects himself. Ideally, Malek would have kept a database of the defects he found during testing and development, which would have helped Cheung find realistic defects to work with. Initially, Cheung was hoping to use the SCRover system as the system to analyze.
However, based on the requirements needed by STAR, it was deemed that SCRover was too simple of a system for STAR to evaluate. SCRover only has about 5-7 states, 4 components, and the SCRover adaptation code has about 5,000 LOC only. Any results obtained from the experiment would not be meaningful. 9.6.3 How to Extend SCRover to meet STAR’s needs For Cheung to use SCRover, changes to the SCRover system would have needed to be made. The most important change would be to increase the 195 number of states the SCRover system has. To increase the number of states, SCRover would need to provide more capabilities. Example capabilities would be having SCRover do chemical analysis, returning objects of interest, power management, providing first aid kits to people, and listening for sounds of help. Some of these capabilities would be run in parallel. For example, power management would be run in parallel with all the other capabilities. SCRover would always be monitoring that it has enough power to return to its home base while it performs its mission. Once power has reached a minimum, SCRover should shut off its sensors and return home. However, SCRover did provide some features that Cheung liked. SCRover provides the architecture models for the system, thus he would not have to spend many hours creating them himself. In addition, SCRover has a set of authentic seeded defects that he can use for his evaluation, which would have given his evaluation more of a real-world feel. 9.7 Synergy between Evaluations Even though a wide range of technologies was applied to SCRover, each researcher used the evaluation process outlined in [TODO]. Each researcher was able to configure the testbed to their specific needs (such as what defects they were looking for) and used only artifacts that they needed for the evaluation. In the cases of the Mae and ROPE evaluations, both researcher teams used all the components of the SCRover testbed, save for the project effort data. Both teams used the manuals and guidelines to determine how to conduct the evaluation, used the scenario/mission generator to select which scenario they 196 would use in the experiment, applied their respective technology to the set of project artifacts (including code and specifications), used seeded defects to determine how well their technologies performed, used the platforms to execute the code and collect instrumentation data, and finally use the technology evaluation reports to compare their technologies with similar ones. Since the teams were not doing research in software costs, they did not need to use the project effort data collected by the SCRover team. However, effort data to determine how long their evaluation took compared to other technologies were used by the teams, but that effort data were stored in the technology evaluation reports. The cost of doing each evaluation was 1-2 person-months for each researcher, most of which was in one-time learning and initial setup effort, thus indicating that using testbeds can be a cost-effective method for evaluating technologies. 9.8 SCRover Limitations While SCRover is representative of NASA’s planetary rovers, not every researcher will be able to use the SCRover testbed. In some cases, the SCRover testbed may not have enough capabilities or requirements to satisfy the researcher’s technology needs. However, SCRover does provide a representative instance of how software systems at NASA are developed since the JPL-MDS methodology was followed. 
All project artifacts that MDS required are part of the SCRover testbed. These artifacts span the software lifecycle from inception to delivery, thus allowing a wide range of software engineering technologies to be evaluated. Furthermore, while the number of missions SCRover performs is small compared to a Mars planetary rover, the missions that SCRover does execute are a subset of the missions that a Mars planetary rover performs. In fact, the first mission that SCRover performs is the same mission the JPL-MDS group developed in their Mars rover prototype. Both SCRover and Mars planetary rovers do automated obstacle avoidance, battery power management and replanning, and use their cameras to search for and detect objects of interest. Thus, while SCRover may be less complex in terms of capabilities than a Mars planetary rover, it is complete enough to provide researchers a model of how software for NASA is developed.

An example of where SCRover could not be used by researchers involved the STAR technology [Roshandel and et al., 2006]. STAR researchers wanted to use the SCRover testbed to analyze the reliability of a system from a design point of view. However, it was soon discovered that SCRover did not have enough states in its system to perform a solid evaluation of the STAR technology: STAR needed approximately 30 states while SCRover provided about 10 states.

The following diagrams illustrate various dimensions of the SCRover testbed that will help researchers decide whether or not they can use the SCRover testbed for their technology evaluation. Figure 72 provides an overview of the SCRover testbed, while Figure 73 and Figure 74 provide information for researchers evaluating technologies in architecture and requirements engineering, respectively. For each technology family to be evaluated, similar diagrams can be created. The diagrams below are not meant to include all possible dimensions along which the SCRover testbed can be measured. However, they provide enough information for a researcher to decide if the SCRover testbed is appropriate for them to use.

Figure 72: Project Overview (axes: Platforms, Scenarios, Operational Capabilities, Project Data, Specifications, and Source Lines of Code; each rated Low/Medium/High)

Figure 72 outlines the overall project characteristics for the SCRover testbed. This gives researchers an overview of how complex the SCRover system is. For this diagram, we consider a complex Mars rover as having ratings of "High" along each axis. The "Project Data" axis describes how much data was collected during the development of the SCRover system. The SCRover team collected substantial data on defects, effort spent developing the system, and SLOC count, data that the NASA JPL-MDS team collected as well. However, more data could have been collected, such as daily build data and effort to fix defects, which kept the team from scoring a "High" rating. Counting the lines of code that are part of the MDS Framework, SCRover had over 300k LOC, which is a fairly high number; the JPL-MDS team had between 400k and 500k LOC for their Mars prototype rover. The number of "Operational Capabilities" was in the medium range when compared to a Mars rover. The SCRover team spent much time developing a large set of specifications for the SCRover system.
The SCRover team used a well-instrumented version of the Win-Win Spiral model called Model-Based (System) Architecting and Software Engineering (MBASE) [Boehm and Port 2001] [USC-CSSE 2003] for system and software development. MBASE involves the concurrent development of the system's operational concept, prototypes, requirements, architecture, and lifecycle plans, plus a feasibility rationale ensuring that the artifact definitions are compatible, achievable, and satisfactory to the system's success-critical stakeholders. MBASE shares many aspects with the Rational Unified Process (RUP) [Krichten 2001], including the use of the Unified Modeling Language (UML) [Booch and et al. 1999] and the spiral model anchor point milestones [Boehm 1996]. In addition, all specifications that the MDS Framework required were generated as well. The number of platforms SCRover can run on is three (two hardware platforms and one simulator), which places it in the medium-high range. The JPL-MDS team planned on using MDS on several platforms, including one simulator; for their prototype, the MDS team tested on a simulator and two different hardware platforms as well. Finally, the last axis covers the number of scenarios/missions the rover can perform. SCRover has 3 missions, which placed it in the low-medium range, while the Mars rover would have a higher number of missions it could do. However, since SCRover incorporated the MDS goal-driven scenario generator, some classes of additional scenarios can be straightforwardly added.

Figure 73: Homeground for Architecture Technologies (axes: States, Architecture Diagrams, Scenarios, State Variables, Components, and Classes; each rated Low/Medium/High)

Figure 73 is to be used by architecture researchers to give them an idea of what type of architecture analysis they could perform. The figure covers many dimensions that architecture researchers care about, but it by no means covers every dimension a researcher may care about. As in Figure 72, we consider a complex Mars rover as having ratings of "High" along each axis. Components and State Variables are used in the context defined by the MDS Framework [Rinker 2002]. States are defined as the states of the system. Classes are the number of object-oriented classes the SCRover team developed, and Scenarios are the number of missions/scenarios SCRover could perform. For each of the five axes defined so far, SCRover falls in the medium range, as the rovers the JPL-MDS team built have a greater number of scenarios, components, states, state variables, and classes. However, the number of architecture diagrams the SCRover team defined is in the high range, as the SCRover team produced architecture specifications similar to what the JPL-MDS team would produce.

Figure 74: Homeground for Requirement Engineering Technologies (axes: Project Requirements, Capability Requirements, Level of Service Requirements, and Requirement Specifications; each rated Low/Medium/High)

Figure 74 is to be used by requirements engineering researchers to give them an idea of what type of analysis they could perform. The figure covers many dimensions that requirements researchers care about, but it by no means covers every dimension a researcher may care about. As in Figure 72, we consider a complex Mars rover as having ratings of "High" along each axis. Capability Requirements are defined as capabilities the system can perform; for example, the rover should be able to use its camera to detect objects of interest. Project Requirements are defined as constraints placed upon the design team, e.g., solution constraints on the way that the problem must be solved, such as a mandated technology (for example, requiring the MDS Framework to be used on the SCRover system). Project Requirements also summarize process-related considerations such as cost or schedule constraints, for example, developing the project by a specified date. Level of Service Requirements are defined as "how well" the system should perform a capability requirement; for example, the accuracy of reaching a target should have an error range of +/-10% of the distance to the expected position of the target [Boehm and Port 2001] [USC-CSSE 2003]. For each of the requirement types, SCRover falls in the medium range, as the JPL-MDS Mars rovers would have a higher number of requirements. However, the amount of requirement specifications the SCRover team defined is in the high range, as the SCRover team produced requirement specifications similar to what the JPL-MDS team would produce.
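The homeground diagrams above can be reduced to a simple fit check: encode what the testbed provides along each dimension and what a candidate technology needs, and see where the testbed falls short, which is essentially what ruled SCRover out for STAR. The C++ sketch below is a minimal, hypothetical illustration of that idea, not part of any actual testbed tooling; the SCRover values are the approximate figures quoted in this chapter, the STAR state threshold follows Section 9.6, and the component threshold is only an assumption made for the example.

    #include <cstdio>
    #include <map>
    #include <string>

    // Hypothetical encoding of a testbed "homeground" profile: for each dimension
    // named in the diagrams above, the value the testbed actually provides.
    using Profile = std::map<std::string, double>;

    // Returns true if the testbed meets or exceeds every minimum the technology
    // needs, and prints any dimension that falls short.
    bool fits(const Profile& testbed, const Profile& needs) {
        bool ok = true;
        for (const auto& [dimension, minimum] : needs) {
            auto it = testbed.find(dimension);
            double provided = (it == testbed.end()) ? 0.0 : it->second;
            if (provided < minimum) {
                std::printf("  short on %-12s need >= %.0f, testbed has %.0f\n",
                            dimension.c_str(), minimum, provided);
                ok = false;
            }
        }
        return ok;
    }

    int main() {
        // Approximate SCRover values as characterized in this chapter.
        Profile scrover = {{"states", 10}, {"components", 4}, {"scenarios", 3}, {"ksloc", 300}};
        // STAR's needs as described in Section 9.6: roughly 30 states and a larger
        // set of components (the component threshold here is only an assumption).
        Profile star_needs = {{"states", 30}, {"components", 10}};
        std::printf("STAR fits SCRover? %s\n", fits(scrover, star_needs) ? "yes" : "no");
        return 0;
    }

Run against these illustrative numbers, the check reports SCRover as short on states, matching the mismatch described in Section 9.6.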
9.9 Researchers' Summary Results

In summary, the results of all the technologies applied to the SCRover testbed demonstrated several points. First, the artifacts were tailorable to the needs of the researcher; for example, the architecture specifications were used and configured in the evaluations of Mae, Acme, STAR, and Maude. The costs of hardware and software in using the testbed were low: the researchers incurred no software costs, and only the ROPE team incurred a hardware cost, which was minimal. The seeded defect approach was effective in identifying the degree to which Mae and ACME could identify defects of various classes and provided a good way to combine technology results, as was the case for the AcmeStudio and Mae evaluations. During the course of the evaluations, the researchers for Mae and AcmeStudio discovered their tools were complementary and could be used together to find a more diverse group of defects. The approach also prevented researchers from developing their technologies to work for only a specific scenario or mission, thus helping to keep the technology as generic as possible. The instrumentation class provided a quick and easy way for the researchers to gather data for performance analysis. The results of the technology evaluations also showed that the results are representative for a sponsoring organization, in this case NASA. Dr. Garlan was able to use the SCRover results to help convince NASA to use his tool in the MDS project. In addition, Roshandel was able to use the SCRover testbed to further develop her tool toward the way MDS would use her technology. Furthermore, each of the results showed that the technology could be applied to robotic systems.
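To make the combination point above concrete, the bookkeeping behind a seeded-defect comparison is simple: each technology reports which seeded defects it detected, and the union of those sets shows what two technologies find together, which is how the Mae/AcmeStudio complementarity was argued. The C++ sketch below is a minimal illustration of that bookkeeping; the defect identifiers and the per-tool results are made up for the example and are not the actual SCRover defect pool or the published Mae/AcmeStudio numbers.

    #include <cstdio>
    #include <set>
    #include <string>

    // Hypothetical seeded-defect bookkeeping: each technology reports the ids of the
    // seeded defects it detected, and the union shows what two technologies find
    // together.
    int main() {
        std::set<std::string> seeded = {"interface-1", "interface-2", "behavior-1",
                                        "behavior-2", "style-1"};
        std::set<std::string> found_by_mae  = {"behavior-1", "behavior-2"};
        std::set<std::string> found_by_acme = {"interface-1", "interface-2", "style-1"};

        std::set<std::string> combined = found_by_mae;
        combined.insert(found_by_acme.begin(), found_by_acme.end());

        std::printf("Mae alone        : %zu of %zu seeded defects\n", found_by_mae.size(), seeded.size());
        std::printf("AcmeStudio alone : %zu of %zu seeded defects\n", found_by_acme.size(), seeded.size());
        std::printf("used together    : %zu of %zu seeded defects\n", combined.size(), seeded.size());
        return 0;
    }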
Chapter 10 SCRover Testbed Implementation and Performance Analysis

10.1 Cost and Defect Analysis

Table 29: Technology Evaluation Cost
Mae: 160 hours (about 50 hours adapting the tool to model MDS architectures, 80 hours building Mae-MDS models from the UML models, and the remaining 30 hours building the model, using the tool, and performing the analyses); 21 architecture defects found (15 seeded, 6 unseeded).
AcmeStudio: 120 hours (80 hours developing the architectural style, independent of the SCRover development, 30 hours transforming the SCRover UML documentation to an architectural model in that style, and 10 hours tailoring the environment, modeling the system, and conducting the analysis); 3 seeded interface defects and 8 unseeded defects found.
Peer Reviews: 120 hours (100 hours to create the UML models plus 4 hours per person for 5 reviewers); 18 architecture defects, 38 total defects found.
ROPE: 90 hours (80 hours to install and set up, 10 hours for analysis), plus $1,000 for the rover; 2 algorithmic defects found.
STRESS: 5 hours; 4 algorithmic defects found.

10.1.1 Cost Analysis

From Table 29, one can see that each researcher spent a relatively low amount of money on software during their evaluation. Researchers from the Mae and ACME evaluations did not have to purchase any hardware when they used the SCRover testbed; the only cost to them was the time it took to transform the UML models into their own respective models. Both groups of researchers spent less than 1 person-month setting up and running their evaluation. The ROPE team spent approximately $1,000 on the purchase of an ER1 robot to run the SCRover system, in addition to the cost of a laptop to attach to the ER1 robot. The ROPE team also spent approximately 80 hours, half a person-month, to understand and modify the SCRover system to use on their rover. Furthermore, since all the software SCRover provides is free, such as Player/Stage and MDS, no money needs to be spent on buying software when using the testbed; effort may still be needed to install or modify the software for use. Judging from these evaluations, the cost to use the SCRover testbed is low, approximately 1 person-month or less. In addition, since the testbed worked with multiple researchers from multiple fields, this showed how cost-effective the testbed was from the sponsor's view: the organization had to develop just one testbed to satisfy the needs of multiple researchers working in a wide range of technologies.

10.1.2 Analysis of Defects Found vs. Effort

10.1.2.1 Mae vs. Acme vs. Peer Reviews

The results of the Mae and AcmeStudio evaluations, along with Peer Reviews, are summarized in Figure 75. Peer Reviews were used to find the set of original seeded defects, and the defects are classified by their architectural type. The figure indicates what type of defects each method can find. The results verified that Mae and AcmeStudio find different types of defects and are indeed complementary technologies that can be used jointly to find a greater number of defects.

Figure 75: Mae/AcmeStudio/Peer Review Results (Source: Roshandel, Schmerl, et al., 2004)

Figure 76: Architecture Defects Found vs. Effort (x-axis: Effort in hours; y-axis: architecture defects found; series: Peer Review first/Mae second/Acme third, and Mae first/Acme second/Peer Review third)

Further analysis can be done from the Mae/AcmeStudio/Peer Review results to demonstrate to practitioners how using all three technologies in combination can help them find a greater pool of defects, as shown in Figure 76. For this study, the set of 32 defects shown in Figure 75 as well as the 20 seeded defects the Mae and AcmeStudio technologies could not find are used. If only peer reviews were used to identify defects, only 38 defects would be found, leaving at least 14 more in the architecture specifications. However, if the Mae technology were applied after Peer Reviews, an additional 6 defects would be found. Likewise, if the AcmeStudio technology were applied after Mae and Peer Reviews, an additional 8 defects would be found.
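A small amount of arithmetic is enough to reproduce the curves in Figure 76. The C++ sketch below accumulates effort and newly found defects for one ordering of the three techniques, using the effort figures from Table 29 and the incremental defect counts quoted in the paragraph above; the struct and its field names are illustrative and are not part of any testbed code.

    #include <cstdio>
    #include <vector>

    // One step in an evaluation ordering: the technique applied, its effort from
    // Table 29, and the defects it adds beyond what was already found (the
    // increments quoted in the paragraph above).
    struct Step { const char* technique; double effort_hours; int new_defects; };

    int main() {
        std::vector<Step> ordering = {
            {"Peer Review", 120, 38},  // applied first, finds 38 defects
            {"Mae",         160,  6},  // applied second, adds 6 more
            {"AcmeStudio",  120,  8},  // applied third, adds 8 more
        };
        double effort = 0.0;
        int defects = 0;
        for (const Step& s : ordering) {
            effort += s.effort_hours;
            defects += s.new_defects;
            std::printf("after %-11s: %5.0f hours, %2d architecture defects found\n",
                        s.technique, effort, defects);
        }
        return 0;
    }

Reordering the entries, with the corresponding increments, would give the other curve shown in the figure.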
Figure 77: Major Defects Found vs. Effort (x-axis: Effort in hours; y-axis: major architecture defects found; series: Peer Review first/Mae second/Acme third, and Mae first/Acme second/Peer Review third)

Figure 78: Minor Defects Found vs. Effort (x-axis: Effort in hours; y-axis: minor architecture defects found; series: Peer Review first/Mae second/Acme third, and Mae first/Acme second/Peer Review third)

Figure 77 and Figure 78 show the same set of 38 seeded architecture defects, along with the 14 unseeded ones from Figure 76, but now categorized by priority. The priority values are major and minor: 19 major defects were found as opposed to 33 minor ones. From Figure 77, it appears that AcmeStudio and Mae are better at detecting major defects than peer reviews, while from Figure 78, peer reviews are better at finding minor defects. These results go against other studies: usually tools find more minor defects, because tools will generally document every defect regardless of priority, while peer reviews find more major defects. One possible reason for this discrepancy is that the AcmeStudio tool was set up to find instances where the MDS architecture rules were broken; the defects AcmeStudio found were therefore related to broken architecture rules, which were classified as major defects.

10.1.2.2 STRESS vs. ROPE

Both the STRESS and ROPE technologies help find where in the environment faults could happen in a mission. For the SCRover system, the rover enters an undependable state when it is in an environment it was not designed to handle. ROPE uses the Unified Model of Dependability (UMD) [Basili, Donzelli, and Asgari 2004] to represent dependability issues relating to the robot and its environment, while STRESS uses the environment parameters, the algorithms, and fault types in its dependability analysis. Another difference between STRESS and ROPE is that STRESS is run during design time of the lifecycle while ROPE is executed during runtime. During the evaluations, both technologies found the same two environmental defects: 1) the rover getting stuck in certain wall configurations, and 2) the rover not being able to follow a curved wall accurately. However, STRESS found two additional defects: 1) the rover getting stuck in a loop due to the wall configuration, and 2) the rover not being able to detect walls or objects that lie between the two laser beams used to find obstacles and walls.
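To make the last defect concrete, the sketch below contrasts the wall-following obstacle check as described above (consulting only the beams near 20 and 75 degrees) with the suggested fix of examining every return in the forward sector of the scan. This is a minimal, hypothetical C++ illustration, not the actual SCRover or MDS code; the scan layout, distance threshold, and function names are assumptions made for the example.

    #include <cmath>
    #include <cstdio>
    #include <vector>

    // Hypothetical laser scan layout: ranges[i] is the distance in meters measured
    // at angle_min_deg + i * angle_step_deg. Names and values are illustrative only.
    struct LaserScan {
        double angle_min_deg;
        double angle_step_deg;
        std::vector<double> ranges;
    };

    // Behavior as described in the STRESS evaluation: only the beams near 20 and 75
    // degrees are consulted, so an object sitting between them goes unnoticed.
    bool obstacleAheadTwoBeams(const LaserScan& scan, double threshold_m) {
        const double targets_deg[] = {20.0, 75.0};
        for (double target : targets_deg) {
            int i = static_cast<int>(std::lround((target - scan.angle_min_deg) / scan.angle_step_deg));
            if (i >= 0 && i < static_cast<int>(scan.ranges.size()) && scan.ranges[i] < threshold_m)
                return true;
        }
        return false;
    }

    // Suggested fix: examine every return in the forward sector of the scan.
    bool obstacleAheadFullScan(const LaserScan& scan, double threshold_m) {
        for (std::size_t i = 0; i < scan.ranges.size(); ++i) {
            double angle = scan.angle_min_deg + i * scan.angle_step_deg;
            if (angle >= 0.0 && angle <= 90.0 && scan.ranges[i] < threshold_m)
                return true;  // something close in the forward sector
        }
        return false;
    }

    int main() {
        // A scan covering 0-180 degrees in 5-degree steps, with one close return at
        // 45 degrees, i.e. between the two beams the original algorithm looks at.
        LaserScan scan{0.0, 5.0, std::vector<double>(37, 8.0)};
        scan.ranges[9] = 0.4;  // 9 * 5 = 45 degrees, 0.4 m away
        std::printf("two-beam check : %s\n", obstacleAheadTwoBeams(scan, 1.0) ? "obstacle" : "clear");
        std::printf("full-scan check: %s\n", obstacleAheadFullScan(scan, 1.0) ? "obstacle" : "clear");
        return 0;
    }

With a single close return at 45 degrees, the two-beam check reports the path as clear while the full-scan check reports an obstacle, which is the failure scenario shown in Figure 70.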
Based on the number of defects found and the number of hours it took to perform the analysis (ROPE: 10 hours, 2 defects vs. STRESS: 5 hours, 4 defects), it would seem that STRESS is the better technology to use, since it finds more defects with less effort. In addition, STRESS is used during design time instead of runtime, so the defects would be found earlier in the lifecycle. One advantage that ROPE holds over STRESS is that ROPE can be used to find defects during the execution of a live mission, as opposed to STRESS. Thus, ROPE can potentially be used to avoid defects the rover detects during the runtime of its mission.

10.2 How well do SETTs work across a range of technologies?

Software defect analysis varies by the type of artifact being analyzed. For specifications, MIL-STD-498 [Department of Defense, 1994] (later transformed into IEEE/EIA J-STD-016) is a United States military standard that establishes uniform requirements for software development and documentation. In addition to developing code, MIL-STD-498 recommends that developers produce a set of Data Item Descriptions (DIDs), which include, but are not limited to, the Operational Concept Description (OCD), Software Requirements Specification (SRS), Software Design Description (SDD), Test Plan, User Manual, and Software Development Plan (SDP). The Specifications component of a SETT should have artifacts similar to what MIL-STD-498 recommends. For the SCRover testbed, the SCRover team used the MBASE Guidelines, which have a large overlap with what MIL-STD-498 recommends. With this list of DIDs, researchers can determine if the SETT provides what their technologies need in order to run. In this section, I will discuss how one can use SETTs to evaluate technologies that are used to find defects in the software architecture, the software source code, and the software requirements.

10.2.1 Software Architecture Defect Analysis

With the amount of specifications and artifacts the software engineering technology testbed provides, many classes of technologies can be evaluated on the testbed. For example, software source code defect analysis tools, requirement analysis tools, and architecture tools are a few of the technologies that can be evaluated. However, applying a technology to the specifications and artifacts may require some tailoring before the technology can work. For example, many architecture tools have their own language for specifying the architecture, so the architecture specifications may need to be transformed into that language before the architecture tool can work. Dr. Roshandel had to transform the UML architecture specifications SCRover provided into xADL models before she could run the Mae tool. In addition, Dr. Garlan had to transform the UML models into ACME models before he could apply his AcmeStudio tool to the SCRover testbed. Likewise, other evaluators would take similar measures to transform or modify the specifications and artifacts to work with their technology.

10.2.2 Software Source Code Defect Analysis

With the software engineering technology testbed, researchers and practitioners can evaluate how well software source code defect analysis tools find defects.
Looking at examples of software source code defect analysis tools, such as Bandera [Corbett and et al., 2000], Fluid [Halloran and Scherlis, 2002], SLAM [Ball and Rajamani, 2002], CodeAdvisor [Duesing and Diamant, 1997], and Extended Static Checking for Java (ESC/Java) [Flanagan and et al., 2002], the primary input to these tools is source code, which is one of the artifacts a software engineering technology testbed should provide. In addition to the code, the evaluators will need seeded defects to be inserted into the code, instrumentation to collect data that will help in forming the results, a mission/scenario generator to create the various scenarios under which the technology can be evaluated, guidelines and manuals on how to use the testbed and conduct an experiment, and an experience base to search for similar technologies.

As previously mentioned, an example of a tool that looks for defects in the code is Bandera [Corbett and et al., 2000]. Bandera, taking Java source code as its input, is used to model check the correctness properties of a Java program. If someone wanted to evaluate Bandera on the SCRover testbed, the evaluators would first want to determine if the SCRover testbed is a viable option for them. Evaluators looking at the SCRover testbed characteristics would soon realize that SCRover was built using C++ code, thereby making it, for the time being, an improper fit for evaluating Bandera. However, if someone wanted the SCRover testbed to support another language such as Java, this change would be straightforward to make. There are no conceptual obstacles that would prohibit developers from having SCRover support Java-based tools. The developer would take the SCRover specifications and build the robot using Java instead of C++. After development of the Java code, code inspections can be performed to find defects that can be used as seeded defects in future technology evaluations. In addition, defects that were found and recorded during development of the SCRover Java code can become part of the seeded defect pool. For the SCRover testbed, CodeAdvisor is one example of a software defect analysis tool that works with C++ code; the method to evaluate CodeAdvisor on SCRover would be similar to the method described in the next paragraph for evaluating Bandera.

Once the SCRover testbed, or any software engineering technology testbed, is developed using Java, an evaluation of Bandera can begin. Evaluators would start by reading the guidelines and manuals for any information they may require in using the testbed. Next, they would define the types of defects the technology is expected to detect and what kind of data the instrumentation class should be collecting. In the case of Bandera, an example of a type of defect it can look for is logic defects as defined by the Orthogonal Defect Classification. A possible property Bandera could check for in SCRover is making sure one of the MDS goals does not end prematurely. The instrumentation could collect data on how many times the code containing a particular defect is executed, or it could be used to track how many times a goal ended early. The next step is to select an operational scenario under which the technology will be evaluated; for example, in the SCRover testbed, the target detection scenario can be chosen. Next, the appropriate instrumentation and seeded defects are applied to the project artifacts associated with the selected operational scenario.
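As a concrete picture of that instrumentation, the sketch below shows a small event counter of the kind an evaluator might attach next to a seeded defect or at the point where a goal terminates, then dump after the scenario runs. It is written in C++ (SCRover's implementation language) even though the Bandera walkthrough assumes a Java port; it is purely illustrative and is not the SCRover instrumentation class or the MDS Event Logging Function, and the event names are made up.

    #include <cstdio>
    #include <map>
    #include <string>

    // Hypothetical instrumentation counter: the evaluator calls record() next to a
    // seeded defect or wherever a goal terminates, then dumps the counts after the
    // scenario runs to assess the defect's impact.
    class EvaluationCounters {
    public:
        void record(const std::string& event) { ++counts_[event]; }
        void report() const {
            for (const auto& [event, count] : counts_)
                std::printf("%-28s %d\n", event.c_str(), count);
        }
    private:
        std::map<std::string, int> counts_;
    };

    int main() {
        EvaluationCounters counters;
        // Illustrative events for the target-detection scenario mentioned above.
        counters.record("seeded_defect_hit");
        counters.record("goal_ended_early");
        counters.record("goal_ended_early");
        counters.report();
        return 0;
    }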
Once the Bandera evaluators obtain the seeded Java code, they can input the code into the tool to determine how many defects/problems the tool detected. The problems found by Bandera are then compared to the number of seeded defects in the code, which will allow evaluators to determine how well the Bandera technology performed. In addition, the evaluators will review the data from the instrumentation to assess the defect’s impact on the software system. The data, the analysis, and any other information provided by Bandera (such as performance times of the tool) are stored in an experience base to be reviewed by future project managers interested in using the Bandera technology to increase the dependability of their systems. Other software source code defect analysis tools include Fluid [Halloran and Scherlis, 2002] created by researchers at Carnegie Mellon University and the Extended Static Checking for Java (ESC/Java) [Flanagan and et al., 2002] created by researchers at the Compaq Systems Research Center. Fluid focuses more on improving source code quality such as overspecific variable declarations and ignored exceptions whereas ESC/Java focuses more on functional correctness such as synchronization and array bound errors. Both tools use an 216 annotation scheme to help it identify defects within a Java program. If the Java code provided by the testbed does not have annotation, this would be most likely need to be done by the evaluators or the evaluators would cooperate with the developers to have the appropriate annotation added to the code. Evaluators for Fluid and ESC/Java would go through a similar procedure that Bandera went through to determine how well their respective technologies work. The only differences would be that the defects found by each technology would be most likely different from each other and that Java annotation might need to be added to the code, if not already there. Upon running evaluations for all three technologies, the defects each technology found can be combined together to determine what classes of defects each technology is good at finding and where, if any, overlaps exists between the three technologies. In addition, technology adoption data can be compared. For example, since both Fluid and ESC/Java uses annotation, a comparison of how long it took developers to add the annotation to the code can be performed. Thus, using software source code defect analysis tools with the SETT may require some tailoring of the specifications and/or code, but nothing should prohibit the tools from being evaluated with the SETT. 10.2.3 Software Requirements Defect Analysis If a researcher or practitioner wished to evaluate requirement analysis technologies on the testbed, the researcher or practitioner would primarily need the system requirements, along with possibly code or architecture specifications depending on the technology. For the SCRover testbed, a copy of the 217 requirements specifications seeded with defects that were previously identified with Agile Internal Reviews would be given to the evaluator, along with any other specifications needed. Now, I will look at two examples of requirement analysis technologies: Perspective-Based Reading (PBR) [Shull, Rus, Basili, 2000] and the Automated Requirement Measurement (ARM) Tool [Wyatt and et al., 2003]. Requirements analysis tools evaluators would follow a similar procedure that software source code defect analysis tools evaluators followed in using the software engineering technology testbed. 
The primary differences are that the specifications and seeded defects used would be different. Perspective-Based Reading (PBR) helps developers find problems in the software requirements by looking at the requirements from several different stakeholders’ perspectives. In PBR, if a customer reviewed the requirements, then a customer’s need statement would be needed as well as the requirements. However, if a tester were to conduct the review, then the set of executable code, the test plan, and the requirements specifications would be needed. If PBR was conducted on the SCRover testbed, the customer’s need statement could be found in the Operational Concept Description, a copy of the Test Plan would be available, as well as the source code. Thus, to evaluate PBR, having specifications similar to what SCRover provides would be enough to conduct the evaluation and no tailoring of the requirements would be needed in this case. If the requirements were not written in a way that is understandable to the stakeholders, then effort would need to be taken to rewrite the requirements in a 218 way that would be understandable to the reviewers before the evaluation can begin. The Software Assurance Technology Center (SATC) at the NASA Goddard Space Flight Center developed the Automated Requirement Measurement (ARM) Tool, which measures the quality of the requirement and helps in writing the requirements right. To use ARM, evaluators simply enter a file with the requirements listed. However, the requirements specifications provided by the SETT may not be in the right format. For example, the requirements specifications provided by SCRover contains more information than is needed by ARM. Thus, an evaluator may need to tailor down SCRover’s requirements specifications to only what ARM needs. After evaluations of ARM and PBR are done, a comparison between the defects those technologies found with the defects Agile Internal Reviews found could be conducted to determine how many defects and what classes of defects each technology is good at detecting. Thus, using software requirement analysis tools with the SETT may require some tailoring of the specifications and/or code, but nothing should prohibit the tools from being evaluated with the SETT. 219 Chapter 11 Conclusion 11.1 Did users benefit from the testbed’s architecture and capabilities? Specifications and Code - All researchers used the specifications and/or code to set up their evaluation. The specifications and code proved to be configurable as the researchers used the same set of artifacts. Without common specifications and/or code, a comparative evaluation would be impossible or at least more difficult to perform. Mission generator – More experiments will have to be conducted to determine the relative effectiveness of having a mission generator in the testbed, but as with other scripting languages, its use would generally be more cost effective than manually preparing each test scenario. Instrumentation – Roshandel (Mae technology) was able to use the instrumentation to quickly gather data for her performance analysis. For those researchers needing to collect data during the running of the code, the instrumentation class provided them a tool to quickly gather data, else they would have to find their own way to gather performance data. 
Seeded Defects and Defect Pool – Garlan (AcmeStudio) and Roshandel (Mae) were able to use the seeded defects to show the effectiveness of their tools in regards to what type of defects their respective technologies could detect. Without the seeded defects, researchers would have a more difficult time demonstrating the type of defects their technologies can and cannot find. Using 220 the library of defects provided, researchers will not have to think of defects by themselves, thus decreasing the time to perform an analysis and avoiding possible biases of seeding defects likely to be found by the technology. Rover and Simulator – Researchers were able to use the simulator and/or rover to execute the code and collect data for their performance analysis. Without a platform to run the code, the evaluation would be impossible for those researchers working with code or needing code to run. In addition, for those researchers using the simulator, it was a relatively low cost way for them to evaluate the technology. Technology Evaluation Results - Roshandel and Garlan were able to use the evaluation results to show their technologies were complementary and used it to show JPL-MDS personnel that their technology can work on NASA software systems. Guidelines-Manuals– Researchers used the guidelines provided to learn how to conduct an experiment. The manuals were useful in teaching researchers about the MDS Framework and the SCRover testbed. Without the manual, researchers such as the ROPE team would have spent lots of time asking NASA questions about how the MDS technology worked. Project Data – No researcher has used this testbed feature yet, but it is our belief that the data will be valuable for researchers developing technologies to estimate cost and schedule as it contains effort data in developing SCRover and defect data collected during the development phase. Ideally, the defect data 221 would contain the time it took to fix the defect and how this impacted the schedule. Finally, one more benefit the researchers obtained from using SETTs is the early feedback on their technologies. Both Dr. Roshandel and the ROPE team indicated that by applying their technologies to the SCRover testbed first, the experience identified what improvements to their respective technologies were needed to be made first before they tried to convince NASA –JPL users to adopt their respective technologies. The SCRover experience provided a better idea of how their technology should be applied to NASA-like systems, as opposed to the software systems they built themselves. 11.2 Were testbed operational concepts met? Representative Feedback - Researchers were able to receive representative feedback on their technology evaluations resulting for the Garlan and Medvidovic groups in a joint paper on the complementarity of their technologies [Roshandel, Schmerl, and et al. 2004]. In this case, the results were representative to NASA, and more specifically to robot-critical systems. Based on results from the ACME experiment, Dr. Garlan and his research group were able to use the SCRover testbed to get good feedback on how well the ACME Studio tool would work on the MDS technology. Based on the evaluation, Dr. Garlan was able to improve the tool and convince the JPL-MDS group to use his technology. In addition, Roshandel was able to use the SCRover testbed to further develop her tool to the way MDS would use her technology. 
So, initial 222 results indicate that the SCRover testbed can provide representative feedback on how well a technology will perform on an actual NASA software system. Multiple Goals - Many artifacts were created for the SCRover testbed, which were used in various technologies. The architecture technologies used the requirement specifications, architecture specifications, and the code for their evaluations. State modeling technologies were able to use architecture specifications and code for their evaluation. Finally, researchers developing testing technologies were able to use the architecture specifications and the code for their evaluations. By using the same set of artifacts, researchers from various fields were able to do a technology evaluation. Such results would not have been feasible to achieve using RoboCup or system integration types of testbeds. Multiple Contributors - Various researchers from the architecture technology family have evaluated their technologies on the SCRover testbed. Each time, the researcher was able to configure the architecture specifications to their experiment’s needs with a minimal amount of effort, usually less than 1 person-month. Further data would be needed on the other testbed artifacts to determine if those artifacts met this operational concept. Searchable Results - Each researcher who applied their technology to the testbed generated technology evaluation results. These results were kept in an experience base that is not part of my research. The SCRover testbed currently uses VQI to allow program developers to search through the results. More work will need to be done to refine the searching criteria. 223 Combinable Results/ Integrated Assessment - Seeded defects gave the researchers a standard to compare their technologies with other technologies. In addition, it allows results to be combined, allowing practitioners to determine how well two or more technologies will work together. For example, during the evaluations, Dr. Garlan and Dr. Roshandel discovered their technologies were compatible and if used together could find a greater number of defects than their respective technologies can by itself. The resulting data from the Mae and AcmeStudio collaboration is shown in Figure 79. The Mae/AcmeStudio results were compared against the peer review technique, which was used to find the original defects. Figure 79: Mae/Acme/Peer Review Results In addition to combining the Mae and Acme technologies, the results from the STRESS and ROPE technologies can be combined as well since both technologies try to find environmental conditions under which a system would 224 fail. However, while both of the technologies’ goals are similar, each uses a different approach to achieve their goal. ROPE tries to determine the system’s operational envelope, defined as a state where the system can be used safely, during runtime. ROPE will then compare the operational envelope with the known envelope to determine if the system is in a dangerous state. On the other hand, STRESS is applied during the design time of the system. STRESS analyzes the system’s algorithms to find unreliable states in the system. When applied to the SCRover testbed, each technology found similar defects. Both technologies predicted that the rover would get stuck in certain wall configurations during the wall-following scenario. However, STRESS also analyzed what would happen if a developer were to fix those defects. 
The STRESS researchers discovered that fixing those defects could lead to additional defects that were not detected before. Finally, when comparing both technologies, it seems on first glance that using STRESS would be a better technology to apply than ROPE. STRESS found the defects that ROPE found with a lower amount of effort. The ROPE researchers had to spend a lot of time setting up the SCRover system in order to run their evaluation, while STRESS didn’t have to set up the SCRover system to get its results. However, the STRESS researchers did have to run the SCRover system in order to validate their results. If we subtracted the amount of time it took to install the SCRover system, then the ROPE researchers spent about 10- 15 hours to obtain their results. Another advantage that STRESS has over ROPE is that STRESS can be run at design time to detect the defects earlier, as 225 opposed to ROPE, which has to be run at run-time. However, this also means that ROPE can be run live on a mission (for example, exploring Mars) while STRESS cannot be used on a live mission. Low Cost - From Table 29, one can see that each researcher spent a relatively low amount of money on the software during their evaluation. Researchers from the Mae and ACME evaluations did not have to purchase any hardware when they used the SCRover testbed. The only cost to them using the testbed was the time it took to transform the UML models into their own respective models. Both group of researchers spent less than 1 person-month in setting up and running their evaluation. In addition, since the testbed worked with multiple researchers from multiple fields, this showed how cost-effective the testbed was from the sponsor’s view. The organization just had to develop one testbed to satisfy the needs of multiple researchers working in a wide range of technologies. 11.3 Principles and Practices Revisited The testbed objectives, requirements, and architecture have been defined in chapters 3 and 4. I have provided principles and practices to guide what testbed developers need to do in order to construct their own testbed that will help in the technology maturation and adoption process. An instance of the testbed framework was created, which I call SCRover. Researchers from various universities have used the SCRover testbed to evaluate their technologies. SCRover demonstrated how researchers could use the results obtained from SCRover to provide a strong case for NASA practitioners to use their 226 technologies. In addition, the SCRover testbed provided one instance of how technology results could be combined (ACME and Mae) and another instance of how technologies can be compared (STRESS and ROPE). Lessons learned from those experiments have been captured and used to refine the testbed framework. 11.4 How well do SETTs answer challenges of technology adaptation [Redwine and Riddle, 1985] provided several critical factors for adapting new technologies. This section will indicate how well the SCRover testbed overcame those critical factors. · No collection of prior experiences demonstrating positive feedback on a technology - With the use of the SCRover testbed, researchers such as Dr. Roshandel and Garlan were able to provide practitioners (in this case, NASA) a positive experience with using their technology on a NASA-like representative system. 
In addition, the SCRover testbed helped demonstrate to NASA that the Mae and AcmeStudio technologies were indeed complimentary and could be used in combination to find a greater number of defects. · Conceptual integrity – By performing evaluations on SCRover, the researchers from Mae and ROPE were able to keep refining their technologies until they reached a point where they could demonstrate to NASA-JPL that their technologies was well developed to work on their systems. 227 · Showing a clear recognition of need for the technology – With the SCRover experience, researchers such as the AcmeStudio team and Dr. Roshandel were able to show NASA-JPL practitioners how well their technologies can detect certain classes of defects in a representative software system and how easy/difficult it was to apply the technology. · Tuneability – By applying their technologies on SCRover, researchers were able to indicate to NASA what kind of activities one would need to configure their technologies to work on a NASA-like system. · Lack of training for the new technology – With the SCRover experience, researchers were able to work with the NASA practitioners to identify what training needed to be provided as well as indicate technology adoption data such as, but not limited to, how long it took to apply the technology and how much training was involved before using the technology. 11.5 How well do SETTs meet the hazards of competitions? As mentioned in Chapter 5.4, software engineering technology testbeds can answer the challenges of the hazards of competitions, based on my experience with the SCRover testbed. One of the hazards to competitions is the high cost it takes to participate as indicated by [Stone 2003]. With the SCRover testbed, the cost has been small relative to the prospective use of other types of testbeds since all of the software is free, there is no requirement to buy any robotic hardware and pro-packaged capabilities are available to set up and perform software engineering technology 228 evaluations. There may be some minimal costs involved in installing and modifying the software for use. Another hazard indicated by [Stone 2003] is that most competitions use different platforms and this causes an un-level playing field. In addition, this makes comparing technologies more difficult. With the SCRover testbed, every researcher has the option to use the same platform, thus giving every researcher the same opportunity to succeed. A third hazard to competition was the potential to make flawed conclusions when comparing two or more technologies [Stone 2003]. Since the SCRover testbed utilizes a common data definition for representing the results this problem was avoided. Furthermore, SCRover’s data definition allows program developers to determine if using two or more technologies are beneficial or not to the project. It allows developers to see if different technologies will detect the same set of defects or will each technology find different defects. 11.6 Do SETTs help increase technology maturity? After the application of the ACME and Mae technologies to the SCRover testbed, both tools became more mature and more ready to be used by NASA. Likewise, the ROPE tool became more mature as the SCRover testbed helped the University of Oregon researchers find bugs in their tool. 11.7 Results to Date As of today, we have had various researchers apply their technologies to the SCRover testbed. The research fields include architecture, operational 229 envelopes, testing, and state-model checking. 
Each researcher was able to use the testbed to demonstrate how well their technology worked. Furthermore, the SCRover system gave them new insight into how the technology would be used by NASA, which led to improvements in their technologies. During the evaluation, each researcher was able to customize the testbed and choose which artifacts they needed for their evaluation. In some cases, like the Mae, STRESS and ROPE technologies, the researchers were able to run the SCRover code on an actual rover to find defects or to collect instrumentation data for their evaluation. In other cases like the ACME evaluation, it was not necessary to run the code on the actual rover as they were just looking for defects in the architecture designs. Each researcher spent approximately 1-2 person-months performing the evaluation thus the cost is low to conduct the evaluation as compared to manual preparation of similar capabilities to be used on system-under-test testbeds. An initial version of the SCRover testbed has been created and is available for use by researchers. Currently, five researchers have used the SCRover testbed for their technology evaluations. With the SCRover testbed, they have been able to show how well their technology works in a representative NASA-mission setting. In addition, researchers, such as Dr. Roshanak Roshandel and Dr. Fickas, were able to improve their technologies and/or tools by using it on the SCRover testbed. The SCRover system gave them new insight into how the technology would be used by NASA, which led to improvements in their technologies. SCRover thus provides an existence proof that the principles and 230 practices evolved for developing software engineering technology testbeds can be used to develop a distributable testbed that is low-cost and is representative of NASA missions and meets the objectives of the stakeholders involved. 11.8 Conclusions and Lessons Learned 1. Cost-effectiveness of Mae and AcmeStudio tools. Even in initial exploratory evaluations across somewhat different SCRover testbed configurations and limited mission scenarios, both Mae and Acme Studio were cost-effective with respect to UML and peer-reviews in avoiding, detecting, and diagnosing mission-critical specification defects. 2. Cost-Effectiveness of MDS and Player-Stage Frameworks. There is a non-trivial investment required in learning the frameworks and getting them to compile, run, and interoperate, but a significant acceleration in productivity thereafter. For example, it took two person-months to get the very simple Increment 1 MRE4 capability to work with SCRover, and only one person-month to develop the considerably more complex Increment 2 wall-following capability. Having the Player/Stage framework enabled us to implement the SCRover MRE4 capability in only 800 lines of code (LOC). This is a reduction of more than 80% over the 5000 LOC implemented by JPL for their version of the MRE4. The MDS Event Logging Function was also a significant timesaver in developing and applying the SCRover testbed instrumentation package. 3. Capabilities and limitations of seeded defect techniques. The seeded defect approach was effective in identifying the degree to which Mae could identify defects of various classes. However, after estimating 9 likely remaining 231 defects, we found that AcmeStudio alone discovered 8 remaining defects, 5 of which were in categories (style usage, completeness) not in our defect categorization scheme. 
Thus it appears that the seeded defect technique’s maximum likelihood estimate is better considered as a lower-bound estimate of the defects remaining in the categories constituting the current universe of defect sources. As an analogy, since the seeded defect technique derives from the use of fish tagging to estimate the total number of fish in a body of water, the technique can only estimate the number of fish catchable by the type of net used in catching tagged and untagged fish. There may be a number of smaller but significant fish (i.e., defects) swimming around undetected. 4. Testbed technology coverage: The SCRover testbed also includes requirements, code and test cases, but our initial experiments have focused on evaluation of architecture description language analysis tools, with some use of the ADL specifications for runtime assertion checking. Additional capabilities that could be used include testbed support for evaluating dependability technologies focused on requirements, code, or testing, and for evaluating combinations of technologies. 5. Testbed support scalability: The current SCRover testbed was able to provide a fairly low entry barrier for the Mae and AcmeStudio researchers, but only with a nontrivial amount of support by SCRover developers. This level of support could be reduced over time by further documentation and tailoring abilities. 232 6. Broad Participation and Teambuilding. Both for testbed technology and technology adoption, user-supplier teambuilding is at least as important as technology excellence. This is particularly true when multiple stakeholders need to rapidly adapt to unforeseeable changes, which happened frequently during the SCRover testbed development and experimental use. 7. Testbed ability to accelerate technology maturity and transition: The ability to evaluate alternative ADL-based specification technologies on the common SCRover testbed enabled both technology researchers and project personnel to identify previously unrecognized technology complementarities and opportunities to combine the technologies to achieve significant project dependability benefits. As discussed in chapter 0, JPL project personnel and USC and CMU researchers have come together to explore and expedite these technology opportunities. This provides encouraging evidence that the testbed approach can cost-effectively accelerate software engineering technology maturity and transition 8. Based on initial results, the SCRover testbed framework is sufficiently tailorable to enable completion of its objectives as stated in chapter 0 such as working in multiple fields and having multiple contributors. In addition, the SCRover testbed has demonstrated it can answer the challenges of technology adaptation and the hazards of competitions. 9. Testbeds can be used to improve a tool or technology. By having an external system to apply their technology to, researchers can gauge how well their technologies will work on a software system they didn’t build. In some 233 cases, the researchers will discover defects in their technologies when they try to apply their technology to the testbed or the application of the technology to the testbed may cause researchers to rethink how their technology is supposed to work, as is what happened in the case of the ROPE evaluation. In the ROPE evaluation, researchers found defects in their technology and realized they had to change how ROPE would work if they wanted their technologies to be applied to NASA software systems. 10. 
4. Testbed technology coverage. The SCRover testbed also includes requirements, code, and test cases, but our initial experiments have focused on the evaluation of architecture description language analysis tools, with some use of the ADL specifications for runtime assertion checking. Additional capabilities that could be used include testbed support for evaluating dependability technologies focused on requirements, code, or testing, and for evaluating combinations of technologies.

5. Testbed support scalability. The current SCRover testbed was able to provide a fairly low entry barrier for the Mae and AcmeStudio researchers, but only with a nontrivial amount of support by SCRover developers. This level of support could be reduced over time by further documentation and tailoring capabilities.

6. Broad participation and teambuilding. Both for testbed technology and technology adoption, user-supplier teambuilding is at least as important as technology excellence. This is particularly true when multiple stakeholders need to rapidly adapt to unforeseeable changes, which happened frequently during the SCRover testbed development and experimental use.

7. Testbed ability to accelerate technology maturity and transition. The ability to evaluate alternative ADL-based specification technologies on the common SCRover testbed enabled both technology researchers and project personnel to identify previously unrecognized technology complementarities and opportunities to combine the technologies to achieve significant project dependability benefits. As discussed in chapter 0, JPL project personnel and USC and CMU researchers have come together to explore and expedite these technology opportunities. This provides encouraging evidence that the testbed approach can cost-effectively accelerate software engineering technology maturity and transition.

8. Based on initial results, the SCRover testbed framework is sufficiently tailorable to enable completion of its objectives as stated in chapter 0, such as working in multiple fields and having multiple contributors. In addition, the SCRover testbed has demonstrated that it can answer the challenges of technology adaptation and the hazards of competitions.

9. Testbeds can be used to improve a tool or technology. By having an external system to apply their technology to, researchers can gauge how well their technologies will work on a software system they did not build. In some cases, researchers will discover defects in their technologies when they try to apply them to the testbed, or the application of the technology to the testbed may cause them to rethink how their technology is supposed to work, as happened in the ROPE evaluation, where researchers found defects in their technology and realized they had to change how ROPE would work if they wanted it to be applied to NASA software systems.

10. SCRover helped prove to NASA that researchers' technologies, such as Acme, can be applied to NASA-like systems.

11. Not everyone may benefit from using a testbed to evaluate technologies. In some technology evaluations, the testbed may not meet the needs a technology must have to be effectively applied, as evidenced when researchers tried to apply the TAR technology to the SCRover testbed. In some cases, the testbed system may have to be changed or improved for a technology to be able to use it.

In conclusion, this dissertation introduced the requirements, architecture, and concept of operation of a successfully used software engineering technology testbed. The experiences of three technology evaluations on an instance of the SETT, called SCRover, were reported. The results and benefits each researcher obtained from using SCRover were presented, as well as how a practitioner can interpret the data obtained from the evaluations. This dissertation also included several charts that define the current domain of applicability of the testbed. As a bottom line, the SCRover testbed provided a working example of how SETTs, with their ability to provide users with comparable empirical data, can overcome the challenges of technology adoption and maturation and thereby increase the speed of the technology maturation and adoption process.

Bibliography

[ActivMedia 2003] ActivMedia Robotics. Pioneer 2 All-Terrain Robot, 2003. <http://www.activrobots.com/ROBOTS/p2at.html>
[Ball and Rajamani, 2002] Ball, T. and Rajamani, S. K. The SLAM Project: Debugging System Software Via Static Analysis. Proceedings of the 29th Symposium on Principles of Programming Languages, pp. 1-3, ACM Press, New York, USA, Jan. 2002.
[Banerjee, Chung, and et al. 2006] Banerjee, S., Cheung, L., Golubchik, L., Medvidovic, N., Roshandel, R., and Sukhatme, G. Engineering Reliability into Hybrid Systems via Rich Design Models: Recent Results and Current Directions. Proceedings of the NSF Next Generation Software Program (NSFNGS) Workshop, Rhodes Island, Greece, 2006.
[Basili, Donzelli, and Asgari 2004] Basili, V., Donzelli, P., and Asgari, S. "A unified model of dependability: capturing dependability in context." IEEE Software, Volume 21, Issue 6, pp. 19-25, 2004.
[Basili, Tesoriero, and et al. 2001] Basili, V., Tesoriero, R., Costa, P., Lindvall, M., Rus, I., Shull, F., and Zelkowitz, M. Building an Experience Base for Software Engineering: A Report on the First CeBASE eWorkshop. Proceedings of the Third International Conference on Product Focused Software Process Improvement, 2001.
[Basili, Zelkowitz, and et al. 2007] Basili, V., Zelkowitz, M., Sjøberg, D., Johnson, P., and Cowling, A. "Protocols in the use of empirical software engineering artifacts." Empirical Software Engineering, pp. 107-119, 2007.
[Boehm 1996] Boehm, B. "Anchoring the Software Process." IEEE Software, pp. 73-82, 1996.
[Boehm and et al. 2004] Boehm, B., Bhuta, J., Garlan, D., Gradman, E., Huang, L., Lam, A., Madachy, R., Medvidovic, N., Meyer, K., Meyers, S., Perez, G., Reinholtz, K., Roshandel, R., and Rouquette, N. Using Testbeds to Accelerate Technology Maturity and Transition: The SCRover Experience. ACM-IEEE International Symposium on Empirical Software Engineering, Redondo Beach, CA, USA, 2004.
[Boehm and Port 2001] Boehm, B. and Port, D. "Balancing Discipline and Flexibility with The Spiral Model and MBASE." CrossTalk, pp. 23-28, 2001.
[Boehm and Port 2002] Boehm, B. and Port, D. Defect and Fault Seeding in Dependability Benchmarking. Proceedings of the DSN Workshop on Dependability Benchmarking, 2002.
[Bhaskara 2003] Bhaskara, G. SCRover Wall Following Experiments, 2003.
[Booch and et al. 1999] Booch, G., Rumbaugh, J., and Jacobson, I. The Unified Modeling Language User Guide. Addison Wesley, 1999.
[Brat and et al. 2004] Brat, G., Giannakopoulou, D., Goldberg, A., Havelund, K., Lowry, M., Pasareanu, C., Venet, A., and Visser, W. "Experimental Evaluation of Verification and Validation Tools on Martian Rover Software." Formal Methods in Systems Design Journal, Volume 25, Issue 2-3, pp. 167-198, 2004.
[Chillarege and et al. 1992] Chillarege, R., Bhandari, I.S., Chaar, J.K., Halliday, M.J., Moebus, D.S., Ray, B.K., and Wong, M-Y. "Orthogonal Defect Classification - A Concept for In-Process Measurements." IEEE Transactions on Software Engineering, 18(11), 1992.
[Clavel and et al, 2000] Clavel, M., Durán, F., Eker, S., Lincoln, P., Marti-Oliet, N., Meseguer, J. and Quesada, J. Towards Maude 2.0. Third International Workshop on Rewriting Logic and its Applications (WRLA '00), Electronic Notes in Theoretical Computer Science, 2000.
[Corbett and et al., 2000] Corbett, J., Dwyer, M., Hatcliff, J., Pasareanu, C., Robby, Laubach, S., and Zheng, H. Bandera: Extracting Finite-state Models from Java Source Code. Proceedings of the 22nd International Conference on Software Engineering, June 2000.
[DARPA 2000] DARPA Speech Recognition Workshop Proceedings, 2000. <http://www.nist.gov/speech/publications/>
[Dashofy and et al. 2002] Dashofy, E.M., van der Hoek, A., and Taylor, R.N. An Infrastructure for the Rapid Development of XML-based Architecture Description Languages. Proceedings of the 24th International Conference on Software Engineering (ICSE 2002), Orlando, FL, pp. 266-276, 2002.
[Denker and Talcott 2004] Denker, G. and Talcott, C. L. Formal Checklists for Remote Agent Dependability. Fifth International Workshop on Rewriting Logic and Its Applications, 2004.
[Department of Defense, 1994] Department of Defense. MIL-STD-498 - Military Standard Software Development and Documentation, December 1994.
[Duesing and Diamant, 1997] Duesing, T., and Diamant, J. "CodeAdvisor: rule-based C++ defect detection using a static database - HP's SoftBench 5.0 application development software." Hewlett-Packard Journal, February 1997.
[Dvorak and et al. 2004] Dvorak, D., Bollella, G., Canham, T., Carson, V., Champlin, V., Giovannoni, B., Indictor, M., Meyer, K., Murray, A. and Reinholtz, K. Project Golden Gate: Towards Real-Time Java in Space Missions. Proceedings of the Seventh IEEE International Symposium on Object-Oriented Real-Time Distributed Computing, 2004.
[Dvorak and et al. 2000] Dvorak, D., Rasmussen, R., Reeves, G. and Sacks, A. Software Architecture Themes in JPL's Mission Data System. Proceedings of the 2000 IEEE Aerospace Conference, 2000.
[ER1 2004] ER1 Personal Robot System, 2004. <http://www.evolution.com/er1/>
[Flanagan and et al., 2002] Flanagan, C., Leino, K.R.M., Lillibridge, M., Nelson, G., Saxe, J.B., and Stata, R. Extended Static Checking for Java. Proceedings of the 2002 Conference on Programming Language Design and Implementation, pp. 234-245, ACM Press, New York, June 2002.
[Garlan and et al. 2000] Garlan, D., Monroe, R.T., and Wile, D. Acme: Architectural Description of Component-Based Systems. In Leavens, G.T., and Sitaraman, M. (eds), Foundations of Component-Based Systems. Cambridge University Press, 2000.
[George and Powers 2003] George, T. and Powers, R. "Closing the TRL Gap." Aerospace America, pp. 24-26, 2003.
[Gerkey and et al. 2003] Gerkey, B., Vaughan, R., and Howard, A. The Player/Stage Project: Tools for Multi-Robot and Distributed Sensor Systems. Proceedings of the 11th International Conference on Advanced Robotics (ICAR 2003), pp. 317-323, Coimbra, Portugal, 2003.
[Graettinger and et al. 2002] Graettinger, C., Garcia, S., Siviy, J., Schenk, R., and Van Syckle, P. Using the Technology Readiness Levels Scale to Support Technology Management in the DoD's ATD/STO Environments - A Findings and Recommendations Report Conducted for Army CECOM. CMU/SEI-2002-SR-027, 2002.
[Fickas and et al. 2004a] Fickas, S., Prideaux, J., and Fortier, A. "ROPE Experiment." HDCP Workshop, University of Maryland, June 2004.
[Fickas and et al, 2004b] Fickas, S., Prideaux, J., and Fortier, A. ROPE: Reasoning about OPerational Envelopes, 2004. <http://www.cs.uoregon.edu/research/mds/>
[Halloran and Scherlis, 2002] Halloran, T. J., and Scherlis, W. L. "Models of Thumb: Assuring Best Practice Source Code in Large Java Software Systems." Tech. Rep., Fluid Project, School of Computer Science/ISRI, Carnegie Mellon University, Sept. 2002.
[Halstead 1977] Halstead, M. Elements of Software Science. Elsevier, 1977.
[HDCP 2001] HDCP - High Dependability Computing Project, 2001. <http://www.hdcp.org>
[HDCP 2003] HDCP - Technologies, 2003. <http://www.cebase.org/hdcp/technologies/technologies.asp>
[Helmy and Estrin 1998] Helmy, A. and Estrin, D. Simulation-based 'STRESS' Testing Case Study: A Multicast Routing Protocol. IEEE Sixth International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems, Canada, 1998.
[Helmy and et al. 2004] Helmy, A., Gupta, S., and Estrin, D. "The STRESS Method for Boundary-point Performance Analysis of End-to-end Multicast Timer-Suppression Mechanisms." IEEE/ACM Transactions on Networking (ToN), Vol. 12, No. 1, pp. 44-58, 2004.
[Huang 2005] Huang, L. A Value-Based Process for Achieving Software Dependability. Software Process Workshop, Beijing, China, 2005.
[Ada] Introduction to the Ada Compiler Evaluation System, 2007. <http://www.adaic.org/compilers/aces/aces-intro.html>
[iSIM 2004] iSIM, 2004. <http://www.cs.uoregon.edu/~jprideau/iSIM/>
[Jackson 2000] Jackson, D. Alloy: A Lightweight Object Model Notation. Technical Report 797, MIT Laboratory for Computer Science, Cambridge, MA, 2000.
[Johnson 2001] Johnson, P. Project Hackystat: Accelerating adoption of empirically guided software development through non-disruptive, developer-centric, in-process data collection and analysis. Department of Information and Computer Sciences, University of Hawaii, 2001.
[Kellner and et al. 1991] Kellner, M., Feiler, P., Finkelstein, A., Katayama, T., Osterweil, L., Peneda, M., and Rombach, H. ISPW-6 Software Process Example. Proceedings of the First International Conference on the Software Process, Redondo Beach, CA, USA, pp. 176-186, 1991.
[Kotov 2004] Kotov, V. Developing Testbeds and Testbed Technology. HDCP Workshop, San Jose, CA, June 2004.
[Kruchten 2001] Kruchten, P. The Rational Unified Process (2nd ed.), Addison Wesley, 2001.
[Lima and et al. 2005] Lima, P., Custódio, L., Akin, L., Jacoff, A., Kraetzschmar, G., Beng Kiat, N., Obst, O., Röfer, T., Takahashi, Y., and Zhou, C. "RoboCup 2004 Competitions and Symposium: A Small Kick for Robots, a Giant Score for Science." AI Magazine, Volume 26, Number 2, 2005.
[Lindvall 2004] Lindvall, M. Testbed characterization - FC-MD TSAFE testbed. HDCP Workshop, San Jose, CA, June 2004.
[Lindvall and et al, 2005] Lindvall, M., Rus, I., Shull, F., Zelkowitz, M., Donzelli, P., Memon, A., Basili, V., Costa, P., Tvedt, R., Hochstein, L., Asgari, S., Ackermann, C. and Pech, D. "An evolutionary testbed for software technology evaluation." Innovations in Systems and Software Engineering - A NASA Journal, Vol. 1, No. 1, pp. 3-11, 2005.
[Mankins 1995a] Mankins, J. Technology Readiness Levels. NASA Office of Space Access and Technology, 1995.
[Mankins 1995b] Mankins, J. Technology Readiness Levels - A White Paper, 1995.
[Mason and Talcott 2004] Mason, I. and Talcott, C. L. IOP: The InterOperability Platform & IMaude: An Interactive Extension of Maude. Fifth International Workshop on Rewriting Logic and Its Applications, 2004.
[McCabe 1976] McCabe, T. "A Complexity Measure." IEEE Transactions on Software Engineering, SE-2(4), pp. 308-320, 1976.
[Medvidovic and et al. 2003] Medvidovic, N., Gruenbacher, P., Egyed, A., and Boehm, B. "Bridging Models across the Software Lifecycle." Journal of Systems and Software, Volume 68, Issue 3, pp. 199-215, 2003.
[Medvidovic and et al. 1999] Medvidovic, N., Rosenblum, D.S., and Taylor, R.N. A Language and Environment for Architecture-Based Software Development and Evolution. Proceedings of the 1999 International Conference on Software Engineering, Los Angeles, CA, pp. 44-53, 1999.
[Metz and Lencevicius 2003] Metz, E. and Lencevicius, R. "Efficient Instrumentation For Performance Profiling." CoRR: Performance, 2003.
[Mikic-Rakic and et al. 2004] Mikic-Rakic, M., Malek, S., Beckman, N. and Medvidovic, N. A Tailorable Environment for Assessing the Quality of Deployment Architectures in Highly Distributed Settings. International Conference on Component Deployment, Edinburgh, UK, 2004.
[Mills 1972] Mills, H. "On The Statistical Validation of Computer Programs." IBM Federal Systems Division Report 72-6015, 1972.
[NASA 2003] NASA. Mars Exploration Rover Mission: Overview, 2003. <http://marsrovers.jpl.nasa.gov/overview/>
[Nolte and et al. 2004] Nolte, W., Kennedy, B., and Dziegiel, R. Technology Readiness Calculator. White Paper, Air Force Research Laboratory, 2004.
[Pfleeger and et al. 2002] Pfleeger, S., Hatton, L., and Howell, C. Solid Software. Upper Saddle River, NJ: Prentice Hall PTR, 2002.
[Redwine and Riddle 1985] Redwine, S. and Riddle, W. Software Technology Maturation. Proceedings of the 8th International Conference on Software Engineering, 1985.
[Rinker 2002] Rinker, G. Mission Data Systems Architecture and Implementation Guidelines. Ground System Architectures Workshop (GSAW), El Segundo, California, 2002.
[RoboCup 2007] RoboCup, 2007. <http://www.robocup.org/>
[Roshandel and et al., 2006] Roshandel, R., Banerjee, S., Cheung, L., Medvidovic, N., and Golubchik, L. Estimating Software Component Reliability by Leveraging Architectural Models. 28th International Conference on Software Engineering, Shanghai, China, 2006.
[Roshandel, Schmerl, and et al. 2004] Roshandel, R., Schmerl, B., Medvidovic, N., Garlan, D. and Zhang, D. Understanding Tradeoffs among Different Architectural Modeling Approaches. Proceedings of the 4th Working IEEE/IFIP Conference on Software Architecture, 2004.
[Roshandel and et al., 2004] Roshandel, R., van der Hoek, A., Mikic-Rakic, M. and Medvidovic, N. "Mae - A System Model and Environment for Managing Architectural Evolution." ACM Transactions on Software Engineering and Methodology, Vol. 11, No. 2, pp. 240-276, 2004.
"Mae - A System Model and Environment for Managing Architectural Evolution", ACM Transactions on Software Engineering and Methodology. Vol. 11, no. 2, pp. 240-276, 2004. 241 [Schmerl and Garlan 2004] Schmerl, B. and Garlan, D. AcmeStudio: Supporting Style-Centered Architecture Development (Research Demonstration). Proceedings of the 26th International Conference on Software Engineering, Edinburgh, Scotland, 2004. [Seaman and et al. 1999] Seaman, C., Mendonça, M., Basili, V. and Kim, Y. An Experience Management System for a Software Consulting Organization. 24th NASA SEL Software Engineering Workshop, 1999. [Shull, Rus, Basili, 2000] Shull, F., Rus, I., and Basili, V. “How Perspective-Based Reading Can Improve Requirements Inspections.” Computer. Volume 33, pp. 73- 79. July 2000. [Stone 2003] Stone, P. (2003). “Multiagent Competition and Research: Lessons from RoboCup and TAC.” RoboCup-2002: Robot Soccer World Cup VI, pp. 224– 237, Springer Verlag, Berlin, 2003. [Talcott and et al. 2004] Talcott, C.L., Clavel, M., Durán, F., Eker, S., Lincoln, P., Martí-Oliet, N., and Meseguer, J. Maude, 2004. http://maude.cs.uiuc.edu/ [Test 2006] Test Coverage Tools. 2006. <http://www.testingfaqs.org/t-eval.html> [Testbed 2007] TESTBED project. 2007. <https://doc.telin.nl/dscgi/ds.py/Get/File- 7820/testbed.htm> [Thorp and DMR 1998] Thorp J. and DMR. The Information Paradox, McGraw Hill, 1998. [Tichelaar and et al. 1997] Tichelaar, S., Ducasse, S., and Meijler, T. Architectural Extraction in Reverse Engineering by Prototyping: An Experiment. Proceedings of the ESEC/FSE Workshop on Object-Oriented Re-engineering, 1997. [Tracz 1994] Tracz, W. “Domain-Specific Software Architecture (DSSA) Frequently Asked Questions (FAQ)” ACM SIGSOFT, Software Engineering Notes, Volume 19, Issue 2 pp. 52-56, 1994. [Tracz 1995] Tracz, W. “DSSA (Domain-Specific Software Architecture) Pedagogical Example.” ACM SIGSOFT Software Engineering Notes. Volume 20, Issue 3 pp. 49-62, 1995. [TAC 2007] Trading Agent Competition (TAC). 2007. <http://www.sics.se/tac/> [US GAO 1999] United States General Accounting Office (US GAO). “Better Management of Technology Development Can Improve Weapon System Outcomes.” 1999. 242 [USC-CSSE 2003] USC Center for Systems and Software Engineering (USC- CSSE) (2003). Guidelines for Model-Based (System) Architecting and Software Engineering. [Voas and McGraw 1998] Voas, J. and McGraw, G. Software Fault Injection, Wiley, 1998. [Wyatt and et al., 2003] Wyatt, V., DiStefano, J., Chapman, M., and Aycoth, E. A Metrics Based Approach For Identifying Requirements Risks. Proceedings of the 28th Annual NASA Goddard Software Engineering Workshop (SEW’03). pp. 23- 28. December 2003. [Zhang, Garlan, and Schmerl 2004] Zhang, D., Garlan, D. and Schmerl, B. “SCRover Architecture Checking in AcmeStudio”, 2004. 
Appendices

Appendix A: Technology Readiness Levels Summary

TRL 1: Basic principles observed and reported
TRL 2: Technology concept and/or application formulated
TRL 3: Analytical and experimental critical function and/or characteristic proof-of-concept
TRL 4: Component and/or breadboard validation in laboratory environment
TRL 5: Component and/or breadboard validation in relevant environment
TRL 6: System/subsystem model or prototype demonstration in a relevant environment (ground or space)
TRL 7: System prototype demonstration in a space environment
TRL 8: Actual system completed and "flight qualified" through test and demonstration (ground or space)
TRL 9: Actual system "flight proven" through successful mission operations

Source: [Nolte and et al. 2004]

Appendix B: JPL-MDS Export Control Clearance Form

This form is to be completed by external persons requesting access to JPL software/technology controlled under the Export Administration Regulations.

Software/Technology Requested (include NTR number if known):
Name of Requestor:
Name of Recipient (if different):
Affiliation(s) (list all organizations the recipient works for or otherwise represents):
Citizenship of Recipient:
Country of Residence of Recipient:
Delivery Address:
End Use:
Approved By: ________________________________
Printed Name:                                    Date:

Office of Legislative and International Affairs

Appendix C: Testbed Survey

1. Did using the SCRover testbed help with your research? If so, how? Some things to consider when answering the above question:
· How did it advance your research/technology?
o Did it lead to improvements in your tool/technology? If so, what were they? Did using the SCRover system help find flaws in your tool? Did using the SCRover system show what the limitations of your tool were?
o Did using SCRover provide new insights into how your tool/technology worked?
o Other comments?
· Did using SCRover help mature your technology to a point where it can be used by NASA-JPL?
2. From your experience, what do you feel could be improved in the SCRover testbed? Some things to consider when answering the above question:
· What was missing from the testbed that you needed for your experiment?
· Did you feel you could use the SCRover testbed by yourself without any help from USC? If not, what was missing? Was the system or its artifacts hard to understand? Was there anything else that made it hard to use?
3. Did you feel that the SCRover system was representative of a NASA mission? If not, why?
4. Can you please estimate how many hours it took to set up your experiment?
5. Any other comments about the SCRover testbed?

Appendix D: Technologies Examined for the HDCP Program

Researchers from the Fraunhofer Center at the University of Maryland gathered the information in Table 30 [HDCP 2003]. In addition to the artifacts listed in the "What the SETT framework provides" and "What SCRover provides" columns, the testbed provides seeded defects for each technology to look for, a mission/scenario generator to create various scenarios under which the technology can be evaluated, instrumentation to collect data, guidelines and manuals on how to use the testbed and conduct an experiment, and an experience base to search for similar technologies. For more information about each technology, you may go to the following website: http://www.cebase.org/hdcp/technologies/technologies.asp.
Table 30: HDCP Technologies to be Evaluated
(Columns: Name of Technology | What artifacts the technology needs | What the SETT framework provides | What SCRover provides)

AcmeStudio | Architectural designs | Architecture Specifications | SSAD
Atomizer | Code, unit tests | Code, Test Document | SCRover/MDS code, Test Document
Cqual and Ccured | Code | Code | SCRover/MDS code
Dependability Cases | Any | - | -
Formal checklists | Design, prototype, code, documentation | Architecture Specifications, Code, running system | SSAD, SCRover/MDS code, running system on Gazebo and Pioneer rover
HackyStat | Artifacts for which quality attributes can be measured automatically | Specifications and Code | All SCRover specifications and code
IVA – Instability Visualization and Analysis | Code | Code | SCRover/MDS code
Improving test suites via operational abstraction | Code and test cases | Code, Test Document | SCRover/MDS code, Test Document
Intrusion detection | Running system | Running system on simulator and hardware | Running system on Gazebo and Pioneer rover
Mae | Architectural description | Architecture Specifications | SSAD
Maude | Models and specifications of the system | Architecture Specifications | SSAD
NetFT: Network-level Fault Tolerance | Design and code | Architecture Specifications and code | SSAD and SCRover/MDS code
Orion | Source code | Code | SCRover/MDS code
Predictive runtime analysis | Running program or executable design | Code | SCRover/MDS code
ROPE | Design specification, running system | Architecture Specifications and code | SSAD and SCRover/MDS code
STRESS | Design specification and requirements, test cases | Architecture and Requirement Specifications, Test Document | SSRD, SSAD, Test Document
Usability & Software Architecture | Architecture design documents | Architecture Specifications | SSAD
Abstract
This research provides a new way to develop and apply a new form of software: software engineering technology testbeds designed to evaluate alternative software engineering technologies, and to accelerate their maturation and transition into project use. Software engineering technology testbeds include not only the specifications and code, but also the package of instrumentation, scenario drivers, seeded defects, experimentation guidelines, and comparative effort and defect data needed to facilitate technology evaluation experiments.