Assessing Software Maintainability in Systems by Leveraging Fuzzy Methods and Linguistic Analysis

by Qianqian Celia Chen

A Dissertation Presented to the
FACULTY OF THE GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the Requirements for the Degree
DOCTOR OF PHILOSOPHY (Computer Science)

May 2019

Copyright 2019 Qianqian Celia Chen

Acknowledgements

This work grew through the continuous support and guidance from many people. I would like to first thank my advisor, Dr. Barry Boehm, for his support, encouragement and confidence in me. His work ethic and immense knowledge have inspired me every step of the way. I would also like to thank my dissertation committee members, Dr. Chao Wang and Dr. Sandeep Gupta, for the valuable advice they provided through each stage of the process. I greatly appreciate the opportunity I had to visit the Institute of Software, Chinese Academy of Sciences in Beijing, China in Fall 2016. I would like to thank Dr. Qing Wang for her hospitality and the productive collaboration I had with Dr. Lin Shi and Dr. Yang Sun. I would like to thank Michael Shoga for spending countless hours proofreading and helping me successfully complete my dissertation. In addition, I am extremely thankful for my husband Andy, my parents, and my in-laws for their warm love and endless support; my dogs, Dr. Pepper and Luke, for the enormous comfort and joy they have brought along the way; and the Nomad Gym crew for teaching me what it means to be savage. Most importantly, I want to thank God for the hope and peace He has provided in the whole journey.

Table of Contents

Acknowledgements
List of Tables
List of Figures
Abstract
Chapter 1: Introduction
  1.1 The Problem
  1.2 A Solution
  1.3 Definitions
    1.3.1 Software Maintenance
    1.3.2 Software Maintenance Tasks
    1.3.3 Software Maintainability
  1.4 Research Questions
Chapter 2: Related Work
  2.1 Software Maintainability
  2.2 Bugs Characteristics
  2.3 Natural Language Processing in Software Engineering
Chapter 3: The Analysis of Current Techniques
  3.1 The Classification of Software Maintainability Metrics
  3.2 The Effectiveness of Maintainability Index
  3.3 The Effectiveness of Human-assessed Metrics
  3.4 A Systematic Literature Review: Software Maintenance in Open Source Software
  3.5 A Proposed Software Maintenance Readiness Framework
  3.6 Summary
Chapter 4: The Research Approach
  4.1 Software Maintainability Ontology
  4.2 Preliminary Analysis on Mozilla Community
  4.3 Overview of Fuzzy Sets and Linguistic Patterns
    4.3.1 Fuzzy Sets and Fuzzy Reasoning
    4.3.2 Fuzzy Systems
    4.3.3 NLP Techniques
  4.4 Definition of Heuristic Linguistic Patterns
  4.5 The Modeling Process
  4.6 Fuzzy Rule Generation
    4.6.1 Initial Phase
    4.6.2 Incremental Selection Phase
  4.7 Evaluation Strategies
Chapter 5: Experimental Background
  5.1 Study Subjects
  5.2 Data Extraction and Analysis
Chapter 6: Research Results
  6.1 Results of RQ1
    6.1.1 Rule-generating Projects
    6.1.2 Non-rule-generating Projects
  6.2 Results of RQ2
  6.3 Results of RQ3
  6.4 Results of RQ4
    6.4.1 Severity
    6.4.2 Fix Time
    6.4.3 Domain Classification
  6.5 Results of RQ5
Chapter 7: Discussions and Future Work
  7.1 Discussions
  7.2 Threats to Validity
  7.3 Contributions
  7.4 Future Work
Chapter 8: Conclusion
Reference List
Appendix A: Generated Fuzzy Rules

List of Tables

3.1 Classification on Number of Projects by LLOC in Each Domain
3.2 Characteristics of Project Data Sources
3.3 One-way ANOVA Results for Language Analysis
3.4 One-way ANOVA Results for Domain Analysis
3.5 Pearson Correlation Results between Effort and MI for UCC Projects
3.6 Pearson Correlation Results between Productivity and MI for UCC Projects
3.7 Rating Scale for Software Understanding Increment (SU)
3.8 Characteristics of Project Data Sources
3.9 Examples of Maintenance Tasks
3.10 Projects and Tasks Distribution
3.11 The Mapping between Extracted Properties and RQs
3.12 Software-Intensive Systems Maintainability Readiness Levels
4.1 Upper Levels of SERC Stakeholder Value-Based SQ Means-Ends Hierarchy
4.2 Characteristics of Mozilla Products
4.3 POS Tags and Corresponding Tagged Types
4.4 Code Term Patterns
4.5 Examples of Some Initial Rules
5.1 Characteristics of the Study Subjects
5.2 Number of Manually Tagged Issues
5.3 Classification of RQ4 Data
6.1 Average Metrics for Rule-generating Projects
6.2 Average Metrics for Non-rule-generating Projects
A.1 Fuzzy Rules for Software Maintainability Subgroup SQs

List of Figures

3.1 MI by Languages
3.2 MIwoc by Languages
3.3 MIwc by Languages
3.4 Overall MI over Domains
3.5 Overall MI for Different Languages within Each Project Domain
3.6 UCC Projects' Effort versus Maintainability Index
3.7 UCC Projects' Productivity versus Maintainability Index
3.8 Pie Chart of Students' Industrial Experience
3.9 Experience Ratings from Students' Personnel Questionnaire
3.10 Correlation between SU Factors and Average Effort
3.11 Correlation between Factors within Structure and Average Effort
3.12 Correlation between Factors within Self-Descriptiveness and Average Effort
3.13 Number of Included Primary Studies During the Study Selection Process
4.1 Software Maintainability Ontology Hierarchy
4.2 An Example of the Universal Dependencies
4.3 The Overall Process of the Proposed Approach
4.4 An Example of the Issue-quality Pair Generated from the First Phase
5.1 Total Number of Dependent Issues with Various Numbers of Source Issues
5.2 Dependency Relationship Examples
6.1 Number of Rules Generated for Each Subgroup SQ
6.2 Growth of Rules in Each Iteration for Each Subgroup SQ
6.3 F-measure of Rules in Each Iteration for Each Subgroup SQ
6.4 Accuracy of Rules in Each Iteration for Each Subgroup SQ
6.5 Metrics in Rule-generating Projects
6.6 Metrics in Non-rule-generating Projects
6.7 Average Metrics for All Projects
6.8 Correlation between Average MI and Average Software Maintenance Effort Spent per Version
6.9 Correlation between Average COCOMO SU Model Factor Ratings and Average Software Maintenance Effort Spent per Version
6.10 Correlation between Average Percentage of Maintainability Related Issues and Average Software Maintenance Effort Spent per Version
6.11 Correlation between Changes in Average MI and Changes in Average Software Maintenance Effort Spent between Versions
6.12 Correlation between Changes in Average COCOMO SU Model Factor Ratings and Changes in Average Software Maintenance Effort Spent between Versions
6.13 Correlation between Changes in Average Percentage of Maintainability Related Issues and Changes in Average Software Maintenance Effort Spent between Versions
6.14 Changes in the Percentage of Maintainability Issues over Time per System
6.15 Average Opened and Resolved Maintainability Related Issues per Introduction Status of the Issue
6.16 Total Number of Issues in Respective Severity Group
6.17 Overall Percentage of Software Maintainability Related Issues and Non-Software Maintainability Related Issues in Respective Severity Group
6.18 Percentage of Subgroup SQ Contributing to Software Maintainability Related Issues in Respective Severity Group
6.19 Percentage of Software Maintainability by Issue Fixing Time per Project
6.20 Overall Percentage of Software Maintainability by Issue Fixing Time
6.21 Percentage of Subgroup SQ Contributing to Software Maintainability Related Issues by Issue Fixing Time
6.22 Overall Percentage of Software Maintainability by Domain
6.23 Percentage of Subgroup SQ Contributing to Software Maintainability Related Issues by Domain
6.24 Characteristics of the Dependent Issues and Source Issues in Each Subgroup SQ
6.25 Quality Dependency Graph with the Number of Identified Relationships for Each Quality Represented by the Size of the Node
6.26 Top Quality Dependency Relationships

Abstract

Beyond the functional requirements of a system, software maintainability is essential for project success. While many metrics for software maintainability have been developed, the effective use of accuracy measures for these metrics has not been observed. Moreover, while there exists a large knowledge base of software maintainability in the form of ontologies and standards, this knowledge is rarely used. Especially in open source ecosystems, due to the large number of developers and the inefficiency in identifying quality issues, it is extremely difficult to accurately measure and predict software maintainability.
In this dissertation, a set of deep analyses of the effectiveness of current metrics in open source ecosystems is reported. Based on the findings, a novel approach is introduced for better assessment of overall software maintainability through fuzzy methods and linguistic analysis of issue summaries. Expert input and a large data set of over 60,000 issue summaries found in 22 well-established open source projects from three major open source ecosystems are used to build, validate and evaluate the approach.

The results validate the generalizability of the proposed approach in correctly and automatically identifying software maintainability related issues. When compared to some state-of-the-art measurements, the proposed approach is a more suitable metric to reflect the effort needed to understand the existing source code and the changes between versions. Further analysis of these projects provides a means for identifying the trend of software maintainability changes as software evolves. This informs which areas should be focused on to ensure maintainability at different stages of the development and maintenance process. Furthermore, this analysis shows the differences in software maintainability across different classifications and the relationships among the software qualities that contribute to software maintainability.

Chapter 1

Introduction

Due to the rapid growth in the demand for software, releasing software quickly and with the least amount of resources has become crucial for software organizations to survive. To achieve this, software organizations must perform careful analysis to ensure that their software is highly maintainable. While the primary functionality of a software system is often the main focus, high maintainability is essential to ensure that the product satisfies performance, cost, and dependability objectives as well as allowing existing software components to be reused in future releases. Poorly maintainable software may result in problems for developers and consumers as well as affect the primary functionality [23]. Despite the importance of maintainability in project development, the lack of information regarding where these issues occur and how these problems evolve over time impedes their resolution.

With the increasing rate of change in technology, competition, organizations, and the marketplace [66], the majority of most software systems' life cycle cost is spent in software maintenance. Koskinen's 2009 survey [60] found that 75-90% of business and command-and-control software costs and 50-80% of cyber-physical system software costs are incurred during maintenance. To aid potential adopters of existing software components to be reused and evolved in practice, a number of metrics have been developed [44] to provide ways to measure software maintainability. Most of these involve automated analysis of the source code, such as the Maintainability Index [117], technical debt [61][35], code smells [101][120][121], and other object-oriented metrics [32][31][75]. There are also reuse cost models that estimate the maintainability of potentially reusable components, such as code understandability [24][19]. In addition, software quality standards [53][1], readiness level frameworks [92][42], software quality ontologies [16][18] and the code styles of certain programming languages [57][105] are considered valid approaches to measure and predict software maintainability.
1.1 The Problem

Although there are a large number of proposed measurement metrics and approaches, these methods come with some limitations.

While automated code analysis metrics are easy to use and require a relatively low level of human effort, Riaz et al. [86] pointed out that the effective use of accuracy measures for these metrics has not been observed and that there is a need to further validate maintainability prediction models. Moreover, despite having the advantage of identifying the particular parts of the software most needing maintainability improvement at the module/method level, they do not provide an overall quality status for the current version of the software. Problem identification is valuable; however, a stronger and more thorough understanding of how defects detract from software qualities (SQs) is necessary for the development of software with high SQs. On the other hand, human-assessed methods can more accurately reflect the maintenance effort spent in understanding the existing code base and performing maintenance tasks, yet they are too expensive and subjective to collect throughout the software development process [24][96]. Software ontologies, standards and frameworks introduce immense high-level knowledge, which mostly comes from consensus wisdom, professional discipline and expert sources. They tend to be used in larger organizations as guidelines during the development process. However, it is very difficult to enforce standards on actual program behavior. Moreover, while standardizing the process can help make sure that no steps are skipped, standardizing to an inappropriate process can reduce productivity and thus leave less time for quality assurance. Especially in smaller organizations and open source ecosystems, it is extremely difficult to apply and enforce these paradigms due to their limited resources and functionality-focused nature [90]. Instead, such organizations consider bug fixing time as an estimate of maintenance effort [119]. However, these bug-focused metrics do not provide a systematic understanding of software maintainability. Overall, considering the pros and cons of these metrics and approaches, one may only conclude that they should be used synergistically in estimating overall software maintainability and in identifying the key parts of the software most needing maintainability improvement, rather than relying on a single approach.

1.2 A Solution

Therefore, the goal of this dissertation is to design a new approach that can synergize with the existing approaches while providing a way to effectively measure and keep track of overall maintainability with relatively low human effort and relatively high precision as software evolves.

More specifically, I propose an approach that combines issue summaries generated during software development with the expert knowledge provided in a software maintainability ontology and a set of practice guidelines, using natural language processing techniques, to provide a more thorough understanding of software maintainability. That is, given an issue summary, the proposed approach can map it to one or more software maintainability concerns it expresses based on the meaning of the issue.
When applied to all the issue summaries generated during each version of the software, it can assist developers and other stakeholders of an organization in understanding the overall maintainability status of the current version of a system, in identifying specific subgroup SQ concerns that persist throughout the life cycle, and in keeping track of the maintainability changes over its life cycle, thus helping organizations better plan for future releases.

1.3 Definitions

This section provides definitions of the common terms and phrases used throughout this dissertation.

1.3.1 Software Maintenance

Software maintenance is defined in various software engineering standards in the following ways:

IEEE 1219: Software maintenance is defined in the IEEE Standard for Software Maintenance, IEEE 1219 [1], as the "modification of a software product after delivery to correct faults, to improve performance or other attributes, or to adapt the product to a modified environment". The standard also addresses maintenance activities prior to delivery of the software product, but only in an informative annex of the standard.

ISO/IEC 12207: The ISO/IEC 12207 Standard for Life Cycle Processes [52] essentially depicts maintenance as one of the primary life cycle processes and describes maintenance as the process of a software product undergoing "modification to code and associated documentation due to a problem or the need for improvement. The objective is to modify the existing software product while preserving its integrity." In addition, ISO/IEC 12207 describes an activity called "Process Implementation", which establishes the maintenance plan and procedures that are later used during the maintenance process.

ISO/IEC 14764: The International Standard for Software Maintenance defines software maintenance in the same terms as ISO/IEC 12207 and places emphasis on the pre-delivery aspects of maintenance, such as planning [2].

SWEBOK: The Software Engineering Body of Knowledge defines software maintenance as "the totality of activities required to provide cost-effective support to a software system". Activities are performed during the pre-delivery stage as well as the post-delivery stage. Pre-delivery activities include planning for post-delivery operations, supportability, and logistics determination. Post-delivery activities include software modification, training, and operating a help desk [5].

1.3.2 Software Maintenance Tasks

Since software maintenance is required to ensure that a software system remains healthy and continues to meet all user requirements, it is essential for maintainers to perform tasks of correction and improvement. Various maintenance tasks are defined in the existing literature [59][19][36][2]:

Corrective Maintenance: Corrective maintenance tasks often refer to the changes associated with addressing errors and faults in the software, commonly called "bugs". This type of task includes modifications and updates done in order to correct or fix bugs, which are either discovered by users or derived from user error reports.

Adaptive Maintenance: Adaptive maintenance tasks are triggered by changes in the environment the software runs in, such as changes in operating systems, hardware, and dependencies. This type of task includes modifications and updates applied to keep the software product up-to-date and tuned to the changing world of technology and business environments.
Perfective Maintenance: Perfective maintenance tasks refer to the evolution of requirements and features in the software, and are often called "feature requests". This type of task includes modifications and updates done in order to keep the software usable over a long period of time. It includes new functionality, new user requirements for refining the software, and improvements to its reliability and performance.

Preventive Maintenance: Preventive maintenance tasks focus on decreasing the failures of the software in the long term. The aim is to attend to problems which are not currently significant but may cause serious issues in the future. This type of task includes modifications and updates that prevent future problems of the software, such as refactoring, optimizing source code and updating documentation.

1.3.3 Software Maintainability

Software maintainability is the measurement of how maintainable a system is. It is defined as the ease with which changes can be made to a software system. These changes may include the correction of faults, adaptation of the system to meet a new requirement, addition of new functionality, removal of existing functionality, or correction when errors or deficiencies occur, and the system can be perfected, adapted or acted upon to reduce further maintenance costs [84].

In this dissertation, software maintainability consists of two parts: actual software maintainability and perceived software maintainability. Actual software maintainability is defined as the reflection of the level of average effort spent on performing any software maintenance tasks in a software system. Perceived software maintainability is defined as the reflection of the level of average effort required to understand a code base in preparation for completing maintenance tasks. Less maintenance effort means higher software maintainability, thus showing that the software system is more maintainable.

1.4 Research Questions

The main research questions that this dissertation attempts to address, reflecting the proposed solution, are as follows:

RQ1: Can the proposed approach effectively identify maintainability related quality concerns from an issue summary?

This research question aims to investigate whether the proposed approach can correctly and effectively tag an issue summary with one or more maintainability SQ concerns it is expressing. More specifically, I focus on examining to what level the proposed approach can be generalized in a large-scale empirical study on nine open source systems.

RQ2: How does the proposed approach perform when compared to some of the state-of-the-art approaches in measuring software maintainability?

This research question focuses on examining how accurately the proposed approach can reflect the actual and perceived software maintainability when compared with the Maintainability Index and the COCOMO II Software Understandability model. In addition, the research question investigates how these metrics relate to the changes in software maintainability from one version to the next. These are answered using data collected from a three-semester-long controlled experiment that was conducted at the University of Southern California.

RQ3: As software evolves, how does its maintainability change?

This research question aims at investigating to what extent the common wisdom suggesting "Declining Quality" [66][39][125] applies in the context of maintainability: whether, as software evolves, maintainability decreases or increases.
Specifically, I study how the percentage of maintainability issues changes, to understand whether software maintainability rises or decreases as software grows mature and stable. To this aim, I further investigate the dominant subgroup SQs contributing to software maintainability, to understand which subgroup SQs tend to have more issues in the earlier, middle and later phases of the life cycle. Moreover, I aim to explore under which circumstances maintainability related quality concerns are more prone to be introduced. I focus on factors that are indicated as possible causes for the introduction of maintainability issues, such as the introduction status of the issue (e.g., are there more maintainability related issues found after a major version release?) and the life-cycle phase status (e.g., are there more maintainability related issues found in the later phase of a system?) to examine the patterns.

RQ4: How are software maintainability issues expressed in different classifications?

This research question addresses the differences in maintainability changes found in systems belonging to different domains (e.g., is server software more maintainable than client software?); the differences in issues with different levels of severity (e.g., do more severe issues contain more maintainability issues?); and the differences in issues with different fix times (e.g., do quick-fixed issues contain more maintainability issues than slow-fixed issues?).

RQ5: How do the maintainability subgroup SQs relate to each other?

This research question focuses on examining the relationships among the subgroup SQs through issue dependency. Since some issues depend on other issues (for example, Issue A causes or blocks the resolution of Issue B), I look at the patterns of which SQs are expressed by these related issues.

The rest of this dissertation is organized as follows. Chapter 2 summarizes related work and Chapter 3 discusses some state-of-the-art approaches while presenting how they differ from this study. Chapter 4 describes the overall process of the proposed approach. The design of the experiment is reported in Chapter 5 and the analysis of the results in Chapter 6. Chapter 7 discusses the results, states the threats to validity and provides several valuable findings for the research community. Chapter 8 concludes the dissertation.

Chapter 2

Related Work

Software maintenance and maintainability have attracted increasing attention from the software engineering research community in recent years. A number of studies have been published to address software maintenance and maintainability related problems. This chapter presents a review of software maintainability metrics in Section 2.1, bug characteristics in Section 2.2, and the usage of natural language processing in software engineering in Section 2.3.

2.1 Software Maintainability

Boehm et al. [16] provided an IDEF5 class hierarchy of upper-level SQs, where the top level reflected classes of stakeholder value propositions (Mission Effectiveness, Resource Utilization, Dependability, Flexibility), and the next level identified means-ends enablers of the higher-level SQs. Their studies provided definitions and examples of the application to maintainability. Based on the ontology study, they further summarized the resulting changes in the SQ ontology, and also provided examples of maintainability need and use, quantitative relations where available, and summaries and references on improved practices.

Kitchenham et al.
[58] developed a preliminary ontology in the form of a UML model with the purpose of identifying contextual factors that in uence the results of empirical studies of maintenance such as product age, application domain, and product and artifact quality. They 9 also identied two dierent types of maintenance process. One is used by individual maintenance engineers to implement a specic modication request, and the other one is used at the orga- nization level to manage the stream of maintenance requests from customers/clients, users and maintenance engineers. However, the ontology presented is not complete nor fully evaluated. Baggen et al. [11] provided an overview of the approach that uses a standardized measurement model based on the ISO/IEC 9126 denition of maintainability and source code metrics. These metrics include volume, redundancy, complexity and more. Antonellis et al. [7] proposed a methodology that uses data mining to evaluate a software sys- tems maintainability according to the ISO/IEC-9126 quality standard. Metrics used in this work include Weighted Methods Per Class, Data Access Metric, Depth Of Inheritance Tree and more. They extracted elements and metrics from open source projects that are written in Java. Then weights were assigned to the selected metrics with the purpose of re ecting how important they are in the quality standard. With the data collected, clusters were built to derive maintainability values. Zhuo et al. [133] compared seven software maintainability assessment models to examine the eective use in measuring software maintainability. Eight software systems were used for initial construction and calibration of the automated assessment models, and an additional six software systems were used for testing the results. A comparison was made between expert software engi- neers' subjective assessment of the 14 individual software systems and the maintainability indices calculated by the seven models based on complexity metrics automatically derived from those systems. Initial tests show very high correlations between the automated assessment techniques and the subjective expert evaluations. Heitlager et al. [48] discussed several problems with the Maintainability Index and identied a number of requirements to be fullled by a maintainability model to be usable in practice. They pointed out the shortcomings of the original MI approach, including that the result does 10 not provide clues on what characteristics of maintainability have contributed to that value, nor on what action to take to improve this value. Maintainability Index (MI) is the most widely used metrics to quantify maintainability of any software projects. It was rst introduced by Oman in 1992 [81]. The idea is to combine three code- related metrics for measuring the maintainability of a given system into a single index. Throughout the years, the original formula evolved into a few dierent versions, including Microsoft Visual Studio MI, Software Engineering Institute MI [112], and the revisited MI [117]. Sjberg [100] stated that most of the famous software maintenance metrics including the original formula of MI are overrated and they may not re ect future maintenance eort. Senousy [98] constructed a correlations and weights analysis of MI on OSS Linux Kernel Modules. By using correlation coecient model, they practically proved that there exists a relation between Line of Code, Cyclomatic Complexity, Halstead Volume and Maintainability Index. 
Out of the four factors, the most important parameter is Line of Code with weight value about 76%. Chen et al. [24] evaluated the software maintainability versus a set of human-evaluation factors used in the Constructive Cost Model II (COCOMO II) Software Understandability (SU) metric by conducting a controlled experiment on humans assessing SU and performing change-request modications on open source software (OSS) projects. Zhou and Xu [132] report the ability of 15 design metrics to predict how maintainable a system is, based on 148 java open source projects. They found the average control ow complexity per method to be the most important maintainability factor, and cohesion and coupling have weak impact on maintainability. Li and Henry [67] use combination of metrics from source code to predict the number of lines changed per class as maintenance eort of two commercial object-oriented software systems. Kumar et al. [63] investigate 11 dierent types of source code metrics in an empirical study to develop a maintainability prediction model for Service-Oriented software and compare their model 11 with the Multivariate Linear Regression (MLR) and Support Vector Machine (SVM) approaches. They found that a using a smaller set of source code metrics performed better than when they used all of the available metrics. Jindal et al. [55] analyzed defect descriptions mined from defect reports to predict the amount of maintainability eort associated with defects. They used Multi-nominal Multivariate Logistic Regression for model prediction and validated their results using the 'Camera' Android operating system application. They generated defect reports from change logs between two subsequent releases which were used in their model. They found that the performance of the model increased with increasing number of words for classication. 2.2 Bugs Characteristics Tan et al. [106] studied software bug characteristics by sampling 2,060 real world bugs in the Linux kernel, Mozilla, and Apache projects from three dimensions|root causes, impacts, and components. They also analyze the correlation between categories in dierent dimensions, and the trend of dierent types of bugs. Sahoo et al. [89] analyzed the software bugs based on three characteristics: observed symptoms, reproducibility, and the number of inputs needed to trigger the symptom. Their ndings are mainly involved in implications for automated bug diagnosis. Ocariza et al. [79] [80] performed an empirical study of bug reports to understand the root causes and impact of JavaScript faults and how the results can impact JavaScript programmers, testers, and tool developers. Lu et al. [69] presented a comprehensive study on real world concurrency bug characteristics by examining 105 randomly selected real world concurrency bugs in 4 representative server and client open-source applications. Their study reveals several interesting ndings and provides useful guidance for concurrency bug detection, testing, and concurrent programming language design, for example, they found that around one third of the examined non-deadlock concurrency bugs 12 are caused by violation to programmers' order intentions, which may not be easily expressed via synchronization primitives like locks and transactional memories. Li et al. [68] collected 709 bugs including security related and concurrency bugs. They analyzed the characteristics of those bugs in terms of root causes, impacts and software components. 
Their ndings reveal characteristics of memory bugs, semantic bugs, security bugs, GUI bugs, and concurrency bugs. They veried their analysis results on the automatic classication results by using text classication and information retrieval techniques. Jin et al. [54] conducted an empirical study on performance bugs by examining 109 real world bugs. Their ndings provide guidance for future work to avoid, expose, detect, and x performance bugs. Wang et al. [114] studied 57 real bugs from open-source Node.js applications. They analyzed bug patterns, root causes, bug impact, bug manifestation, and x strategies. Their results reveal ndings about future concurrency bug detection, testing, and automated xing in Node.js. Yin et al. [123] studied the characteristics of bugs in open source router software. They evaluated the root cause of bugs, ease of diagnosis and detectability, ease of prevention and avoidance, and their eect on network behavior. Zhou et al. [130] performed an empirical study on 72 Android and desktop projects. They studied how severity changes and quantify dierences between classes in terms of bug-xing at- tributes. Zhang et al. [129] investigated bug reports, comparing dierences in characteristics between reports in desktop software and mobile applications hosted on GitHub. They reported that bug reports for desktop software were longer on average than those for mobile apps and that mobile app bug reports had more main elements (stack traces, code examples, or patches) than desktop bug reports. They found that average xing time was shorter for mobile applications which they attribute to having a higher percentage of main elements relative to desktop software. Zhou et al. [131] propose an approach to binary classication of bug reports into `bug' and `nonbug' by leveraging text mining and data mining techniques. Analyzing the summary and some structured features including severity, priority, component, and operating systems, they use 13 Bayesian Net Classier as the machine learner. They performed an empirical study of 10 open source projects to validate their method and provide a MyLyn plugin prototype system that will classify given reports. Chaparro et al. [3] analyzed bug reports from nine systems and found that a large percentage of bug reports lack Steps to Reproduce (S2R) and Expected Behavior (EB) information. They in turn developed an automated approach to detecting missing S2R and EB from bug reports. They produced three versions using regular expressions, heuristics and natural language processing, and machine learning. They found their machine learning version to be the most accurate with respect to F1 score, but the regular expressions and heuristics and natural language processing approaches had similar accuracy results without training. 2.3 Natural Language Processing in Software Engineering Shi et al. [99] developed a method using fuzzy rules generated from natural language processing techniques to automatically categorize feature requests from open software domains. They iden- tied recurring lexical, syntax, and semantic patterns to generate their initial set of rules which were then repeatedly improved using more datasets to nd additional rules or adjust weights. They found success with classication in nine open source projects and found that performance improved when using fuzzy rules on machine learning methods. 
Bakar et al.[12] proposed Feature Extraction for Reuse of Natural Language requirements (FENL), an approach that uses software reviews as a means of extracting software features when Software Requirement Specications are unavailable. Their method is able to automatically ex- tract software features from lemmatized and POS tagged text scraped from software reviews. The extracted phrases would be used to initiate Requirements Reuse. Runeson et al. [88] presented an approach to help automate detection of duplicate defect reports based on the core functionality of the ReqSimile tool. Their approach used tokenization, 14 stemming and stop words removal techniques to the raw texts. Then synonym replacement and spell checking was conducted prior to measuring the similarity value between two defect reports. Hayes et al. [47] proposed an automatic approach that identies potential or candidate links in software traceability. Information retrieval techniques were used in their model for requirements tracing. To be more specic, vector space model with term-frequency-inverse document frequency (tf-idf) term weighting and latent semantic indexing were used. Menzies et al. [74] developed an automated method that uses text mining and machine learning techniques to assign severity levels to issue reports and performed a case study on Project and Issue Tracking Systems provided by NASA. They used tokenization, stop word removal, stemming, TF*idf weights, and InfoGain measures to limit the number of attributes provided to the machine learner. Provided there were sucient numbers of example reports for a given severity level, their approach performed with high f-measures even when adjusting the number of provided attributes. Chawla et al. [22] proposed an automated bug labeling approach that uses fuzzy logic to categorize issue into either \bug" or \other" requests. They conducted an empirical study on three open source systems and compared the results with the LDA based machine learning approach. The results show an improved accuracy and f-measure when compared to existing approaches, thus shown that fuzzy logic is a promising direction in task of bug categorization. 15 Chapter 3 The Analysis of Current Techniques This chapter summarizes the current state-of-the-art techniques that are used to measure software maintainability in practice. Through several controlled experiments and a systematic literature review, a set of conclusions has been reached to serve as the motivation and the literature support of this dissertation. 3.1 The Classication of Software Maintainability Metrics With a large number of developed software maintainability metrics, I categorize these metrics into the following classications: Automatic Code Metrics Automatic code metrics involve analyzing source code and quantifying software maintainability into numeric results. These metrics include Maintainability Index, Cyclomatic Complexity, lines of code, technical debt, code smells, and Object-Oriented metrics such as weighted methods per class, lack of cohesion in methods, coupling between objects, decoupling level, and more. Human-Assessed Metrics Human-assessed metrics include reuse cost models that estimate maintainability of potentially reusable components based on human-assessed maintainability aspects, such as code understand- ability and structure, code styles, documentation-related metrics and developer-related metrics. 
Software Process Improvement

Software Process Improvement methodology is defined as the definition of a sequence of tasks, tools and techniques to be performed to plan and implement improvement activities [9]. This type of maintainability metric includes software standards for software maintainability, quality ontologies and maintainability readiness level measurement.

3.2 The Effectiveness of Maintainability Index

I conducted a comparison analysis [27] on 97 open source projects written in different programming languages and different software domains, with the purpose of identifying whether MI can be used to effectively compare across projects that are written in different programming languages and software domains.

Maintainability Index (MI) is the most widely used metric to quantify the maintainability of software systems. It was first introduced by Oman in 1992 [81]. The idea is to combine three code-related metrics for measuring the maintainability of a given system into a single index. Throughout the years, the original formula evolved into a few different versions, including the Microsoft Visual Studio MI, the Software Engineering Institute MI [112], and the revisited MI [117].

MI is a composite metric formed from the following metrics of the source code:

Halstead Volume (HV)
Cyclomatic Complexity (CC)
Count of lines (LLOC)
Percent of lines of comments (CM)

With the results from those metrics, MI without comments per source file (MIwoc) and MI with comments per source file (MIwc) are calculated with the following formulas:

MIwoc(source file) = 171 - 5.2 ln(HV) - 0.23 CC - 16.2 ln(LLOC)

MIwc(source file) = 50 sin(sqrt(2.46 CM))

With the measurements from the above two formulas, MI per source file can be calculated:

MI(source file) = MIwoc(source file) + MIwc(source file)

The total MI of a system is then calculated with the following formula:

MI = sum over all source files of MI(source file) / Number of source files

The most popular languages in common use are Java, C/C++, Python, C#/.Net, and PHP (1). Due to the availability of reliable metrics tools, Java, Python and PHP were chosen for this analysis. The repository used in this study is SourceForge, which contains more than 430,000 projects in various programming languages and more than 10 domains. The five top software domains are listed below:

Web Development Framework
System Administration Software
Software Testing Tools
Security/Cryptography
Audio and Video

(1) Results based on the comparison from https://www.tiobe.com/tiobe-index/. Data were collected to generate the relative popularity of programming languages.

Table 3.1: Classification on Number of Projects by LLOC in Each Domain

Category                         [1,1000]   [1000,5000]   [5001,10000]   >10,000
Web Development Framework            0           2              4           18
System Administration Software       6           4              3            5
Software Testing Tools               2           9              5            3
Security/Cryptography                7           6              4            1
Audio and Video                      2           4              3            9

The above five domains contain sufficient numbers of projects for the three languages, have very high numbers of downloads and large user bases, and require a great deal of effort to maintain. Table 3.1 shows the characteristics of the selected projects in each domain. Due to the large number of projects available for the above domains, the data collection process involves establishing and applying consistent criteria for the inclusion of well-known projects. Projects that are no longer open source or that have an empty git/cvs/svn repository are excluded. Source code under test, example, sample and tutorial folders is also excluded.
Projects that meet all of the following criteria are considered:

Has more than one official release
The latest stable release is available
Has well-established sizing
Is well presented in the community
Has fully accessible source code

Table 3.2: Characteristics of Project Data Sources

Language   Number of Projects   Average LLOC   Metrics Collection Tools
PHP        32                   18,643         Phpmetrics
Java       32                   33,871         CodePro, Locmetrics
Python     33                   6,644          Radon

The various metrics are collected by using existing tools to gather information on Cyclomatic Complexity, the Halstead metrics, and the size of the source programs, and are then normalized by the authors for calculating the Maintainability Index. The size metrics are based on the Logical Lines of Code (LLOC), or Logical SLOC, definition, which is adopted from the Software Engineering Institute [82] and incorporated into the definition checklist for source statement counts [19]. An executable statement is defined as one logical line of code, while non-executable statements such as blanks and comments are excluded. To avoid inconsistency among different tools per language, MI is calculated from the revised formula for each project.

One-way ANOVA was used to test for MI differences among the three languages, covering MI without comments (MIwoc), MI with comments (MIwc) and the overall Maintainability Index (MI). Table 3.3 shows the output of the one-way ANOVA analysis.

Table 3.3: One-way ANOVA Results for Language Analysis

Metric   Source           Sum of Squares   df   Mean Square   F       Sig.
MIwoc    Between Groups   844.599          2    422.299       2.544   0.084
         Within Groups    15602.788        94   165.987
         Total            16447.386        96
MIwc     Between Groups   589.095          2    294.548       3.069   0.051
         Within Groups    9022.420         94   95.983
         Total            9611.516         96
MI       Between Groups   1044.871         2    522.435       2.614   0.079
         Within Groups    18783.525        94   199.825
         Total            19828.395        96

From Table 3.3, MIwoc differed significantly across the three language groups, F(2, 94) = 2.544, p = 0.084 (p < 0.1). The significance level is 0.084, which is below 0.1; therefore, there is a statistically significant difference in the mean MIwoc among the languages. MIwc differed significantly across the three groups as well, F(2, 94) = 3.069, p = 0.051 (p < 0.1). The significance level is 0.051, which is below 0.1; therefore, there is a statistically significant difference in the mean MIwc among the languages. Additionally, MI differed significantly across the three groups, F(2, 94) = 2.614, p = 0.079 (p < 0.1). The significance level is 0.079, which is below 0.1; therefore, the results strongly suggest that there is a difference in the mean overall MI among the languages.

Figures 3.1, 3.2 and 3.3 show the mean values of MI, MIwoc and MIwc by language. We can observe that there are differences among the languages in the maximum and median values. Among the three languages, PHP shows the highest medians for MI and MIwoc, while Java shows the highest median for MIwc. Concerning the maximum values, PHP presents the highest value for MI, Python presents the highest value for MIwoc and Java presents the highest value for MIwc. On the contrary, Java has the lowest median value for MIwoc, while Python has the lowest median values for MI and MIwc. Thus, we have enough evidence to accept that MI differs across the three languages. Overall, PHP is the most maintainable language among the three based solely on MI.
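For readers who wish to reproduce this kind of comparison, a minimal sketch of the revised MI formulas defined earlier in this section is given below. It assumes the per-file base metrics (HV, CC, LLOC, CM) have already been extracted by a tool such as Radon, Phpmetrics or CodePro; the function names and the example values are hypothetical and used only for illustration.

```python
import math

def mi_per_file(hv: float, cc: float, lloc: int, cm: float) -> float:
    """Revised Maintainability Index for one source file.

    hv   -- Halstead Volume
    cc   -- Cyclomatic Complexity
    lloc -- logical lines of code (must be > 0)
    cm   -- percent of comment lines
    """
    mi_woc = 171 - 5.2 * math.log(hv) - 0.23 * cc - 16.2 * math.log(lloc)
    mi_wc = 50 * math.sin(math.sqrt(2.46 * cm))
    return mi_woc + mi_wc

def mi_per_system(per_file_metrics) -> float:
    """Average the per-file MI values over all source files of a system."""
    values = [mi_per_file(hv, cc, lloc, cm) for hv, cc, lloc, cm in per_file_metrics]
    return sum(values) / len(values)

# Three hypothetical source files, each given as (HV, CC, LLOC, CM%).
print(mi_per_system([(1200.0, 14, 220, 18.5), (450.0, 6, 90, 25.0), (3100.0, 33, 540, 8.0)]))
```

The sketch makes the composite nature of MI explicit: each component enters with a fixed weight, so two files with very different complexity profiles can still end up with similar index values.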
Figure 3.1: MI by Languages

Figure 3.2: MIwoc by Languages

Figure 3.3: MIwc by Languages

Moreover, one-way ANOVA was also used to test for MI differences among the five domains, again covering MI without comments (MIwoc), MI with comments (MIwc) and the overall Maintainability Index (MI). Table 3.4 shows the output of the one-way ANOVA analysis and whether a statistically significant difference exists among the group means.

Table 3.4: One-way ANOVA Results for Domain Analysis

Metric   Source           Sum of Squares   df   Mean Square   F       Sig.
MIwoc    Between Groups   1541.295         4    385.324       2.378   0.057
         Within Groups    14906.092        92   162.023
         Total            16447.386        96
MIwc     Between Groups   741.498          4    185.374       1.923   0.113
         Within Groups    8870.018         92   96.413
         Total            9611.516         96
MI       Between Groups   3221.732         4    805.433       4.462   0.002
         Within Groups    16606.663        92   180.507
         Total            19828.395        96

From Table 3.4, MIwoc differed significantly across the five domain groups, F(4, 92) = 2.378, p = 0.057 (p < 0.1). The significance level is 0.057, which is below 0.1; therefore, there is a statistically significant difference in the mean MIwoc among the domains. MIwc differed less significantly across the five groups, F(4, 92) = 1.923, p = 0.113 (p > 0.1). The significance level is 0.113, which is slightly above 0.1; therefore, there is no statistically significant difference in the mean MIwc among the domains. Additionally, MI differed significantly across the five groups, F(4, 92) = 4.462, p = 0.002 (p < 0.1). The significance level is 0.002, which is below 0.1; therefore, the results strongly suggest that there is a difference in the mean overall MI among the domains.

Figure 3.4 shows the boxplot of the mean values of MI for each domain. We can observe that there are differences among domains in the maximum and median values. Of the five domains, Web Development Framework shows the highest median and the highest maximum value. On the other side, Audio and Video has both the lowest maximum value and the lowest median value. The overall MI for each domain per language is also illustrated in Figure 3.5.

Since MI is a composite metric, it is hard to determine which of the component metrics cause a particular total value of MI. Moreover, the different metric values depend on the programming language, the domain, the programmers, and the perception of code quality, making it difficult to compare results across projects without addressing these differences, even though the results show that MI values differ statistically among programming languages and domains.

Another study [25] was conducted to compare maintenance effort with MI on Unified Code Count (UCC) (2) to determine whether MI can effectively determine the maintainability of code within a single development environment.

UCC is a data analysis tool that provides SLOC counting metrics for about 30 programming languages, such as logical SLOC [82] and cyclomatic complexity [72]. UCC is object-oriented software programmed in C++ that reads input source files, parses the files against syntax rules, and outputs reports with the counting results. Since the development environment faces high personnel turnover [50], maintenance tasks are scoped to complete within, or to reach defined milestones within, 4-month time-boxed increments.

UCC's modularized architecture attempts to reduce the number of modified modules when new features are added, especially language counters.
The language counters are similar in structure and outputs, but the specific outputs and algorithms differ depending on the language's syntax. For example, though most of the language counters return Directive SLOC results, the HTML language counter does not, as this count is not applicable to HTML. USC releases an updated UCC annually with new language parsers, additional features, and/or additional metrics.

(2) ucc.usc.edu

Figure 3.4: Overall MI over Domains

Figure 3.5: Overall MI for Different Languages within Each Project Domain

Data for UCC's project set came from the developed code, weekly timesheets, test case documentation with corresponding test data, and explanatory reports summarizing the steps taken and the results of projects that began and completed between 2010 and 2014. This analysis requires projects that lasted three or more increments, in order to track the changes and effects of maintainability, coinciding with the subset of UCC's projects used in [49].

Table 3.5 displays the Pearson correlation test results for MI and effort for the UCC projects, and the results show that the correlation between MI and effort is nonexistent (the correlation coefficient R is almost 0). Figure 3.6 visually displays the projects' effort plotted against MI, confirming that a correlation between the two does not exist.

Table 3.5: Pearson Correlation Results between Effort and MI for UCC Projects

R          0.007
R-Square   0.056
p-value    0.978

Since effort is highly affected by the size of the software [118], MI was also compared to developers' productivity while working on the maintenance tasks. Similar to effort, the correlation between MI and productivity is weak (the correlation coefficient R is 0.033). Table 3.6 contains the Pearson correlation test results, and Figure 3.7 visually displays the lack of correlation between MI and productivity.

Table 3.6: Pearson Correlation Results between Productivity and MI for UCC Projects

R          0.033
R-Square   0.055
p-value    0.889

Figure 3.6: UCC Projects' Effort versus Maintainability Index

If MI effectively indicated the maintainability level of the source code, the correlation between MI and effort should be strongly negative: as the source becomes more maintainable (MI is higher), less effort is required to complete a maintenance task. One major reason MI does not effectively indicate the maintainability of UCC's source code is that the language parsers are similar to each other in structure. Cyclomatic Complexity counts the number of decision branches in the source code [72] and therefore heavily depends on the structure of the code.

Summary of the results. MI may not be a very clear and comprehensive measurement for indicating the overall quality of the code, due to its composite nature and its dependency on programming languages and software domains. Thus, it is difficult for developers to know the exact effort needed to add to or change the existing code when the metric outputs only a numeric value that is not comparable across systems written in different languages. In addition, automated metrics that evaluate the structure of the source code, like Cyclomatic Complexity, do not change much from one version to another for some software systems, making such automated metrics not useful to those development environments.
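The correlation analysis summarized in Tables 3.5 and 3.6 uses a standard Pearson test, which can be reproduced with SciPy as sketched below. The effort and MI observations in the example are invented purely for illustration; the actual UCC project data are not reproduced here.

```python
from scipy.stats import pearsonr

# Hypothetical per-increment observations for a project like UCC:
# maintenance effort (person-hours) and the MI of the code at that increment.
effort = [310.0, 275.5, 402.0, 298.0, 365.25, 340.0]
mi = [74.2, 76.8, 71.5, 75.1, 73.0, 74.9]

# Pearson's R and its two-sided p-value; a strongly negative R would support
# the expectation that higher MI goes with lower maintenance effort.
r, p_value = pearsonr(mi, effort)
print(f"R = {r:.3f}, R-squared = {r * r:.3f}, p-value = {p_value:.3f}")
```

With only a handful of increments per project, as in the UCC data set, such a test has little statistical power, which is one more reason the near-zero correlations reported above should be read as an absence of evidence for MI's usefulness rather than proof of its uselessness.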
27 Figure 3.7: UCC Projects' Productivity versus Maintainability Index 3.3 The Eectiveness of Human-assessed Metrics Human evaluated COCOMO II Software Understandability (SU) metrics are assessed on candi- date reusable software components across small and large open-source and conventional software- development systems [24] to examine whether COCOMO II SU factors can accurately re ect to software maintenance eort. This study is conducted through a controlled experiment with 11 open source software projects and six graduate students at University of Southern California. The project selection process involves establishing and applying consistent criteria to ensure the quality of this experiment. Projects that are no longer open source or projects that have empty git/cvs/svn repositories are excluded. Projects that fall under all of the following criteria are considered: The latest stable release is available. The size of the source code is relatively reasonable for graduate level students to learn and understand individually. The source code is fully accessible. 28 Table 3.7: Rating Scale for Software Understanding Increment (SU) Factor Very Low Low Nominal High Very High Structure Very low cohesion, high coupling, spaghetti code. Moderately- low cohesion, high coupling. Reasonably well- structured; some weak areas. High cohesion, low coupling. Strong modularity, information hiding in data/control structures Application Clarity No Match between program an applica- tion world- views. Some correlation between program and application. Moderate correlation between program and application. Good correlation between program and application. Clear match between program and applica- tion world- views. Self-Descriptiveness Obscure code; documenta- tion miss- ing, obscure or obsolete. Some code commen- tary and headers; some useful documen- tation. Moderate level of code commen- tary, headers, documen- tation. Good code commen- tary and headers; useful docu- mentation; some weak areas. Self- descrip- tive code; documenta- tion up-to- date, well- organized, with design rationale. The online issue tracking system is active and up-to-date. Table 3.8 lists the characteristics of the selected projects. There are more projects that meet the above criteria, however they can not all be included into this experiment. Participants are carefully screened and recruited from 38 applications, based on their program- ming skills in Java and PHP. Table 3.8: Characteristics of Project Data Sources Language Number of Projects Average LOC Java 6 35,200 PHP 5 67,145 29 Table 3.9: Examples of Maintenance Tasks Project Task Description Type DocFetcher Search tabs within one window (more than one search at the same time) Feature Request PhpUnit Abstract class inheritance issues Bug At the beginning of the experiment, each student is required to report his/her industrial experience in a rating from 1 to 5, 1 being extremely inexperienced, 5 being extremely experienced. Over half of the students have at least some levels of industrial experience, including internships and entry-level full-time software engineer jobs at large corporations. One student has experience working in three start-up companies as lead software engineer. However, none of the students have more than ve years of industrial experience. Figure 3.8 shows the distribution of students' industrial experience and Figure 3.9 shows the experience ratings in details. 
Figure 3.8: Pie Chart of Students' Industrial Experience Students are asked to perform maintenance tasks, including xing bugs and implementing new feature requests, which are found on each project's corresponding issue tracking website, either Jira 3 or Bugzilla 4 . Students are asked to install time tracking plug-ins (e.g. WakaTime) on their IDEs so that eort spent on each task could be recorded accurately. Each student spends four weeks on one project and one week per task. There are 44 tasks in total. Tasks are assigned to students randomly and a task can be assigned to multiple students. Students are asked to work 3 https://jira.atlassian.com 4 https://www.bugzilla.org/ 30 Figure 3.9: Experience Ratings from Students' Personnel Questionnaire individually on these tasks. At the end of each week, students are responsible to report eorts spent on the task and answer a questionnaire that consisted a list of questions, which are derived from the COCOMO II SU factors. The answers to those questions are ratings from 1 to 10, 1 being extremely poor and 10 being extremely well. Students are also asked to provide rationale to the ratings they give to each question. The questions and their corresponding COCOMO II SU factors are as follows: Structure: How well are the codes organized? How well are the classes dened in terms of class structure? How well are the variables named? How well are the classes named? Are the classes highly coupled? Application Clarity: How well does the software match its application world-views? Are you able to understand the features as described? 31 Table 3.10: Projects and Tasks Distribution Student 1 2 3 4 5 6 Number of Projects 11 5 3 4 5 11 Number of Finished Tasks 35 12 9 7 13 31 Number of Unnished Tasks 9 8 3 9 7 13 Number of Total Tasks 44 20 12 16 20 44 % of Finished Tasks 79.55% 60.00% 75.00% 43.75% 65.00% 70.45% Self-Descriptiveness: How good are the comments? Are there sucient meaningful com- ments within the source code? How self-descriptive are the codes? How well is the docu- mentation written? Does the software have sucient documentation to describe interfaces, size, or performance? How well does the current documentation match the current software? If a student could not nish the assigned task, the student has the option to either continue working on the same task in the following week or abandoning the task. Students are asked to only submit the report after nishing a task. Each student has a dierent total number of assigned projects and tasks based on their availability and experience. Table 3.10 lists the details of the number of projects and tasks each student worked on. Students are required to rate each nished task on a diculty rating from 1 to 5, 1 being extremely easy and 5 being extremely hard. An algorithm 1 is developed to calculate the ratings of the COCOMO II SU factors for all projects. Since students have various levels of experience, in order to keep the consistency of under- standing and avoid bias that might be introduced during the experiment, their experience ratings and task diculty ratings are used as weights when calculating the SU ratings of each project. 
Since a task could be assigned to multiple students, the average rating from all students who completed the task is calculated as the final rating of the task. Each project's final SU rating does not come solely from one student, but is the average of all the ratings given by students who worked on that project.

Algorithm 1: An Algorithm for Calculating Final Ratings for COCOMO II SU Factors

Weight each question rating by the task difficulty rating and the student experience rating:

$Rating_{Question} = OriginalRating \times Rating_{TaskDifficulty} \times Rating_{StudentExperience}$   (3.1)

Calculate a COCOMO II SU factor rating per week:

$Rating_{SUperWeek} = \frac{\sum_{n=1}^{NumOfQuestions} (Rating_{Question})_n}{NumOfQuestions}$   (3.2)

Calculate a COCOMO II SU factor rating per project:

$Rating_{SUperProject} = \frac{\sum_{n=1}^{NumOfWeeks} (Rating_{SUperWeek})_n}{NumOfWeeks}$   (3.3)

Repeat the above steps for all COCOMO II SU factors and all projects. Normalize the project ratings to a scale of 0 to 10:

$NormalizedR_1 = \frac{(Rating_{SUperProject})_1 - \min(Rating_{SUperProject})}{\max(Rating_{SUperProject}) - \min(Rating_{SUperProject})} \times 10$   (3.4)

$NormalizedR_i = \frac{(Rating_{SUperProject})_i \times NormalizedR_{i-1}}{(Rating_{SUperProject})_{i-1}}$   (3.5)

where $(Rating_{SUperProject})_1$ is the first non-minimum data point in the dataset, $Rating_{SUperProject}$ are all the project-level COCOMO II SU factor ratings, and $NormalizedR_i$ is the $i$-th normalized data point.

Given a project and a set of maintenance tasks, all three SU factors are first calculated for each task. For each SU factor, I collect the ratings submitted by the students who worked on the task. Then I use the task difficulty ratings and student experience ratings as weights to calculate the adjusted SU factor ratings. The final SU rating of a task is the average of all the adjusted SU factor ratings of the task:

$Rating_{Task} = \frac{\sum_{n=1} (StudentRating \times Rating_{TaskDifficulty} \times Rating_{StudentExperience})_n}{n}$   (3.6)

After all three SU factors are calculated for each task using the above equation, the SU factors for the given project are calculated by taking the average of the SU factor ratings of all its maintenance tasks:

$Rating_{Project} = \frac{\sum_{n=1} (Rating_{Task})_n}{n}$   (3.7)

Once the above steps are repeated until the factor ratings for all projects are obtained, the data is normalized to a scale of 0 to 10 so that the results are more concise and comparable:

$NormalizedR_1 = \frac{(Rating_{SUperProject})_1 - \min(Rating_{SUperProject})}{\max(Rating_{SUperProject}) - \min(Rating_{SUperProject})} \times 10$   (3.8)

$NormalizedR_i = \frac{(Rating_{SUperProject})_i \times NormalizedR_{i-1}}{(Rating_{SUperProject})_{i-1}}$   (3.9)

where $(Rating_{SUperProject})_1$ is the first non-minimum data point in the dataset, $Rating_{SUperProject}$ are all the project-level COCOMO II SU factor ratings, and $NormalizedR_i$ is the $i$-th normalized data point.

Pearson's correlation and confidence interval analysis were used to examine the relationship between the metrics and the effort spent. Figures 3.10, 3.11 and 3.12 visualize these results.

Summary of the results. All COCOMO II SU factors show a significant relationship with the average maintenance effort spent, supporting human-assessed maintainability metrics. These empirical results show that human-assessed methods can accurately reflect the actual maintenance effort spent in maintenance tasks, yet they could be too expensive and subjective to collect throughout software development.
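The following is a minimal Python sketch of the weighting and normalization steps in Algorithm 1 and Equations 3.6-3.9; the data structures are hypothetical and the normalization is simplified (it starts from the first rating rather than the first non-minimum rating), so it is an illustration rather than the study's actual scripts.

```python
# Minimal sketch (hypothetical data, simplified anchoring) of the SU rating
# weighting and normalization described in Algorithm 1 / Eqs. 3.6-3.9.
from statistics import mean

# Each record: (student_rating, task_difficulty, student_experience) for one SU factor.
def task_rating(records):
    # Eq. 3.6: weight each student's rating by task difficulty and experience, then average.
    return mean(r * d * e for r, d, e in records)

def project_rating(task_ratings):
    # Eq. 3.7: a project's SU factor rating is the mean over its maintenance tasks.
    return mean(task_ratings)

def normalize(project_ratings):
    # Eqs. 3.8-3.9: rescale project-level ratings to 0-10, chaining from the previous value.
    # Assumes positive, not-all-equal ratings; the dissertation anchors on the first
    # non-minimum rating, while this sketch simply starts from the first rating.
    lo, hi = min(project_ratings), max(project_ratings)
    normalized = [(project_ratings[0] - lo) / (hi - lo) * 10]
    for prev, cur in zip(project_ratings, project_ratings[1:]):
        normalized.append(cur * normalized[-1] / prev)
    return normalized
```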
34 Figure 3.10: Correlation between SU Factors and Average Eort 3.4 A Systematic Literature Review: Software Maintenance in Open Source Software With its voluntary nature, developers are not required to sign contracts nor go through any rigorous interview processes before starting contributing to an OSS project. This extremely high personal turnover rate makes it dicult to keep track of the development eort investing in maintaining an OSS project. Moreover, often times, OSS projects do not have any specic and detailed project plans, schedules, road maps or a list of deliverables at the beginning of its life cycle. Even after the software has been released, the actual eort still remains unknown. In order to study how open source software (OSS) projects measure and estimate software maintenance eort and software maintainability, I have conducted a complete systematic literature review [119]. 35 Figure 3.11: Correlation between Factors within Structure and Average Eort With the purpose of identifying the current state of the art of the existing maintenance eort estimation approaches for OSS, we performed a systematic literature review on the relevant studies published in the period between 2000-2015 by both automatic and manual searches from dierent sources. In total, 29 out of 3,312 papers were selected and analyzed in this study. The following electronic databases were selected for the automated search because most of the publication venues that published papers in the maintenance eort estimation are indexed by these databases: Inspec Compendex IEEE Xplore ACM Digital Library Science Direct 36 Figure 3.12: Correlation between Factors within Self-Descriptiveness and Average Eort The overall process of the study selection is described in Figure 3.13. The overall search process consists two stages: the initial search stage and the secondary search stage. During the initial search stage, we used the proposed search terms to search for primary candidate studies in each literature resource separately, based on the title, abstract and keywords. All the searches were limited to articles published from January 1 st 2000 to December 31 th 2015. Then we merged the search results by removing duplicated articles. During the secondary stage, we performed snowballing technique [10] based on the search studies in the initial search stage to identify more relevant studies. Whenever a highly relevant paper was found, we added it to the set of the primary relevant studies. A set of research questions are proposed and answered in order to analyze the selected primary studies. Table 3.11 presents a brief mapping between each primary studys properties and RQs. 37 Figure 3.13: Number of Included Primary Studies During the Study Selection Process RQ1: What evidence is there for maintenance eort estimation techniques/methods for OSS projects? To identify the classication of current research focus, and identify the related research type. RQ2: What metrics related to OSS development records are extracted for maintenance eort estimation and how can they be classied? To identify software metrics commonly used in OSS maintenance eort estimation. RQ3: What are common projects and the size of dataset used as study cases in OSS maintenance eort estimation, and how has the frequency of approaches related to the size of dataset? To identify the datset used in the studies and investigate the studies external validity. 
RQ4: What methods/approaches are used to estimate actual project maintenance eort (including those from the usual incomplete OSS development records)? To identify trends and possible opportunities for estimation method focus. RQ5: What is the overall estimation accuracy of OSS maintenance eort estimation? To identify to what extent the studies provide accurate prediction. Summary of the results. The commonly used OSS maintenance eort estimation methods are actual eort estimation and maintenance activity time prediction; the most commonly used 38 Table 3.11: The Mapping between Extracted Properties and RQs Property RQ Topic RQ1, RQ4 Research Type RQ1 Metrics/Factors RQ2 Project RQ3 Size of dataset Estimation Approaches RQ4, RQ5 Accuracy Metrics Accuracy Level metrics and factors for actual eort estimation were source code measurements and people related metrics; the most commonly mentioned activity for maintenance activity time prediction was bug xing. With the growth of more companies developing or collaborating with OSS projects, estimating maintenance eort has become a major interest. More researchers have been focusing on improving the estimation towards the direct eort of OSS projects from both people and activity aspects by developing maintenance eort estimation methods. However, since most OSS projects lack complete development records and actual eort data, it is very dicult to evaluate and validate the results of these methods by comparing the estimated results with the actual eort. This can be a signicant threat to these estimation methods and raises the risks to eectively validating of their results. This raises the need for new evaluation methods that can validate the correctness of these estimation methods. To mitigate the diculty of acquiring actual eort data from incomplete development records, various studies have focused on predicting the size-related metrics to indirectly estimate the maintenance eort [124][128]. The strong correlation between size-related metrics and the actual eort has been conrmed in closed-source projects [77]. However there still exists a gap between size-related metrics and time-aware eort for maintaining OSS projects. There is a need for studies that can quantitatively infer OSS maintenance eort from size-related metrics. Furthermore, the eort drivers used in general maintenance eort estimation models can serve as an example to improve OSS maintenance eort estimation. For example, Nguyen [77] developed an extension to 39 COCOMO II [19] size and eort estimation models to capture various characteristics of software maintenance in general through a number of enhancements to the COCOMO II models to support the cost estimation of software maintenance. Some eort drivers such as DATA (Database size), CPLX (Product Complexity), and PVOL (Platform Volatility) in his study might contribute greatly to OSS maintenance eort estimation. Although most OSS projects rely on task or issue tracking systems to maintain the projects, recognizing the time of specic maintenance tasks can provide better decision support for task assignment as well as OSS project management, a large amount of studies are devoted to predicting bug xing time, a small amount focused on other activities such as code review and duplication identication while none of these study systematically analyze software maintainability from the bugs. 
3.5 A Proposed Software Maintenance Readiness Framework Various concepts such as Technology Readiness Levels (TRLs) [34], Manufacturing Readiness Levels (MRLs) [94], and System Readiness Levels (SRLs) [95] have been highly useful in improving the readiness of systems to be elded and operated. However, the current SRL content does not address system maintainability readiness. Except for one Systems Readiness Level table indicating that for Operations and Support, State of the Art systems have high training cost and lack of support. State of the Practice systems have training and support readily available, and State of the Obsolescence systems have high cost of maintenance and increased training cost. Given the discussions above on the Software Process Foresight Shortfalls and the Major Shortfall Categories, it appears worthwhile to develop and use a similar Software Maintainability Readiness Framework (SMRF) [15] to improve future systems' continuing operational readiness and Total 40 Cost of Ownership (TCO). Most likely, its content would also help on hardware-intensive systems or cyber-physical-human systems. Table 3.12 provides a proposed SMRF. Its columns are organized around the three major maintainability readiness shortfall categories of Life Cycle Management, Maintenance Personnel, and Maintenance MPTs. In general one would expect a major defense acquisition project to be at SMRF 4 at its Materiel Development Decision milestone; at SMRF 5 at its Milestone A; SMRF 6 at its Milestone B; and SMRF 7 at its completion of Operational Test and Evaluation. Smaller, less- critical systems would be expected to be at least at SMRF 3 at its Materiel Development Decision milestone and at SMRF 4 at its Milestone A. Note that the SMRF framework emphasizes outcome- based maintenance incentives such as with Performance-Based Logistics or Vested Outsourcing [113] at SMRF 7, and maintainability data collection and analysis (DC&A) at SMRF 8. Over-extreme forms of agile development have had diculties with scalability as in [41], with security-critical and safety-critical systems, and with bridging incompatible infrastructures in multi-institution medical and crisis management systems. Some organizations have had signicant successes in developing and evolving complex systems with DevOps and Continuous Delivery approaches, but generally with very highly skilled teams and enterprise-controlled interfaces and infrastructure. Chen's paper [29], \Continuous Delivery: Huge Benets, but Challenges Too," is a good summary of the benets and challenges. Table 3.12: Software-Intensive Systems Maintainability Readiness Levels Software-Intensive Systems Maintainability Readiness Levels SMR Level OpCon, Contracting: Missions, Scenarios, Resources, Incentives Personnel Capabilities and Participation Enabling Methods, Processes, and Tools (MTPs) 9 5 years of successful mainte- nance operations, including outcome based incentives, adaptation to new technolo- gies, missions, and stake- holders In addition, creating incen- tives for continuing eec- tive maintainability. Per- formance on long-duration projects Evidence of improvements in innovative O&M MPTs based on ongoing O&M ex- perience Continued on next page 41 Table 3.12 { continued from previous page SMR Level OpCon, Contracting: Missions, Scenarios, Resources, Incentives Personnel Capabilities and Participation Enabling Methods, Processes, and Tools (MTPs) 8 One year of successful main- tenance operations, includ- ing outcome based incen- tives, renements of OpCon. 
Initial insights from main- tenance data collection and analysis (DC&A) Stimulating and applying People CMM Level 5 main- tainability practices in con- tinuous improvement and innovation in, e.g., smart systems, use of multicore processors, and 3-D print- ing Evidence of MPT improve- ments based on maintenance DC&A based ongoing re- nement, and extensions of ongoing evaluation, initial O&M MPTs. 7 System passes Maintainabil- ity Readiness Review with evidence of viable OpCon, Contracting, Logistics, Re- sources, Incentives, person- nel capabilities, enabling MPTs, outcome-based in- centives Achieving advanced People CMM Level 4 maintainabil- ity capabilities such as em- powered work groups, men- toring, quantitative per- formance management and competency based assets Advanced, integrated, tested, and exercised full- LC MBS&SE MPTs and Maintainability other-SQ tradespace analysis 6 Mostly-elaborated main- tainability OpCon, with roles, responsibilities, work- ows, logistics management plans with budgets, sched- ules, resources, stang, infrastructure and enabling MPT choices, V&V and review procedures. Achieving basic People CMM levels 2 and 3 main- tainability practices such as maintainability work envi- ronment, competency and career development, and performance management especially in such key areas such as V&V, identication & reduction of technical debt. Advanced, integrated, tested full-LC Model- Based Software & Systems (MBS&SE) MPTs and Maintainability-other-SQ tradespace analysis tools identied for use, and be- ing individually used and integrated. 5 Convergence, involvement of main maintainability success-critical stakehold- ers. Some maintainability use cases dened. Rough maintainability OpCon, other SCSHs, stang, re- source estimates. Prepara- tion for NDI and outsource selections. In addition, indepen- dent maintainability experts participate in project evidence-based decision,reviews, identify potential maintainability con icts with other SQs Advanced full-lifecycle (full-LC) O&M MPTs and SW/SE MPTs identied for use. Basic MPTs for tradespace analysis among maintainability & other SQs, including TCO being used. 4 Artifacts focused on mis- sions. Primary mainte- nance options determined, Early involvement of main- tainability SCSHs in elabo- rating and evaluating main- tenance options. Critical mass of maintain- ability SysEs with mission SysE capability, coverage of full M-SysE skills ar- eas, representation of main- tainability success-critical- stakeholder organizations. Advanced O&M MPT ca- pabilities identied for use: Model-Based SW/SE, TCO analysis support. Ba- sic O&M MPT capabilities for modication, repair and V&V: some initial use. Continued on next page 42 Table 3.12 { continued from previous page SMR Level OpCon, Contracting: Missions, Scenarios, Resources, Incentives Personnel Capabilities and Participation Enabling Methods, Processes, and Tools (MTPs) 3 Elaboration of mission Operational Concept (OpCon), Architectural views, lifecycle cost estima- tion. Key mission, O&M, success-critical stakeholders (SCSHs) identied, some maintainability options explored. O&M success-critical stake- holders provide critical mass of maintainability- capable SysEs Identi- cation of additional. M critical success-critical stakeholders. Basic O&M MPT capabili- ties identied for use, par- ticularly for OpCon, Arch, and Total cost of ownership (TCO) analysis: some ini- tial use. 2 Mission evolution directions and maintainability implica- tions explored. 
Some mis- sion use cases dened, some O&M options explored. Highly maintainability- capable Systems Engineers (SysEs) included in Early SysE team. Initial exploration of O&M MPT options 1 Focus on mission opportuni- ties, needs. Maintainability not yet considered. Awareness of needs for early expertise for maintainabil- ity. concurrent engineering, O&M integration, Life Cy- cle cost estimation Focus on O&M MPT op- tions considered The SMRF has been used on over 10 milestone reviews, generally resulting in improvements in maintainability planning, maintainer participation in project activities and reviews, and identica- tion of methods, processes, and tools needed by maintainers such as for requirements traceability, architecture denition and evolution, conguration management, problem diagnostics, technical debt analysis, and regression testing. At this point, a major company organization is preparing to apply it to its projects. Other evaluation results have included the evaluation of projects having high technical debt in both development and maintenance, and development of parametric models that relate the sources of technical debt to their ultimate magnitude. These include calibration of a model [87] to evaluate the return on investments in maintainability based on data from two TRW projects that did not make the investments and one that did: CCPDS-R, described in [111]. Another 43 corroborative result is the analysis of exponential growth of technical debt due to systems en- gineering underinvestment experienced across the 161 projects involved in the calibration of the COCOMO II model's architecture and Risk Resolution parameter [16]. The Vicious Circle phenomenon was exhibited in major architecture reviews of two large government projects. One project fortunately had two people with maintenance experience on the review team, who were able to provide maintainability recommendations that helped the project avoid signicant maintenance costs. The other maintenance project did not have such people, and its maintenance organization experienced extensive workload growth and an inability to quickly and cost-eectively respond to needed changes. The Conspiracy of Optimism phenomenon has been shown on several projects, in which the systems engineering budget was on the average 40% lower than the systems engineering cost estimated by the Constructive Systems Engineering Cost Model COSYSMO [111]. Summary of the result. Overall, the Software Maintenance Readiness Framework (SMRF), which is based primarily on cumulative improvement of the three acquisition shortfalls that result in increased maintenance costs, can enable development project management to anticipate and prepare for much more cost-eective software maintenance. 3.6 Summary To conclude, many software maintainability metrics have been proposed in the past decades. However, the eective use of accuracy measures for these metrics has not been observed or only been validated in a specic type of software and organization. Through the various analysis I have conducted, the results indicate that some of the popular metrics such as Maintainability Index heavily depends on the structure of the code, thus making it less desirable in measuring overall software maintainability. 
Additionally, human-assessed maintainability metrics such as COCOMO II Software Understandability factors can accurately re ect the actual maintenance 44 eort spent in maintenance tasks, yet it heavily depends on the developer experience and famil- iarity of the software, and could be too expensive to collect throughout software development. Specically in open source software, it is extremely dicult to measure and keep track of software maintenance eort and software maintainability. The most popular metrics used in OSS involve bugs found in the systems and do not systematically describe software maintainability. Further- more, software improvement process such as SMRF provides massive amount of expert knowledge and has found successful when used in larger organizations but it is extremely expensive to be used in organizations with limited resources such as open source ecosystems. 45 Chapter 4 The Research Approach This chapter presents the research approach and framework to derive the set of fuzzy rules to identify the maintainability subgroup SQ concerns expressed in issue summaries. More specically, Section 4.1 explains the software maintainability ontology that is used in this desertion in depth. Section 4.2 describes a preliminary analysis along with its results on products found in Mozilla community, which serves as the foundation and motivation of the propose approach. Section 4.3 provides an overview of fuzzy logic and how to form fuzzy rules in general. The denitions of linguistic patterns are elaborated in Section 4.4. Section 4.5 describes the modeling process in detail. The techniques used to generate fuzzy rules are explained in Section 4.6. Section 4.7 includes the evaluation strategies and measurement metrics. 4.1 Software Maintainability Ontology Boehm, et. al [18][16] provided an IDEF5 class hierarchy of upper-level SQs, where the top level re ected classes of stakeholder value propositions (Mission Eectiveness, Resource Utilization, Dependability, Flexibility), and the next level identied means-ends enablers of the higher-level SQs. In pursuing the various contributions and types of maintainability as a critical but complex SQ, various maintainability relationships are elaborated. Table 4.1 shows the overall importance 46 of a software system's maintainability. The ubiquity of maintainability is seen by noting the maintainability means in bold italics. Table 4.1: Upper Levels of SERC Stakeholder Value-Based SQ Means-Ends Hierarchy Stakeholder Value-Based SQ Ends Contributing SQ Means Mission Eectiveness Stakeholders-satisfactory balance of Physical Capability, Cyber Capability, Human Usability, Speed, Endurability, Maneuverability, Accuracy, Impact, Scalability, Versatility, Interoperability, Domain-Specic Objectives Life Cycle Eciency Development and Maintenance Cost, Duration, Key Personnel, Other Scarce Resources; Manufacturability, Sustainability Dependability Reliability, Maintainability, Availability, Survivability, Robustness, Graceful Degradation, Security, Safety Changeability Maintainability, Modiability, Repairability, Adaptability Composite QAs Aordability Mission Eectiveness, Life Cycle Eciency Resilience Dependability, Changeability The hierarchy denes maintainability as an external support to the SQ changeability [43]. Maintainability depends on two alternative SQs, repairability and modiability which handle defects and changes respectively. These SQs are further enabled by several subgroups. 
The proposed approach focuses on maintainability in the context of these mean-ends SQs as shown in Figure 4.1. Repairability [16] involves handling of defects in software. It is enabled by the following SQs: Diagnosability: Diagnosability is the quality of being diagnosable, which is the property of a partially ob- servable system with a given set of possible faults, that these faults can be detected with certainty with a nite observation [73]. Issues that aect this SQ involve problems with lack of logging and diagnosability management, faulty error messages and the process of trac- ing where they originate, failure of tests, and insucient information provided for accurate assessments [45][126][65]. 47 Figure 4.1: Software Maintainability Ontology Hierarchy Accessibility: In general, software accessibility [56] describes the ability of a software system to accommo- date people with special needs when using the system. This requires a software system to be suitable for most of the potential users without any modications and easily adaptable to dierent users such as with adaptable or customized user interfaces [46]. In this dissertation, accessibility is dened as the quality of being available and reachable, which involves whether the intended areas of a software system can be accessed as desired. The JCIDS manual [70] denes the Accessibility of Architectures as the ability to grant access to authorized users in a timely fashion in order to "support architecture-based analysis and decision making processes." Issues that aect this SQ prevent authorized users from accessing data or functions due to things such as redirects to unintended locations, broken links to intended areas, and incorrect user permission and authorization. 48 Restorability: Restorability describes the ability of a software system to restore to a previous state [14][13]. Issues that aect this SQ including activities such as clearing of caches, refreshing settings, proper removal of data and backups of the current system. Modiability [83] involves handling of software changes. It is enabled by the following SQs: Understandability: Software understandability not only describes the understanding of the source code, in- cluding code readability and code complexity [24][20][33], but also includes non-source code software artifacts generated during the development process, such as documentations, is- sue summaries, development emails and commit messages [64][19]. Moreover, it closely ties to developers background [24], such as experience and familiarity with the code base. To developers, software understandability is dened as the complexity and readability of the source code [96]; while to users, software understandability is dened as the capability of the software product to enable the user to understand whether the software is suitable, and how it can be used for particular tasks and conditions of use [53]. Boehm dened software understandability as \a characteristic of software quality which means ease of understanding software systems" [19]. In his model, understandability is placed as a factor of software maintenance. Although developers of the original software system usually maintain it, they may be transferred, or change their jobs or retire. Software maintainers need to understand and change the existing code base for enhancing functions, correcting faults, or adapting it to new circumstances. 
Issues that aect this SQ involve activities such as system enhancement, lack of explanations and comments, confusing or inaccurate descriptions, presence of deprecated software and more. 49 Modularity: Modularity involves separation of code into modules. It indicates the degree to which a systems components are made up of relatively independent components or parts which can be combined [104][91]. Issues that aect this SQ involve unwanted interactions between dierent modules and separation of one module into multiple modules. Scalability: Scalability is the ability of a system to continue to meet its response time or throughput objectives as the demand for the software functions increases [102][21][40]. Issues that aect this SQ involve latency in functionality, hangs, and insucient resources for functionality to scale up or down. Portability: Portability refers to the ability of a software unit to be ported to a given environment and being independent of hardware, OS, middle-ware, and databases [76][17][107]. Issues that aect this SQ prevent proper interfacing between software components and external platforms. 4.2 Preliminary Analysis on Mozilla Community An empirical study on 11 products found in Mozilla community shows the potential for using bug reports to assess software maintainability at the system level using a software maintainability ontology [26]. A \bug" is a synonym of a \fault", which means \an event that occurs when the delivered service deviates from correct service" [8]. Table 4.2 reports the characteristics of each product: (i) the classication of the product, (ii) earliest bug reported, and (iii) the number of sampled bugs. I limit the study subjects to resolved 50 Table 4.2: Characteristics of Mozilla Products Classication Product Earliest bug Sampled bugs # Client Software Cloud Services 2007 617 Data Platform and Tools 2015 158 Firefox for Android 2009 807 Firefox OS 2011 1000 Thunderbird 2000 707 SeaMonkey 1998 875 Server Software Bugzilla 1998 841 Socorro 2007 448 Webtools 1999 308 Testopia 2006 130 Marketplace 2011 481 and xed bugs since unxed bugs may be invalid and the root causes cannot be identied through bug reports and follow-up discussions. To sum up, 61790 bugs were mined and 6213 bugs were sampled and analyzed. All the bug reports of the products are hosted on Bugzilla 1 . Through a closely controlled manual tagging process, each bug was tagged with one or more maintainability subgroup SQs found in [18] with its root cause. For any bugs that express multiple quality issues, I took the rst identied SQ tag as its expressing concern. A series of analysis was conducted to study how maintainability evolve as the software evolves, the root causes, the impact and dependency relationship of each subgroup SQ. The results validated the possibility of designing automatic ways to map bugs to maintainability subgroup SQ issues. A manual inspection on these bugs was then conducted to study and identify possible pat- terns 2 . During this process, I noticed that users are likely to use recurrent linguistic patterns in their sentences when reporting bugs or requesting new features. For example, one common pat- tern observed during the inspection was faulty error. This pattern appeared frequently in issue summaries that express diagnosability concerns, which indicated that the systems output error 1 https://bugzilla.mozilla.org/ 2 Dataset can be found: https://bit.ly/2WhtWJx 51 messages are inaccurate or insucient for diagnosing problems correctly. 
Here are some bugs that are expressed in this pattern: Incorrect error code return values. Searching on Null (" ") does not report an error message. Empty error messages from JS Console. Invalid error message for installing updates B2G Desktop/Mulet stating 2G network Updates error handling can fail. B2G [Bluetooth] Failing to pair devices results in redundant error message. If a bug contains error and the direct descriptive words are negative (whether an adjective or an action), then it is more likely that the bug is expressing a diagnosability concern. Additionally, some observed linguistic patterns match the denitions of certain subgroup SQs found in quality standards and practice guidelines. For example, software understandability has a denition as the capability of the software product that enables the user to understand whether the software is suitable, and how it can be used for particular tasks and conditions of use through proper documentation and user guides. One common pattern found in bugs that express under- standability concerns was the keyword "documentation". Below are some bugs that are expressed in this pattern: Documentation does not mention you need to copy /media/js/app/local-settings.js-dist to /media/js/app/local-settings.js. Update searchfox docs/ documentation. Document the signicance and usage of "popcorn" events. Fix the lack of documentation for MediaDB. 52 If a bug contains documentation, then it is more likely that the bug is expressing a understand- ability concern [28]. Since linguistic variables are central to fuzzy logic and the if-then formats match the denition of fuzzy rules, fuzzy rules are chosen to model the relationships between linguistic patterns and the classication results. 4.3 Overview of Fuzzy Sets and Linguistic Patterns This section summarizes the basic ideas of fuzzy logic and fuzzy rules. This overview does not cover all aspects of this topic, however, the goal is to introduce the fuzzy concepts that are used in this dissertation. 4.3.1 Fuzzy Sets and Fuzzy Reasoning A classical crisp set is dened by having crisp boundaries, which means each crisp set is a collection of distinct objects. Georg Cantor rst came up with the idea of crisp set theory [62], which describes a way to divide a given universe of discourse into two group: members and nonmembers. A crisp set can be dened by the "characteristic function". Let U be a universe of discourse. The characteristic function A(x) of a crisp set A in U is dened as: A(x) = 8 > > > < > > > : 1; i x2A: 0; otherwise: (4.1) Zadeh introduced fuzzy sets [127], where a more exible sense of membership is possible. In fuzzy sets, many degrees of membership are allowed. The degree of membership to a set is indicated by a number between 0 and 1. Hence, fuzzy sets may be viewed as an extension and generalization of the basic concepts of crisp sets. A fuzzy set A in the universe of discourse U can be dened as a set of ordered pairs, 53 A = (x;(x))jx2U where is called the membership function of A and (x) is the degree of membership of x in A, which indicates the degree that x belongs to A. The membership function maps U to the membership space M, that is A : U ! M. When M = 0; 1, set A is non-fuzzy and is the characteristic function of the crisp set A. For fuzzy set, the range of the membership function is a subset of the non-negative real numbers. In most general cases, M is set to the unit interval [0; 1]. 
4.3.2 Fuzzy Systems Fuzzy rules and fuzzy reasoning are essential to any fuzzy systems, which are also the most important modeling tool based on fuzzy set theory. They have been applied to a wide range of real- world software problems, such as pattern recognition, and data classication [115][30][4][22][99]. Fuzzy if-then rules, also known as fuzzy conditional statements, are expressions of the following form: If x is A, then y is B where A and B are linguistic labels dened by fuzzy sets on universe of discourse X and Y , respectively. Often x is A is called the antecedent or premise, while y is B is called the consequence or conclusion. Due to their concise form, fuzzy if-then rules are often used to capture the imprecise modes of reasoning and play an essential role in the human ability to make decisions in an environment of uncertainty and imprecision. From another angle, due to the qualiers on the premise parts, each fuzzy if-then rule can be viewed as a local description of the system under consideration. Fuzzy reasoning, also known as approximate reasoning, is an inference procedure that derives conclusions from a set of fuzzy if-then rules and known facts. 54 A wide variety of fuzzy systems can be found in literature. The fuzzy system [103] introduces a popular computing framework based on the concepts of fuzzy set theory, fuzzy if-then rules, and fuzzy reasoning. It has found successful applications in a wide variety of elds, such as automatic control, data classication, decision analysis, expert systems, robotics, and pattern recognition. The basic structure of a fuzzy inference system consists the following functional components: Rule base: contains a selection of fuzzy rules. Data base: denes the membership functions used in the fuzzy rules. Reasoning mechanism: performs the inference procedure upon the rules and given facts to derive a reasonable conclusion. Fuzzication interface: transforms the crisp inputs into degrees of match with linguistic values. De-fuzzication interface: transforms the fuzzy results of the inference into a crisp output. The following are the steps of fuzzy reasoning: Fuzzication Step: During this step, the input variables are compared with the mem- bership functions on the antecedent part, in order to obtain the membership values of each linguistic pattern. Combination Step: During this step, the membership values are combined on the premise part to get weight of each rule. Generation Step: During this step, the qualied consequent (either fuzzy or crisp) of each rule are generated depending on the weight. De-fuzzication Step: During this step, the qualied consequent is aggregated to produce a crisp output. 55 The role of implication is seen as a fuzzy generalization of a logical implication: A)B:A_B which translates into following logic: A is true (Premise) + If A then B (Implication)) B is true (Conclusion). 4.3.3 NLP Techniques To generate linguistic patterns that are used to construct fuzzy rules, a number of natural language processing techniques are used. Part-of-speech (POS) taggers takes text as an input, and output the text with parts of speech assigned to each word/term. These are built using dierent types of machine learning methods. POS tagging techniques can be categorized into rule-based and stochastic-based approaches, where tags are generated according to rules and probability models respectively. 
Commonly used su- pervised POS taggers include Unigram Taggers, Hidden Markov Model based taggers, Maximum Entropy based taggers, and Transformation based taggers [108] The Stanford POS tagger used herein builds upon the Maximum Entropy based tagger [110], and further incorporates preceding and following tag contexts, use of lexical features, use of priors in conditional loglinear models, and modeling of unknown word features [109]. These tagged texts are then used by other tools for tasks such as Named-Entity Recognition and Information Extraction in general [93] Stanford Dependencies provides a representation for grammatical relationships for words in a sentence, originally designed for natural language understanding applications [37] Universal Dependencies is a framework that expanded upon the Stanford Dependencies and is meant to include grammatical relations across languages [97]. These have been continually worked on since then and are documented here 3 . There they describe various syntactic relations; the following have been selected for denition as they are used in this approach. 3 https://universaldependencies.org/ 56 nsubj(nominal subject): this describes the subject of a clause and identies the \do-er, or the proto-agent". cop(copula): this is \the relation of a function word used to link a subject to a nonverbal predicate". csubj(clausal subject): this describes the case in which the subject of a clause is a clause in of itself. amod(adjectival modier): this is \any adjectival phrase that serves to modify the meaning of the noun". obj(object): this normally describes the \entity acted upon or which undergoes a change of state or motion". iobj(indirect object): this describes \any nominal phrase that is a core argument of the verb but is not its subject or (direct) object". dobj(direct object): this describes \any nominal phrase that is a core argument of the verb but is its subject or (direct) object". 4.4 Denition of Heuristic Linguistic Patterns Given an issue summary, its linguistic patterns can be heuristically identied from the following three levels: Syntax Patterns: Syntax patterns are dened as the specic sentence structures that frequently appear in issue summaries that express a certain SQ concern. The sequence of Part-of-speech (POS) tags and sentence dependency are used to identify these patterns [38][71]. Part of the tagset dened by Ye et al. [122] is adopted in this dissertation as shown in Table 4.3. 57 Table 4.3: POS Tags and Corresponding Tagged Types POS Tag Tagged Type <O> Pronoun <N> Common Noun <V> Verb <^> Proper Noun <A> Adjective <D> Determiner <R> Adverb <!> Interjection <P> Subordinating conjunction, pre(post)position <T> Verb Particle <X> Existential There, Predeterminers <&> Coordinating Conjunction <$> Numeral <G> Miscellaneousness, Abbreviation, Garbage nsubj, cop, and csubj [37] are analyzed to identify the subject and the action of a given issue summary. For example, Rule #13 in Appendix I has a syntax pattern of \fstart vb = 1g", which means that the rst word in the issue summary is a verb. Rule #11faction \clarify"=1g means that the action of the issue summary is \clarify" and the issue is describing a under- standability related concern. Lexical Patterns: Lexical patterns are dened as words or phrases that frequently appear in issue summaries that express a certain SQ concern. In order to identify these lexical patterns, I construct: 1). 
A set of keywords per SQ that contains the most frequent words appearing in issue summaries with certain SQ tags; and 2). a set of keywords contains a list of negative words [51]. For example, issue summaries containingf\documentation", \guide", or \user manual"g are more likely to be classied into tag Understandability. 58 Figure 4.2: An Example of the Universal Dependencies Semantic Patterns: Semantic patterns are dened as the meaning of linguistic expressions that frequently appear in issue summaries that express a certain SQ concern. More specically, I am interested in the patterns of meaningfulness that I nd in words or phrases. Take the issue summary \datasourcerealm should provide additional info on SQLException." as an example (shown in Figure 4.2). We can observe that the subject of this issue summary is \datasourcerealm" and the direct action links to the subject is \provide". The item it is \providing" is \info", which has an adjective \additional" to describe more. Thus, the semantic pattern of this summary is give more, which means that the issue describes providing more of something. Since regular POS taggers do not incorporate software-specic terms, in order to improve the results, the following steps are implemented to maintain the natural language state of the issue summaries: 1. I replace any source code found in issue summaries with<CODE>. Developers usually fol- low certain patterns when naming code terms such as CamelCase, C Notation, etc. There- fore, code terms can be identied through recognizing these patterns, shown in Table 4.4. Table 4.4: Code Term Patterns Type Example Pattern C Notation OPT DRIVER INFO [A-Za-z]+[0-9]* .* Qualied Name options.addOption [A-Za-z]+[0-9]*[n.].+ CamelCase OptionValidator [A-Za-z]+.*[A-Z]+.* Uppercase XOR [A-Z0-9]+ System Variable cmdline +[A-Za-z0-9]+.+ Reference Expression std::env [a-zA-Z]+[:]f2,g.+ 59 2. I replace any le path found in issue summaries with <FP>. File path usually contains a set of \/". 3. I replace any trivial words such as hello, thank you, regards, look forward, etc. that do not contribute to the content with <T>. 4. I replace any version number with <VERSION>. Version numbers usually follow the semantic versioning convention [78]. There are three parts in a version number: major version; minor version; and patch, separated by a period (.). 5. Some issue summaries come with content inside of a set of [ ] or ( ) at the beginning of the issue. I extract the word/phrase between the brackets. 6. I replace any component names found in the issue summary with <COMP>. For example, Eclipse has a component called Update. For any issues found within the Update component, I replace "Update" with the tag<COMP>to dierentiate from the action \update" to avoid false tagging these issues with understandability tags. 4.5 The Modeling Process The process of the proposed approach is consists of three phases, shown in Figure 4.3. Since maintenance tasks involve xing bugs and implementing new features, I use the term \issues" to represent both bugs and feature requests of a system and \issue summary" to represent the description of an issue. A \bug" is a synonym of a \fault", which means \an event that occurs when the delivered service deviates from correct service". A \feature request" evokes the idea of something new and not yet in the current system. The rst phase is the pre-processing phase. During this phase, a large set of randomly sampled issue summaries from various open source ecosystems is selected. 
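As a minimal sketch of the issue-summary normalization described above in Section 4.4, the following Python snippet replaces file paths, code terms, and version numbers with placeholder tokens; the regular expressions are simplified approximations of the patterns in Table 4.4, not the exact ones used in this dissertation.

```python
# Minimal sketch (simplified regexes, not the exact Table 4.4 patterns) of the
# issue-summary normalization applied before linguistic pattern extraction.
import re

def normalize(summary):
    summary = re.sub(r"(/[\w.-]+){2,}", "<FP>", summary)                    # file paths
    summary = re.sub(r"\b[A-Za-z_]+\.[A-Za-z_][\w.]*", "<CODE>", summary)   # qualified names
    summary = re.sub(r"\b[A-Za-z]*[a-z][A-Z]\w*\b|\b[A-Z]+_[A-Z_]+\b",
                     "<CODE>", summary)                                     # CamelCase / C notation
    summary = re.sub(r"\b\d+\.\d+(\.\d+)?\b", "<VERSION>", summary)         # version numbers
    return summary

print(normalize("DataSourceRealm crashes in /media/js/app/local-settings.js on 2.1.3"))
# -> "<CODE> crashes in <FP> on <VERSION>"
```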
The data is tagged with one or more SQ concerns through a manual tagging process based on the ontology described in Section 4.1. As a result, a set of issue-quality pairs is generated as the ground-truth dataset (shown in Figure 4.4).

Figure 4.3: The Overall Process of the Proposed Approach

The second phase is the fuzzy rule generation phase. First, a set of initial rules for each SQ is generated from standards and practice guidelines and verified with a priori probability by experts in software quality. Then the ground-truth dataset is incrementally analyzed to discover new rules until the performance of the rule sets is stable. The same process is repeated for each SQ. The third phase focuses on answering the proposed research questions and evaluating the generated rule sets on data that were not used in generating the rules, to validate the generalizability of the approach.

4.6 Fuzzy Rule Generation

This section elaborates further on the modeling process. In this dissertation, I adopt the approach presented in [116], where the idea is to learn rules from a set of examples by collecting data samples into "fuzzy hyperboxes". If there are enough samples in one box, then a rule is formed. Specifically, each rule is considered as one "fuzzy hyperbox", and the rule weight is used to determine whether there exist enough samples for each rule.

Figure 4.4: An Example of the Issue-quality Pair Generated from the First Phase

The general idea of a single rule can be expressed in the following IF-THEN formula:

Rule $R_k$: IF $x_1$ is $A_{k1}$ AND $\ldots$ AND $x_n$ is $A_{kn}$, THEN class $= C_k$ with $CF_k$   (4.2)

where $R_k$ is the $k$-th fuzzy rule, $x_1, \ldots, x_n$ are the linguistic variables, $A_{ki}$ are the values used to represent the linguistic variables, $C_k$ is a consequent class, and $CF_k$ is a rule weight. Moreover, an issue summary could satisfy more than one rule. The rule with the maximum $CF_k$ is considered its final rule. For example, given the issue "Remove R$ as alternative currency from all non-portuguese-BR keyboard layout", it satisfies Rule 12, Rule 13 and Rule 23 for the classification of Understandability. Since Rule 13 is the more dominant rule with a higher rule weight, this issue is tagged with Understandability and Rule 13 is its determining rule for the classification result.

The process of fuzzy rule generation is divided into two parts: the initial phase and the incremental selection phase. In the initial phase, I aim to generate a set of initial fuzzy rules by heuristically identifying linguistic patterns from definitions and practice guidelines. In the incremental selection phase, I aim to improve the initial fuzzy rules incrementally and determine whether the existing rule sets should be updated.

4.6.1 Initial Phase

I first start with the set of initial rules {R_n} identified from the standards and practice guidelines mentioned in Section 4.1. For each subgroup SQ, I extract lexical, syntax and/or semantic patterns as introduced in Section 4.4. These linguistic patterns work as the antecedent $A_{k1} \ldots A_{kn}$ for a rule $r$, and the $C_k$ of rule $r$ is the subgroup SQ concern that the issue expresses. Table 4.5 lists some examples of the initial rules generated for some subgroup SQs.
Table 4.5: Examples of Some Initial Rules

Classification       Rule (Linguistic Patterns)                          Type
Understandability    {documentation, guide, user manual} = 1             Lexical
Accessibility        {action "permit"} = 1                               Lexical, Syntax
Modularity           {module} = 1 AND {could, need, should} = 1          Lexical
Diagnosability       {test} = 1 AND {neg} = 1                            Lexical, Syntax

The output of the initial phase is a set of initial fuzzy rules {R_n} for each subgroup SQ.

4.6.2 Incremental Selection Phase

In this phase, I aim to improve the initial fuzzy rule set {R_n} by incrementally classifying new issue summaries and determining whether the existing rule set should be updated. First, I assemble the dataset {D_i} by adding a new issue summary during each iteration. Then I classify {D_i} by applying the fuzzy rules {R_{i-1}}, the rule set generated from the previous iteration; {R_{i-1}} starts with the rules from the initial rule set {R_n}. By comparing against the tagged results of {D_i}, I obtain the correct and incorrect classification results. For the correctly classified results, I identify the corresponding rule and increase its weight. For the incorrectly classified results, I extract the linguistic patterns and determine whether the discovered pattern should be included as a new rule.

In order to determine whether a rule should be added, I use the correctness measurement explained in Section 4.7 as the determining factor. More specifically, average accuracy and F-measure are used as the correctness measurement. When any change needs to be applied to the existing rule set, the change is only accepted if the correctness with the change is better than the current correctness of the rule set. For example, during iteration 49, when introducing a new rule r' to the rule set {R_48}, r' can only be inserted if the correctness of {R_48} + r' is higher than that of {R_48}. This step is repeated until the performance of the current rule set is stable, meaning either the correctness does not increase when adding new rules or no new rules are being discovered. Algorithms 2 and 3 explain the procedures to insert a new rule and update an existing rule. As a result, a set of updated and final fuzzy rules is obtained. Appendix A lists the complete rule set for each subgroup SQ.

Algorithm 2 Procedure of Inserting a New Rule
1: procedure Insert(Rule newRule, List currentRuleSet)
2:     currentCorrectness <- Correctness(currentRuleSet)
3:     newRuleSet <- currentRuleSet.add(newRule)
4:     newCorrectness <- Correctness(newRuleSet)
5:     if (newCorrectness > currentCorrectness) then
6:         return newRuleSet
7:     else
8:         return currentRuleSet

Algorithm 3 Procedure of Updating an Existing Rule
1: procedure Update(List ruleSet, String issue)
2:     identifiedRule <- ∅
3:     if ruleSet.classify(issue) = true then
4:         r.weight += 1 | r in ruleSet where r -> issue
5:         return ruleSet
6:     else
7:         identifiedRule <- issue.identifyRules()
8:         Try:
9:             for each Rule r' in identifiedRule do
10:                Insert(r', ruleSet)

4.7 Evaluation Strategies

Rule quality is the most important evaluation criterion for rule extraction models, and it is often measured through the accuracy of the model. The accuracy of the extracted rules describes their ability to correctly classify data that was not used for training the model, which can serve as a measure of the generalizability of the extracted rules.
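Before defining the metrics, the following minimal Python sketch illustrates how such a correctness measurement gates rule insertion in the spirit of Algorithm 2; the keyword-based rule representation and the `ground_truth` pairs are simplified, hypothetical stand-ins rather than the dissertation's actual implementation.

```python
# Minimal sketch (not the dissertation's implementation) of correctness-gated rule
# insertion. Rules are simplified to weighted keyword sets; ground_truth is a
# hypothetical list of (issue_summary, sq_tag) pairs.
from dataclasses import dataclass

@dataclass
class Rule:
    keywords: set          # lexical pattern: all keywords must appear in the summary
    sq: str                # consequent class C_k (subgroup SQ)
    weight: float = 1.0    # rule weight CF_k

def classify(issue, rules):
    # Among all matching rules, the one with the highest weight determines the tag.
    matches = [r for r in rules if r.keywords <= set(issue.lower().split())]
    return max(matches, key=lambda r: r.weight).sq if matches else None

def correctness(rules, ground_truth):
    # Plain accuracy is used here; the dissertation also uses F-measure.
    hits = sum(1 for issue, tag in ground_truth if classify(issue, rules) == tag)
    return hits / len(ground_truth)

def insert(new_rule, rule_set, ground_truth):
    # Algorithm 2: accept the new rule only if it improves correctness.
    if correctness(rule_set + [new_rule], ground_truth) > correctness(rule_set, ground_truth):
        return rule_set + [new_rule]
    return rule_set
```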
Since one fuzzy rule can be viewed as a specific kind of association rule [6] of the form A_k -> C_k, in this dissertation I use accuracy, precision, recall, and F-measure as the metrics to evaluate the performance of the proposed approach. These metrics are commonly used to measure the correctness of classification results in machine learning models. Accuracy is the percentage of correctly tagged issues in the entire dataset. Precision indicates the proportion of issues tagged with an SQ concern that are tagged correctly. Recall represents the coverage of correctly identified issues by the proposed approach. F-measure is the harmonic mean of precision and recall.

Accuracy = (TP + TN) / (TP + FP + FN + TN)    (4.3)
Precision = TP / (TP + FP)    (4.4)
Recall = TP / (TP + FN)    (4.5)
F-Measure = (2 * Precision * Recall) / (Precision + Recall)    (4.6)

where TP represents True Positives, FP represents False Positives, FN represents False Negatives, and TN represents True Negatives.

Chapter 5
Experimental Background

This chapter describes the setup of the experiment used to answer the research questions proposed in Chapter 1. The study subjects are listed in Section 5.1, and how the data is collected is described in Section 5.2.

5.1 Study Subjects

The subject projects are selected from three diverse and popular open source ecosystems: Apache, Eclipse, and Mozilla. Various products are used to answer different research questions. The choice of these ecosystems is not random, but rather driven by the motivation to consider ecosystems having (i) products belonging to different domains, e.g., a mix of client applications and server software; (ii) products with different architectures, e.g., Android mobile apps, development tools, and libraries; (iii) products with different durations, e.g., some products have been around for more than ten years whereas others have been around for less than three years; and (iv) a large number of active contributors.

Table 5.1 reports the characteristics of each chosen product: (i) product name, (ii) the ecosystem it belongs to, (iii) the classification of the product, (iv) the earliest issue reported, and (v) the number of resolved and fixed issues as of December 31st, 2018. In total, 22 products are analyzed in this dissertation.

Table 5.1: Characteristics of the Study Subjects
Product | Ecosystem | Classification | Earliest Issue Reported | # of Issues
Log4j | Apache | Logging Tool | 2001 | 532
Lenya | Apache | Content Management System | 2011 | 839
Taglibs | Apache | Library | 2004 | 264
Rhino | Mozilla | Engine | 1999 | 498
Instantbird | Mozilla | Messaging Client Application | 2013 | 751
Testopia | Mozilla | Test Management | 2006 | 431
JMeter | Apache | Load Testing | 1998 | 2316
Xerces-J | Apache | Library | 2001 | 97
Tomcat 7 | Apache | Web Server | 2001 | 988
Cloud Services | Mozilla | Services | 2007 | 4592
Thunderbird | Mozilla | Email Application | 2000 | 7814
Data Platform and Tools | Mozilla | Services | 2015 | 1179
Firefox for Android | Mozilla | Web Browser | 2009 | 8105
Firefox OS | Mozilla | Web Browser | 2011 | 10000
SeaMonkey | Mozilla | Web Browser, Email Client Application | 1998 | 9209
Bugzilla | Mozilla | Bug Tracking System | 1998 | 8431
Socorro | Mozilla | Crash Stats Application | 2007 | 4940
Webtools | Mozilla | Services | 1999 | 3153
Marketplace | Mozilla | Services | 2011 | 4812
Eclipse | Eclipse | Programming Tool | 2001 | 2000
Mylyn | Eclipse | Task Management | 2005 | 2000
EE4J | Eclipse | Programming Tool | 2007 | 1998

5.2 Data Extraction and Analysis

First, I download the resolved and fixed issues of each product from its issue tracking system. The title, the description, and other related information of each retrieved issue are exported as one issue summary.
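As an illustration of this extraction step for the Mozilla products, the sketch below queries a Bugzilla instance through its REST API. The endpoint, query parameters and field names are assumptions based on the standard Bugzilla 5.x REST documentation and should be verified against the specific tracker; paging and authentication are omitted.

import requests

BASE = "https://bugzilla.mozilla.org/rest/bug"   # assumption: standard Bugzilla REST endpoint

def fetch_fixed_issues(product, limit=500, offset=0):
    params = {
        "product": product,
        "status": "RESOLVED",
        "resolution": "FIXED",
        "include_fields": "id,summary,creation_time,severity",
        "limit": limit,
        "offset": offset,
    }
    resp = requests.get(BASE, params=params, timeout=60)
    resp.raise_for_status()
    return resp.json().get("bugs", [])

for bug in fetch_fixed_issues("Thunderbird", limit=10):
    # each title plus description is later exported as one issue summary
    print(bug["id"], bug["summary"])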
I limit the study to resolved and fixed issues only, since unfixed issues may be invalid and their subgroup SQ concerns cannot be identified through issue summaries and follow-up discussions. After obtaining a set of qualified issue summaries as the subject data, I aim to tag each sentence with one or more appropriate SQ concerns. The tagging process is closely monitored, and the quality of the results is ensured by a group of experts. The group consists of four experts in software quality ontology, five members of the Mozilla community, and three senior software engineers from a local startup company. All of them have either done intensive research work in software maintainability or have been actively contributing to one or more products in one or more open source communities. Due to the intensive and expensive labor required for manual tagging, we randomly selected and manually tagged nine projects. All tagging results are reviewed and confirmed by at least five team members. Regular discussion sessions are hosted to vote on issue summaries that received different tagging results.

Table 5.2: Number of Manually Tagged Issues
Product | # of Tagged Issues | Total # of Issues
Log4j | 206 | 532
Lenya | 337 | 839
Taglibs | 108 | 264
Rhino | 126 | 498
Instantbird | 345 | 751
Testopia | 179 | 431
JMeter | 514 | 2316
Xerces-J | 24 | 97
Tomcat 7 | 359 | 988

In total, we manually examined 6716 issue summaries from nine products and obtained 2198 software-maintainability-related issue summaries (shown in Table 5.2).

To answer RQ1, I gradually add issue summaries found in the nine manually tagged projects, one at a time, to generate fuzzy rules until the rule set is stable. The correctness of each iteration is used as the measurement to determine the convergence of the fuzzy rules, as described in Section 4.6. I first focus on classifying issue summaries from the projects used to generate these rules. Then I apply the final rule set to the remaining projects that were not used in the rule generation to validate the generalizability of the rules.

To answer RQ2, I further analyze the data obtained from the study described in Section 3.3. In order to examine the effectiveness of the proposed approach in reflecting the changes in software maintainability between versions, 4 projects are chosen to answer this RQ. I first compute the correlation between the changes in software maintainability issues and the changes in maintenance effort between versions in the four Apache projects. Then I compare this correlation with the correlation between the changes in other metrics, such as MI and the human-assessed metrics, and the changes in effort between versions, to see which metrics better reflect the changes in maintainability as software evolves, and thus which is better at measuring software maintainability.

To answer RQ3, I use the earliest issue found in each product as the starting point of the product. Then I treat each year as a period and group issues into different periods. Periods are then grouped into early, mid, and late stages.

To answer RQ4, I first group the issues by severity. I use the severity label of each issue found in the issue tracking systems and group the issues into one of five severity groups, in descending order of severity: blocker, critical, major, enhancement, and others. The others group includes issues labeled normal, minor, trivial, and others. I then group the issues by fix time. The fix time of an issue is calculated as the difference in days between the creation of the issue and the closing of the issue; a short sketch of this grouping follows.
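A minimal sketch of the fix-time grouping, assuming hypothetical "opened" and "resolved" fields on each issue record; the field names and the toy data are illustrative only.

from datetime import datetime

def fix_time_days(issue):
    opened = datetime.fromisoformat(issue["opened"])
    resolved = datetime.fromisoformat(issue["resolved"])
    return (resolved - opened).total_seconds() / 86400.0

def tails(issues, fraction=0.10):
    valid = [i for i in issues if fix_time_days(i) >= 0]   # drop issues with inconsistent dates
    valid.sort(key=fix_time_days)
    k = int(len(valid) * fraction)
    return valid[:k], valid[-k:]          # quick-fixed tail, slow-fixed tail

issues = [{"opened": "2018-01-01T00:00:00", "resolved": "2018-01-02T12:00:00"},
          {"opened": "2015-03-10T08:00:00", "resolved": "2018-06-01T08:00:00"}]
quick, slow = tails(issues, fraction=0.5)   # fraction=0.5 only because the toy list has two issues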
Issues with reporting errors, where the open date is after the last resolved date, are excluded. All issue summaries are then sorted by fix time, and I take 10% of both tails to represent quick-fixed and slow-fixed issues. In total, 6453 slow-fixed and 6453 quick-fixed issues are obtained. On average, quick-fixed issues take less than one day to fix (except in Xerces-J), while slow-fixed issues take over 1000 days. Table 5.3 reports the groupings of severity with the corresponding number of issues, the average fix time for the quick-fixed issues, and the average fix time for the slow-fixed issues in each product.

To answer RQ5, I first extract all the issues that depend on other issues. Among those, 6121 issues express maintainability issues, which are considered as the subjects for this RQ. The term "dependent issue" is used to refer to the extracted issue and the term "source issues" to refer to the issues the dependent issue depends on. Figure 5.1 shows the number of dependent issues with various numbers of source issues. The number of source issues varies from 1 to 141. Most of the dependent issues depend on only one source issue. Then I automatically tag the source issues with the proposed approach and generate a quality dependency relationship for each issue with its source issues. Dependency relationships with source issues that require additional permissions to view are excluded from this RQ.

Figure 5.1: Total Number of Dependent Issues with Various Numbers of Source Issues

Figure 5.2 shows several examples of the dependency relationships. Issue 1033889, found in Bugzilla, received the tag "Accessibility". There is only one source issue in this case, found in Firefox OS with the tag "Understandability". Therefore, the quality dependency relationship generated from this example is: "Accessibility" - "Understandability". Figure 5.2b shows an example of an issue with more than one source issue. Issue 1061134, found in Firefox OS, received the tag "Understandability". There are two source issues, both found in the same product, which received "Restorability" and "Accessibility" respectively. In this case, the quality dependency is: "Understandability" - "Restorability, Accessibility".

In order to identify distinct quality relationships, the order of the quality tags received by the source issues does not matter. More specifically, a relationship of "A" - "B, C" is counted as identical to a relationship of "A" - "C, B". In total, I generate 1349 distinct quality relationships from 5872 issues.
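A minimal sketch of how such order-insensitive relationships can be keyed and counted; the tags and counts below are illustrative, not study data, and duplicate source tags are collapsed for simplicity.

from collections import Counter

def relationship_key(dependent_tag, source_tags):
    # frozenset makes "A" - {B, C} and "A" - {C, B} the same relationship
    return (dependent_tag, frozenset(source_tags))

pairs = [
    ("Accessibility", ["Understandability"]),
    ("Understandability", ["Restorability", "Accessibility"]),
    ("Understandability", ["Accessibility", "Restorability"]),   # same relationship, different order
]
counts = Counter(relationship_key(d, s) for d, s in pairs)
for (dependent, sources), n in counts.items():
    print(f'{dependent} - {{{", ".join(sorted(sources))}}}: {n}')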
Figure 5.2: Dependency Relationship Examples: (a) An Example Found in Bugzilla; (b) An Example Found in Firefox OS

Table 5.3: Classification of RQ4 Data
Product | # of Blocker Issues | # of Critical Issues | # of Major Issues | # of Enhancement Issues | # of Others Issues | Quick-fixed Issues (in days) | Slow-fixed Issues (in days)
Log4j | 9 | 12 | 45 | 78 | 388 | 0.11 | 1963.73
Lenya | 57 | 46 | 90 | 152 | 494 | 0.22 | 1225.03
Taglibs | 2 | 8 | 67 | 27 | 160 | 0.80 | 1994.82
Rhino | 1 | 10 | 35 | 74 | 378 | 0.05 | 1481.72
Instantbird | 10 | 7 | 32 | 138 | 564 | 0.001 | 283.07
Testopia | 5 | 25 | 72 | 44 | 285 | 0.17 | 728.87
JMeter | 18 | 21 | 171 | 919 | 1187 | 0.004 | 1300.14
Xerces-J | 6 | 3 | 25 | 1 | 62 | 1090.99 | 1487.52
Tomcat 7 | 12 | 25 | 54 | 199 | 698 | 0.13 | 1413.67
Cloud Services | 47 | 66 | 128 | 50 | 4301 | 0.36 | 789.97
Thunderbird | 139 | 251 | 437 | 460 | 6527 | 0.27 | 2143.54
Data Platform and Tools | 4 | 5 | 9 | 2 | 1158 | 0.34 | 1212.48
Firefox for Android | 40 | 606 | 77 | 58 | 7324 | 1.15 | 1615.79
Firefox OS | 104 | 220 | 296 | 0 | 9380 | 0.45 | 507.61
SeaMonkey | 305 | 399 | 586 | 558 | 7361 | 1.35 | 3161.02
Bugzilla | 602 | 290 | 610 | 2229 | 4700 | 0.39 | 1858.61
Socorro | 54 | 79 | 89 | 49 | 4669 | 0.05 | 1288.95
Webtools | 13 | 36 | 90 | 117 | 2897 | 0.03 | 1453.13
Marketplace | 32 | 50 | 96 | 199 | 4435 | 0.12 | 307.57
Eclipse | 140 | 521 | 461 | 39 | 839 | 0.16 | 929.63
Mylyn | 48 | 69 | 214 | 523 | 1146 | 0.23 | 946.23
EE4J | 48 | 66 | 254 | 48 | 1582 | 0.06 | 984.01
Total | 1695 | 2814 | 3938 | 5964 | 60535 | 49.88 | 1321.69

Chapter 6
Research Results

This chapter reports the analysis of the results aimed at answering the research questions proposed in Section 1.4.

6.1 Results of RQ1

As described in Chapter 4, fuzzy rules are generated by incrementally introducing new issue summaries and new linguistic patterns until the rule set is stable. In order to obtain a set of stable rules, I gradually added a random set of issue summaries from one project at a time to generate fuzzy rules. Each time a new issue summary was added to the dataset, I used the correctness of each iteration as the measurement to determine the convergence of the fuzzy rules. By observing the growth trend of the rules in each iteration and the classification measurements, I assessed and showed the validity of the final fuzzy rule set for each subgroup SQ.

Figure 6.1 shows the number of rules generated in the initial phase and the number of rules in the incremental phase for each subgroup SQ. In total, 99 rules were generated. Figure 6.2, Figure 6.3 and Figure 6.4 illustrate the performance and the growth rate of the number of fuzzy rules generated in each iteration until no more new rules are discovered and the performance is stable.

Figure 6.1: Number of Rules Generated for Each Subgroup SQ

From these figures, we can observe that as the number of rules increases, the F-measure and accuracy also increase. By the time Testopia was introduced to the dataset, the number of rules, along with the F-measure and accuracy of the current rule set, became stable. Therefore, I concluded that the fuzzy rules converged at a high correctness level after introducing issue summaries from Taglibs, Rhino, Instantbird and part of Testopia. These observations show that the fuzzy rule set can reach the state of convergence with relatively high performance. In addition, each subgroup SQ took a different number of iterations to achieve stability, as shown in Figure 6.3 and Figure 6.4.

6.1.1 Rule-generating Projects

As shown in Figure 6.5 and Table 6.1, the stable set of fuzzy rules achieves high performance when applied to the projects from which the rules were generated. The few outliers are due to small numbers of automatically and manually identified subgroup SQs.
For this reason, I removed projects with undefined metrics from the average calculations. For example, recall and F-measure were undefined in Rhino for modularity since there were no true positives or false negatives. With these exceptions, most of the average accuracy, precision, recall, and F-measure values are above 0.8. Such high performance may be explained by the fact that the rules were generated from the sentences in those projects. Therefore, I also evaluated the performance of these fuzzy rules on projects that were not used in generating the rules.

Figure 6.2: Growth of Rules in Each Iteration for Each Subgroup SQ

6.1.2 Non-rule-generating Projects

Figure 6.6 shows the correctness and misclassification of the rules in five other projects that were not used to generate the rule set. The F-measure is considered for all of the subgroup SQs:

Accessibility: Xerces-J, Tomcat, and Lenya have the highest F-measure, while the remaining projects have F-measures above 0.85.
Diagnosability: JMeter, Tomcat, and Log4j have the highest F-measure, while the remaining projects have F-measures above 0.85.
Modularity: Log4j, Xerces-J, and Tomcat have the highest F-measure, while the remaining projects have F-measures above 0.85.
Portability: Xerces-J, JMeter, and Tomcat have the highest F-measure, while the remaining projects have F-measures above 0.85.
Restorability: Log4j, Lenya, and Tomcat have F-measures of 1, while JMeter has an F-measure of 0.93.
Scalability: JMeter, Tomcat, and Log4j have the highest F-measure; Lenya has an F-measure of 0.91.
Understandability: Log4j, Lenya, and Tomcat have the highest F-measure, while the remaining projects have F-measures above 0.9.

Figure 6.3: F-measure of Rules in Each Iteration for Each Subgroup SQ
Figure 6.4: Accuracy of Rules in Each Iteration for Each Subgroup SQ

To summarize the results of RQ1, metrics were averaged across all projects except those with undefined values for a particular SQ. Overall, accessibility showed the best performance in terms of F-measure at 0.95, followed by restorability and scalability at 0.94. All metrics had an average above 0.8, indicating that the rule set performs well in classifying issue summaries for all of the subgroup SQs.

6.2 Results of RQ2

Figures 6.8, 6.9 and 6.10 show the correlations between the maintenance effort spent on fixing issues and the various metrics per version. As we can observe, there is no significant correlation between the average actual maintenance effort and any of the metrics; all the p-values reported for these correlations are above 0.05. In terms of perceived software maintenance effort, the COCOMO II SU Model factors display a statistically significant correlation with R = -0.93 and p < 0.05; the percentage of software-maintainability-related issues also shows a statistically significant correlation with R = 0.91 and p < 0.05. However, MI still shows no correlation with the perceived maintenance effort. These results show that when a version of a project receives high COCOMO II SU Model factor ratings, developers tend to spend less time on understanding the existing code base.

Figure 6.5: Metrics in Rule-generating Projects: (a) Correctness Measurement of Accessibility; (b) Correctness Measurement of Diagnosability; (c) Correctness Measurement of Modularity; (d) Correctness Measurement of Portability; (e) Correctness Measurement of Restorability; (f) Correctness Measurement of Scalability; (g) Correctness Measurement of Understandability
Similarly, when the percentage of software maintainability issues is lower, developers tend to spend less time on understanding the existing code base. Furthermore, the changes between versions are also examined, as shown in Figures 6.11, 6.12 and 6.13. The changes in the percentage of maintainability issues correlate significantly with the changes in the perceived maintenance effort between versions (R = 0.89, p = 0.0001). The changes in the COCOMO II SU Model driver ratings also correlate significantly with the changes in the perceived maintenance effort between versions (R = -0.91, p < 0.05). However, changes in MI between versions do not correlate significantly with either the actual effort or the perceived effort.

Table 6.1: Average Metrics for Rule-generating Projects
Subgroup SQ | Average Accuracy | Average Precision | Average Recall | Average F-measure
Accessibility | 0.99 | 0.94 | 0.94 | 0.94
Diagnosability | 0.99 | 0.87 | 0.83 | 0.85
Modularity* | 0.99 | 0.73 | 0.86 | 0.77
Portability | 0.99 | 0.88 | 0.93 | 0.90
Restorability | 1.00 | 0.94 | 0.85 | 0.89
Scalability | 1.00 | 0.95 | 0.95 | 0.94
Understandability | 0.97 | 0.90 | 0.88 | 0.89

Table 6.2: Average Metrics for Non-rule-generating Projects
Subgroup SQ | Average Accuracy | Average Precision | Average Recall | Average F-measure
Accessibility | 0.99 | 0.95 | 0.95 | 0.95
Diagnosability | 0.99 | 0.93 | 0.88 | 0.90
Modularity | 1.00 | 0.96 | 1.00 | 0.98
Portability | 1.00 | 0.97 | 0.94 | 0.95
Restorability* | 1.00 | 0.98 | 0.99 | 0.98
Scalability* | 1.00 | 0.94 | 0.95 | 0.94
Understandability | 0.99 | 0.99 | 0.95 | 0.97

To summarize the results of RQ2, none of the metrics reflect the actual effort spent on performing maintenance tasks. Of the three metrics, the COCOMO II SU model factors and the percentage of maintainability-related issues show a statistically significant correlation with the perceived effort, and the changes in these metrics between versions also reflect the changes in perceived effort between versions. MI does not provide much information in terms of the effort needed to understand source code; since it changes little between versions, it also does not reflect the changes between versions very well. Since the COCOMO II SU model factors were rated by the developers once they finished the assigned maintenance tasks, one could argue that the high correlation could result from subjectivity and from each developer's experience with and familiarity with the project and its domain. Thus, the percentage of maintainability-related issues is a more suitable metric to reflect the effort needed to understand the existing source code and the changes between versions as software evolves.

Figure 6.6: Metrics in Non-rule-generating Projects: (a) Correctness Measurement of Accessibility; (b) Correctness Measurement of Diagnosability; (c) Correctness Measurement of Modularity; (d) Correctness Measurement of Portability; (e) Correctness Measurement of Restorability; (f) Correctness Measurement of Scalability; (g) Correctness Measurement of Understandability

6.3 Results of RQ3

Figure 6.14 illustrates the changes in the percentage of maintainability issues found in each system over time. Note that Xerces-J is excluded due to its small number of issues and lack of evolution. The linear regression line (in red) shows the trend of how maintainability changes as software evolves, while the blue line displays the actual changes between periods. As we can observe, in spite of the fluctuations, the majority of the systems display a clear increase in maintainability issues as software evolves.
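A minimal sketch of the kind of trend fit behind Figure 6.14, using made-up per-period percentages rather than the actual study data; a positive, significant slope is what an increase in maintainability issues over time looks like.

from scipy import stats

periods = [1, 2, 3, 4, 5, 6, 7, 8]                                        # years since the earliest issue
pct_maintainability = [0.28, 0.31, 0.30, 0.35, 0.37, 0.36, 0.41, 0.44]    # illustrative values only

fit = stats.linregress(periods, pct_maintainability)
print(f"slope={fit.slope:.4f}, r={fit.rvalue:.2f}, p={fit.pvalue:.4f}")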
Figure 6.7: Average Metrics for All Projects
Figure 6.8: Correlation between Average MI and Average Software Maintenance Effort Spent per Version
Figure 6.9: Correlation between Average COCOMO SU Model Factor Ratings and Average Software Maintenance Effort Spent per Version
Figure 6.10: Correlation between Average Percentage of Maintainability Related Issues and Average Software Maintenance Effort Spent per Version
Figure 6.11: Correlation between Changes in Average MI and Changes in Average Software Maintenance Effort Spent between Versions
Figure 6.12: Correlation between Changes in Average COCOMO SU Model Factor Ratings and Changes in Average Software Maintenance Effort Spent between Versions
Figure 6.13: Correlation between Changes in Average Percentage of Maintainability Related Issues and Changes in Average Software Maintenance Effort Spent between Versions

Further analysis is conducted to investigate the dominant subgroup SQs contributing to maintainability during different phases of the software life cycle for each system. The actual contribution of each SQ varies among products due to the different characteristics of these systems. One common trend discovered is that understandability concerns persist throughout the software evolution of all the studied systems. Additionally, the circumstances under which maintainability-related quality concerns are more prone to be introduced are explored. Figure 6.15 illustrates the percentage of opened and resolved maintainability-related issues after a release: more maintainability-related issues are introduced after a patch release, while more maintainability-related issues are closed after a major release.

To summarize the results of RQ3, Lehman's Declining Quality Law applies to software maintainability. As software evolves, the percentage of software-maintainability-related issues increases, which indicates that software maintainability decreases. Moreover, most of the maintainability-related issues are introduced during a patch release, while most of the maintainability-related issues are closed during a major version release.

Figure 6.14: Changes in the Percentage of Maintainability Issues over Time per System
Figure 6.15: Average Opened and Resolved Maintainability Related Issues per Introduction Status of the Issue

6.4 Results of RQ4

6.4.1 Severity

Figures 6.16 and 6.17 show the total number of issues and the overall percentage of software maintainability issues and non-software-maintainability issues per severity group.

Figure 6.16: Total Number of Issues in Respective Severity Group
Figure 6.17: Overall Percentage of Software Maintainability Related Issues and Non-Software-Maintainability Related Issues in Respective Severity Group

As we can observe, 50.1% of the blocker issues express maintainability concerns, while 41.6% of the others issues do. The result indicates that it is statistically significant to suggest that, among all the severity levels, blocker bugs contain the most maintainability issues. In other words, the most significant bugs in software projects reflect the maintainability issues of the software the most. The results also reveal that the majority of maintainability issues are introduced by high-severity bugs, which suggests and validates the importance of maintainability.
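One way such a significance check can be carried out is a chi-square test on the 2x2 contingency table of severity group versus maintainability tag; this is a hedged sketch, not the test used in the study, and the counts below are only derived from the reported percentages (50.1% of 1695 blocker issues, 41.6% of 60535 others issues) rather than taken from the raw data.

from scipy.stats import chi2_contingency

blocker_maint = round(0.501 * 1695)
others_maint = round(0.416 * 60535)
table = [[blocker_maint, 1695 - blocker_maint],
         [others_maint, 60535 - others_maint]]
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2={chi2:.2f}, p={p:.4g}")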
Figure 6.18: Percentage of Subgroup SQ Contributing to Software Maintainability Related Issues in Respective Severity Group

Furthermore, understandability contributes the most at all severity levels, except for major issues, where accessibility concerns occur the most.

6.4.2 Fix Time

Figure 6.19 illustrates the percentage of software maintainability issues in quick-fixed and slow-fixed issues per project. Figure 6.20 shows how software maintainability is expressed in quick-fixed and slow-fixed issues across all projects. An independent-samples t-test was conducted to compare the overall percentage of maintainability issues in quick-fixed and slow-fixed issues. There was no significant difference in the percentage of maintainability-related issues between quick-fixed (M = 0.375, SD = 0.0761) and slow-fixed (M = 0.408, SD = 0.0735) issues; t = 1.4630, p = 0.1509. These results suggest that maintainability-related issues do not show any statistical difference between issues that were resolved in a short period of time and those resolved over a long period of time. However, the overall percentages reveal a slightly higher share of maintainability-related issues in slow-fixed issues, 40.77%, compared to quick-fixed issues, 37.52%. Figure 6.21 further breaks down the percentage of each subgroup SQ that contributes to software maintainability. Both quick-fixed and slow-fixed issues contain mostly understandability issues. Accessibility and diagnosability issues contribute about equally in both quick-fixed and slow-fixed issues, whereas portability and scalability issues are more dominant in slow-fixed issues.

Figure 6.19: Percentage of Software Maintainability by Issue Fixing Time per Project
Figure 6.20: Overall Percentage of Software Maintainability by Issue Fixing Time
Figure 6.21: Percentage of Subgroup SQ Contributing to Software Maintainability Related Issues by Issue Fixing Time

6.4.3 Domain Classification

Figure 6.22 shows the overall percentage of software maintainability issues in server and client software. An independent-samples t-test was conducted to compare the overall percentage of maintainability issues in server and client software. There was no significant difference in the percentage of maintainability-related issues between server (M = 0.399, SD = 0.0435) and client (M = 0.414, SD = 0.0847) software; t = 0.4835, p = 0.6343. These results suggest that maintainability-related issues do not show any statistical difference between issues found in server software and client software. However, the overall percentages reveal a slightly higher share of maintainability-related issues in client software, 41.44%, compared to server software, 39.95%. Figure 6.23 further breaks down the percentage of each subgroup SQ that contributes to software maintainability in server and client software. Understandability dominates in both domain classifications, while scalability concerns appear more in server software and accessibility concerns appear more in client software.

Summary for RQ4. Higher severity bugs express more maintainability issues compared to lower severity bugs, which validates the importance of software maintainability measurement. Quick-fixed and slow-fixed bugs do not show a significant difference in the percentage of maintainability issues, with understandability issues dominating both. However, scalability and portability issues
Moreover, percentage of maintainability issues found in server and client software do not display a statistical dierence. Apart from having understandability as the most dominant subgroup SQ concern, server software contains more scalability issues while as client software contains more accessibility issues. 6.5 Results of RQ5 Figure 6.24 illustrates the characteristics of the dependent issues and source issues in each sub- group SQ. As we can observe, accessibility and understandability issues are mostly triggered by SQ issues within the same product while portability and restorability issues are triggered by SQ issues in dierent products. Portability issues have the highest number of source bugs. The results may indicate that when encountering an accessibility or understandability issue, it is reasonable to suggest searching within the same product. Portability issues are more likely to be triggered by a composition of SQ issues that are located in dierent products. Figure 6.25 illustrates the top quality dependency relationships found in the study. The size of each node represents the number of relationships found in each SQ. The bigger the node is, the higher the number of relationships identied for the particular SQ. Understandability contains 91 Figure 6.23: Percentage of Subgroup SQ Contributing to Software Maintainability Related Issue by Domain the highest number of relationships. Since non-maintainability issues are not part of the study goal, the size of the number of relationships found is unknown, thus, no node is shown for it in the graph. The arrow represents the \from-to" dependency relationship between subgroup SQs. Accessibility and portability contain a self-loop, which indicates that we have found accessibility issues that depend on other accessibility issues. Same applies to portability. Accessibility and non-maintainability issues are the most depended on SQs among these top relationships. Figure 6.26 lists the characteristics of the top quality dependency relationships. \Others" dependency relationships are mainly made up with relationships that have more than one source issues. Most of these relationships only occur once. The most occurring relationship is understandability <funderstandabilityg. This can occur when data or resources are modied, causing other data or resources to change; this can lead to requests for documentation changes that do not accurately re ect the structure. Summary for RQ5. A list of quality dependency relationships are identied. Accessibility issues are mostly dependent on SQ issues within the same product. Modularity issues have the 92 (a) Number of Dependent Issues per Subgroup SQ (b) Average Number of Source Issues per Subgroup SQ (c) Percentage of the Source Issues Found in the Same Product and Dierent Products per Subgroup SQ Figure 6.24: Characteristics of the Dependent Issues and Source Issues in Each Subgroup SQ 93 Figure 6.25: Quality Dependency Graph with the Number of Identied Relationships for Each Quality Represented with the Size of the Node highest number of source bugs and mostly depend on bugs within dierent products. The most frequent relationship originates with understandability issues, which lead to understandability issues. These results provide additional information for bug localization and searching additional issues that may need to be addressed. 
94 Figure 6.26: Top Quality Dependency Relationships 3.19% 3.38% 3.25% 3.61% 4.48% 3.23% 0.90% 1.13% 3.51% 8.51% 11.49% 6.11% 5.18% 8.10% 33.93% accessibility->accessibility accessibility->understandability diagnosability->understandability modularity->accessibility modularity->N/A portability->portability portability->accessibility portability->N/A restorability->accessibility scalability->understandability understandability->modularity understandability->accessibility understandability->understandability understandability->N/A others 95 Chapter 7 Discussions and Future Work This chapter discusses the results, elaborates the contributions of this work and proposes ideas for further expansion. 7.1 Discussions In this dissertation, I have discussed the current state-of-the-art techniques in estimating software maintainability in various open source ecosystems. Motivated by the lack of eective systematic measurement of maintainability in practice, this dissertation has proposed and evaluated a novel approach that utilizes fuzzy methods and linguistic analysis to identify software maintainability related concerns from issue summaries. To my best knowledge, this approach is the rst of its kind to utilize fuzzy methods and linguistic patterns to systematically identify software maintainability concerns from issue summaries. Existing research [16][85] provides insightful knowledge in understanding, evaluating, and im- proving a system's maintainability planning, stang, and preparation of technology for cost- eective maintenance. However, it is rarely addressed in open source systems and smaller organi- zations. This validates the signicance of this study on connecting this knowledge to open source systems. 96 In addition, the results show that most of the maintainability issues are found in issues with higher severity. Thus, since these issues have a high impact on the system, it is important to resolve these issues as quickly and thoroughly as possible. With regard to software domain, both client and server software show an increasing trend of maintainability issues as they evolve. This result conrms the Lehman [66] Law of declining quality. Although statistically, the results show no signicant dierences between the percentage of maintainability issues in client and server software, the actual percentage of maintainability issues in client software is higher than server software. The main reason for this may due to external pressure from competitors and users. For example, Firefox as a browser is in the same market space as many other options. If the maintainability issues persist, users will be more likely to switch to a dierent product. Server software issues will aect all users, thus there is pressure to x the issue as quickly as possible regardless of whether the x will address the root causes of the bugs. When patches or temporary solutions are introduced while not directly addressing the cause of the maintainability issues, as the software evolves, the maintainability issues compound. Currently, when software systems measure maintainability, especially with automatic code analysis approaches, they use the same metrics without considering the dierences across domains. The results highlight these dierences and may indicate that it is reasonable to consider the domain classication of the software as a factor when selecting or developing metrics to measure maintainability. 
Moreover, the discovered rise of understandability may suggest that software systems should emphasize understandability during earlier phases in the life cycle as the software matures. Furthermore, understandability is the dominant subgroup SQ in both client and server soft- ware. However, existing metrics, generally used for eort estimation and commonly associated with understandability such as cyclomatic complexity, have been found to have no or low correla- tion to understandability [96]. Thus, new approaches that capture facets of code understandability are in signicant need. 97 Moreover, portability issues have a signicant impact on software systems. They tend to contain more source issues and span across products, as well as exhibit a trend of having more slow-xed issues and mostly occur in issues with high severity. It is more dicult to improve this SQ, thus suggesting more resources should be allocated toward identifying this type of issue earlier in the life cycle. The dependency relationships reveal a high-level connection between subgroup SQs. While most relationships were found to involve only one direct source issue, a chain of issues can be further linked. This could help systems understand the eect of these chains, the SQ issues they bring, and how these SQ issues change within each chain of issues, as well as provide more information for resource allocation. Thus, an extensive study is recommended to identify the common characteristics of these quality relationship chains and their impact on the rest of the software systems. 7.2 Threats to Validity External Validity The proposed approach may not generalize beyond the projects that were evaluated in this dissertation. To mitigate this threat, I tested and validated the generality of the proposed approach. The fuzzy rules extracted from four projects were tested against the issue summaries found from ve other projects. We can observe that the fuzzy rules constructed from the four projects were able to correctly classify issue summaries found in other ve projects as well. Moreover, compared to traditional projects, this kind of feature request management approach may not be needed in agile and continuous delivery and development ops situations. To examine whether the proposed approach is applicable to those projects, projects that follow agile process should be examined and analyzed. Internal Validity Threats to internal validity might come from the process of manual in- spection and tagging. Since issue summaries were tagged manually, subjectivity is inevitable. 98 However, to minimize such subjectivity, a double verication process was used: each bug report is examined and tagged by at least ve team members independently. If the tagging results are dierent, discussions were hosted until consensuses were reached. 7.3 Contributions The major contributions of the work are as follows: This work narrows the gap between practice and theory by leveraging natural language processing techniques to express the quality denitions in standards and practice guidelines as linguistic patterns used in a fuzzy classier. The proposed approach uses a fuzzy classier and linguistic patterns to provide a means of applying the insights from a software maintainability ontology to identify software main- tainability concerns from issue summaries. This automated solution classies issue summaries with maintainability concerns, which can facilitate further research on software maintenance and evolution. 
7.4 Future Work Taking the resulting processed data, I plan to conduct an extensive study to identify synergies and con icts between subgroup SQ concerns and investigate their impact on the rest of the software systems. In addition, I am currently developing a plugin that allows projects to use the proposed approach with their own issue tracking systems to identify maintainability concerns. 99 Chapter 8 Conclusion This dissertation presents a novel approach to achieve automatic identication of how software maintainability subgroup SQs are expressed in issue summaries through fuzzy methods and lin- guistic patterns. Various natural language processing techniques are utilized to extract linguistic patterns presented in issue summaries. By incrementally discovering fuzzy rules with new linguis- tic patterns, the approach can output a set of stable fuzzy rules, when either the correctness does not increase when adding in new rules or no new rules are being discovered. To evaluate the approach, I conducted a large empirical experiment on nine open source projects with over 6000 issue summaries. In total, 99 rules are generated from 4 projects. The result indicates that the approach can achieve high performance on both rule generating and non-rule generating projects. Moreover, further analysis are conducted to compare the eectiveness of the proposed ap- proach against several state-of-art measurement. When compared to Maintainability Index and COCOMO II Software Understandability factors, the proposed approach is a more suitable metric to re ect the eort needed to understand the existing source code and the changes between ver- sions. Additional analysis on these projects provides a means for identifying the trend of software maintainability changes as software evolves. This informs which areas should be focused on to 100 ensure maintainability at dierent stages of the development and maintenance process. Further- more, this analysis shows the dierences of software maintainability in dierent classications and the relationships among software qualities that contribute to software maintainability. I believe that the proposed approach can provide insight into the maintainability of the system and serve as a building block for analyzing maintainability on a larger scale. 101 Reference List [1] Ieee std 1219: Standard for software maintenance. 1998. [2] Iso/iec 14764: Software engineering- software maintenance,. 2000. [3] Detecting Missing Information in Bug Descriptions. ESEC/FSE 2017, pages 396{407, New York, NY, USA, 2017. event-place: Paderborn, Germany. [4] Shigeo Abe and Ming-Shong Lan. A method for fuzzy rules extraction directly from numer- ical data and its application to pattern classication. IEEE transactions on fuzzy systems, 3(1):18{28, 1995. [5] Alain Abran, JW Moore, P Bourque, R Dupuis, and LL Tripp. Software engineering body of knowledge. IEEE Computer Society, Angela Burgess, 2004. [6] Rakesh Agrawal, Tomasz Imieli nski, and Arun Swami. Mining association rules between sets of items in large databases. In Acm sigmod record, volume 22, pages 207{216. ACM, 1993. [7] P Antonellis, D Antoniou, Y Kanellopoulos, C Makris, E Theodoridis, C Tjortjis, and N Tsirakis. A data mining methodology for evaluating maintainability according to iso/iec- 9126 software engineering{product quality standard. Special Session on System Quality and Maintainability-SQM2007, 2007. [8] Algirdas Avizienis, J-C Laprie, Brian Randell, and Carl Landwehr. 
Basic concepts and taxonomy of dependable and secure computing. IEEE transactions on dependable and secure computing, 1(1):11{33, 2004. [9] Banu Aysolmaz and Onur Demir ors. A detailed software process improvement methodology: Bg-spi. In European Conference on Software Process Improvement, pages 97{108. Springer, 2011. [10] Deepika Badampudi, Claes Wohlin, and Kai Petersen. Experiences from using snowballing and database searches in systematic literature studies. In Proceedings of the 19th Interna- tional Conference on Evaluation and Assessment in Software Engineering, page 17. ACM, 2015. [11] Robert Baggen, Jos e Pedro Correia, Katrin Schill, and Joost Visser. Standardized code quality benchmarking for improving software maintainability. Software Quality Journal, 20(2):287{307, 2012. [12] Noor Hasrina Bakar, Zarinah M. Kasirun, Norsaremah Salleh, and Hamid A. Jalab. Extract- ing features from online software reviews to aid requirements reuse. Applied Soft Computing, 49:1297{1315, December 2016. [13] Dines Bjrner. Facets of software development. Journal of Computer Science and Technol- ogy, 4(3):193{203, 1989. 102 [14] Benjamin S Blanchard, Dinesh Verma, and Elmer L Peterson. Maintainability: a key to eective serviceability and maintenance management, volume 13. John Wiley & Sons, 1995. [15] Barry Boehm, Celia Chen, Kamonphop Srisopha, Reem Alfayez, and Lin Shi. Avoiding non-technical sources of software maintenance technical debt. [16] Barry Boehm, Celia Chen, Kamonphop Srisopha, and Lin Shi. The key roles of maintain- ability in an ontology for system qualities. In INCOSE International Symposium, volume 26, pages 2026{2040. Wiley Online Library, 2016. [17] Barry Boehm and Hoh In. Identifying quality-requirement con icts. In International Con- ference on Requirements Engineering, page 218, 1996. [18] Barry Boehm and Nupul Kukreja. An initial ontology for system qualities. Incose Interna- tional Symposium, 25(1):341356, 2015. [19] Barry W Boehm, Ray Madachy, Bert Steece, et al. Software cost estimation with Cocomo II with Cdrom. Prentice Hall PTR, 2000. [20] Raymond PL Buse and Westley R Weimer. Learning a metric for code readability. IEEE Transactions on Software Engineering, 36(4):546{558, 2010. [21] Emmanuel Cecchet, Julie Marguerite, and Willy Zwaenepoel. Performance and scalability of ejb applications. In Proc. 2002 ACM Sigplan Conference on Object-Oriented Programming Systems, Languages and Applications, pages 246{261, 2002. [22] Indu Chawla and Sandeep K Singh. An automated approach for bug categorization using fuzzy logic. In Proceedings of the 8th India Software Engineering Conference, pages 90{99. ACM, 2015. [23] Celia Chen, Reem Alfayez, Kamonphop Srisopha, Barry Boehm, and Lin Shi. Why is it important to measure maintainability, and what are the best ways to do it? In Proceedings of the 39th International Conference on Software Engineering Companion, pages 377{378. IEEE Press, 2017. [24] Celia Chen, Reem Alfayez, Kamonphop Srisopha, Lin Shi, and Barry Boehm. Evaluating human-assessed software maintainability metrics. In Software Engineering and Methodology for Emerging Domains, pages 120{132. Springer, 2016. [25] Celia Chen, Anandi Hira, and Barry Boehm. Using software readability for software main- tainability: A case study on unied code count. 2017. In Proceedings, IEEE SW Tech Conference. [26] Celia Chen, Shi Lin, Michael Shoga, Qing Wang, and Barry Boehm. How do defects hurt qualities? 
an empirical study on characterizing a software maintainability ontology in open source software. In 2018 IEEE International Conference on Software Quality, Reliability and Security (QRS), pages 226{237. IEEE, 2018. [27] Celia Chen, Lin Shi, and Kamonphop Srisopha. How does software maintainability vary by domain and programming language? 2015. In Proceedings, IEEE SW Tech Conference. [28] Celia Chen, Michael Shoga, Brian Li, and Barry Boehm. Assessing software understand- ability in systems by leveraging fuzzy method and linguistic analysis. In 2019 Conference on Systems Engineering Research (CSER) (2019 CSER), Washington, USA, April 2019. [29] Lianping Chen. Continuous delivery: Huge benets, but challenges too. Software, IEEE, 32(2):50{54, 2015. 103 [30] Zheru Chi, Jing Wu, and Hong Yan. Handwritten numeral recognition using self-organizing maps and fuzzy rules. Pattern Recognition, 28(1):59{66, 1995. [31] Shyam R Chidamber, David P Darcy, and Chris F Kemerer. Managerial use of metrics for object-oriented software: An exploratory analysis. IEEE Transactions on software Engi- neering, 24(8):629{639, 1998. [32] Shyam R Chidamber and Chris F Kemerer. A metrics suite for object oriented design. IEEE Transactions on software engineering, 20(6):476{493, 1994. [33] Emilio Collar Jr and Ricardo Valerdi. Role of software readability on software development cost. Technical report, 2006. [34] Dan Cundi. Manufacturing readiness levels (mrl). Unpublished white paper, 2003. [35] Ward Cunningham. The wycash portfolio management system. ACM SIGPLAN OOPS Messenger, 4(2):29{30, 1993. [36] Bill Curtis, Sylvia B. Sheppard, Phil Milliman, MA Borst, and Tom Love. Measuring the psychological complexity of software maintenance tasks with the halstead and mccabe metrics. IEEE Transactions on software engineering, (2):96{104, 1979. [37] Marie-Catherine De Marnee, Timothy Dozat, Natalia Silveira, Katri Haverinen, Filip Gin- ter, Joakim Nivre, and Christopher D Manning. Universal stanford dependencies: A cross- linguistic typology. In LREC, volume 14, pages 4585{4592, 2014. [38] Marie-Catherine De Marnee and Christopher D Manning. The stanford typed dependen- cies representation. In Coling 2008: proceedings of the workshop on cross-framework and cross-domain parser evaluation, pages 1{8. Association for Computational Linguistics, 2008. [39] Raphael Pereira de Oliveira and Eduardo Santana de Almeida. Evaluating lehman's laws of software evolution for software product lines. IEEE Software, 33(3):90{93, 2016. [40] Leticia Duboc, David Rosenblum, and Tony Wicks. A framework for characterization and analysis of software system scalability. In Proceedings of the the 6th joint meeting of the European software engineering conference and the ACM SIGSOFT symposium on The foun- dations of software engineering, pages 375{384. ACM, 2007. [41] Amr Elssamadisy and Gregory Schalliol. Recognizing and responding to bad smells in extreme programming. In Proceedings of the 24th International conference on Software Engineering, pages 617{622. ACM, 2002. [42] Eduardo Figueiredo, Claudio Sant'Anna, Alessandro Garcia, Thiago T Bartolomei, Walter Cazzola, and Alessandro Marchetto. On the maintainability of aspect-oriented software: A concern-oriented measurement framework. In Software Maintenance and Reengineering, 2008. CSMR 2008. 12th European Conference on, pages 183{192. IEEE, 2008. [43] Ernst Fricke and Armin P Schulz. Design for changeability (dfc): Principles to enable changes in systems throughout their entire lifecycle. 
Systems Engineering, 8(4):308.1{308.2, 2005. [44] Anita Ganpati, Arvind Kalia, and Hardeep Singh. A comparative study of maintainability index of open source software. Int. J. Emerg. Technol. Adv. Eng, 2:228{230, 2012. [45] Brian J Guarraci. Instrumenting software for enhanced diagnosability, March 20 2012. US Patent 8,141,052. 104 [46] Jan Gulliksen and Susan Harker. The software accessibility of human-computer interfacesiso technical specication 16071. Universal Access in the Information Society, 3(1):6{16, 2004. [47] Jane Human Hayes, Alex Dekhtyar, and Senthil Karthikeyan Sundaram. Improving after- the-fact tracing and mapping: Supporting software quality predictions. IEEE software, 22(6):30{37, 2005. [48] Ilja Heitlager, Tobias Kuipers, and Joost Visser. A practical model for measuring maintain- ability. In Quality of Information and Communications Technology, 2007. QUATIC 2007. 6th International Conference on the, pages 30{39. IEEE, 2007. [49] Anandi Hira and Barry Boehm. Improving productivity for projects with high turnover. In The 27th Annual IEEE Software Technology Conference. [50] Anandi Hira, Shreya Sharma, and Barry Boehm. Calibrating cocomo R ii for projects with high personnel turnover. In Proceedings of the International Workshop on Software and Systems Process, pages 51{55. ACM, 2016. [51] Minqing Hu and Bing Liu. Mining and summarizing customer reviews. In Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 168{177. ACM, 2004. [52] NBR ISO. Iec 12207. NBR ISO/IEC, page 25, 1998. [53] ISO/IEC. Iso/iec 9126 software engineering product quality part1: Quality model. [54] Guoliang Jin, Linhai Song, Xiaoming Shi, Joel Scherpelz, and Shan Lu. Understanding and detecting real-world performance bugs. In PLDI, pages 77{87, 2012. [55] R. Jindal, R. Malhotra, and A. Jain. Mining defect reports for predicting software mainte- nance eort. In 2015 International Conference on Advances in Computing, Communications and Informatics (ICACCI), pages 270{276, August 2015. [56] A. Kavcic. Software accessibility: Recommendations and guidelines. In The International Conference on Computer As A Tool, pages 1024{1027, 2006. [57] Chai Kim and Stu Westin. Software maintainability: perceptions of edp professionals. Mis Quarterly, pages 167{185, 1988. [58] Barbara A Kitchenham, Guilherme H Travassos, Anneliese Von Mayrhauser, Frank Niessink, Norman F Schneidewind, Janice Singer, Shingo Takada, Risto Vehvilainen, and Hongji Yang. Towards an ontology of software maintenance. Journal of Software Maintenance: Research and Practice, 11(6):365{389, 1999. [59] Andrew J Ko, Brad A Myers, Michael J Coblenz, and Htet Htet Aung. An exploratory study of how developers seek, relate, and collect relevant information during software maintenance tasks. IEEE Transactions on software engineering, 32(12):971{987, 2006. [60] Jussi Koskinen. Software maintenance fundamentals. Encyclopedia of Software Engineering, P. Laplante, Ed., Taylor & Francis Group, 2009. [61] Philippe Kruchten, Robert L Nord, and Ipek Ozkaya. Technical debt: From metaphor to theory and practice. Ieee software, 29(6), 2012. [62] Rudolf Kruse. Fuzzy systems. 105 [63] L. Kumar, S. K. Rath, and A. Sureka. Using Source Code Metrics and Multivariate Adaptive Regression Splines to Predict Maintainability of Service Oriented Software. In 2017 IEEE 18th International Symposium on High Assurance Systems Engineering (HASE), pages 88{ 95, January 2017. [64] Kari Laitinen. 
Appendix A
Generated Fuzzy Rules

The table below provides the details of the generated rules, including the linguistic patterns associated with each rule, the linguistic type(s), the maintainability subgroup SQ classification, the total weight in the dataset, the phase in which the rule is introduced, and an example issue that expresses the rule. A minimal illustrative sketch of how such rules could be applied to an issue summary follows the table.
Table A.1: Fuzzy Rules for Software Maintainability Subgroup SQs

ID | Rule (Linguistic Patterns) | Type | Classification | Weight | Introduced in | Example of Issues
1 | {action "enhance"} = 1 | Lexical, Semantic | Understandability | 3 | {Rn} | [UI] Some enhancements to advanced search after initial merge.
2 | {action "adapt"} = 1 AND succeeding <N> is <Version> OR <Sys Name> | Lexical, Semantic | Understandability | 3 | {Rn} | Revise contacts app to adapt to WebIDL changes in a uniform way.
3 | {documentation, guide, user manual} = 1 | Lexical | Understandability | 17 | {Rn} | [meta] Comprehensive Data Documentation Content.
4 | {give more} = 1 | Lexical, Semantic | Understandability | 9 | {Rn'} | datasourcerealm should provide additional info on SQLException.
5 | {typo, misspell} = 1 | Lexical | Understandability | 29 | {Rn'} | dom.ipc.processPrelauch is misspelled.
6 | {comment} = 1 | Lexical | Understandability | 8 | {Rn'} | Rewrite how we return the latest comment timestamp.
7 | {annotation, description, descriptor, information} = 1 AND <V> is {add, provide, need, should} | Lexical, Semantic | Understandability | 11 | {Rn'} | "Found in" Annotations should be hyperlinked.
8 | {annotation, description, descriptor, information} = 1 AND preceding <A> is synonym of {more} | Lexical, Semantic | Understandability | 7 | {Rn'} | Should provide more suitable description while user want to download the updates in roaming area.
9 | {consistent, inconsistent} = 1 | Lexical | Understandability | 11 | {Rn'} | Inconsistent file path in zip file.
10 | {deprecated} = 1 | Lexical | Understandability | 12 | {Rn'} | Don't use deprecated IDBCursor constants.
11 | {action "clarify"} = 1 | Lexical | Understandability | 9 | {Rn'} | Clarify spec re: behaviour for malformed X-If-[Un]Modified-Since headers.
12 | {lang} = 1 OR {translation} = 1 | Lexical | Understandability | 13 | {Rn'} | [keyboard] No Keyboard support for Spanish Latin America.
13 | {start vb} = 1 AND <V> {Key Verb} AND {when, where, if} = 0 AND {neg} = 0 | Syntax, Semantic, Lexical | Understandability | 62 | {Rn'} | Update DevHub front page to uplevel partnership and simplify presentation.
14 | <R><V><N> = 1 AND <V> {Key Verb} AND <R> {Neg Words} | Syntax, Semantic, Lexical | Understandability | 8 | {Rn'} | Not updating Athena d2p Datasets.
15 | {undefined} = 1 | Lexical | Understandability | 9 | {Rn'} | [System] TypeError: runningApps [displayedApp] is undefined.
16 | <A><P><V> = 1 AND <A> {Neg Words} | Syntax, Semantic, Lexical | Understandability | 5 | {Rn'} | [Helix] Labels in browser settings are so small it's hard to read.
17 | {match, mismatch} = 1 | Lexical | Understandability | 9 | {Rn'} | Compartment mismatch when sending multiple messages in test outgoing.js.
18 | {accessibility} = 1 | Lexical | Understandability | 4 | {Rn'} | Recent lockscreen redesign breaks accessibility mode.
19 | <V><N> = 1 AND <V> {Key Verb} | Syntax, Semantic | Understandability | 47 | {Rn'} | Remove unused imports.
20 | {cleanup} = 1 | Lexical | Understandability | 5 | {Rn'} | Look into doing more cleanup in the cleanup data method.
21 | {outdated} = 1 | Semantic | Understandability | 9 | {Rn'} | tbpl-manifest.ini is out of date with what tests can be enabled & removed.
22 | {option, menu, notification} = 1 AND preceding <V> {Key Verb} | Lexical, Syntax | Understandability | 6 | {Rn'} | [MMS] Add options menu button on top bar.
23 | {alternative} = 1 | Semantic | Understandability | 2 | {Rn'} | Remove "R$" as alternative currency from all non-portuguese-BR keyboard layout.
24 | <A> is synonyms of {easier, better, nicer} | Semantic | Understandability | 6 | {Rn} | need better error messages when config or option xml files not found.
25 | {error, exception, log, warn} = 1 AND {neg} = 1 | Lexical, Semantic | Diagnosability | 16 | {Rn} | Misleading error message "No output stream or file set for the appender ..."
26 | {error, exception, log, warn} = 1 AND {easier, better, nicer} = 1 | Lexical, Semantic | Diagnosability | 17 | {Rn} | Need better error message for annotation scanning errors.
27 | {test} = 1 AND {neg} = 1 | Lexical, Semantic | Diagnosability | 14 | {Rn} | No unit tests for LoggingEvent serialization.
28 | {test} = 1 AND {better, more} = 1 | Lexical, Semantic | Diagnosability | 9 | {Rn} | more structure for junit test results.
29 | {action "throw"} = 1 AND {exception} = 1 AND {neg} = 1 | Lexical, Semantic | Diagnosability | 11 | {Rn'} | SocketServer throws uncaught exception.
30 | {action "throw"} = 1 AND {could, need, should} = 1 | Lexical, Semantic | Diagnosability | 7 | {Rn'} | should throw a RangeError when an array's length is set to >= 4294967296.
31 | {feedback} = 1 AND {neg} = 1 | Lexical, Semantic | Diagnosability | 6 | {Rn'} | Change logging defaults to avoid unusable feedback by default.
32 | {fail silently} = 1 | Lexical | Diagnosability | 3 | {Rn'} | Submit fails silently
33 | {location} = 1 AND {neg} = 1 | Semantic, Lexical | Accessibility | 9 | {Rn} | maven local repository location hard coded in pom test phase.
34 | {action "break"} = 1 | Semantic | Accessibility | 52 | {Rn} | Write All Data to a File Broken.
35 | {link} = 1 AND {neg} = 1 | Semantic, Lexical | Accessibility | 12 | {Rn} | [PATCH] FCKeditor image and link browse doesn't work under HTTPS.
36 | {action "permit"} = 1 | Semantic, Lexical | Accessibility | 9 | {Rn} | [PATCH] Null configuration and publication permission problems.
37 | {authentication} = 1 | Lexical | Accessibility | 7 | {Rn} | Authentication fails when proxy uses digest authorization.
38 | {http error code} = 1 | Lexical | Accessibility | 10 | {Rn'} | Accessing Servlet while Reloading context gives 404 error.
39 | {action "click"} = 1 AND {neg} = 1 | Semantic, Lexical | Accessibility | 11 | {Rn'} | triple clicking the details links of the update history dialog quits instantbird.
40 | {action "disable"} = 1 | Semantic | Accessibility | 12 | {Rn'} | Cannot Disable Or Enable Items.
41 | {partial} = 1 | Semantic | Accessibility | 3 | {Rn'} | Apps become partially cut/disappearing after switched from the video player.
42 | {action "hidden"} = 1 | Semantic | Accessibility | 4 | {Rn'} | Vertical scrollbars become hidden when the window shrinks horizontally.
43 | {action "overflow"} = 1 | Semantic | Accessibility | 6 | {Rn'} | stop findbar from overflowing into participant list.
44 | {not found} = 1 | Semantic | Accessibility | 8 | {Rn'} | Schema error: !!Schema not found
45 | {available} = 1 AND {neg} = 1 | Semantic, Lexical | Accessibility | 9 | {Rn'} | MDC not available in chainsaw.
46 | {make available} = 1 | Semantic | Accessibility | 3 | {Rn'} | make core and custom metadata available via a template transformer
47 | {readonly, writeonly} = 1 | Lexical | Accessibility | 4 | {Rn'} | constructor property shouldn't be readonly.
48 | {vb "start"} = 1 AND {neg} = 1 | Semantic, Lexical | Accessibility | 3 | {Rn'} | when the context path is empty, tomcat will startup with a FileNotFoundException.
49 | {action "open"} = 1 AND {neg} = 1 | Semantic, Lexical | Accessibility | 3 | {Rn'} | WebDAV module: index en.xml and links en.xml cannot be opened.
50 | {link} = 1 AND {could, need, should} = 1 | Semantic, Lexical | Accessibility | 4 | {Rn'} | Need a link to login in logout page
51 | {only} = 1 AND {action "display"} | Semantic, Lexical | Accessibility | 4 | {Rn'} | [Everything.me] When selectting more button, phone only display "Loading..." screen
52 | {only} = 1 AND {action "return"} | Semantic, Lexical | Accessibility | 3 | {Rn'} | We are trying to get imei by dialing the number *#06# on a device, which has two sim card slots, but only one imei returns
53 | {vb "restart"} = 1 AND {neg} = 1 | Semantic, Lexical | Accessibility | 4 | {Rn'} | patch for option on csv dataset to disallow restarting at the top of the file
54 | {action "show"} = 1 AND {neg} = 1 | Semantic, Lexical | Accessibility | 2 | {Rn'} | SOAP/XML-RPC Request does not show Request details in View Results Tree.
55 | {action "block"} = 1 | Semantic, Lexical | Accessibility | 3 | {Rn'} | MyLar Seems to Block and make Eclipse unusable when running external build
56 | {visible} = 1 | Lexical | Accessibility | 3 | {Rn'} | make key project resources always visible
57 | {memory} = 1 AND {neg} = 1 | Semantic, Lexical | Scalability | 2 | {Rn'} | I get Out of Memory error when running a JMX file that is 14 MB.
58 | {memory} = 1 AND {better} = 1 | Semantic, Lexical | Scalability | 2 | {Rn'} | Enhance memory leak detection by selectively applying methods [PATCH?]
59 | {leak} = 1 | Lexical | Scalability | 7 | {Rn'} | SocketNode can leak Sockets.
60 | {adj "slow"} = 1 | Semantic | Scalability | 4 | {Rn'} | Loading medium to large test plans is slow.
61 | {action "lag"} = 1 | Semantic | Scalability | 3 | {Rn} | UI lags while stats service is receiving chat rooms.
62 | {action "hang"} = 1 | Semantic | Scalability | 4 | {Rn} | Severe pauses/hang when dealing with large conversation backlogs
63 | {waste time} = 1 | Semantic | Scalability | 3 | {Rn'} | PATCH bw c wastes a lot of time when rendering the reader's view c
64 | {too} = 1 AND <A> is {high, large, many} | Semantic, Lexical | Scalability | 6 | {Rn'} | startup time is too high if there are few PC in "lib/" and a few webapps
65 | {action "stuck"} = 1 | Semantic | Scalability | 2 | {Rn'} | When OutputBuffer.doFlush gets Exception, doFlush gets stuck to true
66 | {high CPU} = 1 | Semantic | Scalability | 2 | {Rn'} | High CPU load in the NIO connector, when a client breaks connection unexpectedly
67 | {increase} = 1 | Semantic | Scalability | 2 | {Rn} | ATMO V2: Increase Spot Instance limits
68 | {reduce size} = 1 | Semantic | Scalability | 2 | {Rn} | Reduce cluster size for sync view job
69 | {action "overload"} = 1 | Semantic | Scalability | 1 | {Rn'} | delivery: upload.xbld.productdelivery.prod.mozaws.net overloaded
70 | {more time} = 1 | Semantic | Scalability | 2 | {Rn'} | [Camera] Camera app consumes more time between two image captures
71 | {speedup} = 1 | Semantic | Scalability | 2 | {Rn'} | Nicklist speedup: low-hanging fruit
72 | {module} = 1 AND {could, need, should} = 1 | Semantic, Lexical | Modularity | 3 | {Rn} | [betafox] need to modularize puppet
73 | {move to} = 1 | Semantic | Modularity | 2 | {Rn'} | Move the tests package to JUnit
74 | {neg} = 1 AND {with} = 1 | Semantic, Lexical | Modularity | 5 | {Rn} | sql:setDataSource is not working correctly with javax.servlet.jsp.jstl.sql.dataSource
75 | {action "separate"} = 1 | Syntax | Modularity | 2 | {Rn} | Split package issue in impl and jstlel bundles
76 | {action "dissect"} = 1 | Syntax | Modularity | 1 | {Rn'} | Dissect main/views.py into smaller files
77 | {move out} = 1 | Semantic | Modularity | 1 | {Rn'} | Move MainPing tests out of MainSummary tests
78 | {action "bundle"} = 1 | Syntax | Modularity | 1 | {Rn'} | Feature request: Bundle groovy-all with JMeter
79 | {action "reuse"} = 1 | Syntax | Modularity | 2 | {Rn'} | Report/Dashboard reuses the same output directory
80 | {action "extract"} = 1 | Syntax | Modularity | 2 | {Rn'} | Extract slf4j binding into its own jar and make it a jmeter lib
81 | {action "decouple"} = 1 | Syntax | Modularity | 2 | {Rn'} | [tracking] Decouple VAMO from AMO
82 | {class} = 1 AND {action "inherit"} = 1 | Lexical, Syntax | Modularity | 1 | {Rn'} | ClientRecord inherits from WBORecord, so it lacks cleartext
83 | {action "recover"} = 1 AND {could, need, should} = 1 | Semantic, Lexical | Restorability | 2 | {Rn'} | [Loop] Make sure that we recover from an expired token situation
84 | {action "recover"} = 1 AND {neg} = 1 | Semantic, Lexical | Restorability | 1 | {Rn'} | Pulsetranslator doesn't recover from server shutdown
85 | {cache} = 1 | Lexical | Restorability | 3 | {Rn} | JMS : Cache of InitialContext has some issues
86 | {action "refresh"} = 1 | Semantic | Restorability | 2 | {Rn} | http sampler doesnt refresh filelist
87 | {action "reset"} = 1 | Semantic | Restorability | 1 | {Rn'} | SyncTimer counter is not reset when scenario is stopped
88 | {resilient} = 1 | Lexical | Restorability | 1 | {Rn'} | Make pulsetranslator more resilient to errors
89 | {cookies} = 1 | Lexical | Restorability | 2 | {Rn'} | JMeter sending null cookies.
90 | {back up} = 1 | Lexical | Restorability | 1 | {Rn'} | provide mechanism to back up all task scapes, tasklist etc
91 | {Sys name} = 1 AND {of, after, since} = 0 AND {neg} = 1 | Lexical, Semantic | Portability | 5 | {Rn} | Android Addons broken
92 | {action "fail"} = 1 AND {Sys name} = 1 | Lexical, Semantic | Portability | 21 | {Rn'} | [Peak] build fails on Mac OS X, kernel tree missing headers (elf.h...)
93 | {action "run"} = 1 AND {Sys name} = 1 | Lexical, Semantic | Portability | 11 | {Rn'} | Can't run Weave unit tests with Firefox 3.6 / 3.7.
94 | {action "work"} = 1 AND {Sys name} = 1 | Lexical, Semantic | Portability | 12 | {Rn'} | Canvas doesn't work in Win32 installer builds
95 | {action "crash"} = 1 AND {Sys name} = 1 | Lexical, Semantic | Portability | 7 | {Rn'} | Installing a Root Certificate crashes Firefox
96 | {<P><Sys name>} = 1 | Lexical, Semantic, Syntax | Portability | 23 | {Rn'} | On windows, export bug list to CSV breaks because of extra line breaks
97 | {action "deploy"} = 1 AND {Sys name} = 1 | Lexical, Semantic | Portability | 6 | {Rn} | Deploy release zip to Linux
98 | {action "integrate"} AND {Sys name} = 1 | Syntax | Portability | 4 | {Rn'} | Integrate AWS Lambda with Datadog.
99 | {action "upgrade"} = 1 AND {Sys name version} = 1 | Lexical, Syntax | Portability | 9 | {Rn'} | Upgrade to AngularJS 1.3.15
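To make the rule format above concrete, the following minimal Python sketch shows how a purely lexical rule from Table A.1 (for example, Rule 5, {typo, misspell} = 1, classified as Understandability with weight 29) could be evaluated against an issue summary. This is an illustrative assumption rather than the dissertation's implementation: the rule subset, the simple substring matching, and the names LEXICAL_RULES and classify are hypothetical simplifications, and the actual approach also combines syntactic and semantic patterns through fuzzy methods.

# Minimal illustrative sketch (not the dissertation's implementation) of applying
# purely lexical rules from Table A.1 to an issue summary. The rule subset and the
# substring matching below are simplifying assumptions.

# (rule id, keyword patterns, maintainability subgroup, weight) -- values copied from Table A.1
LEXICAL_RULES = [
    (3, {"documentation", "guide", "user manual"}, "Understandability", 17),
    (5, {"typo", "misspell"}, "Understandability", 29),
    (59, {"leak"}, "Scalability", 7),
    (89, {"cookies"}, "Restorability", 2),
]

def classify(summary: str):
    """Return (rule id, subgroup, weight) for each lexical rule whose pattern occurs
    in the issue summary; an empty list means none of these rules matched."""
    text = summary.lower()
    matches = []
    for rule_id, patterns, subgroup, weight in LEXICAL_RULES:
        if any(pattern in text for pattern in patterns):
            matches.append((rule_id, subgroup, weight))
    return matches

if __name__ == "__main__":
    # Rule 5 fires on its example issue from Table A.1.
    print(classify("dom.ipc.processPrelauch is misspelled."))  # [(5, 'Understandability', 29)]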
Abstract
Beyond the functional requirements of a system, software maintainability is essential for project success. While many metrics for software maintainability have been developed, the effective use of accuracy measures for these metrics has not been observed. Moreover, while there exists a large knowledge base of software maintainability in the form of ontologies and standards, this knowledge is rarely used. Especially in open source ecosystems, the large number of developers and the inefficiency of identifying quality issues make it extremely difficult to accurately measure and predict software maintainability.

In this dissertation, a set of in-depth analyses of the effectiveness of current metrics in open source ecosystems is reported. Based on the findings, a novel approach is introduced for better assessment of overall software maintainability through fuzzy methods and linguistic analysis of issue summaries. Expert input and a large data set of over 60,000 issue summaries drawn from 22 well-established open source projects in three major open source ecosystems are used to build, validate and evaluate the approach.

The results validate the generalizability of the proposed approach in correctly and automatically identifying software maintainability-related issues. Compared to several state-of-the-art measurements, the proposed approach is a more suitable metric for reflecting the effort needed to understand the existing source code and the changes between versions. Further analysis of these projects provides a means for identifying trends in software maintainability as software evolves, which informs which areas should be focused on to ensure maintainability at different stages of the development and maintenance process. Furthermore, this analysis shows the differences in software maintainability across different classifications and the relationships among the software qualities that contribute to software maintainability.
Linked assets
University of Southern California Dissertations and Theses
Conceptually similar
Software quality understanding by analysis of abundant data (SQUAAD): towards better understanding of life cycle software qualities
Techniques for methodically exploring software development alternatives
Automatic test generation system for software
Constraint-based program analysis for concurrent software
Architectural evolution and decay in software systems
Software architecture recovery using text classification -- recover and RELAX
The effects of required security on software development effort
Software connectors for highly distributed and voluminous data-intensive systems
Design-time software quality modeling and analysis of distributed software-intensive systems
Automated repair of presentation failures in Web applications using search-based techniques
Software security economics and threat modeling based on attack path analysis; a stakeholder value driven approach
A user-centric approach for improving a distributed software system's deployment architecture
Improved size and effort estimation models for software maintenance
Architecture and application of an autonomous robotic software engineering technology testbed (SETT)
Using metrics of scattering to assess software quality
Analysis of embedded software architecture with precedent dependent aperiodic tasks
Toward better understanding and improving user-developer communications on mobile app stores
Calibrating COCOMO® II for functional size metrics
Formalizing informal stakeholder inputs using gap-bridging methods
Incremental development productivity decline
Asset Metadata
Creator
Chen, Qianqian Celia
(author)
Core Title
Assessing software maintainability in systems by leveraging fuzzy methods and linguistic analysis
School
Viterbi School of Engineering
Degree
Doctor of Philosophy
Degree Program
Computer Science
Publication Date
04/30/2019
Defense Date
03/04/2019
Publisher
University of Southern California (original), University of Southern California. Libraries (digital)
Tag
natural language processing, OAI-PMH Harvest, software maintainability, software maintenance, software quality
Format
application/pdf
(imt)
Language
English
Contributor
Electronically uploaded by the author
(provenance)
Advisor
Boehm, Barry (committee chair), Gupta, Sandeep (committee member), Wang, Chao (committee member)
Creator Email
qchen2@oxy.edu,qianqiac@usc.edu
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-c89-161735
Unique identifier
UC11660208
Identifier
etd-ChenQianqi-7375.pdf (filename),usctheses-c89-161735 (legacy record id)
Legacy Identifier
etd-ChenQianqi-7375.pdf
Dmrecord
161735
Document Type
Dissertation
Format
application/pdf (imt)
Rights
Chen, Qianqian Celia
Type
texts
Source
University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions
The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the a...
Repository Name
University of Southern California Digital Library
Repository Location
USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA
Tags
natural language processing
software maintainability
software maintenance
software quality