A Unified Framework for Studying Architectural Decay of Software Systems

by

Joshua Garcia

A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the Requirements for the Degree
DOCTOR OF PHILOSOPHY (COMPUTER SCIENCE)

August 2014

Copyright 2014 Joshua Garcia

Dedication

To my parents, family, and loved ones.

Acknowledgments

I have had the pleasure of interacting with many people along the way to completing this dissertation. From those people, I have gained new knowledge, skills, and friendships. In some cases, I have been lucky to maintain the support of existing relationships. In fact, I could probably write a dissertation-length manuscript about how lucky and grateful I am to have had the backing of so many people; I hope the following pages will be a good-enough substitute.

First, I must express my deepest gratitude to my advisor, Nenad (Neno) Medvidovic, for guiding and supporting me since my undergraduate years. Neno has been simultaneously my greatest endorser and harshest critic. By being both, he has been encouraging while still providing the criticism necessary to make my work as strong as possible. Despite his incredibly busy schedule, he has worked long hours to (1) fund me and (2) help me write about and present my work. Not only has Neno nurtured my technical and communication skills, he has also provided me sympathy and understanding when unfortunate events have occurred in my life. I am lucky to have had such a competent, caring, and understanding advisor who is always ready to provide insight and inspiration.

My dissertation would not have made it this far without the guidance of my qualifying-exam and dissertation committees. I am grateful to the members of my committees: Professors Barry Boehm, Stan Settles, William G.J. Halfond, and Fei Sha.

Several people were instrumental in helping me to enter the USC Computer Science PhD program. Those people include Alexandre François, Prof. Tom Jordan, Prof. Laurent Itti, and Margery Berti.

Neno's research group and its alumni have (1) taught me an immense amount about conducting software-engineering research and surviving the PhD program and (2) shared their friendship with me. My fellow group members and group alumni have encouraged and inspired me. They have taken time out to review my papers and presentations, write papers with me, or listen to and critique my fledgling ideas. When I would be stuck or frustrated, they would take time out of their busy schedules to help guide me and pull me out of whatever rut I was in. In some cases, they would help me formulate and flesh out ideas or run experiments. Without their support, I would never have made it through. I am immensely grateful to all of them.

During my PhD journey, Daniel Popescu has been the research-group member that I have worked with the most. Together, we discussed new ideas, ran experiments, assisted in teaching courses, and wrote several papers. I was privileged to experience many trials, tribulations, and successes working with Daniel. All of these experiences with him have helped to improve my critical thinking, communication skills, and technical skills. He even spearheaded work that I eventually took over, which includes the architectural-smells work and our joint ESEC/FSE paper. I am immensely grateful for his friendship and the many opportunities I've had to work with him.
Despite juggling multiple projects, graduating, and starting up his own business, George has always been there to provide support and be a friend I can rely on. Many times, when I was stuck with a task, running the issue by George would reveal insightful ways of progressing. George would even be generous enough to pay for dinner and memorabilia from conference travels, invite me to his new office, and allow me to muck around with his vintage computers. When times were hard for me, George never hesitated to take significant time out of his busy schedule to provide guidance and help.

Chris Mattmann has shown me what it means to be simultaneously a professional software engineer, researcher, and manager in industry. I always appreciate his willingness to work with me and his enthusiasm for our collaborations. Chris was responsible for (1) giving me an amazing opportunity to work at the NASA Jet Propulsion Laboratory (JPL) and (2) spearheading one of the first software-engineering research papers that I worked on. I am grateful for all the opportunities and help he has given me.

The experiences I had with the other alumni or senior members of Neno's research group were satisfying and enjoyable in their own unique ways. I have been lucky to collaborate with and obtain help from Ivo Krka on multiple research papers and courses. Our office and lunch discussions about (1) strategy and tactics for conducting software-engineering research and (2) TV shows were always enjoyable and informative. The long hours working on robots and writing papers with Farshad were stressful but exciting. Discussions with Dave about juggling research and industry work at JPL were insightful and informative. I am grateful to Yuriy, Sam, and Chiyoung for their generosity in giving me advice. I am further honored to have collaborated with Yuriy and Sam and to have the opportunity to work more with Sam in the near future.

Not only did I interact with one generation of Neno's students, but I also had the pleasure of working with a new generation of them. Having the opportunity to write a paper with Jae was a great experience. Writing an ESEC/FSE paper with Reza was challenging and fulfilling. Arman, Daniel Link, Duc, Eder, Jae, Pooyan, Reza, and Youn: I am honored to have had a chance to work with and socialize with all of you. Thank you all for (1) the support you gave me on my defense day, (2) the wonderful mug, and (3) the thoughtful cards.

Besides working with Neno's research group, one of the great opportunities I have had was to collaborate with a variety of other researchers. I am lucky to have published an ASE paper with Igor, who has now become a close friend. I am honored to have been able to meet with G.J. extensively, publish an ESEC/FSE paper with him, and learn from him in general. It has been a pleasure writing proposals and papers with Prof. Yuanfang Cai and her student, Ran; researching Hadoop with Chris Douglas; and working on architecture recovery with Prof. Lin Tan and her students, Thibaud and Devin. Not only have I had the opportunity to work with researchers in North America, but I have had the honor of collaborating with researchers in Brazil. Those researchers include Professors Alessandro Garcia, Arndt von Staa, and Eduardo Santana de Almeida, and their students Isela Macia Bertran, Willian Oizumi, and Simone da Silva Amorim.

As a PhD student, I was given the opportunity to work at JPL, thanks to Chris Mattmann. Chris placed me on very interesting scientific computing and Big Data projects.
During my time at JPL, I was fortunate enough to work under Dan Crichton, Amy Braverman, and Dana Freeborn. Furthermore, I was honored to work with a variety of smart people. Even after leaving, I am lucky to still be able to talk to Luca about music or ask Sean Kelly for programming advice. I am also grateful to have had Mark Nakamura's help with some portions of the ARCADE study in this dissertation.

Besides working at JPL, I was fortunate enough to be a research assistant at the Southern California Earthquake Center (SCEC) during the first summer of my PhD. As part of that assistantship, I was given the opportunity to work on large-scale data and computation issues under the supervision of Phil Maechling and the guidance of Scott Callaghan. Furthermore, I was honored to have interacted more with Gideon Juve, whose advice was helpful for experiments I conducted as part of this dissertation.

The completion of key components of this dissertation has been made possible with the help of other students and researchers. I would like to thank Periklis Andritsos, Valerio Maggio, Spiros Mancoridis, Onaiza Maqbool, and Vassilios Tzerpos for their help with using or implementing their tools or techniques. I am grateful to Eric Dashofy, Chris Douglas, Bassel Haddad, Chris Mattmann, Chet Ramey, and Yongjie Zheng for their assistance with recovering the ground-truth architectures of ArchStudio, Bash, Hadoop, and OODT. My thanks to Rainer Koschke for his help with (1) using his tools and (2) my early studies of architecture recovery. I would also like to acknowledge Anita Singh for her help with the recovery-criterion analysis. Finally, I am grateful to Ashish Vaswani for his tips about natural language processing.

Going to conferences has been a great opportunity to get to know a variety of researchers and practitioners of software engineering. To all the students I have roomed with, thank you for sharing expenses and space with me. Those people include Kivanç, Naeem, Nupul, Sai, and Xu. Kivanç, may there always be time for anime, even though there really isn't. Nupul, the long discussions in our room and at USC have been entertaining. Naeem, thank you for being so forthcoming and helpful as a roommate and with my transition to Virginia. Sai, thank you for being a hospitable and respectful roommate, twice.

I am also grateful to the students with whom I have had significant interactions after meeting them at conferences. Robert Dyer, it has been a pleasure interacting with you at conferences and online. Suman, I am glad we have been able to keep in touch, even years after meeting at my first software-engineering conference. Adrian Kuhn, thank you for your support as I learned about the intersection of information retrieval and software engineering.

Performing administrative tasks at USC would be significantly more difficult without the support of the staff there. To Lizsl and Steve, thank you for all your help with the PhD and undergraduate programs. To Julie, I appreciate all your assistance with completing administrative tasks. The efforts of these and other staff members have helped to simplify a complex journey.

To my extended family (especially my family in the Philippines), I want to thank you for supporting my life and my formal education. You all have been there for me in a variety of ways; I very much appreciate all of that and will never forget it.
In particular, I would like to mention the love and support of my grandparents (Aiding and Will), my aunts (Khendy, Agnes, Dwin, and Sara), and uncles (Junie, Sonny, and Bovic).

My girlfriend's family and family friends have also extended their assistance to me and my girlfriend during the course of working on my PhD. To them, I want to extend my gratitude for all they have done for me and her. In particular, I would like to mention her parents (Malou and Hermie), her brothers (C.J. and Chris), her aunts (Margee and Betty), and her brother's girlfriend (Fran).

I need to acknowledge my doggy children, Nikki and Lily. Even though they can't read any of this, they have helped brighten my darker days, especially along my PhD journey. I wish Nikki were still around to help us celebrate the end of it. Nikki was the biggest loss during my PhD journey, and I will miss her always! Luckily, Lily was my little "return to happiness"; that's actually what her name means.

I am forever grateful to my parents, Jose and Patricia, for all their sacrifices, love, and support. Without their help, I could not have started the PhD, let alone finish it. Their presence in my life and support of my decisions has been an incredible blessing. From before my birth until now, all the time, money, and effort you expended on me is very much appreciated; I am extremely lucky to have the two of you, and I love you both.

Finally, I must express my love and appreciation to my long-time girlfriend, Catherine. She has been there for me and supported me through the many ups and downs of my PhD journey. In fact, she's been there for me since the end of high school. I am very lucky to have had her with me through so many journeys in my life, including this PhD journey that is coming to a close. I do not know how I could have made it through some of the most difficult times of my life without her. Cathy, I love you with all my heart!

Thank you all again for being part of my PhD journey! For everyone I did not mention by name, forgive me for not doing so; your help is much appreciated. I hope that, if I have not done so already, I will be able to give back for all that everyone has done for me.
Table of Contents

Dedication
Acknowledgments
List of Figures
Abstract
Chapter 1: Introduction
    1.1 Problems
    1.2 Contributions
Chapter 2: Defining Architectural Smells
    2.1 Definition
    2.2 Systems Under Discussion
    2.3 Initial Architectural Smells
Chapter 3: Formalizing and Detecting Architectural Smells
    3.1 Architectural Concept Formalization
    3.2 Formal Architectural-Smell Definitions
    3.3 Detection of Architectural Smells
Chapter 4: Framework for Ground-Truth Architecture Recovery
    4.1 Mapping Principles
    4.2 Ground-Truth Recovery Process
Chapter 5: Enhancing Architectural Recovery Using Concerns
    5.1 Obtaining Concerns through Probabilistic Topic Models
    5.2 Brick Recovery
    5.3 Concern Meta-Classification
    5.4 Brick Classification
Chapter 6: Framework for Studying Architectural Change and Decay
    6.1 Foundation
    6.2 ARCADE
Chapter 7: Evaluation
    7.1 Applying the Ground-Truth Recovery Framework
    7.2 A Comparative Analysis of Recovery Techniques
    7.3 An Empirical Study of Architectural Change and Decay in Open-Source Software Systems
Chapter 8: Related Work
    8.1 Architectural Smells
    8.2 Architecture Recovery
    8.3 Architectural Evolution
Chapter 9: Conclusion and Future Work
    9.1 Future Work
References

List of Figures

2.1 Structural View of the Grid Reference Architecture
2.2 System Stack Layers in MIDAS
2.3 The top diagram depicts Connector Envy involving communication and facilitation services. The bottom diagram shows Connector Envy involving a conversion service.
2.4 The Scattered Parasitic Functionality occurring across three components.
2.5 An Ambiguous Interface is implemented using a single public method with a generic type as a parameter.
2.6 The connector SoftwareEventBus is accompanied by a direct method invocation between two components.
3.1 Shorthand Predicates for Architectural Connectivity
4.1 Classification of the principles used for ground-truth recovery.
4.2 Different ways of applying mapping principles.
4.3 For the same groups and classes, a sequence of principles can result in significantly different groupings.
4.4 The ground-truth recovery of Hadoop 0.19.0 showing the main components of the Map/Reduce and HDFS subsystems. At this magnification, the figure is intended only as an illustration of Hadoop's complexity. This diagram can be found fully magnified at [9].
4.5 The process for obtaining ground-truth recoveries.
5.1 Overall approach for recovering components and connectors
5.2 An LDA model of Hadoop 0.19.0 with 40 topics
6.1 ARCADE's key components and the artifacts it uses and produces.
7.1 Summary information about systems recovered
7.2 An architectural diagram from Hadoop's documentation.
7.3 For Hadoop, an application principle overrides a domain principle.
7.4 Two views of Hadoop's ground-truth architecture.
7.5 Conceptual architecture of Bash.
7.6 Ground-truth architecture of Bash.
7.7 Conceptual architecture of ArchStudio.
7.8 Ground-truth architecture of ArchStudio.
7.9 Conceptual architecture of OODT.
7.10 Ground-truth architecture of OODT.
7.11 Data on the number of entities within components
7.12 Data on the number of core and utility components
7.13 The extent to which package or directory structures represent the architecture
7.14 Time spent by recoverers and certifiers, and the number and purpose of exchanged email messages

Abstract

The effort and cost of software maintenance tends to dominate other activities in a software system's lifecycle. A critical aspect of maintenance is understanding and updating a software system's architecture. However, the maintenance of a system's architecture is exacerbated by the related phenomena of architectural drift and erosion [164], collectively called architectural decay, which are caused by careless, unintended addition, removal, and/or modification of architectural design decisions. These phenomena make the architecture more difficult to understand and maintain and, in more severe cases, can lead to errors that result in wasted effort or loss of time or money. To deal with architectural decay, an engineer must be able to obtain (1) the current architecture of her system and understand (2) the symptoms of decay that may occur in a software system and (3) the manner in which architectures tend to change and the decay such change often causes.

The high-level contribution of this dissertation is a unified framework for addressing different aspects of architectural decay in software systems. This framework includes a catalog comprising an expansive list of architectural smells (i.e., architectural-decay instances) and a means of identifying such smells in software architectures; a framework for constructing ground-truth architectures to aid the evaluation of automated recovery techniques; ARC, a novel recovery approach that is accurate and extracts rich architectural abstractions; and ARCADE, a framework for the study of architectural change and decay. Together, these aspects of the unified framework are a comprehensive means of addressing the different problems that arise due to architectural decay.

This dissertation provides several evaluations of its different contributions: it presents case studies of architectural smells, describes lessons learned from applying the ground-truth recovery framework, compares architecture-recovery techniques along multiple accuracy measures, and contributes the most extensive empirical study of architectural change and decay to date. This dissertation's comparative analysis of architecture-recovery techniques addresses several shortcomings of previous analyses, including the quality of ground truth utilized, the selection of recovery techniques to be analyzed, and the limited number of perspectives from which the techniques are evaluated. The empirical study of architectural change and decay in this dissertation is the largest empirical study to date of its kind in long-lived software systems; the study comprises over 112 million source-lines-of-code and 460 system versions from a dozen software systems.

Chapter 1: Introduction

The effort and cost of software maintenance tends to dominate other activities in a software system's lifecycle. A critical aspect of maintenance is understanding and updating a software system's architecture. However, the maintenance of a system's architecture is exacerbated by the related phenomena of architectural drift and erosion [164], collectively called architectural decay, which are caused by careless, unintended addition, removal, and/or modification of architectural design decisions.
These phenomena make the architecture more difficult to understand and maintain and, in more severe cases, can lead to errors that result in wasted effort or loss of time or money. To deal with architectural decay, an engineer must be able to obtain (1) the current architecture of her system and understand (2) the symptoms of decay that may occur in a software system and (3) the manner in which architectures tend to change and the decay such change often causes.

The rest of this chapter is organized as follows. Section 1.1 describes the key problems that motivate this dissertation; Section 1.2 overviews the contributions that address those problems.

1.1 Problems

To determine the current architecture of a software system, a number of techniques have been proposed to help recover a system's architecture from its implementation [51, 91]. However, existing architecture-recovery techniques are known to suffer from inaccuracies and typically return different results as "the architecture" for the same system. In turn, this can lead to (1) difficulties in assessing a recovery technique, (2) risks in relying on a given technique, and (3) flawed strategies for improving a technique. These problems stem, to a large extent, from a lack of baselines, i.e., "ground truths," that would enable high-quality evaluation of architecture-recovery techniques. We refer to each such baseline as a ground truth. In this context, a ground truth is the architecture of a software system that has been verified as accurate by the system's architects or developers, who have intimate knowledge of the underlying application and problem domain. Such knowledge is often undocumented and thus less likely to be known to engineers who were not involved in constructing the system. There are examples in the literature of researchers who had a similar motivation to ours and who extensively studied and documented the architectures of existing applications, but without the involvement of the applications' own engineers (e.g., [32, 117]).

Besides the obstacles that exist in evaluating recovery techniques, these techniques are limited in the types of constructs that they can automatically extract from a system's implementation. Automated recovery techniques mainly map implementation-level entities to high-level system components by clustering the entities and taking the resulting clusters to be components [17, 90, 110, 15, 127, 169]. However, existing automated recovery techniques obtain neither a system's concerns nor its connectors. Concerns are associated with a system's components and are the responsibilities, concepts, or roles in a software system. Connectors play a critical role in mediating component interactions [164]. Engineers need to examine both kinds of elements in order to analyze architectural decay in a software system. By not recovering the concerns associated with components, the prevailing coupling-and-cohesion-based clustering methods make it difficult to understand the meaning of a cluster or whether a cluster truly represents a component. Recovery techniques for connectors, in turn, uniformly depend on significant human involvement. In particular, existing techniques for connector recovery use patterns or queries to identify the connectors within a system [55, 75, 124, 152]. These techniques require an architect to write a pattern or query for each implementation variant of every possible connector type. Creating such specifications is a manual task that can be time consuming and error prone.
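To make the clustering step just described concrete, the sketch below is a minimal, deliberately naive Java illustration (with invented names; it is not any published recovery technique): it groups classes into candidate components using structural dependencies alone. Note that nothing in its output captures concerns or connectors, which is precisely the limitation discussed above.

    import java.util.*;

    // A naive dependency-based "recovery": every class starts in its own
    // cluster, and two clusters are merged whenever a class in one
    // references a class in the other. Real techniques weigh coupling and
    // cohesion instead of blindly merging, but the shape of the computation
    // (implementation entities in, clusters out) is the same.
    public class NaiveClusteringRecovery {

        /** deps maps each class name to the names of classes it references. */
        public static Collection<Set<String>> recover(Map<String, Set<String>> deps) {
            Map<String, Set<String>> clusterOf = new HashMap<>();
            for (String cls : deps.keySet()) {
                Set<String> seed = new HashSet<>();
                seed.add(cls);
                clusterOf.put(cls, seed);           // each class seeds its own cluster
            }
            for (Map.Entry<String, Set<String>> e : deps.entrySet()) {
                for (String target : e.getValue()) {
                    Set<String> a = clusterOf.get(e.getKey());
                    Set<String> b = clusterOf.get(target);
                    if (b == null || a == b) continue;   // external class, or already merged
                    a.addAll(b);                          // merge the dependent clusters
                    for (String member : b) clusterOf.put(member, a);
                }
            }
            return new HashSet<>(clusterOf.values());     // candidate "components"
        }
    }

Each returned set is a candidate component, but without recovered concerns an engineer cannot tell what any cluster means, and no connectors are identified at all.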
To deal with architectural decay once an architecture is recovered, the instances of such decay must be identified. Although decay has been studied at the code level, research on the symptoms of architectural decay that can occur, and on the means of detecting that decay, has been highly limited. The different kinds of architectural constructs that can decay (e.g., components, connectors, configurations, and interfaces) and the consequences of such decay are poorly understood. In turn, the ill-conceived design decisions that negatively affect the maintainability of a software system's architecture have been largely neglected.

Although the different symptoms of architectural decay have been neglected, the study of software architecture has, from its very inception, recognized architectural decay in general as a regularly occurring phenomenon in long-lived systems. At the same time, there is a relative dearth of empirical data about the nature of architectural change and the actual extent of decay in existing systems. With such empirical data, engineers could better understand architectural change, the decay associated with (and often caused by) that change, and ultimately, the very nature of architectural decay. In turn, such a study can allow engineers to determine strategies for stemming that decay.

1.2 Contributions

The high-level contribution of this dissertation is a unified framework for addressing different aspects of architectural decay in software systems. The framework provides (1) a means of studying and identifying architectural decay, including instances of decay and the manner in which decay evolves; (2) an approach for recovering an accurate and rich view of the architecture of an implemented system; and (3) support for evaluating and improving techniques for recovering an architecture from an implementation. Together, these aspects of the framework are a comprehensive means of addressing the different problems that arise due to architectural decay. The specific individual contributions of this dissertation are as follows.

Conceptualization of Architectural Smells. We define the concept of architectural smells, which are instances of architectural decay affecting the structure of a system's architecture. We then expound upon a set of four initial smells by describing them in detail and illustrating their occurrence in case studies from the research literature and from our own architecture-recovery [32, 118] and industrial maintenance efforts.

Catalog of Architectural Smells and Mechanisms for Detecting Smells. We formalize architectural concepts in order to incorporate the notion of concerns into architecture and to rigorously distinguish between components and connectors. This formalization of architectural concepts allows us to formalize an expanded list of architectural-smell definitions. We directly use these formalized smell definitions and architectural concepts to produce algorithms that detect different types of smells automatically.

A Framework for Obtaining Ground-Truth Architectures. We present a framework intended to aid the recovery of ground-truth architectures. The framework defines a set of principles and a process that results in a reliable ground-truth recovery. The process involves an architect or long-term contributor of the system in a limited yet meaningful way. The framework's principles, referred to as mapping principles, serve as rules or guidelines for grouping code-level entities into architectural elements and for identifying the elements' interfaces.
The framework bases these principles on four types of information used to obtain a ground truth: generic information (e.g., system-module dependencies), domain information (e.g., architectural-style rules), application information (e.g., the purpose of the source-code elements), and information about the system context (e.g., the programming language used).

We further discuss our findings in obtaining the ground-truth architectures of four existing systems. The systems in our study come from several problem domains, including large-scale data-intensive computing, architectural modeling and analysis, and operating-system command-line shells. These software systems have been used and maintained for years, are written in Java or C, and range from 70 KSLOC to 280 KSLOC. For each system, we had access to one or two of its architects or key developers. The variety of the systems allowed us to form some general insights about obtaining ground-truth architectures. We also discuss our experience and lessons learned in enlisting the help of the systems' engineers.

Enhancement of Architectural Recovery Using Concerns. We provide a novel technique that extracts system concerns and leverages them to automate the recovery of both components and connectors. The objective of this work is to obtain automatically recovered software architectures that are more comprehensive and more accurate than those yielded by current methods.

To better understand the accuracy of existing architecture-recovery techniques and to address the shortcomings encountered in previous comparative studies of such techniques, we present a comparative analysis of six automated recovery techniques, including our novel recovery technique. Each recovery technique is applied to a set of eight ground-truth architectures and evaluated for accuracy using three architecture-recovery metrics: one widely used metric and two novel metrics. Our results indicate that two of the selected recovery techniques (one of which is our own) are superior to the rest along multiple measures. However, the results also show that there is significant room for improvement in all of the studied techniques. In fact, while the accuracy of individual techniques varies across the different subject systems, on the whole the techniques performed surprisingly poorly. We discuss the threats to our study's validity, the possible reasons behind our results, and several possible avenues of future research in automated architecture recovery.

A Workbench for Evaluating Architectural Change and Decay. To study architectural change, the decay associated with (and often caused by) that change, and ultimately, the very nature of architectural decay, we present a novel approach, Architecture Recovery, Change, And Decay Evaluator (ARCADE). ARCADE is a software workbench that employs (1) a suite of architecture-recovery techniques, (2) a catalogue of architectural-smell definitions, (3) accompanying smell-detection algorithms, and (4) a set of metrics for measuring different aspects of architectural change and decay. ARCADE leverages these four elements to construct an expansive view showcasing the actual (as opposed to idealized) evolution of a software system's architecture. While analogous analyses have been attempted at the level of system implementation [95, 68, 88, 37, 52, 132], ARCADE represents the first solution of which we are aware that enables investigating such issues at the level of architecture.
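The following Java sketch illustrates, at a schematic level, the kind of pipeline ARCADE automates; the interfaces and names here are hypothetical stand-ins rather than ARCADE's actual API. Each version of a system is run through a recovery technique, smell detection is applied to the recovered architecture, and a change metric is computed between consecutive versions.

    import java.util.List;

    // Hypothetical stand-ins for the workbench's ingredients: a recovery
    // technique, smell detectors, and a metric comparing two architectures.
    interface Architecture { }
    interface RecoveryTechnique { Architecture recover(String versionPath); }
    interface SmellDetector { List<String> detect(Architecture arch); }
    interface ChangeMetric { double compare(Architecture older, Architecture newer); }

    public class DecayStudy {
        /** versionPaths must be ordered from oldest to newest release. */
        public static void run(List<String> versionPaths,
                               RecoveryTechnique recovery,
                               List<SmellDetector> detectors,
                               ChangeMetric metric) {
            Architecture previous = null;
            for (String path : versionPaths) {
                Architecture current = recovery.recover(path);   // one version's architecture
                for (SmellDetector detector : detectors) {       // decay within this version
                    System.out.println(path + " smells: " + detector.detect(current));
                }
                if (previous != null) {                          // change across versions
                    System.out.println(path + " change vs. previous: "
                            + metric.compare(previous, current));
                }
                previous = current;
            }
        }
    }

Plugging different recovery techniques, smell detectors, and metrics into such a loop is what allows contrasting multiple views of the same evolution history.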
To demonstrate the kinds of research questions that can be pursued using ARCADE, we have performed an empirical study in which we analyzed several hundred versions of 12 open-source Apache systems, totaling over 112 million source-lines-of-code (SLOC).

Chapter 2: Defining Architectural Smells

As the cost of developing software increases, so does the incentive to evolve and adapt existing systems to meet new requirements, rather than building entirely new systems. Today, it is not uncommon for a software application family to be maintained and upgraded over a span of five years, ten years, or longer. However, in order to successfully modify a legacy application to support new functionality, run on new platforms, or integrate with new systems, evolution must be carefully managed and executed. Frequently, it is necessary to refactor [125], or restructure the design of a system, so that new requirements can be supported in an efficient and reliable manner.

The most commonly used way to determine how to refactor is to identify bad code smells [57, 125]. Code smells are implementation structures that negatively affect system lifecycle properties, such as understandability, testability, extensibility, and reusability; that is, code smells ultimately result in maintainability problems. Common examples of code smells include very long parameter lists and duplicated code (i.e., clones). Code smells are defined in terms of implementation-level constructs, such as methods, classes, parameters, and statements. Consequently, refactoring methods to correct code smells also operate at the implementation level (e.g., moving a method from one class to another, adding a new class, or altering the class inheritance hierarchy).

While detection and correction of code smells is one way to improve system maintainability, some maintainability issues originate from poor use of software architecture-level abstractions (components, connectors, styles, and so on) rather than implementation constructs. In our previous work [82], we introduced the notion of architectural bad smells and identified four representative smells. Architectural bad smells are combinations of architectural constructs that induce reductions in system maintainability and thus represent instances of architectural decay. Architectural smells are analogous to code smells because they both represent common "solutions" that are not necessarily faulty or errant, but still negatively impact software quality. In this chapter, we describe our four original architectural smells in detail and illustrate their occurrence in case studies from the research literature and from our own architecture-recovery [32, 118] and industrial maintenance efforts.

The remainder of this chapter is organized as follows. Section 2.1 explains the characteristics and significance of architectural smells. Section 2.2 introduces two long-term software-maintenance efforts on industrial systems, as well as case studies from the research literature, that we use to illustrate our four representative architectural smells. Section 2.3 describes the four architectural smells in detail and illustrates the impact of each smell through concrete examples drawn from the systems mentioned in Section 2.2.

2.1 Definition

In this section, we define what constitutes an architectural smell and discuss the important properties of architectural smells. The term architectural smell was originally used in [100].
The authors of [100] define an architectural smell as a bad smell (an indication of an underlying problem) that occurs at a higher level of a system's granularity than a code smell. However, we found that this definition does not recognize that both code and architectural smells specifically affect lifecycle qualities, not just any system quality. Therefore, we define an architectural smell as a commonly used architectural decision that negatively impacts system lifecycle qualities. Architectural smells may be caused by applying a design solution in an inappropriate context, mixing combinations of design abstractions that have undesirable emergent behaviors, or applying design abstractions at the wrong level of granularity. Architectural smells must affect lifecycle properties, such as understandability, testability, extensibility, and reusability, but they may also have harmful side effects on other quality properties, such as performance and reliability. Architectural smells are remedied by altering the internal structure of the system and the behaviors of internal system elements without changing the external behavior of the system.

Besides defining architectural smells explicitly in terms of lifecycle properties, we extend the definition of architectural smell found in [100] in three ways.

Our first extension to the definition is our explicit capture of architectural smells as design instances that are independent of the engineering processes that created the design. That is, human organizations and processes are orthogonal to the definition and impact of a specific architectural smell. In practical terms, this means that the detection and correction of architectural smells is not dependent on an understanding of the history of a software system. For example, an independent analyst should be able to audit a documented architecture and indicate possible smells without knowing about the development organization, management, or processes.

For our second extension to the definition, we do not differentiate between architectural smells that are part of an intended design (e.g., a set of UML specifications for a system that has not yet been built) as opposed to an implemented design (e.g., the implicit architecture of an executing system). Furthermore, we do not consider the non-conformance of an implemented architecture to an intended architecture, by itself, to be an architectural smell, because an implemented architecture may improve maintainability by violating its intended design. For example, it is possible for an intended architecture of a system to include poor design elements, while the (non-conforming) implemented architecture replaces those elements with better solutions.

For our last extension, we attempt to facilitate the detection of architectural smells through specific, concrete definitions captured in terms of standard architectural building blocks: components, connectors, interfaces, and configurations. Increasingly, software engineers reason about their systems in terms of these concepts [157, 164], so in order to be readily applicable and maximally effective, our architectural-smell definitions similarly utilize these abstractions (see Section 2.3). The definition in [100] utilizes neither explicit architectural interfaces nor first-class connectors in its smells.

In many contexts, a design that exhibits a smell will be justified by other concerns.
Architectural smells always involve a trade-off between different properties, and the system architects must determine whether action to correct the smell will result in a net benefit. Furthermore, refactoring to reduce or eliminate an architectural smell may involve risk, and it almost always requires an investment of developer effort.

2.2 Systems Under Discussion

Our experience with two long-term software projects brought us to the realization that some commonly used design structures adversely affect system maintainability. In this section, we introduce these projects by summarizing their context and objectives. Later in the chapter, we utilize specific examples from these projects to illustrate the impact of architectural bad smells.

Maintenance of large-scale software systems includes both architectural recovery and refactoring activities. Architectural recovery is necessary when a system's conceptual architecture is unknown or undocumented. Architectural refactoring is required when a system's architecture is determined to be unsatisfactory and must be altered. We discovered architectural bad smells during both an architectural recovery effort (summarized in Section 2.2.1) and an architectural refactoring effort (summarized in Section 2.2.2). To substantiate our observations, we found further examples of architectural bad smells that appear in recovery and refactoring efforts published in the research literature.

2.2.1 Grid Architecture Recovery

An extensive study of grid-system [56] implementations contributed to our collection of architectural smells and to our insights about them. Grid technologies allow heterogeneous organizations to solve complex problems using shared computing resources. Four years ago, we conducted a pilot study [115] in which we extracted and studied the architecture of five widely used grid technologies and compared their architectures to the published grid reference architecture [56]. We subsequently completed a more comprehensive grid architecture-recovery project and recently published a report [118] on the architectures of eighteen grid technologies, including a new reference architecture for the grid. The examined grid systems were developed in C, C++, or Java and contained up to 2.2 million SLOC (source lines of code). Many of these systems included similar design elements that have a negative effect on quality properties.

Figure 2.1: Structural View of the Grid Reference Architecture

Figure 2.1 shows the identified reference architecture for the grid. A grid system is composed of four subsystem types: Application, Collective, Resource, and Fabric. Each subsystem type is usually instantiated multiple times. An Application can be any client that needs grid services and is able to use an API that interfaces with Collective or Resource components. The components in the Collective subsystem are used to orchestrate and distribute data and grid jobs to the various available resources in a manner consistent with the security and trust policies specified by the institutions within a grid system (i.e., the virtual organization).
The Resource subsystem contains components that perform the individual operations required by a grid system by leveraging available lower-level Fabric components. Fabric components offer access to computational and data resources on an individual node (e.g., access to file-system operations). Each subsystem type uses different interaction mechanisms to communicate with other subsystem types, as noted in Figure 2.1. The interaction mechanisms are described in [118].

2.2.2 MIDAS Architecture Refactoring

In collaboration with an industrial partner, for the last three years we have been developing a lightweight middleware platform, called MIDAS, for distributed sensor applications [107, 155]. Over ten software engineers in three geographically distributed locations contributed to MIDAS in multiple development cycles to address changing and growing requirements. In its current version, MIDAS implements many high-level services (e.g., transparent fault-tolerance through component replication) that were not anticipated at the commencement of the project. Additionally, MIDAS was ported to a new operating system (Linux) and programming language (C++), and capabilities tailored for a new domain (mobile robotics) were added. As a consequence, the MIDAS architecture was forced to evolve in unanticipated ways, and the system's complexity grew substantially. In its current version, the MIDAS middleware platform consists of approximately 100 KSLOC in C++ and Java. The iterative development of MIDAS eventually caused several architectural elements to lose conceptual coherence (e.g., by providing multiple services). As a consequence, we recently spent three person-months refactoring the system to achieve better modularity, understandability, and adaptability. While performing the refactoring, we again encountered architectural structures that negatively affected system lifecycle properties.

Figure 2.2 shows a layered view of the MIDAS middleware platform. At the bottom of the MIDAS architecture is a virtual-machine layer that allows the middleware to be deployed efficiently on heterogeneous OS and hardware platforms. The host-abstraction facilities provided by the virtual machine are leveraged by the middleware's architectural constructs at the layer above. These architectural constructs enable a software organization to directly map its system's architecture to the system's implementation. Finally, these constructs are used to implement advanced distributed services such as fault-tolerance and resource discovery.

Figure 2.2: System Stack Layers in MIDAS

2.2.3 Studies from Research Literature

Given the above experiences, we examined the work on architectural recovery and refactoring published in the research literature [32, 67, 71, 166], which helped us to understand architectural design challenges and common bad smells. In this chapter, we refer to examples from a case study that extracted and analyzed the architecture of Linux [32]. In this study, Bowman et al. created a conceptual architecture of the Linux kernel based on available documentation and then extracted the architectural dependencies within the kernel source code (800 KSLOC). They concluded that the kernel contained a number of design problems, such as unnecessary and unintended dependencies.

2.3 Initial Architectural Smells

This section describes four architectural smells in detail. We define each architectural smell in terms of participating architectural elements: components, connectors, interfaces, and configurations.
Components are computational elements that implement application functionality in a software system [156]. Connectors provide application-independent interaction facilities, such as transfer of data and control [123]. Interfaces are the interaction points between components and connectors. Finally, configurations represent the set of associations and relationships between components and/or connectors. We provide a generic schematic view of each smell captured in one or more UML diagrams. Architects can use diagrams such as these to inspect their own designs for architectural smells.

2.3.1 Connector Envy

Description. Components with Connector Envy encompass extensive interaction-related functionality that should be delegated to a connector. Connectors provide the following types of interaction services: communication, coordination, conversion, and facilitation [123]. Communication concerns the transfer of data (e.g., messages, computational results, etc.) between architectural elements. Coordination concerns the transfer of control (e.g., the passing of thread execution) between architectural elements. Conversion is concerned with the translation of differing interaction services between architectural elements (e.g., conversion of data formats, types, protocols, etc.). Facilitation describes the mediation, optimization, and streamlining of interaction (e.g., load balancing, monitoring, and fault tolerance). Components that extensively utilize functionality from one or more of these four categories suffer from the Connector Envy smell.

Figure 2.3a shows a schematic view of one Connector Envy smell, where ComponentA implements communication and facilitation services. ComponentA imports a communication library, which implies that it manages the low-level networking facilities used to implement remote communication. The naming, delivery, and routing services handled by remote communication are a type of facilitation service. Figure 2.3b depicts another Connector Envy smell, where ComponentB performs a conversion as part of its processing. The interface of ComponentB, called process, is implemented by the PublicInterface class of ComponentB. PublicInterface implements its process method by calling a conversion method that transforms a parameter of type Type into a ConcernType.

Figure 2.3: The top diagram depicts Connector Envy involving communication and facilitation services. The bottom diagram shows Connector Envy involving a conversion service.

Quality Impact and Trade-offs. Coupling connector capabilities with component functionality reduces reusability, understandability, and testability. Reusability is reduced by the creation of dependencies between interaction services and application-specific services, which make it difficult to reuse either type of service without including the other. The overall understandability of the component decreases because disparate concerns are commingled. Lastly, testability is affected by Connector Envy because application functionality and interaction functionality cannot be tested separately. If a test fails, either the application logic or the interaction mechanism could be the source of the error.
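The structure of Figure 2.3b can be rendered in a few lines of Java. In the sketch below, RawRecord and DomainRecord are invented stand-ins for the figure's Type and ConcernType, and the conversion logic is purely illustrative.

    // Invented stand-ins for the figure's Type and ConcernType.
    class RawRecord { final String payload; RawRecord(String p) { payload = p; } }
    class DomainRecord { final String value; DomainRecord(String v) { value = v; } }

    class CoreClassB {
        void processCoreConcern(DomainRecord r) {
            System.out.println("application logic handles: " + r.value);
        }
    }

    // ComponentB's single public entry point, as in Figure 2.3b.
    public class PublicInterface {
        private final CoreClassB b = new CoreClassB();

        public void process(RawRecord p) {
            // Connector Envy: a conversion service (connector work) is fused
            // with the component's application-specific processing, so the
            // two concerns cannot be reused or tested separately.
            b.processCoreConcern(convert(p));
        }

        private DomainRecord convert(RawRecord p) {
            // e.g., new PublicInterface().process(new RawRecord(" order-42 "));
            return new DomainRecord(p.payload.trim().toUpperCase());
        }
    }

A refactoring would move convert behind an explicit connector or adapter, so that the core concern could be exercised, and tested, without the conversion logic.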
As an example, consider a MapDisplay component that draws a map of the route followed by a robot through its environment. The component expects position data to arrive as Cartesian coordinates and converts that data to a screen coordinate system that uses only positive x and y values. The MapDisplay suffers from Connector Envy because it performs conversion of data formats between the robot controller and the user interface. If the MapDisplay is used in a new, simulated robot whose controller represents the world in screen coordinates, the conversion mechanism becomes superfluous, yet the MapDisplay cannot be reused intact without it. Errors in the displayed location of the robot could arise from incorrect data conversion or from some other part of the MapDisplay, yet the encapsulation of the adapter within the MapDisplay makes it difficult to test and verify in isolation.

The Connector Envy smell may be acceptable when performance is of higher priority than maintainability. More specifically, explicitly separating the interaction mechanism from the application-specific code creates an extra level of indirection. In some cases, it may also require the creation of additional threads or processes. Highly resource-constrained applications that use simple interaction mechanisms without rich semantics may benefit from retaining this smell. However, making such a trade-off simply for efficiency reasons, without considering the maintainability implications of the smell, can have a disastrous cumulative effect as multiple incompatible connector types are placed within multiple components that are used in the same system.

Example from Industrial Systems. The Gfarm Filesystem Daemon (gfsd) from a grid technology called Grid Datafarm [163] is a concrete example of a component with Connector Envy that follows the form described in Figure 2.3. The gfsd is a Resource component and runs on a Resource node as depicted in Figure 2.1. The gfsd imports a library that is used to build the lightweight remote procedure call (RPC) mechanism within the gfsd. This built-in RPC mechanism provides no interfaces to other components and, thus, is used solely by the gfsd. While the general schematic in Figure 2.3 shows only an instance of communication and facilitation, this instance of the smell also introduces coordination services by implementing a procedure-call mechanism. The interfaces of the gfsd provide remote file operations, file replication, user authentication, and node-resource status monitoring. These interfaces and the gfsd's RPC mechanism enable the
Both modiability and understandability are adversely aected by having the overwhelming majority of the gfsd's functions involve the use or construction of Grid Datafarm's RPC mechanism. It is possible that since grid technologies need to be ecient, the creators of Grid Datafarm may have intentionally built a gfsd with Connector Envy in order to avoid the performance eects of the indirection required for a fully separated connector. An- other fact to consider is that Grid Datafarm has been in use for at least seven years and has undergone a signicant number of updates that have expanded the gfsd's func- tionality. This has likely resulted in further commingling of connector-functionality with application-specic functionality. 20 2.3.2 Scattered Parasitic Functionality Description. Scattered Parasitic Functionality describes a system where multiple com- ponents are responsible for realizing the same high-level concern and, additionally, some of those components are responsible for orthogonal concerns. This smell violates the principle of separation of concerns in two ways. First, this smell scatters a single concern across multiple components. Secondly, at least one component addresses multiple orthog- onal concerns. In other words, the scattered concern infects a component with another orthogonal concern, akin to a parasite. Combining all components involved creates a large component that encompasses orthogonal concerns. Scattered Parasitic Functionality may be caused by cross-cutting concerns that are not addressed properly. Note that, while similar on the surface, this architectural smell diers from the shotgun surgery code smell [57] because the code smell is agnostic to orthogonal concerns. Figure 2.4 depicts three components that are each responsible for the same high-level concern called SharedConcern, while ComponentB and ComponentC are responsible for orthogonal concerns. The three components in Figure 2.4 cannot be combined without creating a component that deals with more than one clearly-dened concern. ComponentB and ComponentC violate the principle of separation of concerns since they are both responsible for multiple orthogonal concerns. Quality Impact and Trade-os. The Scattered Parasitic Functionality smell ad- versely aects modiability, understandability, testability, and reusability. Using the concrete illustration from Figure 4, modiability, testability, and understandability of the 21 access ComponentA + SharedConcern ClassA ComponentB + SharedConcern + ConcernB ClassB ComponentC + SharedConcern + ConcernC ClassC Figure 2.4: The Scattered Parasitic Functionality occurring across three components. system are reduced because when SharedConcern needs to be changed, there are three possible places where SharedConcern can be updated and tested. Another facet reducing understandability is that both ComponentB and ComponentC also deal with orthogo- nal concerns. Designers cannot reuse the implementation of SharedConcern depicted in Figure 2.4 without using all three components in the gure. One situation where scattered functionality is acceptable is when the SharedConcern needs to be provided by multiple o-the-shelf (OTS) components whose internals are not available for modication. Example from Industrial Systems. Bowman et al.'s study [32] illustrates an oc- currence of Scattered Parasitic Functionality in the widely used Linux operating system. 
The case study reveals that Linux's status reporting of execution processes is actually implemented throughout the kernel, even though Linux's conceptual architecture indicates that status reporting should be implemented in the PROC file system component. Consequently, the status-reporting functionality is scattered across components in the system. This instance of the smell resulted in two unintended dependencies on the PROC file system: the Network Interface and Process Scheduler components became dependent on the PROC file system.

The PROC file system example suffers from the same diminished lifecycle properties as the notional system described in the schematic in Figure 2.4. Modifiability and testability are reduced because updates to status-reporting functionality result in multiple places throughout the kernel that can be tested or changed. Furthermore, understandability is decreased by the additional associations that Scattered Parasitic Functionality creates among components. The developers of Linux may have implemented the operating system in this manner because status reporting for different components may be assigned to each one of those components. Although it may at first glance make sense to distribute such functionality across components, more maintainable solutions exist, such as implementing a monitoring connector to exchange status-reporting data or creating an aspect [87] for status reporting.

2.3.3 Ambiguous Interfaces

Description. Ambiguous Interfaces are interfaces that offer only a single, general entry-point into a component. This smell appears especially in event-based publish-subscribe systems, where interactions are not explicitly modeled and multiple components exchange event messages via a shared event bus. In this class of systems, Ambiguous Interfaces undermine static dependency analysis for determining execution flows among the components. They also appear in systems where components use general types such as strings or integers to perform dynamic dispatch. Unlike other constructs that reduce static analyzability, such as function pointers and polymorphism, Ambiguous Interfaces are not programming-language constructs; rather, Ambiguous Interfaces reduce static analyzability at the architectural level and can occur independently of the implementation-level constructs that realize them.

Two criteria define the Ambiguous Interface smell depicted in Figure 2.5. First, an Ambiguous Interface offers only one public service or method, although its component offers and processes multiple services. The component accepts all invocation requests through this single entry-point and internally dispatches to other services or methods. Second, since the interface offers only one entry-point, the accepted type is consequently overly general. Therefore, a component implementing this interface claims to handle more types of parameters than it will actually process by accepting the parameter P of generic type GeneralType. The decision whether the component filters or accepts an incoming event is part of the component implementation and is usually hidden from other elements in the system.

Quality Impact and Trade-offs. Ambiguous Interfaces reduce a system's analyzability and understandability because an Ambiguous Interface does not reveal which services a component is offering. A user of this component has to inspect the component's implementation before using its services.
Figure 2.5: An Ambiguous Interface is implemented using a single public method with a generic type as a parameter; process(GeneralType P) internally tests the content of P (e.g., P.type == TypeA, P.type == TypeB) to dispatch to the appropriate service.

Additionally, in an event-based system, Ambiguous Interfaces cause a static analysis to over-generalize potential dependencies. They indicate that all subscribers attached to an event bus are dependent on all publishers attached to that same bus. Therefore, the system appears to be more widely coupled than what actually manifests at run time. Even though systems utilizing the event-based style typically have Ambiguous Interfaces, components utilizing direct invocation may also suffer from them. Although dependencies between such components are statically recoverable, the particular service being invoked by the calling component may not be if the called component contains a single interface that is an entry-point to multiple services.

The following example illustrates the negative effect of this wide coupling. Consider an event-based system containing n components, where all components are connected to a shared event bus. Each component can publish events and subscribes to all events. A change to one publisher service of a component could impact n - 1 components, since all components appear to be subscribed to the event, even if they immediately discard it. A more precise interface would increase understandability by narrowing the number of possible subscribers to the publishing service. Continuing with the above example, if each component listed its detailed subscriptions, a maintenance engineer could see which m components (m <= n) would be affected by changing the specific publisher service. Therefore, the engineer would only have to inspect the change effect on m components instead of n - 1. Often, components exchange events in long interaction sequences; in these cases, the Ambiguous Interface smell forces an architect to repeatedly determine component dependencies for each step in the interaction sequence.

Example from Industrial Systems. A significant number of event-based middleware systems suffer from the form of the Ambiguous Interface smell depicted in Figure 2.5. An example of a widely used system that follows this design is the Java Messaging Service (JMS) [73]. Consumers in JMS receive generic Message objects through a single receive method. The message objects are typically cast to specific message types before any one of them is processed. Another event-based system that acts in this manner is the Information Bus [137]. In this system, publishers mark the events they send with subjects, and consumers can subscribe to a particular subject. Consumers may subscribe to events using a partially specified subject or through wild cards, which encourages programmers to subscribe to more events than they actually process.

The event-based mechanism used by MIDAS conforms to the diagram in Figure 2.5. In the manner described above, MIDAS is able to easily achieve dynamic adaptation. Through the use of DLLs, MIDAS can add, remove, and replace components during run-time, even in a highly resource-constrained sensor-network system. As mentioned in Section 2.2.2, we have recently spent three person-months refactoring the system to achieve better modularity, understandability, and adaptability.
During the refactoring, determining dependencies and causality of events in the system was difficult due to the issue of over-generalized potential dependencies described above. An extensive amount of recovery was needed to determine which dependencies occur between components and the context of those dependencies.

2.3.4 Extraneous Adjacent Connector

Description. The Extraneous Adjacent Connector smell occurs when two connectors of different types are used to link a pair of components. Eight types of connectors have been identified and classified in the literature [123]. In this chapter, we focus primarily on the impact of combining two particular types of connectors, procedure-call and event connectors, but this smell applies to other connector types as well. Figure 2.6 shows a schematic view of two components that communicate using both a procedure-call connector and an event-based connector.

Figure 2.6: The connector SoftwareEventBus is accompanied by a direct method invocation between two components (an object of type ClassB in ComponentB directly instantiates ClassA and invokes a.operation()).

In an event-based communication model, components transmit messages, called events, to other components asynchronously and possibly anonymously. In Figure 2.6, ComponentA and ComponentB communicate by sending events to the SoftwareEventBus, which dispatches each event to its recipient. Procedure calls transfer data and control through the direct invocation of a service interface provided by a component. As shown in Figure 2.6, an object of type ClassB in ComponentB communicates with ComponentA using a direct method call.

Quality Impact and Trade-offs. An architect's choice of connector types may affect particular lifecycle properties. For example, procedure calls have a positive effect on understandability, since direct method invocations make the transfer of control explicit and, as a result, control dependencies become easily traceable. On the other hand, event connectors increase reusability and adaptability because senders and receivers of events are usually unaware of each other and, therefore, can more easily be replaced or updated. However, having two architectural elements that communicate over different connector types in parallel carries the danger that the beneficial effects of each individual connector may cancel each other out.

While method calls increase understandability, using an additional event-based connector reduces this benefit because it is unclear whether, and under what circumstances, additional communication occurs between ComponentA and ComponentB. For example, it is not evident whether ComponentA's functionality needs to invoke services in ComponentB. Furthermore, while an event connector can enforce an ordered delivery of events (e.g., using a FIFO policy), the procedure call might bypass this ordering. Consequently, understandability is affected, because a software maintenance engineer has to consider the (often unforeseen and even unforeseeable) side effects the connector types may have on one another. On the other hand, the direct method invocation potentially cancels the positive impact of the event connector on adaptability and reusability. In cases where only an event connector is used, components can be replaced during system runtime or redeployed onto different hosts. In the scenario in Figure 2.6, ComponentA's implementation cannot be replaced, moved, or updated during runtime without invalidating the direct reference ComponentB has on ClassA.
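The configuration in Figure 2.6 can be sketched in a few lines of Java. This is a minimal illustration with hypothetical names (SoftwareEventBus, doWork), not an excerpt from any of the systems discussed; it shows the two parallel connectors and why the direct reference defeats the event connector's adaptability benefit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Minimal event connector: subscribers receive every published event.
class SoftwareEventBus {
    private final List<Consumer<Object>> subscribers = new ArrayList<>();
    void subscribe(Consumer<Object> s) { subscribers.add(s); }
    void send(Object event) { subscribers.forEach(s -> s.accept(event)); }
}

class ComponentA {
    void operation() { /* application logic */ }
}

class ComponentB {
    private final SoftwareEventBus bus;
    private final ComponentA a = new ComponentA(); // direct reference

    ComponentB(SoftwareEventBus bus) { this.bus = bus; }

    void doWork() {
        bus.send("work-started"); // connector 1: event-based, anonymous
        a.operation();            // connector 2: procedure call, which
                                  // bypasses the bus's ordering and pins
                                  // ComponentA's implementation in place
    }
}
```

As long as the field a exists, ComponentA cannot be swapped out at runtime without breaking ComponentB, regardless of how loosely coupled the bus makes the rest of their interaction.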
This smell may be acceptable in certain cases. For example, standalone desktop applications often use both connector types to handle user input via a GUI. In these cases, event connectors are not used for adaptability benefits, but to enable asynchronous handling of GUI events from the user.

Example from Industrial Systems. In the MIDAS system, shown in Figure 2.2, the primary method of communication is through event-based connectors provided by the underlying architectural framework. All high-level services of MIDAS, such as resource discovery and fault tolerance, were also implemented using event-based communication. While refactoring as described in Section 2.2.2, we observed an instance of the Extraneous Adjacent Connector smell. We identified that the Service Discovery Engine, which contains the resource-discovery logic, was directly accessing the Service Registry component using procedure calls. During the refactoring, an additional event-based connector for routing had to be placed between these two components because the Fault Tolerance Engine, which contains the fault-tolerance logic, also needed access to the Service Registry. However, the existing procedure-call connector increased the coupling between those two components and prevented dynamic adaptation of both components.

This smell was accidentally introduced in MIDAS to solve another challenge encountered during the implementation. In the original design, the Service Discovery Engine was broadcasting its events to all attached connectors. One of these connectors enabled the Service Discovery Engine to access peers over a UDP/IP network. This instance of the Extraneous Adjacent Connector smell was introduced so that the Service Discovery Engine could directly access the Service Registry, avoiding unnecessary network traffic. However, as discussed, the introduced smell instance caused the adaptability of the system to decrease.

Chapter 3

Formalizing and Detecting Architectural Smells

In this chapter, we formalize architectural concepts and architectural-smell definitions, and present detection algorithms for a subset of architectural smells. This subset is representative of both (1) architectural smells in general and (2) the manner in which architectural-smell definitions can be transformed into detection algorithms. The remaining architectural smells have been shown to be very difficult to detect [142, 65]. Addressing them properly would require significant additional research, which is outside the scope of this dissertation.

The remainder of this chapter is organized as follows. Section 3.1 describes our formalization of architectural concepts, which are used in Section 3.2 to define an expanded list of architectural smells. Section 3.3 presents architectural-smell detection algorithms that leverage our formal smell definitions to detect a subset of architectural smells from our catalog.

3.1 Architectural Concept Formalization

In this section, we provide definitions of basic software architectural concepts and use them to define an expanded set of architectural smells. Our definitions are not intended to be complete; they are restricted to those architectural concepts that will be useful for identifying smells. We also provide shorthand predicates in Figure 3.1 that we use to help define architectural smells.
A software system's architecture is a graph $G$ whose vertices are a set $B$ of "bricks" (software components and connectors) and whose topology represents the interconnections (i.e., "links" $L$) among those bricks. In order to represent and detect architectural smells, we model a system's architecture as a tuple comprising $G$, the nonempty set of "words" $W$ that are used to "describe" (i.e., implement) the system modeled by the architecture, and the nonempty set of "topics" $T$ addressed by the system; each topic is defined as a multinomial probability distribution over the system's words. By examining the words that have the highest probabilities in a topic, the meaning of that topic can be discerned. In this way, a topic can serve as a representation of a concern addressed by a software system. In other words, the set of topics $T$ is a representation of the system's concerns.

$A = (G, W, T)$
$G = (B, L)$
$W = \{w_i \mid i \in \mathbb{N}\}$
$T = \{z_i \mid i \in \mathbb{N}\}$
$z = Pd(W)$

A brick $b \in B$ can be either simple or composite. A composite brick $cb \in CB$ is an architecture in its own right, allowing for multiple levels of architectural abstraction. We omit the formal definition of $CB$ because it is essentially the same as that for architecture $A$ above. Each simple brick $sb \in SB$ is a tuple comprising the brick's internal state $S$, its interface $I$, its set of operations $O$, the map $M$ that relates the operations and the interfaces through which they are exported, and the probability distribution $\theta_b$ over the system's topics $T$.

$B = SB \cup CB$
$SB = \{b_i \mid i \in \mathbb{N}\}$
$b = (S, I, O, M, \theta_b)$

A brick's state $S$ is defined as a set of variables, where each variable $v$ is a tuple comprising a name $n$ (which must be one of the words in $W$), a type $t$, and a value $val$.

$S = \{v_i \mid i \in \mathbb{N}\}$
$v = (n, t, val)$
$n \in W$

A brick's interface consists of a set of interface elements $ie$, each of which is a tuple comprising a name $ni$ (which must be one of the words in $W$), a possibly empty set of parameters $P$, and a possibly empty set of return variables $RV$.

$I = \{ie_i \mid i \in \mathbb{N}\}$
$ie = (ni, P, RV)$
$ni \in W$
$P = \{v_j \mid j \in \mathbb{N}_0\}$
$RV = \{v_k \mid k \in \mathbb{N}_0\}$

A brick's operation $op$ is a tuple comprising a set $VO$ of variables that make up the operation's state, an algorithm $alg$ that realizes the operation, a multinomial probability distribution $\theta_{op}$ over the operation's topics (called "document-topic distribution" for short), and a function $op\_type$. $T_{op}$ is the set of topics over which $op$ is distributed, i.e., $T_{op}$ represents the operation's concerns. $op\_type$ determines whether a topic in $T_{op}$ is application-specific (pertaining to the system's "business logic") or application-independent (pertaining to the bricks' interaction needs).

$O = \{op_i \mid i \in \mathbb{N}\}$
$op = (VO, alg, \theta_{op}, op\_type)$
$VO = \{v_k \mid k \in \mathbb{N}_0\}$
$\theta_{op} = Pd(T_{op})$
$T_{op} = \{z_j \mid j \in \mathbb{N}\}$
$op\_type : T_{op} \to SP$
$SP = \{spcf, indp\}$

The mapping relation $M$ relates a brick's operations with the interface elements through which they are accessed. The tuples in the relation are restricted such that an interface element is paired with an operation in a tuple only if their types match. Note that multiple operations can be part of different tuples containing the same interface element.

$M = \{(ie_k, op_j) \mid (ie_k \in I) \wedge (op_j \in O) \wedge (\forall v_{ie} \in (ie_k.P \cup ie_k.RV))(\exists v_{op} \in op_j.VO)\; v_{ie}.t = v_{op}.t\}$

The document-topic distribution $\theta_b$ is a probability distribution over the topics $T$; $\theta_b = Pd(T)$ represents the extent to which the topics $T$ are present within the brick $b$.

A link $l$ is a tuple comprising a source interface $src$ and a destination interface $dst$. Links are the channels over which components and connectors transfer data and control through their interfaces.

$L = \{l_i \mid i \in \mathbb{N}_0\}$
$l = (src, dst)$
$src, dst \in I$
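To make the preceding definitions concrete, the following minimal Java sketch models the core tuples as records. The representation choices (plain maps standing in for the probability distributions, and the omission of $alg$ and of composite bricks) are our simplifying assumptions for illustration; they are not part of the formal model or of any tooling described in this dissertation.

```java
import java.util.Map;
import java.util.Set;

// A = (G, W, T); G = (B, L); each topic z is a distribution over words.
record Architecture(Graph g, Set<String> words, Set<Topic> topics) {}
record Graph(Set<Brick> bricks, Set<Link> links) {}
record Topic(Map<String, Double> wordDistribution) {}  // z = Pd(W)

// v = (n, t, val); ie = (ni, P, RV)
record Variable(String name, String type, Object value) {}
record InterfaceElement(String name, Set<Variable> params,
                        Set<Variable> returnVars) {}

enum Specificity { SPCF, INDP }

// op = (VO, alg, theta_op, op_type), with alg omitted here.
record Operation(Set<Variable> state,
                 Map<Topic, Double> topicDistribution,   // theta_op
                 Map<Topic, Specificity> opType) {}

// b = (S, I, O, M, theta_b)
record Brick(Set<Variable> state, Set<InterfaceElement> iface,
             Set<Operation> ops,
             Map<InterfaceElement, Set<Operation>> mapping,  // M
             Map<Topic, Double> topicDistribution) {}        // theta_b

// l = (src, dst), with src, dst drawn from the bricks' interfaces.
record Link(InterfaceElement src, InterfaceElement dst) {}
```

The mapping relation $M$ is represented directly as a map from each interface element to the set of operations exported through it, which mirrors the type-matching restriction stated above without enforcing it.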
A set of components $C$ are bricks where (1) each interface of a component is application-specific and (2) each component $c \in C$ is primarily application-specific. An interface is application-specific if half or more of the words in its name are application-specific; otherwise, the interface is application-independent. $w\_type$ determines whether a word is application-specific or application-independent. A brick is primarily application-specific if each topic that occurs in the component with a probability above a threshold $th_{z_c}$ is application-specific; $th_{z_c}$ is specified by an architect. A topic is application-specific if the words of the topic are primarily application-specific; $z\_type$ determines whether a topic is application-specific or application-independent.

$C = \{c_i \mid (c_i \in B) \wedge (c_i.I \subseteq I_{AS}) \wedge (\forall z_c \in T\;(P(z_c \mid c_i) > th_{z_c} \Rightarrow z\_type(z_c) = spcf))\}$
$I_{AS} = \{ia_j \mid (ia_j \in I) \wedge (ia_j.n \in AS)\}$
$AS = \{w_j \mid (w_j \in W) \wedge (w\_type(w_j) = spcf)\}$
$w\_type : W \to SP$
$z\_type : T \to SP$
$0 \le th_{z_c} \le 1$

A set of connectors $R$ are bricks where (1) each interface of a connector is application-independent and (2) each connector $r \in R$ is primarily application-independent. Interfaces are considered application-independent if half or more of the words naming the interface are application-independent. A brick is considered primarily application-independent if each topic that occurs in the connector with a probability above a threshold $th_{z_r}$ is application-independent; $th_{z_r}$ is specified by an architect. $TP(r)$ is a relation indicating the connector types that connector $r$ may be an instance of.

$R = \{r_i \mid (r_i \in B) \wedge (r_i.I \subseteq I_{AD}) \wedge (\forall z_r \in T\;(P(z_r \mid r_i) > th_{z_r} \Rightarrow z\_type(z_r) = indp))\}$
$I_{AD} = \{id_j \mid (id_j \in I) \wedge (id_j.n \in AD)\}$
$AD = \{w_k \mid (w_k \in W) \wedge (w\_type(w_k) = indp)\}$
$0 \le th_{z_r} \le 1$
$TP(r) \subseteq \{proc\_call, event, stream, distributor, data\_access, adaptor, arbitrator\}$

3.2 Formal Architectural-Smell Definitions

In this section, we utilize the architectural concepts defined in Section 3.1 to formally define architectural smells. For certain smells, we employ the shorthand predicates indicating architectural connectivity defined in Figure 3.1.

$connected(b_1, b_2) \triangleq \exists l \in L\;\big((out\_link(b_1, l) \wedge in\_link(b_2, l)) \vee (in\_link(b_1, l) \wedge out\_link(b_2, l))\big)$
$out\_link(b, l) \triangleq (b \in B) \wedge (l \in L) \wedge (\exists ia \in b.I)(l.src = ia)$
$in\_link(b, l) \triangleq (b \in B) \wedge (l \in L) \wedge (\exists ia \in b.I)(l.dst = ia)$

Figure 3.1: Shorthand predicates for architectural connectivity.

Extraneous Adjacent Connector. The Extraneous Adjacent Connector smell occurs when two connectors of different types are used to link a pair of components. Formally, components $c_1, c_2 \in C$ and connectors $r_1, r_2 \in R$ are involved in an instance of an Extraneous Adjacent Connector smell iff:

$connected(c_1, r_1) \wedge connected(r_1, c_2) \wedge connected(c_1, r_2) \wedge connected(r_2, c_2) \wedge TP(r_1) \neq TP(r_2)$

Ambiguous Interface. Ambiguous Interfaces are interfaces that offer only a single, general entry-point to a component or connector; contain a single parameter; and dispatch to different internal operations based on the content of the parameter. Formally, an interface element $ie_1$ of $b \in B$ is an Ambiguous Interface iff:

$|b.I| = 1 \wedge ie_1 \in b.I \wedge |ie_1.P| = 1 \wedge |\{op_j \mid (ie_1, op_j) \in b.M\}| > 1$

Scattered Parasitic Functionality is a concern-based architectural smell. It describes a system in which multiple bricks are responsible for realizing the same high-level concern while some of those bricks are also responsible for additional, orthogonal concerns. Such an orthogonal concern "infects" a brick, akin to a parasite.
Formally, a set of bricks $SPF \subseteq B$ suffers from this smell iff

$\exists z \in T\;\big((numBricksWithTopic(z) > th_{tc}) \wedge (\forall b \in SPF)(P(z \mid b) > th_{spf})\big)$

where $0 \le th_{spf} \le 1$ specifies the acceptable degree of scattering per concern; $th_{tc}$ captures that scattering of a topic is allowed to occur across a given number of bricks before those bricks are considered to be affected by this smell; and $numBricksWithTopic$ returns the number of bricks that have concern $z$ with a proportion above $th_{spf}$.

Connector Envy. Components with Connector Envy encompass extensive interaction-related functionality that should be delegated to a connector. A component $c \in C$ suffers from Connector Envy in the following cases:

Connector Interface Implementation. In this case, a component exposes an application-independent interface. To detect this smell, we leverage the application specificity of interfaces and their corresponding operations. In particular, Connector Interface Implementation occurs if a component's application-specific interface actually exposes an application-independent operation. Formally, interface element $ia$ of component $c$ exhibits this smell iff $(ia \in c.I) \wedge (\exists op \in c.O)\;((ia, op) \in c.M) \wedge (op\_type(op) = indp)$. Consider an example of a component with an interface that is 51% application-specific. If the operation that maps to this interface is application-independent, then this interface suffers from Connector Interface Implementation. In this case, an engineer should determine why that mismatch of application specificity is occurring. Note that the "strength" of this smell (i.e., the extent to which the smell is an indicator of decay) can also be considered. In the above example, the interface is only slightly application-specific, making the strength of this smell weak and possibly less problematic; however, it is reasonable to expect the degree of smell strength to vary in different cases. For example, if the interface were 80% application-specific and mapped to an application-independent operation, then this particular smell's strength would be significantly higher and more likely to be problematic.

Unacceptably High Connector Concern. In this case, the proportion of a single application-independent concern, as represented by a topic, is too high, as specified through a threshold selected by an architect. Formally, a component $c \in C$ exhibits this smell iff $\exists z \in T\;(z\_type(z) = indp) \wedge (P(z \mid c) > th_{zc})$, where $th_{zc}$ specifies the threshold for the acceptable degree of application independence for a concern in a component.

Data Flow Interface Envy. For this variation, a component $c$ passes the parameters of its interface to its return values. Such an interface indicates that the component either transforms or transfers its input, which are communication services that should be delegated to a connector. Formally, a component's interface element $i \in c.I$ exhibits this smell iff $\forall p \in i.P\;(\exists rv \in i.RV)(p = rv)$.

Component Envy. Connectors with Component Envy encompass extensive application-specific functionality that should be performed by a component. Formally, a connector $r \in R$ suffers from Component Envy in the following cases:

Component Interface Implementation. In this case, a connector exposes an application-specific interface. To detect this smell, we leverage the application specificity of interfaces and their corresponding operations. In particular, the smell occurs if a connector's application-independent interface actually exposes an application-specific operation. This smell is analogous to the Connector Interface Implementation smell.
Formally, interface element $ia$ of connector $r \in R$ exhibits this smell iff $(ia \in r.I) \wedge (\exists op \in r.O)\;((ia, op) \in r.M) \wedge (op\_type(op) = spcf)$. Similar to Connector Interface Implementation, this smell's strength can vary based on the degree of an interface's application specificity.

Unacceptably High Application-Specific Concern. For this smell variation, the proportion of an application-oriented concern, as represented by a topic, is too high, as specified through a threshold selected by an architect. Formally, a connector $r \in R$ exhibits this smell iff $\exists z \in T\;(z\_type(z) = spcf) \wedge (P(z \mid r) > th_{zr})$, where $th_{zr}$ specifies the threshold for the acceptable degree of application specificity for a concern in a connector.

Concern Overload indicates that a brick implements an excessive number of concerns. Formally, a brick $b \in B$ suffers from this smell iff

$|\{z_j \mid (z_j \in T) \wedge (P(z_j \mid b) > th_{z_b})\}| > th_{co}$

where $0 \le th_{z_b} \le 1$ is the threshold indicating that a topic is significantly represented in the brick, and $th_{co} \in \mathbb{R}$ is a threshold indicating the maximum acceptable number of concerns per brick.

Link Overload is a dependency-based smell that occurs when a brick has interfaces involved in an excessive number of links (i.e., dependencies on other bricks), affecting the system's separation of concerns and effective isolation of changes. A brick may have an excessive number of incoming links, outgoing links, or both. Formally, a brick $b_i$ suffers from outgoing link overload iff

$|\{l \in L \mid l.src \in b_i.I\}| > th_{lo}$

where $th_{lo}$ is a threshold indicating the maximum number of links for a brick that is considered to be reasonable. Excessive incoming links are defined analogously.

Unused Interface. A brick's interface is unused if that interface is linked to no other bricks. Such interfaces add unnecessary complexity to a software system, which, in turn, hinders software maintenance. Formally, a brick $b_1 \in B$ contains an Unused Interface $ie_1 \in b_1.I$ iff $\neg\exists b_2 \in B\;\big((ie_2 \in b_2.I) \wedge (b_1 \neq b_2) \wedge (((ie_1, ie_2) \in L) \vee ((ie_2, ie_1) \in L))\big)$.

Duplicate Component Functionality. A component has duplicated functionality if it shares the same functionality as another component. Duplicated functionality increases complexity by causing any change to one instance of the functionality to possibly require changes to the other instance. For example, changing one instance of the functionality without changing the other may introduce errors. Formally, $c_1, c_2 \in C$ have duplicate functionality iff $\exists op_1 \in c_1.O,\, op_2 \in c_2.O\;(op_1 = op_2)$.

Dependency Cycle indicates a set of components whose links form a circular chain, causing changes to one component to possibly affect all other components in the cycle. Formally, this smell occurs in a set of three or more components $c_1, \ldots, c_k \in C$ iff

$\forall x\,(1 \le x \le k)\;\big((x < k \Rightarrow \exists l \in L\,(l.src \in c_x.I \wedge l.dst \in c_{x+1}.I)) \wedge (x = k \Rightarrow \exists l \in L\,(l.src \in c_k.I \wedge l.dst \in c_1.I))\big)$

Unused Brick. A brick $b$ is unused if its interfaces are all unused interfaces. Similar to Unused Interfaces, this smell adds unnecessary complexity to a system, which, in turn, hinders maintenance. Formally, a brick $b$ is unused iff $\nexists l \in L\;(in\_link(b, l) \vee out\_link(b, l))$.

Connector Chain Overload occurs when a long chain of linked connectors involves an excessive number of connector types. The use of multiple connector types in tandem (as in the case of Extraneous Adjacent Connectors) may result in the simultaneous use of incompatible connector types. An excessive number of linked connector types may also provide overly complex interaction services that could be replaced with a simpler mechanism.
For example, a set of connectors that are linked and perform authentication, authorization, encryption, streaming, data access, and distribution is an instance of Connector Chain Overload. Formally, a set of connectors $CCO \subseteq R$ has Connector Chain Overload iff:

$(\forall r_1 \in CCO\;\exists r_2 \in CCO\;connected(r_1, r_2)) \wedge (|\{ty \mid \exists r_3 \in CCO\;(ty \in TP(r_3))\}| > th_{ty}) \wedge (|CCO| > th_{cl})$

where $th_{ty}$ is a threshold specifying an excessive number of connector types, and $th_{cl}$ is a threshold specifying an excessive number of linked connectors.

Lego Syndrome. A brick suffers from Lego Syndrome when it handles an extremely small amount of functionality. This smell type represents bricks that are excessively small and do not represent an appropriate separation of concerns; thus, such bricks should be moved into another brick. Formally, a brick $b$ has Lego Syndrome iff $|b.O| < th_{ls}$, where $th_{ls}$ specifies a threshold for an excessively small number of operations, i.e., functionality.

Sloppy Delegation occurs when a component delegates to another component a small amount of functionality that it could have performed itself. In particular, a component could perform that functionality itself if all the data needed to perform it is part of that component's state. This inappropriate separation of concerns complicates the functionality of the system, which, in turn, hinders maintenance of that system. For example, a component that stores an aircraft's current velocity, fuel level, and altitude and passes that data to another component that solely calculates that aircraft's burn rate is an example of Sloppy Delegation. Formally, Sloppy Delegation occurs between bricks $b_1$ and $b_2$ iff $\exists ie \in b_2.I\;(ie.P \subseteq b_1.S) \wedge (b_1 \neq b_2)$.

Brick Functionality Overload. A brick that performs an excessive amount of functionality suffers from Brick Functionality Overload. Excessive functionality is another form of inappropriate modularity in a software system, which violates the principles of separation of concerns and isolation of change. Formally, a brick $b$ has Brick Functionality Overload iff $|b.O| > th_{bfo}$, where $th_{bfo}$ specifies a threshold for an excessively high number of operations, i.e., functionality; $th_{bfo}$ is specified by an architect.

3.3 Detection of Architectural Smells

We focus on the detection of seven smells in this section. These seven smells are representative of both (1) architectural smells in general and (2) the manner in which architectural-smell definitions can be transformed into detection algorithms. The remaining architectural smells have been shown to be very difficult to detect [142, 65]. Addressing them properly would require significant additional research, which is outside the scope of this dissertation. Each of these smells falls into one of three categories: dependency-based, concern-based, or connector-based. Dependency-based smells are architectural smells that arise due to the dependency relations between bricks. Concern-based smells are architectural smells that exhibit an inappropriate separation of concerns. Connector-based smells occur due to misuse of connectors, which results in maintainability issues.

Our formalization of architectural concepts and architectural-smell definitions allows us to build algorithms for detecting each of the architectural smells. The limited number of bricks, links, and concerns afforded by employing an architectural level of abstraction allows our detection algorithms to be simple and efficient. In the rest of this section, we enumerate the architectural-smell definitions and their corresponding detection algorithms.
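Before turning to the individual algorithms, note that the simplest structural definitions from Section 3.2 translate almost verbatim into executable checks. The following minimal Java sketch, reusing the record types from the sketch in Section 3.1 (our own illustrative types, not the dissertation's tooling), shows this for Lego Syndrome, Brick Functionality Overload, and Unused Interface.

```java
import java.util.Set;

// Direct transcriptions of three formal definitions; Brick,
// InterfaceElement, and Link are the records sketched in Section 3.1.
final class SimpleSmellChecks {
    // Lego Syndrome: |b.O| < th_ls
    static boolean legoSyndrome(Brick b, int thLs) {
        return b.ops().size() < thLs;
    }

    // Brick Functionality Overload: |b.O| > th_bfo
    static boolean functionalityOverload(Brick b, int thBfo) {
        return b.ops().size() > thBfo;
    }

    // Unused Interface: ie participates in no link of the system
    static boolean unusedInterface(InterfaceElement ie, Set<Link> links) {
        return links.stream()
                .noneMatch(l -> l.src().equals(ie) || l.dst().equals(ie));
    }
}
```

The remaining smells require the statistical thresholds and graph traversals described next, but they follow the same pattern of evaluating a formal predicate over the recovered model.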
3.3.1 Concern-Based Smell Detection

Algorithm 1: detectSPF
Input: B: a set of bricks; T: a set of system concerns; th_spf: threshold for concern relevance
Output: spfInstances: a map where keys are concerns and values are bricks
1  spfInstances <- initialize map as empty
2  concernCounts <- initialize all concern counts to 0
3  for b in B do
4      T_b <- getConcernsOfBrick(b)
5      for z in T_b do
6          if P(z|b) > th_spf then
7              concernCounts[z] <- concernCounts[z] + 1
8  meanConcernCounts <- computeMean(concernCounts)
9  stdDevOfConcernCounts <- computeStandardDeviation(concernCounts)
10 th_tc <- meanConcernCounts + stdDevOfConcernCounts
11 for z in T do
12     if concernCounts[z] > th_tc then
13         for b in B do
14             if P(z|b) > th_spf then
15                 spfInstances[z] <- spfInstances[z] ∪ {b}

Scattered Parasitic Functionality. Algorithm 1, detectSPF, detects Scattered Parasitic Functionality by automatically setting the threshold th_tc. detectSPF returns a map spfInstances where each key in the map is a scattered concern z, and each value is the set of bricks that have the corresponding key z above the threshold th_spf. Lines 3-7 create a map where keys are concerns and values are the numbers of bricks that have each concern above threshold th_spf. Line 8 calculates the mean of these per-concern brick counts, while Line 9 calculates their standard deviation. The sum of that mean and standard deviation is used to set the threshold th_tc (Line 10). Lines 11-15 then identify the bricks affected by Scattered Parasitic Functionality.

Concern Overload. Algorithm 2, detectCO, determines which bricks in the system suffer from Concern Overload. The algorithm operates in a manner similar to detectSPF. detectCO begins by creating a map where keys are bricks and values are the numbers of relevant concerns in each brick (Lines 3-7). detectCO uses that map to compute the mean and standard deviation of the number of concerns per brick (Lines 8-9). The sum of that mean and standard deviation is used as the threshold value th_t (Line 10). The algorithm then uses that threshold to determine which bricks have Concern Overload (Lines 11-13).

Algorithm 2: detectCO
Input: B: a set of bricks; T: a set of system concerns; th_zb: threshold for concern relevance
Output: smells: a set of Concern Overload instances
1  smells <- ∅
2  brickConcernCounts <- initialize all brick concern counts to 0
3  for b in B do
4      T_b <- getConcernsOfBrick(b)
5      for z in T_b do
6          if P(z|b) > th_zb then
7              brickConcernCounts[b] <- brickConcernCounts[b] + 1
8  meanBrickConcerns <- computeMean(brickConcernCounts)
9  stdDevOfBrickConcerns <- computeStandardDeviation(brickConcernCounts)
10 th_t <- meanBrickConcerns + stdDevOfBrickConcerns
11 for b in B do
12     if brickConcernCounts[b] > th_t then
13         smells <- smells ∪ {b}
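Algorithms 1 and 2 aggregate the same brick-concern matrix along different axes. As one illustration, the following Java sketch of Algorithm 1 assumes the probabilities P(z|b) have already been computed and are available as a nested map; the names mirror the pseudocode, and the mean and standard-deviation helpers are inlined.

```java
import java.util.*;

final class DetectSPF {
    // topicProbs: brick -> (concern -> P(z|b))
    static Map<String, Set<String>> detectSPF(
            Map<String, Map<String, Double>> topicProbs,
            Set<String> concerns, double thSpf) {
        // Lines 3-7: per concern, count the bricks exceeding th_spf.
        Map<String, Integer> concernCounts = new HashMap<>();
        for (Map<String, Double> probs : topicProbs.values())
            probs.forEach((z, p) -> {
                if (p > thSpf) concernCounts.merge(z, 1, Integer::sum);
            });
        // Lines 8-10: th_tc = mean + standard deviation of the counts.
        double mean = concernCounts.values().stream()
                .mapToInt(Integer::intValue).average().orElse(0);
        double var = concernCounts.values().stream()
                .mapToDouble(c -> (c - mean) * (c - mean)).average().orElse(0);
        double thTc = mean + Math.sqrt(var);
        // Lines 11-15: report each sufficiently scattered concern.
        Map<String, Set<String>> spfInstances = new HashMap<>();
        for (String z : concerns)
            if (concernCounts.getOrDefault(z, 0) > thTc)
                topicProbs.forEach((brick, probs) -> {
                    if (probs.getOrDefault(z, 0.0) > thSpf)
                        spfInstances.computeIfAbsent(z, k -> new HashSet<>())
                                    .add(brick);
                });
        return spfInstances;
    }
}
```

A sketch of detectCO would differ only in counting concerns per brick rather than bricks per concern, and in flagging bricks rather than concerns.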
3.3.2 Dependency-Based Smells

Brick Dependency Cycle. We detect Dependency Cycles by identifying strongly connected components in a software system's architectural graph $G = (B, L)$. A strongly connected component is a graph or subgraph in which each vertex is reachable from every other vertex. Each strongly connected component in $G$ is a Dependency Cycle. Any algorithm that detects strongly connected components [49, 96] can then be used to identify Dependency Cycles.

Link Overload. Algorithm 3, detectLO, extracts the Link Overload variants for a set of bricks B by examining their links L. The algorithm first determines the number of incoming, outgoing, and combined links per brick (Lines 4-6). detectLO sets the threshold th_lo for each variant of Link Overload by computing the mean and standard deviation of the incoming, outgoing, and combined link counts (Lines 8-10). The last part of detectLO identifies each brick with Link Overload, along with the directionality indicating which variant of Link Overload the brick suffers from (Lines 11-14).

Algorithm 3: detectLO
Input: B: a set of bricks; L: links between bricks
Output: smells: a set of Link Overload instances
1  smells <- ∅
2  numLinks <- initialize map as empty
3  directionality <- {"in", "out", "both"}
4  for b in B do
5      for d in directionality do
6          numLinks[(b, d)] <- numLinks[(b, d)] + getNumLinks(b, d, L)
7  for d in directionality do
8      meanLinks[d] <- computeMean(numLinks, d, B)
9      stdDevLinks[d] <- computeStandardDeviation(numLinks, d, B)
10     th_lo[d] <- meanLinks[d] + stdDevLinks[d]
11 for b in B do
12     for d in directionality do
13         if getNumLinks(b, d, L) > th_lo[d] then
14             smells <- smells ∪ {(b, d)}
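The outgoing-link variant of Algorithm 3 reduces to a short routine; the incoming and combined variants differ only in which link endpoint(s) are counted. The following Java sketch is illustrative, assuming links are reduced to pairs of brick identifiers.

```java
import java.util.*;

final class DetectLO {
    record Link(String srcBrick, String dstBrick) {}

    static Set<String> outgoingLinkOverload(Set<String> bricks,
                                            List<Link> links) {
        // Lines 4-6: number of outgoing links per brick.
        Map<String, Integer> outCounts = new HashMap<>();
        bricks.forEach(b -> outCounts.put(b, 0));
        for (Link l : links) outCounts.merge(l.srcBrick(), 1, Integer::sum);
        // Lines 8-10: th_lo = mean + standard deviation of the counts.
        double mean = outCounts.values().stream()
                .mapToInt(Integer::intValue).average().orElse(0);
        double var = outCounts.values().stream()
                .mapToDouble(c -> (c - mean) * (c - mean)).average().orElse(0);
        double thLo = mean + Math.sqrt(var);
        // Lines 11-14: flag every brick above the threshold.
        Set<String> overloaded = new HashSet<>();
        outCounts.forEach((b, n) -> { if (n > thLo) overloaded.add(b); });
        return overloaded;
    }
}
```

As with detectSPF and detectCO, the threshold is derived from the distribution of the measurements themselves rather than fixed in advance.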
3.3.3 Connector-Based Smells

Connector Interface Implementation. Algorithm 4, detectCII, identifies Connector Interface Implementation instances. detectCII takes a set of components C and a set of links L of an architecture A as input and returns ciis, i.e., a set of Connector Interface Implementation instances. The algorithm iterates over every operation op that maps to an interface ia of each component c ∈ C. If op is application-independent, then ia is stored in ciis.

Algorithm 4: detectCII
Input: C: a set of components; L: links between bricks
Output: ciis: a set of Connector Interface Implementation instances
1  ciis <- ∅
2  for c in C do
3      for op in c.O do
4          for ia in c.I do
5              if (ia, op) ∈ c.M ∧ op_type(op) = indp then
6                  ciis <- ciis ∪ {ia}

Unacceptably High Connector Concern. Algorithm 5, detectUHNC, detects components with Unacceptably High Connector Concern. detectUHNC takes as input a set of components C and a threshold th_zc, which specifies the acceptable degree of application independence for a concern of a component. For each component c and concern z, if (1) the probability of z given c is greater than th_zc and (2) z is an application-independent concern, then detectUHNC stores the pair (c, z) in uhncs.

Algorithm 5: detectUHNC
Input: C: a set of components; th_zc: a threshold specifying the acceptable degree of application independence for a concern of a component
Output: uhncs: a set of Unacceptably High Connector Concern instances
1  uhncs <- ∅
2  for c in C do
3      for z in T do
4          if z_type(z) = indp ∧ P(z|c) > th_zc then
5              uhncs <- uhncs ∪ {(c, z)}

Extraneous Adjacent Connector. Algorithm 6, detectEAC, determines which components and connector types are involved in Extraneous Adjacent Connector smells. The inputs to detectEAC are the components C, connectors R, and links L of an architecture A. detectEAC returns a map eacInstances, which represents Extraneous Adjacent Connector instances. Each key of the map is a pair of components suffering from the Extraneous Adjacent Connector smell; each value of the map is the set of connector types that are part of that smell. The algorithm (1) identifies component-connector links and connector-type edges and then (2) utilizes those edges to determine actual instances of Extraneous Adjacent Connector smells. A component-connector link (c, r) is an undirected edge representing that component c is linked to connector r. A connector-type edge is a tuple (c1, c2, connTypes) representing an undirected edge whose endpoints are components c1, c2 ∈ C that are linked by a set connTypes of connector types.

detectEAC begins by determining all the component-connector links (compConnLinks) and connector-type edges (connTypeEdges) of an architecture (Lines 3-11). The algorithm iterates over each component, link, and connector to produce the set of component-connector links (Lines 3-7). detectEAC then uses the identified component-connector links to determine all the connector-type edges in the system (Lines 8-11). The second part of detectEAC uses connTypeEdges to determine the components and connector types that are involved in Extraneous Adjacent Connector smells (Lines 16-19). For any two connector-type edges that involve the same pair of components, if those edges differ in their associated connector types (Line 18), detectEAC stores an entry in eacInstances where the key is the matching pair of components and the value is the set of connector types from the two connector-type edges (Line 19). For example, if two connector-type edges are (client, server, proc_call) and (client, server, event), then an entry with key (client, server) and value {proc_call, event} is added to eacInstances.

Algorithm 6: detectEAC
Input: C: a set of components; R: a set of connectors; L: a set of links
Output: eacInstances: a map representing Extraneous Adjacent Connector instances
1  connTypeEdges <- ∅
2  compConnLinks <- ∅
3  for c in C do
4      for l in L do
5          for r in R do
6              if linked(c, r, l) then
7                  compConnLinks <- compConnLinks ∪ {(c, r)}
8  for (c1, r) in compConnLinks do
9      for (c2, r) in compConnLinks do
10         if c1 ≠ c2 then
11             connTypeEdges <- connTypeEdges ∪ {(c1, c2, TP(r))}
12 firstCompIndex <- 0
13 secondCompIndex <- 1
14 connTypeIndex <- 2
15 eacInstances <- initialize map as empty
16 for edge1 in connTypeEdges do
17     for edge2 in connTypeEdges do
18         if edge1[firstCompIndex] = edge2[firstCompIndex] ∧ edge1[secondCompIndex] = edge2[secondCompIndex] ∧ edge1[connTypeIndex] ≠ edge2[connTypeIndex] then
19             eacInstances[(edge1[firstCompIndex], edge1[secondCompIndex])] <- eacInstances[(edge1[firstCompIndex], edge1[secondCompIndex])] ∪ edge1[connTypeIndex] ∪ edge2[connTypeIndex]
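A condensed Java sketch of Algorithm 6 follows. It is our illustration rather than the implementation used in this work: components, connectors, and links are reduced to identifiers, TP(r) to a set of type names, and the connector-type edges are checked implicitly while iterating over component-connector links.

```java
import java.util.*;

final class DetectEAC {
    record Link(String from, String to) {}

    static Map<List<String>, Set<String>> detectEAC(
            Set<String> components,
            Map<String, Set<String>> connectorTypes,  // r -> TP(r)
            List<Link> links) {
        // Lines 3-7: undirected component-connector links (c, r).
        Set<List<String>> compConn = new HashSet<>();
        for (Link l : links) {
            if (components.contains(l.from()) && connectorTypes.containsKey(l.to()))
                compConn.add(List.of(l.from(), l.to()));
            if (components.contains(l.to()) && connectorTypes.containsKey(l.from()))
                compConn.add(List.of(l.to(), l.from()));
        }
        // Lines 8-11 and 16-19: a pair of components bridged by two
        // connectors whose type sets differ is an EAC instance.
        Map<List<String>, Set<String>> eacInstances = new HashMap<>();
        for (List<String> e1 : compConn)
            for (List<String> e2 : compConn) {
                String c1 = e1.get(0), r1 = e1.get(1);
                String c2 = e2.get(0), r2 = e2.get(1);
                if (c1.equals(c2) || r1.equals(r2)) continue;
                boolean bothBridge = compConn.contains(List.of(c2, r1))
                        && compConn.contains(List.of(c1, r2));
                if (bothBridge
                        && !connectorTypes.get(r1).equals(connectorTypes.get(r2))) {
                    List<String> key = c1.compareTo(c2) < 0
                            ? List.of(c1, c2) : List.of(c2, c1);
                    Set<String> types = new HashSet<>(connectorTypes.get(r1));
                    types.addAll(connectorTypes.get(r2));
                    eacInstances.merge(key, types,
                            (a, b) -> { a.addAll(b); return a; });
                }
            }
        return eacInstances;
    }
}
```

Running this on the example above, with client and server bridged by a proc_call connector and an event connector, yields the entry (client, server) -> {proc_call, event}.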
Chapter 4

Framework for Ground-Truth Architecture Recovery

Different objective criteria have previously been used for evaluating recovery techniques [177, 110]. The most comprehensive criterion is a comparison of the result of a recovery technique with an authoritative recovery. An authoritative recovery is the mapping, from architectural elements to the code-level entities that implement those elements, that has been constructed or certified as correct by a human. Obtaining reliable authoritative recoveries remains challenging for non-trivial systems. Specifically, an authoritative recovery performed by an engineer who is not an expert on a particular system may not reflect the true architecture as viewed by the system's architect. This risks missing application-level or domain-level design decisions and, in turn, wasting significant effort. A more reliable authoritative recovery is one that, after construction, is certified by an architect or engineer of the software system; more specifically, by an engineer who is a long-term contributor with intimate knowledge of the system's architecture, or long-term contributor for short. We refer to such an authoritative recovery as a ground-truth recovery. Typically, only a long-term contributor or architect of a system has sufficiently comprehensive domain knowledge, application-specific knowledge, and experience, which may not be present in system documentation or implementation-level artifacts.

However, there is a dearth of ground-truth recoveries due to the colossal effort required for an engineer to produce such a recovery. The ground-truth recoveries that are available are often of smaller systems, which reduces confidence that a recovery technique would work on larger systems, or identify only a few very coarse-grained components in larger systems, rendering the obtained recovery less informative.

In this chapter, we propose a framework intended to aid the recovery of ground-truth architectures. The framework defines a set of principles and a process that results in a reliable ground-truth recovery [61]. The process involves an architect or long-term contributor of the system in a limited yet meaningful way. The framework's principles, referred to as mapping principles, serve as rules or guidelines for grouping code-level entities into architectural elements and for identifying the elements' interfaces. The framework bases these principles on four types of information used to obtain ground truth: generic information (e.g., system-module dependencies), domain information (e.g., architectural-style rules), application information (e.g., the purpose of the source-code elements), and information about the system context (e.g., the programming language used).

We focus our discussion around a case study in which we applied our framework to produce a ground-truth recovery of Apache Hadoop [4], a widely used framework for distributed processing of large datasets across clusters of computers [7]. The final product of this case study is a ground-truth recovery constructed with the aid of one of Hadoop's long-term contributors.

Figure 4.1: Classification of the principles used for ground-truth recovery (generic, domain, and application principles, with system context spanning all three).

The rest of the chapter is structured in the following manner. Section 4.1 describes the mapping principles of the framework, how they relate, and examples of such principles. Section 4.2 describes our process for obtaining a ground-truth recovery and our application of the process to Hadoop.

4.1 Mapping Principles

In this section, we present the types of mapping principles that we use in our framework for obtaining ground-truth recoveries. A key aspect of our framework is the identification of these mapping principles. To reiterate, the mapping principles are rules or guidelines that determine whether particular code-level entities should or can be mapped to the same architectural element or grouping. The principles are classified based on the four different kinds of information previously described: generic, domain, application, and system-context information. They underlie the process for obtaining ground-truth recoveries we present in Section 4.2. Figure 4.1 depicts how the different kinds of principles relate to each other. Next, we define the mapping principles and illustrate them with examples from the Hadoop case study.

Figure 4.2: Different ways of applying mapping principles: (a) JobTracker and TaskTracker merged by two-way association; (b) not merged due to dp_r or dp_tr; (c) servlets merged into a web server; (d) servlets separated according to their use by domain components.

Generic principles are independent of domain and application information.
These principles include long-standing software-engineering principles often used to automate architectural recovery techniques, such as separation of concerns, isolation of change, and coupling and cohesion. For example, a generic principle may state that code-level modules that depend on common sets of code-level entities (e.g., interfaces or libraries) should be grouped together. In the case of Hadoop, this principle results in grouping the code-level entities that involve sorting, since they depend on interfaces involving reporting, comparing, and swapping. The generic principles are typically concerned only with the code-level entities and the dependencies between them, and do not consider domain and application semantics. Such principles are typically implemented by, and can be obtained from, existing recovery techniques. In our case study, we used the rules from Focus [121], an architectural recovery technique we have previously proposed, as the generic principles.

Domain principles are the mapping principles for recovering an architecture that are based on domain information. Domain information can be any data specific to the domain of the system in question, e.g., telecommunications, avionics, scientific systems, distributed data systems, robotics, etc. The domain principles stem from (1) the knowledge about a specific domain documented in the research literature and industry standards, and (2) an engineer's experience working in a particular domain. Figure 4.1 indicates that domain principles are at a level of abstraction below generic principles and above application principles. Architectural-pattern rules are examples of domain principles.

The domain principles we applied to Hadoop were extracted from an existing reference architecture of grid systems that defines the typical components in such systems [56], and from our knowledge of the domain of distributed data systems (the master/slave architectural pattern, distributed communication). For example, the type of master/slave pattern where the master manages jobs that are distributed among slaves is commonly used in systems where massive amounts of data are stored and processed. This suggests that Hadoop's JobTracker and TaskTracker components, which follow this pattern according to the application documentation, should be represented in the code. Hence, we introduced a domain principle dp_tr that groups together the code-level entities that primarily implement the master (JobTracker) and slave (TaskTracker) functionalities. The documentation also suggested that NameNode and DataNode follow a master/slave pattern; we introduced similar domain principles for these components. We will refer to JobTracker, TaskTracker, NameNode, and DataNode as domain components, since they are identified using domain principles.

Domain principles have a higher priority than the generic principles (i.e., a domain principle may prevent a generic principle from being applied). For example, a generic Focus rule states that two code-level entities should be grouped based on their two-way dependencies (generic principle gp_tw). In Hadoop, the JobTracker and TaskTracker could be grouped using gp_tw, as shown in Figure 4.2(a). However, these classes also communicate remotely with each other. Thus, a domain principle dp_r indicates that they belong in separate groupings, as shown in Figure 4.2(b), since dp_r is of higher priority than gp_tw. Note that dp_tr would also produce the same effect as dp_r for Figure 4.2(b).
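One way to think about this precedence rule is as a resolution scheme over competing verdicts. The following Java sketch is purely our illustration of that idea, not part of the framework's tooling: each principle may abstain or vote on whether two entities can share a grouping, and the vote from the most specific applicable category wins.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Categories in ascending order of priority, mirroring Figure 4.1.
enum Category { GENERIC, DOMAIN, APPLICATION }

interface MappingPrinciple {
    Category category();
    // empty = principle does not apply; true/false = merge verdict
    Optional<Boolean> mayMerge(String entity1, String entity2);
}

final class PrincipleEngine {
    static boolean mayMerge(String e1, String e2, List<MappingPrinciple> ps) {
        return ps.stream()
                .filter(p -> p.mayMerge(e1, e2).isPresent())
                // The highest-priority applicable principle decides.
                .max(Comparator.comparing(MappingPrinciple::category))
                .flatMap(p -> p.mayMerge(e1, e2))
                .orElse(false);
    }
}
```

Under this scheme, a dp_r-style DOMAIN principle returning false for the JobTracker/TaskTracker pair outranks a gp_tw-style GENERIC principle returning true, reproducing the outcome shown in Figure 4.2(b).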
Application principles are mapping principles that are relevant specifically to the particular application being recovered. For example, Hadoop is able to perform distributed upgrades. An application principle that utilizes this information is therefore a rule stating that any objects used primarily for performing distributed upgrades should be grouped together. Application principles may be extracted from available documentation and from comments in the code. However, some application principles may only be devised based on feedback from the system's lead developer or architect. For example, the lead developer of Hadoop revealed to us an application principle stating that any grouping related to sorting functionality should be grouped with the code-level entities representing map tasks.

Application principles contain more specific information than either domain or generic principles and therefore have a higher priority. A case where application principles are applied is when an application utilizes components outside of the application's primary domain. For example, a distributed data system may choose to utilize web-system components to perform certain functionalities. Specifically, Hadoop uses servlets to allow the transfer of HDFS data over HTTP. Servlets are Java classes that respond to requests using a request-response protocol and are often used as part of web servers. A domain principle dp_ws may indicate that web servlets should be grouped together to form a web server. Figure 4.2(c) shows a grouping of servlet classes based on dp_ws. However, for Hadoop, an application principle ap_ws representing this information states that servlets should be grouped with domain components. Thus, ap_ws supersedes dp_ws. For example, Figure 4.2(d) shows that the TaskTracker depends on a servlet for task logging, which constitutes one group, while the NameNode depends on a servlet that obtains meta-data about a file system, i.e., the ListPathsServlet, which constitutes another group.

Lastly, system context principles, represented in Figure 4.1 as the gray area spanning the other three categories of principles, are mapping principles that involve properties of the infrastructure used to build the application being recovered. For example, the choice of OS or programming language can impart useful recovery information. The choice of OS implies differences in the way components are configured, installed, deployed, run, or compiled. Therefore, the OS on which an application runs may significantly affect the way code-level entities map to components, or even what kinds of components exist in the first place. A system context principle can be a generic, domain, or application principle. For example, Hadoop 0.19.0 is designed to run on GNU/Linux machines and comes with bash shell scripts that control the running of Hadoop daemons. Therefore, a system context principle states that code-level entities that primarily implement a daemon should constitute a grouping. This principle is also relevant for the domain of distributed systems and, thus, is also a domain principle. One grouping yielded by this principle includes the code implementing Hadoop's Balancer daemon, which attempts to balance data across DataNodes.
For example, Hadoop has a Task class that represents the general state or functionality specic to map or reduce tasks. An application-specic rule using this information states that the Task class and any class that inherits from it constitute a grouping that represents map/reduce task data. Figure 4.3(a) provides a more comprehensive example of how a set of domain and application principles results in a correct grouping when applied in the correct order. Failing to follow the precedence of mapping principles described above will yield incorrect groupings (Figure 4.3(b)): the Web Server group should not exist, the Nodes group should split into two groups, and JvmManager should comprise its own group rather than being contained within the TaskTracker group. NameNode Group DataNode Group ListPathsServlet TaskTracker Group TaskTracker JvmManager Group TaskLogServlet (a) Grouping yielded by a correct se- quence of principles Nodes Group NameNode Group DataNode Group Web Server TaskLogServlet ListPathsServlet TaskTracker Group TaskTracker JvmManager Group (b) Grouping yielded by an incorrect sequence of principles Figure 4.3: For the same groups and classes, a sequence of principles can result in signif- icantly dierent groupings. 58 4.2 Ground-Truth Recovery Process In this section, we detail the process for deriving ground-truth recoveries based on the mapping principles described in Section 4.1. In the case of Hadoop 0.19.0, this process resulted in the ground-truth architecture comprising 70 components. Figure 4.4 depicts the main components that constitute the Map/Reduce and HDFS subsystems of Hadoop. Most of Hadoop's utility components (approximately 20 components in total) have been removed for clarity as their inclusion results in an architectural diagram that is an almost- complete graph, thus obscuring the important architectural facets. At this magnication, the diagram in Figure 4.4 is not intended to be readable, but rather to convey a sense of the size, structure, and complexity of Hadoop's ground-truth architecture. The yellow (lighter) ovals represent components from the Map/Reduce subsystem, while the green (darker) ovals represent components from HDFS. For reference, this diagram as well as the complete architectural diagram of Hadoop can be viewed at full magnication at [9]. The idea behind the ground-truth recovery process is to utilize the dierent types of available information and the corresponding mapping principles to obtain an informed authoritative recovery. The authoritative recovery is then analyzed by a lead developer or architect of the system, and revised into a ground-truth based on the suggested modica- tions. The eort required to obtain a ground-truth recovery using our process is compa- rable to the existing approaches that have produced authoritative, but not ground-truth, recoveries (see Section 8.2.2). 
Our process (1) provides a set of well-defined steps that can help ensure consistency when the recoveries are performed by different researchers, (2) results in more informed intermediate recoveries due to the multiple types of utilized information and mapping principles, and (3) ultimately yields a ground-truth recovery.

Figure 4.4: The ground-truth recovery of Hadoop 0.19.0, showing the main components of the Map/Reduce and HDFS subsystems (e.g., Job Tracking, Task Tracking and Running, NameNode, DataNode, HDFS Protocol, Distributed File System, and Balancer). At this magnification, the figure is intended only as an illustration of Hadoop's complexity. This diagram can be found fully magnified at [9].

The process we propose consists of eight steps, depicted in Figure 4.5. The initial two steps deal with the identification and selection of the specific mapping principles that will be applied during the recovery. In Steps 3 and 4, the generic principles are applied to the extracted implementation-level dependency information. In Step 5, the available domain and application principles are used to refine the generic recovery. In Step 6, the utility components, whose primary purpose is to provide helper functions (e.g., sorting, parsing, GUI rendering) to other components, are identified. In the final two steps, the lead engineer or architect validates the obtained authoritative recovery, which is then refined into a ground-truth recovery.

Figure 4.5: The process for obtaining ground-truth recoveries (Step 1: gather domain and application principles; Step 2: select a generic recovery technique; Step 3: extract implementation-level information; Step 4: apply the generic recovery technique; Step 5: utilize application and domain principles; Step 6: identify utility components; Step 7: pass the authoritative recovery to the certifier; Step 8: modify groupings per the certifier's suggestions).

The ground-truth recovery involves two types of actors who perform the different steps in the process: the lead developer or architect certifies and recommends changes (Steps 7-8) to an authoritative recovery performed by one or more engineers who need not have been involved in the system's development (Steps 1-6). We refer to the lead developer or architect as the certifier, and to an engineer or researcher producing the authoritative recovery to be certified as the recoverer.
The process is designed so that it does not require the certifier to be extensively involved in the recovery; instead, the certifier is asked only to comment on the authoritative recovery obtained by the recoverers and to verify its correctness. The objective of our process is to minimize the burden on the certifier while optimizing the use of the certifier's time and expertise. For example, a certifier can impart domain and application principles not evident in the documentation. A certifier can also identify the generic principles that should not be followed in a given recovery. Next, we describe the individual steps of our process and illustrate them using the Hadoop case study.

4.2.0.1 Gather domain and application information

The first step in our process is for a recoverer to utilize any available documentation to determine domain or application information that can be used to produce domain or application principles. From our experience, it is common for systems to have some kind of documentation about components and connectors. We refer to these as expected components and connectors. The documents often include diagrams showing some components and their configurations. They also tend to contain explanations of each component's key functionalities. Therefore, domain and application principles, in this case, would indicate that certain code-level entities should map to these expected components. Furthermore, documentation should provide system-context information. System-context information that is likely to be found in documentation includes the choice of OSs; the choice of programming languages; and information about building, compiling, deploying, and/or installing the system. Depending on the availability of a certifier, the domain and application principles may subsequently be refined according to the certifier's suggestions.

For Hadoop, we utilized two recoverers, both of whom are PhD students with previous experience recovering software architectures. One of the recoverers had specific industry experience building and using applications based on the Map/Reduce paradigm. The other recoverer also had industry experience building and using distributed data systems for large-scale data storage and processing. Therefore, our recoverers were also able to obtain mapping principles from their experience with systems of the domain in question.

4.2.0.2 Select a generic recovery technique

The recoverers can select an existing recovery technique to aid in the production of an authoritative recovery. Later steps in this process work toward eliminating any bias that may remain from selecting a particular recovery technique ahead of time. In particular, the use of domain and application information, manual modification of the output of the selected recovery technique, modifications made according to the certifier's suggestions, and final verification of the authoritative recovery by the certifier all work toward eliminating any such bias. Therefore, many existing recovery techniques are appropriate for this step. However, for larger systems, automated techniques that work at larger granularities (e.g., that recover classes instead of functions, or files instead of classes) may be more appropriate than more manual techniques. Furthermore, the selected recovery technique may also depend on the recoverer's prior experience (e.g., a recoverer may prefer a recovery technique she helped to develop). It is important to note that the selection of a recovery technique will inject generic principles into the recovery.
For example, many recovery techniques will use different metrics or rules to represent generic principles, such as coupling, cohesion, or separation of concerns. As the obtained recovery is modified with domain and application information or with suggestions made by the certifier, other principles will be added, removed, or modified.

In the Hadoop case study, we used Focus [121] as the generic recovery technique because we developed it in-house and have extensively tested it on systems from various domains. Focus was also useful because it is specifically targeted at OOPLs such as Java, the implementation language of Hadoop.

4.2.0.3 Extract relevant implementation-level information

Architectural recovery often involves automatic extraction of implementation-level information. The information extracted will depend on the recovery technique selected by the recoverer and the available tool support. For Hadoop, we used IBM RSA [6] and the Class Dependency Analyzer [8]. This extracted information typically includes at least the structural dependencies between the code-level entities, which is the kind of information we used for Focus. However, as described in Section 8.2.2, the relevant implementation information may also include evolutionary dependencies, ownership information, or semantic information obtained using information-retrieval techniques.

4.2.0.4 Apply generic recovery technique

At this point, the recoverers can apply their chosen generic recovery technique to obtain an initial recovery. This step is a straightforward application of the recovery technique since more specific information will be injected through domain, application, and system context principles in later steps.

4.2.0.5 Utilize domain and application principles

Any mapping principles determined in Step 1 can be used to modify the recovery obtained in the previous step. In particular, any groupings obtained should be modified, if possible, to resemble expected components or connectors. This is the first step in the process to make use of domain or application information in the recovery.

Domain and application principles had a significant effect on the components identified in Hadoop. For example, the Focus rule indicating that classes or groupings with two-way associations should be merged ($gp_{tw}$) would have affected 60 out of the 65 components that we identified for Hadoop 0.19.0. In particular, these 60 components would have been merged with one or more of the other identified components. Domain and application principles prevented these erroneous mergers. For example, the groupings for the NameNode and DataNode would have been merged according to $gp_{tw}$, but a domain principle stating that domain components should never be grouped together prevented this merger.

4.2.0.6 Identify utility components

The recoverer should identify any utility components (e.g., libraries, middleware packages, application framework classes). Previous empirical evaluations of recovery techniques that use authoritative recoveries [15, 110] have shown that proper identification of utility components has been a major factor in determining the correctness of recovery techniques. The generic principle for identifying utility components is to select groupings that have higher numbers of incoming than outgoing structural dependencies, as illustrated in the sketch below. Furthermore, utility components typically do not resemble the expected application components.
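This fan-in/fan-out principle is simple to operationalize. The following sketch is illustrative only; the dependency representation is an assumption rather than part of our framework's tooling. It flags groupings with more incoming than outgoing dependencies as utility candidates.

    import java.util.*;

    public class UtilityDetector {
        // A directed structural dependency between two groupings.
        record Dep(String from, String to) {}

        // Flags groupings whose fan-in exceeds their fan-out, the generic
        // principle for identifying utility components.
        static Set<String> utilityCandidates(Set<String> groups, List<Dep> deps) {
            Map<String, Integer> in = new HashMap<>(), out = new HashMap<>();
            for (Dep d : deps) {
                out.merge(d.from(), 1, Integer::sum);
                in.merge(d.to(), 1, Integer::sum);
            }
            Set<String> candidates = new HashSet<>();
            for (String g : groups) {
                if (in.getOrDefault(g, 0) > out.getOrDefault(g, 0)) candidates.add(g);
            }
            return candidates;
        }
    }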
Utility components may be split apart if the certifier in later steps decides that code-level entities in utility components belong more appropriately to different groupings. Note that Steps 4-6 can be iteratively applied until the recoverer determines that the obtained recovery has an appropriate granularity. For example, an automated clustering technique used in Step 4 may need to be applied multiple times to obtain a smaller number of (larger) groupings.

Our initial recovery of Hadoop consisted of 21 utility components. Most of the utility components originated from a particular directory of Hadoop called core. This directory contains about 280 classes that can be mapped to 14 components based on mapping principles involving the Java package structure (e.g., the main metrics-handling utilities comprise the classes and subpackages of the org.apache.hadoop.metrics package). We also identified another seven utility components in other parts of Hadoop using other mapping principles.

4.2.0.7 Pass authoritative recovery to certifier

At this point, the recoverer has produced an authoritative recovery that has been enriched and modified with the help of different kinds of mapping principles. Key to the authoritative recovery is the clear relationship between the proposed architectural elements and the code-level elements. The recovered architecture is only now passed to a certifier for verification. The certifier looks through the proposed groupings and suggests (1) addition of new groupings, (2) splitting of existing groupings, or (3) transfer of code-level entities from one grouping to another. These modifications should also include the rationale behind the change in groupings in order to determine new domain or application information. In turn, this new information may reveal new mapping principles that can be used by the recoverer to further modify the obtained recovery. The certifier's rationale may also reveal the choice not to follow generic principles in certain cases.

For example, our certifier for Hadoop identified that our authoritative recovery's grouping representing the NameNode was too large. In particular, he identified that the grouping can be split into six groups that represent (1) the core of the NameNode, (2) its main data structures, (3) exceptions, (4) a grouping for the SecondaryNameNode, (5) a grouping representing the NameNode's use of the HDFS protocol, and (6) servlets for transferring HDFS data over HTTP.

The recoverers missed this alternative grouping because the certifier's knowledge revealed application principles that are not discernible from documentation or source code. One of the generic principles that helped produce the NameNode cluster in the authoritative recovery grouped classes that were tightly coupled. However, the certifier deemed that important components within the NameNode were obfuscated by the larger grouping. This was the case, e.g., with the servlets used by the NameNode as well as the classes representing the NameNode's HDFS protocol usage, both of which were moved to their own groupings in the ground-truth architecture.

4.2.0.8 Modify groupings according to certifier's suggestions

At this point, the recoverer modifies the groupings as suggested by the certifier. The recoverer and certifier may repeat Steps 7 and 8 until both are satisfied with the obtained recovery, at which point a ground-truth recovery is finalized.

Our certifier for Hadoop was intimately familiar with over 90% of the components (59 of 65 components) we identified in our authoritative recovery.
Therefore, he was able to verify with confidence the existence and appropriate grouping of a large majority of the components. The certifier deemed 34 components to be appropriate as-is. However, he identified the need for close to 40% of the components (25 of 65 components) to be regrouped or modified in some fashion. This is only a single data point, but it is indicative of the risks of halting the architectural recovery process at the point at which an authoritative recovery is available. The recoverers carefully studied several sources of existing information, but still misinterpreted or completely missed certain key architectural details.

For example, one grouping for handling progress reports was identified by the certifier as inaccurate. The recoverers selected a class called TaskInProgress as the sole member of a grouping representing progress reports due to a Focus rule not to further merge those groupings whose dependencies are 2-3 times higher than the average. However, the certifier indicated that another class called Progress and an interface called Progressable, which are not structurally dependent on each other, are actually better representatives of progress reporting. Focus rules had mapped these classes to other groupings due to the lack of structural dependency. The class TaskInProgress was re-mapped by the certifier to the grouping containing the JobTracker. Therefore, the final grouping for this class was not based on structural dependencies at all but on application principles. Such modifications, which disregarded structural dependencies in favor of the application-specific semantics of classes, occurred often in the final step of the ground-truth recovery process.

As another example, several groupings for utility components were of coarser granularity in the authoritative recovery but were broken down into other groupings by the certifier. For example, the authoritative recovery had a grouping representing general HDFS utilities, which we will refer to as DFS Utils, that can be used by the NameNode, DataNode, and their protocol implementations. However, the certifier revealed that a significant number of classes in this grouping handled HDFS upgrade management. Therefore, these classes were extracted into their own group, even though they were structurally cohesive with the other elements in DFS Utils.

Chapter 5
Enhancing Architectural Recovery Using Concerns

To deal with drift and erosion, a number of techniques have been proposed to help recover a system's architecture from its implementation [51]. These techniques mainly map implementation-level entities to high-level system components by clustering the entities and taking the resulting clusters to be components [17, 90, 110, 15, 127, 169]. However, the prevailing coupling-and-cohesion-based clustering methods do not recover the concerns associated with the components, making it difficult to understand the meaning of a cluster or whether a cluster truly represents a component.

Moreover, the architecture of a software system also consists of connectors, which play a critical role in mediating component interactions [164]. Although a number of existing component recovery methods are automated, the connector recovery techniques uniformly depend on significant human involvement. In particular, existing techniques for connector recovery use patterns or queries to identify the connectors within a system [55, 75, 124, 152].
These techniques require an architect to write a pattern or query for each implementation variant of every possible connector type. Creating such specifications is a manual task that can be time consuming and error prone.

This chapter describes Architectural Recovery using Concerns (ARC), a novel technique that leverages system concerns to automate the recovery of both components and connectors. The objective of this work is to obtain automatically recovered software architectures that are more comprehensive and more accurate than those yielded by current methods. Different from existing techniques, which exploit only structural information to discover components (e.g., programming language-level dependencies and shared directories), our approach first recovers concerns from the implemented system using an information retrieval technique, and then combines the concerns with the structural information to automatically identify components as well as connectors. A concern is a role, responsibility, concept, or purpose of a software system. Components and connectors are, therefore, high-level elements containing data and computation that implement the concerns. The key distinction guiding this work is that concerns are application-specific in the case of components and application-independent in the case of connectors [164].

Figure 5.1: Overall approach for recovering components and connectors.

Figure 5.1 depicts the key aspects of our approach. First, we extract system concerns and structural information from source code. Then, we use this information to obtain system bricks, which are either components or connectors. In a parallel step, concerns are classified into application-specific (i.e., those related to components, their functionality, and their data) and application-independent (i.e., those related to connectors and the integration and interaction services they provide). Finally, we use these concerns to classify each recovered brick as a component or connector. We discuss each of these steps in the remainder of the section.

5.1 Obtaining Concerns through Probabilistic Topic Models

This section describes (1) the representation of concerns that ARC leverages and (2) the specific mechanisms ARC utilizes to extract that representation.

5.1.1 Concern Representation

To obtain concerns, we leverage a statistical language model used in information retrieval called Latent Dirichlet Allocation (LDA) [27]. Information retrieval techniques have been used by software engineering researchers [111, 94, 19], but not for the purpose of architectural recovery. LDA allows us to compute similarity measures between concerns and identify which concerns appear in a single software entity, such as a class, function, package, etc.

Table 5.1: A topic about events from event-based systems and another topic about weather. Both of these topics appear in Class 1.

(a) A weather topic. Topic: Weather -- temperature 0.7, wind 0.1, humidity 0.2
(b) An event topic. Topic: Event -- send 0.2, receive 0.3, event 0.5
(c) A class document. Document: Class 1 -- Event 0.1, Weather 0.9

In LDA, we represent a software system as a set of documents called a corpus.
A document is represented as a bag of words, which are identifiers and comments in the source code of a software entity. A document can have different topics, which are the concerns in our approach. A topic $z$ is a multinomial probability distribution over words $w$ drawn from a Dirichlet distribution with shape parameter $\beta$ (further clarified below). Table 5.1(a) shows an example of a topic labeled "Weather," in which words such as "temperature," "wind," and "humidity" appear with certain probabilities. LDA allows us to represent concerns in a human-readable form since a topic's meaning can be ascertained by examining its most probable words.

Consequently, a document $d$ is represented as a multinomial probability distribution over topics $z$ (called the document-topic distribution) drawn from a Dirichlet distribution with shape parameter $\alpha$ (further clarified below). A document in our work is a class. For example, "Class 1" in Table 5.1(c) is a document that has two topics that occur with certain probabilities. These topics represent two concerns and the degree to which they are addressed in the class.

5.1.2 Extracting Concerns from Source Code

Topics are extracted using approximate iterative algorithms that maximize the likelihood estimates of the topic distribution and document-topic distribution [70]. To extract topics, the corpus and the number of topics $T$ to extract are required as input. Larger values of $T$ result in topics that tend to be more similar to each other; smaller $T$ values tend to result in more general and distinguishable topics [19]. The parameters to the distributions are set to $\alpha = 50/T$ and $\beta = 0.01$ because they have been shown to work well (1) across different corpora [162] and (2) for other software-engineering tasks. The different kinds of software-engineering tasks that LDA-based topic models have been applied to include determining the topics of software systems for program understanding and obtaining semantic representations of software systems for mining of code search engines, software categorization, bug localization, traceability, and computing class cohesiveness [20, 165, 103, 19, 101]. To determine a possible number of topics, an engineer can examine the words with the highest probabilities in the topic-word distribution; $T$ can then be selected experimentally by the architect so that she can decide on the number of topics that are most sensible for the software system being recovered.

For our implementation of ARC, we selected Gibbs sampling [70] to extract an LDA representation of a corpus. This sampling technique is an efficient, approximative algorithm that uses a Markov Chain Monte Carlo method for statistical inference of the posterior distribution, i.e., the distribution obtained after examining the evidence (the source code of the system). The implementation of Gibbs sampling is obtained from MALLET, a Java framework for information retrieval using machine-learning techniques [119].

We perform several pre-processing steps on the source-code text analyzed by ARC. These pre-processing steps enable correct comparison and differentiation of words and concerns and are either (1) commonly used in natural-language processing or (2) specific to analyzing source code for software-engineering tasks. As part of the commonly-used pre-processing steps, ARC eliminates non-alphabetic characters, converts all text to lowercase, removes stop words, and stems words.
\to", \the", \he", \she"). In either case, stop words convey few semantics of text. ARC lters a set of common English stop words. Stemming converts an in ected word (e.g. \ ying") to its root (e.g. \ y") allowing words to be compared regardless of in ection. ARC leverages a commonly-used Porter Stemmer [143], which is a common stemming algorithm. To apply information-retrieval techniques, such as LDA, to software-engineering tasks, ARC separates CamelCase identiers and removes programming-language keywords. Iden- tiers are words that name software entities such as variables, methods, functions, classes, packages, etc. CamelCase identiers are identiers where each word of an identier is appended to each other and the rst letter of each word (except possibly the rst word) is capitalized (e.g., \leParser", \reduceTerms", \computeTransition", etc.). ARC sep- arates each individual word of a CamelCase identier and applies the commonly-used pre-processing steps described previously. Separating CamelCase identiers ensures that individual words of identiers can be compared. To prevent programming-language semantics from obscuring application- or system- specic semantics, ARC removes programming-language keywords. For example, words such as \class", \int", or \cout" appear frequently in C++ code. The frequency of such words may prevent the extraction of concerns that are relevant for the dierentiation between components or connectors. Figure 5.2 depicts 40 concerns of an LDA model extracted from Hadoop 0.19.0. Hadoop is a framework for large-scale distributed computation using the map-reduce paradigm [4]. A variety of the concerns are application-independent, such as compres- sion of data (concern 1), lesystem metadata (concern 3), credential handling (concern 75 Figure 5.2: An LDA model of Hadoop 0.19.0 with 40 topics Concern ID Top Words of Concern 0 shadow bs block heap size data len live su bu mtf crc weight group 1 compress lzo buf compressor decompressor direct len buer codec zlib size load header strategi 2 tag io serial record pid tree print process read valu xml deseri map cpt 3 src permiss fs client leas le set replic io time path statu block size 4 path uri fs statu le absolut io size system parent kf dst ftp har 5 read write byte data writabl length io buer string int input set java output 6 job queue statu id event prioriti info state run conf schedul progress io submit 7 task id job attempt histori kei log time map nish append string valu type 8 kei valu retri list val aggreg object polici map arrai max store java servic 9 upgrad node version io data apach hadoop org datanod df hdf protocol info command 10 group user ugi access inform unix mode conf string org congur displai hadoop matcher 11 le path dir fs conf io system local directori creat length delet exist congur 12 arg command conf option line java hadoop org run apach set tool shell jar 13 task id job tracker statu attempt event conf kill report map action tip progress 14 system src cmd fs println le equal argv path dst command err usag io 15 socket address connect host client call channel server timeout net port conf addr inet 16 writabl compar valu io kei org apach hadoop reader record util java long iter 17 block datanod info data target idx stamp list id volum node meta map locat 18 time op block num scan metric interv max number min read period oper doc 19 map task id output jvm reduc memori num max size fetch copi shu local 20 split input job format path apach org hadoop conf reader lter io mapr record 21 checksum block 
5.2 Brick Recovery

Next, we recover bricks using hierarchical clustering, which groups implementation entities into clusters that may, in turn, be composed of other clusters. We can use both structural information (dependency relationships between software entities) and concerns (represented as topics) as features that determine whether or not a software entity belongs in a cluster. Features are distinctive properties of software entities. In a clustering technique, each entity to be clustered is represented by a feature vector $v = \{f_i \mid 1 \leq i \leq n\}$, an $n$-dimensional vector of numerical features, where $n$ is the number of implementation-level entities in the system. A feature $f_i$ is a property of an entity. For example, Table 5.2 shows the set of structural and concern-oriented features for three implementation-level classes.

Table 5.2: Example classes with their features.

    Name      | Region | AbstractImpl | Strategy | Weather | Events
    WAnalyzer |   1    |      1       |    0     |   0.9   |  0.1
    SAnalyzer |   1    |      1       |   0.8    |   0.1   |  0.1
    SMonitor  |   1    |      1       |   0.8    |   0.1   |  0.1

(Region and AbstractImpl are structural features; Strategy, Weather, and Events are concern-oriented features.)

The structural features are boolean: 1 indicates an existing dependency, and 0 indicates no dependency. In Table 5.2, all three classes depend on the Region and AbstractImpl classes. The concern-oriented features each have a proportion value associated with each class. A recovery technique based solely on structural information would cluster all three classes together or arbitrarily choose to create clusters out of the three classes. However, since the concerns reveal that SAnalyzer and SMonitor are related to the Strategy concern, our approach would cluster those two classes together and avoid clustering WAnalyzer with either of them. This example illustrates how the correctness of clustering depends on the selection of features of the entities to be clustered.
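For illustration, the rows of Table 5.2 can be flattened into one numerical feature vector per class, as in the sketch below. This is a simplified stand-in for ARC's internal representation; the class and concern names come from the table.

    import java.util.*;

    public class FeatureVectors {
        public static void main(String[] args) {
            // Feature order: [Region, AbstractImpl, Strategy, Weather, Events].
            // The first two are boolean structural dependencies; the last three
            // are concern proportions (Table 5.2).
            Map<String, double[]> vectors = new LinkedHashMap<>();
            vectors.put("WAnalyzer", new double[] { 1, 1, 0.0, 0.9, 0.1 });
            vectors.put("SAnalyzer", new double[] { 1, 1, 0.8, 0.1, 0.1 });
            vectors.put("SMonitor",  new double[] { 1, 1, 0.8, 0.1, 0.1 });
            // Structurally, all three vectors agree; only the concern features
            // separate WAnalyzer from SAnalyzer and SMonitor.
        }
    }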
The correctness of the clustering is also dependent on the similarity measure chosen to determine which entities belong in the same cluster [110]. We choose a similarity measure that takes both concerns and structural information into account. For structural information, we rely on state-of-the-art clustering techniques since each technique tends to utilize a different structural similarity measure. Similarity measures for concerns need to compare probability distributions, i.e., our representations of concerns. Therefore, we take similarity measures that compute the distance between probability distributions, such as the symmetric Kullback-Leibler divergence or the Jensen-Shannon divergence [99], and then combine that measure with the structural-information measure.

We choose a symmetric measure to compare concerns. A measure $Sim(p,q)$, where $p$ and $q$ are concerns, is symmetric if $Sim(p,q) = Sim(q,p)$. The order of the inputs passed to $Sim$ is not relevant to the similarity; a non-symmetric measure would thus incorrectly cluster implementation entities.

The main similarity measure we selected is the Jensen-Shannon divergence, $D_{js}$, which computes the symmetric distance between two probability distributions. Each feature vector is a probability distribution over features. $D_{js}$ is defined as

\[ D_{js} = \frac{p(c_i)}{p(c_{ij})} D_{kl}(v_i \| v_{ij}) + \frac{p(c_j)}{p(c_{ij})} D_{kl}(v_j \| v_{ij}) \]

where $c_{ij}$ is the cluster resulting from combining clusters $c_i$ and $c_j$, and $v_{ij}$ is $c_{ij}$'s associated feature vector. $D_{kl}(v_i \| v_j)$ is the Kullback-Leibler divergence, a non-symmetric distance measure for probability distributions, defined as

\[ D_{kl}(v_i \| v_j) = \sum_{k=1}^{n} f_{ik} \log\left(\frac{f_{ik}}{f_{jk}}\right) \]

The overall hierarchical clustering algorithm that ARC uses is depicted in Algorithm 7. The algorithm is hierarchical because the bricks (i.e., clusters) that it creates are composite bricks (i.e., bricks may comprise other bricks). The algorithm takes as input a set $E$ of software entities (e.g., classes or files) and the number $n$ of bricks to be recovered; recoverBricks returns the set $B$ of recovered bricks. The algorithm begins by creating a single brick for each software entity in $E$ (Line 1) and terminates once the desired number of bricks is obtained (Line 2). recoverBricks then determines the two most similar bricks by using $D_{js}$ to compare all currently formed bricks (Line 3). These two bricks ($b_1$ and $b_2$) are then merged to create a new brick consisting of the combination of the two bricks (Line 4). To merge bricks, mergeBricks combines (1) the software entities in $b_1$ and $b_2$ and (2) the concerns associated with each brick, averaging over the concerns of each brick to create a new merged concern. The set $B$ of bricks is then updated so that $b_1$ and $b_2$ are removed from $B$ and the new brick $b_{merged}$ is added to $B$ (Line 5).

Algorithm 7: recoverBricks
  Input: E: a set of software entities; n: the number of bricks to recover
  Output: B: a set of bricks
  1:  B <- make each entity a singleton brick
  2:  while |B| != n do
  3:      b1, b2 <- mostSimilarBricks(B)
  4:      b_merged <- mergeBricks(b1, b2)
  5:      B <- updateBricks(b1, b2, b_merged)
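A direct reading of these definitions is sketched below. The weighted average used for the merged distribution $v_{ij}$ and the epsilon guard against zero probabilities are our assumptions for illustration; they are not prescribed by the definitions above.

    public class Divergence {
        static final double EPS = 1e-12;

        // D_kl(v_i || v_j) = sum_k f_ik * log(f_ik / f_jk)
        static double kl(double[] vi, double[] vj) {
            double d = 0;
            for (int k = 0; k < vi.length; k++) {
                if (vi[k] > 0) d += vi[k] * Math.log(vi[k] / Math.max(vj[k], EPS));
            }
            return d;
        }

        // D_js with cluster-proportion weights p(c_i)/p(c_ij) and p(c_j)/p(c_ij).
        static double js(double[] vi, double[] vj, double pi, double pj) {
            double pij = pi + pj;
            double[] vij = new double[vi.length];
            for (int k = 0; k < vi.length; k++) {
                vij[k] = (pi / pij) * vi[k] + (pj / pij) * vj[k]; // merged distribution
            }
            return (pi / pij) * kl(vi, vij) + (pj / pij) * kl(vj, vij);
        }
    }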
5.3 Concern Meta-Classification

In order to determine whether a brick implements a component or a connector, we must determine whether each concern is application-specific or application-independent. We automate this classification, which we call concern meta-classification, through the use of supervised learning techniques because they allow us to create a function that can classify concerns.

Table 5.3: Example concerns labeled as application-specific or application-independent, and one unlabeled concern (top five words with their probabilities).

    Concern 1 (label: app-indep):   event .095, port .087, request .076, send .055, receive .054
    Concern 2 (label: app-indep):   buffer .054, message .053, listen .041, connection .015, session .013
    Concern 3 (label: app-specific): temperature .025, wind .021, humidity .013, pressure .012, request .009
    Concern 4 (label: ?):           connection .056, request .051, pool .050, send .047, listen .024

To obtain correct classifications in supervised learning, it is important to choose the most relevant features to represent concerns. The key features of concerns are the different words of a concern, where some words denote application-independent concerns, while others denote application-specific concerns. For example, Table 5.3 shows four concerns: two application-independent concerns, one application-specific concern, and one concern that is not yet labeled as application-specific or application-independent. For each concern, we show the top five words and their probability values; several words (e.g., "request", "send", "listen", "connection") appear in both the unlabeled concern and one of the labeled concerns. Furthermore, different learning algorithms may work better for certain classification problems than others, making the selection of the learning algorithm itself another important factor.

We provide a set of examples, i.e., items to be classified, and labels that identify the category to which an example belongs as input to a supervised learning algorithm. For concern meta-classification, we take the examples to be concerns and the labels to be application-specific or application-independent. These examples are called the training set. The supervised learning algorithm uses these labeled examples to create a classifier, i.e., a function that labels examples. The classifier that our technique produces labels concerns as either application-specific or application-independent.

For the produced classifier to correctly label concerns, we must appropriately select the features we use to represent concerns and the supervised learning algorithms used to create our classifier. The features are important because the correct selection of features allows the supervised learning algorithm to produce a classifier that can correctly distinguish between application-specific and application-independent concerns. Given that a concern, in our approach, is represented as a probability distribution over words, we can take that distribution to be a feature of the concern. In particular, the different words of the distribution are important for distinguishing whether a concern is application-specific or application-independent.

We choose the k-nearest-neighbor algorithm for classifying concerns. First, a set of correctly labeled concerns must be provided as input to the algorithm, along with the set of concerns that are not yet labeled. For every unlabeled concern, the algorithm computes the similarity between the unlabeled concern and each labeled concern, and gives the unlabeled concern the label held by the majority of the k most similar labeled concerns.
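A minimal k-nearest-neighbor classifier along these lines follows; it reuses the js method from the earlier divergence sketch (with uniform cluster weights, an assumption) and treats all names as illustrative. The worked example from Table 5.3 is discussed next.

    import java.util.*;

    public class ConcernKnn {
        record Labeled(double[] dist, String label) {}

        // Labels an unlabeled concern with the majority label among its
        // k most similar labeled concerns.
        static String classify(double[] concern, List<Labeled> training, int k) {
            List<Labeled> sorted = new ArrayList<>(training);
            // Smaller divergence means more similar.
            sorted.sort(Comparator.comparingDouble(
                l -> Divergence.js(concern, l.dist(), 0.5, 0.5)));
            Map<String, Integer> votes = new HashMap<>();
            for (Labeled l : sorted.subList(0, k)) votes.merge(l.label(), 1, Integer::sum);
            return Collections.max(votes.entrySet(),
                                   Map.Entry.comparingByValue()).getKey();
        }
    }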
For example, consider Concern 4 in Table 5.3. To determine similarity between concerns, we choose the similarity measure used in brick recovery. Using that measure, closer probabilities of words shared between concerns result in greater similarity between those concerns. Only the word "request" is shared between Concern 3 and Concern 4, while Concerns 1 and 2 each share two high-probability words with Concern 4. In this example, we let k = 3. Therefore, the similarity measure would determine that Concern 4 is most similar to Concerns 1 and 2, resulting in Concern 4 being labeled application-independent.

A key determining factor in correctly classifying concerns using an instance-based learning algorithm is the way in which we compare concerns. Similar to brick recovery, to correctly classify concerns, a similarity measure must be carefully chosen. We can use the similarity measure from the brick recovery step (i.e., the Jensen-Shannon divergence $D_{js}$) since the same considerations apply for concern meta-classification. In particular, the criteria for the potential correctness of concern similarity measurement are the same for both brick recovery and concern meta-classification.

The last major issue is obtaining the data needed for any supervised learning algorithm ARC uses to produce the concern meta-classifier. We must provide a sufficient number of labeled concerns to the supervised learning algorithm in order to produce an accurate concern meta-classifier. To that end, we can extract concerns from a wide variety of open-source applications and connector-implementing libraries.

5.4 Brick Classification

We can determine whether a brick implements a component or a connector using supervised learning. To enable correct brick classification, we must focus on the selection of relevant features for bricks; we select the following features: (a) the labeled concerns of the brick, (b) usage of connector-implementing libraries, and (c) the brick's involvement in design patterns that provide interaction services.

Table 5.4: Example bricks with their features and labels.

            | Primary Concern Type    | Socket Usage | Datagram Usage | MediatorCount | ObserverCount | Label
    Brick 1 | Application-Independent |      15      |       0        |       2       |       0       | Connector
    Brick 2 | Application-Specific    |       4      |       0        |       0       |       0       | Component
    Brick 3 | Application-Specific    |      15      |      20        |       4       |       2       | Connector

Three different intuitions about distinguishing components from connectors inform our selection of features. First, since the concerns of entities in a brick occur with certain probabilities, we can determine whether the concerns in a brick are primarily application-independent or application-specific. For example, Table 5.4 depicts three bricks, two that are application-specific and one that is application-independent. Second, if a brick extensively uses a library that can be used to implement higher-order connectors, such as a socket or datagram library, then the brick likely implements a connector, an idea we have used before for individual classes rather than bricks [81]. In Table 5.4, Brick 1 uses a socket library 15 times, while Brick 3 uses a datagram library 20 times. Lastly, certain design patterns are more relevant to interaction needs than others. For example, the Adaptor/Wrapper, Proxy, Chain of Responsibility, Mediator, and Observer design patterns provide interaction services between programming language-level objects; we call these connector-oriented design patterns. Therefore, their existence in a brick indicates that the brick is more likely to be implementing a connector. In Table 5.4, MediatorCount and ObserverCount count the number of classes in a brick that participate in any mediator or observer patterns, respectively. As the number of classes within a brick that are involved in such design patterns grows, so does the probability that the brick implements a connector.
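The columns of Table 5.4 can be encoded as a feature vector for a supervised learner, as sketched below; the 0/1 encoding of the primary concern type is an assumption for illustration.

    public class BrickFeatures {
        // Mirrors Table 5.4: primary concern type (1 = application-independent,
        // 0 = application-specific), socket usage, datagram usage,
        // mediator-pattern count, observer-pattern count.
        static double[] features(boolean appIndependent, int socketUses,
                                 int datagramUses, int mediators, int observers) {
            return new double[] { appIndependent ? 1 : 0, socketUses,
                                  datagramUses, mediators, observers };
        }

        public static void main(String[] args) {
            double[] brick1 = features(true, 15, 0, 2, 0);   // labeled Connector
            double[] brick2 = features(false, 4, 0, 0, 0);   // labeled Component
            double[] brick3 = features(false, 15, 20, 4, 2); // labeled Connector
        }
    }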
Note that the thresholds for determining when a combination of features results in one labeling or another are determined by the supervised learning algorithms. We will need to determine the correctness of different supervised learning algorithms in our context empirically.

Table 5.4 illustrates how a classifier produced by a learning algorithm may distinguish between components and connectors. Brick 1 is dominated by application-independent concerns, exhibits high usage of socket or datagram libraries, and has some classes that implement connector-oriented design patterns. Therefore, Brick 1 is labeled as a connector. Brick 2 is labeled a component because it is dominated by application-specific concerns, does not contain any connector-oriented design patterns, and exhibits low usage of socket or datagram libraries. However, even though Brick 3 is dominated by application-specific concerns, it utilizes socket and datagram libraries extensively and contains a significant number of classes that implement connector-oriented design patterns, resulting in it being labeled as a connector. A remaining issue that requires additional experimentation is discovering less clear-cut cases of brick classification, such as the classification of Brick 3, and how different learning algorithms deal with those cases.

Chapter 6
Framework for Studying Architectural Change and Decay

6.1 Foundation

Our work discussed in this chapter was directly enabled by five research threads: (1) software architecture recovery, (2) formalization of architectural concepts, (3) definition of architectural smells, and selection of existing and development of new (4) architecture change and (5) decay metrics. Before we discuss the details of the ARCADE workbench in Section 3, we will summarize this foundational work. Some of the outcomes reported here were described in prior publications, while others are novel; we will clearly delineate the two in the remainder of this section.

6.1.1 Architecture Recovery Tool Suite

We recently conducted a comparative evaluation of software architecture recovery techniques [59]. The objective was to evaluate the existing techniques' accuracy and scalability on a set of systems for which other researchers and we had previously obtained "ground-truth" architectures [60]. To that end, we implemented a tool suite offering a large set of architecture recovery choices to an engineer.

Our study showed that, on the whole, most of the state-of-the-art recovery techniques suffer from accuracy and/or scalability problems. At the same time, two techniques consistently outperformed the rest across the subject systems. We select these techniques for ARCADE to ensure high accuracy for our analysis. These two techniques, ACDC [169] and ARC [64], take different approaches to architecture recovery: ACDC leverages a system's structural characteristics, while ARC focuses on the concerns implemented by a system. The former is obtained via static dependency analysis, while the latter leverages information retrieval and machine learning.

6.1.2 Architectural Change Metrics

Different metrics have been proposed at the system and component levels to attempt to quantify architectural change. In this section, we highlight three metrics on which we have relied. These metrics treat clusters as non-hierarchical (i.e., components do not contain sub-components and only contain groups of code-level entities). We have demonstrated that non-hierarchical components represent architectures accurately according to a number of architects of existing software systems [60]; hierarchical components can be analyzed simply by "flattening them out" without loss of architectural information.

MoJoFM [172] is a widely-used change metric. It is a distance measure between two architectures, expressed as a percentage. MoJoFM is based on two operations used to transform one architecture into another: moves (Move) of implementation-level entities from one architectural cluster (i.e., component) to another, and merges (Join) of clusters. MoJoFM is defined as

\[ MoJoFM(A_i, A_j) = \left(1 - \frac{mno(A_i, A_j)}{\max(mno(\forall A_i, A_j))}\right) \times 100\% \]

where $A_i$ is the architecture obtained from version $i$ of a system $S$; $A_j$ is the architecture from version $j$ of $S$; and $mno(A_i, A_j)$ is the minimum number of Move and Join operations needed to transform $A_i$ into $A_j$.
Cluster-to-cluster (c2c) is a metric we developed in our recent work [59] to assess component-level change. This metric measures the degree of overlap between the implementation-level entities contained within two clusters:

\[ c2c(c_i, c_j) = \frac{|entities(c_i) \cap entities(c_j)|}{\max(|entities(c_i)|, |entities(c_j)|)} \times 100\% \]

where $entities(c)$ is the set of entities in cluster $c$; $c_n$ is a cluster from version $n$ of system $S$; and version $i$ is different from version $j$. The denominator normalizes the entity overlap in the numerator by the number of entities in the larger of the two clusters. This ensures that c2c provides the most conservative value of similarity between two clusters.

Architecture coverage ($c2c_{cvg}$) is a new change metric we have developed to indicate the extent to which one architecture's clusters overlap the clusters of another architecture according to c2c:

\[ c2c_{cvg}(A_1, A_2) = \frac{|simC(A_1, A_2)|}{|A_1.G.C|} \times 100\% \]
\[ simC(A_1, A_2) = \{c_i \in A_1 \mid \exists c_j \in A_2 : c2c(c_i, c_j) > th_{cvg}\} \]

$simC(A_1, A_2)$ returns the subset of clusters from $A_1$ that have at least one "similar" cluster in $A_2$, i.e., $A_1$'s clusters for which c2c returns a value above a threshold $th_{cvg}$ for one or more clusters from $A_2$. $A_1.G.C$ are the clusters of $A_1$. $c2c_{cvg}$ allows an engineer to determine the extent to which certain components existed in an earlier version of a system or were added in a later version. For example, consider a system whose version v2 was created after v1, and for which $c2c_{cvg}(A_1, A_2) = 70\%$ and $c2c_{cvg}(A_2, A_1) = 40\%$. This means that 70% of the components in version v1 still exist in version v2, while $100\% - c2c_{cvg}(A_2, A_1) = 60\%$ of the components in version v2 have been newly added. A sketch of both metrics follows.
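Both metrics read directly as set operations. The sketch below computes c2c and $c2c_{cvg}$ over clusters represented as sets of entity names; this is a simplified stand-in for ARCADE's implementation, not the tool's actual code.

    import java.util.*;

    public class ChangeMetrics {
        // c2c: percentage overlap between the entities of two clusters.
        static double c2c(Set<String> ci, Set<String> cj) {
            Set<String> common = new HashSet<>(ci);
            common.retainAll(cj);
            return 100.0 * common.size() / Math.max(ci.size(), cj.size());
        }

        // c2c_cvg: percentage of clusters in a1 that have at least one cluster
        // in a2 whose c2c similarity exceeds the threshold th_cvg.
        static double coverage(List<Set<String>> a1, List<Set<String>> a2,
                               double thCvg) {
            long similar = a1.stream()
                .filter(ci -> a2.stream().anyMatch(cj -> c2c(ci, cj) > thCvg))
                .count();
            return 100.0 * similar / a1.size();
        }
    }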
6.1.3 Architectural Decay Metrics

Several metrics have been proposed to quantify architectural decay. In our work, we rely on three existing metrics that are based on component interdependencies. We define three additional new metrics that explicitly leverage our definition and detection of architectural smells.

Ratio of cohesive interactions (RCI) [33] quantifies the number of dependencies in a system as compared to the maximum number of possible dependencies:

\[ RCI = \frac{|\{(c_i, c_j) \in L\}|}{|C| \times (|C| - 1)} \times 100\% \]

A high RCI value means that changes and errors in one component are likely to propagate to a larger number of other components, which, in turn, indicates significant decay.

Instability [114] of a component $c$ measures the likelihood that $c$ would change using the following equation:

\[ instab(c) = \frac{|\{(c_{src}, c) \in L\}|}{|\{(c_{src}, c) \in L\}| + |\{(c, c_{dst}) \in L\}|} \times 100\% \]

The numerator of instab is the number of a component's incoming dependencies, while the denominator is the sum of its incoming and outgoing dependencies. High values of $instab(c)$ are indicators of decay: a high number of incoming dependencies relative to outgoing dependencies suggests that $c$ is more likely to be forced to change in response to changes to other parts of the system. To obtain a system-wide measure using instab, we compute the average instability over all components in an architecture.

Modularization quality (MQ) [127] quantifies the degree of coupling and cohesion between components in a system. Like the change metrics, MQ represents a component as a cluster. MQ for a given architecture is defined as

\[ MQ = \sum_{i=1}^{|C|} CF_i \]

where $CF_i$ is the "cluster factor" of cluster $i$, representing its coupling and cohesion. $CF_i$ is defined as

\[ CF_i = \begin{cases} 0 & \mu_i = 0 \\ \dfrac{2\mu_i}{2\mu_i + \sum_{j=1, j \neq i}^{k} (\varepsilon_{i,j} + \varepsilon_{j,i})} & \text{otherwise} \end{cases} \]

where $\mu_i$ is the number of edges within the cluster, which measures cohesion, and $\varepsilon_{i,j}$ is the number of edges from cluster $i$ to cluster $j$, which measures coupling. Lower MQ values indicate that components in the system are less cohesive and more coupled, which, in turn, indicates greater architectural decay. To make the values of MQ comparable across different versions of a system, we normalize them by the number of clusters (i.e., components) in a particular version of a system.

Bi-directional component coupling (BDCC) is a new metric we have developed. A bi-directional dependency increases the likelihood that the two involved components must be evolved together. BDCC is defined as follows:

\[ BDCC = \frac{|\{(c_i, c_j) \mid (c_i, c_j) \in L \wedge (c_j, c_i) \in L\}|}{|C|^2} \times 100\% \]

The numerator of BDCC counts the pairs of components that are involved in a bi-directional dependency, while the denominator counts all possible bi-directional dependencies in an architecture.

Architectural-smell density (ASD) is a metric we have developed to quantify how extensive the number of detected smells is relative to the number of components in a system:

\[ ASD = \frac{|Smells|}{|C|} \]

As may be expected, higher smell densities indicate greater decay.

Architectural-smell coverage (ASC) is a newly defined decay metric that measures the proportion of components in a system that are affected by an architectural smell:

\[ ASC = \frac{|\{c \mid c \in C \wedge hasSmell(c)\}|}{|C|} \times 100\% \]

Depending on its implementation, $hasSmell(c)$ allows us to determine both whether component $c$ contains an instance of a specific smell and whether $c$ contains an instance of any smell. As in the previous case, a higher value of this metric indicates greater decay.
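Several of these metrics reduce to counting directed links between components. The sketch below computes instability and BDCC from a set of such links; the Link representation is assumed for illustration and is not ARCADE's actual data model.

    import java.util.*;

    public class DecayMetrics {
        record Link(String from, String to) {}

        // instab(c): incoming links over all links touching c, as a percentage.
        static double instability(String c, Set<Link> links) {
            long in = links.stream().filter(l -> l.to().equals(c)).count();
            long out = links.stream().filter(l -> l.from().equals(c)).count();
            return in + out == 0 ? 0 : 100.0 * in / (in + out);
        }

        // BDCC: percentage of component pairs linked in both directions.
        static double bdcc(Set<String> components, Set<Link> links) {
            long pairs = links.stream()
                .filter(l -> !l.from().equals(l.to())
                             && links.contains(new Link(l.to(), l.from())))
                .count() / 2;            // each bi-directional pair is seen twice
            return 100.0 * pairs / ((long) components.size() * components.size());
        }
    }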
We conclude the section by summarizing ARCADE's implementation details. Figure 6.1: ARCADE's key components and the artifacts it uses and produces. 6.2.1 Recovery ARCADE's foundational element is architecture recovery, depicted as Recovery Tech- niques in Figure 6.1. The architectures produced by Recovery Techniques are directly used for studying change and decay. ARCADE currently provides access to eight recovery techniques. This allows an engineer (1) to extract multiple architectural views and (2) to ensure maximum accuracy of extracted architectures by highlighting their dierent aspects. Since our previous evaluation [59] showed that two of the techniques|ACDC and ARC|exhibit signicantly better accuracy and scalability than the remaining techniques, and that they produce complementary architectural views (recall Section 2.1) we focus on them in our study of architectural change and decay in the remainder of the chapter. ACDC's view is oriented toward components that are based on structural patterns (e.g., components consisting of entities that together form a particular subgraph). On the other hand, ARC's view produces components that are semantically coherent due to 93 sharing similar system-level concerns (e.g., a component whose main concern is handling of distributed jobs). To congure ACDC for our study, we used its default settings. When applying ARC to a system, we set the number of clusters to 20% of the number of system classes, and the number of concerns to 18% of the number of classes. We used these settings because they demonstrated the best accuracy in our previous work [59]. 6.2.2 Architectural-Smell Detection ARCADE enables the study of architectural decay through the detection of architectural smells. Architectural Smell Detector in Figure 6.1 implements smell-detection algorithms based on our formalization of architectural concepts (Section 2.2) and denitions of smells (Section 2.3). In this chapter, we focus on the detection of the four smells dened in Section 2.3 because they have been shown to be good indicators of architectural problems [63, 62, 106, 128]. Recall from Section 2.3 that detecting three of the smells|scattered parasitic func- tionality, concern overload, and link overload|depends on the values of specic thresh- olds. These thresholds may be set manually, as we have done in our prior work (e.g., [128]). In the study reported in this chapter, we attempted to avoid bias by setting the thresholds automatically. For example, in our results reported in Section 4, concern over- load's th co threshold was set to be the mean plus one standard deviation of the number of concerns in a system's components; likewise, link overload's th lo threshold was set to be the mean plus one standard deviation of the number of incoming links for components in a system. On the other hand, th spf of scattered parasitic functionality and th z c of 94 concern overload are set to 0.1. We have experimented with several non-extreme (i.e., not very close to 0 or 1) thresholds and have found that the overall trends and conclusions reported in Section 7.3.2 remain unchanged. 6.2.3 Quantifying Change and Decay For each architecture, ARCADE computes the three change metrics (recall Section 2.4) and six decay metrics (Section 2.5). As depicted in Figure 6.1, Change Metrics Calculator and Decay Metrics Calculator analyze the architectures yielded by Recovery Techniques. 
The computed metrics comprise two of the artifacts produced by ARCADE, which are then used to interpret the degree of architectural change and decay in the manner dis- cussed further in Section 4. 6.2.4 Relating Issues and Architectural Decay We hypothesized that a software system's issue repository (e.g., Jira or Bugzilla) can be a valuable source of information about architectural decay. System stakeholders re- port in such a repository the bugs, perfective or adaptive changes, discussions about re-engineering the system, etc. To study the relationships between the reported issues on the one hand and architectural decay on the other, ARCADE supports the calculation of correlation coecients between the issues and the identied architectural smells. To that end, ARCADE includes two components: Issue Extractor and Relation An- alyzer. Issue Extractor obtains issues from an issue repository. Relation Analyzer deter- mines relationships between the extracted issues and architectural smells by computing 95 two dierent correlation coecients: Pearson's and Spearman's. Pearson's correlation co- ecient allows ARCADE to determine linear relationships between architectural smells and issues. ARCADE uses Spearman's correlation coecient to determine if that rela- tionship is actually monotonic but not linear. These correlations allowed us to study (1) whether implementation-level issues can help predict the presence of architectural smells and (2) whether addressing the reported issues can help to stem architectural decay. 6.2.5 ARCADE's Implementation ARCADE is implemented in Java and Python; it is available for download from [12]. To represent topic models for ARCADE's concern-based smell detection (recall Section 2.3), we have used the MALLET machine learning toolkit [119]. ARCADE employs JGraphT [135], a Java graph library, to aid in dependency-based smell detection and for the calculation of decay metrics. To calculate the MoJoFM metric [172], we procured an implementation of MoJoFM from its original creators. For the other metrics, ARCADE employs our own implementations. ARCADE's Issue Extractor is implemented using jira-client [13], a REST client for the Jira issue repository implemented in Java. Finally, ARCADE's Relation Analyzer relies on Apache Commons Math [40], a mathematics library, for computing correlation coecients. 96 Chapter 7 Evaluation Our evaluation focuses on (1) our experience and lessons learned when applying our ground-truth recovery framework to four software systems; (2) a comparative analysis of eight variations of six architecture-recovery techniques, including ARC, across eight ground-truth architectures; and (3) an empirical study using ARCADE of 463 versions of twelve open-source software systems comprising over 112 MSLOC. 7.1 Applying the Ground-Truth Recovery Framework This section (1) describes our experience applying the ground-truth recovery framework to four open-source software systems, (2) presents information about the architectures of those systems, and (3) discusses lessons learned from our experience. 7.1.1 Experience To date, we have produced four ground-truth architectures using the above framework; a fth recovery, of Google's Chromium [11], is currently underway. Figure 7.1 summarizes 97 each System with which we have worked, its Version, application Domain, primary imple- mentation Language, size in terms of SLOC, and the number of Recoverers and C ertiers involved in its architecture recovery. 
6.2.5 ARCADE's Implementation

ARCADE is implemented in Java and Python; it is available for download from [12]. To represent topic models for ARCADE's concern-based smell detection (recall Section 2.3), we have used the MALLET machine-learning toolkit [119]. ARCADE employs JGraphT [135], a Java graph library, to aid in dependency-based smell detection and in the calculation of decay metrics. To calculate the MoJoFM metric [172], we procured an implementation of MoJoFM from its original creators. For the other metrics, ARCADE employs our own implementations. ARCADE's Issue Extractor is implemented using jira-client [13], a REST client for the Jira issue repository implemented in Java. Finally, ARCADE's Relation Analyzer relies on Apache Commons Math [40], a mathematics library, for computing correlation coefficients.

Chapter 7
Evaluation

Our evaluation focuses on (1) our experience and lessons learned when applying our ground-truth recovery framework to four software systems; (2) a comparative analysis of eight variations of six architecture-recovery techniques, including ARC, across eight ground-truth architectures; and (3) an empirical study using ARCADE of 463 versions of twelve open-source software systems comprising over 112 MSLOC.

7.1 Applying the Ground-Truth Recovery Framework

This section (1) describes our experience applying the ground-truth recovery framework to four open-source software systems, (2) presents information about the architectures of those systems, and (3) discusses lessons learned from our experience.

7.1.1 Experience

To date, we have produced four ground-truth architectures using the above framework; a fifth recovery, of Google's Chromium [11], is currently underway. Figure 7.1 summarizes each System with which we have worked, its Version, application Domain, primary implementation Language, size in terms of SLOC, and the number of Recoverers and Certifiers involved in its architecture recovery.

Figure 7.1: Summary information about systems recovered.

    System     | Ver    | Dom             | Lang | SLOC | R | C
    Bash       | 1.14.4 | OS Shell        | C    | 70K  | 1 | 1
    OODT       | 0.2    | Data Middleware | Java | 180K | 1 | 1
    Hadoop     | 0.19.0 | Data Middleware | Java | 200K | 2 | 1
    ArchStudio | 4      | Architectural IDE | Java | 280K | 1 | 2

The variations in the systems' sizes, domains, and implementation languages allow us to extrapolate broader lessons about ground-truth recovery. We specifically selected the systems' versions with which the available certifiers were closely familiar. We selected Focus [121] as the architecture recovery technique; recall from Section 4.2 that our framework tries to minimize the bias of the chosen recovery technique on the recovered architecture.

In the remainder of the section, we discuss each of the four systems. For each, we illustrate the recovered ground-truth architectures, or certain portions thereof. The depicted ground-truth diagrams will be at a very low magnification by necessity. The point of showing these diagrams is not to explain their details, but instead to highlight the discrepancies between them and the systems' corresponding conceptual-architecture diagrams obtained from available documentation. These discrepancies will be revisited in Section 7.1.2.1. High-resolution ground-truth diagrams as well as other details from the four systems' recoveries can be found at [10].

7.1.1.1 Apache Hadoop

Hadoop is a widely used open-source framework for distributed processing of large datasets across clusters of computers [7]. Hadoop is about 200 KSLOC in size and contains over 1,700 OO classes. Two principal subsystems of Hadoop implement the Hadoop Distributed File System (HDFS) [159] and the Map/Reduce programming paradigm [47].

Map/Reduce is used for processing large-scale datasets in a parallel and distributed fashion. In this paradigm, a dataset is divided into parallelizable chunks of data. The dataset and the operations that can be performed on it are called jobs. A master, called JobTracker in Hadoop, runs on a single node. JobTracker takes a job and distributes it to workers, called TaskTrackers.

Hadoop uses HDFS to store data. HDFS relies on a fault-tolerant master/slave architecture. A master node, called NameNode, manages the filesystem's namespace, access to files by clients, and the distribution of data. The other nodes in the cluster are slaves of the NameNode, called DataNodes. DataNodes store and manage blocks comprising files in HDFS.

Hadoop is accompanied by rich usage documentation, auto-generated API documentation, and high-level architectural documentation. Figure 7.2 depicts an architectural diagram taken from the Hadoop documentation. This diagram captures dependencies between certain system components and several operations that those components perform. The available documentation helped the recoverers get familiar with the application and identify certain expected components (e.g., TaskTracker, NameNode). The documentation also revealed that Hadoop is divided into three major subsystems, corresponding to HDFS, Map/Reduce, and the system utilities.

Figure 7.2: An architectural diagram from Hadoop's documentation. (It shows the Namenode, Datanodes spread across racks, clients, and read, write, replication, metadata, and block operations.)

Our certifier for Hadoop was a Yahoo! engineer who has been a long-time contributor to the system and has an intimate understanding of its architecture.
We relied on two recoverers, both of whom are PhD students with previous experience in software architecture recovery. One of the recoverers had industry experience with building applications based on Map/Reduce; the other recoverer had industry experience with building distributed systems for large-scale data storage and processing. This experience aided the recoverers in obtaining mapping principles relevant to Hadoop's application domain. The recoverers relied on a combination of system dependencies obtained using IBM's Rational Software Architect (RSA) [6] and the Class Dependency Analyzer (CDA) [8].

To illustrate how different types of mapping principles are used in the ground-truth recovery process, Figure 7.3 shows how an application principle for Hadoop overrides a domain principle. A domain principle for Hadoop states that servlets should be grouped into a Web Server component (Figure 7.3(a)). Servlets are Java classes that respond to requests by using a request-response protocol and are often used as part of web servers. Therefore, the HttpServlet, TaskLogServlet, and ListPathsServlet classes are all grouped together based on this domain principle. However, an application principle, stating that servlets should be grouped with the modules that depend on them, overrides that domain principle. Thus, the TaskTracker and TaskLogServlet classes are grouped into one component, while the NameNode and ListPathsServlet classes are grouped into another component (Figure 7.3(b)).

Figure 7.3: For Hadoop, an application principle overrides a domain principle: (a) servlets merged into a web server; (b) servlets separated according to their use by domain components.
Figure 7.4: Two views of Hadoop's ground-truth architecture: (a) the ground-truth architecture of Hadoop in a circular layout; (b) the ground-truth architecture of Hadoop with many utility components removed.

Two views of the recovered ground-truth architecture of Hadoop are depicted in Figure 7.4. Again, at this magnification the diagrams are not intended to be readable, but rather to convey a sense of the size, structure, and complexity of the architecture. Figure 7.4(a) shows all of Hadoop's recovered components and their interdependencies in a circular layout. Figure 7.4(b) shows only the components from the Map/Reduce subsystem in yellow (lighter) ovals on top, and the components from HDFS in green (darker) ovals on the bottom. The recovered ground-truth architecture of Hadoop is significantly different from, and more complex than, what is depicted in Figure 7.2 or in any other available documentation.

7.1.1.2 Bash

The Bourne-Again SHell (Bash) is a command-line shell that provides a user interface to a GNU operating system (OS). Bash is included in popular OSs, such as GNU/Linux and Mac OS X. Bash is written in C, consists of about 70 KSLOC, and contains over 200 source files.

Figure 7.5: Conceptual architecture of Bash, comprising Input, Lexical Analysis and Parsing, Expansion (brace expansion, tilde expansion, variable and parameter expansion, command, process, and arithmetic substitution, word splitting, and filename generation), Command Execution, and Exit Status.

Figure 7.6: Ground-truth architecture of Bash, with components including Readline Line Editing Support, Job Control, I/O Handling, Builtins, Signal Handling, Command History, Shell Initialization, and several utility and library components.
Bash's conceptual architecture, depicted in Figure 7.5, follows the data-flow style. Command-line input is parsed, processed, and executed by the different components in the architecture. Extensive documentation explaining Bash's use and installation can be found on GNU's website. Some architectural information about Bash is also shipped with its code base. A chapter describing Bash's component architecture can be found in [34]; it was written by Bash's primary developer/maintainer, who is also our certifier for the ground-truth architecture. Our certifier has been involved with the system for over 20 years and has been the primary developer of Bash for over 17 years.

The recovery of Bash was unique in that an incomplete recovery performed by another researcher [110] was available. Our single recoverer was a PhD student with experience in architecture recovery. Dependencies between the Bash files were obtained using the mkdep tool [3].

Figure 7.6 shows the ground-truth architecture of Bash, depicting its components and their interdependencies. Note that there are significantly more components and dependencies in the ground-truth architecture than in the conceptual architecture from Figure 7.5. Additionally, the data flow from the conceptual architecture is not apparent in the recovered architecture. Despite these discrepancies, the conceptual and recovered views have certain similarities. For example, using the application principles resulted in the recovery of the Readline Line Editing Support component in Figure 7.6, which captures a key functionality of the Input component from Figure 7.5.

7.1.1.3 ArchStudio

ArchStudio [44] is a development environment for modeling, analyzing, implementing, and visualizing systems and software architectures. ArchStudio is implemented in Java, and consists of about 280 KSLOC spread over 800 OO classes.

Figure 7.7: Conceptual architecture of ArchStudio, in which tools such as ArchEdit (syntax editing), Archipelago (graphical editing), ArchLight (analysis), and other tools (PL support, evolution) interact with xArchADT and its data binding library.

Conceptually, ArchStudio consists of a set of tools that all interact with a component called xArchADT, as depicted in Figure 7.7. ArchStudio uses an extensible architecture description language (ADL) called xADL [45] to represent all aspects of an architecture. ArchStudio tools interact via xADL descriptions through xArchADT.

Figure 7.8: Ground-truth architecture of ArchStudio.
The components and connectors in ArchStudio's architecture follow the constraints of the Myx architectural style [5]. ArchStudio is implemented as a set of Eclipse plug-ins. This information is incorporated in the ground-truth recovery process in the form of system-context principles, and helped us to identify ArchStudio's components. Our two certifiers for ArchStudio were its primary architect and one of its developers. The recoverer was an MS student with significant industry experience, including prior experience with architecture recovery. CDA was used to extract dependencies from ArchStudio.

The ground-truth architecture of ArchStudio is depicted in Figure 7.8. It is important to notice that dependencies are sparse and that there are a significant number of disconnected components. This is consistent with the design decision to implement ArchStudio on top of the myx.fw architectural framework. The framework handles component interactions, which makes the component dependencies implicit. Our recovery process took the usage of myx.fw into account by defining a system-context principle stating that components of ArchStudio must inherit from the class that represents components in myx.fw.

7.1.1.4 Apache OODT

Apache OODT [116] is an architecture-based framework for highly distributed, data-intensive systems. Created over the past decade, OODT consists of over 180 KSLOC. OODT provides services, tools, APIs, and user interfaces for information integration and large-scale data processing across a number of scientific domains (earth science, planetary science, bioinformatics, etc.). OODT is NASA's first project to be stewarded at the Apache Software Foundation.

Figure 7.9: Conceptual architecture of OODT, comprising components such as Commons, Product, Profile, Query, Workflow, Resource, Catalog, Metadata, File Manager, and Curator, spanning catalog-and-archive utilities and grid services.

The conceptual architecture of OODT is depicted in Figure 7.9. OODT provides components for sharing large-scale data and metadata deployed locally at a scientific institution or a data center. To enable this sharing of data, OODT has product server components that expose data files. Querying and retrieval of this data is enabled by profile server components that expose metadata associated with these data files, allow querying of metadata, and support retrieval of data files that have been queried. The data-processing side of OODT provides components for file and metadata management, workflow management, and resource management. Complementary to these services are OODT's client frameworks for remote data acquisition; automatic file identification, metadata extraction, and ingestion; and scientific algorithm integration.

Figure 7.10: Ground-truth architecture of OODT.

One of the key architects and primary developers of OODT served as the certifier for our recovery. The recoverer was a PhD student with architecture-recovery experience. The recoverer had worked previously as a developer on the OODT project. However, he had very limited knowledge about the relationship of code-level entities to architectural elements. CDA [8] was used to extract dependencies from OODT.

OODT's ground-truth architecture consists of 217 components and is depicted (at very low magnification) in Figure 7.10. Similarly to Hadoop, OODT's ground-truth architecture is considerably more complex than its conceptual architecture. OODT has about 20 subsystems, each of which is modularized into its own project.
In turn, those subsystems were decomposed into sub-architectures at a much lower level of abstraction than the conceptual architecture from Figure 7.9.

OODT's ground-truth recovery was unique in that a number of its application mapping principles involved the system's package structure. One such principle states that each component of a subsystem responsible for external system interfaces is located in packages ending with ".system".

7.1.2 Lessons Learned

The overall aim of our work is to encourage obtaining more ground-truth architectures and to provide potential research directions for improving existing recovery techniques. To address that aim, we make several observations about the ground-truth architectural recoveries we have performed to date and the process used to obtain them. We discuss the discrepancies between the conceptual and ground-truth architectures (Section 7.1.2.1), the recovered components' sizes (Section 7.1.2.2), the role of utility components (Section 7.1.2.3), and the extent to which implementation packages and directories represent architectural components (Section 7.1.2.4). To improve understanding of the benefits of our ground-truth recovery framework, we discuss the relative importance of application and domain principles in comparison to generic principles (Section 7.1.2.5), and the effort required to obtain the ground-truth architectures (Section 7.1.2.6).

7.1.2.1 Discrepancies Between Architectures

The significantly higher numbers of dependencies in the recovered ground-truth architectures, in comparison to the conceptual architectures of the studied systems, suggest possible architectural drift and erosion (i.e., the accidental, unplanned introduction of design decisions). While drift and erosion may explain some of the clutter in the ground-truth architectures, there are also more prominent reasons behind it: (1) implicit capture of architectural constraints (e.g., styles) in the implementation, (2) different levels of abstraction, and (3) different goals of architectural views.

The conceptual view of a system's architecture often considers the architectural style(s) used for the system's design, and consequently the specific types of software connectors employed by those styles. For example, according to its conceptual architecture from Figure 7.5, Bash follows the data-flow style and thus includes data-stream connectors. However, existing architecture recovery techniques are not able to directly recover connectors. In the case of Bash, this means that the data-stream connectors remain encapsulated inside implementation-level artifacts for I/O manipulation, such as buffers, and are thus mapped to the I/O utility component.

Besides style-based discrepancies, there are significantly more components in the ground-truth architectures than in their conceptual counterparts. In Bash, the application principles obtained from documentation and the modifications to the architecture recommended by the certifier resulted in architectural diagrams at a lower level of abstraction than the conceptual architecture of Figure 7.5. For example, the Job Control component in Figure 7.6 can be considered part of the Command Execution component in Figure 7.5. Furthermore, application principles help to identify utility components, which are not typically depicted in conceptual architectural diagrams.
For example, Hadoop's ground-truth architecture diagram that omits the utility components (Figure 7.4(b)) is indeed closer to the conceptual diagrams found in the documentation than the complete ground-truth diagram (Figure 7.4(a)).

Even though the conceptual and ground-truth architectures are at different levels of abstraction, all of them are considered valid by their certifiers. The documentation of Bash, ArchStudio, and OODT, which describes their respective conceptual architectures, was either written or recommended by the certifiers. In the case of Hadoop, our certifier's explanations indicated that the ground truth was an architectural view that would help a developer understand the code from an architectural perspective. On the other hand, the intent behind the conceptual architecture may not be to correlate strongly with the implementation, but rather to partially, and often informally, represent certain important aspects of the system. For example, the conceptual architecture of Bash from Figure 7.5 represents a data element, Exit Status, in the same way it represents the processing components.

Our findings have potential impact on future research in the area of architecture recovery. First, since existing recovery techniques focus on components and are ill-suited for explicitly capturing architectural styles and connectors, further research should be done to identify styles and connectors properly. Second, a recovery technique should be able to adapt to the goals of the recovery by presenting the appropriate, possibly partial, views. Third, further research should be conducted to determine how to identify those differences between the conceptual and the recovered architecture that are likely to suggest true architectural drift and erosion, as opposed to the differences that are simply a by-product of different abstraction levels and architectural views' objectives.

7.1.2.2 Recovered Component Size

Existing architecture recovery techniques can vary greatly in terms of the sizes of the components they construct. Some techniques try to balance the numbers of clustered code-level entities (i.e., classes or files) across components [168]. Others allow sizes to vary depending on a system's structural dependencies, so that a single component may end up encompassing a large fraction of the system's entities [110].

We examined the extent to which components in our subject systems varied in the number of constituent implementation entities. The goal of this analysis is to inform developers of new recovery techniques and future ground-truth recovery efforts about the ranges of component sizes one should expect. Figure 7.11 summarizes this information. For each recovered System, the table notes the total number of implementation Entities, the Mean Size of its components in the number of contained entities, and the ratio between the mean size and the total number of entities, expressed as a percentage (% Mean Size).

Across the four systems, the component sizes are not normally distributed and are mostly below the mean. The positive skewness of the component sizes, ranging from 1.93 to 4.88, confirms this fact. Over 84% of the groupings had component sizes that were smaller than the mean plus one standard deviation. Components beyond this size, which we refer to as large components, occurred infrequently. Over one half (52%-72%) of the components in Bash, Hadoop, and ArchStudio were small, i.e., less than half the mean size, while 42% of OODT's components were small. This suggests that recovery techniques are likely to obtain better results if the components are kept fairly small.
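The statistics discussed above can be reproduced from a list of component sizes alone. A minimal Python sketch, assuming only a list of entity counts per component as input (this is an illustration of the measures, not the script used in our study):

```python
def size_statistics(sizes):
    # sizes: number of implementation entities in each recovered component.
    n = len(sizes)
    mean = sum(sizes) / n
    std = (sum((s - mean) ** 2 for s in sizes) / n) ** 0.5
    # Population skewness; positive values indicate most sizes fall below
    # the mean, as observed for all four subject systems.
    skew = sum((s - mean) ** 3 for s in sizes) / n / std ** 3
    return {
        "mean_size": mean,
        "skewness": skew,
        "pct_under_mean_plus_std": 100 * sum(s < mean + std for s in sizes) / n,
        "pct_small": 100 * sum(s < mean / 2 for s in sizes) / n,
        "pct_singletons": 100 * sum(s == 1 for s in sizes) / n,
    }
```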
System      Entities  Mean Size  % Mean Size
Bash        214       9          4.00%
OODT        896       4          0.50%
Hadoop      1236      18         1.50%
ArchStudio  812       15         1.90%

Figure 7.11: Data on the number of entities within components

We also examined the extent to which components were singletons, i.e., consisted of only one implementation entity. Singleton components tended not to be explicitly called out in documentation, and thus it is unlikely that they would be properly identified using traditional recovery techniques. Hadoop and ArchStudio had fewer than 5% singletons, while Bash had 16%. OODT was an outlier at both ends of the component-size spectrum: it had the greatest proportion of large components (16%) as well as singletons (24%).

System      Core  Util  Total  % Util
Bash        16    9     25     36.00%
OODT        177   40    217    18.43%
Hadoop      47    21    68     30.88%
ArchStudio  52    2     54     3.70%

Figure 7.12: Data on the number of core and utility components

7.1.2.3 Core and Utility Components

Previous research has shown that utility components are important for obtaining correct architecture recoveries [110, 15]. Therefore, we explicitly examined the extent to which utility components were found in each system. We also discuss the roles recoverers and certifiers played in correctly identifying the utility components.
113 7.1.2.4 Packages and Directory Structure Conventional software engineering wisdom suggests that implementation-level package and directory structures should organize and group code elements according to the ele- ments' conceptual functionalities. Hence, a straightforward architecture recovery method could be to simply consider packages and/or directories to be software components. In fact, this approach has been adopted by some existing work [22, 177]. To evaluate the va- lidity of such a method, we examined the extent to which packages or directories represent the components in the four ground-truth architectures. Figure 7.13 summarizes our ndings. We calculated the numbers of components (1) whose constituent entities span multiple packages or directories, and the components (2) whose entities share a single package or directory with entities belonging to other components. For each System, the table depicts the Total number of components; the number of components whose constituent implementation entities span multiple packages and/or directories (Span Pkgs); the percentage of such components in the system (% Span Pkgs); the number of components whose entities share a package or directory with other components' entities (Share Pkg); the percentage of such components in the system (% Share Pkg); the total number of components for which the package or directory structure does not match the architecture (Not Pkg), which is the sum of Span Pkgs and Share Pkg; and the percentage of such components in the system (% Not Pkg). Other than in ArchStudio, packages and directories were not generally representative of components in the studied systems. For Bash and Hadoop, the packages and directories 114 were almost entirely dierent from architectural components (the value of % Not Pkg was 85% and 92%, respectively). Bash has signicantly more components than directories. Furthermore, Bash is a smaller system than the others, and is primarily maintained by a single person who is both the architect and lead developer. Therefore, the need to maintain clearly separated components is likely reduced and, thus, not mirrored in the directory structure. Hadoop has relatively few packages scattered across many components and three major subsystems. An interesting aspect of the Hadoop recovery was that our certier suggested many utility components that were split across packages. The certier actually indicated that a later version of Hadoop had a package structure more representative of components, which we used as an aid in nalizing Hadoop's ground-truth recovery. Although a majority of OODT's components were also not represented by packages, the mismatch was lower than in the case of Hadoop or Bash. This stems from OODT's package structure, which repeats across subsystems and identies a recurring set of com- ponent types: utilities, domain data structures, command-line tools and actions, and external system interfaces. For this reason, after studying our rst attempt at OODT's recovered architecture, the certier informed us that we needed to follow the recurring package structure more closely in order to identify the appropriate component types. 
Time Spent Emails System RT (h) CT (h) CT/RT Total Recov Intro Sched Misc Hadoop 120 8 7% 62 29 5 11 17 OODT 100 10 10% 25 5 5 10 5 Bash 80 2 3% 10 8 2 0 0 ArchStudio 100 7 7% 35 29 5 0 1 Mean 100 7 7% 33 18 4 5 6 Figure 7.14: Time spent by recoverers and certiers, and the number and purpose of exchanged email messages 115 ArchStudio's components resemble its packages and directory structure more closely than the other studied systems. The recovery process for ArchStudio did not explicitly in- clude mapping principles indicating that packages and directories constitute components. However, ArchStudio's ground-truth architecture has relatively few components, each of which contains many classes. Furthermore, its implementation on top of the myx.fw framework requires explicit declaration of components in the implementation. Both of these factors may account for the 67% match of components to packages. 7.1.2.5 Application of Mapping Principles For all systems except Bash, whose architecture was already partially recovered, the struc- tural module-clustering rules of Focus [121] served as the generic principles that yielded the initial recovery. However, application and domain principles eventually dominated all the recoveries. In the case of Hadoop, the principles for the domain of distributed systems included groupings regarding entry points of classes, threads, processes, and deployment onto dierent hosts. Keeping separate the expected components suggested in the Hadoop documentation (e.g., TaskTracker and JobTracker) helped further inform the recovery. Communication protocols and the code that implements them were isolated based on application principles, which were also key to producing the nal ground-truth archi- tecture. The certier's assessment of the quality of our recoveries and suggestions for their modications heavily focused on semantic coherency, i.e., whether a component captured cohesive functionality or related system concerns. Many application principles of Hadoop|e.g., \instrumentation classes should be grouped with metrics classes" or 116 \the map-side join classes should not be grouped with join utilities"|were not obvious from either the documentation or the information about the domain. The certier's role in the recovery was therefore critical. OODT's recovery came about more as a result of generic principles and package struc- ture than in the other systems. In particular, Focus's rules suggest grouping classes by inheritance, and this helped to identify components in OODT in several instances. Exist- ing OODT documentation, in particular architectural diagrams and their explanations, provided information about application principles that identied many other OODT com- ponents. Furthermore, OODT contains a variety of connector types (e.g., data streams, RPCs, and arbitrators) and a combination of application and generic principles were used to identify the classes implementing these connector types. ArchStudio's recovery was heavily based on application and domain principles derived from the myx.fw framework and the Myx architectural style. Those framework- and style-based principles, along with expected components obtained from documentation, dominated the recovery of ArchStudio. The available documentation similarly informed the recovery of Bash as it provided the application principles that were used to identify the les that implement dierent Bash components. 
7.1.2.6 Eort of Recoverers and Certiers One of the major challenges we anticipated in obtaining the ground-truth architectures was having the required access to one or more certiers to aid in the recovery. It is typically not expected that a system's architect will be willing to dedicate time to an eort whose purpose falls outside the architect's regular duties (e.g., to aid academic 117 research). Unsurprisingly, having prior professional connections with certiers helped to incorporate them into our studies. However, we found that, at least in the case of open- source projects, architects and developers were readily willing to expend the time and eort to help. In fact, for two of the four systems we studied, we did not have any prior connections with the eventual certiers. Another part of addressing this challenge involved trying to minimize the burden on the certiers, while optimizing the use of their time and expertise. To examine whether it is realistic to recruit certiers in aiding ground-truth recovery in general, we recorded the amount of time they spent on each of the four recoveries, as well as the amount and nature of communication between them and the recoverers. Without exception, the communication between certiers and recoverers was performed via email. While potentially more time consuming than face-to-face meetings or phone conversations, we resorted to email because it lessens the burden on the certiers and allows them to control when they spend the time on the recovery. Figure 7.14 depicts the time recoverers and certiers spent on the recovery eort and the number of email messages exchanged between them. For each System, the table shows the number of hours spent by the recoverers (RT); the number of hours spent by certiers (CT); the ratio of the time spent by certiers to the time spent by recoverers, expressed as a percentage (CT/RT); the Total number of emails exchanged; the number of messages involving discussions about the recovery itself (Recov); the number of emails involving introductions and initial requests for help (Intro); the number of emails about scheduling meetings (Sched); and the number of emails about any other issues, e.g., reminders to respond (Misc). 118 An overwhelming majority of the time spent to produce the ground-truth recoveries was invested by recoverers. On average, recoverers spent about 100 person-hours per system. By contrast, certiers spent an average of seven hours on a recovery, which was only 3%-10% of the recoverers' time. This ratio suggests that certiers need not spend a prohibitive amount of time verifying an intermediate recovery. Bash involved a particularly low amount of time spent by the certier. This can be attributed to the availability of documentation that contained most of the domain and application principles. The availability of these principles, together with the smaller size of Bash relative to the other subject systems, resulted in fewer iterations with the certier. On the other hand, the signicantly higher number of components recovered in OODT than in the other systems was the likely reason why its certier took longer to analyze the recovery results. On average, about 30 email messages in total were exchanged to obtain a ground- truth recovery. Without exception, very few messages (2-5) were required to introduce the certiers to the recovery project and to secure their help. 
One possible reason for this is that open-source developers and architects may see the usefulness of an improved architectural understanding of their systems more readily than their counterparts work- ing, e.g., on commercial projects. Our on-going work on recovering the architecture of Google's Chromium will provide one test of that hypothesis. A majority of email exchanges between recoverers and certiers involved discussions of the systems' architectures and suggestions for modifying intermediate recoveries. Schedul- ing issues and reminders to respond unsurprisingly resulted in additional email exchanges. An outlier in terms of email trac was OODT. The primary reason for its comparatively 119 small amount of email discussion is that OODT's recoverer had previously worked as a developer of that system, although one with limited knowledge about OODT's overall architecture. 7.2 A Comparative Analysis of Recovery Techniques The eort and cost of software maintenance dominate the activities in a software sys- tem's lifecycle. Understanding and updating a system's software architecture is a critical facet of maintenance. However, the maintenance of an architecture is challenged by the related phenomena of architectural drift and erosion [164]. These phenomena are caused by careless, unintended addition, removal, and modication of architectural design deci- sions. To deal with drift and erosion, sooner or later engineers are forced to recover a system's architecture from its implementation. A number of techniques have been pro- posed to aid architecture recovery [51]. The existing techniques however are known to suer from inaccuracies, and dierent techniques typically return dierent results as \the architecture" for the same system. In turn, this can lead to diculties in assessing a recovery technique, which makes it unclear how to identify the best technique for a given recovery scenario; risks in relying on a particular technique, because its accuracy is unknown; and awed strategies for improving a technique, because the needed baseline assessments are missing. Previous comparative studies [18, 110, 15, 134, 177, 89]|while informative and pro- viding preliminary hints as to the respective quality of dierent recovery techniques|have 120 been limited in a number of ways. First, these analyses disagree as to which techniques are most accurate. Second, they do not elaborate the conditions under which each tech- nique excels or falters. Third, the quality of the \ground truths" on which these studies base their conclusions is questionable. Fourth, previous comparative analyses have been limited in terms of scale (i.e., the numbers and sizes of systems selected, or the number of techniques evaluated). To better understand the accuracy of existing architecture recovery techniques and to address the shortcomings encountered in previous comparative studies, this chapter presents a comparative analysis of six automated architecture recovery techniques: Al- gorithm for Comprehension-Driven Clustering (ACDC) [169], Architecture Recovery us- ing Concerns (ARC) [64], Bunch [127], scaLable InforMation BOttleneck (LIMBO) [15], Weighted Combined Algorithm (WCA) [110], and a technique developed by Corazza et al. [42], which we will refer to as zone-based recovery (ZBR) in this chapter. We have se- lected these techniques because they have been shown to be accurate based on prior work and have been evaluated in several previous publications. 
The six selected techniques rely on two kinds of input obtained from implementation-level artifacts: textual and structural. Textual input refers to the words found in the source code and comments of implementation-level entities. Structural inputs are the control-flow-based and/or data-flow-based dependencies between implementation-level entities. Most work on automated component recovery has focused on structural input. However, recent work has increasingly included textual input as a way of improving the accuracy of architecture recovery techniques.

We assess the accuracy of these techniques on eight architectures from six different open-source systems: ArchStudio [44], Bash [34], Hadoop [4], Linux [31], Mozilla [2], and OODT [116]. In the case of two of the systems, Linux and Mozilla, we use a pair of architectures each, at different levels of detail. These six systems span a number of application domains, are implemented in three different programming languages (C, C++, Java) and two different programming paradigms (procedural and object-oriented), and vary greatly in size (from 70 KSLOC to 4 MSLOC). The systems' eight architectures comprise our ground truth. Four of the architectures, those of ArchStudio, Bash, Hadoop, and OODT, were verified as correct by one or more architects of the corresponding systems [60]; the remaining four architectures, two each of Linux and Mozilla, are the result of a meticulous recovery process conducted by other researchers and were used in previous evaluations of recovery techniques [15]. Our assessment is performed at the system-wide level and at the level of individual system components.

We performed extensive groundwork to position ourselves to conduct this comparative analysis. Over the span of a decade, we developed, evaluated, and refined a semi-automated software architecture recovery technique, called Focus [50, 121]. We subsequently used Focus to recover the architectures of 18 open-source platforms for scientific computing and processing large datasets [117]. This work informed and aided us in the development of a more recent approach [61] for obtaining the architecture of a software system that can be relied on as its ground truth. We applied that approach in obtaining four of the ground-truth architectures [60] that we use to evaluate the recovery techniques in this chapter. Because implementations of several of the recovery techniques were either unavailable or of limited availability, we reimplemented three of the six techniques used in this study in consultation with their authors.

Our results indicate that two of the selected recovery techniques are superior to the rest along multiple measures. However, the results also show that there is significant room for improvement in all of the studied techniques. In fact, while the accuracy of individual techniques varies across the different subject systems, on the whole the techniques performed surprisingly poorly. We discuss the threats to our study's validity, the possible reasons behind our results, and several possible avenues of future research in automated architecture recovery.
7.2.1 Selected Recovery Techniques

For our comparative analysis, we have selected the following six software architecture recovery techniques: Algorithm for Comprehension-Driven Clustering (ACDC) [169]; Weighted Combined Algorithm (WCA) [109], with its two measures WCA-UE and WCA-UENM [110]; scaLable InforMation BOttleneck (LIMBO) [15]; Bunch [108]; Zone-Based Recovery (ZBR) [41, 42], with two of its variants; and Architecture Recovery using Concerns (ARC) [64].

The selected techniques have been previously published, they are automated, and they have been designed specifically for architecture recovery. Each technique provides either a reference implementation or the details of its underlying algorithms. Finally, the techniques have been shown to be accurate in previous evaluations. The six techniques provide a broad cross-section in that they differ in the underlying mechanisms employed and the type of information utilized to derive architectural details. Note that we have chosen not to include the recent SArF recovery technique [89] because its evaluation to date has not assessed its accuracy beyond recovering the package structure of Java applications. We elaborate on each selected technique next.

ACDC recovers components using patterns. ACDC was introduced by Tzerpos and Holt [169], who observed that certain patterns for grouping implementation-level entities recur. These patterns include grouping together (1) implementation-level entities that reside in the same source file, (2) entities in the same directory, (3) entities from the associated body and header files (e.g., .h and .c files in C), (4) entities that are leaves in a system's graph, (5) entities that are accessed by the majority of subsystems, (6) entities that depend on a large number of other resources, and (7) entities that belong to a subgraph obtained through dominance analysis. This last pattern, called the subgraph dominator pattern, clusters entities that are dominated by a given node together with the dominator node itself. ACDC has been included in previous comparative analyses, and its accuracy has been evaluated and confirmed in previous experiments [15, 110, 177].

WCA is a hierarchical clustering technique. In a clustering technique, each entity to be clustered is represented by a feature vector $v = \{f_i \mid 1 \leq i \leq n\}$, an $n$-dimensional vector of numerical features, where $n$ is the number of implementation-level entities in the system. A feature $f_i$ is a property of an entity. In WCA, a feature $f_i$ of an entity $e_j$ indicates whether or not $e_j$ depends on another entity $e_i$. Each feature in a feature vector corresponds to a different entity. Therefore, the feature vector of an entity $e$ in WCA indicates all the entities that $e$ depends on. For example, consider a system with just three entities: $e_1$, $e_2$, and $e_3$. For this system, $e_1$ will have the feature vector $\{1, 0, 1\}$ if $e_1$ depends on itself and $e_3$ but not on $e_2$.

A hierarchical clustering technique, like WCA, begins by placing each implementation-level entity in its own cluster, where a cluster represents an architectural component. The technique computes the pair-wise similarity between all the clusters and then combines the two most similar clusters into a new cluster. This is repeated until all elements have been clustered or the desired number of clusters is obtained.
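This generic agglomerative loop is shared by WCA, LIMBO, and ARC; only the similarity measure and the merge rule differ. A minimal sketch follows, in which the similarity and merge arguments stand in for the concrete measures defined in the text below (for LIMBO, similarity would be the negated information loss, since LIMBO merges the pair that loses the least information):

```python
def agglomerative_clustering(vectors, similarity, merge, target):
    # Each entity starts in its own cluster, represented here as a pair
    # (feature_vector, number_of_entities).
    clusters = [(v, 1) for v in vectors]
    while len(clusters) > target:
        # Find the most similar pair of clusters.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                s = similarity(clusters[i][0], clusters[j][0])
                if best is None or s > best[0]:
                    best = (s, i, j)
        _, i, j = best
        # Replace the pair with its merged cluster and continue.
        merged = merge(clusters[i], clusters[j])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters
```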
When two clusters are merged by WCA, a new feature vector is formed by combining the feature vectors of the two clusters. Two clusters $c_i$ and $c_j$, with feature vectors $v_i$ and $v_j$ respectively, are combined into a new feature vector

$v_{ij}^{wca} = \left\{ \frac{v_i + v_j}{m_i + m_j} \right\} = \left\{ \frac{f_{ik} + f_{jk}}{m_i + m_j} \right\}, \quad k = 1, \ldots, n$

where $m_i$ and $m_j$ are the numbers of entities in $c_i$ and $c_j$, respectively.

To compute similarity between clusters or entities, WCA provides two measures, UE and UENM. The UE measure for entities $e_1$ and $e_2$ is defined as

$UE(e_1, e_2) = \frac{0.5 \cdot sumBoth(e_1, e_2)}{0.5 \cdot sumBoth(e_1, e_2) + only(e_1, e_2) + only(e_2, e_1)}$

$sumBoth(e_i, e_j)$ is the sum of the values of features present in both entities. $only(e_i, e_j)$ is the number of features present in $e_i$ but not in $e_j$. A greater number of shared features between two entities indicates their increased similarity. UENM is defined as

$UENM(e_1, e_2) = \frac{0.5 \cdot sumBoth(e_1, e_2)}{0.5 \cdot sumBoth(e_1, e_2) + 2 \cdot (only(e_1, e_2) + only(e_2, e_1)) + numBoth(e_1, e_2) + notBoth(e_1, e_2)}$

$numBoth(e_i, e_j)$ is the number of features present in both entities. $notBoth(e_i, e_j)$ is the number of features absent from both entities. UENM incorporates more information into the measure, which allows it to better distinguish between entities. WCA's accuracy has been evaluated and confirmed (within the corresponding experiment parameters) in previous comparative analyses [110, 134].
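A direct transcription of the two measures and the merge rule, assuming feature vectors are plain Python lists in which a non-zero value marks a present feature (a sketch of the definitions above, not our actual WCA reimplementation; edge cases such as all-zero vectors are elided):

```python
def ue(v1, v2):
    # Unbiased Ellenberg (UE) measure over two feature vectors.
    sum_both = sum(a + b for a, b in zip(v1, v2) if a and b)
    only_1 = sum(1 for a, b in zip(v1, v2) if a and not b)
    only_2 = sum(1 for a, b in zip(v1, v2) if b and not a)
    return 0.5 * sum_both / (0.5 * sum_both + only_1 + only_2)

def uenm(v1, v2):
    # UENM additionally counts features present in, and absent from, both.
    sum_both = sum(a + b for a, b in zip(v1, v2) if a and b)
    only_1 = sum(1 for a, b in zip(v1, v2) if a and not b)
    only_2 = sum(1 for a, b in zip(v1, v2) if b and not a)
    num_both = sum(1 for a, b in zip(v1, v2) if a and b)
    not_both = sum(1 for a, b in zip(v1, v2) if not a and not b)
    return 0.5 * sum_both / (0.5 * sum_both + 2 * (only_1 + only_2)
                             + num_both + not_both)

def wca_merge(cluster1, cluster2):
    # Combine two clusters' feature vectors, weighting by their sizes.
    (v1, m1), (v2, m2) = cluster1, cluster2
    return ([(a + b) / (m1 + m2) for a, b in zip(v1, v2)], m1 + m2)
```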
LIMBO is another hierarchical clustering technique, but it differs from WCA in three key ways. First, LIMBO uses a mechanism called Summary Artifacts (SA) to reduce the computations needed while minimizing accuracy loss. Second, LIMBO uses the Information Loss (IL) measure to compute similarities between entities, which is defined as

$I = (p(c_i) + p(c_j)) \cdot D_{js}(v_i, v_j)$

$p(c) = \frac{m_c}{n}$ is the probability of a cluster $c$, where $m_c$ is the number of entities in the cluster. $D_{js}$ is the Jensen-Shannon divergence, which computes the symmetric distance between two probability distributions; each feature vector is a probability distribution over features. $D_{js}$ is defined as

$D_{js} = \frac{p(c_i)}{p(c_{ij})} D_{kl}(v_i \parallel v_{ij}) + \frac{p(c_j)}{p(c_{ij})} D_{kl}(v_j \parallel v_{ij})$

$c_{ij}$ is the cluster resulting from combining clusters $c_i$ and $c_j$, and $v_{ij}$ is $c_{ij}$'s associated feature vector. $D_{kl}(v_i \parallel v_j)$ is the Kullback-Leibler divergence, a non-symmetric distance measure for probability distributions, defined as

$D_{kl}(v_i \parallel v_j) = \sum_{k=1}^{n} f_{ik} \log \frac{f_{ik}}{f_{jk}}$

Lastly, LIMBO associates a new feature vector with a combined cluster $c_{ij}$ using the following formula:

$v_{ij}^{lim} = \left\{ \frac{m_i v_i + m_j v_j}{m_i + m_j} \right\}$

LIMBO has been included in previous comparative analyses [15, 110, 134] and has demonstrated significant accuracy.

Bunch uses a hill-climbing algorithm to find a partitioning of entities into clusters that maximizes an objective function. Bunch initially starts with a random partition and stops when it can no longer find a better partition. The objective function used in Bunch is called Modularization Quality (MQ) and is defined as $MQ = \sum_{i=1}^{k} CF_i$, where $k$ is a partition's number of clusters. $CF_i$ is the "cluster factor" of cluster $i$, representing its coupling and cohesion, and is defined as

$CF_i = \begin{cases} 0 & \mu_i = 0 \\ \dfrac{2\mu_i}{2\mu_i + \sum_{j=1, j \neq i}^{k} (\varepsilon_{i,j} + \varepsilon_{j,i})} & \text{otherwise} \end{cases}$

$\mu_i$ is the number of edges within the cluster, which measures cohesion. $\varepsilon_{i,j}$ is the number of edges from cluster $i$ to cluster $j$, which measures coupling. We analyze two variations of Bunch's hill-climbing algorithm: next ascent hill climbing (NAHC) and steepest ascent hill climbing (SAHC). In each step of the NAHC algorithm, the first neighbor of the current partition that improves MQ is chosen. In each step of the SAHC algorithm, the entire set of neighboring partitions of the current partition is examined to find the partition that improves MQ by the largest margin. Bunch has been used in previous comparative analyses [15, 89, 177], although its accuracy has mostly been shown to be limited compared to other techniques. For our comparative analysis, we have considered both variants of Bunch, NAHC and SAHC, to obtain the most accurate clustering that this technique can provide.

ZBR, or zone-based recovery, is a technique developed by Corazza et al. [41, 42] that utilizes textual information, hierarchical clustering, and a weighting scheme for feature vectors. We have selected the latest formulation of Corazza et al.'s technique [42]. The textual information used in ZBR tries to capture a software system's semantics when recovering the system's architecture. ZBR has been shown to work well in recovering the package structure of Java systems [42].

In ZBR, each source file is considered a document, i.e., a bag of words, where each word is obtained from program identifiers and comments. Each of these documents is divided into zones, which are regions in which a word may reside. For a source file, there can be up to six zones: class names, attribute names, function names, parameter names, comments, and bodies of functions. Each word is scored with a term frequency (tf) - inverse document frequency (idf) value, defined as

$tf\textrm{-}idf(t, d) = tf(t, d) \cdot idf(t, d)$

For a document $d$, $tf$ for a term $t$ is the number of occurrences of $t$ in $d$. For a set of documents $D$, $idf$ is defined as

$idf(t, D) = \log \frac{|D|}{|\{d \in D \mid t \in d\}|}$

The denominator inside the log function of $idf$ gives the number of documents in which $t$ is found. Thus, a tf-idf value is greater for a term that appears more frequently in a single document and lower for a term that appears in many documents. Each term in a document gets a different tf-idf value for each zone; each zone is weighted using the Expectation-Maximization (EM) algorithm for Gaussian mixtures [120]. ZBR can vary based on the inputs to EM for the zone weights [42]. In our analysis, we first use uniform zone weights as input to EM (referred to as ZBR-UNI hereafter), and then vary the zone weights by the number of tokens, i.e., word instances, in a zone divided by the total number of tokens in the system (referred to as ZBR-TOK).

Clusters in ZBR consist of source files, and each source file's feature vector consists of the tf-idf values for each weighted zone. ZBR computes similarity between entities using cosine similarity, defined as

$sim_{cos}(v_1, v_2) = \frac{v_1 \cdot v_2}{\|v_1\| \, \|v_2\|}$

The numerator of $sim_{cos}$ is the dot product of the two feature vectors; the denominator is the product of the magnitudes of the two feature vectors.
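The core of ZBR's vector construction and comparison can be sketched compactly. The fragment below assumes per-zone term counts and fixed zone weights standing in for the EM-derived weights, which are elided here; our actual ZBR reimplementation instead relies on gensim and scikit-learn, as noted in Section 7.2.2.1.

```python
import math

def tf_idf(term_counts, doc_freq, num_docs):
    # term_counts: term -> occurrences in one zone of one document;
    # doc_freq: term -> number of documents containing the term.
    return {t: c * math.log(num_docs / doc_freq[t])
            for t, c in term_counts.items()}

def zone_vector(zones, weights, doc_freq, num_docs):
    # zones: zone name -> term counts for that zone; weights: zone -> weight
    # (uniform for ZBR-UNI, token-proportional for ZBR-TOK).
    vec = {}
    for z, counts in zones.items():
        for t, val in tf_idf(counts, doc_freq, num_docs).items():
            vec[(z, t)] = weights[z] * val
    return vec

def cosine(v1, v2):
    # Cosine similarity over sparse vectors represented as dicts.
    dot = sum(val * v2.get(key, 0.0) for key, val in v1.items())
    n1 = math.sqrt(sum(x * x for x in v1.values()))
    n2 = math.sqrt(sum(x * x for x in v2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0
```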
ARC is the final architecture recovery technique used in our study; it has been developed as part of our on-going research. ARC recovers the concerns of implementation-level entities and uses a hierarchical clustering technique to obtain architectural elements. ARC recovers concerns using the statistical language model LDA [27], which is trained on the identifiers and comments in a system's source code. LDA allows ARC to compute similarity measures between concerns and to identify which concerns appear in a single implementation-level entity. Thus ARC, like ZBR and unlike the other techniques, attempts to rely on a program's semantics to perform recovery.

Similarly to ZBR, ARC represents a software system as a set of documents. A document can have different topics, which are the concerns in ARC. A topic $z$ is a multinomial probability distribution over words $w$. A document $d$ is represented as a multinomial probability distribution over topics $z$ (called the document-topic distribution). Each implementation-level entity is treated as a document, and its document-topic distribution is its feature vector. Hierarchical clustering is performed by computing similarities between entities using the Jensen-Shannon divergence ($D_{js}$), which allows computing similarities between document-topic distributions.

7.2.2 Comparing Recovery Techniques

We aim to meet two objectives with our analysis:

obj1: Determine which techniques, if any, generally work better than others.

obj2: Determine under what conditions each technique excels or falters.

To that end, we compare recovery techniques in terms of their overall accuracy (Section 7.2.2.2), the extent to which a cluster (i.e., architectural component) produced by a technique resembles a cluster produced by a human expert (Section 7.2.2.3), and the extent to which a recovery criterion (similarity measure, objective function, or pattern) implemented by a technique is reflected in a given cluster (Section 7.2.2.4).

7.2.2.1 Subject Systems and Implementation Details

Table 7.1 provides an overview of the eight architectures against which we assessed the selected recovery techniques, and the six systems to which those architectures belong. We consider these to be "ground-truth" architectures because they have been recovered from a system by one or more experts. An expert in this context is an engineer with intimate knowledge of the system, its architectural design, and/or its implementation. Table 7.1 summarizes each System from which we have obtained an architecture; the system's Version; application Domain; primary implementation Language; size in terms of SLOC; and the number of Components in the architecture. The eight architectures vary by size, domain, implementation language, and programming paradigm (procedural vs. OO).

We produced the ground-truth architectures of ArchStudio, Bash, Hadoop, and OODT as part of our prior work, with the assistance of those systems' architects [61, 60]. We obtained the architectures of Linux and Mozilla from the authors of previous studies involving the two systems' architecture recoveries [15, 169]. For both of these systems, we were provided nested architectures in which individual components comprised subcomponents. Using Shtern and Tzerpos's method [158], we obtained two non-hierarchical variations for Linux and Mozilla: a compact architecture of coarser granularity (denoted by the suffix -C in Table 7.1) and a detailed architecture of finer granularity (denoted by the suffix -D).

We obtained the implementations of three of the selected architecture recovery techniques, ACDC, Bunch, and ARC, from their original authors. On the other hand, working implementations of WCA, LIMBO, and ZBR were unavailable. We re-implemented those techniques based on their published documentation, with guidance provided by the techniques' authors. ZBR is implemented in Python, while the other techniques are implemented in Java. ZBR uses gensim [146] for computing tf-idf values and scikit-learn [139] for its EM algorithm for Gaussian mixtures. We obtain module dependencies using the Class Dependency Analyzer (CDA) [8] for the subject systems implemented in Java and the mkdep tool [3] for the systems implemented in C and C++.
Finally, ARC uses topic models from MALLET [119] to represent concerns.

Table 7.1: Summary information about the subject systems

System      Ver     Dom              Lang   SLOC  Comps
ArchStudio  4       IDE              Java   280K  54
Bash        1.14.4  OS Shell         C      70K   25
Hadoop      0.19.0  Data Processing  Java   200K  68
Linux-C     2.0.27  OS               C      750K  7
Linux-D     2.0.27  OS               C      750K  120
Mozilla-C   1.3     Browser          C/C++  4M    10
Mozilla-D   1.3     Browser          C/C++  4M    233
OODT        0.2     Data Management  Java   180K  217

7.2.2.2 Overall Accuracy

We assess the overall accuracy of the selected techniques using a widely used measure in architecture recovery called MoJoFM [172]. MoJoFM allows us to determine which techniques are generally more accurate than others (obj1), since it provides a system-wide measure. Furthermore, by comparing the obtained MoJoFM results, we can determine system-wide conditions under which techniques excel or falter (obj2).

MoJoFM is a distance measure between two architectures expressed as a percentage. This measure is based on two key operations used to transform one architecture into another: moves (Move) of entities from one cluster to another cluster and merges (Join) of clusters. MoJoFM is defined as:

$MoJoFM(A, B) = \left(1 - \frac{mno(A, B)}{\max(mno(\forall A, B))}\right) \times 100\%$

$A$ is the architecture produced by a given recovery technique. $B$ is the ground-truth architecture against which the technique is being assessed. $mno(A, B)$ is the minimum number of Move and Join operations needed to transform $A$ into $B$. Therefore, MoJoFM quantifies the amount of effort required to transform one architecture into another. A 100% MoJoFM value indicates full correspondence between $A$ and $B$, while a 0% MoJoFM value indicates that the two architectures are completely different.

Table 7.2 depicts the obtained MoJoFM results. A MoJoFM value is given for each pair comprising a subject System's architecture and a recovery technique. For techniques whose numbers of clusters vary (ARC, WCA, and LIMBO), the table shows the results for the numbers of clusters that maximize the corresponding MoJoFM values.

Table 7.2: MoJoFM results

System      ARC     ACDC    WCA-UE  WCA-UENM  LIMBO   Bunch-NAHC  Bunch-SAHC  Z-Uni   Z-Tok   AVG
ArchStudio  76.28%  87.68%  49.73%  45.87%    31.20%  59.50%      50.07%      48.53%  39.47%  54.26%
Bash        57.89%  49.35%  41.56%  42.21%    27.27%  47.97%      38.51%      36.97%  36.97%  42.08%
Hadoop      54.28%  62.92%  42.15%  39.57%    19.23%  51.24%      46.95%      36.00%  45.91%  44.25%
Linux-D     51.47%  36.31%  33.51%  32.54%    18.46%  32.54%      31.14%      MEM     MEM     33.71%
Linux-C     75.72%  63.76%  61.98%  59.74%    57.70%  73.65%      75.13%      MEM     MEM     66.81%
Mozilla-D   43.44%  41.20%  MJE     MJE       MJE     40.18%      31.65%      MEM     MEM     39.12%
Mozilla-C   62.50%  60.30%  32.49%  32.40%    34.97%  69.02%      64.29%      MEM     MEM     50.85%
OODT        48.48%  46.01%  43.67%  41.97%    MJE     36.65%      31.56%      30.89%  33.57%  39.10%
AVG         58.76%  55.94%  43.58%  42.04%    31.47%  51.34%      46.16%      38.10%  38.98%  45.15%
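MoJoFM computes $mno(A, B)$ exactly, using the algorithm published with the measure. To illustrate how Move and Join operations are counted, the sketch below instead greedily tags each recovered cluster with its best-overlapping ground-truth cluster; the result is a valid transformation of $A$ into $B$ and therefore an upper bound on $mno(A, B)$, not the exact value (the normalization term $\max(mno(\forall A, B))$ is also elided).

```python
def move_join_upper_bound(recovered, ground_truth):
    # recovered, ground_truth: lists of sets of entity names.
    moves, tags = 0, []
    for cluster in recovered:
        # Tag the cluster with the ground-truth cluster it overlaps most.
        overlaps = [len(cluster & gt) for gt in ground_truth]
        best = max(range(len(ground_truth)), key=overlaps.__getitem__)
        tags.append(best)
        # Entities outside the tagged cluster must be moved.
        moves += len(cluster) - overlaps[best]
    # Recovered clusters sharing a tag must then be joined.
    joins = len(tags) - len(set(tags))
    return moves + joins

# Example: transforming [{"e1","e2"}, {"e3","e4"}, {"e5","e6"}] into
# [{"e1","e2","e3"}, {"e4","e5","e6"}] takes one Move and one Join,
# so the function returns 2.
```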
We stopped generating sets of concerns once we found that increasing the number of concerns no longer improved the MoJoFM values. We ran Bunch three times for each of its variations since the Bunch algorithm is non-deterministic. From those results, we selected the Bunch results that produced the highest MoJoFM values. In total, in generating Table 7.2, we computed MoJoFM values for approximately 1,540,000 architectures. The table also shows the average MoJoFM values (AVG) for each architecture (right-most column) and recovery technique (bottom row). Dark gray and 133 light gray highlighting indicates the techniques that obtained, respectively, the highest and second-highest MoJoFM values for a given subject system's architecture. Some entries indicate MoJoFM errors (MJE) or memory errors (MEM ). WCA and LIMBO produce architectures for which MoJoFM does not terminate. We conrmed this error with MoJoFM's creators, but the error has not been resolved at the time of this dissertation's submission. For the memory errors, ZBR stores a large amount of data that surpasses the memory available on our test hardware (16GB RAM). ZBR must store data of sizenzV , wheren is the number of entities in the system to be clustered, z is the number of zones, and V is the number of terms. For the other techniques, the data to be stored is typicallyn 2 , wherenV . Since ZBR must store signicantly more data, it does not scale to the two largest systems, Linux and Mozilla. ARC and ACDC produce the best results in general, with respective average MoJoFM values of 58.76% and 55.94%. Bunch generally obtained higher MoJoFM values with its NAHC variant (51.34% on average) than its SAHC variant (46.16%). Furthermore, several techniques performed well in the cases of the coarse-grained architectures of the two largest systems (Linux-C and Mozilla-C). The average MoJoFM values were 66.81% for Linux-C and 50.85% for Mozilla-C; these values were, respectively, the highest and third-highest across all eight architectures in our study. The results also indicate that there is still signicant room for improvement in each of our selected architecture recovery techniques. For example, the average MoJoFM values in the cases of Linux-D, Mozilla-D, and OODT were all under 40%. These three architectures contain the largest numbers of clusters in our study (between 120 and 233), suggesting that the state-of-the-art recovery techniques generally perform worse at ner 134 granularities. Our recent study indicates that it is particularly important for a recovery technique to handle architectures at ner granularities: while they typically elide many details when representing a system's conceptual architecture, engineers and architects tend to prefer much more detail when that same system's architecture is recovered from the implementation [60]. Even for the recovery techniques that generally obtained better MoJoFM results than others, the range of values varied signicantly across the subject systems: 43%{76% for ARC, 36%{88% for ACDC, and 33%{74% for Bunch-NAHC. Combined with the gener- ally poorer results yielded by the remaining three techniques, this degree of variation does not inspire condence in any one technique: currently, they are simply too unpredictable. 
We do not recommend trying to optimize these techniques such that they maximize the MoJoFM values for the eight ground-truth architectures we have collected to date, since the number of subject systems is likely too low and not representative of software architectures in general. At the same time, introducing improvements that would minimize the variability across different systems, even across the relatively limited sample used in our study, would certainly make a given technique more dependable.

7.2.2.3 Cluster-to-Cluster Comparison

To obtain more specific information about the conditions under which each recovery technique excels or falters (obj2), we compare the extent to which each cluster produced by a technique matches each cluster in a ground-truth architecture. We call this comparison a cluster-to-cluster (c2c) analysis. To this end, we introduce a measure that indicates the overlap of entities between two clusters:

c2c(A, B) = (|A ∩ B| / |A|) × 100%

In our case, A is the set of entities grouped in a cluster by a given technique, while B is the set of entities grouped in a ground-truth cluster. To further assess the overall accuracy of the recovery techniques (obj1), we also summarize the results of the c2c analysis across each architecture as a whole.

Table 7.3: Cluster-to-Cluster Analysis, Moderate Match
System     | ARC           | ACDC          | WCA-UE        | WCA-UENM      | LIMBO       | Bunch        | Z-Uni        | Z-Tok        | AVG
ArchStudio | 26% (14/54)   | 33% (18/54)   | 33% (18/54)   | 19% (10/54)   | 11% (6/54)  | 15% (8/54)   | 13% (7/54)   | 11% (6/54)   | 20%
Bash       | 20% (5/25)    | 20% (5/25)    | 12% (3/25)    | 8% (2/25)     | 36% (9/25)  | 12% (3/25)   | 20% (5/25)   | 20% (5/25)   | 19%
Hadoop     | 31% (21/68)   | 13% (9/68)    | 0% (0/68)     | 47% (32/68)   | 1% (1/68)   | 10% (7/68)   | 25% (17/68)  | 22% (15/68)  | 19%
Linux-D    | 33% (40/120)  | 22% (26/120)  | 13% (15/120)  | 30% (36/120)  | 0% (0/120)  | 9% (11/120)  | MEM          | MEM          | 18%
Linux-C    | 71% (5/7)     | 57% (4/7)     | 29% (2/7)     | 71% (5/7)     | 14% (1/7)   | 57% (4/7)    | MEM          | MEM          | 50%
Mozilla-D  | 33% (78/233)  | 40% (93/233)  | 2% (5/233)    | 3% (6/233)    | 0% (0/233)  | 16% (37/233) | MEM          | MEM          | 16%
Mozilla-C  | 100% (10/10)  | 100% (10/10)  | 70% (7/10)    | 70% (7/10)    | 0% (0/10)   | 100% (10/10) | MEM          | MEM          | 73%
OODT       | 29% (62/217)  | 9% (20/217)   | 43% (93/217)  | 28% (61/217)  | MJE         | 4% (9/217)   | 23% (50/217) | 18% (39/217) | 22%
AVG        | 32% (235/734) | 25% (185/734) | 19% (143/734) | 22% (159/734) | 3% (17/517) | 12% (89/734) | 22% (79/364) | 18% (65/364) | 19.13%

For ease of analysis, we consider the matching of clusters at three discrete strength levels: moderate, strong, and very strong. A moderate match occurs for 40% < c2c ≤ 60%; a strong match occurs for 60% < c2c ≤ 80%; finally, a very strong match occurs for 80% < c2c ≤ 100%. For example, if a cluster c1_arc, computed by applying ARC to a system, yields a c2c value of 55% when compared to cluster c5_gt in the system's ground-truth architecture, then c1_arc has a moderate match with c5_gt.
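As a concrete illustration, the following sketch computes the c2c overlap and the corresponding strength level for a pair of clusters. The set-based formulation mirrors the definition above; the function names are ours rather than part of any existing tool.

```python
from typing import Optional, Set

def c2c(technique_cluster: Set[str], ground_truth_cluster: Set[str]) -> float:
    """Overlap of entities between a recovered cluster A and a ground-truth
    cluster B: |A ∩ B| / |A|, expressed as a percentage."""
    if not technique_cluster:
        return 0.0
    overlap = technique_cluster & ground_truth_cluster
    return len(overlap) / len(technique_cluster) * 100.0

def match_strength(value: float) -> Optional[str]:
    """Map a c2c percentage onto the three discrete strength levels."""
    if 40.0 < value <= 60.0:
        return "moderate"
    if 60.0 < value <= 80.0:
        return "strong"
    if 80.0 < value <= 100.0:
        return "very strong"
    return None  # at or below 40%: no match at any tracked level

# For instance, the 55% overlap from the c1_arc example above is moderate:
assert match_strength(55.0) == "moderate"
assert match_strength(c2c({"e1", "e2"}, {"e1", "e2", "e3"})) == "very strong"
```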
Table 7.4: Cluster-to-Cluster Analysis, Strong Match
System     | ARC           | ACDC          | WCA-UE       | WCA-UENM     | LIMBO       | Bunch        | Z-Uni       | Z-Tok       | AVG
ArchStudio | 17% (9/54)    | 33% (18/54)   | 13% (7/54)   | 24% (13/54)  | 15% (8/54)  | 6% (3/54)    | 6% (3/54)   | 7% (4/54)   | 15%
Bash       | 12% (3/25)    | 4% (1/25)     | 4% (1/25)    | 4% (1/25)    | 8% (2/25)   | 12% (3/25)   | 4% (1/25)   | 4% (1/25)   | 7%
Hadoop     | 26% (18/68)   | 7% (5/68)     | 0% (0/68)    | 26% (18/68)  | 0% (0/68)   | 9% (6/68)    | 9% (6/68)   | 9% (6/68)   | 11%
Linux-D    | 20% (24/120)  | 18% (21/120)  | 3% (3/120)   | 8% (10/120)  | 0% (0/120)  | 5% (6/120)   | MEM         | MEM         | 9%
Linux-C    | 57% (4/7)     | 86% (6/7)     | 71% (5/7)    | 71% (5/7)    | 0% (0/7)    | 29% (2/7)    | MEM         | MEM         | 52%
Mozilla-D  | 13% (31/233)  | 20% (47/233)  | 0% (0/233)   | 0% (0/233)   | 0% (0/233)  | 6% (13/233)  | MEM         | MEM         | 7%
Mozilla-C  | 80% (8/10)    | 100% (10/10)  | 30% (3/10)   | 30% (3/10)   | 0% (0/10)   | 80% (8/10)   | MEM         | MEM         | 53%
OODT       | 9% (19/217)   | 6% (13/217)   | 14% (30/217) | 13% (29/217) | MJE         | 0% (1/217)   | 4% (9/217)  | 3% (7/217)  | 7%
AVG        | 16% (116/734) | 16% (121/734) | 7% (49/734)  | 11% (79/734) | 2% (10/517) | 6% (42/734)  | 5% (19/364) | 5% (18/364) | 8.50%

By comparing matches at these three different strength levels, we aim to determine conditions under which each technique's accuracy varies significantly (obj2). It is important to note that a cluster computed by a given technique can match multiple clusters of a ground-truth architecture. Although the converse is also the case, in our analysis we focus primarily on the ability of a technique to group the same entities as a human expert would. As a hypothetical example, consider two ground-truth clusters, c1_gt with entities {e1, e2, e3} and c2_gt with entities {e4, e5, e6}. Assume that ACDC computes three clusters: c1_acdc with entities {e1, e2}, c2_acdc with entities {e3, e4}, and c3_acdc with entities {e5, e6}. c2c of c1_acdc and c1_gt is 100%, which is a very strong match; the c2c values of c2_acdc against both c1_gt and c2_gt are 50%, which are moderate matches; finally, c2c of c3_acdc and c2_gt is 100%, which is another very strong match.

Note that a 100% value of c2c need not mean that the two clusters are identical. As defined, c2c only reflects a recovery technique's ability to group the appropriate implementation-level entities on a piecemeal (cluster-by-cluster) basis. The measure does not ensure that the granularity of each recovered cluster matches the ground truth. In the above example, the clusters recovered by ACDC are finer-grained than those in the ground-truth architecture. In that case, even though some of the recovered clusters may yield the maximum c2c value, others will not.

Table 7.5: Cluster-to-Cluster Analysis, Very Strong Match
System     | ARC           | ACDC          | WCA-UE        | WCA-UENM      | LIMBO      | Bunch        | Z-Uni        | Z-Tok        | AVG
ArchStudio | 37% (20/54)   | 24% (13/54)   | 17% (9/54)    | 13% (7/54)    | 7% (4/54)  | 2% (1/54)    | 11% (6/54)   | 9% (5/54)    | 15%
Bash       | 52% (13/25)   | 44% (11/25)   | 64% (16/25)   | 64% (16/25)   | 0% (0/25)  | 0% (0/25)    | 24% (6/25)   | 24% (6/25)   | 34%
Hadoop     | 37% (25/68)   | 9% (6/68)     | 0% (0/68)     | 31% (21/68)   | 0% (0/68)  | 4% (3/68)    | 16% (11/68)  | 13% (9/68)   | 14%
Linux-D    | 48% (58/120)  | 47% (56/120)  | 43% (51/120)  | 44% (53/120)  | 0% (0/120) | 2% (2/120)   | MEM          | MEM          | 31%
Linux-C    | 86% (6/7)     | 57% (4/7)     | 86% (6/7)     | 86% (6/7)     | 0% (0/7)   | 43% (3/7)    | MEM          | MEM          | 60%
Mozilla-D  | 67% (156/233) | 46% (107/233) | 0% (0/233)    | 0% (0/233)    | 0% (0/233) | 6% (15/233)  | MEM          | MEM          | 20%
Mozilla-C  | 100% (10/10)  | 100% (10/10)  | 10% (1/10)    | 10% (1/10)    | 0% (0/10)  | 90% (9/10)   | MEM          | MEM          | 52%
OODT       | 57% (124/217) | 8% (17/217)   | 35% (77/217)  | 28% (61/217)  | MJE        | 0% (0/217)   | 7% (16/217)  | 7% (16/217)  | 20%
AVG        | 56% (412/734) | 31% (224/734) | 22% (160/734) | 22% (165/734) | 1% (4/517) | 4% (33/734)  | 11% (39/364) | 10% (36/364) | 19.63%
For this reason, we also accumulate the c2c values for each subject system, to determine which techniques are generally more accurate than others (obj1).

Tables 7.3, 7.4, and 7.5 illustrate the matching for the moderate, strong, and very-strong levels, respectively. Each table shows how many components (i.e., clusters) of a ground-truth architecture are matched by the respective technique at the given strength level, both as a numerical ratio and as a percentage value. As in the previous analysis, the highlighted cells are used to denote the techniques with the highest (dark gray) and second-highest (light gray) values for a particular architecture. The optimal result for any technique would be a 100% average match rate across all architectures at the very-strong level. Higher match rates at higher strength levels are preferred over higher match rates at lower strength levels. We focus our attention on the bottom-most row in each table, which indicates the average values for each technique computed across all eight architectures.

For all three strength levels, ARC and ACDC are again shown to be the most accurate techniques overall (obj1). ARC provides the highest accuracy at the moderate and very-strong levels (Tables 7.3 and 7.5), while ACDC slightly outperforms ARC at the strong level (Table 7.4). Overall, the recovery techniques perform very poorly at matching components in the strong ("middle") range: the mean value of c2c across all eight techniques was only 8.5% (the bottom-right cell in Table 7.4). By comparison, the techniques more than double that match rate in both the moderate (19.13%) and very-strong (19.63%) ranges. This was an unexpected trend. We are in the process of performing additional analysis, and are also contacting the authors of the different recovery techniques used in our study, to help shed light on the reasons behind these results.

The results of the c2c analysis further reinforce the observation made in the case of the MoJoFM analysis: simply put, the state-of-the-art architecture recovery techniques need to improve. On the whole, the techniques performed poorly in obtaining individual clusters that match ground-truth architectures. The average accuracy of each technique is below 33%, with the exception of ARC at the very-strong level, which matched slightly over half (56%) of the ground-truth clusters. Furthermore, as already mentioned above, the mean values across all eight techniques are uniformly under 20% (the bottom-right cells in Tables 7.3, 7.4, and 7.5). Finally, similarly to the MoJoFM analysis, we observe higher accuracy measurements for coarse-grained architectures.
The average accuracy across all recovery techniques and all three strength levels is 54% for Linux-C and 59% for Mozilla-C. On the other hand, the average accuracy is well under 20% for all remaining architectures.

7.2.2.4 Recovery Criteria Indicators

As part of meeting obj1 and obj2, we analyze the accuracy of the recovery criteria implemented by the different techniques. The selected techniques use three types of recovery criteria (recall Section 7.2.1): (1) establishing a similarity measure, (2) maximizing an objective function, or (3) identifying a particular pattern involving a given system's modules. In support of obj2, we determine the extent to which each criterion is reflected in each cluster of the ground-truth architectures. This analysis can suggest possible ways of combining criteria for future recovery techniques. We analyze each criterion across all techniques for a given architecture, and across all architectures for a given technique. Analyzing recovery criteria in this manner also allows us to further assess obj1 and determine the extent to which the results of the recovery-criterion analysis are consistent with the results of the MoJoFM analysis. This recovery-criterion analysis is similar to a previous analysis of coupling between Java classes in packages [23].

Table 7.6: Criterion-Indicator Analysis, Strong Level
System     | ARC           | ACDC          | WCA-UE      | WCA-UENM   | LIMBO      | Bunch       | Z-Uni       | Z-Tok       | AVG
ArchStudio | 44% (24/54)   | 81% (44/54)   | 2% (1/54)   | 0% (0/54)  | 0% (0/54)  | 0% (0/54)   | 0% (0/54)   | 0% (0/54)   | 16%
Bash       | 40% (10/25)   | 16% (4/25)    | 16% (4/25)  | 0% (0/25)  | 0% (0/25)  | 8% (2/25)   | 0% (0/25)   | 4% (1/25)   | 11%
Hadoop     | 43% (29/68)   | 68% (46/68)   | 3% (2/68)   | 0% (0/68)  | 1% (1/68)  | 1% (1/68)   | 7% (5/68)   | 7% (5/68)   | 16%
Linux-D    | 30% (36/120)  | 53% (64/120)  | 4% (5/120)  | 0% (0/120) | 3% (4/120) | 1% (1/120)  | MEM         | MEM         | 15%
Linux-C    | 100% (7/7)    | 57% (4/7)     | 0% (0/7)    | 0% (0/7)   | 0% (0/7)   | 0% (0/7)    | MEM         | MEM         | 26%
Mozilla-D  | 61% (141/233) | 46% (107/233) | 3% (6/233)  | 0% (0/233) | 0% (0/233) | 5% (12/233) | MEM         | MEM         | 19%
Mozilla-C  | 100% (10/10)  | 100% (10/10)  | 0% (0/10)   | 0% (0/10)  | 0% (0/10)  | 30% (3/10)  | MEM         | MEM         | 38%
OODT       | 8% (18/217)   | 61% (133/217) | 7% (15/217) | 0% (0/217) | 0% (1/217) | 0% (0/217)  | 3% (6/217)  | 2% (4/217)  | 10%
AVG        | 37% (275/734) | 56% (412/734) | 4% (33/734) | 0% (0/734) | 1% (6/734) | 3% (19/734) | 3% (11/364) | 3% (10/364) | 13.38%

Table 7.7: Criterion-Indicator Analysis, Very Strong Level
System     | ARC          | ACDC          | WCA-UE      | WCA-UENM   | LIMBO      | Bunch      | Z-Uni      | Z-Tok      | AVG
ArchStudio | 11% (6/54)   | 70% (38/54)   | 0% (0/54)   | 0% (0/54)  | 0% (0/54)  | 0% (0/54)  | 0% (0/54)  | 0% (0/54)  | 10%
Bash       | 16% (4/25)   | 16% (4/25)    | 0% (0/25)   | 0% (0/25)  | 0% (0/25)  | 8% (2/25)  | 0% (0/25)  | 0% (0/25)  | 5%
Hadoop     | 6% (4/68)    | 56% (38/68)   | 1% (1/68)   | 0% (0/68)  | 0% (0/68)  | 0% (0/68)  | 4% (3/68)  | 4% (3/68)  | 9%
Linux-D    | 3% (4/120)   | 33% (40/120)  | 1% (1/120)  | 0% (0/120) | 0% (0/120) | 1% (1/120) | MEM        | MEM        | 6%
Linux-C    | 14% (1/7)    | 14% (1/7)     | 0% (0/7)    | 0% (0/7)   | 0% (0/7)   | 0% (0/7)   | MEM        | MEM        | 5%
Mozilla-D  | 19% (45/233) | 25% (58/233)  | 1% (2/233)  | 0% (0/233) | 0% (0/233) | 2% (4/233) | MEM        | MEM        | 8%
Mozilla-C  | 90% (9/10)   | 80% (8/10)    | 0% (0/10)   | 0% (0/10)  | 0% (0/10)  | 10% (1/10) | MEM        | MEM        | 30%
OODT       | 0% (0/217)   | 53% (116/217) | 3% (6/217)  | 0% (0/217) | 0% (0/217) | 0% (0/217) | 1% (2/217) | 0% (0/217) | 7%
AVG        | 10% (73/734) | 41% (303/734) | 1% (10/734) | 0% (0/734) | 0% (0/734) | 1% (8/734) | 1% (5/364) | 1% (3/364) | 6.88%

For each of the three types of recovery criteria, we have defined functions that quantify the extent to which a criterion is indicated by a cluster. We refer to these functions as criterion indicators. Each criterion indicator gives a value between 0 and 1.
A value of 1 signifies that the cluster is completely indicated by the recovery criterion, while a value of 0 signifies that the recovery criterion does not indicate the cluster at all. For simplicity of analysis in this chapter, we consider the criterion-indicator values (v_ci) within two different ranges: strong for 0.6 < v_ci ≤ 1, and very strong for 0.8 < v_ci ≤ 1.

Bunch is the only technique among the six selected recovery techniques that uses an objective function as a recovery criterion. Bunch's objective function MQ is computed using the "cluster factor" CF_i for each cluster c_i (recall Section 7.2.1). Therefore, we directly use CF_i as the criterion indicator in the case of Bunch.

For the hierarchical clustering techniques (ARC, LIMBO, WCA, and ZBR), we compute the extent to which the similarity measure that each technique uses is an indicator of the clusters in the ground-truth architectures. To this end, we compute the following criterion indicator, crit_hier, for each cluster c in a ground-truth architecture:

crit_hier(c) = 0, if E_c = 1
crit_hier(c) = (1 / (E_c(E_c - 1))) × Σ_{i≠j} sim(c_i, c_j), if E_c > 1

E_c is the number of entities in cluster c. sim is the similarity measure of a technique, computed for two clusters c_i and c_j. For example, sim can be D_js for ARC or UE for WCA (recall Section 7.2.1). Thus, crit_hier computes the average pair-wise similarity between entities in a cluster.

ACDC is the lone technique that uses patterns. We focus on ACDC's main pattern, the subgraph dominator pattern. As our analysis will show, this pattern is overwhelmingly responsible for ACDC's accuracy. To obtain a criterion indicator for the subgraph dominator pattern, we need to compute the extent to which a cluster can be characterized as having a single entity that dominates the cluster's remaining entities. To this end, each cluster of an architecture must be represented as a rooted directed graph, RDG_c, whose entities are nodes and whose dependencies are edges. A node n_1 in a graph dominates another node n_2 if every path from the root node to n_2 must pass through n_1.

It is possible that, in an actual component, disjoint subsets of entities in a cluster are dominated by distinct entities. For example, a cluster c may have two disjoint subsets of entities EN_1 and EN_2, where no entity in EN_1 has any dependencies on any entity in EN_2, and vice versa. In that case, no single entity dominates all other entities in the cluster. Thus, to create a criterion indicator for the subgraph dominator pattern, we compute a minimum spanning forest (MSF), which is a set of disjoint minimum spanning trees. A minimum spanning tree (MST) of a connected graph G is a connected subgraph SG of G that connects all p nodes of G with p - 1 of G's edges.

To understand how an MSF helps us to quantify the subgraph dominator pattern, consider the case of an MST that contains all entities of a cluster c. Then, RDG_c is indicated by the subgraph dominator pattern, since all of RDG_c's entities are dominated by its root entity. If a single MST cannot include all entities of c, we can compute an MSF using a variation of Prim's algorithm [1, 145]. Once an MST is computed that does not contain all nodes of RDG_c, one of the remaining nodes is selected and another MST is computed. MSTs are continuously computed in this manner until all nodes are accounted for.
Thus, by computing an MSF for a cluster c, the following criterion indicator can be used for ACDC's subgraph dominator pattern:

crit_sdp(c) = numEnt(maxMST(MSF_c)) / E_c

maxMST returns the MST from an MSF that has the largest number of entities. numEnt is the number of entities in an MST. As the number of entities in the largest MST of a cluster's MSF increases, so does the value of crit_sdp. In turn, a higher value of crit_sdp means that the cluster is more highly indicative of ACDC's subgraph dominator pattern.

Tables 7.6 and 7.7 depict the criterion-indicator analysis for the strong and very-strong levels, respectively. Each table depicts the average criterion-indicator values across all clusters of an architecture. Dark gray and light gray cells denote the techniques with the highest and second-highest criterion-indicator values for a given architecture.

As with the MoJoFM and c2c analyses, ARC and ACDC showed the best results in the criterion-indicator analysis. However, in this case ACDC yielded markedly better results than ARC. At the strong level, ACDC's cluster criterion is indicated in 56% of the cases across the eight architectures, as compared to ARC's 37%. The difference is even more pronounced at the very-strong level, where ACDC yields a 41% average across the eight architectures, as compared to ARC's 10%.

It is interesting to note that the MoJoFM and c2c analyses, where ACDC was matched or slightly outperformed by ARC, incorporated all patterns implemented by ACDC. On the other hand, the criterion-indicator analysis, where ACDC is clearly the most accurate technique, reflects only ACDC's subgraph dominator pattern. While further study is required to understand all of their underlying causes, our results indicate that the other patterns ACDC uses in recovering an architecture may, in fact, reduce its accuracy.

Finally, the criterion-indicator results show that the criteria employed by techniques other than ACDC and ARC do not indicate clusters in the ground-truth architectures. These remaining techniques achieve strong criterion-indicator values for at most 4% of ground-truth clusters, and very-strong criterion-indicator values for at most 1% of ground-truth clusters. We consider both of these values to be negligible. Our results clearly show that the clustering criteria used in much of the current research on software architecture recovery are, simply put, wrong, and that they need to be carefully reconsidered.
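As an illustration of the two best-performing indicators defined above, the following sketch computes crit_hier for a cluster given a pairwise similarity function, and crit_sdp from a dependency graph. The forest construction here is a plain reachability traversal from arbitrary roots, a simplification of the Prim's-algorithm variation cited above, and all names are ours.

```python
from typing import Callable, Dict, List, Set

def crit_hier(entities: List[str],
              sim: Callable[[str, str], float]) -> float:
    """Average pairwise similarity among a cluster's entities (crit_hier)."""
    n = len(entities)
    if n == 1:
        return 0.0
    total = sum(sim(a, b)
                for i, a in enumerate(entities)
                for j, b in enumerate(entities)
                if i != j)
    return total / (n * (n - 1))

def crit_sdp(entities: Set[str], deps: Dict[str, Set[str]]) -> float:
    """Share of a cluster's entities covered by the largest tree of a
    spanning forest over its dependency graph (crit_sdp)."""
    remaining = set(entities)
    largest = 0
    while remaining:
        root = next(iter(remaining))   # start a new tree at any uncovered node
        stack, tree = [root], set()
        while stack:                   # grow the tree over uncovered nodes only
            node = stack.pop()
            if node not in remaining or node in tree:
                continue
            tree.add(node)
            stack.extend(deps.get(node, set()))
        remaining -= tree
        largest = max(largest, len(tree))
    return largest / len(entities)
```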
7.2.3 Discussion

Two techniques, ARC and ACDC, routinely outperformed the remaining four techniques across all eight ground-truth architectures and all three analyses (MoJoFM, c2c, and recovery-criteria). However, on the whole, all of the studied techniques performed poorly. For the MoJoFM analysis, the techniques obtained values between 38% and 59%. For the three strength levels of the c2c analysis, the techniques on average obtained matches for under 20% of ground-truth clusters. Finally, for the recovery-criterion analysis, criterion-indicator values varied from 0% to 56% on average across the two strength levels.

The particularly poor performance of techniques other than ACDC and ARC across all three analyses suggests that their recovery criteria do not reflect the way engineers actually map entities to components. WCA and LIMBO focus on grouping together entities with similar structural dependencies. Bunch attempts to optimize the coupling and cohesion between entities. ZBR attempts to maximize term similarity weighted by zones in a cluster. None of these criteria seem to be appropriate for architecture recovery.

Given the significantly greater accuracy of ACDC and ARC, techniques that employ structural patterns and system concerns as recovery criteria may be a fruitful starting point for improving existing recovery techniques and developing new ones. In particular, the subgraph dominator pattern, or similar patterns, may be especially beneficial. At the same time, the relative accuracy of the subgraph dominator pattern alone, when compared to the combination of patterns used in ACDC (recall Section 7.2.2.4), suggests that researchers creating new recovery techniques should be careful when including multiple criteria in a technique: additional criteria are likely to render a technique more complex and may, at the same time, be detrimental to its accuracy.

We also observe that the results of the recovery-criterion analysis were generally consistent with the MoJoFM and c2c analyses. This suggests that, in order to test any newly developed or selected criteria for a recovery technique, it may be beneficial to perform a criterion-indicator analysis first.

Finally, the results of even the two most accurate techniques varied greatly. ARC's average results varied by 40% for the c2c analysis across the three strength levels, by nearly 30% for the two strength levels of the recovery-criterion analysis, and by over 30% for the MoJoFM analysis. Similarly, ACDC's average results varied by 15% for both the c2c and recovery-criterion analyses, and by over 50% for the MoJoFM analysis. Thus, relying on even the top-performing techniques alone is insufficient to reliably perform an architecture's recovery in general. This unpredictability of existing techniques, in concert with their overall unreliability, suggests that effective architecture recovery is likely to require extensive manual intervention, the very thing automated techniques have aimed to eliminate.

7.2.4 Threats To Validity

We have collected a very large amount of data, and used it to draw a number of conclusions about the recovery techniques. However, there are certain issues that potentially undermine the validity of our results and our confidence in them. We highlight the three most important issues.

First, we selected a total of eight variants of six different techniques from a much larger body of research. Clearly, including additional techniques would strengthen our results. However, as already discussed, other techniques (1) do not have implementations available, (2) are not targeted at software architectures, (3) have been shown to be inferior to the techniques we have selected, or (4) have not been sufficiently evaluated to meet the threshold we deemed reasonable. We have tried to mitigate the risk by obtaining techniques that employ different kinds of input (textual and structural), use different underlying recovery mechanisms (optimization criteria, similarity measures, and pattern matching), and have been previously shown to be accurate. Furthermore, even though we had to re-implement three of the techniques, we regularly consulted their authors in order to ensure the correctness of our implementations. Our confidence in the veracity of our results is also bolstered by the fact that two techniques, ACDC and ARC, routinely outperformed the other four techniques and that all techniques performed poorly across all studies.

Second, we have evaluated the techniques on architectures drawn from only six software systems.
This is a very limited sample, and additional systems are needed to further validate our results. However, we were restricted to systems whose ground-truth architectures could be relied upon and, to our knowledge, the eight architectures we have collected, recovered, validated, and used in our study are the largest such set readily available. We have mitigated this specific threat by selecting subject systems that vary across domain, size, implementation paradigm, and implementation language. Thus, although they are limited in number, our subject systems are likely to be representative of a broad class of software systems.

Third, it is widely recognized that there is no single "correct" architecture for a given system. Therefore, even though we invested significant effort in recovering and verifying the ground-truth architectures of four of our subject systems (ArchStudio, Bash, Hadoop, and OODT), and other researchers did so for the remaining two systems (Linux and Mozilla), it is possible that different architectures, both at the level of individual clusters and their overall configurations, could have been recovered by another set of researchers and been deemed "correct" by the systems' architects. Clearly, that would have changed our analysis results. However, it is very likely that the alternative set of ground-truth architectures would be highly similar to the architectures we used in our study, and would therefore yield similar results. Even if we allow for some non-trivial discrepancies, most of the recovery techniques did so poorly along all three measures we used that it is very difficult to foresee a scenario in which a different set of ground truths would have changed those results substantially.

7.3 An Empirical Study of Architectural Change and Decay in Open-Source Software Systems

7.3.1 Empirical Study Setup

Our study targets the following three research questions about architectural change and decay.

RQ1: In what ways do architectures change? The relative dearth of empirical studies about architectural change has resulted in that phenomenon being poorly understood. As a result, the extent of architectural change, the types of architectural change, and the points in a system's lifecycle at which major architectural change occurs are generally unclear.

RQ2: In what ways do architectures decay? Although architectural decay has long been recognized as a phenomenon [140, 164], it has only recently been rigorously characterized and quantified [59, 63, 128]. To obtain a broad view of decay over time, our study specifically focuses on instances of architectural decay (architectural smells) and on metrics for measuring decay.

RQ3: Does the number of code-level issues reported for a system version correlate with the number of architectural smells present in that version? Issues reported in software repositories (e.g., Jira and Bugzilla) document bugs, new-feature requests, enhancement requests, etc. It is currently unknown whether the number of issues associated with a system's version is correlated with the system's architectural decay. If such a correlation exists, then the reported issues may be used to predict decay, and simply addressing those issues may alleviate a system's architectural problems.

In order to answer these three research questions, we applied ARCADE to a total of 463 versions of 12 Apache open-source systems, totaling over 112 million source lines of code (MSLOC). All of these systems are implemented in Java and store issues in the Apache Jira repository.
Table 7.8 summarizes each system we analyzed: its application domain, the number of versions analyzed, the time span between the earliest and latest analyzed versions, and the cumulative size of all selected versions (MSLOC).

Table 7.8: Subject systems analyzed in our study
System     | Domain         | Versions | Time Span  | MSLOC
ActiveMQ   | Message Broker | 20       | 8/04-12/05 | 3.40
Cassandra  | Distr. DBMS    | 123      | 9/09-9/13  | 22.0
Chukwa     | Data Monitor   | 6        | 5/09-2/14  | 2.20
Hadoop     | Data Process   | 70       | 4/06-8/13  | 30.0
Ivy        | Depend. Mgr    | 20       | 12/07-2/14 | 0.40
JackRabbit | Content Repo   | 84       | 8/04-2/14  | 15.0
Jena       | Semantic Web   | 7        | 6/12-9/13  | 27.0
JSPWiki    | Wiki Engine    | 32       | 10/07-3/14 | 1.20
Lucene     | Search Engines | 16       | 12/10-1/14 | 5.10
PDFBox     | PDF Library    | 16       | 2/08-3/14  | 2.00
Struts     | Web Apps       | 48       | 3/00-2/14  | 2.00
Xerces     | XML Library    | 21       | 3/03-11/09 | 2.30

We applied ARCADE's workflow depicted in Figure 6.1 to the different versions of each system. For each version, ARCADE produced five types of artifacts: (1) two recovered architectures, one by ACDC and the other by ARC; (2) code-level issues extracted from Apache Jira; (3) the values of change metrics; (4) the values of decay metrics; and (5) correlation data. All artifacts produced in our study are available at [12].

In our analysis of the subject systems, we leveraged their shared hierarchical versioning scheme: major.minor.patch. A major version entails extensive changes to a system's functionality and typically results in API modifications that are not backward-compatible. A minor version involves fewer and smaller changes than a major version and typically ensures backward-compatibility of APIs. A patch version, also referred to as a point version, results from bug fixes or improvements to a system that involve limited change to its functionality. This shared versioning scheme enabled us to make certain comparisons despite the differences among the systems and their numbers of versions; a version-pair classification along these lines is sketched below.
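A minimal sketch of how consecutive version pairs can be classified under the major.minor.patch scheme; the parsing and function names are ours, not ARCADE's.

```python
from typing import Tuple

def parse(version: str) -> Tuple[int, int, int]:
    """Split a major.minor.patch string into an integer triple."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch

def transition_type(v1: str, v2: str) -> str:
    """Classify the transition between two consecutive versions."""
    a, b = parse(v1), parse(v2)
    if a[0] != b[0]:
        return "major"
    if a[1] != b[1]:
        return "minor"
    return "patch"

# For example, (1.9.5, 2.0.0) is a major transition,
# (1.8.5, 1.9.0) is minor, and (1.9.0, 1.9.5) is a patch.
assert transition_type("1.9.5", "2.0.0") == "major"
assert transition_type("1.8.5", "1.9.0") == "minor"
assert transition_type("1.9.0", "1.9.5") == "patch"
```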
7.3.2 Results

This section presents the key results of our empirical study. For each research question stated above, we discuss the method employed in attempting to answer it and the associated findings.

7.3.2.1 RQ1: Architectural Change

To understand the nature of architectural change, we employed ARCADE to compute the change metrics described in Section 2.4. We did so for each of the two architectural views (produced by ACDC and ARC) of each version of our subject systems, at both the system and component levels.

For change at the system level, we analyzed the MoJoFM values computed for different pairs of system versions. A version pair is an ordered pair (v_k, v_{k+1}) of versions of a given system, in which v_{k+1} follows v_k in the version sequence (e.g., (1.9.5, 2.0.0)). We define a distance between versions, vdist_Q(v_i, v_j), that indicates the number of versions by which v_i is separated from v_j among a system's Q versions used in our study. For example, if Q_chk is the sequence of versions we selected for Apache Chukwa, and we used no versions between 0.1.0 and 0.3.0 in our study, then vdist_Q_chk(0.1.0, 0.3.0) = 1.

For both ACDC and ARC, Table 7.9 depicts the average MoJoFM values computed by comparing version pairs involving major versions, minor versions within the same major version, and patch versions within the same minor version, with vdist_Q = 1. This means that, e.g., for a system with versions 1.8.5, 1.9.0, 1.9.5, and 2.0.0, the Major columns in Table 7.9 would only include the MoJoFM values for version pair (1.9.5, 2.0.0); the Minor columns would only include the MoJoFM values for version pair (1.8.5, 1.9.0); and the Patch columns would only include the MoJoFM values for version pair (1.9.0, 1.9.5).

Table 7.9: Average MoJoFM values between versions
           | ACDC                       | ARC
System     | Major   | Minor  | Patch   | Major   | Minor  | Patch
ActiveMQ   | 67.17%  | 90.25% | 91.49%  | 82.45%  | 59.03% | 58.62%
Cassandra  | 86.35%  | 85.11% | 91.73%  | 81.17%  | 79.88% | 81.65%
Chukwa     | O1J     | 81.01% | O1P     | O1J     | 70.46% | O1P
Hadoop     | 78.27%  | 88.67% | 91.44%  | 67.52%  | 76.11% | 75.92%
Ivy        | 86.85%  | 86.26% | 93.31%  | 71.69%  | 71.69% | 67.11%
Jackrabbit | 81.58%  | 85.30% | 89.39%  | 65.09%  | 67.22% | 66.49%
Jena       | O1J     | 83.02% | 90.28%  | O1J     | 58.99% | 57.26%
JSPWiki    | O1J     | 79.05% | 84.96%  | O1J     | 76.94% | 67.62%
Lucene     | 30.56%  | 90.37% | 90.74%  | 61.70%  | 70.66% | 69.42%
PDFBox     | O1J     | 85.02% | 88.81%  | O1J     | 69.53% | 71.77%
Struts     | 100.00% | 74.71% | 88.28%  | 100.00% | 66.43% | 70.94%
Xerces     | 84.35%  | 88.76% | 88.19%  | 75.32%  | 70.23% | 75.65%
AVG        | 76.89%  | 84.79% | 89.87%  | 75.62%  | 69.76% | 69.31%
For some systems, we only analyzed one major version (denoted as O1J in Table 7.9) or only one patch version (denoted as O1P).

Table 7.9 shows that the degree of architectural change between consecutive system versions is relatively small for all three version types. It also indicates that the system-wide similarity is generally greater between patch versions than between minor versions, although there are several exceptions to this among the architectures recovered by ARC. The results for major versions show an interesting discrepancy: according to the architectures recovered by ACDC, the introduction of a major system version did, in fact, involve more substantial architectural changes on average than minor or patch versions; on the other hand, ARC yielded architectures that, on average, remained most stable during a jump to a new major version. While the average version-pair similarity for major changes is nearly identical between ACDC (77%) and ARC (76%), the MoJoFM results indicate that, for minor and patch versions, semantic changes in architectures (captured by ARC) may be more substantial than structural ones (captured by ACDC).

The observation that the most significant architectural changes need not occur between major system versions was unanticipated. At the same time, it is interesting to note that both the lowest (31%) and the highest (100%) MoJoFM averages in our subject systems occurred between major versions recovered by ACDC (the two values are highlighted in Table 7.9). This suggests that the architectural structure of a system can vary substantially.

To understand architectural change at the level of individual components, we relied on ARCADE's c2c_cvg metric. In the results reported here, we set the threshold th_cvg (recall Section 2.4) to 67%. This setting allowed ARCADE to treat two clusters as different versions of the same cluster, while allowing a reasonable fraction of the new cluster's constituent code elements to change. Table 7.10 depicts average c2c_cvg values for architectures recovered by ACDC and ARC. These values are computed for major, minor, and patch version pairs, respectively, with vdist_Q = 1. Average c2c_cvg values are computed for each version pair (v_k, v_{k+1}), which obtains the percentage of extant components, and for its inverse (v_{k+1}, v_k), which allows us to determine the extent to which new components were added to a version.

Table 7.10: Average c2c_cvg between versions
ACDC view:
System     | Major (v_k,v_k+1) | Major (v_k+1,v_k) | Minor (v_k,v_k+1) | Minor (v_k+1,v_k) | Patch (v_k,v_k+1) | Patch (v_k+1,v_k)
ActiveMQ   | 21.79%  | 19.76%  | 83.13% | 81.30% | 86.23%  | 86.23%
Cassandra  | 61.55%  | 55.73%  | 55.95% | 50.01% | 92.86%  | 92.86%
Chukwa     | O1J     | O1J     | 65.46% | 62.84% | O1P     | O1P
Hadoop     | 59.09%  | 50.40%  | 85.38% | 78.74% | 90.20%  | 89.47%
Ivy        | 43.75%  | 43.75%  | 79.22% | 71.73% | 100.00% | 100.00%
Jackrabbit | 46.75%  | 50.00%  | 76.09% | 71.18% | 84.71%  | 84.57%
Jena       | O1J     | O1J     | 63.32% | 49.42% | 84.87%  | 85.81%
JSPWiki    | O1J     | O1J     | 48.10% | 50.00% | 75.79%  | 75.00%
Lucene     | 0.00%   | 0.00%   | 88.29% | 85.78% | 90.20%  | 90.20%
PDFBox     | O1J     | O1J     | 78.58% | 77.39% | 87.97%  | 87.97%
Struts     | 100.00% | 100.00% | 54.37% | 59.08% | 85.98%  | 85.59%
Xerces     | 20.00%  | 15.79%  | 78.44% | 75.78% | 86.54%  | 86.54%
AVG        | 44.12%  | 41.93%  | 71.36% | 67.77% | 87.76%  | 87.66%
ARC view:
System     | Major (v_k,v_k+1) | Major (v_k+1,v_k) | Minor (v_k,v_k+1) | Minor (v_k+1,v_k) | Patch (v_k,v_k+1) | Patch (v_k+1,v_k)
ActiveMQ   | 6.45%   | 5.10%   | 40.45% | 37.24% | 40.59%  | 40.59%
Cassandra  | 57.41%  | 47.86%  | 47.50% | 40.10% | 61.71%  | 61.49%
Chukwa     | O1J     | O1J     | 46.03% | 43.17% | O1P     | O1P
Hadoop     | 45.17%  | 32.70%  | 54.09% | 49.90% | 56.49%  | 56.08%
Ivy        | 20.59%  | 18.42%  | 43.17% | 38.34% | 44.30%  | 44.30%
Jackrabbit | 27.72%  | 27.62%  | 39.86% | 36.57% | 40.88%  | 40.78%
Jena       | O1J     | O1J     | 31.21% | 24.85% | 37.77%  | 37.50%
JSPWiki    | O1J     | O1J     | 22.36% | 22.36% | 41.18%  | 41.07%
Lucene     | 2.50%   | 1.69%   | 48.35% | 46.47% | 50.84%  | 50.78%
PDFBox     | O1J     | O1J     | 41.92% | 40.69% | 45.63%  | 45.53%
Struts     | 100.00% | 100.00% | 29.83% | 34.90% | 45.64%  | 45.43%
Xerces     | 22.43%  | 18.18%  | 51.82% | 50.37% | 59.35%  | 59.25%
AVG        | 35.28%  | 31.45%  | 41.38% | 38.75% | 47.67%  | 47.53%

The results in Table 7.10 indicate that there is a greater degree of component-level change than system-level change. For example, for ACDC, both c2c_cvg averages between major versions (44% and 42% in Table 7.10) are notably lower than the MoJoFM average for major versions (77% in Table 7.9). This is even more pronounced in the case of ARC (35% and 31% c2c_cvg averages vs. the 76% MoJoFM average). The same trends hold for changes between minor and patch versions.

The average c2c_cvg values for a version pair and its corresponding inverse pair were highly similar for both ARC and ACDC. Although individual version pairs and their inverses were dissimilar in several cases (e.g., in the case of Hadoop, c2c_cvg(0.1.0, 0.2.0) = 75% while c2c_cvg(0.2.0, 0.1.0) = 46%), the differences between their average c2c_cvg values ranged from under 1% (patch versions in ARC) to about 5% (minor versions in ACDC). This corroborates the observation we made during the study that the numbers of components in our subject systems experienced a "jump" at times, but on the whole tended to remain largely constant between consecutive versions.
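Since Section 2.4 is not reproduced here, the following sketch is our reading of c2c_cvg: the share of clusters in one architecture that have a counterpart in the other at or above the th_cvg overlap threshold, reusing the c2c measure from Section 7.2.2.3. Treat it as an assumption-laden illustration rather than ARCADE's exact implementation.

```python
from typing import List, Set

def c2c(a: Set[str], b: Set[str]) -> float:
    """Entity overlap |A ∩ B| / |A|, as a percentage (Section 7.2.2.3)."""
    return len(a & b) / len(a) * 100.0 if a else 0.0

def c2c_cvg(source: List[Set[str]], target: List[Set[str]],
            th_cvg: float = 67.0) -> float:
    """Percentage of clusters in `source` with a counterpart in `target`
    whose overlap meets the threshold. c2c_cvg(v_k, v_k+1) measures extant
    components; the inverse argument order measures newly added ones."""
    matched = sum(
        1 for cluster in source
        if any(c2c(cluster, other) >= th_cvg for other in target)
    )
    return matched / len(source) * 100.0
```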
The c2c_cvg results further corroborate the observation from the analysis of the MoJoFM results that, at the architectural level, semantic change is more substantial than structural change. ARC yielded average c2c_cvg values for minor versions (41% and 39% in Table 7.10) and patch versions (48% and 48%) that were significantly lower than the corresponding ACDC values (71% and 68% for minor versions; 88% and 88% for patch versions).

Both MoJoFM and c2c_cvg also indicate that transitions between major versions typically result in comparable or more pronounced architectural changes than transitions between minor or patch versions for vdist_Q = 1. However, we hypothesized that significant architectural changes may occur across multiple minor versions within the same major version. For example, version pair (1.0.0, 1.9.0) may reflect more change than version pair (1.9.0, 2.0.0). To assess whether this actually occurs in our subject systems, we conducted an analysis to determine the minimum similarity between all minor-version pairs within a major version, with vdist_Q ≥ 1. Table 7.11 shows the results of that analysis for MoJoFM and c2c_cvg in architectures produced by both ACDC and ARC.

Table 7.11: Minimum similarity between minor versions
           | MoJoFM          | c2c_cvg ACDC                        | c2c_cvg ARC
System     | ACDC   | ARC    | (v_k,v_k+1) | (v_k+1,v_k)           | (v_k,v_k+1) | (v_k+1,v_k)
ActiveMQ   | 86.00% | 52.59% | 81.82%      | 80.00%                | 39.52%      | 36.29%
Cassandra  | 73.53% | 6.40%  | 31.82%      | 24.14%                | 35.65%      | 25.15%
Chukwa     | 65.35% | 60.27% | 63.41%      | 59.57%                | 41.67%      | 40.28%
Hadoop     | 50.93% | 69.96% | 48.89%      | 49.11%                | 36.03%      | 35.79%
Ivy        | 69.96% | 48.28% | 41.67%      | 23.81%                | 23.08%      | 17.65%
Jackrabbit | 58.40% | 56.96% | 52.54%      | 43.66%                | 30.73%      | 30.73%
Jena       | 73.55% | 56.32% | 45.83%      | 38.15%                | 27.52%      | 20.83%
JSPWiki    | 62.85% | 57.04% | 6.06%       | 6.06%                 | 0.00%       | 0.00%
Lucene     | 58.14% | 59.48% | 78.95%      | 75.00%                | 43.77%      | 42.15%
PDFBox     | 69.99% | 64.21% | 40.38%      | 38.14%                | 52.63%      | 48.78%
Struts     | 27.08% | 8.33%  | 15.79%      | 25.00%                | 8.62%       | 18.52%
Xerces     | 84.67% | 64.43% | 57.14%      | 48.00%                | 38.26%      | 41.67%
AVG        | 65.04% | 50.36% | 47.03%      | 42.55%                | 31.46%      | 29.82%

The average MoJoFM values for minimum similarity (65% for ACDC and 50% for ARC) indicate that the extent of system-level architectural change within a single major version exceeds the change occurring between major versions (77% for ACDC and 76% for ARC in Table 7.9). The minimum c2c_cvg averages for all pairs of minor versions shown in Table 7.11 (47% and 43% for ACDC; 31% and 30% for ARC) were similar to the c2c_cvg averages between major versions from Table 7.10 (44% and 42% for ACDC; 35% and 31% for ARC). Together, these results indicate that the cumulative extent of architectural change across different minor versions within a single major version of a system is at least similar to, and may significantly surpass, the change that typically occurs during a transition to a new major version.
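The within-major-version analysis can be phrased as a minimum over all version pairs sharing a major version. A minimal sketch under that reading, assuming a hypothetical similarity(v1, v2) callback (MoJoFM or c2c_cvg) and version triples as parsed earlier:

```python
from itertools import combinations
from typing import Callable, List, Tuple

Version = Tuple[int, int, int]  # (major, minor, patch)

def min_minor_similarity(versions: List[Version],
                         similarity: Callable[[Version, Version], float]
                         ) -> float:
    """Minimum similarity over all pairs of versions that share a major
    version (vdist_Q >= 1), mirroring Table 7.11's computation."""
    lowest = 100.0
    for major in {v[0] for v in versions}:
        within = sorted(v for v in versions if v[0] == major)
        for v1, v2 in combinations(within, 2):
            lowest = min(lowest, similarity(v1, v2))
    return lowest
```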
7.3.2.2 RQ2: Architectural Decay Table 7.12: Architectural-smell occurrences ACDC ARC System LO DC CO LO SPF DC min max avg min max avg min max avg min max avg min max avg min max avg ActiveMQ 5 17 13 1 1 1 12 34 20 11 21 15 1 16 8 1 3 1 Cassandra 1 8 5 1 1 1 11 115 45 10 54 26 1 14 4 1 3 2 Chukwa 4 8 6 1 1 1 9 16 12 7 13 10 1 2 1 1 1 1 Hadoop 0 30 5 0 1 1 5 260 38 5 140 25 1 51 5 1 7 2 Ivy 0 6 2 1 1 1 1 25 9 1 16 6 1 5 2 1 1 1 Jackrabbit 4 14 9 1 1 1 17 81 51 9 46 32 1 18 8 1 3 1 Jena 20 32 24 1 1 1 111 169 137 56 72 65 24 54 35 1 2 1 JSPWiki 2 5 4 1 1 1 8 20 15 6 10 8 1 3 2 1 1 1 Lucene 1 7 4 1 2 1 9 59 43 14 34 25 1 5 3 1 2 1 PDFBox 4 8 6 1 1 1 13 32 22 11 19 15 1 6 3 1 2 1 Struts 0 5 2 1 1 1 3 24 15 3 11 8 1 5 2 1 3 1 Xerces 0 4 1 1 1 1 6 33 19 7 25 17 1 7 4 1 7 3 Table 7.13: Minimum, maximum, and average values for the decay metrics from the ARC view System RCI BDCC Instability MQ Smell Density Smell Coverage min max avg min max avg min max avg min max avg min max avg min max avg ActiveMQ 4% 6% 5% 1% 2% 1% 43% 51% 48% 5% 8% 6% 0.32 0.42 0.38 76% 86% 82% Cassandra 2% 9% 5% 0% 2% 1% 43% 52% 47% 6% 10% 9% 0.28 0.45 0.37 80% 93% 87% Chukwa 5% 8% 6% 1% 1% 1% 37% 48% 43% 6% 9% 7% 0.34 0.45 0.39 34% 45% 39% Hadoop 1% 17% 6% 0% 4% 1% 31% 52% 45% 9% 14% 12% 0.12 0.45 0.36 80% 95% 87% Ivy 7% 41% 21% 1% 16% 7% 39% 50% 45% 8% 12% 10% 0.13 0.33 0.22 93% 100%99% Jackrabbit 2% 10% 4% 0% 2% 1% 43% 51% 46% 8% 10% 8% 0.29 0.47 0.34 84% 93% 88% Jena 1% 2% 2% 0% 0% 0% 35% 39% 38% 6% 7% 7% 0.28 0.34 0.32 90% 94% 92% JSPWiki 8% 10% 9% 2% 3% 3% 37% 43% 40% 7% 9% 8% 0.27 0.39 0.32 75% 90% 83% Lucene 2% 5% 3% 0% 1% 0% 44% 61% 54% 6% 14% 9% 0.22 0.43 0.33 60% 84% 71% PDFBox 6% 8% 7% 1% 2% 1% 45% 52% 48% 5% 7% 6% 0.30 0.45 0.37 81% 90% 84% Struts 4% 18% 6% 0% 6% 1% 39% 53% 43% 3% 7% 6% 0.27 0.63 0.38 68% 93% 80% Xerces 4% 15% 6% 1% 3% 1% 48% 56% 51% 6% 15% 8% 0.28 0.44 0.36 76% 95% 87% AVG 4% 12% 7% 1% 4% 2% 40% 51% 46% 6% 10% 8% 0.26 0.44 0.35 75% 88% 82% ARCADE detects architectural smells to track an evolving system's architectural de- cay. Table 7.12 depicts the minimum, maximum, and average numbers of architectural smells occurring across the analyzed versions of each subject system. 
The table is divided into two regions: the left region shows the smells detected in the architectural view produced by ACDC, while the right region presents the smells detected in the architectural view produced by ARC. For the ARC view, the table depicts the number of occurrences of the concern overload (CO), link overload (LO), scattered parasitic functionality (SPF), and dependency cycle (DC) smells. For the ACDC view, the table shows the number of occurrences of the link overload (LO) and dependency cycle (DC) smells only, because ACDC does not recover a representation of concerns.

As Table 7.12 indicates, the numbers of occurrences of the different types of architectural smells follow a highly consistent ordering. Almost without exception, in architectures recovered by ARC, CO ≥ LO ≥ SPF ≥ DC; in architectures recovered by ACDC, LO ≥ DC. This same trend occurs in each individual system version. Clearly, this data does not take into account the relative "severity" of different types of architectural smells, or of different instances of the same type. Such an analysis may have a significant qualitative component, and is outside the scope of this dissertation. Regardless, if engineers want to improve the maintainability of their system by reducing architectural-smell occurrences, they can use this ordering to prioritize their activities while taking into account similarities between smells of the same type and their remedies.

The number of occurrences of each type of architectural smell either increases over time or remains relatively constant in each of the subject systems. While several of the systems did, in rare instances, exhibit a decrease in some of the smell types between two consecutive versions, none showed a sustained decrease. Dependency cycle (DC) smells are low in number and, surprisingly, do not vary. Particularly in the ACDC view, for each system, a single dependency cycle is propagated across all versions; in other words, a dependency cycle is introduced early in each subject system and never removed. The number of scattered parasitic functionality (SPF) smells tends to remain relatively constant or to slowly increase as a system evolves.

Table 7.14: Minimum, maximum, and average values for the decay metrics from the ACDC view (min/max/avg)
System     | RCI         | BDCC        | Instability | MQ          | Smell Density  | Smell Coverage
ActiveMQ   | 19%/31%/22% | 8%/14%/10%  | 36%/41%/39% | 22%/30%/24% | 0.14/0.22/0.17 | 83%/96%/92%
Cassandra  | 37%/57%/44% | 24%/45%/31% | 41%/49%/45% | 21%/29%/25% | 0.09/0.24/0.16 | 88%/100%/94%
Chukwa     | 29%/34%/32% | 15%/18%/16% | 38%/41%/39% | 23%/26%/24% | 0.14/0.21/0.17 | 83%/94%/90%
Hadoop     | 14%/80%/38% | 5%/75%/25%  | 35%/49%/42% | 20%/32%/28% | 0.12/0.45/0.36 | 80%/95%/87%
Ivy        | 31%/87%/49% | 18%/87%/35% | 43%/52%/48% | 24%/36%/30% | 0.31/0.62/0.42 | 84%/100%/91%
Jackrabbit | 22%/41%/27% | 10%/25%/13% | 37%/44%/40% | 22%/26%/23% | 0.11/0.21/0.15 | 90%/100%/95%
Jena       | 12%/19%/17% | 5%/8%/7%    | 36%/39%/37% | 20%/22%/22% | 0.14/0.17/0.15 | 96%/98%/97%
JSPWiki    | 27%/38%/33% | 14%/24%/19% | 41%/45%/43% | 23%/26%/25% | 0.12/0.18/0.15 | 97%/100%/99%
Lucene     | 33%/55%/42% | 20%/40%/29% | 38%/46%/41% | 26%/41%/32% | 0.11/0.23/0.16 | 86%/100%/95%
PDFBox     | 30%/38%/35% | 16%/25%/20% | 39%/44%/42% | 19%/24%/22% | 0.14/0.24/0.19 | 94%/100%/97%
Struts     | 37%/67%/50% | 21%/50%/33% | 40%/48%/43% | 21%/31%/25% | 0.05/0.24/0.14 | 79%/100%/92%
Xerces     | 45%/64%/53% | 30%/46%/35% | 42%/48%/45% | 32%/44%/37% | 0.07/0.24/0.12 | 85%/100%/93%
AVG        | 28%/51%/37% | 15%/38%/23% | 39%/45%/42% | 23%/31%/26% | 0.13/0.27/0.19 | 87%/99%/94%
On the other hand, the numbers of link overload (LO) and concern overload (CO) smells generally increase in a system over time. This trend for LO and CO is not unexpected, since the number of components in a system tends to increase over time and each instance of these smell types affects a single component.

To quantify architectural decay across the subject systems as they evolve, we employed the six decay metrics introduced in Section 2.5. Table 7.13 depicts the resulting values for the architectural view produced by ARC, while Table 7.14 shows the resulting metric values for the ACDC view. For each metric, the tables show the minimum, maximum, and average values for each system, as well as the average of each of those three values across all systems. Overall, the results show that the major sources of architectural decay in the subject systems were (1) architectural-smell occurrences (Smell Coverage) and (2) low cohesion within components and high coupling among them (MQ, i.e., modularization quality). The latter is consistent with findings in our recent work [60, 59], in which architects of several long-lived systems confirmed that their architectures tend to have highly-coupled and/or weakly-cohesive components. The remaining metrics show low-to-moderate decay throughout the subject systems as they evolve.

For architectures recovered by ARC (Table 7.13), the average MQ varies from 6%-10%, while for ACDC (Table 7.14) it varies from 23%-31%. (The MQ values for each system have been normalized by the system's number of components.) These low values of modularization quality indicate that a significant majority of components across all systems exhibit a combination of high coupling and low cohesion. Likewise, the smell coverage for ARC varies from 75%-88%, while for ACDC it varies from 87%-99%. These values indicate that an overwhelming majority of components in all systems suffer from architectural smells as they evolve. The maximum MQ values, representing the best modularization quality (10% for ARC and 31% for ACDC), signify that decay due to high coupling and low cohesion occurs immediately, in the very first version of a system. Similarly, the minimum smell coverage (75% for ARC and 87% for ACDC) reveals that architectural smells affect a large majority of components even in the first version of a system. Put another way, significant signs of architectural decay are detectable from the initial publicly available version of each of our subject systems.

The architectural smell density was generally low as the systems evolved. For ARC, the smell density varied from 0.26-0.44 with an average of 0.35, while for ACDC, the smell density varied from 0.13-0.27 with an average of 0.19. The combination of high smell coverage (i.e., most components are affected by at least one architectural smell) and low smell density (i.e., a low number of architectural smells per component) indicates that instances of the architectural smells that affect multiple components, SPF and DC, tend to affect many components at once. In turn, this suggests that engineers may specifically target these two smell types if they wish to reduce the portion of a system that exhibits architectural decay.
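To make these two measures concrete, here is a minimal sketch of smell coverage (the share of components affected by at least one smell) and smell density (smells per component), derived from the parenthetical definitions above; the data representation is our assumption, not ARCADE's.

```python
from typing import Dict, List, Set

def smell_coverage(components: List[str],
                   smells: Dict[str, Set[str]]) -> float:
    """Fraction of components affected by at least one architectural smell.
    `smells` maps a smell instance to the components it affects."""
    affected = set().union(*smells.values()) if smells else set()
    return len(affected & set(components)) / len(components)

def smell_density(components: List[str],
                  smells: Dict[str, Set[str]]) -> float:
    """Number of architectural-smell instances per component."""
    return len(smells) / len(components)

# e.g., one SPF smell spanning many components yields high coverage
# but low density, the combination observed in the study.
comps = ["c1", "c2", "c3", "c4"]
spf = {"spf-1": {"c1", "c2", "c3"}}
print(smell_coverage(comps, spf), smell_density(comps, spf))  # 0.75 0.25
```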
Recall from Section 2.5 that Instability is the likelihood that a system's components will have to change as a result of changes to their dependent components. For architectural views recovered by both ARC and ACDC, the instability of the subject systems was moderate and did not grow drastically throughout each system's lifespan: instability for ARC (Table 7.13) varied from 40%-51%, while for ACDC (Table 7.14) it varied from 39%-45%.

Finally, the values of bi-directional component coupling (BDCC) and ratio of cohesive interactions (RCI) were generally low as our subject systems evolved. The RCI values ranged between 4%-12% with an average of 7% for ARC, and between 28%-51% with an average of 37% for ACDC. For BDCC, the values ranged between 1%-4% with an average of 2% for ARC, and between 15%-38% with an average of 23% for ACDC.

Note that, for each version of each system, ACDC yielded a significantly more "cluttered" architecture than ARC (as measured by BDCC and RCI). On the other hand, the smell density was higher for ARC. Together, the results for BDCC, RCI, and smell density suggest that architectural decay in the studied systems was less likely to be caused by the sheer numbers of structural dependencies, and much more likely due to factors involving the balance of cohesion and coupling and the prevalence of architectural smells. This is not too surprising. Structural dependencies are well understood, and have been so for a long time; they are relatively easy to identify and track. On the other hand, despite their widely acknowledged importance in software engineering, the interplay between cohesion and coupling is much subtler and not nearly as well understood, while architectural smells have only recently gained the attention of software engineers.

7.3.2.3 RQ3: Relating Decay and Issues

To understand the relationship between architectural decay and reported issues, we analyzed the correlation between the numbers of architectural smells and issues. In total, our study resulted in the analysis of 3,015 architectural smells and 24,170 issues. Table 7.15 depicts Pearson and Spearman correlation values and their associated p-values [175] for each of the two architectural views of each subject system. We use Pearson's correlation coefficient to measure the strength and direction of the linear correlation between the number of issues and the number of smells. Spearman's coefficient determines whether the correlation between the number of issues and the number of smells can be described as a monotonic function. Spearman's correlation is typically recommended over Pearson's for non-normal data [25]. We include both coefficients to better determine whether the relationship between smells and issues is linear, monotonic, or neither. Each p-value in Table 7.15 is the probability that we have obtained the corresponding correlation by chance.

The table shows three different averages of the correlations, computed using Fisher's z transformation [161]. Fisher's z transformation converts correlation coefficients, which are usually not normally distributed, to a normally distributed space, allowing those coefficients to be added and thus averaged. Among the three correlation averages computed, one average is unweighted (AVG), while the others are weighted by the number of versions of each system (NAVG) and by the total SLOC across all versions of each system (SAVG). The last row of Table 7.15 (ALL) shows the correlations and p-values for the case where we treat the numbers of issues and architectural smells obtained across all system versions as a single, large sample.

To interpret these results, we computed their statistical significance using the t-test [85, 175], setting the significance level to α = 0.05 [136].
For each correlation coefficient, we test the null hypothesis H0 (there is no correlation between the numbers of issues and architectural smells) against the alternative hypothesis Ha (there is a positive or negative correlation between the numbers of issues and architectural smells). Dark gray cells in Table 7.15 depict correlation coefficients that are statistically significant because their corresponding p-values are less than α. For these correlation coefficients, we reject H0 and accept Ha.

In the case of five of our subject systems (Hadoop, Ivy, Jena, Struts, and Xerces), we find statistically significant positive correlations, suggesting that the number of issues can be used as an indicator of the presence of architectural smells. On the other hand, for three other systems (Chukwa, Jackrabbit, and Lucene), we find statistically significant negative correlations, indicating that the numbers of architectural smells and issues change in opposite directions. Finally, we are unable to draw statistically significant conclusions about the remaining systems (ActiveMQ, Cassandra, JSPWiki, and PDFBox). Understanding these varying results will require further analysis. To that end, we intend to examine whether factors such as issue types (i.e., issues dealing with bugs, feature requests, etc.) as well as system-dependent characteristics (e.g., developer traits, project-specific cultures, etc.) may impact architectural decay.

While the correlations varied in the case of individual systems, the three computed averages at the bottom of Table 7.15 suggest that, in a majority of cases, there is a small positive correlation between the numbers of issues and architectural smells. Three of the Pearson correlation averages are not statistically significant, however, and we cannot draw definitive conclusions from those results. The fact that the Spearman correlation averages (0.144-0.326) are both statistically significant and larger than their corresponding statistically significant Pearson correlation averages (0.113-0.179) suggests that the relationship between issues and architectural smells may be monotonic but not linear.

Finally, for the case where we treat all versions of our subject systems as a single sample (the bottom row of Table 7.15), only the Spearman correlation for ACDC was statistically significant. That result further corroborates the possibility that the numbers of issues and smells have a positive monotonic correlation. However, the result may be biased by the characteristics of those subject systems for which we included larger numbers of versions, further motivating us to isolate and study those characteristics.

Table 7.15: Correlation Between Issues and Smells
           | ACDC Pearson     | ACDC Spearman    | ARC Pearson      | ARC Spearman
System     | Corr    | p      | Corr    | p      | Corr    | p      | Corr    | p
ActiveMQ   | 0.351   | 0.129  | 0.088   | 0.356  | 0.388   | 0.091  | 0.037   | 0.438
Cassandra  | -0.050  | 0.583  | 0.102   | 0.131  | -0.112  | 0.217  | 0.102   | 0.131
Chukwa     | 0.370   | 0.469  | 0.490   | 0.162  | -0.820  | 0.046  | -0.770  | 0.963
Hadoop     | 0.187   | 0.121  | 0.432   | 0.000  | 0.000   | 1.000  | 0.381   | 0.001
Ivy        | 0.457   | 0.043  | 0.595   | 0.003  | 0.415   | 0.069  | 0.700   | 0.000
Jackrabbit | -0.469  | 0.000  | -0.280  | 0.995  | -0.514  | 0.000  | -0.288  | 0.996
Jena       | 0.490   | 0.264  | 0.560   | 0.096  | 0.750   | 0.052  | 0.790   | 0.017
JSPWiki    | 0.219   | 0.228  | 0.151   | 0.205  | -0.182  | 0.319  | -0.073  | 0.654
Lucene     | -0.622  | 0.010  | -0.801  | 1.000  | -0.386  | 0.140  | -0.457  | 0.962
PdfBox     | 0.349   | 0.185  | 0.313   | 0.119  | 0.266   | 0.319  | 0.194   | 0.236
Struts     | 0.319   | 0.027  | 0.367   | 0.005  | 0.161   | 0.274  | 0.216   | 0.070
Xerces     | 0.515   | 0.017  | 0.746   | 0.000  | 0.600   | 0.004  | 0.812   | 0.000
AVG        | 0.179   | 0.000  | 0.240   | 0.000  | 0.044   | 0.382  | 0.177   | 0.000
NAVG       | 0.025   | 0.619  | 0.166   | 0.000  | -0.071  | 0.158  | 0.144   | 0.002
SAVG       | 0.113   | 0.024  | 0.249   | 0.000  | 0.129   | 0.010  | 0.326   | 0.000
ALL        | 0.054   | 0.288  | 0.167   | 0.001  | -0.082  | 0.105  | 0.016   | 0.751
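For reference, Fisher's z averaging of correlation coefficients reduces to transforming, averaging, and back-transforming. A minimal sketch follows, with optional weights standing in for the version- and SLOC-weighted variants (NAVG, SAVG); the helper name is ours.

```python
import math
from typing import List, Optional

def fisher_average(correlations: List[float],
                   weights: Optional[List[float]] = None) -> float:
    """Average correlation coefficients via Fisher's z transformation:
    z = atanh(r) is approximately normally distributed, so the z values
    can be (weight-)averaged and mapped back with tanh."""
    if weights is None:
        weights = [1.0] * len(correlations)
    zs = [math.atanh(r) for r in correlations]
    mean_z = sum(w * z for w, z in zip(weights, zs)) / sum(weights)
    return math.tanh(mean_z)

# Unweighted average of two Pearson coefficients from Table 7.15:
print(round(fisher_average([0.351, -0.050]), 3))
```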
7.3.3 Threats to Validity

We identify several potential threats to the validity of our results and the corresponding mitigating factors.

The key threats to external validity involve our subject systems. Although we used a limited number of systems, we selected them so that they vary along multiple dimensions, including application domain, number of versions, size, and time frame. The different numbers of versions analyzed per system pose another potential threat to validity. This is unavoidable, however, since some systems simply undergo more evolution than others; this was certainly the case with our subject systems. In order to further mitigate this threat, we compared versions against each other based on type (major, minor, and patch) and, whenever appropriate, normalized the computed metrics by the sizes and numbers of versions of each system. Another threat stems from the fact that we only analyzed Apache systems that use Jira as their issue repository and are implemented in Java. The diversity of the chosen systems helps to reduce this threat, as does the wide adoption of Apache software, Java, and Jira.

The construct validity of our study is mainly threatened by the accuracy of the recovered architectural views and of our detection of architectural change and decay. To mitigate the first threat, we selected the two architecture recovery techniques, ACDC and ARC, that demonstrated the greatest accuracy in our extensive comparative analysis of available techniques [59]. The two techniques were developed independently of one another and use very different strategies for recovering an architecture. This, coupled with the fact that their results yield similar trends, helps to strengthen the confidence in our conclusions. To ensure that we properly detect architectural decay, we relied on a variety of metrics and on the detected architectural smells. To ensure accurate detection of architectural smells, we selected architectural-smell types whose definitions can be most directly transformed into detection algorithms [128]. We further ensured the accuracy of our detection algorithms by setting the different thresholds from the smell definitions either (1) automatically, to avoid bias, or (2) manually, after extensive experimentation with multiple settings. Although we selected a subset of architectural-smell types for our study, those types have been shown to be good indicators of architectural problems [63, 62, 106, 128].
Finally, to properly characterize architectural change, we selected a widely used system-level change metric (i.e., MoJoFM) and a previously validated component-level change metric (i.e., c2c) [59].

Our study's primary threat to internal validity and conclusion validity involves the relationship between reported implementation issues and architectural smells. We found a small positive correlation between issues and smells in the general case. This effect was statistically significant for most of the average correlations. However, several individual systems had statistically significant negative correlations. This forces us to limit and qualify our conclusions, and motivates the need to study the effects of other relevant variables, such as the ones we identified in Section 7.3.2.3.

Chapter 8

Related Work

This dissertation covers three areas of related work: architectural smells, architecture recovery, and architectural evolution. Section 8.1 covers concepts related to architectural smells and techniques for identifying smells at the code level. Section 8.2 explains work related to manual architecture recovery, recovery of ground-truth architectures, and automated architecture recovery. Section 8.3 discusses studies of software evolution and decay at two levels: the code level and the architectural level.

8.1 Architectural Smells

As part of related work, we discuss (1) concepts related to architectural smells and (2) smell detection at the code level. We distinguish architectural smells from other problematic architectural phenomena. For code-smell detection, we cover the different types of code-smell detection techniques and their key elements.

8.1.1 Concepts Related to Architectural Smells

We provide an overview of four topics that are directly related to architectural smells: code smells, architectural antipatterns, architectural mismatches, and defects.

The term code smells was introduced by Beck and Fowler [57] for code structures that intuitively appear as bad solutions and indicate possibilities for code improvements. For most code smells, refactoring solutions that result in higher-quality software are known. Although bad smells were originally based on subjective intuitions of bad code practice, recent work has developed ways to detect code smells based on metrics [112] and has investigated the impact of bad smells using historical information [102]. As previously noted, code smells only apply to implementation issues (e.g., a class with too many or too few methods), and do not guide software architects toward higher-level design improvements.

Closely related to code smells are antipatterns [35]. An antipattern describes a recurring situation that has a negative impact on a software project. Antipatterns include wide-ranging concerns related to project management, architecture, and development, and generally indicate organizational and process difficulties (e.g., design-by-committee) rather than design problems. Architectural smells, on the other hand, focus on design problems that are independent of process and organizational concerns, and concretely address the internal structure and behavior of systems. The general definition of antipatterns allows both code and architectural smells to be classified as antipatterns. However, antipatterns that specifically pertain to architectural issues typically capture the causes and characteristics of poor design from a system-wide viewpoint (e.g., stove-piped systems).
Therefore, not all architectural antipatterns are defined in terms of standard architectural building blocks (e.g., vendor lock-in). Defining architectural smells in terms of standard architectural building blocks makes it possible to audit a documented or recovered architecture for possible smells without needing to understand the history of a software system. Furthermore, architectural antipatterns can negatively affect any system quality, while architectural smells must affect lifecycle properties.

Another concept similar to architectural smells is architectural mismatch [66]. Architectural mismatch is the set of conflicting assumptions architectural elements may make about the system in which they are used. In turn, these conflicting assumptions may prevent the integration of an architectural element into a system. Work conducted in [58] and [14] has resulted in a set of conceptual features used to define architectural designs in order to detect architectural mismatch. While instructive to our work, architectural mismatch research has focused heavily on the functional properties of a system without considering the effects on lifecycle properties.

Finally, defects are similar to architectural smells. A defect is a manifestation of an error in a system [147]. An error is a mental mistake made by a designer or developer [147]. In other words, a defect is an error manifested in a requirements specification, a design, or an implemented system that is undesired or unintended [97]. Defects are never desirable in a software system, while smells may be desirable if a designer or developer prefers the reduction in certain lifecycle properties for a gain in other properties, such as performance.

8.1.2 Code-Smell Detection

The existing code-smell detection techniques are almost uniformly automated. Automated code-smell detection techniques tend to utilize metrics that require thresholds to be specified. Several techniques utilize other mechanisms, such as biologically inspired computing techniques, visualization, or change history. Lastly, a subset of these techniques is geared toward particular domains.

One of the earliest code-smell detection techniques is manual; it relies on reading techniques to identify code smells [167]. In a reading technique, an engineer reads a software artifact to detect poor designs. The engineer must mark up artifacts and compare them to identify discrepancies that reveal code smells.

jCosmo, one of the first automated techniques for detection and visualization of code smells, relies on two key ingredients: (1) static analysis and (2) a meta-model for code-level entities (e.g., for methods and classes) [170]. jCosmo's visualization of code smells leverages the Rigi software visualization tool [176] and its graph-based model.

Most of the automated code-smell detection techniques utilize the extraction of metrics and the specification of thresholds associated with those metrics. Marinescu [113] employs metrics-based rules and heuristics to identify code smells. Munro [131] takes the use of these metrics-based rules a step further by including templates that expand upon the text-based descriptions provided by Marinescu. Dexun et al. [48] detect the feature envy smell utilizing distance metrics and the weight of interactions between two code-level entities.

A few automated techniques apply a metrics-based approach to detect smells in specific domains. Macia et al. [105] utilize a metrics-based approach in the context of aspect-oriented systems to detect code smells.
The thresholds associated with metrics are categorized as either low, high, or very high in order to detect the different aspect-oriented code smells. Fard and Mesbah [53] present JSNose, a technique to detect code smells in JavaScript. JSNose detects each smell using a specific set of metrics and a pre-selected value for the threshold associated with each metric. For example, a long method for JSNose is any method with more than 50 source lines of code. Greiler et al. [69] provide code-smell detection techniques for testing code, implemented in a tool called TestHound. After code smells for tests are identified, information about them is presented in three different reports: two reports about code that sets up dependencies, state, and preconditions necessary to exercise a test, and another report about the causes of test failure and ways to improve failed tests. Hermans et al. [78] transform code smells into their counterparts in spreadsheets and detect those transformed smells. Their technique also visualizes detected smells in the form of data-flow diagrams. Guo et al. provide an approach that aims to tailor code-smell detection to specific domains. The approach relies on iterative modification of thresholds and metrics-based rules by a focus group consisting of software engineers. Moha et al. [129] present DECOR, a code-smell detection approach, and its implementation, DETEX. DECOR and DETEX employ (1) a smell-definition language that allows the specification and detection of individual code smells or groups of them, (2) an object-oriented meta-model, (3) a framework for metrics computation, and (4) the Visitor design pattern to generate detection rules.

Several code-smell detection techniques rely on information other than code-level metrics. Macia et al. [104] provide a tool called SCOOP to identify code smells that are relevant to architectural problems. SCOOP utilizes static analysis, architectural information, and data about groupings of code smells. Carneiro et al. [46] identify code smells by visualizing the different concerns associated with code-level entities and providing different views of those entities. This approach is implemented as an Eclipse plug-in called SourceMiner. Palomba et al. [138] propose an approach called HIST (Historical Information for Smell deTection) that employs change-history information to detect four code smells. HIST relies on determining co-changes between code-level entities (e.g., whether two methods were changed together in the history of a software system).

Two techniques for code-smell detection employ artificial immune system (AIS) algorithms, which operate by mimicking the immune system of a living organism. AIS algorithms attempt to distinguish normal entities (e.g., non-smelly code) from abnormal entities (e.g., smelly code). Kessentini et al. [86] utilize an AIS algorithm that relies on similarity scoring and optimization algorithms to determine whether a piece of code contains a code smell. Their algorithm assigns a value between 0 and 1 to each class, where higher values indicate that the class is more likely to be involved in a code smell. Hassaine et al. [77] employ an AIS algorithm that utilizes detectors of code smells that (1) are mutants of each other and (2) directly classify each programming-language class as smelly or non-smelly. Neither of the two AIS-based techniques is capable of distinguishing between types of code smells (e.g., they cannot distinguish between a blob and spaghetti code).
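Most of the techniques surveyed in this subsection share a metrics-plus-thresholds core, which can be summarized in a few lines of code. The sketch below is purely illustrative: the 50-SLOC long-method threshold echoes the JSNose example above, while the function name and the extracted metric values are hypothetical and do not come from any of the surveyed tools.

    # Illustrative sketch of metrics-plus-threshold smell detection.
    # The 50-SLOC long-method threshold mirrors the JSNose example above;
    # the data structures and function name are hypothetical.

    LONG_METHOD_THRESHOLD = 50  # source lines of code

    def detect_long_methods(methods):
        """Return (name, sloc) pairs for methods whose SLOC exceeds the threshold."""
        return [(name, sloc) for name, sloc in methods.items()
                if sloc > LONG_METHOD_THRESHOLD]

    # Hypothetical metric values extracted by a parser or static-analysis front end.
    methods = {"parseConfig": 12, "runPipeline": 87, "renderPage": 55}
    print(detect_long_methods(methods))  # [('runPipeline', 87), ('renderPage', 55)]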
Techniques for code-smell and architectural-smell detection use similar mechanisms. Like the majority of code-smell detection techniques, our architectural-smell detection algorithms are metrics-based and employ thresholds associated with those metrics. The formalization of architectural concepts to enable architectural-smell detection is analogous to the use of code-level meta-models by certain code-smell detection techniques. Besides the abstraction levels at which code smells and architectural smells must be detected, these two types of smells differ in (1) the conditions under which they can be detected and (2) the implications of repairing such smells. Unlike code-smell detection, architectural-smell detection can be conducted before a system is implemented (e.g., in an architectural specification). Furthermore, detecting code smells does not necessarily indicate the existence of architectural smells [106]; this suggests that repairing code smells may still leave architectural smells in a system.

8.2 Architecture Recovery

Three recent surveys together provide a detailed overview of software architecture recovery techniques. Ducasse and Pollet [51] provide a practitioner-oriented survey of over thirty existing approaches for software architecture recovery. Their survey is organized according to the goals, processes, inputs, techniques, and outputs of the recovery approaches. Koschke presents a tutorial and survey [91] of architecture recovery organized around Symphony, a conceptual framework based on the use of architectural views and viewpoints. Maqbool and Babri present a survey on the use of hierarchical clustering techniques for architecture recovery [110]. They compare different similarity measures and hierarchical clustering algorithms, and discuss the resulting research issues and trends.

In the rest of this section, we focus on three different aspects of architecture recovery. Section 8.2.1 discusses manual recovery techniques that obtain both components and connectors. Section 8.2.2 presents studies concerned with obtaining architectures that are suitable for evaluating automated recovery techniques. Section 8.2.3 overviews automated recovery techniques and previous evaluations of such techniques.

8.2.1 Manual Architecture Recovery

Manual component identification includes techniques that (1) require manual specification of patterns or queries or (2) provide tools to aid in the manual construction of components out of code-level entities.

Some techniques provide a tool or tool suite that is designed to enable manual recovery of architectures through view creation and manipulation. Rigi [176] provides an environment for program understanding that allows for the manual construction of architectural views produced from implementation-level artifacts. The Dali workbench [83, 84] uses a suite of tools to extract views, manipulate them, and perform analysis on them. Dali utilizes (1) Rigi for graph editing and manipulation and (2) SQL for view fusion, pattern matching, and storage of architectural models. Another tool, called Arch [153], also provides the ability to manipulate recovered graphical and textual views. Holt [79] presents a set of relational operators based on Tarski relational algebra that allows for the manipulation of views of software entities; these operators enable the abstraction and decomposition of those entities and their interconnections.
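To illustrate the kind of view manipulation such relational operators enable, the sketch below lifts a class-level dependency relation to the component level through a containment mapping. This is a simplified, assumed rendering of the general idea rather than a reproduction of Holt's actual operators, and all names and data are hypothetical.

    # Illustrative sketch: "lifting" class-level dependencies to component-level
    # dependencies through a containment mapping, in the spirit of relational
    # view manipulation. Names and data are hypothetical.

    contains = {"Parser": "FrontEnd", "Lexer": "FrontEnd",
                "CodeGen": "BackEnd", "Optimizer": "BackEnd"}
    class_deps = {("Parser", "Lexer"), ("Parser", "CodeGen"), ("Optimizer", "CodeGen")}

    # A class-level edge (a, b) induces a component-level edge between the
    # components containing a and b; self-edges within one component are dropped.
    component_deps = {(contains[a], contains[b])
                      for a, b in class_deps if contains[a] != contains[b]}
    print(component_deps)  # {('FrontEnd', 'BackEnd')}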
Several approaches to component identification rely on pattern specifications that are matched against code-level constructs. ManSART [75, 74, 179] recovers the architectural styles of systems by using a query language to specify patterns consisting of components and connectors that are matched against code-level constructs. Its authors also provide an approach for manipulating the recovered views in order to merge views, create hierarchies within a view, or abstract away parts of views. Similar to ManSART, Fiutem et al. [55, 54] use pattern specifications along with data-flow analysis to extract additional details useful for identifying components and connectors. Pinzger and Gall [141] utilize an XML-based technique for their pattern specification. The X-Ray technique [124] employs (1) a query language and syntactic pattern matching to identify connectors and (2) reachability and dominance analysis to identify components. Guo et al. [72] present the Software Architecture Reconstruction Method (ARM), which is used to specify design patterns; thus, the focus of ARM is on object-oriented programming-language constructs. Such constructs are typically not considered architectural. Although Guo et al. discuss the possibility of using ARM to identify higher-level architectural patterns, how this would actually be accomplished using ARM is not explained. Lastly, DiscoTect [152] produces architectural components by taking as input a specification that maps code-level executions to architectural elements and then using Colored Petri Nets to dynamically detect them.

Muller et al. provide a methodology [130] that creates a hierarchical (k, 2)-partite graph of entities which groups code-level entities into components. Unlike most other recovery techniques, this technique has two additional features: (1) each software entity may belong to more than one component; and (2) the technique provides a method for explicitly determining interfaces for recovered components.

Focus [122] is an approach for recovering an object-oriented software system's architecture so that the architect can concentrate on evolving a particular part of the system to satisfy new requirements. The architect using Focus recovers a view containing both components and connectors, but does not recover connectors explicitly. Focus provides a set of rules used to group classes into components. These rules utilize the structural dependencies between classes.

Most existing techniques for identifying connectors leverage specifications of patterns that are matched against pieces of text, such as identifiers [55, 72, 75, 124, 141, 152]. For these techniques, an architect must specify the correct patterns for each connector. This pattern specification is a highly manual task that requires examination and study of the source code to identify the correct pattern for every possible implementation variant of every connector of interest to an architect.

ARTISAn [81] is a recovery technique that consists of two key steps that categorize each programming-language class as either a processing, data, or connecting element. Unlike other manual recovery techniques, ARTISAn leverages a rule-based approach for identifying connectors rather than the specification of patterns. In the first step, an engineer labels an initial set of elements as either processing, data, or connecting. After this initial step, a set of propagation rules is applied, based on the control and data dependencies of classes, to determine whether each class is a processing, data, or connecting element. Both the initial labeling of classes and the determination of propagation rules must be performed manually.
Propagation rules need to be discovered by engineers for a particular domain. In addition to being a manual technique, ARTISAn lacks the ability to distinguish between connector types.

8.2.2 Ground-Truth Architectures

The research literature in the area of architecture recovery contains several examples of manually extracted architectural information, authoritative architectures, and ground-truth architectures, all of which were used to evaluate recovery techniques. Due to the scope of this dissertation, we do not discuss the existing (semi-)automated architecture recovery techniques, which are surveyed in [51, 91, 110].

Several studies have attempted to re-document and recover the high-level architectures of widely used systems, including the Linux kernel [32], the Apache Web Server [71], and a portion of Hadoop [21]. In another set of studies [15, 22, 90, 169, 177], researchers evaluate their proposed recovery techniques using authoritative architecture recoveries of several additional systems. Since none of these were ground-truth architectures, it is likely that they suffered from inaccuracies. Furthermore, some of the above studies captured only very high-level architectural views, or erroneously conflated implementation packages with architectural components.

Ground-truth architectures have been used to evaluate four recently proposed recovery techniques [26, 38, 39, 92] that extend Murphy et al.'s reflexion models [133]. Christl and Koschke [38] use a ground-truth recovery of SHriMP [178], a tool for visualizing graphs comprising about 300 classes. An update to this technique [39] was evaluated on two additional ground-truth architectures of relatively small systems (up to 32 KSLOC). Bittencourt et al. [26] used ground-truth architectures of systems with comparable sizes: Design Wizard (7 KSLOC), a tool for extracting and querying designs, and Design Suite (24 KSLOC), a tool for abstracting and visualizing designs.

To our knowledge, only two ground-truth architectures of larger systems have been obtained and used to evaluate recovery techniques. OpenOffice (6 MSLOC), an open-source productivity software suite, was used by Koschke [92]. TOBEY (250 KSLOC) [15], the back-end for IBM's compiler products, was used to evaluate the ACDC [169] and LIMBO [15] recovery techniques. In the case of OpenOffice, the ground truth was very coarse-grained, making the recovered architecture less informative. For TOBEY, a ground truth was obtained by relying exclusively on a series of interviews with developers, calling its accuracy into question. Furthermore, TOBEY is a proprietary system, which limits its use in evaluating and improving recovery techniques.

Common to all authoritative and ground-truth architecture recoveries discussed in this section is that neither the process through which they were obtained nor the effort needed to produce them was documented.

8.2.3 Automated Architecture Recovery

Automated software-architecture recovery techniques have been around for over three decades [24, 80, 153, 154]. Most of them group implementation-level entities (e.g., files, classes, or functions) into clusters, where each cluster represents a component [91, 110, 174]. Some existing techniques search for specific patterns, usually relating to structural dependencies between entities, to identify components [169, 150, 151], while others have used concept analysis [160, 51] for the same purpose.
Another class of techniques tries to partition system entities and their dependencies into components in a manner that maximizes some objective function [127, 144, 89]. Another set of techniques employs hierarchical clustering [15, 110, 134], which allows a recovered architectural view to be shown at multiple levels of abstraction. These techniques iteratively obtain components by combining similar entities at each step.

While a majority of the existing recovery techniques rely on structural input to identify components [51], over time researchers have also attempted to include non-structural information, such as directory paths and file authorship [16, 17, 126, 64]. Recent work has focused on utilizing textual input [41, 42, 126, 64] obtained from source code and comments. The resulting techniques combine textual input with other commonly used architecture-recovery mechanisms, such as objective-function maximization or hierarchical clustering.

Most recovery techniques are evaluated individually for their accuracy or other quality attributes. In certain cases, techniques have been evaluated against other techniques. Anquetil and Lethbridge [18] compare generic clustering algorithms that are not specifically designed for architecture recovery. They utilize a combination of structural and non-structural input to obtain architectures. Other comparative analyses focus on techniques specifically designed for architecture recovery. WCA [110, 134] has been evaluated against LIMBO [15, 110, 134]. WCA has two variations that differ based on the measures used for determining the similarity between entities: Unbiased Ellenberg (UE) and Unbiased Ellenberg-NM (UENM). LIMBO has been shown to be generally superior to WCA-UE, but has been outperformed by WCA-UENM. LIMBO, ACDC [169], and Bunch [15] have been compared to generic clustering techniques [15]. In this study, LIMBO and ACDC were shown to be the most accurate. Another study compared ACDC, Bunch, and generic clustering techniques [177]. The complete linkage (CL) clustering algorithm was shown to be generally the most accurate, while ACDC was shown to have limited accuracy. WCA was shown to be superior to CL in yet another study [109]. Finally, Software Architecture Finder (SArF) [89], another technique that maximizes an objective function, has recently been compared against ACDC and Bunch [89]. A software system's package structure was used as the ground truth in this comparison. SArF showed better accuracy than ACDC and Bunch in the study.
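Several of the algorithms compared above (e.g., WCA, LIMBO, and complete linkage) follow the agglomerative scheme sketched below, in which the two most similar clusters are merged at each step. The sketch is illustrative only: the entities, their feature vectors, and the choice of Euclidean distance are hypothetical stand-ins for the structural or semantic features an actual recovery technique would extract.

    # Illustrative sketch: agglomerative (complete-linkage) clustering of
    # implementation entities into candidate components. The feature vectors
    # are hypothetical (e.g., rows could encode which other entities a class
    # depends on); real techniques use richer structural or semantic features.
    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster

    entities = ["Parser", "Lexer", "TypeChecker", "CodeGen", "Optimizer"]
    features = np.array([[1, 1, 0, 0],
                         [1, 1, 0, 0],
                         [1, 0, 1, 0],
                         [0, 0, 1, 1],
                         [0, 0, 1, 1]])

    # Each merge step combines the two most similar clusters (complete linkage);
    # cutting the resulting dendrogram yields components at a chosen granularity.
    merges = linkage(features, method="complete", metric="euclidean")
    labels = fcluster(merges, t=2, criterion="maxclust")  # cut into two clusters
    for entity, label in zip(entities, labels):
        print(f"{entity} -> component {label}")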
It can be seen from the above summary that different studies conflict in their conclusions. We postulate that part of the reason for this is that each of these comparative analyses has been limited in one or more important ways. Each study has relied on a very small number of subject systems or recovery techniques, or both. Comparative analyses that have included a wider variety of techniques have done so by relying on generic clustering techniques that are not specifically designed for architecture recovery. Comparisons involving techniques actually designed for architecture recovery typically include only structural input. Furthermore, previous evaluations have either relied on a few "home-grown" subject systems, which tend to be small, or have had to rely on larger, third-party systems whose actual ("ground-truth") architectures are not fully known. In some cases, researchers have used directory or package structure as a widely available architectural proxy. However, our recent study [60] has shown that in most real systems, architects do not consider the directory or package structure to be an accurate reflection of a system's architecture. Finally, the above studies give little guidance as to the conditions under which a recovery technique excels or falters. Our work described in this dissertation tries to address each of these shortcomings.

8.3 Architectural Evolution

Software evolution has been studied extensively at the code level, dating back several decades (e.g., Lehman's laws [95]). We highlight several examples that have influenced our work. Godfrey and Tu [68] discovered that Linux's already large size did not prevent it from continuing to grow quickly. Kim et al. [88] studied the evolution of code clones and determined that refactoring may not always improve software with respect to clones. Chatzigeorgiou and Manakos [37] studied three types of code smells across 24 versions of two systems. They found that most code smells exist throughout all of the versions they examined. Li and Shatnawi [98] studied three versions of Eclipse and found that classes with code smells are prone to errors. Eick et al. [52] found a reduction in modularity over the 15-year evolution of software for a telephone switching system. Murgia et al. [132] studied the evolution of Eclipse and NetBeans and found that 8%-20% of code-level entities contain about 80% of all bugs. While interesting, informative, and influential in our work, these studies do not examine the evolution of a software system's architecture.

A few studies have attempted to investigate architectural evolution and decay. These studies are smaller in scope than our work in this dissertation. Additionally, unlike ARCADE's use of structural and semantic architectural views, only one of these studies considers more than one architectural perspective; moreover, in that study, as well as several others, the chosen perspectives are arguably not architectural at all. None of these studies rely on architectural smells as indicators of decay. Each study also differs from our work in other important ways.

Two studies have examined architectural decay by using the reflexion method, a technique for comparing intended and recovered architectures. Brunet et al. [36] studied the evolution of architectural violations in 76 versions selected from four subject systems. Rosik et al. [148] conducted a case study using the reflexion method to assess whether architectural drift, i.e., unintended design decisions, occurred in their subject system and whether instances of drift remain unresolved. These studies do not examine the extent of architectural change or use any architectural-decay metrics.

Another group of studies has treated implementation packages as architectural components. Our recent work has shown that software architects consider packages to be an inaccurate architectural proxy [60]. We thus consider the results of these studies to be more indicative of implementation change than of architectural change. This is consistent with the widely referenced 4+1 architectural-view model [93], in which packages belong to a system's implementation view. Bouwers et al. [28, 29, 30] assess the usefulness of a metric for balancing the number of components in a system and another metric for assessing coupling between components. We considered both of these metrics for inclusion in ARCADE.
We decided against including the balancing metric because our previous studies [60, 59] determined that ACDC and ARC, the two architecture recovery techniques we used, obtain appropriate numbers of components in practice. Although ARCADE already includes several coupling metrics, we are exploring the possibility of incorporating Bouwers et al.'s coupling metric and assessing its effectiveness for recovered architectures. Wermelinger et al. [173] apply architectural-decay metrics to 53 versions of Eclipse. In their study, Eclipse exhibited generally decreasing cohesion, increasing coupling, and low instability, which is similar to our findings. Sangwan et al. [149] apply architectural-decay metrics to 21 versions of Hibernate, and conclude that Hibernate tends to have low instability. Finally, Zimmermann et al. [180] propose that true coupling is determined by studying revision histories and code-level entities rather than the decomposition of modules or files. Our study similarly examines coupling trends over time, but does so from recovered architectural views.

Three additional studies investigate different facets of architectural decay. Hassaine et al. [76] present a recovery technique, which they use to study decay in six versions of three systems. Unlike our study, they do not investigate decay at the component level or the extent of architectural change. Furthermore, the accuracy of their recovery technique is unclear. van Gurp et al. [171] conduct two case studies of software systems to better understand the nature of architectural decay and how to prevent it. They provide only qualitative evidence for their findings and do not specify the number of versions they studied. D'Ambros et al. [43] present an approach for studying software evolution that focuses on the storage and visualization of evolution information at the code and architectural levels. Their study utilizes a different set of architectural metrics than ours, specifically targeted at their visualizations.

Chapter 9

Conclusion and Future Work

Throughout a software system's lifecycle, maintenance activities tend to dominate other activities in terms of money, effort, and time. A major aspect of maintaining a software system is updating, understanding, and evolving that system's software architecture. The phenomena of architectural drift and erosion [140, 164], collectively referred to as architectural decay, contribute to and exacerbate software maintenance. Architectural decay is particularly egregious since it directly affects the principal design decisions of a software system, which can lead to major losses for an organization. To address architectural decay, engineers must understand the current architecture of their systems, identify the instances that cause architectural decay, and be knowledgeable about the manner in which architectures change and the decay such change often causes.

This dissertation contributes a unified framework that comprehensively addresses the problems that arise from architectural decay. This framework includes a catalog comprising an expansive list of architectural smells (i.e., architectural-decay instances) and a means of identifying such smells in software architectures; a framework for constructing ground-truth architectures to aid the evaluation of automated recovery techniques; ARC, a novel recovery approach that is accurate and rich in terms of the architectural abstractions it extracts; and ARCADE, a framework for the study of architectural change and decay.
This dissertation provides several evaluations of its different contributions: it describes lessons learned from applying the ground-truth recovery framework, compares architecture-recovery techniques along multiple accuracy measures, and contributes the most extensive empirical study of architectural change and decay to date.

To better understand the manner in which architectures decay, this dissertation contributes the concept of architectural smells; a catalog of architectural smells, which includes formalizations of architectural smells and their underlying architectural concepts; and mechanisms for detecting architectural smells. Code smells have helped developers identify when and where source code needs to be refactored [57]. Analogously, architectural smells tell architects when architectures decay and where to refactor their architectures to stem that decay. Architectural smells manifest themselves as violations of traditional software-engineering principles, such as isolation of change and separation of concerns, but they go beyond these general principles by providing specific, repeatable forms that can be automatically detected. The notion of architectural smells can be applied to large, complex systems by revealing opportunities for smaller, local changes within the architecture that cumulatively add up to improved system quality. Therefore, architects can use the concept and catalog of smells to analyze the most relevant parts of an architecture without needing to deal with the intractability of analyzing the system as a whole.

To recover architectures in which we can identify decay, we aimed (1) to help improve future architecture recovery techniques and (2) to encourage further construction of ground-truth architectures. The produced ground-truth architectures suggest three properties a recovered architecture is likely to have. First, our ground-truth architectures consisted of components with fairly limited numbers of entities, grouped predominantly based on their semantic similarities. Second, the components in a ground-truth architecture rarely have a direct correspondence to a system's package structure. Third, the perceived accuracy of a recovered architecture largely depends on the appropriate identification of utility components.

Clustering software entities is an almost uniformly employed method for automated architecture recovery. In order to improve its accuracy, it is important to provide a careful and reliable comparison of the current state-of-the-art software architecture recovery techniques. The work described in this dissertation is a step in the direction of creating a set of baseline measurements that can be used as a foundation for future research in the area. With these ground-truth architectures, we have presented a comparative analysis of eight different variants of six state-of-the-art, automated architecture recovery techniques. We assessed their accuracy on eight architectures derived from real software systems. In order to conduct the analysis, we complemented the previously available MoJoFM metric with two new metrics for assessing architecture recovery techniques: c2c and criterion analysis. We performed the analysis both at the system-wide level and at the level of individual system components.
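To make the component-level comparison concrete, the sketch below shows one plausible form of a c2c-style overlap measure: the percentage of implementation entities shared by a recovered component and a ground-truth component. This is a simplified illustration rather than the exact formulation published in [59], and the component contents are hypothetical.

    # Illustrative sketch of a c2c-style overlap measure between a recovered
    # component and a ground-truth component. This is a simplified rendering
    # of the idea, not the exact published formulation; the data are hypothetical.

    def c2c(recovered, ground_truth):
        """Percentage of overlap between two components' sets of entities."""
        if not recovered or not ground_truth:
            return 0.0
        shared = len(recovered & ground_truth)
        return 100.0 * shared / max(len(recovered), len(ground_truth))

    recovered_comp = {"Parser.java", "Lexer.java", "Token.java"}
    ground_truth_comp = {"Parser.java", "Lexer.java", "Ast.java", "Token.java"}
    print(f"c2c = {c2c(recovered_comp, ground_truth_comp):.1f}%")  # c2c = 75.0%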
After conducting the previously described comparative analysis, we utilized the results of that study to aid in the construction of ARCADE, a framework for studying architectural change and decay. ARCADE's key ingredients include architecture recovery, architectural change and decay metrics, architectural-smell detection, and correlation between architectural smells and reported issues. This dissertation presented the largest empirical study to date of architectural change and decay in long-lived software systems. The study's scope is reflected in the number of subject systems (12), the total number of examined system versions (463), the total amount of analyzed code (112.6 MSLOC), the number of applied architecture-recovery techniques (2), resulting in distinct architectural views produced for each system, the number of analyzed architectural models (926, i.e., two views per system version), the number of architectural smells detected (3,015), the number of implementation issues reported (24,170), and the number of architectural change metrics (3) and decay metrics (6) applied to each of the 926 architectural models. This scope was enabled by ARCADE, a novel automated workbench for architectural recovery and analysis.

9.1 Future Work

The unified framework of this dissertation provides a foundation for studying a wide variety of architectural phenomena in existing software systems. Besides enabling additional studies of recovered architectures, future directions can further evaluate different aspects of our unified framework. In the rest of this chapter, we focus on possible avenues for future work and our ongoing work on examining and addressing architectural decay.

9.1.1 Architectural Smells

Future work on architectural smells includes a categorization of architectural smells, evaluation of architectural-smell detection, correction processes that aim to restructure architectures to remove smells, and tool support to aid in those processes. A categorization of architectural smells would include an analysis of the impact, origins, and ways to correct the smells. Architectural smells may be captured in an architectural description language, which would allow conceptual architectures to be analyzed for smells before they are implemented. Correction of smells would include the inception of a set of architectural refactoring operations and the provision of tools to help recommend particular operations for detected smells. In attempting to repair the architectures of widely used systems, the authors of [166] identified a set of operations that can be used as a starting point for determining a complete set of architectural refactoring operations. By trying to correct some of the architectural smells we found in both our own and others' experiences, such as [32], [67], [71], and [166], we hope to identify other architectural refactoring operations and determine which operations are relevant to particular smells.

9.1.2 Automated Architecture Recovery

The unified framework that this dissertation contributes enables diverse future directions for the further study of architectural decay. These future directions include the construction of more ground-truth architectures that are larger in size and vary along other dimensions; the opening of other avenues for comparing and improving recovery techniques; further evaluation of the kinds of rich views recovered by ARC; and further empirical study of architectural evolution and decay, which includes the prediction of decay in software systems to enable the preemption of decay before it occurs.
9.1.2.1 Obtaining Ground-Truth Architectures

Our recovery experience appears to invalidate the prior intuition that constructing a ground-truth architecture for large systems is infeasible. In fact, we provide considerable evidence that engineers are not only willing to help recover the architectures of their systems, but also that their involvement can be made manageable by our ground-truth recovery framework. These results have encouraged us to continue this effort, with our ongoing work focusing on Google Chromium, a very large system comprising 12 MSLOC.

More broadly, we believe that our findings can help improve the understanding of software architectures in general and of the ways architectures degrade. We observed that the differences between the conceptual and ground-truth architectures stem both from architectural degradation and from the different abstraction levels and purposes of those architectures. This raises further questions regarding the appropriate ways of constructing these different views, as well as understanding and reconciling their differences in support of software design, maintenance, and evolution tasks.

Finally, it is important to note that, although we recovered a single ground-truth architecture for each of the systems studied in this dissertation, a single system may have multiple ground-truth architectures. This will depend on the perspective from which the recoverers are approaching the architecture. Our recovery framework can, in fact, accommodate multiple ground truths for a single system: once the mapping principles are verified by certifiers, it is likely that more than one recovered architecture consistent with those mapping principles could be considered a ground truth. This is an observation we are exploring in our ongoing work.

9.1.2.2 Comparing and Improving Recovery Techniques

As part of our ongoing work, we are studying additional open-source software systems. This includes extracting the ground-truth architecture of Google's Chromium, which, at 12 MSLOC, has presented a challenging task. We are also trying to isolate the key factors needed to recover an accurate architecture. The recovery techniques may have performed poorly in our study because different criteria may be effective at identifying different kinds of system components. To test this hypothesis, we have begun to identify characteristics that may be recognized in different types of components, including (1) components that implement a system's core business logic, (2) components that provide utility functionality that is intended to be reused across multiple systems, and (3) components that mediate the interactions among the core and utility components (i.e., the system's connectors). Once we have identified these three types of components in our ground-truth architectures, we will analyze whether and to what extent the accuracy of the recovery techniques we used in this study varies across component types. Additionally, identifying the characteristics of different component types has the potential to suggest new, more targeted clustering criteria that will, in turn, result in the composition of more accurate recovery techniques.

Although concern extraction and brick recovery have been utilized and evaluated as part of our comparative analysis of recovery techniques, future work includes an empirical evaluation of concern meta-classification and connector recovery.
Such an evaluation can help identify means of improving recovery techniques and assess whether the links between components and connectors are spurious or missing.

9.1.3 Studying Architectural Evolution

One overarching conclusion of our study using ARCADE is that a system's architectural semantics are more important than its structure in stemming decay. This was indicated by two independent findings: (1) the semantic view of an architecture undergoes significantly more change than its structural view, and (2) larger amounts of structural "clutter" in a system do not necessarily correlate with greater architectural decay. This suggests that software engineers are more adept at handling structural complexity than semantic complexity. At the same time, a significant segment of software-architecture research, and in particular the research on architecture recovery, has focused on system structure. Along with the results of our recent evaluation of recovery techniques [59], this suggests that there is both a need and an opportunity to investigate more effective approaches to architecture recovery.

ARCADE provides a powerful foundation for studying a wide variety of architectural phenomena as software systems evolve. Besides including additional subject systems, we are working to extend ARCADE to support other architectural constructs (e.g., component types, software connectors [164], their interfaces, and their concerns). To improve our understanding of the relationship between architectural decay and reported issues, we intend to incorporate additional types of architectural smells from our catalog. Our long-term goal is to leverage ARCADE to predict architectural decay and major architectural change based on available implementation-level information.

References

[1] MinimumSpanningForest (JUNG2 2.0.1 API).
[2] Mozilla: home of the Mozilla project. mozilla.org.
[3] mkdep, 1993.
[4] Apache Hadoop, 2012.
[5] ArchStudio - Foundations - Myx and myx.fw, 2012.
[6] IBM Rational Software Architect, 2012.
[7] PoweredBy - Hadoop Wiki, 2012.
[8] Programmer's Friend - Class Dependency Analyzer, 2012.
[9] recoveries:hadoop 0.19.0 [USC SoftArch Wiki], 2012.
[10] recoveries:start [USC SoftArch Wiki]. http://softarch.usc.edu/wiki/doku.php?id=recoveries:start, 2012.
[11] The Chromium Projects. http://www.chromium.org/, 2012.
[12] arcade:start [USC SoftArch Wiki]. http://softarch.usc.edu/wiki/doku.php?id=arcade:start, 2014.
[13] rcarz/jira-client - GitHub. https://github.com/rcarz/jira-client, 2014.
[14] A. A. E. S. Abd-Allah. Composing Heterogeneous Software Architectures. PhD thesis, University of Southern California, 1996.
[15] Periklis Andritsos and Vassilios Tzerpos. Information-theoretic software clustering. IEEE TSE, 2005.
[16] N. Anquetil and T. Lethbridge. File clustering using naming conventions for legacy systems. In Conference of the Centre for Advanced Studies on Collaborative Research, 1997.
[17] N. Anquetil and T. C. Lethbridge. Recovering software architecture from the names of source files. Journal of Software Maintenance: Research and Practice, 1999.
[18] Nicolas Anquetil and T. C. Lethbridge. Comparative study of clustering algorithms and abstract representations for software remodularisation. IEE Proceedings - Software, 2003.
[19] H. U. Asuncion, A. U. Asuncion, and R. N. Taylor. Software traceability with topic modeling. In Proceedings of the 32nd ICSE, 2010.
[20] S. Bajracharya and C. Lopes. Mining search topics from a code search engine usage log.
In 6th International Working Conference on Mining Software Repositories, 2009.
[21] Len Bass, Rick Kazman, and Ipek Ozkaya. Developing architectural documentation for the Hadoop Distributed File System. In OSS, 2011.
[22] F. Beck and S. Diehl. Evaluating the impact of software evolution on software clustering. In WCRE, 2010.
[23] Fabian Beck and Stephan Diehl. On the congruence of modularity and code coupling. In ESEC/FSE, 2011.
[24] L. A. Belady and C. J. Evangelisti. System partitioning and its measure. Journal of Systems and Software (JSS), 1981.
[25] Anthony J. Bishara and James B. Hittner. Testing the significance of a correlation with nonnormal data: Comparison of Pearson, Spearman, transformation, and resampling approaches. Psychological Methods, 2012.
[26] R. A. Bittencourt, J. de Souza Santos, D. D. S. Guerrero, and G. C. Murphy. Improving automated mapping in reflexion models using information retrieval techniques. In WCRE, 2010.
[27] D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent Dirichlet allocation. The Journal of Machine Learning Research, 2003.
[28] Eric Bouwers, José Pedro Correia, Arie van Deursen, and Joost Visser. Quantifying the analyzability of software architectures. In Software Architecture (WICSA), 2011 9th Working IEEE/IFIP Conference on, pages 83-92. IEEE, 2011.
[29] Eric Bouwers, Arie van Deursen, and Joost Visser. Evaluating usefulness of software metrics: An industrial experience report. In ICSE, pages 921-930. IEEE Press, 2013.
[30] Eric Bouwers, Arie van Deursen, and Joost Visser. Dependency profiles for software architecture evaluations. In Software Maintenance (ICSM), 2011 27th IEEE International Conference on, pages 540-543. IEEE, 2011.
[31] Daniel P. Bovet and Marco Cesati. Understanding the Linux Kernel. O'Reilly Media, 2008.
[32] I. T. Bowman, R. C. Holt, and N. V. Brewster. Linux as a case study: Its extracted software architecture. In ICSE, 1999.
[33] L. C. Briand, Sandro Morasca, and V. R. Basili. Measuring and assessing maintainability at the end of high-level design. In Proceedings of the Conference on Software Maintenance (ICSM 1993), 1993.
[34] A. Brown and G. Wilson. The Architecture of Open Source Applications, volume 1. Lulu.com, 2011.
[35] W. J. Brown, R. C. Malveau, H. W. McCormick III, and T. J. Mowbray. AntiPatterns: Refactoring Software, Architectures, and Projects in Crisis. John Wiley & Sons, New York, 1998.
[36] João Brunet, Roberto Almeida Bittencourt, Dalton Serey, and Jorge Figueiredo. On the evolutionary nature of architectural violations. In Reverse Engineering (WCRE), 2012 19th Working Conference on. IEEE, 2012.
[37] Alexander Chatzigeorgiou and Anastasios Manakos. Investigating the evolution of bad smells in object-oriented code. In Quality of Information and Communications Technology (QUATIC), 2010 Seventh International Conference on the. IEEE, 2010.
[38] A. Christl, R. Koschke, and M.-A. Storey. Equipping the reflexion method with automated clustering. In WCRE, 2005.
[39] Andreas Christl, Rainer Koschke, and Margaret A. Storey. Automated clustering to support the reflexion method. Information and Software Technology, 49(3), 2007.
[40] Apache Commons. Commons Math: The Apache Commons mathematics library. http://commons.apache.org/math/, 2009 (accessed November 2012).
[41] Anna Corazza, Sergio Di Martino, and Giuseppe Scanniello. A probabilistic based approach towards software system clustering. In European Conference on Software Maintenance and Reengineering (CSMR), 2010.
[42] Anna Corazza, Sergio Di Martino, Valerio Maggio, and Giuseppe Scanniello.
Investigating the use of lexical information for software system clustering. In European Conference on Software Maintenance and Reengineering (CSMR), 2011.
[43] Marco D'Ambros, Harald Gall, Michele Lanza, and Martin Pinzger. Analysing software repositories to understand software evolution. Springer, 2008.
[44] E. Dashofy, H. Asuncion, S. Hendrickson, G. Suryanarayana, J. Georgas, and R. Taylor. ArchStudio 4: An architecture-based meta-modeling environment. In Companion to ICSE, 2007.
[45] E. Dashofy and A. van der Hoek. Representing product family architectures in an extensible architecture description language. Software Product-Family Engineering, pages 124-144, 2002.
[46] G. de F. Carneiro, Marcos Silva, Leandra Mara, Eduardo Figueiredo, Claudio Sant'Anna, Alessandro Garcia, and Manoel Mendonça. Identifying code smells with multiple concern views. In Software Engineering (SBES), 2010 Brazilian Symposium on, pages 128-137. IEEE, 2010.
[47] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified data processing on large clusters. Commun. ACM, 51(1), 2008.
[48] Jiang Dexun, Ma Peijun, Su Xiaohong, and Wang Tiantian. Detecting bad smells with weight based distance metrics theory. In Instrumentation, Measurement, Computer, Communication and Control (IMCCC), 2012 Second International Conference on, pages 299-304. IEEE, 2012.
[49] Edsger Wybe Dijkstra. A Discipline of Programming, volume 1. Prentice-Hall, Englewood Cliffs, 1976.
[50] Lei Ding and Nenad Medvidovic. Focus: A light-weight, incremental approach to software architecture recovery and evolution. In Working IEEE/IFIP Conference on Software Architecture (WICSA), 2001.
[51] S. Ducasse and D. Pollet. Software architecture reconstruction: A process-oriented taxonomy. IEEE TSE, 2009.
[52] Stephen G. Eick, Todd L. Graves, Alan F. Karr, J. Steve Marron, and Audris Mockus. Does code decay? Assessing the evidence from change management data. IEEE TSE, 2001.
[53] Amin Milani Fard and Ali Mesbah. JSNose: Detecting JavaScript code smells. In Source Code Analysis and Manipulation (SCAM), 2013 IEEE 13th International Working Conference on, pages 116-125. IEEE, 2013.
[54] R. Fiutem, G. Antoniol, P. Tonella, and E. Merlo. ART: An architectural reverse engineering environment. Journal of Software Maintenance: Research and Practice, 11(5):339-364, 1999.
[55] R. Fiutem, P. Tonella, G. Antoniol, and E. Merlo. A cliche-based environment to support architectural reverse engineering. In 3rd WCRE, pages 277-286, 2002.
[56] Ian Foster et al. The anatomy of the Grid: Enabling scalable virtual organizations. International Journal of High Performance Computing Applications, 15(3), 2001.
[57] M. Fowler. Refactoring: Improving the Design of Existing Code. Addison-Wesley Professional, 1999.
[58] C. Gacek. Detecting Architectural Mismatches During Systems Composition. PhD thesis, Univ. of Southern California, 1998.
[59] Joshua Garcia, Igor Ivkovic, and Nenad Medvidovic. A comparative analysis of software architecture recovery techniques. In Automated Software Engineering (ASE), 2013 IEEE/ACM 28th International Conference on, pages 486-496, 2013.
[60] Joshua Garcia, Ivo Krka, Chris Mattmann, and Nenad Medvidovic. Obtaining ground-truth software architectures. ICSE, 2013.
[61] Joshua Garcia, Ivo Krka, Nenad Medvidovic, and Chris Douglas. A framework for obtaining the ground-truth in architectural recovery.
In Joint Working IEEE/IFIP Conference on Software Architecture & 6th European Conference on Software Architecture (WICSA/ECSA), pages 292-296. IEEE, 2012.
[62] Joshua Garcia, Daniel Popescu, George Edwards, and Nenad Medvidovic. Toward a catalogue of architectural bad smells. In QoSA '09: Proc. 5th Int'l Conf. on Quality of Software Architectures, 2009.
[63] Joshua Garcia, Daniel Popescu, George Edwards, and Nenad Medvidovic. Identifying architectural bad smells. In 13th European Conference on Software Maintenance and Reengineering, 2009.
[64] Joshua Garcia, Daniel Popescu, Chris Mattmann, Nenad Medvidovic, and Yuanfang Cai. Enhancing architectural recovery using concerns. In ASE, 2011.
[65] Joshua Garcia, Daniel Popescu, Gholamreza Safi, William G. J. Halfond, and Nenad Medvidovic. Identifying message flow in distributed event-based systems. In Proceedings of the 2013 9th Joint Meeting on Foundations of Software Engineering, pages 367-377. ACM, 2013.
[66] David Garlan, Robert Allen, and John Ockerbloom. Architectural mismatch or why it's hard to build systems out of existing parts. In Proc. of the 17th International Conference on Software Engineering, 1995.
[67] Michael W. Godfrey and Eric H. S. Lee. Secrets from the monster: Extracting Mozilla's software architecture. In Proc. of the Second International Symposium on Constructing Software Engineering Tools, 2000.
[68] Michael W. Godfrey and Qiang Tu. Evolution in open source software: A case study. In Software Maintenance, 2000. Proceedings. International Conference on. IEEE, 2000.
[69] Michaela Greiler, Arie van Deursen, and Margaret-Anne Storey. Automated detection of test fixture strategies and smells. In Software Testing, Verification and Validation (ICST), 2013 IEEE Sixth International Conference on, pages 322-331. IEEE, 2013.
[70] T. L. Griffiths and M. Steyvers. Finding scientific topics. Proceedings of the National Academy of Sciences of the United States of America, 101(Suppl 1):5228, 2004.
[71] B. Gröne, A. Knöpfel, and R. Kugel. Architecture recovery of Apache 1.3: A case study. In Proc. of the International Conference on Software Engineering Research and Practice 2002, 2002.
[72] G. Y. Guo, J. M. Atlee, and R. Kazman. A software architecture reconstruction method. In WICSA, page 15. Springer Netherlands, 1999.
[73] Kim Haase. Java Message Service tutorial, 2002.
[74] David R. Harris, Howard B. Reubenstein, and Alexander S. Yeh. Reverse engineering to the architectural level. In Proceedings of the 17th ICSE, ICSE '95, pages 186-195, New York, NY, USA, 1995. ACM.
[75] D. R. Harris, H. B. Reubenstein, and A. S. Yeh. Recognizers for extracting architectural features from source code. Reverse Engineering, Working Conference on, 0:252, 1995.
[76] Salima Hassaine, Y. Guéhéneuc, Sylvie Hamel, and Giuliano Antoniol. ADvISE: Architectural decay in software evolution. In Software Maintenance and Reengineering (CSMR), 2012 16th European Conference on. IEEE, 2012.
[77] Salima Hassaine, Foutse Khomh, Y.-G. Guéhéneuc, and Sylvie Hamel. IDS: An immune-inspired approach for the detection of software design smells. In Quality of Information and Communications Technology (QUATIC), 2010 Seventh International Conference on the, pages 343-348. IEEE, 2010.
[78] Felienne Hermans, Martin Pinzger, and Arie van Deursen. Detecting and visualizing inter-worksheet smells in spreadsheets. In Proceedings of the 2012 International Conference on Software Engineering, pages 441-451. IEEE Press, 2012.
[79] Richard C. Holt.
Structural manipulations of software architecture using Tarski relational algebra. Reverse Engineering, Working Conference on, 0:210, 1998.
[80] David H. Hutchens and Victor R. Basili. System structure analysis: Clustering with data bindings. IEEE TSE, 1985.
[81] V. Jakobac, N. Medvidovic, and A. Egyed. Separating architectural concerns to ease program understanding. In Proceedings of the 2005 Workshop on Modeling and Analysis of Concerns in Software, pages 1-5. ACM, 2005.
[82] Joshua Garcia, Daniel Popescu, George Edwards, and Nenad Medvidovic. Identifying architectural bad smells. In 13th European Conference on Software Maintenance and Reengineering, 2009.
[83] R. Kazman and S. J. Carriere. View extraction and view fusion in architectural understanding. In Software Reuse, 1998. Proceedings. Fifth International Conference on, pages 290-299, June 1998.
[84] Rick Kazman and S. Jeromy Carriere. Playing detective: Reconstructing software architecture from available evidence. Automated Software Engineering, 6:107-138, 1999.
[85] M. G. Kendall and A. Stuart. The Advanced Theory of Statistics, Vol. II: Inference and Relationship. International Statistical Review, 1976.
[86] Marouane Kessentini, Stéphane Vaucher, and Houari Sahraoui. Deviance from perfection is a better criterion than closeness to evil when identifying risky code. In Proceedings of the IEEE/ACM International Conference on Automated Software Engineering, pages 113-122. ACM, 2010.
[87] G. Kiczales and E. Hilsdale. Aspect-Oriented Programming. Springer, 2003.
[88] Miryung Kim, Vibha Sazawal, David Notkin, and Gail Murphy. An empirical study of code clone genealogies. In ESEC/FSE. ACM, 2005.
[89] Kenichi Kobayashi, Manabu Kamimura, Koki Kato, Keisuke Yano, and Akihiko Matsuo. Feature-gathering dependency-based software clustering using dedication and modularity. In International Conference on Software Maintenance (ICSM), 2012.
[90] R. Koschke. Atomic Architectural Component Recovery for Understanding and Evolution. PhD thesis, University of Stuttgart, 2000.
[91] R. Koschke. Architecture reconstruction. Software Engineering, 2009.
[92] Rainer Koschke. Incremental reflexion analysis. In CSMR, 2010.
[93] Philippe B. Kruchten. The 4+1 view model of architecture. Software, IEEE, 1995.
[94] A. Kuhn, S. Ducasse, and T. Gîrba. Semantic clustering: Identifying topics in source code. Information and Software Technology, 49(3), 2007.
[95] Meir M. Lehman. Programs, life cycles, and laws of software evolution. Proceedings of the IEEE, 1980.
[96] Charles E. Leiserson, Ronald L. Rivest, Clifford Stein, and Thomas H. Cormen. Introduction to Algorithms. The MIT Press, 2001.
[97] N. G. Leveson. Safeware: System Safety and Computers. Addison-Wesley, 1995.
[98] Wei Li and Raed Shatnawi. An empirical study of the bad smells and class error probability in the post-release object-oriented system evolution. Journal of Systems and Software, 2007.
[99] J. Lin. Divergence measures based on the Shannon entropy. Information Theory, IEEE Transactions on, 37(1):145-151, 1991.
[100] M. Lippert and S. Roock. Refactoring in Large Software Projects: Performing Complex Restructurings Successfully. Wiley, 2006.
[101] Y. Liu, D. Poshyvanyk, R. Ferenc, T. Gyimothy, and N. Chrisochoides. Modeling class cohesion as mixtures of latent topics. 2009.
[102] Angela Lozano, Michel Wermelinger, and Bashar Nuseibeh. Assessing the impact of bad smells using historical information. In 9th International Workshop on Principles of Software Evolution, 2007.
[103] S.
K Lukins, N. A Kraft, and L. H Etzkorn. Source code retrieval for bug localization using latent dirichlet allocation. In 2008 15th Working Conference on Reverse Engineering, page 155?164, 2008. [104] Isela Macia, Roberta Arcoverde, Elder Cirilo, Alessandro Garcia, and Arndt von Staa. Supporting the identication of architecturally-relevant code anomalies. In Software Maintenance (ICSM), 2012 28th IEEE International Conference on, pages 662{665. IEEE, 2012. [105] Isela Macia, Alessandro Garcia, and Arndt von Staa. Dening and applying detec- tion strategies for aspect-oriented code smells. In Software Engineering (SBES), 2010 Brazilian Symposium on, pages 60{69. IEEE, 2010. [106] Isela Macia, Joshua Garcia, Daniel Popescu, Alessandro Garcia, Nenad Medvidovic, and Arndt von Staa. Are automatically-detected code anomalies relevant to archi- tectural modularity?: an exploratory analysis of evolving systems. In Proceedings of the 11th annual international conference on Aspect-oriented Software Development. ACM, 2012. [107] Sam Malek, Chiyoung Seo, Sharmila Ravula, Brad Petrus, and Nenad Medvidovic. Reconceptualizing a family of heterogeneous embedded systems via explicit archi- tectural support. In 29th International Conference on Software Engineering, 2007. [108] S. Mancoridis, B.S. Mitchell, Y. Chen, and E.R. Gansner. Bunch: A clustering tool for the recovery and maintenance of software system structures. In International Conference on Software Maintenance (ICSM), 1999. [109] Onaiza Maqbool and Haroon Babri. The weighted combined algorithm: A linkage algorithm for software clustering. In European Conference on Software Maintenance and Reengineering (CSMR), 2004. [110] Onaiza Maqbool and Haroon Babri. Hierarchical clustering for software architec- ture recovery. IEEE TSE, 2007. [111] Andrian Marcus and Jonathan I. Maletic. Recovering documentation-to-source- code traceability links using latent semantic indexing. In Proceedings of the 25th ICSE, ICSE '03, pages 125{135, Washington, DC, USA, 2003. IEEE Computer Society. [112] R Marinescu. Detection strategies: metrics-based rules for detecting design aws. In Proc. of the 20th IEEE International Conference on Software Maintenance, 2004. 199 [113] Radu Marinescu. Detection strategies: Metrics-based rules for detecting design aws. In Software Maintenance, 2004. Proceedings. 20th IEEE International Con- ference on, pages 350{359. IEEE, 2004. [114] Robert Cecil Martin. Agile software development: principles, patterns, and prac- tices. Prentice Hall PTR, 2003. [115] C.A. Mattmann, N. Medvidovic, P. Ramirez, and V. Jakobac. Unlocking the Grid. In Proc. of the 8th International SIGSOFT Symposium on Component-based Soft- ware Engineering, 2005. [116] Chris Mattmann, Daniel J. Crichton, Nenad Medvidovic, and Steve Hughes. A software architecture-based framework for highly distributed and data-intensive sci- entic applications. In ICSE, 2006. [117] Chris Mattmann, Joshua Garcia, Ivo Krka, Daniel Popescu, and Nenad Medvidovic. The anatomy and physiology of the grid revisited. In Joint Working IEEE/IFIP Conference on Software Architecture & 6th European Conference on Software Ar- chitecture (WICSA/ECSA), 2009. [118] Chris A. Mattmann, Joshua Garcia, Ivo Krka, Daniel Popescu, and Nenad Medvi- dovic. The anatomy and physiology of the grid revisited. Technical Report USC- CSSE-2008-820, Univ. of Southern California, 2008. [119] A.K. McCallum. Mallet: A machine learning for language toolkit. 2002. [120] Georey J McLachlan and Thriyambakam Krishnan. 
The EM algorithm and ex- tensions, volume 382. Wiley-Interscience, 2007. [121] N. Medvidovic and V. Jakobac. Using software evolution to focus architectural recovery. Automated Software Engineering, 2006. [122] Nenad Medvidovic and Vladimir Jakobac. Using software evolution to fo- cus architectural recovery. Automated Software Engineering, 13:225{256, 2006. 10.1007/s10515-006-7737-5. [123] N. R. Mehta, N. Medvidovic, and S. Phadke. Towards a taxonomy of software connectors. In Proc. of the 22nd International Conference on Software Engineering, 2000. [124] N. C Mendon ca and J. Kramer. An approach for recovering distributed system architectures. ASE Journal, 8(3):311{354, 2001. [125] T Mens and T Tourwe. A survey of software refactoring. IEEE TSE, January 2004. [126] Janardan Misra, KM Annervaz, Vikrant Kaulgud, Shubhashis Sengupta, and Gary Titus. Software clustering: Unifying syntactic and semantic features. In Working Conference on Reverse Engineering (WCRE), 2012. 200 [127] Brian S. Mitchell and Spiros Mancoridis. On the automatic modularization of software systems using the bunch tool. IEEE TSE, 2006. [128] Ran Mo, J. Garcia, Yuanfang Cai, and N. Medvidovic. Mapping architectural decay instances to dependency models. In Managing Technical Debt (MTD), 2013 4th International Workshop on, pages 39{46, 2013. [129] Naouel Moha, Yann-Gael Gueheneuc, Laurence Duchien, and A Le Meur. Decor: A method for the specication and detection of code and design smells. Software Engineering, IEEE Transactions on, 36(1):20{36, 2010. [130] H.A. M "uller, M.A. Orgun, S.R. Tilley, and J.S. Uhl. A reverse-engineering approach to subsystem structure identication. Journal of Software Maintenance: Research and Practice, 5(4):181{204, 1993. [131] Matthew James Munro. Product metrics for automatic identication of" bad smell" design problems in java source-code. In Software Metrics, 2005. 11th IEEE Inter- national Symposium, pages 15{15. IEEE, 2005. [132] Alessandro Murgia, Giulio Concas, Sandro Pinna, Roberto Tonelli, and Ivana Turnu. Empirical study of software quality evolution in open source projects using agile practices. In Proceedings of the First International Symposium on Emerging Trends in Software Metrics 2009. Lulu. com, 2009. [133] Gail C. Murphy, David Notkin, and Kevin Sullivan. Software re exion models: bridging the gap between source and high-level models. In FSE, 1995. [134] Rashid Naseem, Onaiza Maqbool, and Siraj Muhammad. Improved similarity mea- sures for software clustering. In European Conference on Software Maintenance and Reengineering (CSMR). IEEE, 2011. [135] Barak Naveh et al. Jgrapht. Internet: http://jgrapht. sourceforge. net, 2008. [136] Georey Norman. Biostatistics : the bare essentials. B.C. Decker, Hamilton Lewis- ton, NY, 2008. [137] B. Oki, M. P uegl, A. Siegel, and D. Skeen. The Information Bus: an architec- ture for extensible distributed systems. In Proc. of the 14th ACM Symposium on Operating Systems Principles, 1994. [138] Fabio Palomba, Gabriele Bavota, Massimiliano Di Penta, Rocco Oliveto, Andrea De Lucia, and Denys Poshyvanyk. Detecting bad smells in source code using change history information. In Automated Software Engineering (ASE), 2013 IEEE/ACM 28th International Conference on, pages 268{278. IEEE, 2013. 201 [139] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. 
Journal of Machine Learning Research, 2011. [140] Dewayne E Perry and Alexander L Wolf. Foundations for the study of software architecture. ACM SIGSOFT SEN, 1992. [141] M. Pinzger and H. Gall. Pattern-supported architecture recovery. In International Workshop on Program Comprehension (IWPC), pages 53{61. IEEE, 2002. [142] Daniel Popescu, Joshua Garcia, Kevin Bierho, and Nenad Medvidovic. Impact analysis for distributed event-based systems. In Proceedings of the 6th ACM In- ternational Conference on Distributed Event-Based Systems, pages 241{251. ACM, 2012. [143] Martin F. Porter. Snowball: A language for stemming algorithms. Published online, October 2001. Accessed 11.03.2008, 15.00h. [144] Kata Praditwong, Mark Harman, and Xin Yao. Software module clustering as a multi-objective search problem. IEEE TSE, 2011. [145] Robert Clay Prim. Shortest connection networks and some generalizations. Bell system technical journal, 1957. [146] Radim Reh u rek and Petr Sojka. Software Framework for Topic Modelling with Large Corpora. In Proceedings of the International Conference on Language Re- sources and Evaluation (LREC) 2010 Workshop on New Challenges for NLP Frameworks, Valletta, Malta. ELRA. http://is.muni.cz/publication/884893/ en. [147] R. Roshandel. Calculating architectural reliability via modeling and analysis. In Proc. of the 26th International Conference on Software Engineering, 2004. [148] Jacek Rosik, Andrew Le Gear, Jim Buckley, Muhammad Ali Babar, and Dave Connolly. Assessing architectural drift in commercial software development: a case study. Software: Practice and Experience, 2011. [149] Raghvinder S Sangwan, Pamela Vercellone-Smith, and Colin J Neill. Use of a mul- tidimensional approach to study the evolution of software complexity. Innovations in Systems and Software Engineering, 2010. [150] K. Sartipi. Alborz: a query-based tool for software architecture recovery. In Inter- national Workshop on Program Comprehension (IWPC), 2001. [151] Kamran Sartipi. Software architecture recovery based on pattern matching. Inter- national Conference on Software Maintenance (ICSM), 2003. [152] B. Schmerl, J. Aldrich, D. Garlan, R. Kazman, and H. Yan. Discovering architec- tures from running systems. IEEE TSE, pages 454{466, 2006. 202 [153] Robert W. Schwanke. An intelligent tool for re-engineering software modularity. In ICSE, 1991. [154] Robert W. Schwanke and Stephen Jos e Hanson. Using neural networks to modu- larize software. Machine Learning, 1994. [155] Chiyoung Seo, S Malek, G Edwards, D Popescu, N Medvidovic, B Petrus, and S Ravula. Exploring the role of software architecture in dynamic and fault tolerant pervasive systems. International Workshop on Software Engineering for Pervasive Computing Applications, Systems and Environments, 2007. [156] M. Shaw et al. Abstractions for software architecture and tools to support them. IEEE TSE, 1995. [157] M. Shaw and D. Garlan. Software architecture: perspectives on an emerging disci- pline. Prentice-Hall, Inc. Upper Saddle River, NJ, USA, 1996. [158] Mark Shtern and Vassilios Tzerpos. A framework for the comparison of nested software decompositions. In Working Conference on Reverse Engineering (WCRE). IEEE, 2004. [159] K. Shvachko, H. Kuang, S. Radia, and R. Chansler. The hadoop distributed le system. In 26th Symposium on Mass Storage Systems and Technologies, 2010. [160] M. Si and T. Reps. Identifying modules via concept analysis. IEEE TSE, 1999. [161] N Clayton Silver and William P Dunlap. 
Averaging correlation coecients: Should sher's z transformation be used? Journal of Applied Psychology, 1987. [162] M. Steyvers and T. Griths. Probabilistic topic models. Handbook of latent se- mantic analysis, 427, 2007. [163] O. Tatebe, Y. Morita, S. Matsuoka, N. Soda, and S. Sekiguchi. Grid datafarm architecture for petascale data intensive computing. In Proc. of the 2nd IEEE/ACM International Symposium on Cluster Computing and the Grid, 2002. [164] R.N. Taylor, N. Medvidovic, and E.M. Dashofy. Software Architecture: Founda- tions, Theory, and Practice. 2009. [165] K. Tian, M. Revelle, and D. Poshyvanyk. Using latent dirichlet allocation for auto- matic categorization of software. In Mining Software Repositories, 2009. MSR'09. 6th IEEE International Working Conference on, page 163?166, 2009. [166] J Tran, M Godfrey, E Lee, and R Holt. Architectural repair of open source software. International Workshop on Program Comprehension (IWPC), 2000. [167] Guilherme Travassos, Forrest Shull, Michael Fredericks, and Victor R Basili. De- tecting defects in object-oriented designs: using reading techniques to increase soft- ware quality. In ACM Sigplan Notices, volume 34, pages 47{56. ACM, 1999. 203 [168] V. Tzerpos. Comprehension-driven software clustering. PhD thesis, University of Toronto, 2001. [169] V. Tzerpos and R.C. Holt. ACDC: an algorithm for comprehension-driven cluster- ing. In Working Conference on Reverse Engineering (WCRE), 2000. [170] Eva Van Emden and Leon Moonen. Java quality assurance by detecting code smells. In Reverse Engineering, 2002. Proceedings. Ninth Working Conference on, pages 97{106. IEEE, 2002. [171] Jilles van Gurp, Sjaak Brinkkemper, and Jan Bosch. Design preservation over subsequent releases of a software product: a case study of baan erp. Journal of Software Maintenance and Evolution: Research and Practice, 2005. [172] Zhihua Wen and Vassilios Tzerpos. An eectiveness measure for software clustering algorithms. In International Workshop on Program Comprehension (IWPC). IEEE, 2004. [173] Michel Wermelinger, Yijun Yu, Angela Lozano, and Andrea Capiluppi. Assessing architectural evolution: a case study. Empirical Software Engineering, 2011. [174] Theo A Wiggerts. Using clustering algorithms in legacy systems remodularization. In Working Conference on Reverse Engineering (WCRE), 1997. [175] Robert Witte. Statistics. J. Wiley & Sons, Hoboken, NJ, 2010. [176] K. Wong, S.R. Tilley, H.A. Muller, and M.A.D. Storey. Structural redocumentation: A case study. Software, IEEE, 12(1):46{54, 1995. [177] J. Wu, A.E. Hassan, and R.C. Holt. Comparison of clustering algorithms in the context of software evolution. In International Conference on Software Maintenance (ICSM), 2005. [178] J. Wu and M.A.D. Storey. A multi-perspective software visualization environment. In CASCON, 2000. [179] Alexander S. Yeh, David R. Harris, and Melissa P. Chase. Manipulating recovered software architecture views. Software Engineering, International Conference on, 0:184, 1997. [180] Thomas Zimmermann, Stephan Diehl, and Andreas Zeller. How history justies system architecture (or not). In Software Evolution, 2003. Proceedings. Sixth In- ternational Workshop on Principles of. IEEE, 2003. 204
Abstract
The effort and cost of software maintenance tend to dominate other activities in a software system's lifecycle. A critical aspect of maintenance is understanding and updating a software system's architecture. However, the maintenance of a system's architecture is exacerbated by the related phenomena of architectural drift and erosion, collectively called architectural decay, which are caused by careless, unintended addition, removal, and/or modification of architectural design decisions. These phenomena make the architecture more difficult to understand and maintain and, in more severe cases, can lead to errors that result in wasted effort or loss of time or money. To deal with architectural decay, an engineer must be able to (1) obtain the current architecture of her system, (2) understand the symptoms of decay that may occur in a software system, and (3) understand the manner in which architectures tend to change and the decay such change often causes.

The high-level contribution of this dissertation is a unified framework for addressing different aspects of architectural decay in software systems. This framework includes a catalog comprising an expansive list of architectural smells (i.e., architectural-decay instances) and a means of identifying such smells in software architectures.
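To make the notion of smell identification concrete, the following is a minimal illustrative sketch, not taken from the dissertation itself: it flags one widely discussed smell, a dependency cycle among architectural components, in a hypothetical recovered architecture expressed as a component-dependency map. The component names and the find_cycles helper are assumptions introduced here for illustration only.

    # Illustrative sketch (hypothetical, not the dissertation's technique):
    # flag components involved in a dependency-cycle smell via DFS.

    def find_cycles(depends_on):
        """Return the set of components that participate in dependency cycles."""
        WHITE, GRAY, BLACK = 0, 1, 2      # unvisited, on the DFS path, done
        color = {c: WHITE for c in depends_on}
        in_cycle = set()

        def visit(node, stack):
            color[node] = GRAY
            stack.append(node)
            for dep in depends_on.get(node, ()):
                if color.get(dep, WHITE) == GRAY:
                    # Back edge: everything from dep to the top of the stack
                    # lies on a cycle.
                    in_cycle.update(stack[stack.index(dep):])
                elif color.get(dep, WHITE) == WHITE:
                    visit(dep, stack)
            stack.pop()
            color[node] = BLACK

        for comp in depends_on:
            if color[comp] == WHITE:
                visit(comp, [])
        return in_cycle

    # Hypothetical recovered component-dependency view
    architecture = {
        "ui": ["core"],
        "core": ["storage"],
        "storage": ["core"],   # core <-> storage form a cycle (a smell)
    }
    print(find_cycles(architecture))   # {'core', 'storage'}

In practice, a smell detector of this kind would run over an architecture recovered by a clustering or recovery technique rather than a hand-written map, and a single smell instance would be only one entry in a broader catalog.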
Linked assets
University of Southern California Dissertations and Theses
Conceptually similar
Architectural evolution and decay in software systems
Domain-based effort distribution model for software cost estimation
Proactive detection of higher-order software design conflicts
A reference architecture for integrated self-adaptive software environments
A user-centric approach for improving a distributed software system's deployment architecture
Techniques for methodically exploring software development alternatives
Value-based, dependency-aware inspection and test prioritization
Analysis of embedded software architecture with precedent dependent aperiodic tasks
Design-time software quality modeling and analysis of distributed software-intensive systems
Architecture and application of an autonomous robotic software engineering technology testbed (SETT)
A value-based theory of software engineering
Incremental development productivity decline
Composable risk-driven processes for developing software systems from commercial-off-the-shelf (COTS) products
Shrinking the cone of uncertainty with continuous assessment for software team dynamics in design and development
A model for estimating schedule acceleration in agile software development projects
Using metrics of scattering to assess software quality
Improved size and effort estimation models for software maintenance
Domain specific software architecture for large-scale scientific software
The incremental commitment spiral model process patterns for rapid-fielding projects
A system framework for evidence based implementations in a health care organization
Asset Metadata
Creator: Garcia, Joshua (author)
Core Title: A unified framework for studying architectural decay of software systems
School: Viterbi School of Engineering
Degree: Doctor of Philosophy
Degree Program: Computer Science (Software Engineering)
Publisher: University of Southern California (original), University of Southern California. Libraries (digital)
Tag: architectural decay, architectural evolution, architectural smell, architecture recovery, ground-truth architecture, OAI-PMH Harvest, Software Architecture
Language: English
Contributor: Electronically uploaded by the author (provenance)
Advisor: Medvidović, Nenad (committee chair), Halfond, William G. J. (committee member), Settles, F. Stan (committee member)
Creator Email: fusionex@gmail.com, joshua.nsl.garcia@gmail.com
Permanent Link (DOI): https://doi.org/10.25549/usctheses-c3-463954
Unique identifier: UC11287934
Identifier: etd-GarciaJosh-2842.pdf (filename), usctheses-c3-463954 (legacy record id)
Legacy Identifier: etd-GarciaJosh-2842.pdf
Dmrecord: 463954
Document Type: Dissertation
Rights: Garcia, Joshua
Type: texts
Source: University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions: The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the a...
Repository Name: University of Southern California Digital Library
Repository Location: USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA