TECHNIQUES FOR METHODICALLY EXPLORING
SOFTWARE DEVELOPMENT ALTERNATIVES
by
Arman Shahbazian
A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(COMPUTER SCIENCE)
August 2018
Copyright 2018 Arman Shahbazian
Dedication
To
my mom, brother, and loved ones
for their encouragement, endless support, and selfless sacrifices.
Also
to those who are no longer with me today.
You are always close in heart.
Acknowledgments
Every PhD is a story of hard work, sacrifice, rejection, despair, and a few moments of
glory. Hopefully, that story finds its way into a dissertation, which means that, against all
odds, it has come to a happy conclusion. Just like anything worthwhile in this life, a PhD
is the result of teamwork and collaboration. In my journey, I had a series of wonderful
people who helped me along the way, and now I wish to acknowledge and thank them.
To do them justice, I should write a dissertation-length manuscript about how lucky and
grateful I am to have had them in my life. However, due to considerations involving
space, USC's dissertation submission deadline, and carpal tunnel syndrome, I hope they
will accept the following pages as a substitute.
First, I want to express my deepest gratitude to my advisor, Nenad (Neno) Medvidovic.
Neno is a smart, caring, and patient professor. These qualities, in tandem with his
expertise in the realm of software architecture, helped me focus my research, formulate
relevant and interesting problems, and present in this dissertation what I believe is a strong
contribution to the field of software engineering. I feel truly fortunate to have worked
with Neno, who is indubitably not only the best PhD advisor¹, but also an amazing life
mentor. I sincerely hope to continue learning from him in the future.
¹ My personal experience and anecdotal evidence obtained from numerous discussions with students
and colleagues suggest that Neno is the coolest PhD advisor as well.
My dissertation would not have made it this far without the guidance of my qualifying
exam and dissertation committees. I am grateful to the members of my committees:
Professors Barry Boehm, Sandeep Gupta, William G.J. Halfond, and Chao Wang.
Regardless of all the hard work, sacrifice, good luck, and consumed caffeinated drinks²,
success is largely a function of the people who were part of our journeys. I owe a great debt
of gratitude to Neno's research group and its alumni. They have taught me an immense
amount about conducting software-engineering research and surviving the PhD life. My
fellow group members and group alumni have encouraged and inspired me. They have
taken time out to review my papers and presentations, write papers with me, or listen
to and critique my fledgling ideas. In some cases, they would help me formulate and
flesh out ideas or run experiments. Without their support, I would never have made it
through. In the beginning of my PhD, I worked most closely with Reza Safi. Working
with him helped me understand how a research project comes about and the painstaking
effort required to make it ready for publication. Despite having graduated many years
ago, George Edwards helped me tremendously in creating what formed the basis for this
dissertation. I also had the opportunity of working with Josh, Jae, Youn, Duc, Pooyan,
Daniel, Yixue, and Daye. I am immensely grateful to all of them.
I would like to express my heartfelt thanks to Dr. Marija Mikic, who mentored me
during my internships at Google and provided invaluable support afterwards. I hope to
continue working with and learning from her in the future.
² In my case, these include, but are not limited to, 2605 cups of tea, 4342 cups of coffee, and a couple
of hundred cans of energy drinks.
Besides working with Neno's research group, one of the great opportunities I have
had was to collaborate with a variety of other researchers. I am lucky to have published
an ICSA paper with Michael Langhammer. I am honored to have been able to publish
an ESEC/FSE paper with G.J., and learn from him in general. I am also grateful to
professors David Rosenblum, Mehdi Jazayeri, Sam Malek, and Mehdi Mirakhorli for
their help and support.
Performing administrative tasks at USC would be significantly more difficult without
the support of the university's staff. To Lizsl, thank you for all your help and support during
the PhD program. To Jack and Julie, I appreciate all your assistance with completing
administrative tasks. The efforts of these and other staff members have helped to simplify
a complex journey.
No one can survive the brutal rejections of academia without strong personal support.
I have received such support from my friends around the world. Special thanks to Ehsan,
Hana, Hesam, Hossein, Majid, Mohammad, Morteza, Reihane, Roozbeh, Setareh, and
Sumita, who were sources of joy, laughter, and support during the past five years. I am
grateful to Setareh, who has stood by me for more than two years no matter what and
always received me with open arms. I also wish to thank her because she drew some of
the diagrams in this dissertation³, read my work, and patiently listened to my practice
talks.
This brings me to the final set of people who were absolutely crucial to my formal
education: my family. From a very young age, my family has encouraged me to pursue
my interests in the fields of mathematics and sciences. My brother was the first person
³ The better looking ones, in case you are wondering which diagrams.
to introduce me to computers. He paved the way by immigrating halfway around the
world to pursue computer science and I followed in his footsteps. Whatever credit I
deserve for my research and achievements, my family deserves many times that much
credit. I am forever thankful to my family for their unconditional love and unwavering
encouragement. Without them, I would never have been able to write this dissertation.
I can only hope that I can pay a fraction of your kindness back in my lifetime. I love you
with all my heart!
It's a cliché, but a valid one: life is a journey, not a destination, and so is a PhD⁴.
To everyone who took part in my journey, and to those I did not mention by name but
nonetheless treasure⁵, thank you! I hope that I will be able to give back for all that you
have done for me and make your aspirations as easy to achieve as you have made mine.
⁴ Although unlike life, USC requires PhDs to be completed in five to seven years.
⁵ I also promise to buy you a cup of coffee the next time I see you.
Table of Contents
Dedication
Acknowledgments
List of Tables
List of Figures
Abstract
Chapter 1 Introduction
1.1 Motivation and Problem Description
1.2 Insights and Hypotheses
1.3 Dissertation's Solution and Contributions
1.4 Structure of This Dissertation
Chapter 2 Foundation
2.1 Software Architecture and Design Decisions
2.2 Architecture Recovery
2.3 Software Repositories
2.4 Automated Software Analysis
Chapter 3 Software Design Exploration Techniques
3.1 RecovAr
3.1.1 Change Analysis
3.1.2 Mapping
3.1.3 Decision Extraction
3.2 PredictAr
3.2.1 Recovering Significant Issues
3.2.2 Predicting Significant Issues
3.3 eQual
3.3.1 Running Example
3.3.2 Modeling
3.3.3 Preparation
3.3.4 Selection
3.3.5 Assessment
Chapter 4 Evaluation
4.1 RecovAr
4.1.1 Applicability
4.1.2 Precision
4.1.3 Recall
4.1.4 Threats to Validity
4.2 PredictAr
4.2.1 Subject Systems
4.2.2 Architectural Significance Analysis
4.2.3 Accuracy of Significance Prediction
4.2.4 Threats to Validity
4.3 eQual
4.3.1 Usability
4.3.2 Effectiveness
4.3.3 Scalability
4.3.4 Threats to Validity
Chapter 5 Related Work
5.1 Architecture-Based Software Analysis
5.2 Software Quality Assessment
Chapter 6 Concluding Remarks
6.1 Future Work
References
Appendix A Screenshots
List of Tables
3.1 Examples of recovered Hadoop decisions.
3.2 Bag-of-words representation of two issue sentences I1 and I2.
3.3 One-hot encoding of I1 and I2.
3.4 Hadoop variation points and representative bounds [25, 26].
4.1 Subject systems analyzed in our study.
4.2 Number of changes, recovered decisions, and frequencies of issues and changes per decision for our subject systems, grouped by the employed recovery methods.
4.3 Average scores of recovered decisions per recovery technique for Hadoop and Struts.
4.4 RecovAr's recall before (top row) and after (bottom row) the clean-up of the raw data.
4.5 Subject systems used for PredictAr's analyses.
4.6 Analyzed issues' general distribution. Issue types are bug (B), feature/improvement (F/I), or other (O). Issue priority is critical (C), major (Mj), or minor (Mn).
4.7 Overview of the results of our architectural significance analysis under the ARC recovery technique. Issue types are bug (B), feature/improvement (F/I), or other (O). Issue priority is critical (C), major (Mj), or minor (Mn).
4.8 Overview of the results of our architectural significance analysis under the ACDC recovery technique. Issue types are bug (B), feature/improvement (F/I), or other (O). Issue priority is critical (C), major (Mj), or minor (Mn).
4.9 Precision and recall of PredictAr's classifier for each system. The Cross-Project row shows the result of applying the classifier on the combined issues of all systems.
4.10 Software systems used to study the effectiveness of eQual. Var. Points is the number of variation points in each system; Terms is the number of terms in the systems' fitness models; and Size is the total number of variants in the design space.
4.11 Comparison between the two seeding strategies employed by eQual and the quality of the nominal solutions commonly selected by architects. Default depicts the common solutions used by architects, obtained from [151].
List of Figures
2.1 Architecture recovery and analysis components and artifacts leveraged in this dissertation.
2.2 A high-level overview of the proposed process for leveraging an MDE platform.
3.1 Schematic overview of eQual, PredictAr, and RecovAr depicting their relationship and applicability at different stages of software development.
3.2 Overview of RecovAr. Using the existing source code, commit logs, and extracted issues obtained from code and issue repositories, our approach automatically extracts the underlying design decisions. Implementation of the new components spans over 4,000 Source Lines of Code (SLoC).
3.3 Calculating the costs of the edges and finding the perfect matching. The bold connectors are the selected edges that lead to minimum overall cost.
3.4 Extracted changes between the architectures depicted in Fig. 3.3. Double-lined diamonds indicate removals while regular diamonds denote additions.
3.5 Architectural impact list. Squares represent issues and diamonds represent entities. An edge from an issue to an entity means that resolving that issue resulted in modifying that entity.
3.6 The overarching decisions graph contains two decisions, D1 and D2. Squares denote issues, and circles denote changes.
3.7 Framework for the identification of architecturally significant issues in a software project.
3.8 Workflow for building our automatic architectural significance classifier.
3.9 eQual's architecture. The Design Environment and Simulation Engine components are provided by DomainPro [145, 146].
3.10 Hadoop system model in a domain-specific language [25].
3.11 Software development process supported by DomainPro.
3.12 Reliability time-series of a Hadoop variant that uses machines with 90% reliability and no redundancy.
3.13 Reliability time-series of a Hadoop variant that uses machines with 50% reliability and a redundancy factor of 2.
3.14 eQual's visualization of two candidate Hadoop variants showing their respective values for the four properties of interest.
4.1 Distribution of types of decision in Hadoop: solid black denotes simple decisions; grey denotes compound decisions; white denotes cross-cutting decisions.
4.2 Distribution of types of decision in Struts: solid black denotes simple decisions; grey denotes compound decisions; white denotes cross-cutting decisions.
4.3 Smoothed cumulative distribution of the decision scores for Hadoop.
4.4 Smoothed cumulative distribution of the decision scores for Struts.
4.5 Classification of the recovered decisions based on the satisfied criteria.
4.6 Optimal Proximity distribution of the best variant generated by eQual using generation sizes of 50, 100, and 200. Each box plot comprises 100 executions of eQual's exploration function. The y-axes of three of the systems' legends range 0.6-1, whereas the y-axes on the rest start at 0. This is done to enable better visualization of the variants' quality distributions for systems with smaller ranges.
4.7 eQual's scalability with respect to the number of simulation nodes.
4.8 Sizes of data files generated by simulations for different values of Computation Size (c).
4.9 eQual's scalability with respect to the number of variants.
A.1 An example of a meta-model designed in DomainPro.
A.2 An example of a system designed in DomainPro using the metamodel shown in Figure A.1.
A.3 An example of the standalone analysis capabilities of DomainPro.
A.4 Screenshot depicting the interface used to retrieve the answers to the variation-point questions eQual asks.
A.5 Screenshot depicting the interface used to retrieve the answers to the non-functional-property questions eQual asks.
A.6 eQual's visualization of different design variants using radar charts.
Abstract
Designing and maintaining a software system's architecture typically involves making
numerous design decisions, each potentially affecting the system's functional and
non-functional properties. Understanding these design decisions can help inform future
decisions and implementation choices, and can avoid introducing architectural
inefficiencies later. Unfortunately, the support for engineers to make these decisions is
generally lacking. There is a relative shortage of techniques, tools, and empirical studies
pertaining to architectural design decisions. Moreover, design decisions are rarely well
documented and are typically a lost artifact of the architecture creation and maintenance
process. The loss of this information can thus hurt development. To address these
shortcomings, we develop a set of techniques to enable methodical exploration of such
decisions and their effects.
We develop a technique, named RecovAr, for automatically recovering design decisions
from a project's readily available history artifacts, such as an issue tracker and version
control repository. RecovAr uses state-of-the-art architecture recovery techniques on a
series of version control commits and maps those commits to issues to identify decisions
that affect the system's architecture. While some decisions can still be lost through this
process, our evaluation on two large open-source systems with over 8 years of development
each shows that RecovAr has a recall of 75% and a precision of 77%. To create RecovAr,
we formally define architectural design decisions and develop an approach for tracing
such decisions in project histories. Additionally, the work introduces methods to classify
whether decisions are architectural and to map decisions to code elements.
Building on RecovAr, we create PredictAr. PredictAr aims to prevent the consequences
of inadvertent architectural change. The result of such changes is the accumulation of
technical debt and the deterioration of software quality. In this dissertation, we take a
step toward addressing that scarcity by using the information in the issue and code
repositories of open-source software systems to investigate the cause and frequency of
such architectural design decisions. We develop a predictive model that is able to identify
the architectural significance of newly submitted issues, thereby helping engineers to
prevent the adverse effects of architectural decay. The results of this study are based on
the analysis of 21,062 issues affecting 301 versions of 5 large open-source systems for
which the code changes and issues were publicly accessible.
We close the loop by helping engineers not only predict and recover architectural design
decisions, but also make new design decisions that are informed and well-considered.
Recent studies have shown that the number of available design alternatives grows rapidly
with system size, creating an enormous space of intertwined design concerns. This
dissertation presents eQual, a novel model-driven technique for simulation-based
assessment of architectural designs that helps architects understand and explore the
effects of their decisions. We demonstrate that eQual effectively explores massive spaces
of design alternatives and significantly outperforms state-of-the-art approaches, without
being cumbersome for architects to use.
Chapter 1
Introduction
Software architecture has become the centerpiece of modern software development [157].
Developers are increasingly relying on software architecture to lead them through the
process of creating and implementing large and complex systems. Understanding a
software system's architecture and the set of decisions that led to its creation is crucial
for making new decisions about the system, both at the design and implementation levels.
Further, engineers who are aware of the architectural impacts of their changes deliver
higher-quality code [126]. Despite their far-reaching implications, design decisions are
rarely documented and are typically a lost artifact of the architecture creation and
maintenance process. To better approach this problem, an engineer needs to be able to
do the following:
1. Recover undocumented design decisions in a system.
2. Know when committing changes can cause the architecture of the system to change.
3. Objectively assess the effects of different design decisions on the system's quality.
The rest of this chapter is organized as follows. Section 1.1 expounds the motivation
of this dissertation. Section 1.2 formulates the hypotheses driving our research and the
intuitions behind them. Section 1.3 overviews our solutions and contributions. Finally,
Section 1.4 describes the structure of this dissertation.
1.1 Motivation and Problem Description
Challenges of Making Design Decisions
It is known that the design decisions making up a system's architecture directly impact
its quality [158]. Ideally, individual design decisions would be carefully assessed and
documented to ensure that they satisfy the system's requirements; however, this is
frequently not done in practice. This leads to a host of problems, such as the accumulation
of technical debt and the deterioration of software quality [38, 92, 181]. Undocumented
design decisions are also the core reason knowledge vaporization occurs in software
systems [77]. Knowledge vaporization plays a major role in increasing maintenance costs
and exacerbating architectural drift and erosion [77]. For these reasons, it is important
to provide supporting techniques to help engineers understand a system's constituent
design decisions. Understanding these design decisions can help inform future decisions
and implementation choices, and can avoid introducing regressions and architectural
inefficiencies later. Further, design decisions that are not carefully assessed and are
suboptimal significantly hamper a system's ability to fulfill its functional and
non-functional requirements.
A well-publicized recent example is the Healthcare.gov portal (a.k.a. Obamacare), which
was marred by serious technical problems at launch [98, 117, 165]. Several studies have
pointed out that the failures, causing downtimes of up to 60%, were due to flawed
architectural and deployment design decisions [166, 167]. As a result, Obamacare's
development costs, originally estimated to be $100M, surpassed $1.5B [96].
It is important to note that architectural design decisions are made at different stages
of a software development project, spanning a wide range of issues: from system structure
to behavior, interaction, deployment, evolution, and a variety of non-functional
properties [157]. Engineers face distinct challenges depending on the stage at which the
software development project is. In long-lived software systems, engineers should be
aware of the previous design decisions to prevent changes that can invalidate those earlier
decisions. In practice, this requires recovering the undocumented design decisions and
making engineers aware of the potential architectural consequences of their code changes.
Furthermore, engineers need to be able to objectively assess the effects of their decisions
in order to make the "right" choices that "best" satisfy the system requirements.
Achieving these tasks manually is difficult, and there is insufficient support for engineers
to do so automatically. Although the precise number of these decisions has not been
quantified to date, it has long been accepted that it grows quickly with the complexity of
the system [32]. The difficulty of designing software systems lies not only in the number
of design decisions, but also in the fact that there are many intertwined factors and
trade-offs that need to be taken into consideration [30, 31, 132].
Existing Approaches
To address these problems, considerable effort has been devoted to studying software
architecture from different perspectives. Kruchten et al. proposed the 4+1 view model
to describe software architecture using five concurrent views addressing specific
concerns [89]. A separate research thread has focused on architecture description
languages (ADLs) [58, 110, 124] for describing and modeling software systems'
architectures. The architectures of several now widely adopted systems, such as mobile
systems [11] and the World Wide Web [51], have been extensively studied. More recently,
researchers have recognized the importance of software architecture in the evolution and
maintenance of software systems [56, 92, 147], which has led to the design of several
architecture recovery techniques to help counteract the challenges brought about by
architectural drift and erosion [42, 56, 57, 164]. Numerous techniques and methodologies
have been proposed to estimate the effects of design decisions on the quality of the
produced software system [8, 49, 91]. Despite these efforts, architectural decay is still
evident during the evolution of many, if not most, software systems [93], and bad design
decisions still creep into software systems [98, 126].
The existing approaches that aim to help engineers make better design decisions rely
on static or dynamic analysis for assessing software models that represent the variants
stemming from different decisions. Static analysis techniques (e.g., [22, 64, 107]) tend to
require architects to develop complex mathematical models, which imposes steep learning
curves and significant modeling effort, and limits the resulting system's scalability.
Additionally, depending on the mathematical models they rely on (e.g., Markov
chains [61], event calculi [85], or queueing networks [23]), these techniques are confined
to specific kinds of software system models [8]. While they come with shortcomings of
their own (e.g.,
false negatives, longer execution times), dynamic analysis techniques, i.e., architectural
model simulations [46, 99], are more capable of capturing the nondeterminism reflective
of reality [81] and are more amenable to constructing models that are tailored to the
task at hand. Despite notable efforts [41, 104, 170], simulations of software architectural
models have not been as widely employed as traditional static analyses [8], due to at least
four reasons. First, creating simulatable system design models is acknowledged as
difficult [46]. Second, running simulations on complex models is time consuming,
mandating that scalability issues be explicitly addressed [130]. Third, quantitative
assessment of design variants is a complex computational problem because of the large
number of involved trade-offs [108]. Finally, depending on the size of the simulations,
massive datasets may need to be analyzed and understood to assess system behavior.
This dissertation targets the discussed challenges inherent in making design decisions
and the shortcomings of existing approaches. More specifically, we develop techniques to:
1. Recover architectural design decisions in existing software systems.
2. Predict the architectural significance of implementation issues, i.e., whether resolving
those issues would lead to architectural change.
3. Assess the quality of architectural designs and design variants.
1.2 Insights and Hypotheses
In this section, we discuss the hypotheses that we plan to test in this dissertation and
the insights from which these hypotheses are drawn. The insights stem mainly from
our knowledge of the software engineering domain and the underlying techniques that
are relevant to the detection and assessment of architectural design decisions.
Hypothesis 1
Insight 1A: A set of principal design decisions yields the architecture of a system.
Frequently, these decisions deviate from the architects' well-considered intent, and
software systems regularly exhibit increased architectural decay as they evolve.
Insight 1B: Descriptions of the changes made to a software system are typically tracked
by issue repositories, in which system stakeholders report bugs, describe perfective or
adaptive changes, discuss re-engineering the system, etc.
Hypothesis 1: Architectural design decisions can be recovered and classified using the
information in a system's issue and code repositories with high precision and recall.
Hypothesis 2
Insight 2A: Software systems decay over time because newly added design decisions
remove or modify existing design decisions without being well-considered.
Insight 2B: Issue trackers generally contain information about the problem and a
description of possible solutions that need to be implemented.
Hypothesis 2: A technique can be devised that can accurately predict the architectural
significance of an issue.
Hypothesis 3
Insight 3A: Architects generally consider several alternatives while making design
decisions.
Insight 3B: In real-life software systems, there are many trade-offs and intertwined
factors in play that are hard to analyze and compare manually.
Insight 3C: Discrete-event simulations can be representative of a system's real behavior,
due to their ability to accurately model a system and its non-determinism.
Hypothesis 3: We can develop a technique that, with a small burden on its users, helps
architects simultaneously explore and objectively compare thousands of software
development alternatives.
1.3 Dissertation's Solution and Contributions
This dissertation presents three main techniques to address the challenges brought about
by architectural design decisions at different stages of a software project: (1) RecovAr,
(2) PredictAr, and (3) eQual. These techniques directly map to our hypotheses as listed
in Section 1.2. In the remainder of this section, we overview each technique and its
contributions.
RecovAr
To test Hypothesis 1, we developed RecovAr, a technique that uses a project's readily
available history artifacts (e.g., an issue tracker or code repository) to automatically
recover the architectural design decisions embodied in that system. RecovAr is
independent of a system's architectural paradigm (e.g., component-based, microservices,
SOA, system-of-systems); however, it does assume the existence of suitable means of
obtaining static architectural structure from implementation artifacts. Specifically, the
work discussed in this dissertation relies on two existing techniques that recover such
architectural structure from code: ACDC [164] and ARC [57]. Existing literature refers
to these and similar techniques as "architecture recovery techniques", and to the produced
artifacts as "architectures". For legacy and simplicity reasons, we will also use this
terminology in this dissertation, with the understanding that what is recovered is only a
partial, structural view of a system's static architecture. By recovering this architectural
information at multiple points during the system's development and mapping the changes
in the structure to the rich information available in the system's issue tracking systems,
RecovAr can recover many of the system's design decisions and traceability links between
those decisions and code changes, issues, and other documentation.
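The commit-to-issue mapping step described above can be sketched in a few lines. This is an illustrative reconstruction, not RecovAr's actual implementation: the JIRA-style issue-key pattern, the function names, and the data shapes are all assumptions made for the example.

```python
import re
from collections import defaultdict

# Hypothetical issue-key pattern; Hadoop and Struts track issues in JIRA,
# whose keys look like HADOOP-1234 or WW-5678.
ISSUE_KEY = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b")

def extract_issue_ids(commit_message):
    """Return the issue-tracker keys mentioned in a commit message."""
    return ISSUE_KEY.findall(commit_message)

def build_impact_list(commits, changed_entities):
    """Map each issue to the architectural entities modified while resolving it.

    commits: iterable of (sha, message) pairs.
    changed_entities: dict sha -> set of entity names whose placement differed
        between the architectures recovered before and after that commit.
    """
    impact = defaultdict(set)
    for sha, message in commits:
        for issue in extract_issue_ids(message):
            impact[issue] |= changed_entities.get(sha, set())
    # Issues that touched no architectural entity carry no decision here.
    return {issue: ents for issue, ents in impact.items() if ents}
```

An issue that maps to a non-empty set of changed entities would become a candidate node in a decisions graph of the kind shown in Figure 3.6.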
We applied RecovAr to over 100 versions of Hadoop and Struts, two large, widely
adopted open-source systems, with over 8 years of development each and, on average,
more than 1 million lines of code. We found that RecovAr can accurately uncover the
architectural design decisions embodied in the systems, recovering 75% of the decisions
with a precision of 77%.
In creating RecovAr, we make the following contributions:
- We formally define the notion of an architectural design decision and develop a
technique for tracing such decisions in existing software project histories.
- We introduce methods to classify whether decisions are architectural and to map
decisions to code elements.
- We empirically examine how design decisions manifest in software systems, evaluating
our technique on two large, widely-used systems.
- We develop a methodology for preserving design-decision knowledge in software
projects.
PredictAr
To test our second hypothesis, we create a technique to predict the architectural signif-
icance of implementation issues. PredictAr aims 1 to automatically identify the issues
leading to architectural design decisions in existing systems and 2 to predict the probable
impact of implementation-level issues and resulting system changes on those decisions.
Despite their far-reaching implications [158], there is a relative shortage of techniques,
tools, and empirical studies pertaining to the role that architectural design decisions play
in long-lived systems. Existing techniques typically focus on undoing the side-eects of
architectural decay ex post facto (e.g., [155, 168]), unlike these techniques, we develop a
predictive model that is able to identify the architectural signicance of newly submitted
issues. Similar to RecovAr, PredictAr uses the readily available information in the issue
and code repositories of software systems and enables engineers to investigate the causes
and frequencies of making new and modifying existing architectural design decisions.
To that end, we mine the issue and code repositories of five large open-source software
systems to extract their issues and pertinent code changes. In order to detect the
architectural changes in a system, we use a state-of-the-art software architecture analysis
workbench [17]. For each issue, we recover the architecture of the system before and after
its resolution. We use a2a [17], a metric specifically designed for measuring architectural
change, to identify the issues causing architectural changes. We call these issues
architecturally significant. We have shown in RecovAr that a2a can be used to accurately
recover architectural design decisions [147]. PredictAr enables engineers to prevent
architectural decay before the offending code changes are committed and merged with the
system's code-base.
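The detection step can be sketched as follows. This is a minimal illustration, not the
actual a2a metric [17]: a placeholder change score (the fraction of entity-to-component
assignments that differ between the architectures recovered before and after an issue's
resolution) stands in for a2a, and the issue fields `arch_before` and `arch_after` are
hypothetical names introduced only for this sketch.

```python
def change_score(arch_before, arch_after):
    """Simplified stand-in for an architectural change metric: the fraction
    of entity-to-component assignments that differ between two recovered
    architectures, each given as a dict: component -> set of entities."""
    def assignments(arch):
        return {(comp, ent) for comp, ents in arch.items() for ent in ents}
    a, b = assignments(arch_before), assignments(arch_after)
    if not a and not b:
        return 0.0
    # Symmetric difference counts moved, added, and removed assignments.
    return len(a ^ b) / len(a | b)

def architecturally_significant(issues, threshold=0.1):
    """Flag issues whose resolution changed the architecture by more than
    the given threshold (the threshold value here is illustrative)."""
    return [i["id"] for i in issues
            if change_score(i["arch_before"], i["arch_after"]) > threshold]
```

In this sketch, an issue whose resolving commit moved an entity between components is
flagged, while an issue with no architectural effect is not.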
PredictAr adds the following contributions on top of RecovAr:
A classifier for predicting the architectural significance of newly submitted issues.
A reusable dataset of 21,062 issues identified across five large open-source software
systems that are labeled by their architectural significance.
eQual
Aimed at our third hypothesis, eQual is a model-driven architecture-based technique that
enables simulation-driven assessment of architectural designs and design variants. eQual
is designed to be easy to use, effective in producing accurate solutions, and scalable to
large models. eQual enables engineers to explore a potentially massive space of design
decisions while minimizing the burden on the engineers. eQual does so by asking a small
number of relatively simple questions about the system the engineers are designing and
their preferences for the system. In response, eQual automatically builds the requisite
system models, runs simulations in the background, and delivers to the engineers the
ranked list of variants that correspond to their current design choices. The engineers can
then adjust their preferences, tighten or relax the acceptable bounds on system parameters,
or explore other variants, to reach the most appropriate architecture for the problem
at hand.
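The ranking idea can be sketched as follows. This is not eQual's actual assessment
machinery, which is simulation-driven; it merely illustrates ranking candidate design
variants by a weighted sum of normalized quality metrics. The metric names, weights, and
variants are invented for illustration.

```python
def rank_variants(variants, preferences):
    """Rank design variants by a weighted sum of min-max-normalized metrics.
    variants: {name: {metric: value}}
    preferences: {metric: (weight, maximize)} where maximize is a bool."""
    def score(metrics):
        total = 0.0
        for m, (weight, maximize) in preferences.items():
            vals = [v[m] for v in variants.values()]
            lo, hi = min(vals), max(vals)
            # Normalize to [0, 1]; flip when lower values are better.
            norm = 0.0 if hi == lo else (metrics[m] - lo) / (hi - lo)
            total += weight * (norm if maximize else 1 - norm)
        return total
    return sorted(variants, key=lambda name: score(variants[name]), reverse=True)
```

Adjusting a weight in `preferences` corresponds to an engineer strengthening or relaxing
a preference and immediately re-ranking the variants.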
We evaluate eQual in three ways, corresponding to its three goals stated above. First,
we demonstrate that using eQual is significantly easier than the state-of-the-art
alternative that targets the same problem, GuideArch [49]. As a representative illustration,
an architect only needed to spend six minutes to interactively answer the 27 questions eQual
requires to analyze a recently published model of Hadoop [25, 26], while GuideArch's 356
inputs required more than four hours. Second, we extensively evaluate eQual's
effectiveness. We show that the quality of the designs eQual produces is higher than that
of GuideArch. We additionally evaluate the quality of designs yielded by eQual against
previously published ground-truths obtained from real-world software systems. We show
that eQual recommends variants that are of comparable quality to the real-world solutions
determined to be optimal, while it significantly outperforms the nominal solutions
that are most commonly selected by architects. Third, we demonstrate that eQual scales
optimally with the number of available computational nodes, system events, and system
variants. In our experiments, eQual was able to analyze tens of thousands of variants
and terabytes of simulation-generated data.
eQual's primary contributions are:
A method for automatically generating architectural assessment models from simple
inputs that architects provide.
Bipartite Relative Time-series Assessment, a technique for efficient, distributed
analysis of simulation-generated data, solving a previously prohibitively inefficient
variant-assessment problem.
An architecture for seamlessly distributing and parallelizing simulations to multiple
nodes.
An evaluation of each of eQual's three goals on real-world systems, comparing to the
state-of-the-art alternative, and demonstrating its ease of use, accuracy of produced
solutions, and scalability.
An extensible platform for using general-purpose or proprietary evolutionary algo-
rithms to automate design-space exploration.
1.4 Structure of This Dissertation
The remainder of this dissertation is organized as follows: Chapter 2 overviews the founda-
tional work enabling this dissertation. The proposed solutions are discussed in Chapter 3.
Chapter 4 describes the evaluation followed by Chapter 5 which covers the related work.
Finally, Chapter 6 lays out the future directions, and concludes this dissertation.
Chapter 2
Foundation
This chapter describes some of the concepts we will use throughout this dissertation. In
particular, we focus on software architecture, the architecture recovery techniques used,
and software quality assessment.
2.1 Software Architecture and Design Decisions
For many years, the research community and industry alike focused on the result, i.e.,
the consequences of the design decisions made, trying to capture them in the architecture
of the system under consideration, often using a graphical representation. Such
representations were, and to a great extent still are, centered on views [89], as captured
by the ISO/IEC/IEEE 42010 standard [34], or on the use of an architecture description
language (ADL) [110]. However, this approach to documenting software architectures can
cause problems such as expensive system evolution, poor stakeholder communication, and
limited reusability of architectures [149].
Architecture as a set of design decisions was proposed to address these shortcomings.
This new paradigm focuses on capturing and documenting the rationale, constraints, and
alternatives of design decisions [172]. More specifically, Jansen et al. defined
architectural design decisions as a description of the set of architectural additions,
subtractions, and modifications to the software architecture, the rationale, the design
rules, and additional requirements that (partially) realize one or more requirements on a
given architecture [24, 77]. The key element in their definition is rationale, i.e., the
reasons behind an architectural design decision. Kruchten et al. [88] proposed an ontology
that classified architectural decisions into three categories:
1. Existence decisions (ontocrises).
2. Property decisions (diacrises).
3. Executive decisions (pericrises).
Among the three categories, existence decisions, i.e., decisions that state that some
element or artifact will appear in, or disappear from, a system, are the most prevalent
and capture the most volatile aspects of a system [77, 88]. Property and executive
decisions are enduring guidelines that are mostly driven by the business environment and
affect the methodologies and, to a large extent, the choices of technologies and tools.
Inspired by the described existing work, the notion of design decisions used in this
dissertation values the rationales and consequences as two equally important constituent
parts of design decisions. However, not all design decisions are created equal. Some design
decisions are straightforward, with a clear singular rationale and consequence, while some
are cross-cutting and intertwined [24], i.e., they affect multiple components and/or
connectors and often become intimately intertwined with other design decisions. To
distinguish between different kinds of design decisions, we classify them into three
categories:
1. Simple decisions that have a singular rationale and consequence.
2. Compound decisions that include several closely related rationales, but whose
consequences are generally contained to one component.
3. Cross-cutting decisions that affect a wider range of components and whose rationale
follows a higher-level concern such as the architectural quality of the system.
More precisely, we narrow the definition of architectural design decisions to the
description (rationale) of a set of architectural additions, subtractions, and
modifications to the software architecture. This definition is also in tune with
architectural design decisions as defined by Jansen et al. [77] and Bosch [24]. In
Section 3.1, we further elaborate on architectural design decisions based on their
complexity and impact.
2.2 Architecture Recovery
To capture the consequence aspect of design decisions, we build on top of our existing
work in architecture modeling and recovery. To that end, we use ARCADE [17, 93],
the workbench developed in our prior work, to obtain the static architectures of a system
from its source code. To study architectural change and decay, ARCADE provides the
following facilities:
1. Performs architecture recovery from a system's implementation.
2. Uses the recovered information to compute architectural change and decay metrics.
Figure 2.1: Architecture recovery and analysis components and artifacts leveraged in this
dissertation. The figure depicts a dataflow in which Source Code is processed by the
Recovery Techniques component to produce Architectures, which the Change Metrics
Calculator analyzes to produce the Change Metrics Values.
3. Automates analysis and discovery of architecture-level issues.
In this dissertation, we will directly leverage the architectural recovery techniques that
ARCADE provides. Furthermore, architectural metrics included in ARCADE inspire the
analyses RecovAr conducts (Section 3.1).
ARCADE's foundational element is architecture recovery, depicted as the Recovery
Techniques component in ARCADE's dataflow architecture shown in Figure 2.1. The
architectural views produced by Recovery Techniques are directly used for studying
architectural changes. ARCADE currently provides access to ten recovery techniques; nine
techniques use algorithms for clustering implementation-level elements into architectural
components [56], while one technique reports the implementation view of a system's
architecture (i.e., the system's directory and package structure). ARCADE thereby allows
an engineer (1) to extract multiple architectural views and (2) to ensure maximum accuracy
of extracted architectures by highlighting their different aspects.
For each architecture, ARCADE computes the change metrics [17]. To that end,
the Change Metrics Calculator component analyzes the architectural information yielded
by Recovery Techniques. The computed metrics comprise the final artifact produced by
ARCADE (Change Metrics Values in Figure 2.1) that is relevant to this dissertation.
This artifact is then used to interpret the degree of architectural change. ARCADE
employs our own implementations of the change metrics.
Architectural evolution studies are typically conducted by comparing the architecture
of a software system version with the architectures of its ancestors or descendants. To
conduct this comparison, ARCADE needs to know the evolution path of the software
system. The evolution path is a sequence of version pairs. A version pair is an ordered
pair (s, t) of versions from a given system, where t is the target version that evolved
directly from the source version s. Each value for the metric from Section 3.1 is computed
using a version pair. For our evaluation, we obtained the correct evolution paths for our
subject systems by using git-log [62] and svn-graph-branches [112]. The two tools provide
analogous functionality on the two different types of repositories, Git and SVN. Both tools
can parse repository log files and create graphs that show relationships between software
versions. For example, if version 1.1 of a system is created from version 1.0, then in the
evolution graphs version 1.1 will be linked from version 1.0.
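The version-pair construction can be sketched as follows; the parent-link dictionary is a
hypothetical stand-in for the evolution graphs produced by git-log and
svn-graph-branches.

```python
def evolution_pairs(parents):
    """Derive version pairs (source, target) from parent links, given as
    a dict mapping each version to the list of versions it evolved
    directly from."""
    return [(src, tgt) for tgt, srcs in parents.items() for src in srcs]
```

For the example in the text, a graph linking version 1.1 from version 1.0 yields the
version pair ("1.0", "1.1"), on which the change metric is then computed.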
A critical part of ARCADE is the selection of appropriate architecture recovery
techniques. Since our previous evaluation [56] showed that two of the techniques, ACDC and
ARC, exhibit significantly better accuracy and scalability than the remaining
clustering-based techniques, and that they produce complementary architectural views, we
focus on them in this dissertation. ACDC's view is based on several common, familiar
subsystem patterns found in large systems, such as directory structures and graphical
properties of source files (e.g., their out-degrees or subgraph domination) [164]. On the
other hand, ARC's view produces components that are semantically coherent due to sharing
similar system-level concerns (e.g., a component whose main concern is handling of
distributed jobs).
ARC and ACDC perform sophisticated analyses to extract an architectural view of
a system. Both of these techniques introduce challenges that must be addressed when
studying architectural change. In the following subsections, we explain ACDC and ARC
in more detail, the challenges arising from using them for studying architectural change,
and the solutions we developed to address those challenges.
ACDC
To recover architecture using ACDC, we obtained an implementation of the technique
from ACDC's authors [164] and used its default settings. Although ACDC relies on a
deterministic clustering algorithm, it turned out that its implementation is not
deterministic. This initially introduced inaccuracies in our empirical analysis.
Specifically, applying the original implementation of ACDC on the same source code twice
yields architectures that are usually 95% similar [17].
We traced the source of ACDC's non-determinism bug to the implementation of the
Orphan Adoption (OA) algorithm used in its implementation. OA is an incremental
clustering algorithm that ACDC employs to assign a system's implementation entities to
architectural components. The order of entities provided as input affects the result of OA,
and subsequently the architecture recovered by ACDC. In the original implementation
of ACDC, this order is not the same in every execution of the algorithm, causing the
non-deterministic output. We resolved this problem by first sorting the input to OA
based on the full package name of each class file.
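The fix can be sketched as follows. The entity representation (dicts with hypothetical
`package` and `name` fields) is invented for illustration, but the idea is the one
described above: impose a canonical order on OA's input so that clustering sees the same
sequence on every run.

```python
def canonicalize_input(class_files):
    """Sort implementation entities by fully qualified name before feeding
    them to Orphan Adoption, so the clustering result is deterministic."""
    return sorted(class_files, key=lambda c: c["package"] + "." + c["name"])
```

Two executions that enumerate the same class files in different orders now hand OA an
identical input sequence, eliminating the non-deterministic output.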
ARC
To represent system-level concerns, ARC leverages probabilistic topic modeling [20],
a family of machine-learning algorithms for determining thematic topics in text documents.
For ARC, each topic obtained using such algorithms represents a system-level concern. In
order to build the topic models needed for ARC, we have used MALLET, a machine
learning toolkit [106]. The topic-model extraction algorithms implemented by MALLET
are non-deterministic. This posed a problem when trying to meaningfully compare two
concern-based architectures as required for our study, since we needed a shared topic
model for their recovery. Therefore, for each subject system, we created a topic model
by using all available versions of the system as the input to MALLET. Another challenge
arising from using topic modeling is that there is no generally agreed-upon or objectively
computable number of topics for a given body of text [20, 106]. For each system in our
study, the number of topics was determined based on our experience with ARC from a
previous empirical evaluation [56]. We used the resulting multi-version topic model for a
system to recover the architectures for all of that system's versions.
In addition, we also computed architectural changes between a large number of pairs of
different versions of the same system, each time using topic models created from only the
two involved versions. The architectural change results yielded by the two approaches
(a single topic model for all system versions vs. a different topic model for each pair of
versions) are highly similar, with a variation of 1-2%. This supports our hypothesis that
topic models created from a large number of versions do not introduce significant noise
when recovering the architecture of a particular version.
Our ultimate goal in employing topic modeling is to extract meaning from system
code. Natural language texts (e.g., newspaper articles and books) are intended for human
readers and use vocabularies with typically well-defined semantics. Text found in
code, however, does not necessarily use identifiers or comments that are human-readable.
In fact, such code may be intentionally obfuscated to prevent human readability (e.g., to
protect intellectual property). To obtain accurate topics from software, developers should
use meaningful identifiers or write intelligible comments. To mitigate such issues,
probabilistic topic models rely upon frequencies of words in code sampled to fit a
probability distribution known to be representative of textual documents. Furthermore, it
is reasonable to expect that ARCADE would be used by an organization that owns the software
system under maintenance, precluding the need to resort to measures like obfuscation.
To reduce issues arising from noise that appears in text, we must select stop words
appropriate for the domain of software; stop words are words that carry little semantic
content and reduce the quality of obtained topics. We selected stop words from the English
language (e.g., articles like "the"), general computing, programming languages (e.g.,
keywords in Java), and individual systems (e.g., a system's name). Finally, it is also
possible in topic modeling that the resulting topics are not easily understandable by a
human unfamiliar with the software system in question. Such topics may be confusing for
new engineers still learning about the system; however, those topics may still be
meaningful for the original and/or more knowledgeable engineers.
To construct a highly accurate topic model for each subject system, we identified
frequently used yet unimportant words (e.g., words about license agreements that appear in
many source files). Those words should be ignored when constructing topic models to
prevent excessive overlap between topics. To identify those words, we designed a
refinement process involving three PhD students. Each student individually identified
words involving license agreements, meaningless variable names, and subject-system names.
From this set, the participants agreed upon a set of common words that should be ignored.
We then supplied the resulting words to MALLET as stop words, which MALLET, in turn,
ignores during topic-model construction.
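The filtering itself is straightforward; a sketch follows, with the stop-word sets
abbreviated to a few illustrative entries (the actual lists supplied to MALLET were far
larger, and MALLET applies them internally rather than via this code).

```python
# Abbreviated, illustrative stop-word sets for the software domain.
ENGLISH = {"the", "a", "of", "and", "to"}
JAVA_KEYWORDS = {"public", "static", "void", "class", "return", "int"}
PROJECT_SPECIFIC = {"hadoop", "license", "copyright"}

def filter_tokens(tokens, extra_stop_words=frozenset()):
    """Drop stop words from a token stream before topic-model construction."""
    stop = ENGLISH | JAVA_KEYWORDS | PROJECT_SPECIFIC | set(extra_stop_words)
    return [t for t in tokens if t.lower() not in stop]
```

Only semantically meaningful identifiers survive the filter and contribute to the topics.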
2.3 Software Repositories
Code Repositories
In computer software engineering, revision control, supported by a code repository, is any
kind of practice that tracks and provides control over changes to source code. Software
developers sometimes use revision control software to maintain documentation and
configuration files as well as source code. As teams design, develop, and deploy software,
it is common for multiple versions of the same software to be deployed in different sites
and for the software's developers to be working simultaneously on updates. Bugs or
features of the software are often only present in certain versions (because of the fixing
of some problems and the introduction of others as the program develops). Therefore, for
the purposes of locating and fixing bugs and issues, it is vitally important to be able to
retrieve and run different versions of the software to determine in which version(s) the
problem occurs. It may also be necessary to develop two versions of the software
concurrently: for instance, one version where bugs are fixed but no new features are added
(a branch), while the other version is where new features are worked on (the trunk).
In terms of graph theory, revisions are generally thought of as a line of development
(the trunk) with branches off this line, forming a directed tree, visualized as one or more
parallel lines of development (the "mainlines" of the branches) branching off a trunk. In
reality the structure is more complicated, forming a directed acyclic graph, but for many
purposes "tree with merges" is an adequate approximation. In the presence of merges,
the resulting graph is no longer a tree, as nodes can have multiple parents, but is instead
a rooted directed acyclic graph (DAG). The graph is acyclic since parents are always
backwards in time, and rooted because there is an oldest version. However, assuming
that there is a trunk, merges from branches can be considered as "external" to the tree:
the changes in the branch are packaged up as a patch, which is applied to HEAD (of
the trunk), creating a new revision without any explicit reference to the branch, and
preserving the tree structure. Thus, while the actual relations between versions form a
DAG, this can be considered a tree plus merges, and the trunk itself is a line. We use
this graph to obtain the list of consecutive system changes for our purposes.
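The "tree plus merges" view can be made concrete with a small sketch: given parent links
for each revision (a hypothetical history, not a real repository), roots are revisions
with no parents and merges are revisions with more than one, which is precisely what makes
the history a DAG rather than a tree.

```python
def classify_revisions(parents):
    """Classify revisions in a history graph, given as a dict mapping each
    revision to the list of its parent revisions. Returns (roots, merges):
    roots have no parents; merge commits have more than one parent."""
    roots = [r for r, ps in parents.items() if not ps]
    merges = [r for r, ps in parents.items() if len(ps) > 1]
    return roots, merges
```

A history with no merge commits is exactly a tree; each merge commit adds one extra
parent edge, turning the tree into a rooted DAG.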
Issue and Bug Trackers
An issue tracking system is a computer software package that manages and maintains
lists of issues, as needed by an organization. Issue tracking systems are commonly used
in an organization's customer support call center to create, update, and resolve reported
customer issues, or even issues reported by that organization's other employees. A support
ticket should include vital information about the account involved and the issue
encountered. An issue tracking system often also contains a knowledge base with
information on resolutions to common problems, and other such data. An issue tracking
system is similar to a "bug tracker," and often a software company will use both; some bug
trackers are capable of being used as issue tracking systems, and vice versa.
There exist several issue tracking systems, more generally known as ticket tracking
systems; Bugzilla [5] and Jira [79] are two of the most popular due to their use
in major open-source projects, such as Mozilla, Eclipse, and Hadoop. Both Bugzilla and
Jira are Web-based systems, offering two principal user interfaces: an interface to consult
the list of stored issues and an interface to post and reply to issues. These systems are
just front-ends to databases and can be queried in different ways.
Bugzilla's programming model is based on CGI-BIN and can be easily queried via HTTP
GET; Jira supports RSS and, thus, any RSS reader with an XML parser can be used
to extract data. When posting an issue, most tracking systems offer a set of fields to
label the issue, including severity; keywords; and the product, component, version,
hardware, and operating system against which the issue is filed. Neither Bugzilla nor Jira
makes a real distinction between errors, faults, bugs, and their synonyms, such as defects,
crashes, and problems. For example, all Bugzilla issues are referred to as bugs and only
the "Severity" field allows the tag "Evolution". The issue life cycle depends somewhat on
the specific issue tracking tool; however, similarities can be identified. When a new issue
is discovered, its life begins in the "New" or "Unconfirmed" status. At this stage, the
issue is assigned a unique identification number (ID) as well as some properties such as
severity, priority, the components it affects, and discovered and reported time-stamps.
Subsequently, a programmer takes the lead or is assigned the task of proposing a
fix. Fixes are often submitted as Unix patches or context diffs. Patches are associated
with issues via an attachment table linked to the issue ID. Once an issue resolution is
proposed and approved, its status reaches the final disposition of "Close". However, at
least for the projects considered in this dissertation, an issue almost never reaches the
"Close" state: once it is tagged as "Resolved", it remains forever in this state.
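As a hypothetical simplification, the life cycle described above can be captured as a
small state machine. The exact set of statuses and legal transitions varies across
trackers, so the transition table below is illustrative only.

```python
# Illustrative transition table for a simplified issue life cycle.
TRANSITIONS = {
    "Unconfirmed": {"New", "Assigned"},
    "New": {"Assigned"},
    "Assigned": {"Resolved"},
    "Resolved": {"Close", "Assigned"},  # reopened issues go back to Assigned
    "Close": set(),
}

def advance(status, new_status):
    """Move an issue to a new status, enforcing the legal transitions."""
    if new_status not in TRANSITIONS.get(status, set()):
        raise ValueError(f"illegal transition: {status} -> {new_status}")
    return new_status
```

The empirical observation above, that issues linger in "Resolved", corresponds to the
"Resolved" to "Close" transition simply never being taken.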
2.4 Automated Software Analysis
Automated design analysis and code generation using software system models is achieved
through model transformation [37, 47, 113, 169]. A model transformation maps an input
model to some output, which could be another model specied in a dierent language,
a set of analysis results, or anything else a program can produce. Model transformation
allows a single model to be used for a variety of purposes. For example, model transfor-
mations are commonly used to automatically generate, from a design model, other models
required by a specic analysis technique or implementation code for a specic run-time
platform.
In practical terms, a model transformation is usually implemented by a program called
a model interpreter (MI). An MI reads the data contained in models, manipulates that
data, and produces output for a particular semantic domain. By defining the meaning of
models in that semantic domain, an MI defines (one set of) semantics for the language.
Model execution takes place in the semantic domain, which is implemented by an external
application (e.g., an analysis tool) or run-time environment (e.g., a middleware platform).
Figure 2.2: A high-level overview of the proposed process for leveraging an MDE platform.
The figure shows an MDE platform comprising a metamodel editor and a configurable model
editor, code generator, and analysis tool, each configured (via configuration files) by a
corresponding metainterpreter for presentation, code generation, and analysis semantics,
derived from the metamodel's type definitions and applied to application models stored in
a model repository.
Model-driven engineering (MDE) [82] is a system creation and evolution paradigm
that leverages metamodeling to create DSLs and uses MIs to automate design analysis and
code generation. MDE platforms are toolsets that facilitate and automate MDE processes.
An MDE platform minimally includes a metamodel editor, a metamodel interpreter (or
metainterpreter for short), and a metaprogrammable model editor. The Generic Modeling
Environment (GME) [60] and the Eclipse EMF/GEF/GMF family of tools [45, 60] are
well-known examples of MDE platforms.
Figure 2.2 illustrates the process of using an MDE platform. Software engineers first
use the metamodel editor to create a metamodel that defines a DSL. The metamodel
specification process consists of instantiating the types available in the metalanguage
(the metatypes) and setting the values of their properties. The available metatypes vary
from one MDE platform to another. Each metatype instance captures the type definition
for a class of domain-specific model entities.
Once a metamodel has been specified, an engineer invokes the MDE platform's
metainterpreter, which takes the metamodel as input and produces a set of configuration
files or plug-ins for the metaprogrammable model editor as output. The configuration files
or plug-ins specify how to manage internal representations of domain-specific models,
enforce constraints on model well-formedness, visually render models, and modify models
based on engineer commands, according to the DSL syntax. A model editor that has been
configured for a particular DSL becomes a domain-specific model editor, and is used by
engineers to develop domain-specific application models.
Unlike model editing MIs, which are automatically generated, current MDE platforms
do not provide built-in MIs for design analysis/simulation and code generation. Instead,
they require engineers to implement these MIs by accessing an API provided by the model
editor to extract the information contained in application models and generate output
that can be directly executed within the target semantic domain [12,13,55,76]. To do so,
engineers must perform a range of time-consuming and error-prone tasks:
1. Discover the semantic relationships between the types present in the DSL and those
present in the semantic domain.
2. Verify that the DSL contains sufficient and suitable information to perform a
transformation to the semantic domain.
3. Determine the compatibility between the assumptions and constraints of the DSL
and the semantic domain, and resolve conflicts between the two.
4. Design and implement an MI that performs the model transformation with acceptable
efficiency and scalability.
5. Verify the correctness of the transformation implemented by the MI.
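A toy example of tasks 1 and 2 in the list above: a minimal MI that maps DSL types to
semantic-domain types and reports the entities the mapping cannot handle. The type and
entity names are invented for illustration; a real MI would of course do far more than a
type-to-type rename.

```python
def interpret(model, type_map):
    """Toy model interpreter: map each entity in the input model to its
    counterpart in a target semantic domain via a type-to-type mapping,
    collecting the names of entities the mapping cannot handle.
    model: list of {"name": ..., "type": ...}; type_map: {dsl_type: target_type}."""
    output, unmapped = [], []
    for entity in model:
        target_type = type_map.get(entity["type"])
        if target_type is None:
            unmapped.append(entity["name"])  # DSL lacks a transformation rule
        else:
            output.append({"name": entity["name"], "type": target_type})
    return output, unmapped
```

A non-empty `unmapped` list signals that the DSL does not contain sufficient information
for a complete transformation to the chosen semantic domain (task 2).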
This shortcoming of existing MDE platforms directly motivated the research presented
in this dissertation. As part of the preliminary work required by the research threads
described in Section 3, we developed a modeling and simulation framework (DomainPro)
that enables automatic generation of editing, analysis/simulation, and code generation
MIs (denoted by dashed rectangles in Figure 2.2) [145]. The simulation generator designed
for DomainPro is a component-based variant of the widely used discrete event simulation
paradigm. Outputs of such simulations are captured in the form of time-series objects. A
time series is a series of data points indexed (or listed or graphed) in time order. These
time-series objects need to be analyzed and compared in order to extract meaningful
insights about the software. Possible solutions and the devised approach are discussed in
Section 3.3.5.
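As a simple illustration of the comparison problem (not of the Bipartite Relative
Time-series Assessment technique itself, which Section 3.3.5 describes), two aligned time
series can be compared by a mean relative difference:

```python
def relative_difference(series_a, series_b):
    """Mean absolute difference of aligned data points, scaled by the mean
    magnitude of the first series. A deliberately simple stand-in for
    comparing two simulation-generated time series."""
    n = min(len(series_a), len(series_b))
    if n == 0:
        return 0.0
    diff = sum(abs(series_a[i] - series_b[i]) for i in range(n)) / n
    scale = sum(abs(v) for v in series_a[:n]) / n or 1.0  # avoid divide-by-zero
    return diff / scale
```

Naively applying such pairwise comparisons across tens of thousands of variants is
exactly the prohibitively inefficient assessment problem that motivates the distributed
technique described later.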
Chapter 3
Software Design Exploration Techniques
In this chapter, we present our solutions to the problems and hypotheses specied in
Chapter 1. Collectively, these techniques enable engineers to recover, realize, and explore
the design decisions that comprise a software system. Figure 3.1 depicts a high-level
schematic overview of our techniques and their roles in the software development process.
In Section 3.1, we lay out RecovAr, a technique for recovering design decisions in existing
software systems. Section 3.2 describes PredictAr and how it enables engineers to predict
the architectural significance of implementation issues. Finally, Section 3.3 explains how
eQual helps engineers explore the effects of their design decisions on the quality of a
software system.
3.1 RecovAr
As we discussed in Chapters 1 and 2, knowledge vaporization in software systems plays
a major role in increasing maintenance costs, and exacerbates architectural drift and
erosion [77]. The goal of RecovAr is to uncover architectural design decisions in software
Figure 3.1: Schematic overview of eQual, PredictAr, and RecovAr showing their
relationship and applicability at different stages of software development, from the
initial idea through design and analysis to implementation and maintenance.
systems, and thereby help reverse the course of knowledge vaporization by providing a
better understanding of such decisions and their effects.
In this section, we further elaborate on the definition and classification of design
decisions. We describe how architectural changes can be recovered from the source code
of real software systems. We also describe a process whereby architectural design decisions
are identified in real systems. Finally, RecovAr enables engineers to continuously capture
the architectural decisions in software systems during their evolution.
Section 2.1 identified two constituent parts of an architectural design decision,
rationale and consequence. The static architecture of a system explicitly captures the
system's components and possibly other architectural entities, but rationale is usually
missing or, at best, implicit in the structural elements. For this reason, our approach
focuses on the consequences of design decisions. We have developed a technique that
leverages the combination of source code and issue repositories to obtain the design
decision consequences. Issue repositories are used to keep track of bugs, development
tasks, and feature requests in a software development project. Code repositories contain
historical
Figure 3.2: Overview of RecovAr. Using the existing source code, commit logs, and
extracted issues obtained from code and issue repositories, our approach automatically
extracts the underlying design decisions. Implementation of the new components spans
over 4,000 Source Lines of Code (SLoC).
data about the inception, evolution, and incremental code changes in a software system.
Together, these repositories provide the most reliable and accurate pieces of information
about a system.
RecovAr automatically extracts the required information from a system's repositories
and outputs the set of design decisions made during the system's lifecycle. In order to
achieve this, RecovAr first recovers the static architecture of the target system. RecovAr
then cross-links the issues to their corresponding code-level changes. These links are
in turn analyzed to identify candidate architectural changes and, subsequently, their
rationales. A high-level overview of RecovAr's workflow is displayed in Figure 3.2.
RecovAr begins by recovering the static architecture of a system. This step is only
required if an up-to-date, reliable, documented architecture is not available. After
recovering or obtaining the architectures of different versions of its target system,
RecovAr follows through three distinct phases. In the first phase (Change Analysis),
RecovAr identifies how the architecture of the system has changed along its evolution
path. The second phase (Mapping) mines the system's issue repository and creates a
mapping (called an architectural impact list) from issues to the architectural entities
they have affected. Finally, the third phase (Decision Extraction) creates an overarching
decision graph by putting together the architectural changes and the architectural impact
list. This graph is in turn traversed to uncover the individual design decisions. In the
remainder of this section we detail each of the three phases.
3.1.1 Change Analysis
Architectural change has been recognized as a critical phenomenon from the very beginnings of the study of software architecture [128]. However, only recently have researchers tried to empirically measure and analyze architectural change in software systems [17,93]. These efforts rely on architectural change metrics that quantify the extent to which the
architecture of a software system changes as the system evolves. This work has served as a motivation and useful foundation in obtaining a concrete view of architectural changes. Specifically, we have designed Change Analyzer (CA), which is inspired by the manner in which existing metrics (e.g., a2a [17], MoJo [163], and MoJoFM [176]) measure architectural change. These metrics consider five operations used to transform architecture A into architecture B: addition, removal, and relocation of system entities (e.g., methods, classes, libraries) from one component to another, as well as addition and removal of components themselves [6,109,125]. We use a similar notion and define architectural change as a set of architectural deltas. An architectural delta happens at two levels:

1. Any modification to a component's internal entities, including additions and removals (relocation is treated as a combined addition to the destination component and removal from the source component).

2. Additions and removals of entire components.

We then aggregate these deltas into architectural change instances. Algorithm 1 describes the details of the approach used to extract the architectural deltas and changes.
CA works in two passes. In the first pass, CA matches the most similar components in the given pair of architectures. In the second pass, CA compares the matched components, extracts the architectural delta(s), and clusters them into architectural change instances. The objective of the matching pass is to find the most similar components in a way that minimizes the overall difference between the matched components. Since two architectures can have different numbers of components, CA first balances (Algorithm 1, line 5) the two architectures. To do so, CA adds "dummy" (i.e., empty) components to
Figure 3.3: Calculating the costs of the edges and finding the perfect matching. The bold connectors are the selected edges that lead to minimum overall cost.
the architecture with fewer components until both architectures have the same number of components. After balancing the architectures, CA creates a weighted bipartite graph from architecture A to architecture B and calculates the cost of each edge. Existence of an edge denotes that component C_A has been transformed into component C_B. The cost of an edge is the total number of architectural deltas required to effect the transformation.

Figure 3.3 displays a simple, notional example of two architectures and the corresponding bipartite graph with all possible edges. MinCostMatcher (Algorithm 1, line 11) takes the two architectures and the set of edges between them, and selects the edges in a way that ensures a bijective matching between the components of the two architectures
Algorithm 1: Change Analysis
Input: ArchitectureA, ArchitectureB
Output: Changes (a set of architectural changes)
 1  Let ComponentsA = ArchitectureA's components
 2  Let ComponentsB = ArchitectureB's components
 3  Let E_all, E_chosen = ∅
 4  if |ComponentsA| ≠ |ComponentsB| then
 5      Balance(ComponentsA, ComponentsB)
 6  foreach c_a ∈ ComponentsA do
 7      foreach c_b ∈ ComponentsB do
 8          cost = CalculateChangeCost(c_a, c_b)
 9          e = {c_a, c_b, cost}
10          add e to E_all
11  E_chosen = MinCostMatcher(ComponentsA, ComponentsB, E_all)
12  foreach e ∈ E_chosen do
13      Changes = GetChangeInstances(e.c_a, e.c_b) ∪ Changes
14  return Changes
with the minimum overall cost (sum of the costs of the selected edges). MinCostMatcher
is based on linear programming; its details are omitted for brevity.
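To illustrate the matching pass, the sketch below brute-forces all bijections between the balanced component sets and keeps the cheapest one. This is a stand-in for MinCostMatcher (which uses linear programming), feasible only for tiny architectures; the function names, the symmetric-difference cost, and the example architectures are ours, not RecovAr's.

```python
from itertools import permutations

def change_cost(comp_a, comp_b):
    """Number of architectural deltas to turn comp_a into comp_b: one removal
    per entity only in comp_a, one addition per entity only in comp_b."""
    return len(comp_a ^ comp_b)  # symmetric difference of entity sets

def min_cost_matching(arch_a, arch_b):
    """Balance the architectures with empty 'dummy' components, then pick the
    bijection between components with minimum total cost (brute force)."""
    a, b = list(arch_a), list(arch_b)
    while len(a) < len(b):
        a.append(frozenset())
    while len(b) < len(a):
        b.append(frozenset())
    best = None
    for perm in permutations(range(len(b))):
        cost = sum(change_cost(a[i], b[j]) for i, j in enumerate(perm))
        if best is None or cost < best[0]:
            best = (cost, [(a[i], b[j]) for i, j in enumerate(perm)])
    return best

# Toy architectures: each component is a set of entity ids.
arch_a = [frozenset({1, 2}), frozenset({3, 4, 5})]
arch_b = [frozenset({1, 2, 6}), frozenset({3, 4}), frozenset({5, 7})]
cost, matching = min_cost_matching(arch_a, arch_b)
```

A production implementation would replace the `permutations` loop with an assignment-problem solver (e.g., the Hungarian algorithm), which finds the same minimum-cost bijection in polynomial time.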
In the second pass, CA extracts the architectural deltas between the matched compo-
nents. If there are no common architectural entities between two matched components,
we create two change instances, one for the component that has been removed and one
for the newly added component. The reason is to distinguish between transformations of
components and their additions and removals. Figure 3.4 depicts the extracted changes
of our example architectures.
Algorithm 2: GetChangeInstances method
Input: ComponentA, ComponentB
Output: Changes
 1  Let EntitiesA = ComponentA's entities
 2  Let EntitiesB = ComponentB's entities
 3  if EntitiesA ∩ EntitiesB = ∅ then
 4      Change ch1, ch2
 5      ch1.deltas = EntitiesA
 6      ch2.deltas = EntitiesB
 7      return {ch1, ch2}
 8  else
 9      Change ch
10      ch.deltas = (EntitiesA ∪ EntitiesB) − (EntitiesA ∩ EntitiesB)
11      return {ch}
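Algorithm 2 reduces to set operations on the matched components' entity sets. A direct Python transcription (an illustrative sketch, not RecovAr's implementation; deltas are returned as sorted lists for readability) is:

```python
def get_change_instances(entities_a, entities_b):
    """Mirror of Algorithm 2: if the matched components share no entities,
    emit two separate changes (a removal and an addition); otherwise emit one
    change whose deltas are the symmetric difference of the entity sets."""
    common = entities_a & entities_b
    if not common:
        return [sorted(entities_a), sorted(entities_b)]
    return [sorted((entities_a | entities_b) - common)]
```

For example, matched components with entities {1, 2, 4} and {1, 2, 7} yield a single change with deltas [4, 7], while fully disjoint components {4} and {6} yield two separate changes.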
3.1.2 Mapping
The output of CA is a set of architectural changes that is a superset of the consequences of design decisions. The goal of Mapping is to find all the issues that point to the rationale of the design decisions that yielded those consequences. To that end, Mapping first identifies the issues that satisfy two conditions:

1. They belong to the version of the system being analyzed.

2. They have been marked as resolved and their consequent code changes have been merged with the main code base of the system.
Figure 3.4: Extracted changes between the architectures depicted in Fig. 3.3. Double-
lined diamonds indicate removals while regular diamonds denote additions.
Figure 3.5: Architectural impact list. Squares represent issues and diamonds represent entities. An edge from an issue to an entity means that resolving that issue resulted in modifying that entity.
Mapping then extracts the code-level entities affected by each issue. These code-level entities are identified by mining the issues' commit logs and pull requests. Using one or more architecture recovery methods available in ARCADE, the code-level entities are translated into corresponding architectural entities. The list of all issues, as well as the mapping between the issues and the architectural entities affected by them, is called the Architectural Impact List.

Figure 3.5 displays a graph-based view of this list. It is possible for issues to have overlapping entities (e.g., i2 and i3 are both connected to entity 5). It is also important to note that the presence of an edge from an issue to an entity does not necessarily indicate
Figure 3.6: The overarching decisions graph contains two decisions D1 and D2. Squares
denote issues, and circles denote changes.
architectural change (e.g., entities 1 and 5 are not part of any of the architectural changes in Figure 3.4). This is intuitively expected, since a great many issues do not incur substantial enough change in the source code, and thereby in the architecture of the system.
3.1.3 Decision Extraction
In its final phase, RecovAr creates the overarching decision graph by putting together the architectural changes and their pertaining issues. This graph is traversed and individual design decisions are identified. Algorithm 3 details this phase.

Algorithm 3 traverses the architectural impact list generated in the Mapping phase and the list of changes. If there is an intersection between the entities matched to issues and the entities involved in changes, then it adds an edge connecting the issue with the change. The intuition behind this is that an issue contains the rationale for a decision if it affects the change(s), i.e., the consequences, of that decision. We note that, hypothetically, there can be situations in which an issue is the cause of a change without directly affecting any architectural deltas in that change. For example, if an issue leads
Algorithm 3: Decision Extraction
Input: ArchitecturalImpactList, Changes
Output: Decisions
 1  Let DecisionsGraph = bipartite graph of decisions
 2  foreach (issue, entities) ∈ ArchitecturalImpactList do
 3      foreach c ∈ Changes do
 4          if c.deltas ∩ entities ≠ ∅ then
 5              connect(issue, c) in DecisionsGraph
 6  Decisions = FindDecisions(DecisionsGraph)
 7  return Decisions
to removing all the dependencies to an entity, that entity might get relocated out of its
containing component by the architecture recovery technique. However, detecting these
situations in a system's architecture is not possible with existing recovery techniques,
because they abstract away the dependencies among internal entities of a component.
Although such information could easily be incorporated, RecovAr would be unable to
deal with such scenarios as currently implemented.
The decisions graph for our running example is depicted in Figure 3.6. The Find-
Decisions method in Algorithm 3 removes all orphaned changes and issues, and in the
remaining graph locates the largest disconnected subgraphs. Each disconnected subgraph
represents a decision. The reason is that these disconnected subgraphs are the largest sets
of interrelated rationales and consequences that do not depend on other issues or changes.
Intuitively, we expect that, in a real-world system, only a subset of issues will impose
changes whose impact on the system can be considered architectural. Furthermore, each
38
of those issues will reflect a specific, targeted objective. Therefore, in a typical system, the graph of changes and issues should contain disconnected subgraphs of reasonable sizes. This is discussed further in our evaluation in Section 4.1.
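The traversal performed by FindDecisions can be sketched with a plain breadth-first search over the issue-change graph: connect an issue to a change when their entity sets intersect, drop orphans, and report each connected subgraph as one decision. The issue and change identifiers below are hypothetical, chosen only to produce one compound and one simple decision.

```python
from collections import defaultdict, deque

def extract_decisions(impact_list, changes):
    """Sketch of Algorithm 3 plus FindDecisions. `impact_list` maps issue ids
    to the entity sets they affected; `changes` maps change ids to their
    delta (entity) sets. Returns one node set per connected subgraph."""
    adj = defaultdict(set)
    for issue, entities in impact_list.items():
        for change, deltas in changes.items():
            if deltas & entities:           # shared entities => add an edge
                adj[issue].add(change)
                adj[change].add(issue)
    decisions, seen = [], set()
    for node in adj:                        # orphans never appear in adj
        if node in seen:
            continue
        group, queue = set(), deque([node])
        while queue:                        # breadth-first search
            n = queue.popleft()
            if n in group:
                continue
            group.add(n)
            queue.extend(adj[n] - group)
        seen |= group
        decisions.append(group)
    return decisions

# Hypothetical data: i1 is an orphan; i2/i3 share changes c1/c2; i4 maps to c3.
impact = {"i1": {1, 3}, "i2": {4, 7}, "i3": {4}, "i4": {6}}
changes = {"c1": {7}, "c2": {4, 5}, "c3": {6}}
decisions = extract_decisions(impact, changes)
```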
In Section 2.1, we identified three different types of decisions:

1. Simple decisions consist of a single change and a single issue. These decisions have a clear rationale and consequence.

2. Compound decisions include multiple issues and a single change. These decisions are similar to simple decisions, and the issues involved are closely related to an overarching rationale.

3. Cross-cutting decisions include multiple changes and one or more issues. These decisions have a higher-level, compound rationale (e.g., improving system reliability or performance) that requires multiple changes to be achieved.
For illustration, Table 3.1 lists three real examples of decisions, one of each type, uncovered from Hadoop. Information in the Issue(s) column contains the summaries of the issues pertaining to that decision. Each numbered item indicates a separate issue or change. The data in the Change(s) column are short descriptions of the changes involved in a given decision. The simple decision in the top row is an update to satisfy a requirement by changing the job tracking module. The compound decision in the middle row describes the two sides of a problem that is resolved by changing the compression module of Hadoop. Finally, the uncovered cross-cutting decision in the bottom row is about a series of changes applied to increase the reliability of Hadoop's task execution.
Simple
  Issue(s): (1) Job tracking module only kept track of the jobs executed in the past 24 hours. If an admin checked the history after a day of inactivity, e.g., on Monday, the list would be empty.
  Change(s): (1) hadoop.mapred component was modified.

Compound
  Issue(s): (1) UTF8 compressor does not handle end of line correctly. (2) Sequenced files should support custom compressors.
  Change(s): (1) CompressionInputStream was added and CompressionCodec was modified.

Cross-cutting
  Issue(s): (1) Random seeks corrupt the InputStream data. (2) Streaming must send status signals every 10 seconds. (3) Task status should include timestamp for job transitions.
  Change(s): (1) hadoop.streaming was modified. (2) hadoop.metrics component was modified. (3) hadoop.fs was modified.

Table 3.1: Examples of recovered Hadoop decisions.
Applying RecovAr continuously throughout a project's lifecycle (e.g., as can be done with testing [118,119,137]) helps preserve architectural knowledge and could encourage engineers to write architecturally-conscious issue descriptions, increasing system quality [148].
3.2 PredictAr

PredictAr's goal is to detect the implementation issues leading to (possibly unintentional) design decisions that subsequently change the system's architecture. We do so by predicting whether resolving a submitted issue would require architecturally significant
Figure 3.7: Framework for the identification of architecturally significant issues in a software project.
changes. This information could prove valuable to engineers and help them deliver higher quality code [126]. In the remainder of this section, we will describe the methodology employed to obtain the ground truth data (Section 3.2.1) and explain the devised workflow for classifying issues based on their architectural significance (Section 3.2.2).

3.2.1 Recovering Significant Issues

The first step of the PredictAr approach is creating and labeling the corpus of issues. Figure 3.7 depicts our framework for identifying a system's architecturally significant issues. The process begins by mining the set of issues from a system's issue repositories and filtering out the ones not conforming to our criteria, i.e., issues that are not "resolved" or "closed", or do not have a set of fixing commits.

For each issue, our framework automatically extracts its pertinent commit information. The commit information is used to identify the system version at which the issue has been merged with the code base and the version immediately preceding it. Finding fixing commits is not always easy since there is no standard method for engineers to keep
track of this information on issue trackers. For instance, in Jira, we found and support three popular methods:

1. Issues that directly map to fixing commits.

2. Pull requests that contain the list of commits and pertinent issues.

3. Patch files submitted with issues when the issues are fixed.

We then extract the system's source code at these two snapshots. By applying the two selected architecture recovery techniques (ACDC [164] and ARC [57]), we recover two architectural views of the system at each snapshot. Finally, the a2a [17] similarity metric is used, with the highest sensitivity, to identify any architectural discrepancies stemming from the issue and its extracted commits. Issues whose resolution has caused architectural change, as indicated by a2a, are labeled as significant.
Architecture-to-architecture (a2a) is a similarity metric we developed for assessing system-level change. a2a was inspired by the widely used MoJoFM [163] metric. MoJoFM proved to be ill-suited for our study because it assumes that the entity sets in the architectures undergoing comparison will be identical (depending on the recovery method used, entities may be classes, methods, or other building blocks of a system); this is unrealistic for systems whose versions are known to have evolved, sometimes substantially. In order to address this fundamental shortcoming, we introduce mto, a distance metric that measures the distance between two architectures with arbitrary entity sets, and then normalize it to calculate a2a.
Minimum-transform-operation (mto) is the minimum number of operations needed to
transform one architecture to another:
mto(A1, A2) = remC(A1, A2) + addC(A1, A2) + remE(A1, A2) + addE(A1, A2) + movE(A1, A2)
The five operations used to transform architecture A1 into A2 comprise additions (addE), removals (remE), and moves (movE) of implementation-level entities from one cluster (i.e., component) to another, as well as additions (addC) and removals (remC) of clusters themselves [6,109,125].
Note that each addition and removal of an implementation-level entity requires two operations: an entity is first added to the architecture and only then moved to the appropriate cluster; conversely, an entity is first moved out of its current cluster and only then removed from the architecture. Similar to Section 3.1.1, this is supported by several foundational works on architectural adaptation (e.g., [6,109,125]). The underlying intuition is as follows. If we think of the recovered architecture as a set of constituent building blocks (i.e., clusters and entities) and their configurations (i.e., arrangement of entities inside clusters), then there is a difference between (a) simply changing the architectural configuration and (b) also changing the constituent building blocks.
We normalize mto to calculate a2a, a similarity metric between two architectures with different implementation-level entities:

a2a(A1, A2) = (1 − mto(A1, A2) / (mto(A∅, A1) + mto(A∅, A2))) × 100%
where mto(A∅, Ai) is the number of operations required to transform a "null" architecture A∅ into Ai. In other words, the denominator mto(A∅, A1) + mto(A∅, A2) is the number of operations needed to construct architectures A1 and A2 from a "null" architecture.
3.2.2 Predicting Significant Issues

The main objective of our work is to classify issues based on their architectural significance. Recent studies have shown that developers who explicitly consider the impact of their code-level changes on their system's architecture deliver higher quality code [126]. This suggests that notifying engineers of the likely architectural importance of issues at the time they are submitted can result in better-informed implementation decisions. To classify the issues and enable such notification, we use the information that is readily available for newly submitted issues: title, description, priority, and type.
A classifier is a function f : R^d → Ω that assigns a label from a finite set of classes Ω = {ω1, ..., ωq} to observations x ∈ R^d. In this dissertation we are interested in the family of binary classifiers, where there are only two classes and thus Ω contains only two symbols: ω = {0, 1} or ω = {significant, non-significant}. Three families of machine learning techniques are available to build a classifier: unsupervised learning, supervised learning, and reinforcement learning [144]. Unsupervised learning, for example clustering algorithms, classifies available data based on some fitness or cost function, often a distance or similarity. Supervised learning, e.g., Naive Bayes, assumes that a training set of labeled data is available. A classifier is then built by maximizing some gain or minimizing a cost function representative of the accuracy of the classifier with respect to the a priori classification. In reinforcement learning, a user is required to decide if the classification for the current piece of data is correct; the classifier then incrementally learns a classification function. This latter family of techniques is not well suited for offline classification but has been successfully applied in traceability recovery [69].
Supervised learning techniques, in particular algorithms such as Bayesian classifiers or logistic regression, produce classifiers that are more easily interpretable but that require a labeled corpus. A labeled corpus is a set of pairs (observation, label) assumed to be random variables (X, Y) drawn from a fixed but unknown probability distribution. The objective of the learning techniques is to find a classifier f with a low error probability P[f(X) ≠ Y]. Both the selection and the evaluation of f must be based on some data set D_n containing n labeled pieces of data, because the data distribution is unknown. Therefore, D_n is usually split into two parts: the training sample D_m and the test sample D_{n−m}. A learning algorithm is a method that takes the training sample D_m as input and outputs a classifier f(x; D_m) = f_m(x). We use the process introduced in Section 3.2.1 to obtain our training corpus. A common learning method chooses a function f_m from a function class that minimizes the training error:
L(f, D_m) = (1/m) Σ_{i=1}^{m} I{f(x_i) ≠ y_i}
where I_A is the indicator function of event A, i.e., it returns 1 if event A occurs and 0 if it does not. To evaluate the chosen function, the error probability P[f(X) ≠ Y] is estimated by the test error L(f, D_{n−m}).
Previous studies have shown that when training models on primarily textual data with on the order of several thousand data points, a classifier with high bias performs well. Theoretical and empirical results suggest that Naive Bayes [54] does well in such circumstances [52,121], and we adopt it for our classification. A Bayesian classifier is a simple classification technique that classifies a d-dimensional observation x_i by determining its most probable class ω, computed as:
ω = argmax_{ω_k} p(ω_k | a_1, ..., a_d)
where ω_k ranges over the set of classes in Ω, and the observation x_i is written as a generic attribute vector. By using the rule of Bayes, the probability p(ω_k | a_1, ..., a_d), called the a posteriori probability, is rewritten as:
p(ω_k | a_1, ..., a_d) = p(a_1, ..., a_d | ω_k) p(ω_k) / Σ_{h=1}^{q} p(a_1, ..., a_d | ω_h) p(ω_h)
The classifier structure is drastically simplified under the assumption that, given a class ω_k, all attributes are conditionally independent. Under this assumption, the following common form of the a posteriori probability is obtained:
p(ω_k | a_1, ..., a_d) = [ ∏_{j=1}^{d} p(a_j | ω_k) ] p(ω_k) / Σ_{h=1}^{q} [ ∏_{j=1}^{d} p(a_j | ω_h) ] p(ω_h)
When the independence assumption is made, the classifier is called a naive Bayes classifier. The marginal probability p(ω_k) (or base probability) is the probability that a member of class ω_k will be observed. The prior conditional probability p(a_j | ω_k) is the probability that the j-th attribute assumes a particular value a_j given the class ω_k. These two prior probabilities determine the structure of the naive Bayes classifier. They are learned, i.e., estimated, on a training set when building the classifier.
In order to apply Naive Bayes to classify issues into significant and non-significant, we must extract and select the appropriate features. For our study, features are expected to satisfy two principal criteria:

1. Features should be salient, i.e., important and meaningful with respect to the problem domain.

2. Features should be discriminatory, i.e., the selected features need to bear enough information to successfully distinguish different classes of the data at hand.
Before fitting the model and applying Naive Bayes for training, we need to think about the best methods to represent the textual and non-textual parts of the implementation issues. A commonly used model in natural language processing is the bag of words model [171]. The idea behind this model is to create the vocabulary, i.e., the dictionary of all the words that occur in the training set. Subsequently, each word in that dictionary is associated with the number of its occurrences. For illustration, let I1 and I2 be two simplified issues in our training dataset (for brevity, we have removed their titles and issue types):

I1: {description: This is counter-productive., priority: Critical}
I2: {description: This change improves performance., priority: Minor}
Based on the descriptions of these two issues, the vocabulary can be written as:

V = {this: 2, is: 1, counter-productive: 1, change: 1, improves: 1, performance: 1}
This vocabulary can be used to construct a d-dimensional feature vector for each issue. The dimensionality is equal to the number of different words in the vocabulary (d = |V|). This process is called vectorization. Table 3.2 depicts the bag of words representation of our illustrative issues. In our example, we simply tokenized the issue sentences into words by splitting them on white space. Tokenization describes the general process of breaking down a text corpus into individual elements that serve as input for various natural language processing algorithms. There are several ways to tokenize each string. Usually, tokenization is accompanied by other optional processing steps, such as the removal of stop words and punctuation characters, and stemming or lemmatizing [131]. In our technique, we remove stop words and apply stemming after tokenizing the issues' textual contents.
Stop words are words that are rather common in a text corpus and thus considered uninformative, e.g., words such as "so", "and", or "the". One approach to stop word removal is to search against a language-specific stop word dictionary. An alternative approach is to create a stop list by sorting all words in the entire text corpus by frequency. The stop list, after conversion into a set of non-redundant words, is then used to remove from the input documents all those words that are ranked among the top n words in this stop list. We use the English-language stop words supplied by the widely adopted Natural Language Toolkit (NLTK) [19].
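The tokenization, stop-word removal, and vectorization steps can be sketched as follows. This is a simplified stand-in for the NLTK-based pipeline: a tiny hardcoded stop list replaces NLTK's, stemming is omitted, and all function names are ours. Note that, because stop words are removed here, the resulting vectors differ from the raw counts shown in Table 3.2.

```python
import re
from collections import Counter

STOP_WORDS = {"this", "is", "the", "and", "so", "a", "an"}  # tiny stand-in
                                                            # for NLTK's list

def tokenize(text):
    """Lowercase, extract word tokens (keeping hyphenated compounds such as
    'counter-productive'), and drop stop words."""
    tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

def vectorize(issues):
    """Build the shared vocabulary and one raw-frequency vector per issue."""
    counts = [Counter(tokenize(text)) for text in issues]
    vocab = sorted(set().union(*counts))
    return vocab, [[c[w] for w in vocab] for c in counts]

vocab, vectors = vectorize(["This is counter-productive.",
                            "This change improves performance."])
```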
     this   is   counter-productive   change   improves   performance
I1    1     1            1              0         0            0
I2    1     0            0              2         1            1

Table 3.2: Bag of words representation of two issue sentences I1 and I2.
     Critical   Major   Minor
I1      1         0       0
I2      0         0       1

Table 3.3: One-hot encoding of I1 and I2.
Stemming describes the process of transforming a word into its root form. The original stemming algorithm was developed by Martin F. Porter and is hence known as the Porter stemmer [131]. Stemming can create non-real words; for example, "thus" is converted to "thu". In contrast to stemming, lemmatization aims to obtain the canonical (grammatically correct) forms of the words, the so-called lemmas. Lemmatization is computationally more expensive than stemming. In practice, however, the choice of stemming or lemmatization is effectively inconsequential for the performance of the text classifier [160].

After tokenizing the textual parts of the issues, we append the non-textual features to the resulting vector. To do so, we one-hot encode the priority and type of the issues. A one-hot encoding is a representation of categorical variables as binary vectors. For instance, Table 3.3 depicts the one-hot encoding of the issue priorities for I1 and I2. This encoding indicates that I1 has critical priority and I2 is an issue labeled with a priority value of minor. Similarly, we append the feature vectors of issue title and issue type to create the complete feature vector of an issue.
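One-hot encoding reduces to comparing the value against each known category. The helper below is an illustrative sketch (it assumes the value is always one of the known categories, as is the case for Jira priorities):

```python
def one_hot(value, categories):
    """Encode a categorical value as a binary vector with a single 1 in the
    position of the matching category."""
    return [1 if value == c else 0 for c in categories]

# The priority levels from Table 3.3.
PRIORITIES = ["Critical", "Major", "Minor"]
features_i1 = one_hot("Critical", PRIORITIES)
features_i2 = one_hot("Minor", PRIORITIES)
```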
In the context of issue classification, the decision rule of a naive Bayes classifier based on the posterior probabilities can be expressed as:

if P(ω = significant | I) ≥ P(ω = non-significant | I) ⇒ classify I as significant

As we described earlier, the posterior probability is proportional to the product of the class-conditional probability and the prior probability:

P(ω = significant | I) ∝ P(I | ω = significant) P(significant)
P(ω = non-significant | I) ∝ P(I | ω = non-significant) P(non-significant)
The prior probabilities can be obtained via the maximum-likelihood estimate based on the frequencies of significant and non-significant issues in the training dataset:

P̂(ω = significant) = (# of significant issues) / (# of all issues)
P̂(ω = non-significant) = (# of non-significant issues) / (# of all issues)
Assuming that the words in every issue are conditionally independent (per the naive assumption), two different models can be used to compute the class-conditional probabilities: the multi-variate Bernoulli model and the multinomial model. The multi-variate Bernoulli model is based on binary data, i.e., every token in the feature vector of a document is associated with the value 1 or 0. The multi-variate Bernoulli model therefore removes potentially important information from our feature set. To alleviate this problem, we use the more versatile multinomial Naive Bayes.
In multinomial Naive Bayes, rather than binary values, we use the term frequency tf(t, I). The term frequency is typically defined as the number of times a given term t (i.e., word or token) appears in an issue I (this approach is sometimes also called raw frequency). In practice, the term frequency is often normalized by dividing the raw term frequency by the document length:
normalized term frequency = tf(t, I) / n_I

where:
- tf(t, I): the raw term frequency (the count of term t in issue I).
- n_I: the total number of terms in issue I.
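As a sketch (the helper name is ours):

```python
def normalized_tf(term, tokens):
    """tf(t, I) / n_I: raw count of `term` divided by the issue's length,
    where `tokens` is the issue's token list."""
    return tokens.count(term) / len(tokens)
```

For example, in a four-token issue containing "this" twice, the normalized term frequency of "this" is 0.5.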
The term frequencies can then be used to compute the maximum-likelihood estimate based on the training data to estimate the class-conditional probabilities in the multinomial model:

P̂(x_i | ω_j) = (Σ tf(x_i, I ∈ ω_j) + α) / (Σ N_{I ∈ ω_j} + α · V)

where:
- x_i: a word from the feature vector x of a particular sample.
- Σ tf(x_i, I ∈ ω_j): the sum of raw term frequencies of word x_i over all issues in the training sample that belong to class ω_j.
- Σ N_{I ∈ ω_j}: the sum of all term frequencies in the training dataset for class ω_j.
- α: an additive smoothing parameter; we used α = 1 (Laplace smoothing).
- V: the size of the vocabulary (the number of different words in the training set).

Figure 3.8: Workflow for building our automatic architectural significance classifier.
The class-conditional probability of encountering the feature vector X (pertaining to
issue I) can be calculated as the product of the likelihoods of the individual features
(under the naive assumption of conditional independence).
P(X | ω_j) = P(x_1 | ω_j) P(x_2 | ω_j) · · · P(x_n | ω_j) = ∏_{i=1}^{n} P(x_i | ω_j)
Figure 3.8 displays the overview of our classification-model-building process. As explained earlier, for each issue, we remove the English stop words [134]. On top of this, we remove code snippets and stack traces that are sometimes submitted alongside issue descriptions. As described earlier, we then use the Porter stemmer to reduce the inflected (or sometimes derived) words to their word stems [131]. To account for both false positives and false negatives, we quantify accuracy via standard information retrieval measures: precision and recall. The details of our evaluation, subject systems, and results are described in Section 4.2.
3.3 eQual

eQual helps engineers explore the design space for their system via four steps:

1. Modeling: creating a system's architectural model that is amenable to dynamic analysis.

2. Preparation: answering a set of questions that guides eQual's exploration.

3. Selection: generating design variants using the inputs of the modeling and preparation steps.

4. Assessment: objectively comparing variants and ranking them based on the inputs of the preparation step.

Steps 1 and 2 are interactive and generate eQual's inputs: a system's design model and the answers to a set of design-related questions. eQual uses steps 3 and 4 to produce a list of ranked system variants. Steps 3 and 4 take place iteratively: the outputs of assessment feed back into selection, helping eQual choose better alternatives and thereby generate improved system variants.

Critically, eQual's required inputs are the types of artifacts architects should already have created or have the knowledge to create reasonably easily. Modern software development typically involves modeling the system's architecture (eQual's step 1), albeit sometimes informally or even implicitly [157]. Likewise, the answers to the questions eQual
Figure 3.9: eQual's architecture. The Design Environment and Simulation Engine components are provided by DomainPro [145,
146].
asks (eQual's step 2), which result in specifications of the desired behavioral properties of a system, are concerns the architects have to consider regardless of whether they use eQual.
eQual's overall architecture, with the components performing the four steps clearly denoted, is depicted in Figure 3.9. The remainder of this section details each of eQual's four steps.
3.3.1 Running Example
We will use a model of Hadoop to describe eQual and to evaluate it (Section 4.3). In this model, a computation is the problem being solved. A task is an operation that is per-
formed on the input (e.g., map or reduce in Hadoop). Tasks can be replicated, e.g., for
reliability [25, 26]. A job is one instance of a task. Machines can execute these jobs. A
task scheduler breaks up a computation into tasks, creates jobs as copies of these tasks,
and assigns jobs to machines in the machine pool. After returning a response to the task
scheduler, each machine rejoins the pool and can be selected for a new task. Figure 3.10
depicts a partial system model captured in a DSL (described in Section 3.3.2).
Although eQual has been used to analyze Hadoop's entire design space, for simplicity of exposition, our description and evaluation in this dissertation will focus on Hadoop's variation points that affect its key non-functional properties. There are five such variation points:
1. Computation Size, the number of tasks required to complete a computation.
2. Redundancy Level, the number of machines that will run identical jobs.
Figure 3.10: Hadoop system model in a domain-specic language [25].
3. Pool Size, the number of available machines.
4. Machine Reliability, the probability of a machine returning the correct result before
a timeout.
5. Processing Power of each machine.
Table 3.4 lists representative bounds for the variation points. This choice of variation-point bounds does not in any way impact eQual, its analysis, or its comparison to competing approaches. We have opted for these bounds simply because they were
Variation Point Lower Bound Upper Bound
Computation Size 500 2000
Redundancy Level 1 5
Pool Size 10 100
Machine Reliability 0.5 0.9
Processing Power 0.5 2
Table 3.4: Hadoop variation points and representative bounds [25,26].
obtained from a previously published analysis of a volunteer computing platform that
shares a number of characteristics with Hadoop [25,26].
3.3.2 Modeling
eQual's first input is a system's architectural model that is amenable to dynamic analysis. Several existing approaches enable creating such models, including ArchStudio [40], XTEAM [46], PCM [105], and DomainPro [145]. Each of these, as well as other similar approaches, would suit our purpose. We decided to use our prior work, DomainPro, because of its simple interface, dynamic analysis capabilities that use event-driven simulation, and model-driven architecture (MDA) approach [115] that allows architects to define variation points in their models and try different alternatives (although completely manually).
As is common with MDA approaches, a system is designed in DomainPro in two phases. First, an engineer must specify or reuse a previously defined metamodel for the system. Second, the engineer designs the system by specializing and instantiating elements of this metamodel. A metamodel is a collection of design building blocks (e.g., components, interfaces, ports, services, hosts, resources, etc.) that are relevant to modeling systems of certain kinds or in certain domains. For example, Android-based systems are most naturally modeled using concepts such as activities, fragments, and content providers, while certain Web-based systems are best modeled using services.
DomainPro provides a built-in, generic metamodel for component-based architectures [110]. We employ this metamodel in eQual's design, implementation, and evaluation. While more targeted metamodels for different classes of systems (e.g., highly distributed, Web-based, mobile, IoT, etc.) are likely to increase the effectiveness of eQual's subsequent steps, this would result in additional burden on architects and would make direct comparison to techniques, such as GuideArch, that are not MDA-based more difficult (details in Section 4.3).
The over-arching goal of DomainPro is to simplify and automate the development of software systems. It is intended for software architecture-based modeling, analysis, and code generation. We focus specifically on leveraging DSLs to automatically generate a range of tools that will be used to manipulate (i.e., "interpret") the application-specific software models captured in the DSL. These tools support engineers in automating:
1. Model editing.
2. Model analysis and simulation.
3. System implementation.
DomainPro aims to maximize flexibility in both presentation and analysis. The drawing environment enables engineers to attach semantics to different parts of the drawing they have in mind. It does not restrict engineers to adhere to a certain notation or formalization.
Automated design analysis and code generation is achieved through model transformation. In practical terms, a model transformation is usually implemented by a program called a model interpreter (MI). An MI reads the data contained in models, manipulates that data, and produces output for a particular semantic domain. The semantic domain may be implemented by an external application (e.g., an analysis tool) or run-time environment (e.g., a middleware platform). The remainder of this section first describes the process of using DomainPro, then details its metamodeling facilities, model interpretation components, and domain-specific model analysis using the provided simulation MI.
Software engineers first use DomainPro's metamodel editor to create a metamodel that defines a DSL. The metamodel specification process consists of instantiating the types available in the metalanguage (the metatypes) and setting the values of their properties. DomainPro first invokes the appropriate metainterpreter, which produces a model interpreter framework (MIF) extension (a set of C# plug-in classes) by deriving the simulation or implementation semantics of the DSL types in the metamodel. DomainPro then compiles the provided MIF (also implemented in C#) with the extension. The output of the compilation is a domain-specific model editor, simulation generator, and C# (.Net Framework) code generator, already configured with the DSL's custom semantics.
DomainPro is implemented using the Microsoft .Net Framework. It contains four sub-projects: Core, Language Builder, Designer, and Analyst. Language Builder is the metainterpreter, Designer is the editor MI, and Analyst is the simulation MI. By separating the
core types and putting them in the Core project, we facilitate the isomorphic implementation of different MIs. The software development process supported by DomainPro is displayed in Figure 3.11.
Figure 3.11: Software development process supported by DomainPro.
Metamodeling
Metalanguages are generally centered around a small set of basic metatypes, which we refer to as the core metatypes. The core metatypes are derived from basic information representation paradigms, such as the object-oriented or the relational data model. DomainPro targets software architecture-based modeling, analysis, and code generation. Therefore, the metalanguage should be sufficiently flexible to capture the range of abstractions and patterns commonly used for modeling software and systems architectures. Based on our literature review we identified the following: structure, component, resource, interface, link, implementation, method, and datatype. Once a set of core metatypes has been chosen, the next task is to attach partial semantics to each metatype. Crucially, the semantics of metatypes should only incorporate assumptions shared among a broad family of DSLs. Semantics that vary from one application context to another are left for engineers to specify using properties, as described below. Note that semantic assumptions are incorporated directly into DomainPro's interpretation components, simplifying their implementations and increasing the scalability and efficiency of the MDE platform.
The use of semantic assumptions results in a trade-off between the ability to synthesize supporting toolsets and DSL flexibility. For example, current MDE platforms can synthesize model editors because their metatype semantics include visualization, presentation, and editing concerns, but they lack the semantics necessary to synthesize other types of tools. One approach for arriving at a set of semantic assumptions is co-refinement [71]. Co-refinement begins with an initial candidate set of metalanguage semantics (e.g., the semantics reflecting the well-understood constructs and abstractions underlying architecture-based software development [158]) and target platform semantics (e.g., the semantics of a programming language or middleware platform selected to implement the architectures). Expanding the metalanguage semantics strengthens restrictions on DSL definition, but weakens restrictions on automated model interpreter generation; expanding the semantics of the target platform (if possible) has the inverse result. Thus, through an iterative process, the semantics of each can be brought into
alignment. In addition to embedded semantic assumptions, DomainPro attaches semantics to metatypes through properties. Properties are typed attributes and associations with other metatypes. In contrast to semantic assumptions, metatype properties are used to capture domain-specific semantics that vary from one context to another. This use of properties has the advantage that metamodel developers are not required to write intricate formal semantic specifications using a complex notation such as Structured Operational Semantics [129]. Instead, software engineers can configure semantics using menus and dialogs, and automatically check properties for consistency. For example, the rendering of domain-specific types within the model editors of today's MDE platforms does not need to be specified because their rendering semantics are encoded within the properties of their metatypes. The flip side, in this case, is that the semantics that can be captured are restricted by the MDE platform's developer.
Interpretation
In contrast to the canonical architecture, each interpretation capability in DomainPro is
realized through a paired metainterpreter and its associated model interpreter framework
(MIF), which is the template for constructing the desired collection of editing, analysis,
and code generation model interpreters (MIs).
Notionally, a MIF can be thought of as a virtual machine whose instruction set is the
set of implemented transformation operations supplied by the metainterpreter, which is,
in turn, akin to a compiler whose function is to generate programs to be executed by the
MIF virtual machine. Our implementation of a MIF is like an application framework.
This ensures that the MIF achieves requisite performance and scalability by internalizing
program control logic and invoking application extensions, rather than being passively invoked by application control logic (inversion of control). Moreover, an application framework strictly controls the ways in which its behavior is modified, thereby ensuring that assumptions (including embedded semantic assumptions) are not violated.
The editing components of DomainPro operate similarly to those of existing MDE platforms [59]. However, the other interpretation components of our approach are not present in existing MDE platforms. DomainPro's metainterpreter derives the semantics of domain-specific types from their metatype property definitions and determines a set of rules for transforming each domain-specific type to the target semantic domain: software architecture-based development. The metainterpreter encodes the transformation rules in an MIF extension. The resulting MIF implements the actual transformation logic (e.g., operations and algorithms) for each MI.
Simulation
As described, DomainPro automatically generates fully configured, domain-specific interpreters that today have to be programmed manually. One type of interpreter generated by DomainPro is the simulation generator. The simulation generator designed for DomainPro is a component-based variant of the widely used discrete-event simulation paradigm [184]. DomainPro's simulation generators consist of the simulation MIF (built into DomainPro) and domain-specific simulation MIF extension code (autogenerated by DomainPro). Engineers specify a list of watched types to capture the behaviors of the system. Depending on the metatype of each watched type, DomainPro attaches a fitting listener to that type that monitors and records the changes of its relevant attributes during the simulation time. Both of these are extensible: engineers can fine-tune these attributes and define custom listeners to monitor other types if they need to. Out of the box, DomainPro provides listeners for the following types (monitored attributes are listed in parentheses): Data (value), Method (number of invocations, invocation intervals, execution time, blocking time), Resource (idle capacity, queue length), and Component (number of blocking methods, number of executing methods).
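The watched-type mechanism can be illustrated with a toy discrete-event loop. The sketch below is not DomainPro's API (which is C#-based); it is a minimal Python analogue in which a listener records a Resource-like machine pool's idle capacity as a time-series. All class and function names here are hypothetical.

```python
import heapq

class Listener:
    """Records (time, value) samples of a watched attribute as a time-series."""
    def __init__(self):
        self.series = []
    def record(self, time, value):
        self.series.append((time, value))

class MachinePool:
    """A Resource-like pool whose idle capacity is monitored; jobs are assumed
    never to exhaust the pool in this toy setup."""
    def __init__(self, size, listener):
        self.idle = size
        self.listener = listener
    def acquire(self, now):
        self.idle -= 1
        self.listener.record(now, self.idle)
    def release(self, now):
        self.idle += 1
        self.listener.record(now, self.idle)

def simulate(num_jobs, pool_size, job_duration=5.0):
    listener = Listener()
    pool = MachinePool(pool_size, listener)
    # Event queue of (time, kind) pairs; jobs arrive one time unit apart.
    events = [(float(i), "start") for i in range(num_jobs)]
    heapq.heapify(events)
    while events:
        now, kind = heapq.heappop(events)
        if kind == "start":
            pool.acquire(now)
            heapq.heappush(events, (now + job_duration, "finish"))
        else:
            pool.release(now)
    return listener.series
```

The state is sampled only at event occurrences, so the output is exactly the kind of time-series object that eQual's assessment step consumes.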
The Hadoop model in Figure 3.10 uses DomainPro's generic metamodel by specializing and instantiating the metamodel elements [145]. Figure 3.10 represents Hadoop's model using the resulting visual DSL: a Computation is a DomainPro Operation depicted as a circle; each activity in the Task Scheduler Component is a DomainPro Task depicted as an oval; Machine Pool is a DomainPro Resource depicted as the shape containing the filled-in circles; data-flows are DomainPro Links represented with wide arrows; and so on. Figure 3.10 omits the DomainPro Parameters for each modeling element for clarity; Table 3.4 lists the key parameters we reference in this dissertation.
3.3.3 Preparation
eQual's second input consists of answers to a set of questions that fall into two categories: the system's (1) variation points and (2) properties of interest (e.g., scalability and dependability). eQual formulates these questions in terms of the system's design, presenting specific choices that are intended to be straightforward for architects.
A. Questions Regarding Variation Points
For each variation point V, eQual asks architects three questions:
1. What is V's lower bound?
2. What is V's upper bound?
3. What is the desired function for exploring V?
The lower and upper bounds capture the acceptable range of alternatives for each variation point. Exploration functions enable architects to customize how eQual samples the specified range in the process of design exploration (as detailed in Sections 3.3.4 and 3.3.5). For example, in our model of Hadoop, the Pool Size variation point has a lower bound of 10 and an upper bound of 100 (recall Table 3.4). The current prototype of eQual provides 12 exploration functions: Uniform, Poisson, Gamma, Exponential, etc. eQual also allows architects to provide lists of concrete values instead of ranges.
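Two of the exploration-function shapes named above can be sketched as samplers over a variation point's declared range. The function and variable names here are illustrative, not eQual's API, and the exponential sampler's rate parameter is an assumption made for the sketch.

```python
import random

def uniform(lo, hi, rng):
    # Sample the range evenly.
    return rng.uniform(lo, hi)

def exponential(lo, hi, rng):
    # Skew samples toward the lower bound; clamp into the declared range.
    # The mean (hi - lo) / 4 is an arbitrary illustrative choice.
    return min(hi, lo + rng.expovariate(4.0 / (hi - lo)))

def sample_variant(bounds, explore, rng):
    """Pick one alternative per variation point via its exploration function."""
    return {name: explore[name](lo, hi, rng) for name, (lo, hi) in bounds.items()}

# Bounds for two of Hadoop's variation points, from Table 3.4.
bounds = {"pool_size": (10, 100), "machine_reliability": (0.5, 0.9)}
explore = {"pool_size": uniform, "machine_reliability": exponential}
variant = sample_variant(bounds, explore, random.Random(42))
```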
As shown by a prior analysis [147], well over 100 design decisions were made during the development of Hadoop. Each design decision involved selecting an alternative for a variation point. Hadoop's architects had considered as few as 2 and as many as 8 alternatives per variation point. Even the 68 minor Hadoop revisions analyzed in [147] introduced up to 4 new design decisions each. Exploring the effects of just those newly introduced design decisions can be non-trivial. While manually exploring the resulting system variants by considering 1-2 alternatives per decision might be feasible, the decision space grows exponentially with the number of alternatives and quickly eclipses human abilities. For example, a Hadoop version that involves merely 4 design decisions and 5 alternatives per decision will have more than 500 variants. By contrast, the burden eQual places on architects is to answer 4 × 3 = 12 questions about the involved variation points.
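The contrast can be made concrete with a line of arithmetic:

```python
# A variant exists for every combination of alternatives, so the space grows
# exponentially; the questions eQual asks grow only linearly.
decisions, alternatives = 4, 5
variants = alternatives ** decisions   # 5^4 = 625 candidate system variants
questions = 3 * decisions              # 12 variation-point questions
```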
We do not expect architects to be able to answer the above questions right away. For example, an architect may not be sure what the lower bound for the size of the node pool or the upper bound for the redundancy level should be. Instead, eQual allows several possibilities:
1. Architects may know the exact answer to a question, e.g., based on the requirements or domain knowledge.
2. Architects may be able to provide a partial answer, such as a property's lower bound.
3. Architects may be unable to answer the questions, leaving the range of alternatives unbounded. This allows architects to explore properties for which they have different degrees of knowledge.
B. Questions Regarding Non-Functional Properties
eQual's second set of questions deals with the system's desired non-functional properties. These properties are the basis for assessing design alternatives (as discussed in Section 3.3.5). The non-functional properties are determined from the system requirements and the characteristics of the domain. For example, in Hadoop, prior work identified four properties [25, 26]:
1. Reliability: The ratio of tasks that have to be restarted to all tasks.
2. Machine Utilization: The percentage of machines that are being utilized at any point in time.
3. Execution Time: The amount of time it takes a given configuration to process the given task.
4. Cost: The total number of executed jobs, which can be used as a proxy to estimate the cost of the system.
Each property has to be tied to an aspect of the output of the system's dynamic analysis. In DomainPro and other approaches that use discrete-event simulation for dynamic analysis of system models (e.g., IBM's Rational Rhapsody [75]), the state of a system is captured at the times of event occurrences. Hence, the output is a set of time-series objects regarding different aspects of the simulated system.
For each non-functional property P, eQual asks three questions:
1. What time-series object is of interest?
2. Is P directly or inversely related to overall system quality?
3. What is P's importance coefficient?
For example, in the case of Hadoop's Machine Utilization property, the relevant time-series object is the one capturing the idle capacity of the machine pool in the Hadoop model (recall Figure 3.10); the direction of the relationship is inverse (lower idle capacity means higher utilization); and the importance coefficient may be set to 3 (an ordinal value that would treat Machine Utilization as more important than, e.g., Cost, whose coefficient is 1, and less important than, e.g., Reliability, whose coefficient is 5). Thus, for the above-discussed example of a newly introduced Hadoop minor version with 4 variation points, given Hadoop's 4 properties of interest, an architect using eQual would have to answer a total of 24 questions: 12 questions for the variation points and 12 for the properties.
eQual's objective is to elaborate the information architects already must take into account, without further burdening them. In current practice, architects will often ignore, accidentally omit, indirectly consider, or incorrectly record the information captured by these questions. The many well-known software project failures provide ample evidence of this [157]. By consolidating the questions into one place and a standard format, eQual aims to convert this frequently haphazard process into a methodical design process. As architects explore the design alternatives and gain a better understanding of the system, they can go back and add, remove, or change their answers.
3.3.4 Selection
The system's design model (Section 3.3.2) and the answers pertaining to the system's variation points and properties (Section 3.3.3) are the inputs to the selection step, whose objective is to explore the space of design variants intelligently and to make it tractable. For example, in Hadoop, this can help the engineer explore the effects of non-trivial decisions, such as: What yields better system reliability at an acceptable cost, a greater number of less-reliable machines or fewer more-reliable machines? How will this choice affect system performance?
Selection begins by generating an initial set of variants, i.e., by making an initial selection of alternatives for the system variation points using the information provided by the architects during preparation (Section 3.3.3). We call this initial set the seed variants and the process eQual uses to pick the seed (and later) variants the selection strategy. The seed variants feed into assessment (Section 3.3.5), where eQual comparatively analyzes the variants. Assessment feeds its ranking of the variants back to selection (recall Figure 3.9), which in turn uses this information to generate an improved set of variants during the next iteration.
The key factors that determine the effectiveness of the selection process are the manners in which (1) the seed variants are generated and (2) the information supplied by the assessment step is used to generate subsequent variants. In principle, eQual allows any such selection strategy. The objective behind allowing different strategies is to let an architect control the selection step's number of iterations and generated variants to fit her needs, specific context, and available computational resources.
In order to generate the variants, we use a genetic algorithm. A genetic algorithm (GA) is a metaheuristic inspired by the process of natural selection that belongs to the larger class of evolutionary algorithms (EA). Genetic algorithms are commonly used to generate high-quality solutions to optimization and search problems by relying on bio-inspired operators such as mutation, crossover, and selection [116]. In a genetic algorithm, a population of candidate solutions (called individuals, creatures, or phenotypes) to an optimization problem is evolved toward better solutions. Each candidate solution has a set of properties (its chromosomes or genotype) which can be mutated and altered; traditionally, solutions are represented in binary as strings of 0s and 1s, but other encodings are also possible [2].
The evolution usually starts from a population of randomly generated individuals and is an iterative process, with the population in each iteration called a generation. In each generation, the fitness of every individual in the population is evaluated; the fitness is usually the value of the objective function in the optimization problem being solved. The more fit individuals are stochastically selected from the current population, and each individual's genome is modified (recombined and possibly randomly mutated) to form a new generation. The new generation of candidate solutions is then used in the next iteration of the algorithm. Commonly, the algorithm terminates when either a maximum number of generations has been produced or a satisfactory fitness level has been reached for the population.
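The loop just described can be captured in a minimal GA over real-valued genomes. This is a sketch of the generic paradigm, not eQual's implementation; the parameter values and operator choices (one-point crossover, uniform-resampling mutation, top-half selection) are illustrative assumptions.

```python
import random

def genetic_search(fitness, bounds, pop_size=20, generations=30, mut_rate=0.2, seed=0):
    """Evolve a population of genomes (one gene per bounded decision) toward
    higher fitness; returns the best genome in the final generation."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=fitness, reverse=True)
        parents = scored[: pop_size // 2]            # fitness-based selection
        children = []
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, dim)              # one-point crossover
            child = a[:cut] + b[cut:]
            for i, (lo, hi) in enumerate(bounds):    # per-gene mutation
                if rng.random() < mut_rate:
                    child[i] = rng.uniform(lo, hi)
            children.append(child)
        pop = children
    return max(pop, key=fitness)

# Maximize a toy fitness with a known optimum at (3, 5).
best = genetic_search(lambda g: -((g[0] - 3) ** 2 + (g[1] - 5) ** 2),
                      bounds=[(0, 10), (0, 10)])
```

In eQual's setting, each gene corresponds to a variation point and the fitness is the simulation-derived quality computed in the assessment step.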
In our prototype implementation, we have explored two selection strategies based on
the genetic evolutionary-algorithm paradigm: random seeding and edge-case seeding. In
random seeding, we choose the seed variants completely randomly. In edge-case seeding,
we aim to generate as many variants as possible containing either side of the boundary
conditions that have been provided to eQual. For example, in Hadoop one variant would
be generated by selecting all lower-bound values from Table 3.4 (500, 1, 10, 0.5, 0.5),
another by selecting all upper-bound values (2000, 5, 100, 0.9, 2), a third by selecting
upper-bound values for the top three variation points and lower-bound values for the
remaining two variation points (2000, 5, 100, 0.5, 0.5), and so on. Note that edge-case
seeding is not possible in cases where options are nominal and do not have binary or
numerical values assigned to them.
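Edge-case seeding over the bounds of Table 3.4 amounts to enumerating every lower/upper-bound combination, which a Cartesian product expresses directly (a sketch; variable names illustrative):

```python
from itertools import product

# (lower, upper) bounds per variation point from Table 3.4, in the order:
# Computation Size, Redundancy Level, Pool Size, Machine Reliability, Power.
bounds = [(500, 2000), (1, 5), (10, 100), (0.5, 0.9), (0.5, 2)]

# One seed variant per combination of lower/upper choices: 2^5 = 32 variants.
seed_variants = list(product(*bounds))
```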
Both strategies are able to quickly prune the space of variants and arrive at good
candidate designs. However, our empirical evaluation (discussed in Section 4.3) indicates
Figure 3.12: Reliability time-series of a Hadoop variant that uses machines with 90%
reliability and no redundancy.
Figure 3.13: Reliability time-series of a Hadoop variant that uses machines with 50%
reliability and redundancy factor of 2.
that the edge-case heuristic is more efficient and arrives at better solutions. We aim to preserve Pareto-optimal [29] solutions at each step since this provides an intuitive way to explore the extreme effects of decisions, giving architects insight into the different system aspects and their inter-relationships.
3.3.5 Assessment
To assess a variant's quality, eQual dynamically analyzes it via simulation. We have opted for simulation-based analysis because simulations are representative of a system's real behavior due to their inherent nondeterminism [81]. eQual specifically relies on discrete-event simulations, generating outputs in the form of time-series objects. Comparing different variants thus requires an analysis of their simulation-generated time-series. Although there are dozens of similarity metrics, in most domains (e.g., robotics, speech recognition, software engineering) Dynamic Time Warping (DTW) has been shown to perform better than the alternatives [43]. We thus use DTW.
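DTW itself is compact; a standard dynamic-programming implementation (a generic sketch, not eQual's code) is:

```python
def dtw(t, s, cost=lambda a, b: abs(a - b)):
    """Dynamic Time Warping distance between sequences t and s."""
    n, m = len(t), len(s)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c = cost(t[i - 1], s[j - 1])
            # Extend the cheapest of the three admissible alignments.
            D[i][j] = c + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]
```

Unlike a pointwise Euclidean comparison, DTW tolerates local stretching of the time axis, which matters when two simulations produce events at slightly different times.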
For each design variant, eQual generates a single time-series object for each non-functional property. For Hadoop, that means four time-series per variant, corresponding to the system's (1) Reliability, (2) Execution Time, (3) Machine Utilization, and (4) Cost. Each data point in a time-series corresponds to the computed value for the given property at the given time.
Depending on the direction of the relationship of a property with the overall system quality, we aim to find the variant that has yielded the time-series with the highest (direct relationship, e.g., for Reliability) or lowest (inverse relationship, e.g., for Cost) values for that property. To this end, we need to compare each time-series with the optimum time-series. The optimum time-series for a given non-functional property is a constant time-series each of whose data points is equal to the highest (or lowest) value of the property achieved across all simulations. This comparison requires having access to all of the simulation-generated data in one place, and computing the global optimum and distances from it. In turn, this may entail transferring hundreds of megabytes of data per variant, and having to redo all of the calculations in case a new variant is added that changes the optimum time-series. Such a solution would cause prohibitive performance and scalability problems in scenarios with multiple iterations involving thousands of variants.
BRTA
To address this problem, we devised the Bipartite Relative Time-series Assessment technique (BRTA). BRTA enables distribution of the time-series analysis and eliminates the need to transfer all simulation-generated data to a central node. Instead, as indicated in Figure 3.9, multiple nodes are tasked with assessing different subsets of variants via simulation; a node may be responsible for as few as one variant. Each node behaves in the manner described above: it performs a discrete-event simulation of the design variants with which it is tasked, computes an optimum time-series, and uses DTW to compare the individual time-series with the optimum. Note that the optimum time-series is now a local optimum since the other nodes will perform the same tasks on their design variants. In addition, for each node, BRTA calculates the range (minimum-to-maximum) for the time-series computed locally, as well as the normalized distance (distance divided by the number of points in the time-series). For example, Figures 3.12 and 3.13 show the time-series captured by eQual for the Reliability of two of Hadoop's variants. The Figure 3.12 variant's normalized distance from its local maximum is 0.04, and the Figure 3.13 variant's is 0.35.
Figure 3.14: eQual's visualization of two candidate Hadoop variants showing their respective values for the four properties of interest.
Instead of returning all simulation-generated data to the assessment node (recall Figure 3.9), BRTA only sends a summary containing the above measurements. The global assessment algorithm gathers these summaries and calculates the distance to the global optimum time-series for each time-series using the following formula:
$$D_g = \begin{cases} (Max_g - O_l) + D_l & \text{if direct} \\ (O_l - Min_g) + D_l & \text{if inverse} \end{cases}$$
$D_g$ is the distance to the global optimum; $O_l$ is the local optimum; $D_l$ is the distance to the local optimum; $Max_g$ ($Min_g$) is the global max (min) value among all time-series of a non-functional property.
The updated summaries now include the globally normalized values for each time-series in each variant and are used to rank the variants. To use the $D_g$ values of different non-functional properties to calculate the overall utility of a design variant, we linearly rescale them to a value between 0 and 1. The overall quality of the system, then, is the average, weighted by the importance coefficients provided by the architects (recall Section 3.3.3), of all of these values. In cases when multiple variants have comparable qualities, eQual also allows architects to visually compare them. Figure 3.14 shows an example of such a visualization: the overlapping radar diagrams provide architects with details that are lost in the single system-quality numerical values.
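Assuming each node has already reported its local optimum $O_l$ and local distance $D_l$ per property, the global assessment and weighted ranking can be sketched as follows. The function names are hypothetical, and the example reuses the normalized reliability distances 0.04 and 0.35 quoted above together with made-up local optima and costs.

```python
def global_distance(local_opt, local_dist, g_max, g_min, direct):
    # D_g = (Max_g - O_l) + D_l if direct; (O_l - Min_g) + D_l if inverse.
    return (g_max - local_opt + local_dist) if direct else (local_opt - g_min + local_dist)

def rank(summaries, weights, direct):
    """summaries: variant -> property -> (O_l, D_l); weights: importance
    coefficients; direct: property -> True if directly related to quality."""
    props = list(weights)
    g_max = {p: max(s[p][0] for s in summaries.values()) for p in props}
    g_min = {p: min(s[p][0] for s in summaries.values()) for p in props}
    dg = {v: {p: global_distance(s[p][0], s[p][1], g_max[p], g_min[p], direct[p])
              for p in props} for v, s in summaries.items()}
    quality = {}
    for v in summaries:
        total = 0.0
        for p in props:
            lo, hi = min(d[p] for d in dg.values()), max(d[p] for d in dg.values())
            # Linearly rescale D_g to [0, 1]; 1 means closest to the global optimum.
            q = 1.0 if hi == lo else 1.0 - (dg[v][p] - lo) / (hi - lo)
            total += weights[p] * q
        quality[v] = total / sum(weights.values())
    return sorted(summaries, key=quality.get, reverse=True)

# Two variants; Reliability is direct, Cost inverse.
summaries = {"A": {"reliability": (0.95, 0.04), "cost": (100, 5)},
             "B": {"reliability": (0.80, 0.35), "cost": (60, 2)}}
ranked = rank(summaries, weights={"reliability": 5, "cost": 1},
              direct={"reliability": True, "cost": False})
```

With Reliability weighted 5 and Cost weighted 1, the more reliable but costlier variant wins the ranking, matching the intent of the importance coefficients.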
Mathematical Proof
We conclude this section by providing the mathematical proof of BRTA's correctness. A
reader may choose to skip the rest of this section if she is not interested in the details
contained therein.
Definition. Time-series $t$ is generated from an alternative's simulation. We define time-series $s$ as a constant, i.e., $s_i = \hat{s}$ everywhere. Our goal is to find the minimum distance between $t$ and $s$ using the DTW algorithm. $D_{i,j}$ denotes the minimum DTW distance between the first $i$ elements of $s$ and the first $j$ elements of $t$.
Definition. The cost function is the absolute distance between two points. Given that $s$ is constant, we define $c_j = cost(\hat{s}, t_j)$ for all $1 \le j \le m$. Also, let
$$\bar{c}_j = \sum_{k=1}^{j} c_k = c_1 + \dots + c_j$$
denote the cumulative cost.
Assumption. The cost function is always non-negative, i.e., $c_j \ge 0$ for all $1 \le j \le m$.
Lemma. For all $1 \le i \le n$ and $1 \le j \le m$, $D_{i,j} \ge \bar{c}_j$.
Proof. We prove Lemma 3.3.5 by induction. It is easy to verify that $D_{1,j} = \bar{c}_j$ for all $j$, and $D_{i,1} = i \cdot c_1 \ge c_1$ for all $i$. Thus, the lemma holds for these two base cases of our induction. For the inductive step, we consider the case where $i$ and $j$ are both larger than 1. The value of $D_{i,j}$ is, by definition, the minimum of the three values listed below. We show, using the induction hypothesis, that none of them is less than $\bar{c}_j$.
$D_{i-1,j} + c_j \ge \bar{c}_j + c_j \ge \bar{c}_j$ (reason: non-negativity of the costs).
$D_{i,j-1} + c_j \ge \bar{c}_{j-1} + c_j = \bar{c}_j$ (reason: by definition of cumulative costs, $\bar{c}_{j-1} + c_j = \bar{c}_j$).
$D_{i-1,j-1} + c_j \ge \bar{c}_{j-1} + c_j = \bar{c}_j$ (reason: the same as the previous item).
This completes the proof of Lemma 3.3.5.
Theorem. Whenever j is no less than i, the following equality holds: D_{i,j} = c̄_j.
Proof. We prove Theorem 3.3.5 using induction. For i = 1, it is easy to check that D_{1,j} = c̄_j (it is, in fact, one of the base cases in the proof of Lemma 3.3.5). Let i be greater than 1 and j ≥ i. The value of D_{i,j} is defined as the minimum of three values, one of which is D_{i-1,j-1} + c_j. By the induction hypothesis, D_{i-1,j-1} is exactly equal to c̄_{j-1}. Thus,

D_{i-1,j-1} + c_j = c̄_{j-1} + c_j = c̄_j.
As a result, D_{i,j} is the minimum of three values, one of which is c̄_j; consequently, it cannot be greater than c̄_j. Given that Lemma 3.3.5 imposes that it cannot be less than c̄_j, we can conclude that it must be equal to it.
Theorem 3.3.5 implies that if n ≤ m, then D_{n,m} = c̄_m = c_1 + … + c_m.
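The result can also be checked numerically. The sketch below implements the standard DTW recurrence (each cell adds the local cost to the minimum of its three predecessor cells) and confirms that a constant series matched against a longer series yields exactly the cumulative cost c̄_m; the series values are illustrative, not taken from any simulation.

```python
def dtw(a, b):
    """Standard dynamic time warping distance with absolute-difference cost."""
    n, m = len(a), len(b)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Minimum over the insertion, deletion, and match predecessors.
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

s_hat = 5.0
t = [3.0, 7.5, 5.0, 4.0, 9.0]   # illustrative simulation output (length m = 5)
s = [s_hat] * 3                 # constant series with n = 3 <= m
cumulative_cost = sum(abs(s_hat - t_j) for t_j in t)   # c_1 + ... + c_m
assert dtw(s, t) == cumulative_cost
```

The final assertion is exactly the corollary above: with n ≤ m, the DTW distance collapses to the cumulative cost, so it can be computed in a single linear pass over t.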
Chapter 4
Evaluation
The devised techniques, PredictAr, RecovAr, and eQual, have been designed and developed based on the three hypotheses defined in Section 1.2. This chapter presents the empirical and analytical evaluation of the proposed techniques as a means to test the defined hypotheses. To that end, we describe our evaluation in the next three sections.

In Section 4.1, the evaluation of RecovAr is presented. The focus of this section is to empirically test RecovAr's applicability and accuracy. We discuss the real-world systems on which RecovAr was applied, demonstrating its applicability (Section 4.1.1). We look at RecovAr's precision and recall to measure its accuracy in recovering architectural design decisions (Sections 4.1.2 and 4.1.3, respectively). We round out this section with a discussion of potential threats to the validity of our results (Section 4.1.4).

Section 4.2 describes the evaluation of PredictAr. Section 4.2.1 summarizes the subject systems used in our evaluation. Sections 4.2.2 and 4.2.3 present the results of our architectural significance detection and prediction using our subject systems. Lastly, Section 4.2.4 summarizes the threats to the validity of PredictAr's evaluation.
In Section 4.3, we evaluate eQual. We organize our evaluation of eQual around three
questions:
1. How onerous is it for architects to answer the preparation questions eQual asks?
2. Can eQual find solutions that are close to optimal?
3. Can eQual scale to models with large numbers of variation points?
More specifically, we aim to measure eQual's usability (Section 4.3.1), effectiveness (Section 4.3.2), and scalability (Section 4.3.3). Similar to the previous sections, we conclude eQual's evaluation with a discussion of the threats to its validity (Section 4.3.4).
4.1 RecovAr
We have empirically evaluated RecovAr's applicability and accuracy. Section 4.1.1 dis-
cusses the real-world systems on which RecovAr was applied, demonstrating its appli-
cability. Sections 4.1.2 and 4.1.3 discuss RecovAr's precision and recall, respectively.
Section 4.1.4 lists the threats to the validity of our results.
4.1.1 Applicability
Table 4.1 describes the two subject systems that we have used in our evaluation. These
systems were selected from the catalogue of Apache open-source software systems [3].
We selected Struts [1] and Hadoop [2] because they are widely adopted and fit the target profile of candidate systems for our approach, i.e., these systems satisfy the following requirements:
1. They are open-source.
2. They have accessible issue and code repositories.
3. Their developers log the fixing commits (i.e., the changes applied to the system to resolve the corresponding issues).
Furthermore, these systems are at the higher end of the Apache software systems' spec-
trum in terms of size and lifespan. Both of these projects use GitHub [63] as their version
control and source repository, and Jira [79] as their issue repository. We analyzed more
than 100 versions of Hadoop and Struts in total. Our analyses spanned over 8 years of
development, over 35 million SLoC, and over 4,000 resolved issues.
System   Domain                   Versions   Issues   MSLoC
Hadoop   Distributed Processing         68     2969    30.0
Struts   Web Apps Framework             36     1351     6.7
Table 4.1: Subject systems analyzed in our study.
                            Hadoop           Struts
Recovery Method           ACDC     ARC    ACDC     ARC
No. of Iss. in Decisions   427     674      70      94
No. of Changes             950    3935     220    1359
No. of Decisions           112     149      27      23
Avg. Issues/Decision      3.81    4.52    2.59    4.94
Avg. Changes/Decision     1.77    2.36    1.77    2.21
Table 4.2: Number of changes, recovered decisions, and frequencies of issues and changes per decision for our subject systems, grouped by the employed recovery methods.
An overview of the results of applying RecovAr to the two subject systems is depicted
in Table 4.2. These results are grouped by:
1. System (Hadoop vs. Struts).
2. Employed recovery technique (ARC vs. ACDC).
In this table, No. of Iss. in Decisions represents the total number of issues that were
identified to be part of an architectural design decision. On average, only about 18% of the issues for Hadoop and 6% of the issues for Struts have had architecturally significant effects, and hence have been considered parts of a design decision. This is in line with
the intuition that only a subset of issues will impose changes whose impact on the system
can be considered architectural. Moreover, this observation bolsters the importance of
RecovAr for understanding the current state of a system and the decisions that have led to
it. Without having access to RecovAr, architects would have to analyze 5-to-15 times more
issues and commits to uncover the rationales and root causes behind the architectural
changes of their system. The remainder of Table 4.2 displays the total number of detected
architectural changes (No. of Changes), the total number of uncovered architectural
design decisions (No. of Decisions), and the average numbers of issues and changes per
decision (Avg. Issues/Decision and Avg. Changes/Decision, respectively). It is worth
mentioning that not all the detected changes were matched to design decisions, which we
will elaborate on further when evaluating RecovAr's recall (Section 4.1.3).
As displayed in Table 4.2, depending on the technique used to recover the architecture,
the number of uncovered design decisions varies. The reason is that ACDC and ARC
approach architecture recovery from different perspectives: ACDC leverages a system's
[Pie charts; slice values in order: (a) Hadoop-ACDC 51%, 13%, 36%; (b) Hadoop-ARC 37%, 15%, 48%.]
Figure 4.1: Distribution of types of decision in Hadoop: solid black denotes simple decisions; grey denotes compound decisions; white denotes cross-cutting decisions.
[Pie charts; slice values in order: (a) Struts-ACDC 33%, 11%, 56%; (b) Struts-ARC 26%, 13%, 61%.]
Figure 4.2: Distribution of types of decision in Struts: solid black denotes simple decisions; grey denotes compound decisions; white denotes cross-cutting decisions.
module dependencies; ARC derives a more semantic view of a system's architecture, detecting concerns via information retrieval techniques. Therefore, the nature of the recovered architectures and changes, and consequently the uncovered design decisions, are different. Recent work has shown that these recovery techniques provide complementary views of a system's architecture [93]. The propagation of these complementary views to our approach has yielded some tangible effects. For instance, RecovAr running ARC was able to uncover a decision about refactoring the names of a set of classes and methods in
Hadoop, while RecovAr running ACDC could not uncover that decision. The reason is
that ARC is sensitive to lexical changes by design.
RecovAr aims to uncover three kinds of architectural design decisions (recall Section 2.1). Our results confirmed the presence of all three kinds in our subject systems. Figures 4.1 and 4.2 depict the distribution of different kinds of decisions detected for each pair of systems and recovery techniques. While the relative proportion of simple and cross-cutting decisions varies across systems and employed recovery techniques, the number of compound decisions is consistently the smallest.
4.1.2 Precision
To assess RecovAr's precision, we need to determine whether the uncovered architectural
design decisions are valid. As captured in the premise of RecovAr, architectural design
decisions are not generally documented, hence a ground-truth for our analyses was not
readily available.
To overcome this hurdle, we devised a systematic plan to objectively assess the uncovered design decisions. We defined a set of criteria targeting the two aspects of an architectural design decision, rationale and consequence, and used them as the basis of our assessment. Two PhD students carried out the analysis and the results of their independent examinations were later aggregated. In the remainder of this section, we will elaborate on the details of the conducted analyses.
We use four criteria targeting different parts of an architectural design decision (two targeting rationales and two targeting consequences). Each criterion is rated using a
three-level scale, with the numeric values of 0, 0.5, and 1. In this scale, 0 means that the criterion is not satisfied; 0.5 means that the satisfaction of the criterion is confirmed after further investigation by examining the source code, details of the issues, or commit logs; finally, 1 means that the criterion is evidently satisfied. The reason we use a three-level scale in our analysis is to measure the precision of RecovAr's results from the viewpoint of non-experts, and to distinguish the decisions according to the effort required for understanding them. To that end, any criterion whose evaluation requires (1) in-depth system expertise, (2) inspection of information other than that captured in design decisions, or (3) access to the original architects of the system is given a rating of 0.
The criteria for assessing rationales are two-fold:
1. Rationale Clarity indicates whether the rationale and its constituent parts are easily
understandable. This is accomplished by looking at issue summaries and pinpoint-
ing the problems or requirements driving the decision.
2. Rationale Cohesion indicates the degree to which there is a coherent relationship
among the issues that make up a given rationale. Rationale Cohesion is only ana-
lyzed if the decision is shown to possess Rationale Clarity.
The criteria for assessing consequences are also two-fold:
1. Consequence-Rationale Association assesses whether the changes and their con-
stituent architectural deltas are related to the listed rationale.
2. Consequence Tractability assesses whether the size of the changes is tractable. In
other words, is the number of changes and their constituent deltas small enough to
be understandable in a short amount of time?[1]
The two PhD students independently scored every decision based on the above criteria.
The three-level scale allowed us to develop a finer-grained understanding of the decisions' quality.
As illustrative examples, we explain the scoring procedures for two decisions in Hadoop.
Listing 4.1 displays a simple design decision as uncovered by RecovAr in Hadoop version
0.9.0. The rationale consists of a single issue that explains the intent is to separate the
user logs from system logs. However, the rationale summary does not explain why this
needs to happen. Looking at the issue in Jira, the reason is that system logs are clut-
tering the user logs, and system logs need to be cleared out more frequently than user
logs. Since we had to look at the issue to understand "why" this decision was made,
the Rationale Clarity in this case was scored 0.5. Since we only have one issue, the
Rationale Cohesion is not applicable. The consequence involves one change with a single
architectural delta, i.e., adding the TaskLog. The relationship of this change to the issue
is clear and the change size is tractable. Therefore, Consequence-Rationale Association
and Consequence Tractability each received 1.
Listing 4.2 is a cross-cutting example from Hadoop 0.10.1. Although the rationales
seem unrelated, after inspecting the code and issue logs, we realized that LzoCodec will
[1] Our evaluation considered decisions that included more than five changes not to satisfy this criterion, but this heuristic can be relaxed.
                   Hadoop          Struts
Recovery Method   ACDC    ARC    ACDC    ARC
Simple            0.89   0.95    0.90   0.99
Compound          0.50   0.52    0.76   0.56
Cross-Cutting     0.61   0.76    0.78   0.77
Overall           0.72   0.72    0.81   0.71
Table 4.3: Average scores of recovered decisions per recovery technique for Hadoop and Struts.
be available only if the Native Library is loaded. Therefore, this decision received 0.5 for
Rationale Cohesion.
Rationales:
  Issue 1:
    Desc: Seperating user logs from system logs in map reduce.
    ID: HADOOP-489
Consequences:
  Change 1:
    Added: org.apache.hadoop.mapred.TaskLog
Listing 4.1: A simple decision from Hadoop v. 0.9.0
Rationales:
  Issue 1:
    Desc: Implement the LzoCodec to support the lzo compression algorithms.
    ID: HADOOP-851
  Issue 2:
    Desc: Native libraries are not loaded.
    ID: HADOOP-873
Consequences:
  Change 1:
    Added: o.a.h.i.compress.BlockCompressorStream
    Added: o.a.h.i.compress.BlockDecompressorStream
    Added: o.a.h.i.compress.CompressorStream
    Added: o.a.h.i.compress.Compressor
    Added: o.a.h.i.compress.DeCompressor
    Added: o.a.h.i.compress.LzoCodec
    Added: o.a.h.util.NativeCodeLoader
  Change 2:
    Removed: o.a.h.util.NativeCodeLoader
    Removed: o.a.h.i.compress.Compressor
    Removed: o.a.h.i.compress.DeCompressor
Listing 4.2: Part of a cross-cutting decision from Hadoop v. 0.10.1; o.a.h.i is shorthand for org.apache.hadoop.io
[Cumulative-distribution plot: x-axis "Decisions' scores" from 0 to 1; y-axis "Decisions with a lower or equal score" from 0% to 100%; one curve each for ACDC and ARC.]
Figure 4.3: Smoothed cumulative distribution of the decision scores for Hadoop.
Table 4.3 displays the average scores of the analyzed decisions, grouped by the decision type and the recovery technique used for uncovering the decisions. Figures 4.3 and 4.4 display the cumulative distributions of the decision scores for Hadoop and Struts, respectively. The right-leaning shape of these distributions indicates that higher-quality decisions are more prevalent than lower-quality ones. The threshold of acceptability for measuring precision is adjustable, but in our evaluation we required that a decision score at least 0.5 in the majority (i.e., at least three) of the criteria. In our analyses, on average (considering both ARC and ACDC) 76% of the decisions for Hadoop and 78% of the decisions for Struts met this condition. Figure 4.5 depicts a descriptive view of the results of our evaluation, classifying the decisions by the required criteria. The values denote the proportion of decisions that have at least partially satisfied the criteria corresponding to a given intersection.
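The acceptability rule can be expressed as a small predicate. This is an illustrative sketch: the ratings below are hypothetical, and the handling of a not-applicable Rationale Cohesion (as in the single-issue example earlier) is omitted for brevity.

```python
LEVELS = (0.0, 0.5, 1.0)   # three-level rating scale used by the raters

def acceptable(criterion_scores, min_score=0.5, min_count=3):
    """A decision is acceptable if it scores at least min_score on a
    majority (at least min_count) of the four criteria:
    Clarity, Cohesion, Association, Tractability."""
    assert all(s in LEVELS for s in criterion_scores)
    return sum(1 for s in criterion_scores if s >= min_score) >= min_count

# Hypothetical ratings in the order (Clarity, Cohesion, Association, Tractability):
assert acceptable([0.5, 1.0, 1.0, 0.0])       # 3 criteria at >= 0.5: acceptable
assert not acceptable([0.5, 0.0, 1.0, 0.0])   # only 2 criteria at >= 0.5
```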
[Cumulative-distribution plot: x-axis "Decisions' scores" from 0 to 1; y-axis "Decisions with a lower or equal score" from 0% to 100%; one curve each for ACDC and ARC.]
Figure 4.4: Smoothed cumulative distribution of the decision scores for Struts.
Most of the unacceptable decisions were made in the newly introduced major versions of the two systems. This is consistent with prior findings: the number of architectural changes between a minor version (e.g., 0.20.2) and the immediately following major version (e.g., 1.0.0) tends to be significantly higher than the architectural change between two consecutive minor versions [17]. In these cases, the decision sizes (numbers of rationales and consequences) tend to be higher than our conservative thresholds, and these decisions tend to be rated as unacceptable. However, these decisions still provide valuable insight into why the architecture has changed.
The reason that the ARC-based decisions generally score lower (i.e., they are less right-leaning) than the ACDC-based ones is the nature of the changes extracted by ARC. As discussed previously, ACDC adopts a primarily structural approach to architecture, while ARC follows a semantic approach, which requires a higher level of
[Venn-style diagram of criterion overlaps, with regions labeled Clarity, Cohesion, Association, and Tractability and the proportion of decisions in each intersection (largest region: 55%).]
Figure 4.5: Classification of the recovered decisions based on the satisfied criteria.
system understanding. Therefore, attaining a conclusive rating for these decisions was
not possible by only looking at the decision elements defined earlier. Our findings suggest
that the uncovered decisions based on ARC are more suitable for experienced users.
4.1.3 Recall
Another target of our evaluation was the extent to which RecovAr manages to successfully capture the design decisions in our subject systems. Based on the definition of the architectural design decisions (recall Section 2.1), every architectural change is a consequence of a design decision. We thus use the coverage of architectural changes by the identified design decisions as a proxy indicator for measuring RecovAr's recall.
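This proxy can be computed directly: recall is the fraction of detected architectural changes that appear in at least one recovered decision. The sketch below is illustrative; the change identifiers are hypothetical.

```python
def recall_proxy(detected_changes, decisions):
    """Fraction of detected architectural changes covered by at least one
    recovered design decision (a proxy for RecovAr's recall)."""
    covered = set()
    for decision in decisions:
        covered.update(decision["changes"])
    return sum(1 for c in detected_changes if c in covered) / len(detected_changes)

# Hypothetical data: 5 detected changes; the recovered decisions cover 4 of them.
changes = ["ch1", "ch2", "ch3", "ch4", "ch5"]
decisions = [{"changes": ["ch1", "ch2"]},
             {"changes": ["ch2", "ch4", "ch5"]}]
recall = recall_proxy(changes, decisions)   # 0.8
```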
Our initial analysis reported low recall values, indicating that a relatively small frac-
tion of the extracted changes formed design decisions. The first row of Table 4.4 displays
the results of this analysis. The recall of the extracted architectural changes was consis-
tently around 20% in our subject systems regardless of the used recovery technique. To
understand the cause of this, we manually examined the detected architectural changes
for which RecovAr could not locate the rationale. We were able to identify two major rea-
sons why an architectural change was not marked as part of a design decision by RecovAr.
The first was when an architectural change was happening in off-the-shelf components that
are integrated with the system and evolve separately. These can be third-party libraries,
integrations with the other Apache software projects, or even changes in the core Java
libraries that are detected by the recovery techniques. Examples of this phenomenon
for Struts include changes to the Spring Framework's architecture [154], and for Hadoop
changes to Jetty [4] and several non-core Apache Common projects. The second reason is
what we call the "orphaned commit" phenomenon. Orphaned commits are those commits
that conceptually belong to an issue. However, one of the following reasons prevents us
from using them.
1. The commits were not logged in an issue.
2. The pertinent commits have been merged with the code-base before their containing
issues have been marked as resolved.
3. A human error in the issue data rendered them useless for our approach (e.g.,
incorrectly specified affected version).
We consider orphaned commits a shortcoming of our approach that can affect its recall.
Orphaned commits might also limit RecovAr's ability to recover the initial architectural
design decisions that are not documented as issues. This is less concerning when issue
trackers are used in tandem with project management tools for task assignments in the
early stages of development. However, the imposed changes on a system's architecture
do not capture the original intentions of the developers and architects. Therefore, we
carefully inspected the architectural changes to eliminate the ones caused by external
factors. In our inspection, we created a list of namespaces whose elements should not
be considered architectural changes caused by the developer decisions. Partial lists of
these namespaces for Hadoop and Struts are displayed for illustration in Listings 4.3
and 4.4, respectively. We verified each entry by searching the system's code repository and confirming that the instances were imported and not developed internally by the developer teams.
com.facebook.
java.lang.
org.apache.commons.cli.
javax.ws.rs.
...
Listing 4.3: Imported namespaces for Hadoop
com.opensymphony.xwork2.util.
java.io.
org.apache.commons.
org.springframework.
...
Listing 4.4: Imported namespaces for Struts
                   Hadoop          Struts
Recovery Method   ACDC    ARC    ACDC    ARC
Before Cleanup     20%    19%     21%    24%
After Cleanup      85%    67%     80%    63%
Table 4.4: RecovAr's recall before (top row) and after (bottom row) the clean-up of the raw data.
We subsequently reevaluated RecovAr's recall. The results are displayed in the second
row of Table 4.4. The recall was 73% on average after eliminating externally caused
changes. This also reveals an interesting byproduct of RecovAr: by using RecovAr or a
specially modified version of it, we can detect the parts of a system that are not developed
or maintained by the system's core team. This information can be used for automatic
detection of external libraries and dependencies in software systems, and can help the
recovery techniques in extracting a more accurate view of a system's core architecture.
4.1.4 Threats to Validity
We identify several potential threats to the validity of our study with their corresponding
mitigating factors.
External Validity
The key threats to external validity involve our subject systems. We chose the two
systems in our evaluations from the higher end of the Apache spectrum in terms of size
and lifespan; each has a vibrant community and is widely adopted. Another threat
stems from the fact that both of our systems use GitHub and Jira. However, RecovAr
only relies on the basic issue and commit information that can be found in any generic
issue tracker or version control system. The different numbers of versions analyzed per
system pose another potential threat to validity. This is unavoidable, however, since some
systems simply undergo more evolution than others.
Construct Validity
The construct validity of our study is mainly threatened by the accuracy of the recov-
ered architectural views and of our detection of architectural decisions. To mitigate the
first threat, we selected the two architecture recovery techniques, ACDC and ARC, that have demonstrated the greatest accuracy in a comparative analysis of available techniques [56]. These techniques are developed independently of one another and use very different strategies for recovering an architecture. This, coupled with the fact that their results exhibit similar trends, helps to strengthen the confidence in our conclusions. The
manual inspection of the accuracy of the design decisions uncovered by our approach is
another threat. Human error in this process could affect the reported results. To al-
leviate this problem, two PhD students independently analyzed the results to limit the
potential biases and mistakes. Moreover, the inspection procedure was designed to be
very conservative.
4.2 PredictAr
In this section, we will summarize our evaluation of PredictAr. We will go over our
subject systems (Section 4.2.1), and describe the results of applying our technique for
determining the architectural significance of issues (Section 4.2.2). In Section 4.2.3, we
will describe the evaluation of our machine learning technique for predicting architectural
System    Domain                 Versions   Issues   Avg. LOC
Hadoop    Data Proc. Framework         68     7374      1.96M
Nutch     Web Crawler                  21     1524       118K
Wicket    Web App Framework            72     4637       332K
CXF       Service Framework           120     5852       915K
OpenJPA   Java Persistence API         20     1675       511K
Table 4.5: Subject systems used for PredictAr's analyses.
significance of issues. Finally, Section 4.2.4 will overview the threats to the validity of
our analyses and results.
4.2.1 Subject Systems
We report the empirical results involving five Apache [3] open-source projects. Apache
was chosen because it is one of the largest open-source organizations in the world and
has produced a number of impactful systems. Furthermore, Apache systems have well-
maintained code repositories, release notes, and issue trackers. Table 4.5 lists our subject
systems. We analyzed the largest available Apache systems that rely on Jira [79] for
tracking issues and satisfy the following criteria:
1. The systems belong to different software domains, to ensure broad applicability of our results.
2. The issues and their fixing commits are tracked. Specifically, we analyze "resolved" and "closed" issues because they have complete sets of fixing commits.
3. The systems have large numbers of resolved and closed issues to give us sufficient
data points for our analyses and machine learning models.
4.2.2 Architectural Significance Analysis
Table 4.6 contains the general distribution of issues in each system. The data is further subdivided based on the issue type and priority. Issue type can be bug (B), feature or improvement (F/I), or other types such as test and task (O). Issue priority is either critical (C), major (Mj), or minor (Mn). Table 4.7 displays the information about the architecturally significant issues recovered using ARC. Table 4.8 displays the results of our analyses using ACDC.
The data shows that, in general, there are more bugs submitted to issue repositories than features or improvements. Interestingly, although the number of architecturally significant issues is only a small fraction of all submitted issues, their distribution in terms of priority is not very different from the original issue distribution. While this finding deserves a closer inspection, it appears that engineers are typically unable to isolate architecturally significant issues, and consequently do not consider them any more (or less) important than "regular" issues.
The distribution of bugs, features, and improvements does not show a drastic change between architecturally significant and regular issues either. This is a finding we have not seen in the literature previously. Conversely, it is intuitively expected that new features or improvements would be more architecturally impactful than bugs. Overall, our results suggest that architectural significance is an overlooked facet of implementation issues, and cannot be easily inferred from the existing tags applied to issues.
4.2.3 Accuracy of Significance Prediction
We use precision and recall as the performance evaluation metrics. Precision shows the ratio of correctly predicted architecturally significant issues over all predicted significant issues. Recall denotes the ratio of correctly predicted architecturally significant issues over all of the actually significant issues. Table 4.9 shows the evaluation results obtained using the 10-fold cross-validation setup, where each dataset is randomly partitioned into 10 equal-sized subsets. Nine of the subsets are used as training data, while the last subset is retained as testing data. The process is then repeated 10 times.
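The setup can be sketched with the standard library alone. The fold partitioning and the two metrics are shown below; the "predicted" set stands in for the actual classifier's output, and all identifiers are hypothetical.

```python
import random

def ten_fold_indices(n, k=10, seed=0):
    """Randomly partition indices 0..n-1 into k (near-)equal-sized folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def precision_recall(predicted, actual):
    """predicted/actual: sets of issue ids deemed architecturally significant."""
    tp = len(predicted & actual)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(actual) if actual else 0.0
    return precision, recall

folds = ten_fold_indices(20)                # 10 folds of 2 issues each
actual = {0, 1, 2, 3, 4}                    # hypothetical ground truth
predicted = {0, 1, 2, 7}                    # hypothetical classifier output
p, r = precision_recall(predicted, actual)  # p = 0.75, r = 0.6
```

In each of the 10 repetitions, one fold serves as the test set and the remaining nine as training data; the reported figures are the averages of `p` and `r` across repetitions.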
The numbers reported in Table 4.9 are the average values across the 10 repetitions. The top five rows show the results of training and running our classifier on the individual systems; the Cross-Project row is the result of applying the classifier on our entire corpus of issues across all systems. The precision of our classifier is very good, surpassing 90% in certain cases under both ARC and ACDC. The overall precision across the five systems is above 80% under both recovery techniques. The recall values for our classifier are lower than the precision values. In the case of ARC, with the exception of Nutch, they are all above 50% for the individual systems, and around 60% across all systems. In the case of ACDC, the recall values are lower: only the Hadoop and Cross-Project values are above 50%. The reason may lie in ACDC's dependency-analysis-based architecture recovery algorithm, a hypothesis we will have to evaluate further. Nutch is again a notable outlier. OpenJPA yields the second-lowest recall values under both ARC and ACDC.
System    No. of Issues    B %   F/I %   O %    C %   Mj %   Mn %
Hadoop             7374   56.6    32.2  11.2   15.2   61.4   23.4
Nutch              1524   45.7    43.8  10.6    7.0   52.2   40.8
Wicket             4637   59.7    33.5   6.8    2.4   59.5   38.1
Cxf                5852   62.3    26.0  11.7    3.4   77.7   18.9
OpenJPA            1675   62.2    21.6  16.2    5.3   71.1   23.6
Table 4.6: Analyzed issues' general distribution. Issue types are bug (B), feature/improvement (F/I), or other (O). Issue priority is critical (C), major (Mj), or minor (Mn).
System    No. of Issues    B %   F/I %   O %    C %   Mj %   Mn %
Hadoop             1066   61.5    36.6   1.9   26.6   60.2   13.2
Nutch                89   42.7    57.3   0.0    4.5   64.0   31.5
Wicket             1362   57.9    37.0   5.1    1.5   59.1   39.4
Cxf                2441   64.0    31.7   4.3    3.7   76.5   19.8
OpenJPA             580   58.6    25.5  15.9    4.7   77.8   17.6
Table 4.7: Overview of the results of our architectural significance analysis under the ARC recovery technique. Issue types are bug (B), feature/improvement (F/I), or other (O). Issue priority is critical (C), major (Mj), or minor (Mn).
System    No. of Issues    B %   F/I %   O %    C %   Mj %   Mn %
Hadoop              633   58.9    38.6   2.5   27.8   59.1   13.1
Nutch                60   38.3    61.7   0.0    1.7   66.7   31.7
Wicket              930   60.3    35.8   3.9    2.0   62.8   35.2
Cxf                1403   57.0    37.1   5.9    3.4   77.7   19.0
OpenJPA             453   56.3    26.3  17.4    4.2   79.7   16.1
Table 4.8: Overview of the results of our architectural significance analysis under the ACDC recovery technique. Issue types are bug (B), feature/improvement (F/I), or other (O). Issue priority is critical (C), major (Mj), or minor (Mn).
                       ARC                  ACDC
System          Precision   Recall   Precision   Recall
Hadoop              0.793    0.637       0.883    0.547
Nutch               0.941    0.276       0.951    0.217
Wicket              0.843    0.657       0.678    0.417
Cxf                 0.801    0.698       0.928    0.468
OpenJPA             0.965    0.503       0.903    0.399
Cross-Project       0.816    0.592       0.806    0.573
Table 4.9: Precision and recall of PredictAr's classifier for each system. The Cross-Project row shows the result of applying the classifier on the combined issues of all systems.
These two systems have much smaller datasets of issues compared to the rest of our corpus.
Moreover, Nutch has about an order of magnitude fewer architecturally significant issues than other systems, which further hampers the efficacy of our classification model.
4.2.4 Threats to Validity
In this section, we list the threats to the validity of our results. More specifically, we identify external and construct validity threats.
External Validity
Due to practical limitations, we only used open-source projects. Furthermore, all the
issues in our study belong to systems implemented in Java and use the Jira issue tracker.
To help mitigate this issue, we selected systems from different domains, thus expanding our study's scope (recall Section 4.2.1). However, this limitation can still hamper the generalizability of our results to proprietary projects, projects written in different languages, or projects using issue trackers other than Jira.
Construct Validity
The dataset containing architecturally significant issues depends on the recovery techniques employed. To mitigate this problem, we selected two techniques that exhibit higher accuracy than their competitors [56]. Furthermore, any technique can be easily incorporated in our framework. In fact, different techniques can be used to achieve task-specific goals.
4.3 eQual
In this section, we discuss the evaluation of eQual. We evaluate eQual to answer three
questions:
1. How onerous is it for architects to answer the preparation questions eQual asks?
2. Can eQual find solutions that are close to optimal?
3. Can eQual scale to models with large numbers of variation points?
More specifically, we aim to measure eQual's usability, effectiveness, and scalability. To evaluate eQual, we have implemented its prototype on top of DomainPro [145]. The resulting extension to DomainPro totals 4.7K C# and 1.0K JavaScript SLoC. Furthermore, to enable an extensive evaluation of eQual's effectiveness, we built a utility totaling an additional 1,000 C# and 200 MATLAB SLoC, as detailed in Section 4.3.2. Screenshots of our prototypes can be found in Appendix A.
4.3.1 Usability
The focus of the usability evaluation is to measure how onerous eQual is to apply by
architects in practice. The assumption we make in this evaluation is that eQual provides
high-quality results. The actual quality of those results will be evaluated empirically in
Section 4.3.2. We first present an analytical argument for eQual's usability, and then the
results of its empirical evaluation.
A. Analytical Argument
In Section 3.3.3, we explained the questions eQual asks of architects. Let us assume that a system has N_v variation points and N_p properties. For each of them, eQual asks a three-part question. The maximum number of field entries eQual requires an architect to make is, therefore, 3 × (N_v + N_p). Recall that the architect also has the option of not answering some (or even all) of the questions regarding variation points.
answering some (or even all) of the questions regarding variation points.
As discussed above, our analysis of Hadoop has relied on four previously identified critical non-functional properties [25, 26]. Prior research suggests that there are usually 4-6 non-functional properties of interest in a software project, and rarely more than 10 [15]. Moreover, the analysis presented in a recent study [147] showed that the number of variation points per Hadoop version ranged between 1 and 12. Taking the largest number of variation points for a single Hadoop version and the four properties, an architect using eQual would have to provide no more than 3 × (12 + 4) = 48 answers to explore the 4-dimensional decision space of at least 2^12 system variants.
The number of individual answers an architect is expected to provide is precisely bounded by eQual. The granularity of each answer is very small and its cognitive complexity is low. Most importantly, eQual does not place a new burden on architects, but only makes explicit the information that architects already have to consider during system design.
B. Empirical Comparison to the State-of-the-Art
In addition to the analytical argument provided above, we also modeled Hadoop in GuideArch [49], the most closely related approach for exploring alternative design decisions. We compared the eQual and GuideArch models in terms of the numbers of field entries and the time required to complete them.
GuideArch helps architects make decisions in the early stages of software design, using fuzzy math [185] to deal with uncertainties about system variation points in a quantifiable manner. To apply fuzzy math, GuideArch uses three-point estimates, asking architects to provide pessimistic, most likely, and optimistic values to describe the effects of their decisions on the system's non-functional properties. For instance, in the case of Hadoop's Processing Power, for each decision (such as using machines with 2 GHz CPUs) architects have to specify the pessimistic, optimistic, and most likely values for the Reliability, Utilization, Execution Time, and Cost properties.
GuideArch does not require the creation of a system model. However, GuideArch's usefulness is contingent on the accuracy of its inputs, which requires in-depth knowledge of the system's domain and behavior. This level of expertise is rarely available. To alleviate this issue, GuideArch's authors suggest that architects obtain this information by analyzing prior data, looking at similar systems, studying manufacturer specifications, and reading scientific publications [49]. These are difficult and time-consuming tasks that are likely to rival the design effort required by eQual.
The specification of non-functional properties in GuideArch is similar to eQual's. However, as discussed above, the specification of variation points is different, which, in turn, impacts the modeling and analysis of available options. GuideArch requires that all options be specified discretely, and is unable to explore ranges.
In a representative experiment, we selected five options for each of Hadoop's variation points from Figure 3.4, totaling 25 alternatives. For example, instead of simply specifying the range 10-100 for Pool Size, we explicitly provided 10, 25, 50, 75, and 100 as the options. The next step was to specify how each decision affects the non-functional properties of the system. In doing so, we had to fill in a 25 × 12 matrix in GuideArch. For eQual, we had to answer 27 questions: 3 × 5 for the five variation points, and 3 × 4 for the four non-functional properties. Overall, it took us more than four hours to complete over 300 mandatory fields in GuideArch. By contrast, it took the same author under six minutes to answer the 27 questions required by eQual.
This discrepancy only grows for larger problems (e.g., more variation points or more GuideArch options within a given variation point). In general, if T_f is the total number of field entries in GuideArch, N_P the number of properties, N_V the number of variation points, and a_i the number of alternatives for a variation point v_i, then

T_f = 3 · N_P · Σ_{i=1}^{N_V} a_i
System    Domain              Var. Points   Terms   Size
Apache    Web Server          11            9       10^6
BDB C     Berkeley DB C       9             10      10^5
BDB J     Berkeley DB Java    6             8       10^6
Clasp     Answer Set Solver   10            17      10^15
LLVM      Compiler Platform   7             8       10^7
AJStats   Analysis Tool       3             4       10^7

Table 4.10: Software systems used to study the effectiveness of eQual. Var. Points is the number of variation points in each system; Terms is the number of terms in the systems' fitness models; and Size is the total number of variants in the design space.
The number of GuideArch field entries grows quadratically in the number of properties and variation points. The number of field entries in eQual grows linearly: 3 × (N_v + N_p). This results in a footprint for eQual that is orders of magnitude smaller than GuideArch's when applied to large systems.
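The two growth rates above can be checked with a short calculation; the function names below are ours, for illustration only, and simply encode the formulas stated in this section:

```python
# Compare the number of mandatory field entries required by eQual
# (linear in variation points + properties) and by GuideArch
# (product of properties and the total number of discrete alternatives).

def equal_fields(n_v: int, n_p: int) -> int:
    """eQual: one three-part question per variation point and per property."""
    return 3 * (n_v + n_p)

def guidearch_fields(n_p: int, alternatives: list) -> int:
    """GuideArch: three estimates per property for every discrete alternative."""
    return 3 * n_p * sum(alternatives)

# Hadoop scenario from Section 4.3.1: 5 variation points with 5 options
# each, and 4 non-functional properties.
print(equal_fields(5, 4))             # 27 questions for eQual
print(guidearch_fields(4, [5] * 5))   # 300 mandatory fields for GuideArch

# Largest Hadoop version: 12 variation points, 4 properties.
print(equal_fields(12, 4))            # 48 answers, for at least 2**12 variants
```

The outputs reproduce the 27-versus-300 field counts and the 48-answer bound reported above.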
4.3.2 Effectiveness
Most other approaches concerned with design quality (e.g., DomainPro [145], Palladio [16], ArchStudio [40], XTEAM [46], PCM [105]) focus on a single variant, and do not explore the space of possible design decisions (see Section 5.2 for further discussion). They are complementary to eQual, as eQual can use each of them to evaluate the individual variants' quality; our prototype eQual implementation uses DomainPro for this purpose. Prior work has shown that techniques that aid engineers in arriving at effective designs for their systems (e.g., ArchDesigner [7]) underperform GuideArch in the quality of their top-ranked designs [48].
For these reasons, we evaluated eQual's effectiveness by directly comparing it with GuideArch as the leading, state-of-the-art approach. Unlike prior work in the area, which has traditionally been limited to such head-to-head comparisons [7, 49], we also assessed the quality of eQual's results on systems with known optimal configurations. Our evaluation indicates that eQual produces effective designs, of higher quality than those produced by prior work.
A. Head-to-Head Comparison with the State-of-the-Art
Both eQual and GuideArch use known optimization methods. Their absolute effectiveness is difficult to determine as it requires ground-truth results for the modeled systems, but we can compare their effectiveness relative to one another.
To that end, we analyzed the Hadoop models created with eQual and GuideArch as described in Section 4.3.1 and compared the top-ranked variants they returned. For example, in the case of the experiment highlighted in Section 4.3.1, both tools produced Hadoop variants that were equally reliable (94%), had equal machine utilization (99%), and comparable cost (17 for GuideArch vs. 19 for eQual). However, eQual's top-ranked variant was nearly 7.5 times faster than GuideArch's (154 s vs. 1,135 s). We identified one possible reason for this discrepancy: we observed that GuideArch consistently selects variants with lower machine reliability but higher redundancy than those selected by eQual. The exact reasons behind this strategy are unclear from GuideArch's publications; we have contacted its authors for possible answers.
We acknowledge that, as creators of eQual, we are more familiar with it than with
GuideArch. However, the author who performed the analysis has extensive experience
                     Random           Edge-Case
System    Default    Mean     σ       Mean     σ
Apache    0.264      0.311    0.146   0.899    0.163
BDB C     0.763      0.564    0.325   0.983    0.035
BDB J     0.182      0.517    0.408   1.000    0.000
Clasp     0.323      0.352    0.174   0.859    0.179
LLVM      0.253      0.235    0.234   0.902    0.219
AJStats   0.856      0.780    0.269   0.963    0.048
Overall   0.440      0.460    0.259   0.934    0.107

Table 4.11: Comparison between the two seeding strategies employed by eQual and the quality of the nominal solutions commonly selected by architects. Default depicts the common solutions used by architects, obtained from [151].
with architectural modeling, including with GuideArch. Furthermore, we have a good understanding of Hadoop and made every effort to use GuideArch fairly. Our observations are buttressed by the fact that the quality of the variants GuideArch recommends depends heavily on the architect's ability to predict the effects of the design decisions on the system's non-functional properties. This is a non-trivial, error-prone task regardless of one's familiarity with GuideArch.
B. Evaluation on Systems with Known Optimal Designs
We further evaluated eQual's effectiveness against known fitness models of six real-world systems, summarized in Table 4.10. Fitness models describe the non-functional properties of a system using its variation points and their interactions. The fitness models we used in our evaluation were obtained by Siegmund et al. [150, 151] and were shown to accurately predict the non-functional properties of highly configurable systems. These fitness models aim to detect interactions among options (or features) and evaluate their influence on the
[Figure 4.6 appears here: six box-plot panels (AJStats, BDB J, BDB C, Apache, Clasp, LLVM), each plotting Optimal Proximity against generation sizes of 50, 100, and 200.]

Figure 4.6: Optimal Proximity distribution of the best variant generated by eQual using generation sizes of 50, 100, and 200. Each box plot comprises 100 executions of eQual's exploration function. The y-axes of three of the systems' panels range over 0.6-1, whereas the y-axes of the rest start at 0. This is done to enable better visualization of the variants' quality distributions for systems with smaller ranges.
system's non-functional attributes. Each has been obtained through numerous measurements of different variants of a software system. We decided to use these fitness models because they are analogous to our objective in eQual, despite being applied to software systems at a later stage (i.e., systems that are already implemented and deployed). Furthermore, the resulting subject systems' decision spaces range from 100,000 variants to 1 quadrillion variants, making them attractive for testing eQual's range of applicability.
Conceptually, a fitness model is simply a function from variants to a fitness measurement, φ : C → R, where fitness can be an aggregation of any measurable non-functional property that produces interval-scaled data. The model is described as a sum of terms over variation option values. Individual terms of the fitness model can have different shapes, including n·c(X), n·c(X)^2, or n·c(X)·√c(Y). For illustration, a configurable database management system with the options encryption (E), compression (C), page size (P), and database size (D) may have the following fitness model:

φ(c) = 50 + 20·c(E) + 15·c(C) - 0.5·c(P) + 2.5·c(E)·c(C)·c(D)
In general, the fitness models are of the following form:

φ(c) = β_0 + Σ_{i..j ∈ O} β_{i..j}·(c(i)..c(j))

β_0 represents a minimum constant base fitness shared by all variants. Each term of the form β_{i..j}·(c(i)..c(j)) captures an aspect of the overall fitness of the system.
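The illustrative database fitness model above can be evaluated directly; the sketch below treats E and C as binary options and P and D as numeric option values, and the particular variant chosen is our own illustrative assumption:

```python
# Evaluate the illustrative fitness model
#   phi(c) = 50 + 20*c(E) + 15*c(C) - 0.5*c(P) + 2.5*c(E)*c(C)*c(D)
# where E (encryption) and C (compression) are binary options (0/1) and
# P (page size) and D (database size) take numeric values.

def phi(c: dict) -> float:
    return (50
            + 20 * c["E"]
            + 15 * c["C"]
            - 0.5 * c["P"]
            + 2.5 * c["E"] * c["C"] * c["D"])

# A hypothetical variant: encryption and compression enabled,
# page-size value 4, database-size value 2.
variant = {"E": 1, "C": 1, "P": 4, "D": 2}
print(phi(variant))   # 50 + 20 + 15 - 2 + 5 = 88.0
```

Note how the last term contributes only when E, C, and D are all non-zero, which is exactly the kind of option interaction the fitness models are designed to capture.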
Because only aggregate fitness models were available to us, without loss of generality, we treated each term as an individual non-functional property of a given system, and translated its coefficients into eQual's coefficients. Then, using the formula of each term, we generated the corresponding constant time-series representing the term. These time-series were subsequently passed to eQual for exploration. To measure eQual's effectiveness, we normalized each variant's fitness and calculated the fitness of the best variant found by eQual using the ground-truth fitness models. We then calculated that variant's distance from the global optimum. We call this the Optimal Proximity. These steps were accomplished via an extension to eQual totaling 1,000 C# SLoC, and an additional 200 MATLAB SLoC to tune and visualize the resulting eQual models.
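The Optimal Proximity measure can be sketched as follows; the min-max normalization over the design space's fitness range is our assumption about the normalization step, and the function name is ours:

```python
# Optimal Proximity (a sketch, assuming min-max normalization):
# how close the fitness of eQual's best-found variant is to the
# ground-truth global optimum of the design space.

def optimal_proximity(fitness_found: float,
                      fitness_min: float,
                      fitness_max: float) -> float:
    """1.0 means the global optimum was found; values near 0 mean the
    found variant is close to the worst variant in the space."""
    return (fitness_found - fitness_min) / (fitness_max - fitness_min)

# Hypothetical values: best-found fitness 88 in a space spanning [40, 90].
print(optimal_proximity(88.0, 40.0, 90.0))   # 0.96 -> within 96% of optimum
```

On this scale, the "over 93% of the global optimum" result reported below corresponds to a mean Optimal Proximity above 0.93.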
Table 4.11 and Figure 4.6 depict the results of applying eQual to our six subject systems using the two strategies discussed in Section 3.3.4: random seeding and edge-case seeding. Table 4.11 compares eQual's two strategies against the solutions yielded by using the default values suggested by the six systems' developers [151]. These results were obtained by setting the cross-over ratio for the genetic algorithm to 0.85 and the mutation rate to 0.35, using 4 generations of size 200. These hyper-parameters were obtained over nearly 30,000 test executions, by using grid search to find the most suitable parameters on average. The results in Table 4.11 show that, in most cases, even the purely random seeding strategy for eQual is at least as effective as the default values suggested by the developers. The edge-case strategy, on the other hand, finds superior variants that on average achieve over 93% of the global optimum. Figure 4.6 provides additional detail, showing the distribution of running eQual on the six subject systems 100 times using the edge-case strategy, with generation sizes of 50, 100, and 200. The figure shows that, with larger generation sizes, eQual is able to produce variants that, on average, tend to match the reported global optimum for each system; in the case of Clasp, the
[Figure 4.7 appears here: Total Execution Time (s) vs. Number of Simulation Nodes (2, 4, 8, 16), for C=500, C=1000, and C=2000.]

Figure 4.7: eQual's scalability with respect to the number of simulation nodes.
lone exception, the quality of eQual's suggested variant was still over 90% of the global
optimum.
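The exploration loop behind these results can be illustrated with a minimal genetic algorithm using the reported hyper-parameters (cross-over 0.85, mutation 0.35, 4 generations of size 200). The encoding, selection scheme, and fitness function below are stand-ins of our own, not eQual's actual implementation:

```python
import random

CROSSOVER, MUTATION, GENERATIONS, POP_SIZE = 0.85, 0.35, 4, 200

def fitness(variant):
    # Stand-in objective: maximum (0.0) at all variation points == 0.5.
    return -sum((v - 0.5) ** 2 for v in variant)

def explore(n_points=5, rng=random.Random(0)):
    # Random seeding: each variant assigns a value in [0, 1) per point.
    pop = [[rng.random() for _ in range(n_points)] for _ in range(POP_SIZE)]
    for _ in range(GENERATIONS):
        pop.sort(key=fitness, reverse=True)
        elite = pop[: POP_SIZE // 2]          # truncation selection
        children = []
        while len(children) < POP_SIZE:
            a, b = rng.sample(elite, 2)
            child = list(a)
            if rng.random() < CROSSOVER:      # one-point cross-over
                cut = rng.randrange(1, n_points)
                child = a[:cut] + b[cut:]
            if rng.random() < MUTATION:       # re-sample one variation point
                child[rng.randrange(n_points)] = rng.random()
            children.append(child)
        pop = children
    return max(pop, key=fitness)

best = explore()
print(fitness(best))   # close to 0.0, the optimum of the stand-in objective
```

Even this toy loop shows why a handful of small generations suffices: selection quickly concentrates the population near the optimum of the objective.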
4.3.3 Scalability
To evaluate eQual's scalability, we used the Google Compute Engine (GCE). We created 16 n1-standard-1 nodes (the most basic configuration available in GCE, with 1 vCPU and 3.75 GB memory) as simulation nodes, and a single n1-standard-2 node (2 vCPU and 7.5 GB memory) as the central controller node. All nodes were located in Google's us-central1-f datacenter. We used the variation points and non-functional property descriptions described in Section 3.3.
[Figure 4.8 appears here: box plots of Generated Simulation Data (MB) for c=500, c=1000, and c=2000.]

Figure 4.8: Sizes of data files generated by simulations for different values of Computation Size (c).
Number of Nodes
The first aspect of eQual's scalability we evaluated is the observed speed-up when increasing the number of employed simulation nodes. We ran the general genetic algorithm for 8 generations with a generation size of 256 variants, totaling 2,048 variants. We did this with 2, 4, 8, and 16 simulation nodes, and for three values of the Computation Size variation point. The execution time was inversely proportional to the number of nodes (Figure 4.7), suggesting that our approach can be scaled up optimally by adding more nodes. Using more powerful nodes can further speed up the computation. Note that each data point in Figure 4.7 consists of 2,048 simulations. Overall, we simulated more than 24,500 design variants.
Number of Events
For the second part of the scalability evaluation, we measured the impact of increasing the
number of events that are generated during a simulation. This is indicative of how well
eQual performs on larger models. The total number of events generated in Hadoop is non-
deterministic. However, based on the characteristics of the model, we hypothesized that
increasing the Computation Size should increase the number of events roughly linearly
if other variation points remain unchanged. We empirically evaluated this hypothesis
by using the average sizes of the time-series object les generated during simulation as
a reasonable proxy for the number of events. Figure 4.8 shows that, on average, the
total number of events is directly proportional to Computation Size. Coupled with the
performance that eQual demonstrated for the same values of Computation Size, shown
in Figure 4.7 and discussed above, this is indicative of eQual's scalability in the face of
growing numbers of simulation events.
Number of Variants
Finally, we studied eQual's performance in the face of growing numbers of design variants. We modified the genetic algorithm configurations to use five generation sizes: 16, 32, 64, 128, and 256. For each generation size, we ran eQual for 8 generations, on 4, 8, and 16 nodes. Figure 4.9 shows that eQual is able to analyze over 2,000 design variants in ~120 minutes on 4 nodes, with a speed-up linear in the number of nodes, down to ~30 minutes on 16 nodes.
[Figure 4.9 appears here: Total Execution Time (s) vs. Number of Variants (up to 2,048), for n=4, n=8, and n=16 nodes.]

Figure 4.9: eQual's scalability with respect to the number of variants.
4.3.4 Threats to Validity
While the evaluation of eQual indicates that it can easily scale, has a small footprint, and finds accurate solutions, we also took several steps to mitigate the possible threats to our work's validity.
External Validity
The threat to the external validity of our work is mitigated by the use of an MDA-based approach. MDA solutions have been shown to be sufficiently robust and scalable, and are widely used in research and industry [74, 83].
Construct Validity
Two constituent parts of eQual help mitigate the threats to its construct validity:
1. The dynamic analysis of system designs by simulation.
2. Creating assessment models based on DTW.
In the past, these two solutions have been used extensively and to great success. Discrete event simulations have been used in a plethora of domains (e.g., avionics, robotics, healthcare, computer networks, finance) [153, 159], and the bedrock of our assessment models, DTW, is so prevalent that it is "difficult to overstate its ubiquity" [43, 133]. DTW also subsumes Euclidean distance [43] as a special case [133], which increases eQual's range of applicability.
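For reference, the classic dynamic-programming formulation of DTW, on which our assessment models build, can be sketched as follows (a minimal O(n·m) version without windowing constraints):

```python
def dtw(a, b):
    """Dynamic time warping distance between two numeric time-series.

    If warping is disabled (only aligned indices allowed), this reduces
    to a lock-step Euclidean-style distance, which DTW subsumes.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # d[i][j]: cost of the best warping path aligning a[:i] with b[:j].
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j],      # a[i-1] repeats
                                 d[i][j - 1],      # b[j-1] repeats
                                 d[i - 1][j - 1])  # one-to-one match
    return d[n][m]

print(dtw([1, 2, 3], [1, 2, 2, 3]))   # 0.0: the series warp onto each other
```

The repeated element in the second series incurs no cost, which is exactly why DTW is well suited to comparing simulation time-series whose events occur at slightly different rates.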
Chapter 5
Related Work
This dissertation covers two threads of related research:
1. Software architecture analysis, especially studies justifying, modeling, and recovering software architectures and their constituent design decisions.
2. Quality assessment of software systems, more specifically, techniques targeting software models and design variants.
To that end, Section 5.1 covers concepts related to the definition of software architecture, architecture recovery techniques, and studies leveraging these concepts to understand software evolution. Section 5.2 explains the work related to the assessment and analysis of software systems using architectural models. We further detail each research thread in its respective section.
5.1 Architecture-Based Software Analysis
Software architecture has emerged as the centerpiece of modern software development [157]. Developers increasingly rely on software architecture to lead them through the process of creating and implementing large and complex systems. For that reason, the software engineering research community has devoted substantial effort to supporting engineers in better understanding software architecture.
For many years, these efforts focused on the result, i.e., the consequences of the design decisions made, trying to capture them in the architecture of the system under consideration, often using a graphical representation. Such representations were, and to a great extent still are, centered on views [89], as captured by the ISO/IEC/IEEE 42010 standard [34], or on the use of an architecture description language (ADL) [110]. However, this approach to documenting software architectures can cause problems such as expensive system evolution, lack of stakeholder communication, and limited reusability of architectures [149]. Therefore, architecture as a set of design decisions was proposed to address these shortcomings. This new paradigm focuses on capturing and documenting the rationale, constraints, and alternatives of design decisions [172]. In the remainder of this section, we touch upon the related efforts in this area.
Architecture Recovery and Evolution
More often than not, a system's up-to-date and reliable architecture is not available. This
leads to a host of problems ranging from inadvertent accumulation of technical debt to
increasing maintenance costs [77]. To address these problems, researchers have proposed
numerous automated architecture recovery techniques.
Most automated software architecture recovery techniques focus on the structure of systems rather than the design decisions forming those systems. These recovery techniques have been around for over three decades [18, 73, 142, 143]. They typically cluster implementation-level entities (e.g., files, classes, or functions), where each cluster denotes a component [84, 103, 179]. Garcia et al. conducted a comparative study of the most notable recovery techniques [56]. Their analyses showed that ACDC [164] and ARC [57] exhibit the best accuracy among the studied techniques. For this reason, we use ACDC and ARC in our work as well.
Automated architecture recovery techniques have been, directly and indirectly, used to study software evolution. In our recent works [90, 93], we have shown the benefit of studying architectural evolution by proposing a framework to correlate a software system's quality with the evolution of its architecture. Several other studies [39, 72, 120, 161, 177] have attempted to investigate architectural evolution. These studies are smaller in scope than our prior work and the work presented in this dissertation. Additionally, unlike RecovAr and PredictAr, these techniques focus on the structural architecture of a system and not its constituent design decisions.
Architectural Design Decisions
Understanding architectural decisions is important for software maintenance and comprehension. A number of studies have been conducted to justify its necessity and show its concrete benefits. Falessi et al. argued for the value of capturing and explicitly documenting design decisions [50]. Burge et al. showed that understanding architectural decisions helps make better decisions [27]. Tang et al. empirically showed that knowing the architectural decisions can improve the quality of software systems [156].
Some researchers have focused on the importance of having architectural awareness, i.e., an ability to understand and assess varied aspects of the software architecture and architectural decisions. Tyree et al. [162] described the importance of design decisions in demystifying the software architecture and filling in the shortcomings of traditional approaches, such as RM-ODP (Reference Model for Open Distributed Processing) [135] or 4+1 [89]. They devised a methodology for architects to document architectural design decisions, requirements, and pertinent assumptions. Nowak et al. also suggested that architectural awareness can enhance the efficiency and quality of the architecture design process, and they proposed a methodology to manually capture the architectural knowledge in the design process [122]. Other decision-centric approaches (e.g., [36, 186]) have been proposed to direct the derivation of target architectures from requirements. These techniques aim to make design rationale reusable. RecovAr can augment these techniques and reduce architects' burden by pointing to existing decisions where such documents do not exist.
Jansen and Bosch [24, 77] defined architectural design decisions and argued that invaluable information is lost when architecture is modeled using purely structural elements. Several researchers have focused on studying the concrete benefits of using design decisions in improving software systems' quality [100, 156], and in decision making under uncertainty [27]. A recent survey by Weinreich et al. [173] showed that knowledge vaporization is a problem in practice, even at the individual level. However, unlike RecovAr, none of these studies have focused on the automatic recovery of undocumented design decisions.
Roeller et al. [136] proposed RAAM to support reconstructing the picture of a system's assumptions, i.e., its early architectural design decisions. A serious shortcoming of this approach is that the researchers need to acquire a deep understanding of the software system to reconstruct the assumptions. ADDRA [78] was designed to recover architectural design decisions in an after-the-fact documentation effort. It was built on the premise that, in practice, software architectures are often documented after the fact, i.e., when a system is realized and architectural design decisions have been taken. Similar to RAAM, and unlike our approach, ADDRA also relies on architects to articulate their "tacit" knowledge.
Speculative Software Analysis
Applying machine learning and natural language processing techniques to software comprehension and maintenance has been receiving increasing attention from the research community. In particular, issue data extracted from software repositories is widely used, as it contains important information related to bugs and software quality [28]. Antoniol et al. built a classifier using machine learning techniques to classify issues into two classes: bugs and non-bugs [9]. Wiese et al. used issues as contextual data to improve the co-change prediction model of software systems, i.e., a model that helps developers become aware of artifacts that will change together with the artifact they are working on [178]. Weiss et al. employed a nearest-neighbors technique to automatically predict the fixing effort of issues, to facilitate issue assignment and maintenance scheduling [174].
Furthermore, there is a rich corpus of defect prediction techniques that are tangentially related to our work. A 2015 study by Malhotra [102] and a 2012 systematic review by Hall et al. [67] identified over 250 such studies. Predictive models have also been employed for software cost and quality estimation. For instance, Wen et al. [175] identified and studied 84 machine-learning-based software effort estimation models. Their study showed that machine-learning models perform at close to acceptable levels and outperform non-machine-learning-based techniques in most scenarios.
There are two major differences between our techniques and the existing approaches in this area: (1) the existing work does not explicitly take advantage of the architectural information about a software system, and (2) existing research typically aims to locate immediate defects in software systems, whereas PredictAr identifies issues whose resolution can lead to inadvertent architectural change, which can cause defects further down the line.
5.2 Software Quality Assessment
Researchers have tackled multiple aspects of quality assessment of software models over the past two decades. The Object Management Group (OMG) extended the UML to support non-functional property specifications [65]. The Palladio Component Model (PCM) is another language that can be used to model applications and their non-functional properties. It uses the PCM-Bench tool to derive a linear-quadratic model and estimate the performance of the system [16]. These models and many more have been built to support non-functional property specifications in different scenarios. They have been employed by multiple approaches for design exploration and optimization. A survey of such models can be found in [8, 14, 87]. Mian et al. [114] proposed an approach to transform models built using different languages into each other. In this section, we extend the classification proposed by Marten et al. in [105] to describe the most closely related work.
Rule-Based
Rule-based approaches identify problems in a software design model and apply rules to repair them. MOSES uses stepwise refinement and simulation for performance analysis [35]. ArchE helps meet quality requirements during the design phase by supporting modifiability and performance analysis, and suggesting potential design improvements [107]. DeepCompass explores the design space of embedded systems, relying on the ROBOCOP component model and a Pareto analysis to resolve the conflicting goals of optimal performance and cost between different architecture candidates [21]. Parsons et al. [127] introduced an approach for detecting performance anti-patterns in Java-EE architectures. Their method requires an implementation of a component-based system, which can be monitored for performance. Since the approach is meant to improve existing systems, it cannot be used during early development stages. PUMA facilitates communication between systems designed in UML and non-functional property prediction tools [180], focusing on performance, and supports feedback-loop specification in JESS [182]. FORMULA aims to overcome the challenge of the size of the design space in DSE by exploring only a small subset of the space, removing the design candidates considered equivalent based on a user-defined notion of equivalence [80]. This heavily relies on "correct" estimates of the non-functional attributes of the model from the users. Unlike eQual, these approaches are also limited by their predefined rules and cannot explore the complete design space.
Metaheuristic
Metaheuristic approaches treat architecture improvement as an optimization problem. DeSi [101], ArcheOpterix [8], and PerOpteryx [86] use evolutionary algorithms to optimize a system's deployment with respect to quality criteria. Multi-criteria genetic algorithms can automatically improve software architecture based on trade-off analyses, but existing approaches (e.g., [105]) tend to suffer from scalability issues. AQOSA provides modeling based on AADL and performance analysis tools, and evaluates design alternatives based on cost, reliability, and performance [97]. The SASSY framework targets service-oriented architecture models and selects services and a pattern application to fulfill the quality requirements [111]. Metaheuristic simulation based on a genetic algorithm can derive the deployment architecture and runtime reconfiguration rules to move a legacy application to the cloud environment [53]. Mixed-integer linear programming can find the minimum-cost configuration for a given cloud-based application [10]. DESERT explores design alternatives by modeling system variations in a tree structure and using Boolean constraints to cut branches without feasible solutions [44]. DESERT-FD automates the constraint generation process and design space exploration [44]. GDSE uses meta-programming of domain-specific design space exploration problems and expresses constraints for solvers to generate architectural solutions [139]. GuideArch is the most closely related technique for exploring design decisions [49], but it relies on more burdensome fuzzy-math system specifications [185] than eQual requires (recall Section 4.3.1). eQual similarly uses metaheuristic search to find architectural solutions within a large design space, but focuses on non-functional properties and requires less work from architects to use.
Software Product Lines
Software Product Lines (SPLs) typically target a fundamentally different problem than eQual. SPLs [33, 68, 141, 152] allow for product derivation, the process of configuring reusable software artifacts for a set of requirements. Unlike SPLs, eQual neither adds nor removes features from a product. SPLs can use genetic algorithms to optimize feature selection [66], but, unlike eQual, this requires developers to create objective functions to measure each variant's fitness. Combining multi-objective search and constraint solving allows configuring large SPLs [70], but, unlike eQual, this requires using historical data about the SPL's products to evaluate variants, which prevents it from being used for new systems. Optimizing highly configurable systems, despite being aimed at fully implemented and deployed software systems, has clear relations to eQual. Among these techniques, we used the studies conducted by Siegmund et al. [150, 151] to evaluate the effectiveness of eQual. Oh et al. [123] and Sayyad [140] have recently devised techniques to more efficiently explore the space of system configurations. These techniques can complement eQual's exploration strategies.
Chapter 6
Concluding Remarks
Reliance on software architecture to lead engineers through the process of creating
and implementing large and complex systems is steadily increasing. Therefore, it is
imperative that architects and engineers alike understand and analyze the constituent
design decisions that yield a system's architecture. Despite their far-reaching implications,
design decisions are rarely thoroughly analyzed. Moreover, design decisions are
typically a lost artifact of the architecture creation and maintenance process, as they are
not documented. These problems lead to the creation of sub-optimal systems and exacerbate
knowledge vaporization in existing systems. In this dissertation, we presented our
techniques for methodically exploring software development alternatives. We devised,
described, and empirically and analytically evaluated three techniques that help engineers
overcome the challenges arising from making and maintaining design decisions at different
stages of a software development project.
RecovAr was a step toward addressing the problems arising from knowledge vaporization
and architectural erosion (Section 3.1). We formally defined the notion of an
architectural design decision. Using a project's readily available history artifacts (e.g., an
issue tracker or code repository), RecovAr automatically recovers the architectural design
decisions embodied in that system. We empirically examined how design decisions manifest
in software systems, using two large, widely-adopted open-source software systems.
In our evaluation, RecovAr exhibited high accuracy and recall (Section 4.1).
In Sections 3.2 and 4.2, we described a method for automatically detecting architecturally
significant issues and classifying them based on the textual and non-textual
information contained in each issue. Expanding on these results, we built a classification
model to predict the architectural significance of newly submitted issues. PredictAr
aims to raise architectural awareness, thus helping engineers deliver higher-quality code
based on well-informed decisions. Our study was conducted on five large open-source
software projects. Using our automated detection technique, we analyzed 21,062 issues and
identified their architectural significance. Our results showed that current categorizations
of issues (type and priority) do not effectively encompass architectural significance.
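As a rough illustration of this kind of issue classification, the sketch below trains a minimal multinomial naive Bayes model (with add-one smoothing) on issue summaries. The example issues, labels, and tokenizer are hypothetical; this is not PredictAr's actual feature set or model.

```python
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

class NaiveBayes:
    """Multinomial naive Bayes: a minimal stand-in for a classifier
    that flags architecturally significant issues from their text."""
    def fit(self, texts, labels):
        self.label_counts = Counter(labels)
        self.word_counts = {lab: Counter() for lab in self.label_counts}
        for text, lab in zip(texts, labels):
            self.word_counts[lab].update(tokenize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        def log_prob(lab):
            # log prior plus smoothed log likelihood of each known word
            lp = math.log(self.label_counts[lab] / sum(self.label_counts.values()))
            total = sum(self.word_counts[lab].values()) + len(self.vocab)
            for w in tokenize(text):
                if w in self.vocab:
                    lp += math.log((self.word_counts[lab][w] + 1) / total)
            return lp
        return max(self.label_counts, key=log_prob)

# Hypothetical training issues labeled by architectural significance.
issues = ["refactor the messaging middleware interfaces",
          "fix typo in user guide",
          "split the monolithic scheduler component",
          "update copyright year in footer"]
labels = ["significant", "not", "significant", "not"]
clf = NaiveBayes().fit(issues, labels)
pred = clf.predict("redesign component interfaces in the scheduler")
```

A production classifier would also exploit the non-textual signals mentioned above (e.g., issue type and priority) as additional features.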
Finally, we introduced eQual (Sections 3.3 and 4.3). Our technique guides architects
in making informed choices by quantifying the consequences of their decisions early and
throughout the design process. eQual does so in a way that aims to minimize additional
burden on the architects, instead only providing structure and automated support for the
architects' already existing tasks. eQual is able to solve very large problems efficiently,
guiding architects to select high-quality architectural variants for their system.
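One standard building block for comparing variants across several quality attributes, as eQual does, is Pareto-dominance filtering [29]: discard any variant that some other variant matches or beats on every dimension and strictly beats on at least one. The sketch below illustrates the idea on hypothetical variant scores; it is not eQual's actual ranking algorithm.

```python
def dominates(a, b):
    """a dominates b if it is at least as good on every quality dimension
    (higher is better here) and strictly better on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(variants):
    """Keep only the variants not dominated by any other variant."""
    return {name: q for name, q in variants.items()
            if not any(dominates(other, q)
                       for o, other in variants.items() if o != name)}

# Hypothetical variants scored on (availability, throughput, negated cost),
# so that "higher is better" holds uniformly across dimensions.
variants = {"A": (0.99, 120, -30), "B": (0.95, 150, -30),
            "C": (0.90, 100, -40), "D": (0.99, 120, -35)}
front = pareto_front(variants)
```

Here D is eliminated because A matches it on availability and throughput but costs less, and C is eliminated because A beats it everywhere; the remaining candidates A and B embody a genuine trade-off for the architect to weigh.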
eQual was envisioned because the perceived criticality of the early decisions is not
reflected in the support available to architects for making and evaluating them. The
work described in this dissertation provides an important step in addressing the chasm
between the needed and available support in this area. eQual provides a body of features,
such as ranking the alternatives, finding the optimal alternative, analyzing the critical
boundary decisions, and simultaneously visualizing multiple alternatives, that collectively
help architects easily explore the design space without assuming a particular level of
expertise or experience with the system. eQual is tunable, allowing the architects to
modify or define their own exploration strategies and functions.
We evaluated eQual in three ways. First, we demonstrated that using eQual is
significantly easier than the state-of-the-art alternative that targets the same problem,
GuideArch [49]. As a representative illustration, an architect only needed to spend six
minutes to interactively answer the 27 questions eQual requires to analyze a recently
published model of Hadoop [25, 26], while GuideArch's 356 inputs required more than
four hours. Second, we extensively evaluated eQual's effectiveness. We showed that the
quality of the designs eQual produces is higher than that of GuideArch. We additionally
evaluated the quality of designs yielded by eQual against previously published ground
truths obtained from real-world software systems. We showed that eQual recommends
variants that are of comparable quality to the real-world solutions determined to be optimal,
while it significantly outperforms the nominal solutions that are most commonly
selected by architects. Lastly, we demonstrated that eQual scales optimally with the
number of available computational nodes, system events, and system variants.
In summary, this dissertation puts forth the following contributions. In the context
of RecovAr, our contributions are:
1. Formal definition of an architectural design decision and a technique for tracing
such decisions in existing software project histories.
2. A method to classify whether decisions are architectural and to map those decisions
to code elements.
3. Empirical examination of how design decisions manifest in software systems, using
two large, widely-used systems.
4. A methodology for preserving design-decision knowledge in software projects.
PredictAr adds the following contributions:
1. A classifier for predicting the architectural significance of newly submitted issues.
2. A reusable dataset of 21,062 issues identified across five large open-source software
systems that are labeled by their architectural significance.
Finally, eQual makes the following contributions:
1. A method for automatically generating architectural assessment models from simple
inputs that architects provide.
2. Bipartite Relative Time-series Assessment, a technique for efficient, distributed
analysis of simulation-generated data, solving a previously prohibitively inefficient
variant-assessment problem.
3. An architecture for seamlessly distributing and parallelizing simulations to multiple
nodes.
4. An evaluation of each of eQual's three goals on real-world systems, comparing to the
state-of-the-art alternative, and demonstrating its ease of use, accuracy of produced
solutions, and scalability.
5. An extensible platform for using general-purpose or proprietary evolutionary
algorithms to automate design-space exploration.
6.1 Future Work
There are a number of remaining research challenges that can be addressed in future
work. There is a slew of information in software repositories that can help increase the
accuracy of RecovAr, including comments, commit messages, documentation, pull
requests, and tests. RecovAr can be extended with a summarization technique to
provide succinct summaries of the recovered rationales and consequences. Also, deploying
RecovAr at a larger scale will enable us to answer several open research questions,
such as: how many design decisions are typically made? Another interesting thrust is
to predict the consequences of issues, i.e., the parts of the system that can be affected, and
guide developers before they make changes. PredictAr was a successful first step in this
direction.
Furthermore, our study can be expanded to more systems by adding support for
other issue trackers. The performance of PredictAr's classification model can be improved
by adapting recent advances in the field of generative adversarial nets, which in theory
can be used to artificially augment the size of the training dataset [183]. Furthermore,
based on our positive results, we encourage researchers to explore the feasibility of similar
techniques to predict the non-functional effects of implementation issues (e.g., security
and reliability) in existing software systems [94, 95, 138].
While our results show promise, research challenges remain in improving eQual's
practical effectiveness. Our work to date has assumed that architects know the relative
importance of the non-functional properties in their systems. eQual allows architects to
change the non-functional properties' importance coefficients, but future versions need
to actively guide architects in the identification of design hot-spots and help their
understanding of the relative importance of the properties. Combining eQual with a software
architecture recovery technique will extend its applicability to existing software systems
with legacy architectures, for their future development and improvement. Our work on
automatically extracting design decisions (recall Section 3.1) from implemented systems
has been a successful first step in this direction. Our successful application of eQual to
the six implemented real-world systems with known fitness models [150, 151] provides
further confidence in the likely success of this strategy.
References
[1] Apache Struts. http://struts.apache.org/.
[2] Apache Hadoop. http://hadoop.apache.org, 2017.
[3] Apache Software Foundation. http://apache.org, 2017.
[4] Eclipse Jetty. https://eclipse.org/jetty/, 2017.
[5] Jira. https://www.bugzilla.org/, 2017.
[6] Brent Agnew, Christine Hofmeister, and James Purtilo. Planning for change: A reconfiguration language for distributed systems. Distributed Systems Engineering, 1(5):313, 1994.
[7] Tariq Al-Naeem, Ian Gorton, Muhammed Ali Babar, Fethi Rabhi, and Boualem Benatallah. A quality-driven systematic approach for architecting distributed software applications. In International Conference on Software Engineering (ICSE), pages 244–253, 2005.
[8] Aldeida Aleti, Barbora Buhnova, Lars Grunske, Anne Koziolek, and Indika Meedeniya. Software architecture optimization methods: A systematic literature review. IEEE Transactions on Software Engineering (TSE), 39(5):658–683, 2013.
[9] Giuliano Antoniol, Kamel Ayari, Massimiliano Di Penta, Foutse Khomh, and Yann-Gaël Guéhéneuc. Is it a bug or an enhancement?: A text-based approach to classify change requests. In Proceedings of the 2008 Conference of the Center for Advanced Studies on Collaborative Research: Meeting of Minds, pages 23:304–23:318. ACM, 2008.
[10] Danilo Ardagna, Giovanni Paolo Gibilisco, Michele Ciavotta, and Alexander Lavrentev. A multi-model optimization framework for the model driven design of cloud applications. In Search-Based Software Engineering, pages 61–76. Springer, 2014.
[11] Hamid Bagheri, Joshua Garcia, Alireza Sadeghi, Sam Malek, and Nenad Medvidovic. Software architectural principles in contemporary mobile software: from conception to practice. Journal of Systems and Software, 119:31–44, 2016.
[12] Hamid Bagheri and Kevin Sullivan. Pol: specification-driven synthesis of architectural code frameworks for platform-based applications. In ACM SIGPLAN Notices, volume 48, pages 93–102. ACM, 2012.
[13] Hamid Bagheri, Chong Tang, and Kevin Sullivan. Trademaker: automated dynamic analysis of synthesized tradespaces. In Proceedings of the 36th International Conference on Software Engineering, pages 106–116. ACM, 2014.
[14] Simonetta Balsamo, Antinisca Di Marco, Paola Inverardi, and Marta Simeoni. Model-based performance prediction in software development: A survey. IEEE Transactions on Software Engineering, 30(5):295–310, 2004.
[15] Jagdish Bansiya and Carl G Davis. A hierarchical model for object-oriented design quality assessment. IEEE Transactions on Software Engineering (TSE), 28(1):4–17, 2002.
[16] Steffen Becker, Heiko Koziolek, and Ralf Reussner. The Palladio component model for model-driven performance prediction. Journal of Systems and Software, 82(1):3–22, 2009.
[17] Pooyan Behnamghader, Duc Minh Le, Joshua Garcia, Daniel Link, Arman Shahbazian, and Nenad Medvidovic. A large-scale study of architectural evolution in open-source software systems. Empirical Software Engineering (EMSE), pages 1–48, 2016.
[18] LA Belady and CJ Evangelisti. System partitioning and its measure. Journal of Systems and Software (JSS), 1981.
[19] Steven Bird and Edward Loper. NLTK: the natural language toolkit. In Proceedings of the ACL 2004 on Interactive Poster and Demonstration Sessions, page 31. Association for Computational Linguistics, 2004.
[20] David M Blei. Probabilistic topic models. Communications of the ACM, 55(4):77–84, 2012.
[21] Egor Bondarev, Michel RV Chaudron, and Erwin A de Kock. Exploring performance trade-offs of a JPEG decoder using the DeepCompass framework. In Proceedings of the 6th International Workshop on Software and Performance, pages 153–163. ACM, 2007.
[22] Bas Boone, Sofie Van Hoecke, Gregory Van Seghbroeck, Niels Joncheere, Viviane Jonckers, Filip De Turck, Chris Develder, and Bart Dhoedt. Salsa: QoS-aware load balancing for autonomous service brokering. Journal of Systems and Software, 83(3):446–456, 2010.
[23] Aleksandr Alekseevich Borovkov. Asymptotic Methods in Queuing Theory. John Wiley & Sons, 1984.
[24] Jan Bosch. Software architecture: The next step. In European Workshop on Software Architecture, pages 194–199. Springer, 2004.
[25] Yuriy Brun, George Edwards, Jae young Bang, and Nenad Medvidovic. Smart redundancy for distributed computation. In International Conference on Distributed Computing Systems (ICDCS), pages 665–676, Minneapolis, MN, USA, June 2011.
[26] Yuriy Brun, Jae young Bang, George Edwards, and Nenad Medvidovic. Self-adapting reliability in distributed software systems. IEEE Transactions on Software Engineering (TSE), 41(8):764–780, August 2015.
[27] Janet E Burge. Design rationale: Researching under uncertainty. Artificial Intelligence for Engineering Design, Analysis and Manufacturing, 22(04):311–324, 2008.
[28] Yguaratã Cerqueira Cavalcanti, Paulo Anselmo Mota Silveira Neto, Ivan do Carmo Machado, Tassio Ferreira Vale, Eduardo Santana Almeida, and Silvio Romero de Lemos Meira. Challenges and opportunities for software change request repositories: a systematic mapping study. Journal of Software: Evolution and Process, 26(7):620–653, 2014.
[29] Yair Censor. Pareto optimality in multiobjective problems. Applied Mathematics & Optimization, 4(1):41–59, 1977.
[30] Jane Cleland-Huang, Raffaella Settimi, Oussama BenKhadra, Eugenia Berezhanskaya, and Selvia Christina. Goal-centric traceability for managing non-functional requirements. In International Conference on Software Engineering (ICSE), pages 362–371. ACM, 2005.
[31] Jane Cleland-Huang, Raffaella Settimi, Xuchang Zou, and Peter Solc. Automated classification of non-functional requirements. Requirements Engineering, 12(2):103–120, 2007.
[32] Paul C Clements. Software Architecture in Practice. PhD thesis, Software Engineering Institute, 2002.
[33] Thelma Elita Colanzi, Silvia Regina Vergilio, Itana Gimenes, and Willian Nalepa Oizumi. A search-based approach for software product line design. In Proceedings of the 18th International Software Product Line Conference - Volume 1, pages 237–241. ACM, 2014.
[34] International Organization for Standardization / International Electrotechnical Commission. ISO/IEC 42010:2011 – Systems and software engineering – Recommended practice for architectural description of software-intensive systems. Technical report, ISO, 2011.
[35] Vittorio Cortellessa, Pierluigi Pierini, Romina Spalazzese, and Alessio Vianale. MOSES: Modeling software and platform architecture in UML 2 for simulation-based performance analysis. In Quality of Software Architectures. Models and Architectures, pages 86–102. Springer, 2008.
tures, pages 86{102. Springer, 2008.
132
[36] X. Cui, Y. Sun, and H. Mei. Towards automated solution synthesis and rationale
capture in decision-centric architecture design. In WICSA, pages 221{230, 2008.
[37] Krzysztof Czarnecki and Simon Helsen. Classication of model transformation ap-
proaches. In Proceedings of the 2nd OOPSLA Workshop on Generative Techniques
in the Context of the Model Driven Architecture, volume 45, pages 1{17. USA, 2003.
[38] Marco D'Ambros, Alberto Bacchelli, and Michele Lanza. On the impact of design
aws on software defects. In QSIC 2010 (10th International Conference on Quality
Software), pages 23{31. IEEE, 2010.
[39] Marco D'Ambros, Harald Gall, Michele Lanza, and Martin Pinzger. Analysing
software repositories to understand software evolution. In Software Evolution, pages
37{67. Springer, 2008.
[40] Eric Dashofy, Hazel Asuncion, Scott Hendrickson, Girish Suryanarayana, John
Georgas, and Richard Taylor. Archstudio 4: An architecture-based meta-modeling
environment. In International Conference on Software Engineering (ICSE) Demo
track, pages 67{68, 2007.
[41] Pablo de Oliveira Castro, St ephane Louise, and Denis Barthou. Reducing memory
requirements of stream programs by graph transformations. In High Performance
Computing and Simulation (HPCS), 2010 International Conference on, pages 171{
180. IEEE, 2010.
[42] Lakshitha De Silva and Dharini Balasubramaniam. Controlling software architec-
ture erosion: A survey. Journal of Systems and Software, 85(1):132{151, 2012.
[43] Hui Ding, Goce Trajcevski, Peter Scheuermann, Xiaoyue Wang, and Eamonn
Keogh. Querying and mining of time series data: experimental comparison of repre-
sentations and distance measures. Proceedings of the VLDB Endowment, 1(2):1542{
1552, 2008.
[44] Brandon K Eames, Sandeep K Neema, and Rohit Saraswat. Desertfd: a nite-
domain constraint based tool for design space exploration. Design Automation for
Embedded Systems, 14(1):43{74, 2010.
[45] Eclipse Modeling Framework Project. www.eclipse.org/emf/.
[46] George Edwards, Yuriy Brun, and Nenad Medvidovic. Automated analysis and
code generation for domain-specic models. In 2012 Joint Working IEEE/IFIP
Conference on Software Architecture (WICSA) and European Conference on Soft-
ware Architecture (ECSA)., pages 161{170. IEEE, 2012.
[47] George Edwards and Nenad Medvidovic. A methodology and framework for cre-
ating domain-specic development infrastructures. In International Conference on
Automated Software Engineering (ASE), pages 168{177. IEEE, 2008.
[48] Naeem Esfahani. Management of Uncertainty in Self-Adaptive Software. PhD thesis, George Mason University, 2014.
[49] Naeem Esfahani, Sam Malek, and Kaveh Razavi. GuideArch: Guiding the exploration of architectural solution space under uncertainty. In International Conference on Software Engineering (ICSE), pages 43–52, 2013.
[50] Davide Falessi, Lionel C Briand, Giovanni Cantone, Rafael Capilla, and Philippe Kruchten. The value of design rationale information. ACM Transactions on Software Engineering and Methodology (TOSEM), 22(3):21, 2013.
[51] Roy T Fielding and Richard N Taylor. Principled design of the modern web architecture. ACM Transactions on Internet Technology (TOIT), 2(2):115–150, 2002.
[52] George Forman and Ira Cohen. Learning from little: Comparison of classifiers given little training. In European Conference on Principles of Data Mining and Knowledge Discovery, pages 161–172. Springer, 2004.
[53] Sören Frey, Florian Fittkau, and Wilhelm Hasselbring. Search-based genetic optimization for deployment and reconfiguration of software in the cloud. In International Conference on Software Engineering (ICSE), pages 512–521. IEEE Press, 2013.
[54] Nir Friedman, Dan Geiger, and Moises Goldszmidt. Bayesian network classifiers. Machine Learning, 29(2-3):131–163, 1997.
[55] Harald Ganzinger, Robert Giegerich, Ulrich Möncke, and Reinhard Wilhelm. A truly generative semantics-directed compiler generator. ACM SIGPLAN Notices, 17(6):172–184, June 1982.
[56] Joshua Garcia, Igor Ivkovic, and Nenad Medvidovic. A comparative analysis of software architecture recovery techniques. In International Conference on Automated Software Engineering (ASE), pages 486–496. IEEE, 2013.
[57] Joshua Garcia, Daniel Popescu, Chris Mattmann, Nenad Medvidovic, and Yuanfang Cai. Enhancing architectural recovery using concerns. In International Conference on Automated Software Engineering (ASE), pages 552–555. IEEE Computer Society, 2011.
[58] David Garlan, Robert Monroe, and David Wile. ACME: An architecture description interchange language. In CASCON First Decade High Impact Papers, pages 159–173, 2010.
[59] Miguel Garzon et al. Umple: A framework for model driven development of object-oriented systems. In Software Analysis, Evolution and Reengineering (SANER), pages 494–498. IEEE, 2015.
[60] The Generic Modeling Environment. www.isis.vanderbilt.edu/Projects/gme/.
[61] Walter R Gilks. Markov chain Monte Carlo. Wiley Online Library, 2005.
[62] Git. Git log. http://git-scm.com/docs/git-log, 2014.
[63] GitHub. https://github.com/, 2017.
[64] Swapna S Gokhale. Software application design based on architecture, reliability and cost. In Computers and Communications, 2004. Proceedings. ISCC 2004. Ninth International Symposium on, volume 2, pages 1098–1103. IEEE, 2004.
[65] Object Management Group. UML profile for schedulability, performance and time specification. Version 1.1, formal/05-01-02, 2005.
[66] Jianmei Guo, Jules White, Guangxin Wang, Jian Li, and Yinglin Wang. A genetic algorithm for optimized feature selection with resource constraints in software product lines. Journal of Systems and Software, 84(12):2208–2221, 2011.
[67] Tracy Hall, Sarah Beecham, David Bowes, David Gray, and Steve Counsell. A systematic literature review on fault prediction performance in software engineering. IEEE Transactions on Software Engineering, 38(6):1276–1304, 2012.
[68] Svein Hallsteinsen, Mike Hinchey, Sooyong Park, and Klaus Schmid. Dynamic software product lines. Computer, 41(4), 2008.
[69] Jane Huffman Hayes, Alex Dekhtyar, and Senthil Karthikeyan Sundaram. Advancing candidate link generation for requirements tracing: The study of methods. IEEE Transactions on Software Engineering, 32(1):4–19, 2006.
[70] Christopher Henard, Mike Papadakis, Mark Harman, and Yves Le Traon. Combining multi-objective search and constraint solving for configuring large software product lines. In International Conference on Software Engineering (ICSE), volume 1, pages 517–528. IEEE, 2015.
[71] Scott A. Hissam et al. Packaging predictable assembly. In Proceedings of the IFIP/ACM Working Conference on Component Deployment (CD), pages 108–124, Berlin, Germany, June 2002.
[72] Ric Holt and Jason Y Pak. GASE: visualizing software evolution-in-the-large. In Proceedings of the Third Working Conference on Reverse Engineering, 1996, pages 163–167. IEEE, 1996.
[73] David H. Hutchens and Victor R. Basili. System structure analysis: Clustering with data bindings. IEEE TSE, 1985.
[74] John Hutchinson, Jon Whittle, Mark Rouncefield, and Steinar Kristoffersen. Empirical assessment of MDE in industry. In International Conference on Software Engineering (ICSE), pages 471–480. IEEE, 2011.
[75] IBM. IBM Rational Rhapsody. http://www-03.ibm.com/software/products/en/ratirhapfami.
[76] Ethan Jackson and Janos Sztipanovits. Formalizing the structural semantics of domain-specific modeling languages. Software and Systems Modeling, 8(4):451–478, September 2009.
[77] Anton Jansen and Jan Bosch. Software architecture as a set of architectural design decisions. In 5th Working IEEE/IFIP Conference on Software Architecture (WICSA'05), pages 109–120. IEEE, 2005.
[78] Anton Jansen, Jan Bosch, and Paris Avgeriou. Documenting after the fact: Recovering architectural design decisions. Journal of Systems and Software, 81(4):536–557, 2008.
[79] Jira. https://www.atlassian.com/software/jira, 2017.
[80] Eunsuk Kang, Ethan Jackson, and Wolfram Schulte. An approach for effective design space exploration. In Radu Calinescu and Ethan Jackson, editors, Foundations of Computer Software. Modeling, Development, and Verification of Adaptive Systems, pages 33–54, Berlin, Heidelberg, 2011. Springer Berlin Heidelberg.
[81] W David Kelton and Averill M Law. Simulation Modeling and Analysis. McGraw Hill, Boston, 2000.
[82] Stuart Kent. Model driven engineering. In Integrated Formal Methods, pages 286–298. Springer, 2002.
[83] Anneke G Kleppe, Jos Warmer, and Wim Bast. MDA Explained: The Model Driven Architecture, Practice and Promise. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, 2003.
[84] R. Koschke. Architecture reconstruction. Software Engineering, 2009.
[85] Robert Kowalski and Marek Sergot. A logic-based calculus of events. In Foundations of Knowledge Base Management, pages 23–55. Springer Berlin Heidelberg, 1989.
[86] Anne Koziolek. Automated Improvement of Software Architecture Models for Performance and Other Quality Attributes, volume 7. KIT Scientific Publishing, 2014.
[87] Heiko Koziolek. Performance evaluation of component-based software systems: A survey. Performance Evaluation, 67(8):634–658, 2010.
[88] Philippe Kruchten. An ontology of architectural design decisions in software intensive systems. In 2nd Groningen Workshop on Software Variability, pages 54–61. Citeseer, 2004.
[89] Philippe B Kruchten. The 4+1 view model of architecture. IEEE Software, 12(6):42–50, 1995.
12(6):42{50, 1995.
136
[90] M. Langhammer, A. Shahbazian, N. Medvidovic, and R. H. Reussner. Automated
extraction of rich software models from limited system information. In 2016 13th
Working IEEE/IFIP Conference on Software Architecture (WICSA), pages 99{108,
4 2016.
[91] Michael Langhammer, Arman Shahbazian, et al. Automated extraction of rich soft-
ware models from limited system information. In 2016 13th Working IEEE/IFIP
Conference on Software Architecture (WICSA). IEEE, 2016.
[92] Duc Le, Daniel Link, Arman Shahbazian, and Nenad Medvidovic. An empirical
study of architectural decay in open-source software. In IEEE International Con-
ference on Software Architecture (ICSA). IEEE, 2018.
[93] Duc Minh Le, Pooyan Behnamghader, Joshua Garcia, Daniel Link, Arman Shah-
bazian, and Nenad Medvidovic. An empirical study of architectural change in open-
source software systems. In 12th IEEE Working Conference on Mining Software
Repositories, pages 235{245, 2015.
[94] Youn Kyu Lee, Jae Young Bang, Gholamreza Sa, Arman Shahbazian, Yixue Zhao,
and Nenad Medvidovic. A sealant for inter-app security holes in android. In 2017
IEEE/ACM 39th International Conference on Software Engineering (ICSE), pages
312{323, 5 2017.
[95] Youn Kyu Lee, Peera Yoodee, Arman Shahbazian, Daye Nam, and Nenad Medvi-
dovic. Sealant: A detection and visualization tool for inter-app security vulnerabil-
ities in android. In Proceedings of the 32Nd IEEE/ACM International Conference
on Automated Software Engineering, ASE 2017, pages 883{888, Piscataway, NJ,
USA, 2017. IEEE Press.
[96] Daniel R. Levinson. An overview of 60 contracts that contributed to the de-
velopment and operation of the federal marketplace, oei-03-14-00231. http:
//oig.hhs.gov/oei/reports/oei-03-14-00231.pdf, 8 2014.
[97] Rui Li, Ramin Etemaadi, Michael TM Emmerich, and Michel RV Chaudron. An
evolutionary multiobjective optimization approach to component-based software
architecture design. In Evolutionary Computation (CEC), 2011 IEEE Congress
on, pages 432{439. IEEE, 2011.
[98] FMS Luke Chung. Healthcare.gov is a technological disaster. http://goo.gl/
8B1fcN, 2013.
[99] Jiefei Ma, Franck Le, Alessandra Russo, and Jorge Lobo. Declarative framework
for specication, simulation and analysis of distributed applications. IEEE Trans-
actions on Knowledge and Data Engineering, 28(6):1489{1502, 2016.
[100] Ivano Malavolta, Henry Muccini, and V Smrithi Rekha. Supporting architectural
design decisions evolution through model driven engineering. Software Engineering
for Resilient Systems, pages 63{77, 2011.
[101] Sam Malek, Nenad Medvidovic, and Marija Mikic-Rakic. An extensible framework for improving a distributed software system's deployment architecture. IEEE Transactions on Software Engineering (TSE), 38(1):73–100, 2012.
[102] Ruchika Malhotra. A systematic review of machine learning techniques for software fault prediction. Applied Soft Computing, 27:504–518, 2015.
[103] Onaiza Maqbool and Haroon Babri. Hierarchical clustering for software architecture recovery. IEEE TSE, 2007.
[104] Marzio Marseguerra, Enrico Zio, and Luca Podollini. Genetic algorithms and Monte Carlo simulation for the optimization of system design and operation. In Computational Intelligence in Reliability Engineering, pages 101–150. Springer, 2007.
[105] Anne Martens, Heiko Koziolek, Steffen Becker, and Ralf Reussner. Automatically improve software architecture models for performance, reliability, and cost using evolutionary algorithms. In International Conference on Performance Engineering (WOSP/SIPEW), pages 105–116, 2010.
[106] A.K. McCallum. MALLET: A machine learning for language toolkit. 2002.
[107] John D McGregor, Felix Bachmann, Len Bass, Philip Bianco, and Mark Klein. Using ArchE in the classroom: One experience. Technical report, DTIC Document, 2007.
[108] Gianantonio Me, Coral Calero, and Patricia Lago. Architectural patterns and quality attributes interaction. In IEEE Workshop on Qualitative Reasoning about Software Architectures (QRASA). IEEE, 2016.
[109] Nenad Medvidovic. ADLs and dynamic architecture changes. In Joint Proceedings of the Second International Software Architecture Workshop (ISAW-2) and International Workshop on Multiple Perspectives in Software Development (Viewpoints '96) on SIGSOFT '96 Workshops, pages 24–27. ACM, 1996.
[110] Nenad Medvidovic and Richard N Taylor. A classification and comparison framework for software architecture description languages. IEEE Transactions on Software Engineering (TSE), 26(1):70–93, 2000.
[111] Daniel A Menascé, John M Ewing, Hassan Gomaa, Sam Malek, and João P Sousa. A framework for utility-based service oriented design in SASSY. In Proceedings of the First Joint WOSP/SIPEW International Conference on Performance Engineering, pages 27–36. ACM, 2010.
[112] Olivier Mengué. SVN graph branches. https://code.google.com/p/svn-graph-branches/, 2014.
[113] Tom Mens and Pieter Van Gorp. A taxonomy of model transformation. Electronic Notes in Theoretical Computer Science, 152:125–142, 2006.
[114] Zhibao Mian, Leonardo Bottaci, Yiannis Papadopoulos, Septavera Sharvia, and Nidhal Mahmud. Model transformation for multi-objective architecture optimisation of dependable systems. In Dependability Problems of Complex Information Systems, pages 91–110. Springer, 2015.
[115] Joaquin Miller, Jishnu Mukerji, M Belaunde, et al. MDA guide. Object Management Group, 2003.
[116] Melanie Mitchell. An Introduction to Genetic Algorithms. MIT Press, 1998.
[117] Tim Mullaney. Demand overwhelmed healthcare.gov. http://goo.gl/k3o4Rg, 2013.
[118] Kıvanç Muşlu, Yuriy Brun, and Alexandra Meliou. Data debugging with continuous testing. In ESEC/FSE NIER, pages 631–634, 2013.
[119] Kıvanç Muşlu, Yuriy Brun, and Alexandra Meliou. Preventing data errors with continuous testing. In ISSTA, pages 373–384, 2015.
[120] Taiga Nakamura and Victor R Basili. Metrics of software architecture changes based on structural distance. In Software Metrics, 2005. 11th IEEE International Symposium, pages 24–24. IEEE, 2005.
[121] Andrew Y Ng and Michael I Jordan. On discriminative vs. generative classifiers: A comparison of logistic regression and naive Bayes. In Advances in Neural Information Processing Systems, pages 841–848, 2002.
[122] Marcin Nowak and Cesare Pautasso. Team situational awareness and architectural decision making with the software architecture warehouse. In Software Architecture, pages 146–161. Springer Berlin Heidelberg, 2013.
[123] Jeho Oh, Don Batory, Margaret Myers, and Norbert Siegmund. Finding near-optimal configurations in product lines by random sampling. In Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering, ESEC/FSE 2017, pages 61–71, New York, NY, USA, 2017. ACM.
[124] Flavio Oquendo. π-ADL: An architecture description language based on the higher-order typed π-calculus for specifying dynamic and mobile software architectures. ACM SIGSOFT Software Engineering Notes, 29(3):1–14, 2004.
[125] Peyman Oreizy, Nenad Medvidovic, and Richard N Taylor. Architecture-based runtime software evolution. In International Conference on Software Engineering (ICSE), pages 177–186. IEEE Computer Society, 1998.
[126] Matheus Paixao, Jens Krinke, DongGyun Han, Chaiyong Ragkhitwetsagul, and Mark Harman. Are developers aware of the architectural impact of their changes? In ASE, pages 95–105, 2017.
[127] Trevor Parsons. Automatic Detection of Performance Design and Deployment Antipatterns in Component Based Enterprise Systems. PhD thesis, Citeseer, 2007.
[128] Dewayne E Perry and Alexander L Wolf. Foundations for the study of software architecture. ACM SIGSOFT Software Engineering Notes, 17(4):40–52, 1992.
[129] Gordon D. Plotkin. A Structural Approach to Operational Semantics. Technical Report DAIMI FN-19, Computer Science Department, Aarhus University, 1981.
[130] Carlo Poloni and Valentino Pediroda. GA coupled with computationally expensive simulations: tools to improve efficiency. Genetic Algorithms and Evolution Strategies in Engineering and Computer Science, pages 267–288, 1997.
[131] M. F. Porter. An algorithm for suffix stripping. In Readings in Information Retrieval, pages 313–316. Morgan Kaufmann Publishers Inc., 1997.
[132] Pasqualina Potena. Composition and tradeoff of non-functional attributes in software systems: research directions. In Joint Meeting of the European Software Engineering Conference and ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE), pages 583–586. ACM, 2007.
[133] Thanawin Rakthanmanon, Bilson Campana, Abdullah Mueen, Gustavo Batista, Brandon Westover, Qiang Zhu, Jesin Zakaria, and Eamonn Keogh. Addressing big data time series: Mining trillions of time series subsequences under dynamic time warping. ACM Transactions on Knowledge Discovery from Data (TKDD), 7(3):10, 2013.
[134] C. J. Van Rijsbergen. Information Retrieval. Butterworth-Heinemann, 2nd edition, 1979.
[135] RM-ODP. http://www.rm-odp.net/, 2018.
[136] Ronny Roeller, Patricia Lago, and Hans van Vliet. Recovering architectural assumptions. Journal of Systems and Software, 79(4):552–573, 2006.
[137] David Saff and Michael D. Ernst. Reducing wasted development time via continuous testing. In ISSRE, pages 281–292, 2003.
[138] Gholamreza Safi, Arman Shahbazian, William G. J. Halfond, and Nenad Medvidovic. Detecting event anomalies in event-based systems. In Joint Meeting of the European Software Engineering Conference and ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE), ESEC/FSE 2015, pages 25–37, New York, NY, USA, 2015. ACM.
[139] Tripti Saxena and Gabor Karsai. MDE-based approach for generalizing design space exploration. In Model Driven Engineering Languages and Systems, pages 46–60. Springer, 2010.
[140] A. S. Sayyad, J. Ingram, T. Menzies, and H. Ammar. Scalable product line configuration: A straw to break the camel's back. In 2013 28th IEEE/ACM International Conference on Automated Software Engineering (ASE), pages 465–474, November 2013.
[141] Abdel Salam Sayyad, Tim Menzies, and Hany Ammar. On the value of user preferences in search-based software engineering: a case study in software product lines. In International Conference on Software Engineering (ICSE), pages 492–501. IEEE Press, 2013.
[142] Robert W. Schwanke. An intelligent tool for re-engineering software modularity. In ICSE, 1991.
[143] Robert W. Schwanke and Stephen José Hanson. Using neural networks to modularize software. Machine Learning, 1994.
[144] Fabrizio Sebastiani. Machine learning in automated text categorization. ACM Computing Surveys (CSUR), 34(1):1–47, 2002.
[145] Arman Shahbazian, George Edwards, and Nenad Medvidovic. An end-to-end domain specific modeling and analysis platform. In International Workshop on Modeling in Software Engineering, pages 8–12. ACM, 2016.
[146] Arman Shahbazian, Youn Kyu Lee, Yuriy Brun, and Nenad Medvidovic. Poster: Making well-informed software design decisions. 2018.
[147] Arman Shahbazian, Youn Kyu Lee, Duc Le, Yuriy Brun, and Nenad Medvidovic. Recovering architectural design decisions. In 2018 IEEE International Conference on Software Architecture (ICSA). IEEE, 2018.
[148] Arman Shahbazian, Daye Nam, and Nenad Medvidovic. Toward predicting architectural significance of implementation issues. In 2018 IEEE/ACM 15th International Conference on Mining Software Repositories (MSR), May 2018.
[149] Mojtaba Shahin, Peng Liang, and Mohammad Reza Khayyambashi. Architectural design decision: Existing models and tools. In Joint Working IEEE/IFIP Conference on Software Architecture, 2009 & European Conference on Software Architecture (WICSA/ECSA 2009), pages 293–296. IEEE, 2009.
[150] N. Siegmund, S. S. Kolesnikov, C. Kästner, S. Apel, D. Batory, M. Rosenmüller, and G. Saake. Predicting performance via automated feature-interaction detection. In 2012 34th International Conference on Software Engineering (ICSE), pages 167–177, June 2012.
[151] Norbert Siegmund, Alexander Grebhahn, Sven Apel, and Christian Kästner. Performance-influence models for highly configurable systems. In Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering, ESEC/FSE 2015, pages 284–294, New York, NY, USA, 2015. ACM.
[152] Norbert Siegmund, Marko Rosenmüller, Martin Kuhlemann, Christian Kästner, Sven Apel, and Gunter Saake. SPL Conqueror: Toward optimization of non-functional properties in software product lines. Software Quality Journal, 20(3-4):487–517, 2012.
[153] Ghanem Soltana, Nicolas Sannier, Mehrdad Sabetzadeh, and Lionel C Briand. A model-based framework for probabilistic simulation of legal policies. In 2015 ACM/IEEE 18th International Conference on Model Driven Engineering Languages and Systems (MODELS), pages 70–79. IEEE, 2015.
[154] Spring application framework. https://spring.io/, 2018.
[155] L. Tahvildari, R. Gregory, and K. Kontogiannis. An approach for measuring software evolution using source code features. In Software Engineering Conference, 1999 (APSEC '99) Proceedings, Sixth Asia Pacific, pages 10–17, 1999.
[156] Antony Tang, Minh H Tran, Jun Han, and Hans Van Vliet. Design reasoning improves software design quality. In International Conference on the Quality of Software Architectures, pages 28–42. Springer, 2008.
[157] Richard N. Taylor, Nenad Medvidovic, and Eric M. Dashofy. Software Architecture: Foundations, Theory, and Practice. Wiley Publishing, 2009.
[158] Richard N Taylor, Nenad Medvidovic, and Eric M Dashofy. Software Architecture: Foundations, Theory, and Practice. 2009.
[159] Atul Thakur, Ashis Gopal Banerjee, and Satyandra K Gupta. A survey of CAD model simplification techniques for physics-based simulation applications. Computer-Aided Design, 41(2):65–80, 2009.
[160] Michal Toman, Roman Tesar, and Karel Jezek. Influence of word normalization on text classification. Proceedings of InSciT, 4:354–358, 2006.
[161] Qiang Tu and Michael W Godfrey. An integrated approach for studying architectural evolution. In Program Comprehension, 2002, Proceedings, 10th International Workshop on, pages 127–136. IEEE, 2002.
[162] Jeff Tyree and Art Akerman. Architecture decisions: Demystifying architecture. IEEE Software, 22(2):19–27, 2005.
[163] Vassilios Tzerpos and Richard C Holt. MoJo: A distance metric for software clusterings. In Sixth Working Conference on Reverse Engineering, 1999, pages 187–193. IEEE, 1999.
[164] Vassilios Tzerpos and Richard C Holt. ACDC: An algorithm for comprehension-driven clustering. In WCRE, pages 258–267, 2000.
[165] United States Government Accountability Office. Report to congressional requester, GAO-15-238. http://www.gao.gov/assets/670/668834.pdf, 2015.
[166] US Centers for Medicare and Medicaid Services. McKinsey and Co. presentation on health care law. http://goo.gl/Nns9mr, 2013.
[167] US Department of Health and Human Services. Healthcare.gov progress and performance report. http://goo.gl/XJRC7Q, 2013.
[168] Christopher Van der Westhuizen and André Van der Hoek. Understanding and propagating architectural changes. In Software Architecture, pages 95–109. Springer, 2002.
[169] Dániel Varró, Gergely Varró, and András Pataricza. Designing the automatic transformation of visual languages. Science of Computer Programming, 44(2):205–227, 2002.
[170] Andreea Vescan. A metrics-based evolutionary approach for the component selection problem. In UKSIM'09, 11th International Conference on Computer Modelling and Simulation, 2009, pages 83–88. IEEE, 2009.
[171] Hanna M Wallach. Topic modeling: beyond bag-of-words. In Proceedings of the 23rd International Conference on Machine Learning, pages 977–984. ACM, 2006.
[172] Rainer Weinreich and Iris Groher. Software architecture knowledge management approaches and their support for knowledge management activities: A systematic literature review. Information and Software Technology, 80:265–286, 2016.
[173] Rainer Weinreich, Iris Groher, and Cornelia Miesbauer. An expert survey on kinds, influence factors and documentation of design decisions in practice. Future Generation Computer Systems, 47:145–160, 2015.
[174] Cathrin Weiss, Rahul Premraj, Thomas Zimmermann, and Andreas Zeller. How long will it take to fix this bug? In Proceedings of the Fourth International Workshop on Mining Software Repositories, pages 1–1. IEEE Computer Society, 2007.
[175] Jianfeng Wen, Shixian Li, Zhiyong Lin, Yong Hu, and Changqin Huang. Systematic literature review of machine learning based software development effort estimation models. Information and Software Technology, 54(1):41–59, 2012.
[176] Zhihua Wen and Vassilios Tzerpos. An effectiveness measure for software clustering algorithms. In Program Comprehension, 2004, Proceedings, 12th IEEE International Workshop on, pages 194–203. IEEE, 2004.
[177] Richard Wettel and Michele Lanza. Visual exploration of large-scale system evolution. In 15th Working Conference on Reverse Engineering, 2008 (WCRE'08), pages 219–228. IEEE, 2008.
[178] Igor Scaliante Wiese, Reginaldo Ré, Igor Steinmacher, Rodrigo Takashi Kuroda, Gustavo Ansaldi Oliva, Christoph Treude, and Marco Aurélio Gerosa. Using contextual information to predict co-changes. J. Syst. Softw., 128(C):220–235, 2017.
[179] Theo A Wiggerts. Using clustering algorithms in legacy systems remodularization. In Working Conference on Reverse Engineering (WCRE), 1997.
[180] Murray Woodside, Dorina C Petriu, Dorin B Petriu, Hui Shen, Toqeer Israr, and Jose Merseguer. Performance by unified model analysis (PUMA). In Proceedings of the 5th International Workshop on Software and Performance, pages 1–12. ACM, 2005.
[181] Lu Xiao, Yuanfang Cai, Rick Kazman, Ran Mo, and Qiong Feng. Identifying and quantifying architectural debt. In 2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE), pages 488–498. IEEE, 2016.
[182] Jing Xu. Rule-based automatic software performance diagnosis and improvement. Performance Evaluation, 69(11):525–550, 2012.
[183] Lantao Yu, Weinan Zhang, Jun Wang, and Yong Yu. SeqGAN: Sequence generative adversarial nets with policy gradient. In AAAI, pages 2852–2858, 2017.
[184] Bernard Zeigler et al. Theory of modeling and simulation: integrating discrete event and continuous complex dynamic systems. Academic Press, 2000.
[185] Hans-Jürgen Zimmermann. Fuzzy set theory and its applications. Springer Science & Business Media, 2011.
[186] Olaf Zimmermann, Thomas Gschwind, Jochen Küster, Frank Leymann, and Nelly Schuster. Reusable architectural decision models for enterprise application development. In QoSA, pages 15–32, 2007.
Appendix A
Screenshots
This appendix presents a set of screenshots depicting the prototypes of DomainPro and
eQual in action.
Figure A.1: An example of a meta-model designed in DomainPro.
Figure A.2: An example of a system designed in DomainPro using the meta-model shown in Figure A.1.
Figure A.3: An example of the standalone analysis capabilities of DomainPro.
Figure A.4: Screenshot depicting the interface used to retrieve the answers to the variation
point questions eQual asks.
Figure A.5: Screenshot depicting the interface used to retrieve the answers to the non-
functional property questions eQual asks.
Figure A.6: eQual's visualization of different design variants using radar charts.
Abstract
Designing and maintaining a software system's architecture typically involve making numerous design decisions, each potentially affecting the system's functional and nonfunctional properties. Understanding these design decisions can help inform future decisions and implementation choices, and can avoid introducing architectural inefficiencies later. Unfortunately, the support available to engineers for making these decisions is generally lacking: there is a relative shortage of techniques, tools, and empirical studies pertaining to architectural design decisions. Moreover, design decisions are rarely well documented and are typically a lost artifact of the architecture creation and maintenance process. The loss of this information can thus hurt development. To address these shortcomings, we develop a set of techniques to enable methodical exploration of such decisions and their effects.

We develop a technique, named RecovAr, for automatically recovering design decisions from a project's readily available history artifacts, such as its issue tracker and version control repository. RecovAr uses state-of-the-art architecture recovery techniques on a series of version control commits and maps those commits to issues to identify decisions that affect the system's architecture. While some decisions can still be lost through this process, our evaluation on two large open-source systems with over 8 years of development each shows that RecovAr achieves a recall of 75% and a precision of 77%. To create RecovAr, we formally define architectural design decisions and develop an approach for tracing such decisions in project histories. Additionally, this work introduces methods to classify whether decisions are architectural and to map decisions to code elements.

Building on RecovAr, we create PredictAr. PredictAr aims to prevent the consequences of inadvertent architectural change, namely the accumulation of technical debt and the deterioration of software quality. In this dissertation we take a step toward addressing the scarcity of empirical data in this area by using the information in the issue and code repositories of open-source software systems to investigate the cause and frequency of such architectural design decisions. We develop a predictive model that is able to identify the architectural significance of newly submitted issues, thereby helping engineers to prevent the adverse effects of architectural decay. The results of this study are based on the analysis of 21,062 issues affecting 301 versions of 5 large open-source systems for which the code changes and issues were publicly accessible.

We close the loop by helping engineers not only predict and recover architectural design decisions, but also make new design decisions that are informed and well considered. Recent studies have shown that the number of available design alternatives grows rapidly with system size, creating an enormous space of intertwined design concerns. This dissertation presents eQual, a novel model-driven technique for simulation-based assessment of architectural designs that helps architects understand and explore the effects of their decisions. We demonstrate that eQual effectively explores massive spaces of design alternatives and significantly outperforms state-of-the-art approaches, without being cumbersome for architects to use.
Asset Metadata
Creator: Shahbazian, Arman (author)
Core Title: Techniques for methodically exploring software development alternatives
School: Viterbi School of Engineering
Degree: Doctor of Philosophy
Degree Program: Computer Science
Publication Date: 07/25/2018
Defense Date: 05/17/2018
Publisher: University of Southern California (original), University of Southern California. Libraries (digital)
Tags: design decisions, OAI-PMH Harvest, search-based software engineering, software engineering, software modeling, software quality
Format: application/pdf (imt)
Language: English
Contributor: Electronically uploaded by the author (provenance)
Advisor: Medvidovic, Nenad (committee chair), Gupta, Sandeep (committee member), Wang, Chao (committee member)
Creator Email: ar.shahbazian@gmail.com
Permanent Link (DOI): https://doi.org/10.25549/usctheses-c89-26578
Unique Identifier: UC11672486
Identifier: etd-Shahbazian-6460.pdf (filename), usctheses-c89-26578 (legacy record id)
Legacy Identifier: etd-Shahbazian-6460.pdf
Dmrecord: 26578
Document Type: Dissertation
Rights: Shahbazian, Arman
Type: texts
Source: University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions: The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the a...
Repository Name: University of Southern California Digital Library
Repository Location: USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA