Software Quality Understanding by Analysis of Abundant Data (SQUAAD):
Towards Better Understanding of Life Cycle Software Qualities
by
Pooyan Behnamghader
A Dissertation Presented to the
FACULTY OF THE GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(Computer Science)
December 2019
Copyright 2019 Pooyan Behnamghader
Acknowledgments
Foremost, I would like to thank Dr. Boehm for supporting me through this journey and
for pointing me in productive directions. Barry, you set an example for me to follow.
I would also like to thank Drs. Halfond, Medvidović, Razaviyayn, Siegel, and Wang
for their feedback on my thesis proposal and final defense. They helped me improve the
quality of my work, and I am grateful for that.
Many thanks to Di, Iordanis, Jincheng, Kam, Patavee, and Reem for their contribu-
tions to the empirical studies presented in this dissertation. They played an essential role
in my research, and I learned a lot from them.
I want to extend my gratitude to Julie and other members of the Center for Systems
and Software Engineering for making this experience joyful and memorable.
Special gratitude goes out to all down at the Systems Engineering Research Center
and also the USC Office of the Provost for providing the funding for the work.
My warmest thanks go to my family. Farzaneh and Kazem, thank you for your in-
credible parenthood and patience, especially over the past few years. Afshan and Shadan,
thank you for your encouragement and love. Ehsan, thank you for your brotherhood and
consultation.
Last but by no means the least, I am very thankful to my old and new friends. You
guys were there for me during the ups and downs of this journey. I could not have done
this without you.
Dedication
To Farzaneh and Kazem.
Abstract
Researchers oftentimes measure quality metrics only in the changed files when analyzing
software evolution over commit history. This approach is not suitable for compilation
and for program analysis techniques that consider relationships between files. At the
same time, compiling the whole software is not only costly but may also leave us with a
large number of uncompilable and unanalyzed revisions. In this dissertation, I demonstrate
that analyzing changes in a module achieves a high compilation ratio over commit history
and a better understanding of compilability and its impact on software quality. I (and my
team) conduct a large-scale multi-perspective empirical study on more than 37k distinct
revisions of the core module of 68 systems across Apache, Google, and Netflix to assess
their compilability and identify when the software is uncompilable as a result of a
developer's fault. We study the characteristics of uncompilable revisions and analyze
compilable ones to understand the impact of developers on software quality. We achieve
high compilation ratios: 98.4% for Apache, 99.0% for Google, and 94.3% for Netflix.
We identify sequences of uncompilable commits and create a model to predict
uncompilability based on commit metadata. We identify statistical differences between
the impact of compilable and uncompilable commits on software quality. We conclude
that focusing on a module results in a more complete and accurate software evolution
analysis, reduces cost and complexity, and facilitates manual inspection. The analysis
presented in this dissertation can be applied by any organization wishing to improve its
software and its software engineering.
Table of Contents
Acknowledgments ii
Dedication iii
Abstract iv
List Of Tables vii
List Of Figures viii
Chapter 1 Introduction 1
Chapter 2 Summary of Contributions 4
Chapter 3 Compilability Over Commit History 9
3.1 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
3.2 Targeting a Module . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
3.2.1 Distinct Revisions of a Module . . . . . . . . . . . . . . . . . . . . 11
3.2.2 Ancestry Relationships . . . . . . . . . . . . . . . . . . . . . . . . . 14
3.3 Maximum Compilation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
3.4 Empirical Study Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
3.4.1 Research Questions . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
3.4.2 Data Collection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
3.5 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
3.5.1 How effective is my approach in reaching the maximum compilation? 20
3.5.2 What are the characteristics of sequences of uncompilable commits? 21
3.5.3 Is it feasible to predict uncompilability based on commit metadata? 22
3.5.4 Why do developers commit broken code and how to prevent it? . . 28
Chapter 4 Compilability and Its Impact on Software Quality 36
4.1 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
4.2 Scalable Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
4.3 Empirical Study Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
4.3.1 Research Questions . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
4.3.2 Data Collection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
4.4 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
4.4.1 How important is analyzing every commit from multiple perspectives? 44
4.4.2 How do quality metrics change when the software is compilable? . 46
4.4.3 How effective is my approach in identifying change in quality metrics? 50
4.4.4 How do quality metrics change when the software is uncompilable? 53
Chapter 5 Discussions 56
5.1 Focusing on a Module in Software Evolution Analysis by Commit Level . 57
5.1.1 Related work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
5.1.2 Benefits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
5.1.3 Risks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
5.2 A Tool Based Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
5.2.1 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
5.2.2 Replication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
5.3 Threats to Validity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
5.4 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
Chapter 6 Conclusions 79
Bibliography 82
List Of Tables
3.1 Report on broken sequences. . . . . . . . . . . . . . . . . . . . . . . . . . 17
3.2 Experiment's scale. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
3.3 Number of breakers, carriers, fixers, and neutrals. . . . . . . . . . . . . . 21
3.4 Frequency of the number of instructions and tools. . . . . . . . . . . . . . 22
3.5 Confusion matrix. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
4.1 Quality metrics. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
4.2 Percentage of const(X)\change(Y) to const(X). . . . . . . . . . . . . . 45
4.3 Percentage of impactfuls that change a quality metric to all impactfuls. . . 47
4.4 Ratio of change to quality metrics for affiliated and external. . . . . . . . 52
4.5 Ratio of change to quality metrics for broken sequence and neutral. . . . . 54
4.6 Ratio of change to quality metrics for B1 and S2. . . . . . . . . . . . . . 55
5.1 Impactful developers and commits in Apache systems. . . . . . . . . . . . 64
5.2 Comparison of related frameworks. . . . . . . . . . . . . . . . . . . . . . . 68
List Of Figures
3.1 A DAG depicting the evolution of system S. . . . . . . . . . . . . . . . . . 12
3.2 Compilation ratio distribution. . . . . . . . . . . . . . . . . . . . . . . . . 20
3.3 Distribution of characteristics of broken sequences. . . . . . . . . . . . . . 23
3.4 CDF of breaker, carrier, fixer, and neutral interval. . . . . . . . . . . . . 25
3.5 Probability density of interval in log_{7.75s} scale. . . . . . . . . . . . . . 26
3.6 Probability density of message similarity. . . . . . . . . . . . . . . . . . . 27
4.1 Architecture of SQUAAD. . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
5.1 Visualization of software quality evolution (all commits vs. impactfuls). . 61
5.2 Impactful commits/developers ratio distribution. . . . . . . . . . . . . . . 62
Chapter 1
Introduction
Software developers can prevent failures and disasters and reduce the total cost of own-
ership by putting more emphasis on improving software maintainability in their software
development process. One way to improve software maintainability is to produce clean
code while changing the software and to assess and monitor code quality continuously
while the software is evolving [37]. A software system may undergo thousands of changes
over its development history. These changes are recorded by commits in modern version
control systems. A commit contains a wealth of information on when, why, and by whom
software has changed.
Prior research [14,21,30,40,56] has focused mainly on analyzing official releases
of software to understand how its quality evolves. While this approach gives insight into
the change over each release, it only shows the major milestones of software evolution.
Important details between releases may be overlooked. For example, a developer may
unknowingly increase the amount of technical debt over a few commits. If that debt is
not addressed quickly, it can impose extra cost. It gets even worse if they leave the team
without paying that debt. In another example, a developer may break the compilability of
the software in a commit. Not only will this break the code for other contributors during
development, it will also cause the unavailability of bytecode in a post-development
analysis. Since software developers do not ship uncompilable code, this issue does not arise
when analyzing official releases.
Due to the cost and complexity involved in large-scale software evolution analysis
by commit level, researchers oftentimes achieve scale and omit redundancy by running
static analysis on distinct revisions of each file [1,19,55,59]. This approach is extremely
efficient for generating an Abstract Syntax Tree (AST) and calculating file-based quality
metrics, such as size and complexity. Nevertheless, analyzing distinct revisions of each
file without considering its relationships with other files is not suitable when the analysis
requires compilation. At the same time, compiling the whole software after every commit
is not only costly but may also leave us with many uncompilable revisions, since even a
one-line syntax error is enough to break the build for the whole software.
In this Ph.D. dissertation, I take steps towards better understanding of compilability
and its impact on software quality. I introduce an approach to reach a high compilation
ratio over commit history by focusing on the evolution of a software module (instead
of the changed files or the whole software). I (and my team) employ this approach to
conduct a series of empirical studies to answer the following research questions:
Compilability Over Commit History
1. How effective is my approach in reaching the maximum compilation?
2. What are the characteristics of sequences of uncompilable commits?
3. Is it feasible to predict uncompilability based on commit metadata?
4. Why do developers commit broken code and how to prevent it?
Compilability and Its Impact on Software Quality
1. How important is analyzing every commit from multiple perspectives?
2. How do quality metrics change when the software is compilable?
3. How effective is my approach in identifying change in quality metrics?
4. How do quality metrics change when the software is uncompilable?
I discuss the benefits and risks of focusing on a software module in software evolution
analysis by commit level. I also discuss the Software Quality Understanding by Analysis of
Abundant Data (SQUAAD) framework which has enabled me (and my team) to conduct
multiple empirical studies on software evolution.
The remainder of this dissertation is organized as follows. Chapter 2 summarizes
my (published) contributions. Chapter 3 focuses on compilability over commit history.
Chapter 4 focuses on compilability and its impact on software quality. Chapter 5 includes
discussions on the benefits and risks of focusing on a module, the SQUAAD framework,
the threats to validity of the analysis results presented here, and future directions.
Chapter 6 concludes this dissertation by summarizing what I have accomplished with regard
to answering the research questions mentioned above.
Chapter 2
Summary of Contributions
I have contributed to a series of published empirical studies in the area of software evo-
lution. The first two studies are the focus of my dissertation.
A recent maintainability trends analysis [9] involves a total of 19,580 examined revi-
sions from 38 Apache-family systems across a timespan from January 2002 through March
2017, comprising 586 MSLOC. In this analysis, to obtain software quality, we employ three
widely-used open-source static analysis tools: PMD (https://pmd.github.io), FindBugs
(http://findbugs.sourceforge.net), and SonarQube (https://www.sonarqube.org). We
select a subset of quality attributes related to size (basic), code quality, and security. We
find that on average, 2% of the commits that change the core module of the software
are not compilable. We investigate when, how, and why developers commit uncompilable
code. Our results suggest that commits have different impacts on the software quality
metrics. We find that different quality attributes may change even if the code count does not
change. We calculate the probability for a metric to change while another one is constant
based on the collected data. Our results also show that although the security metrics
change less frequently, it is crucial to utilize them as they can reveal the introduction of
different kinds of security problems.
In another recent publication [12], we demonstrate how analyzing changes in a mod-
ule (instead of les or the whole software) results in achieving a high compilation ratio
and a more complete understanding of software quality evolution at commit level gran-
ularity. We conduct a large-scale multi-perspective empirical study on 37838 distinct
revisions (out of 77580 total revisions) of the core module of 68 industry-scale systems
across Apache, Google, and Net
ix to assess and predict their compilability, and to study
the impact of developers of each organization on dierent quality metrics measured by
tools that analyze source/byte code. We reach a compilability ratio of 98.4% for Apache,
99.0% for Google, and 94.3% for Net
ix commits. We manually inspect all uncompilable
commits to understand the characteristics of uncompilability. We create a model to pre-
dict the uncompilability of commits based on these behavioral meta-data information and
without considering code artifacts and reached a 89% F1-Score. We analyze the evolution
of software quality metrics in the compiled revisions using FindBugs and SonarQube that
depend on the availability of bytecode and source code. Our analysis shows that the
aliation of developers may aect how they impact software quality.
Two extensions [2, 3] of commit impact analysis [9] study the impact of developers
on technical debt in open source software systems based on their level of involvement
and the characteristics of their commits. We investigate whether there is any statistical
difference in the amount of change to the technical debt that a commit imposes considering
the seniority of the committer and the number of commits she has had by the time of
the commit, the interval between the commit and its parent commit, and whether the
committer is a core developer of the system based on their commit frequency. Our results
show that while there is no statistically significant difference between core developers and
others, in almost all subject systems, some developers increase or decrease the amount
of technical debt more than others. We also find that the seniority of the developer has
a positive correlation and the interval between the commit and its parent commit has a
negative correlation with the decrease in the amount of technical debt.
In a recent publication [10], we introduce SQUAAD, our comprehensive framework
including a cloud-based automated infrastructure accompanied by a data analytics and
visualization tool-set that facilitates large-scale maintainability analysis over development
history of a software system. SQUAAD targets the whole software (or one of its modules)
in a repository, determines its distinct revisions, compiles all revisions, and runs complex
static and dynamic analysis on the development history. Its cloud-based infrastructure is
accompanied by a data analytics tool set and user interfaces for data visualization. In a
follow up paper [13], we summarize three process frameworks and tools providing more
anticipatory ways to improve systems and software maintainability and life-cycle cost-
effectiveness. The first framework is an Opportunity Tree for identifying and anticipating
such ways. The second framework (SQUAAD) is a toolset for tracking a software project's
incremental code commits, and for analyzing and visualizing each commit's incremental and
cumulative TD. The third framework is a System and Software Maintenance Readiness
Framework (SMRF), which identifies needed software maintenance readiness levels at
development decision reviews, similar to the Technology Readiness Levels framework.
In my previous work [11,30], we conduct a large-scale empirical study of architectural
evolution in open source software systems. The scope of our study is reflected in the total
number of subject systems (23) and the examined official versions (931), as well as the
cumulative amount of analyzed code (140 MSLOC). We employ three program analysis
techniques to recover the architecture of each version from semantics-based and structure-based
architectural perspectives. We also design two software change metrics to quantify the
amount of architectural change between two versions. Our study reveals several unexpected
findings about the nature, scope, time, and frequency of architectural change. We
find that the versioning scheme of a system does not necessarily reflect the architectural
change, as the system may undergo major architectural modifications between minor
releases and patches. We also find that the architecture may be relatively unstable during
the pre-release phase.
Other than my primary research in the area of software evolution, I have contributed
to projects in other areas. The summary of those contributions is as follows.
I am a co-author of a publication [4] that concentrates on the analysis of open source
software projects to evaluate the relationships between multiple software system charac-
teristics and technical debt and the relationships between software process factors and
technical debt. In this study, we employ various statistical methods to investigate how
technical debt and technical debt density relate to different system characteristics and
process characteristics across a representative sample of 91 Apache Java open source
projects. From the results of the data analysis on the hypotheses, we can conclude for
similar systems that the size of a software system and the software domain it belongs to
can correlate with increasing its technical debt and technical debt density significantly.
While the number of system releases and commits has a significant positive relationship
with its technical debt, the results show no significant relationship between system
technical debt and the number of its contributors and branches.
I am a co-author of two recent publications [50,51] that conduct an exploratory study
by analyzing 1491 verified purchase reviews (7198 review sentences) of 6 IoT products
obtained from Amazon. Our results demonstrate that only 26.72% of all sentences are
software related, based on our taxonomy defined through external and internal content
analysis sessions. We investigate how much information in those software related sen-
tences is useful for software engineers performing software evolution and maintenance
tasks. The results show that only 55.28% of software related sentences (14.79% of all
sentences) are directly applicable to software engineers. Moreover, our results reveal that
users rarely discuss the security aspect of the software in these products and that there
exist patterns in the type of review sentences with regard to the rating.
In another contribution [35], we propose a novel automated approach for debugging
websites to detect presentation failures based on image processing and probabilistic tech-
niques. Our approach first builds a model that links observable changes in the website's
appearance to faulty elements and styling properties. Then using this model, our ap-
proach predicts the elements and styling properties most likely to cause the observed
presentation failure for the page under test and reports these to the developer. In eval-
uation, our approach is more accurate and faster than prior techniques for identifying
faulty elements in a website.
I am a co-author of two other studies [32, 33] in the area of software architecture
recovery. The rst study [33] discusses the value of architecture recovery techniques for
maintenance tasks. While there have been taxonomies of different recovery methods
and surveys of their results along with measurements of how these results conform to
experts' opinions on the systems, there has not been a survey that goes beyond an auto-
matic comparison and instead seeks to answer questions about the viability of individual
methods in given situations, the quality of their results and whether these results can
be used to indicate and measure the quality and quantity of architectural changes. The
second study [32] introduces RELAX, a new concern-based recovery method that uses
text classication, assembles the overall recovery result from smaller, independent parts,
is based on an algorithm with linear time complexity and is tailorable to the recovery of a
single system or a sequence thereof through the selection of meaningfully named, seman-
tic topics. An intuitive and informative architectural visualization rounds out RELAX's
contributions.
In what follows, I expand on the first two studies that are the focus of my dissertation.
Chapter 3
Compilability Over Commit History
A software revision is expected to be compilable, and committing uncompilable code can
be a symptom of careless development and delivery delays. In addition, a compilable
revision might become uncompilable in a post-development analysis because of compila-
tion environment issues (e.g., unavailability of dependencies). Tufano et al. [57] mine the
commit history of 100 Apache projects and report that surprisingly 62% of the commit
history is currently uncompilable.
In order to study compilability over commit history, first we need to identify all
commits that are uncompilable as a result of a developer's fault during development.
Compiling the whole software after each commit may leave us with a high ratio of un-
compilable commits as even a one-line syntax error (e.g., a missing semicolon) in a file or an
unavailable dependency for a module is enough to break the compilation. My approach
to address this difficulty is to focus on the evolution of a module instead of the whole
software. I identify, compile, and analyze the distinct revisions of that module and omit
other modules to prevent their errors from breaking the compilation.
I (and my team) employ this approach and conduct a large-scale empirical study to
answer the following research questions: 1) How effective is my approach in reaching
the maximum compilation? 2) What are the characteristics of sequences of uncompilable
commits? 3) Is it feasible to predict uncompilability based on commit metadata? 4) Why
do developers commit broken code and how to prevent it?
The remainder of this chapter is organized as follows. I discuss related work in Section
3.1. I explain my method to identify distinct revisions of a target module (Section 3.2.1)
and the ancestry relationships between them (Section 3.2.2). I explain my method to
reach the maximum compilation for that target module (Section 3.3). I explain the
research questions and the data collection method in Section 3.4 and the results of the
empirical study in Section 3.5.
3.1 Related Work
Multiple recent studies have assessed the compilability of software repositories. Some
studies [26, 52] analyze compilability only in the last commit in the repository. Some
[27,45] analyze the log les recorded by continuous integration frameworks. The studies
that compile commit history either focus on missing dependencies [34] or reach a low
compilation ratio [57].
Hassan et al. [26] analyze the possibility of automatically building the last commit
for the top 200 Java repositories on GitHub. They use default Ant, Maven, and Gradle
commands to build the systems automatically. For those that fail, they manually fix the
compile errors and find that at least 57% of build failures can be automatically resolved.
Sulír et al. [52] automatically build the most recent version of 7200 open-source Java
systems and analyze failures. They conclude that the most frequent errors are related to
dependencies, Java compilation, and documentation generation.
Seo et al. [45] analyze 26.6 million builds produced during a period of 9 months by
Google engineers, focusing on the frequency and the cause of build failure as well as
how long it takes to fix the issues. Their work is focused on developers' build activity
rather than commit history. Hassan et al. [27] propose a build-outcome prediction model
based on combined features of build-instance metadata and code difference information
of a commit to predict whether a build will be successful without attempting the actual build.
They use a dataset recorded by continuous integration practices containing more than
250,000 build instances over a period of 5 years for training and achieve the outcome with
an average F-Measure over 87%.
Macho et al. [34] identify 125 commits in 23 repositories that repair a missing de-
pendency, qualitatively and quantitatively analyze how the fix is applied, and propose
an approach to fix dependency build breakage automatically. Tufano et al. [57] study
the compilability of 219,395 snapshots of 100 Java projects from Apache Foundation.
They analyze how frequently broken snapshots occur and the major possible causes behind
them. They indicate that only 4% of projects exhibit no compile error, that 62% of
all commits are currently not compilable, and that 58% of compile errors are related to the
resolution of artifacts.
3.2 Targeting a Module
In this section, I explain my approach to identify 1) distinct revisions of a module and 2)
ancestry relationships between them. In my dissertation, a "module" is considered to be
a collection of files and directories in a software repository. This definition is consistent
with the definition of module in References [14,15].
3.2.1 Distinct Revisions of a Module
We target a module to analyze its evolution. The "target" module can be as small as an
extension of a library, or as large as a complete software system hosted by a repository. We
detect "impactful" commits, which change the source code in the target, and calculate
their impact by comparing the target before and after each commit. There are three
types of commit based on the number of parents: 1) orphan with zero, 2) simple with
one, and 3) merge with more than one parent. Inspired by this, we call an impactful
1. an "orphan", if it creates the target for the first time.
2. a "simple", if it changes its parent's revision of the target.
3. a "merge", if it carries the merge agent's combination of the developments on the target over at least two branches leading to it.
Figure 3.1 depicts the evolution of a software system.
node: commit; edge: parent→child; gray: impactful
Figure 3.1: A DAG depicting the evolution of system S.
An orphan commit is an impactful orphan if it contains the first revision of the target.
If an orphan does not contain the target, we do not consider it an impactful; however,
in this case, there will be a simple commit that creates the target for the first time. We
mark that simple as an impactful orphan. In Figure 3.1, node_3 is an impactful orphan.
We cannot calculate the impact of an impactful orphan by comparing two revisions as
the target has just been created.
A simple commit is an impactful if it changes the target. In Figure 3.1, impactful
simples are node_6b, node_8a, and node_8b, and are denoted in gray. To find the impact of
an impactful simple on a certain quality attribute, we measure that quality attribute in
two revisions: the revision created by the commit, and the revision created by its parent.
Then, we calculate the difference between the measured values to find whether and how
the quality attribute has changed in that commit.
A merge commit is the result of integrating multiple branches into a single branch.
Developers create branches for various purposes such as adding a new feature or fixing a
bug. Each branch separates from the main line of the development. When the development
on a branch is finished, the developers integrate that branch into another branch (e.g.,
into the main line of development) using a merge. The target after the merge may be
identical to:
1. the targets of all parents.
2. the target of one of the parents.
3. the target of none of the parents.
The first case indicates that no change has happened to the target in any of the
merging branches. We do not consider this merge as an impactful. For example, there
are two branches starting from node_4 and merging at node_5. There is no impactful on
any of these branches. As a result, the target created by node_5 is identical to the target
created by node_4.
The second case indicates that the target has changed by at least one impactful in the
branch leading to its identical parent and has remained untouched in other branches. The
merge only transfers those changes to the main line of the development and does not have
any impact. In this case, we do not consider this merge as an impactful. For instance,
from node_6 to node_7, the target only changes at node_6b, and node_6c is identical to
node_6b. As a result, the target created by node_7 is identical to node_6c and node_6b, but
it is different from node_6a and node_6. Particularly, the difference between node_7 and
node_6a is the same as the difference between node_6b and node_6.
The third case indicates that the target has changed by impactfuls over at least two
branches leading to the merge. Consequently, the merge has created a new revision
of the target which contains the developments in all branches and is different from all
previous revisions. In this case, we consider this merge as an impactful; however, we
cannot calculate its impact by comparing two revisions. If we compare the target after
an impactful merge with any of its parents, the difference will be the merge agent's
combination of all developments over all other branches leading to the impactful merge. If
we compare it with the target after the common ancestor commit of all branches leading to
the impactful merge, the difference will be the combination of all changes over all branches.
For example, node_9 is an impactful merge which contains the changes introduced in
node_8a and node_8b. Since we are interested in studying the changes introduced by one
developer in a single commit, we cannot calculate the impact of an impactful merge.
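The identification of impactful commits and distinct module revisions described in this section can be approximated directly from a repository's commit graph. The sketch below is a minimal illustration of the idea, not the actual SQUAAD implementation: it assumes a local clone and a module given as a path inside it (both hypothetical parameters) and uses git tree hashes, treating a commit as impactful when the module's tree object differs from that of every parent and counting distinct revisions as distinct tree hashes.

    import subprocess

    def git(repo, *args):
        """Run a git command in `repo` and return its stdout as text."""
        return subprocess.run(["git", "-C", repo, *args],
                              capture_output=True, text=True).stdout.strip()

    def module_tree(repo, commit, module):
        """Tree hash of `module` at `commit`, or None if the module does not exist there."""
        out = git(repo, "rev-parse", "--verify", "--quiet", f"{commit}:{module}")
        return out or None

    def impactful_commits(repo, module):
        """List impactful commits of `module` and the set of its distinct revisions."""
        # Oldest-first history: each line is "<sha> <parent> <parent> ...".
        lines = git(repo, "rev-list", "--all", "--topo-order", "--reverse",
                    "--parents").splitlines()
        impactfuls, distinct = [], set()
        for line in lines:
            sha, *parents = line.split()
            tree = module_tree(repo, sha, module)
            if tree is None:                  # the module is not present in this revision
                continue
            parent_trees = {module_tree(repo, p, module) for p in parents}
            if tree not in parent_trees:      # differs from every parent (or no parent has it)
                kind = ("orphan" if not parents or parent_trees == {None}
                        else "simple" if len(parents) == 1 else "merge")
                impactfuls.append((sha, kind))
                distinct.add(tree)
        return impactfuls, distinct

Only the distinct tree hashes then need to be checked out and analyzed, which is what avoids redundant analysis of identical module revisions.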
3.2.2 Ancestry Relationships
In order to calculate the impact of an impactful simple node_i, we need to compare two
revisions: 1) revision_i created by node_i and 2) revision_p created by node_i's direct parent
node_p. Revision_i is different from all previous revisions since node_i is an impactful. If
node_p is also an impactful then so is revision_p. Consequently, we need to analyze both
revision_i and revision_p to calculate the impact of node_i. If node_p is unimpactful then
revision_p is identical to revision_a created by an ancestor impactful node_a. In this case,
we can analyze revision_a instead of revision_p to avoid redundancy. We call node_a the
"impact-parent" of node_i. Consequently, node_i is an "impact-child" of node_a.
In order to find node_a, we generate a Directed Acyclic Graph (DAG) of the evolution.
Each node represents a commit. Each edge represents a direct parent→child relationship
between two commits.
We start the search by looking at node_i's (direct) parent node_p. If node_p is an
impactful, then we have found the impact-parent. If node_p is unimpactful, there are two
possibilities: 1) node_p is an unimpactful simple. In this case, we continue the search
from node_p's parent. 2) node_p is an unimpactful merge. In this case, there are two
possibilities: a) the revision node_p creates is identical to the revisions all its parents
create. This indicates that there has been no change to the module over the branches
leading to node_p. We randomly pick one of the parents and continue the search from
it. b) the revision node_p creates is identical to the revision created by one parent
and is different from the revisions created by the other parent(s). This indicates that the
unimpactful merge carries changes created in the branch leading to that parent. We pick
that parent and continue the search.
We continue this process until we find an impactful to set as the impact-parent. If
we reach an unimpactful orphan without finding the impact-parent, then we treat node_i
as an impactful orphan that does not have an impact-parent.
For example, in Figure 3.1, to find the impact-parent of node_8b, we look at its parent,
node_8. Node_8 is an unimpactful simple. We continue the search from node_8's parent,
node_7. Node_7 is an unimpactful merge. The target in node_7 is identical to node_6c as
there is an impactful on the branch leading to node_6c. We choose node_6c and continue
the search. Node_6c is an unimpactful simple. We continue the search from node_6c's
parent, node_6b. Node_6b is an impactful. We set node_6b as the impact-parent of node_8b
(node_6b → node_8b). Other impact-parent relationships in Figure 3.1 are node_3 →
node_6b and node_6b → node_8a.
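The search for the impact-parent can be written as a short walk over the commit DAG. The following sketch assumes the DAG and two helpers, is_impactful and module_rev (hypothetical names standing for the classification of Section 3.2.1 and an identifier of the module revision a commit creates), are already available; it illustrates the procedure rather than reproducing the exact tooling.

    def find_impact_parent(node_i, parents_of, is_impactful, module_rev):
        """Walk the commit DAG upward from node_i's direct parent until an impactful is found.

        parents_of[c]   -> list of direct parents of commit c (empty for an orphan)
        is_impactful(c) -> True if c changes the target module (Section 3.2.1)
        module_rev(c)   -> identifier of the module revision created by c (e.g., a tree hash)
        Intended for impactful simples; returns the impact-parent, or None if an
        unimpactful orphan is reached (node_i is then treated as an impactful orphan).
        """
        if not parents_of[node_i]:
            return None                          # node_i is itself an orphan
        current = parents_of[node_i][0]          # start at the direct parent
        while True:
            if is_impactful(current):
                return current                   # found the impact-parent
            parents = parents_of[current]
            if not parents:
                return None                      # unimpactful orphan: no impact-parent
            if len(parents) == 1:
                current = parents[0]             # unimpactful simple: continue from its parent
            else:
                # Unimpactful merge: follow a parent whose module revision is identical
                # to the one the merge creates (any parent when all are identical).
                current = next(p for p in parents
                               if module_rev(p) == module_rev(current))

On the DAG of Figure 3.1 this walk reproduces the example above: starting from node_8b it visits node_8, node_7, and node_6c before stopping at node_6b.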
3.3 Maximum Compilation
Uncompilability of commits in a post-development analysis can be the result of a devel-
oper's fault during development (e.g., syntax error) or compilation environment issues
(e.g., unavailability of dependencies) during mining. The first step towards understand-
ing compilability over commit history is reaching the maximum compilation by fixing the
environment issues and identifying all commits that are uncompilable as a result of a
developer's fault.
We use the following terminology to categorize impactfuls based on their impact on
compilability. An impactful is a "broken" if it creates an uncompilable revision; otherwise,
it is a "solid". A broken is a "breaker" if it breaks the compilability of its
solid impact-parent; otherwise it is a "carrier". A solid is a "fixer" if it fixes its broken
impact-parent; otherwise it is a "neutral". We do not categorize impactful orphans
and merges into breaker, carrier, fixer, and neutral as their impact is not identifiable by
comparing the compilability of two revisions.
For each system, we modify the default build command to compile the target and
to skip running tests and modifiers. For example, this command for the core module of
Google Error Prone is "mvn clean compile -DskipTests -pl core -am". We compile older
revisions using different versions of Java and build tools.
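A minimal sketch of this per-revision compilation step is shown below, assuming a local clone, the list of distinct revisions from Section 3.2, and a per-system build instruction such as the Maven command quoted above; the parameter names and the timeout are illustrative, and the automated SQUAAD infrastructure described in Section 4.2 additionally switches Java versions and applies environment fixes.

    import subprocess

    def try_compile(repo, revision, build_command, timeout=1800):
        """Check out one distinct revision and attempt to build the target module.

        `build_command` is the (possibly modified) default build command, e.g.
        "mvn clean compile -DskipTests -pl core -am" for a Maven core module.
        Returns True if the build succeeds within the timeout, False otherwise.
        """
        subprocess.run(["git", "-C", repo, "checkout", "--force", revision],
                       check=True, capture_output=True)
        try:
            result = subprocess.run(build_command.split(), cwd=repo,
                                    capture_output=True, timeout=timeout)
        except subprocess.TimeoutExpired:
            return False
        return result.returncode == 0

    def compilation_report(repo, revisions, build_command):
        """Map each distinct revision to a compilable/uncompilable flag."""
        return {rev: try_compile(repo, rev, build_command) for rev in revisions}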
After the first attempt to compile all revisions, we generate a report showing sequences
of brokens. Table 3.1 shows this report for three sequences in a subject system. Each
row represents a broken, its impact-parent, and its impact-child. All commits belonging
to a sequence are denoted in the same color. All broken impact-parent and impact-child
commits are denoted in orange.
Table 3.1: Report on broken sequences.
Time (Broken / Child) | Message (Broken / Child) | Commit (Broken / Parent / Child) | Author (Broken / Child) | Sequence (L / D / A)
6/12/06 13:51 / 6/12/06 13:52 | NUTCH- / NUTCH | 13918f7 / 1a47936 / 0c08989 | dev 3 / dev 3 | 1 / 1 / 1
12/29/05 7:28 / 12/29/05 23:55 | A frame / Fix thi | b117756 / b8bd3f1 / c4fd3de | dev 2 / dev 2 | 1 / 987 / 1
5/4/05 12:38 / 5/4/05 12:57 | Fixed b / Add res | 1e027c5 / 4b78b / a17464e | dev 1 / dev 1 | 6 / 1372 / 2
5/4/05 6:53 / 5/4/05 12:38 | Whitesp / Fixed b | 4b78b / 9707c44 / 1e027c5 | dev 2 / dev 1 |
5/4/05 3:10 / 5/4/05 6:53 | Add uti / Whitesp | 9707c44 / e21843f / 4b78b | dev 2 / dev 2 |
5/3/05 15:32 / 5/4/05 3:10 | Better / Add uti | e21843f / 6a9871b / 9707c44 | dev 1 / dev 2 |
5/3/05 14:07 / 5/3/05 15:32 | Update / Better | 6a9871b / 45b5cf7 / e21843f | dev 1 / dev 1 |
5/3/05 14:05 / 5/3/05 14:07 | Rewrite / Update | 45b5cf7 / aa1b8c5 / 6a9871b | dev 1 / dev 1 |
L: Length (#)  D: Duration (m)  A: Authors (#)
We investigate the report to find the longest sequences. Exceptionally long sequences
are usually the result of a change in the structure and/or the build command, or a
missing dependency. For each sequence, we compare the first broken (breaker) with its
impact-parent and the last broken with its impact-child (fixer) to determine the reason
for introduction and resolution of the error.
If the structure has changed, we consider an alternative for the target. For example,
the core module of Apache HttpClient changed from "httpclient/src/main/java" to
"httpclient5/src/main/java". This usually results in an update to the build command.
If there is a missing dependency, we fix it. For example, Google MOE has a missing
dependency, "joda-time-2.3.jar", over a period. We download it to the "lib/" directory. In
another example, the SNAPSHOT version of a dependency in Google-Truth is missing.
We modify core/pom.xml to replace the SNAPSHOT with a stable version.
After fixing the environment issues, we compile the remaining brokens. We repeat
this process until we reach the maximum compilation and identify all brokens caused by
a developer's fault.
In order to reduce the manual effort involved in this process and to achieve replication
and scale, I have designed and implemented an automated solution to compile all revisions
using different versions of Java and build tools, to resolve missing dependencies for
multiple revisions, and to consider changes in structure and build commands. I explain
this solution in Section 4.2.
3.4 Empirical Study Setup
3.4.1 Research Questions
1. How effective is my approach in reaching the maximum compilation?
The first step towards better understanding compilability is reaching the maximum
compilation with a high ratio and identifying the impact of every commit on compilability.
In this research question, we evaluate the effectiveness of my approach
in performing this step.
2. What are the characteristics of sequences of uncompilable commits?
Uncompilability results in missing data and an incomplete analysis. In this re-
search question, we investigate the periods where the software is uncompilable to
understand how long it lasts and how many impactful commits and developers are
involved.
3. Is it feasible to predict uncompilability based on commit metadata?
Commit metadata contains a wealth of information on development activities. In
this research question, we assess the feasibility of predicting whether a commit
is uncompilable using commit metadata (i.e., time, message, author) and without
considering code dierences.
4. Why do developers commit broken code and how to prevent it?
Committing uncompilable software can be a symptom of careless development. In
this research question, we inspect uncompilable commits to understand why devel-
opers commit broken code and provide a guideline for preventing it.
3.4.2 Data Collection
We analyze the development history of 68 open-source software systems across 3 different
organizations (Apache, Google, and Netflix). We have gradually built this dataset by
first [9] analyzing 38 Apache systems, and then [12] extending it to include revisions
committed in 2017/18 of those 38 Apache systems, as well as 30 new open-source systems
from Google and Netflix, comprising a total of 37838 distinct revisions and more than 1.5
billion lines of code.
For selecting Google and Netflix subject systems, we retrieve all Java systems owned
by Google and Netflix on GitHub. We select a system only if it 1) requires Ant, Maven, or
Gradle for compilation and is not an Android, a Bazel, or an Eclipse project, 2) does not
require manual installation of other tools (e.g., Protoc) for compilation, 3) is an official
product of the organization, and 4) has a core module containing a substantial amount
of code. We target the core module and identify impactfuls in each system and exclude
the ones with fewer than 100 distinct revisions.
Table 3.2: Experiment's scale.
Org.    | Timespan    | Sys. | Developers (All / Imp.) | Commits (All / Imp.) | MSLOC
Apache  | 01/02-02/18 | 38   | 1424 / 937              | 46952 / 22627        | 734.3
Google  | 08/08-01/18 | 18   | 1217 / 771              | 19249 / 11527        | 760
Netflix | 05/11-01/18 | 12   | 641 / 290               | 11379 / 3684         | 36.8
Total   | 01/02-02/18 | 68   | 3282 / 1998             | 77580 / 37838        | 1531.1
Imp.: Impactful commits and developers that change code in the core module.
For selecting Apache subject systems, we use the same criteria as for Netflix and Google,
except that we only consider subject systems with fewer than 3000 commits by April 2017.
Table 3.2 shows the scale of our study.
Next, we analyze each system to reach its maximum compilation using the method
explained in Section 3.3. We inspect all brokens to confirm that the software is uncompilable
as a result of a developer's fault. We also identify all breakers, carriers, fixers, and
neutrals.
3.5 Results
3.5.1 How effective is my approach in reaching the maximum compilation?
We achieve an average system compilation ratio of 98.4% for Apache, 98.1% for Google,
and 93.9% for Netflix. Figure 3.2 shows the distribution of compilation ratio across
systems in each organization. The compilation ratio across all revisions in each organization
is 98.4% for Apache, 99.0% for Google, and 94.3% for Netflix.
Figure 3.2: Compilation ratio distribution (frequency of systems, in %, over compilation ratio, in %, for Apache, Google, and Netflix).
We identify 348 breakers, 312 carriers, 347 fixers, and 36,141 neutrals. The number
of breakers and fixers can be different as each breaker can have 1) no fixer if it is never
fixed, 2) one fixer, or 3) more than one fixer if it is fixed in more than one branch starting
from it. Table 3.3 shows the number of commits in each category.
Table 3.3: Number of breakers, carriers, fixers, and neutrals.
Org.    | Broken (Breaker / Carrier / Total) | Solid (Fixer / Neutral / Total)
Apache  | 215 / 139 / 354                    | 220 / 21776 / 21996
Google  | 44 / 69 / 113                      | 46 / 11244 / 11290
Netflix | 89 / 104 / 193                     | 81 / 3121 / 3202
Total   | 348 / 312 / 660                    | 347 / 36141 / 36488
In 47% of systems, we reach the maximum compilation using one instruction (i.e., the
default build command). Also, in 30% of them, one alternative suffices to fix environment
issues (i.e., fixing a missing dependency and addressing a change in structure/build tool).
However, up to 5 different instructions are required for some systems. Table 3.4 shows
the frequency of the number of required instructions and build tools for systems in each
organization. For example, 2 Netflix systems use both Maven and Gradle over their
commit history and 13 Apache systems need two instructions.
The result of this analysis shows that my approach is capable of achieving a high
compilation ratio and identifying the impact of commits on compilability. It also shows
that my approach does not demand an extensive number of instructions to reach the
maximum compilation as only 5 systems need more than three instructions.
3.5.2 What are the characteristics of sequences of uncompilable commits?
We define a "broken sequence" as a series of back-to-back impactful simples starting with
a breaker and ending with a fixer. We identify a total of 303 broken sequences (210 for
Table 3.4: Frequency of the number of instructions and tools.
Org.    | Instructions (1 / 2 / 3 / 4 / 5) | Build Tools (A / M / G / AM / MG)
Apache  | 18 / 13 / 4 / 2 / 1              | 2 / 33 / 0 / 3 / 0
Google  | 7 / 6 / 4 / 1 / 0                | 2 / 12 / 1 / 2 / 1
Netflix | 7 / 1 / 3 / 0 / 1                | 0 / 0 / 10 / 0 / 2
Total   | 32 / 20 / 11 / 3 / 2             | 4 / 45 / 11 / 5 / 3
A: Ant  M: Maven  G: Gradle
Apache, 43 for Google, and 50 for Netflix) in our dataset. For each sequence, the "length"
is the number of brokens, the "duration" is the time between the breaker and the fixer,
and the "involvement" is the number of authors. Figure 3.3 shows the distribution of
these three characteristics. For example, in Netflix, 74% of broken sequences contain one
commit, 33% last between one minute and an hour, and 89% involve one author.
In 79% of sequences, length is 1, meaning that the breaker is the impact-parent of
the fixer. However, in 4% of sequences, more than 10 brokens are involved, which can
imply that the code is repeatedly changing without ever being compilable. In 57%
of sequences, duration is no more than an hour. Nevertheless, in 17% of sequences the
software remains uncompilable for several weeks or even months. Although involvement
is 1 in 93% of sequences, we identify 56 sequences in which 2 authors and 8 sequences in
which 3 authors are changing the software while it is uncompilable.
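Given the chain of impactful simples with their compilability flags, the three characteristics can be computed with a simple grouping pass. The sketch below assumes each commit is represented by a small record with hypothetical fields (compilable, author, time) and that the chain is ordered by the impact-parent relationship; whether the fixer's author also counts toward involvement is a choice left open here.

    def broken_sequences(chain):
        """Group consecutive brokens into broken sequences and report their characteristics.

        `chain` is an impact-parent-ordered list of dicts such as
        {"sha": str, "compilable": bool, "author": str, "time": datetime}.
        A sequence starts with a breaker and ends at the next solid commit (the fixer);
        length is the number of brokens, duration the time between the breaker and the
        fixer, and involvement the number of distinct authors of the brokens.
        """
        sequences, current = [], []
        for commit in chain:
            if not commit["compilable"]:
                current.append(commit)                 # breaker or carrier
            elif current:                              # this solid commit is the fixer
                sequences.append({
                    "length": len(current),
                    "duration": commit["time"] - current[0]["time"],
                    "involvement": len({c["author"] for c in current}),
                })
                current = []
        return sequences                               # trailing unfixed brokens are ignored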
3.5.3 Is it feasible to predict uncompilability based on commit metadata?
Next, we study the difference between the metadata (i.e., time, message, author) of
breaker, carrier, fixer, and neutral to assess the feasibility of creating a model to predict
uncompilability.
Figure 3.3: Distribution of characteristics of broken sequences for Apache, Google, and Netflix: length (number of brokens), duration (time between the breaker and the fixer, from under a minute to more than a year), and involvement (number of authors).
Time. For each impactful, we define the "interval" as the time difference between
the commit and its impact-parent. Although it takes several minutes to build most of
our subject systems, we observe that a large number of impactfuls have a relatively short
interval. For example, we identify 3,056 impactfuls with an interval less than a minute.
This can indicate that the developers do not necessarily attempt to build the software
before committing the change.
To investigate it further, we calculate the interval for all breakers, carriers, fixers,
and neutrals. Logically, the intervals should be positive numbers since every commit
is younger than its parent, but we identify 1,453 impactfuls with a negative interval.
The negative values can be a result of inaccurate time on the local machine of the
developer or a post-commit modification of the metadata. We exclude the impactfuls
with negative intervals and calculate the Cumulative Distribution Function (CDF) for
log_{7.75s}(interval). We use 7.75 s as the scale to present the result in a more understandable
way, as log_{7.75s}(1m) = 2 and log_{7.75s}(1h) = 4. Figure 3.4 depicts the CDFs. For
example, it shows that almost 30% of breakers and neutrals in Google happen in less
than an hour. However, this ratio is almost 70% for carriers and fixers.
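For reference, mapping an interval to this scale is a plain change of logarithm base; the standalone snippet below (assuming the interval is given in seconds) reproduces the two anchor points mentioned above.

    import math

    def log_scale(interval_seconds, base=7.75):
        """Express an interval, given in seconds, on the log_{7.75s} scale used here."""
        return math.log(interval_seconds) / math.log(base)

    print(round(log_scale(60), 2))      # one minute -> 2.0
    print(round(log_scale(3600), 2))    # one hour   -> 4.0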
Interestingly, the CDFs show no difference between breaker and neutral in short intervals
(i.e., less than an hour). This contradicts our initial assumption and preliminary
observation that when developers commit too early (apparently without attempting the
compilation), the chance of introducing an error is higher. However, carrier and fixer
tend to have shorter intervals in comparison with breaker and neutral. In all three
organizations, a high ratio of carriers and fixers are committed in less than a few minutes.
This can imply that when a developer introduces a compile error, oftentimes they either
fix it quickly or continue the development by committing more changes in a short period
without fixing the existing errors. In addition, it shows that carrier and fixer do not
happen very late and their distribution is not that wide, while the distribution of breaker
and neutral is wide. This can imply that developers do not usually commit a broken and
leave the software for a long interval.
Figure 3.4: CDF of breaker, carrier, fixer, and neutral interval for Apache, Google, and Netflix, over log_{7.75s}(interval) (1: a few seconds, 2: a minute, 3: a few minutes, 4: an hour, 5: a few hours, 6: a couple of days, 7: a couple of weeks, 8: a couple of months, 9: more than a year).
Motivated by these observations, we investigate whether we could use an impactful's
interval to create a model to predict uncompilability of its impact-parent. We convert
the interval to the log_{7.75s} scale. If the interval is negative, we assume interval = 1s, or
log_{7.75s}(interval) = 0. We calculate the probability distribution of the interval of impactfuls
whose impact-parent is broken and solid. The result, as depicted in Figure 3.5, shows
the difference between the two distributions.
Figure 3.5: Probability density of interval in log_{7.75s} scale for impactfuls whose impact-parent is broken versus solid.
Message. In manual inspection, we observe that in many broken sequences, there
are subsequent commits with very similar or almost identical messages. To quantify this
observation, we implement string matching techniques suggested by Wael et al. [24] to
determine the similarity of the messages of impactfuls and their impact-parent. Most
messages are short and ungrammatical and some contain symbolic words (e.g., URL).
We find that many similar messages are, in fact, almost identical rather than being
a paraphrase of one another. Therefore, instead of assessing semantic similarity, we
implement Jaccard distance to compare the two messages. The result, as depicted in
Figure 3.6, shows that the probability of message similarity of an impactful with its
impact-parent where the parent is broken is different from those where the parent is
solid. It can be seen that the probability mass of broken is distributed toward the right of the
x-axis more than that of solid. This can be explained by the fact that a high similarity between
messages indicates that the first commit is incomplete and needs follow-up commits to
complete the same task.
Figure 3.6: Probability density of message similarity for impactfuls whose impact-parent is broken versus solid.
In addition, we observe that the messages of many back-to-back brokens mention the
same issue number. To quantify this observation, we extract the issue numbers from the
messages of impactfuls and their impact-parents and collect all pairs where both messages
contain an issue number. The result shows that 55% of commits whose parent is broken
mention the same issue number as the parent. This ratio is 30% when the parent is solid.
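A sketch of these two message-based features is shown below. The word-level Jaccard treatment and the generic issue-number pattern are illustrative assumptions; the study's exact preprocessing of messages may differ.

    import re

    ISSUE_PATTERN = re.compile(r"\b[a-z]+-\d+\b")      # e.g. "drill-4134", "opennlp-580"

    def jaccard_similarity(msg_a, msg_b):
        """Similarity of the word sets of two commit messages (1 minus Jaccard distance)."""
        a, b = set(msg_a.lower().split()), set(msg_b.lower().split())
        if not a and not b:
            return 1.0
        return len(a & b) / len(a | b)

    def same_issue_number(msg_a, msg_b):
        """True if the two messages mention at least one issue number in common."""
        issues_a = set(ISSUE_PATTERN.findall(msg_a.lower()))
        issues_b = set(ISSUE_PATTERN.findall(msg_b.lower()))
        return bool(issues_a & issues_b)

    # The similarity of a commit message with its impact-parent's message, and whether
    # the two mention the same issue, feed the prediction model described below.
    print(jaccard_similarity("Fix typo in parser", "Fix typo"))          # 0.5
    print(same_issue_number("PROJ-123 add utils", "PROJ-123 fix typo"))  # True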
We observe that developers even acknowledge the existence of the error when fixing
it by using specific words (e.g., "oops" and "sorry") in the message. We analyze
the messages of impactfuls to identify the words that could be used as an indicator of the
existence of a compile error in their impact-parent using Naive Bayes and Logistic Regression.
We analyze the parameters of the models, which are the probability estimates in
Naive Bayes and the coefficients of the decision boundary in Logistic Regression. The result
shows that the words "oops", "compile", "compilation", "missing", "missed", and "adapted", and
phrases that contain an issue number (e.g., "drill-4134", "opennlp-580"), contribute to
the prediction model.
Author. The authors of impactfuls and their impact-parent are the same in 62%
of breakers, 67% of carriers, 88% of fixers, and 63% of neutrals. However, our analysis
shows that considering this as a feature does not contribute to the prediction model.
In summary, the following 6 features from a pair of an impactful and its impact-
parent are extracted for training a model to predict uncompilability of the parent:
1) The interval of the commit, 2) whether it is less than an hour, and 3) whether it is
negative. 4) The similarity between messages of the commit and its parent, 5) whether
they mention the same issue number, and 6) whether the message contains some specific
words (e.g., "oops").
Our dataset comprises 37,148 pairs of impactfuls and their impact-parent, and is
divided into 36,489 pairs with a solid parent (i.e., breaker or neutral) and 659 pairs with
a broken parent (i.e., carrier or fixer). Since the dataset is imbalanced, we cannot use it
to train the model directly. He et al. [28] suggest techniques for dealing with imbalanced
data. We implement the random under-sampling technique, which randomly selects a
set of majority class instances, and removes these instances from that class to make the
remaining data balanced. The balanced dataset contains 1,200 pairs. We randomly split
it into a training dataset composed of 900 pairs and a test dataset composed of 300 pairs.
We train the logistic regression model, test it, and achieve an F1-score of 0.89 and an
AUC of 0.96. Table 3.5 shows the confusion matrix. This model shows the feasibility of
applying statistical learning to predict uncompilability based on commit metadata and
without considering code differences.
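A condensed sketch of this modeling step is shown below, using scikit-learn and assuming a numeric feature matrix X encoding the six features above with labels y marking pairs whose impact-parent is broken; the exact balancing (1,200 pairs and a 900/300 split in the study), feature encoding, and hyper-parameters may differ from this illustration.

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import f1_score, roc_auc_score
    from sklearn.model_selection import train_test_split

    def train_uncompilability_model(X, y, seed=0):
        """Random under-sampling of the majority (solid-parent) class, then logistic regression."""
        rng = np.random.default_rng(seed)
        broken = np.flatnonzero(y == 1)                 # pairs whose impact-parent is broken
        solid = rng.choice(np.flatnonzero(y == 0), size=len(broken), replace=False)
        keep = np.concatenate([broken, solid])          # balanced subset of the pairs

        X_train, X_test, y_train, y_test = train_test_split(
            X[keep], y[keep], test_size=0.25, random_state=seed, stratify=y[keep])

        model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
        predictions = model.predict(X_test)
        scores = model.predict_proba(X_test)[:, 1]
        return model, f1_score(y_test, predictions), roc_auc_score(y_test, scores)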
3.5.4 Why do developers commit broken code and how to prevent it?
A revision of a software system is expected to be compilable. However, developers know-
ingly and unknowingly commit uncompilable code. Hassan et al. [26] propose a taxonomy
for the reasons of build failure in the latest revision of the top 200 Java projects on GitHub.
Table 3.5: Confusion matrix.
n=300         | Predicted Solid | Predicted Broken | Total
Actual Solid  | 153             | 9                | 162
Actual Broken | 19              | 119              | 138
Total         | 172             | 128              |
This work is limited as they do not resolve the compilation environment issues (e.g.,
missing dependencies) and their approach is not evolutionary as they only look at one
revision (latest revision) per system.
We take further steps towards better understanding uncompilability over commit
history by answering the following research question. Why do developers commit broken
code and how to prevent it? We study the brokens in 38 Apache systems to investigate
what causes the error. Our investigation reveals a variety of interesting phenomena.
Based on this effort, we provide a guideline for preventing developers from committing
broken code:
1. Relying on snapshot versions of dependencies may cause compile errors.
We have observed that some revisions depend on snapshot versions of their depen-
dencies. Snapshots are what might become an actual release and are subject to
frequent updates.
For example, 28 of Qpid JMS, 122 of Tiles, 56 of Phoenix, 26 of CXF-Fediz, 28 of
Flume, and 13 of C-Validator impactful revisions depend on different snapshot ver-
sions of their dependencies (e.g., proton-j, struts-master, hbase, wss4j, hadoop-core,
and common-parent). In all of these instances, we have resolved the dependency
issues by automatically altering the build configuration and using stable versions of
those dependencies.
In one exceptional case in CXF-Fediz, APIs in a snapshot version of its dependency
are slightly different from the official version. This makes 4 subsequent impactful
revisions (from commit 00c043f) completely uncompilable. In this period, one class
uses a method call that does not exist in the official release of its dependency. This
problem is fixed at commit d39f7f0, which is dedicated to resolving this issue.
This suggests that using snapshots is not suitable for repeatable builds since snap-
shots may change, move or be replaced, particularly when the actual release is
ready.
2. Compiling the project in a new environment from scratch may reveal
missing dependencies.
One of our most interesting observations is a compile error that is introduced at
Zookeeper commit 1d2a786 and fixed after 52 subsequent commits at 4ee7c1d. In
this case, a modification in the build.xml file breaks the execution of a dependency.
However, that dependency is not triggered unless the system is built from scratch.
The developers do not notice the problem for almost 6 months. We fix this
dependency issue before including Zookeeper's result in our dataset.
In another example, at Nutch commit 0032314 a name-space change happens as one
of the dependencies (org.gora) migrates to Apache foundation (org.apache.gora).
This causes a missing dependency error that remains in the repository for 8 months
until that branch is merged into another branch at commit 0460d89. We x the
problem by compiling and deploying Gora on the analysis server.
This suggests that compiling a system from scratch can help nding missing depen-
dencies when the build conguration les change.
3. After a file is added, compiling the project in a new environment may
reveal an error.
In multiple cases, we have observed that developers forget to stage and commit new
files in their local repository. For example, in Helix commit 917af3e, a relatively
large set of modifications affects 60 files in the system. This introduces a compile
error because of some missing classes, which lasts for 23 commits over a period of 22
days. The missing classes are added to the system at commit c589fb8.
In Mina-SSHD commit f26b5f7, a relatively large refactoring happens which introduces
a mismatch between a class name and its corresponding file name. This error
is fixed after 12 subsequent commits, over a period of 35 days, at commit 070a8e7.
The reason may be that the working directory and the remote repository are not
completely synced, as renamed files are not automatically staged in Git.
This suggests that developers need to compile new revisions in a new environment,
other than their own working repository, when they add new files.
4. When contributing to the system alone, compile the system in a new
environment once in a while.
We have observed that over periods in which only one developer works on the system,
different types of compile errors may remain in the repository for several days. For
example, at C-Net commit 44cec10, a small patch containing a setter function with
a missing parenthesis is applied to the source code. This typo is fixed 4 impactfuls
later, at commit b1147fd, after 12 days. The fixing commit contains only one line of
code, and its message is "Fix typo."
In another example, in C-IO commit 5064788, the compilation depends on the
absolute path of a directory on the local machine of one of the developers. The
problem remains in the system for 9 subsequent impactfuls over 4 days and is finally
fixed at commit 2eb161d. Only one developer is working on the system during this
period.
In another instance, at C-Jexl commit 86b3e40, a refactoring causes a name clash
that lasts for 9 subsequent commits over a period of 13 days. The error is fixed
at commit 000c48d. Only one developer is working on the system over this period.
This indicates that when the level of involvement decreases, developers may fail to
follow good practice.
5. Committing too early may result in a broken.
We have observed that developers may introduce a compile error when they commit
too early. For example, in C-VFS commit 54b824f, the developer intends to
increase performance by avoiding Boolean instantiation in one line of code.
While changing that line, a typo (a missing parenthesis) is introduced in the source
code. In less than a minute, the following commit (26af957) resolves the problem.
In another instance, in C-BCEL commit 9640239, the developer intends to change
a comment (according to the commit message). However, two lines of code are
modified along with that comment and a method-name mismatch is introduced.
The subsequent commit (676ca07) fixes the problem two minutes later.
In Hama commit bac5191, a small refactoring (a variable rename) causes a compile
error. The compile error is fixed after 4 minutes at commit 6948f4d, which only
contains a change in one line of code.
At OpenNLP commit 740d5a5, 10 new files are added to the repository without any
modification to the other files. The next two commits, over the next two minutes,
have the same commit message as the first one and contain the missing parts to
make the code base compilable.
In some cases, the developers acknowledge compilation issues, yet they still commit the
changes and fix the problems in subsequent commits. For example, CXF-Fediz
commit cae5c37 is broken and says "Precompile Subject Constraints". The compile
errors are fixed in the next commit, b2dc6b4. C-Collections commit 90f4139 breaks
the compilation. The next impactful, 67c51ed, says "Fix compile error.", but the
problem is fixed in less than an hour in the next impactful, 9acc3e8, whose message says "...
to avoid compilation problems.". The message of C-Configuration commit 5aaa012
says "... there are unresolved compilation failures." The problems are fixed after
four impactfuls, over a period of 2 minutes, at commit 3a771aa.
These observations suggest that committing too early may result in a broken.
6. Committing too often may create subsequent brokens.
In some cases, developers tend to commit multiple brokens over a short period.
For example, at Santuario-Java commit 1fe126e, the system undergoes a refactoring
which breaks its compilability. The next commit also introduces more refactoring
to the already broken code. Finally, all compile errors are fixed at commit 8d7cec1.
These three commits are pushed to the repository within a period of 2 minutes.
At Mina commit 02f02e7, uncompilable code is committed. It is followed by
two brokens cleaning the first one up within 2 minutes. The compile error is finally
resolved the next day at commit 2cc1d14.
At C-Net commit 2d5c8f8, a refactoring to use an interface instead of an implementation
introduces a compile error which affects the next 12 impactfuls over a period
of 2 hours. This compile error is resolved at commit c2ceeda. The commit contains
multiple changes in the same class file. This indicates that sometimes developers do
not compile the code before committing their changes, even if their changes impact
multiple lines of code.
The preceding two observations suggest that although some experts recommend committing
early and often (http://sethrobertson.github.io/GitBestPractices/#commit),
committing too early and too often can degrade
the quality of the source code by introducing basic compile errors.
7. Large refactorings may create brokens.
At CXF-Fediz commit 3c0a524, a relatively large refactoring impacting 11 files
breaks the compilation. The problem remains in the system for 4 impactfuls over
two days. It is finally fixed at commit b2dc6b4.
In Drill, commit d068b3e introduces multiple compile errors after a relatively large
refactoring changing 15 files. The next 8 subsequent impactfuls over the next 2
hours are dedicated mostly to fixing those errors.
In C-SCXML, 10 impactfuls over 2 days create 2 broken sequences: one starting
from 094fefe and ending at 9bc786e, and another starting from 2665b57 and ending
at 1b32987. The system is undergoing refactoring for type-safety improvement and
the removal of unnecessary casts.
This observation suggests that developers should compile the source code in a new
environment after a large refactoring, before producing a new commit.
8. Maintenance and code cleanups may create brokens.
At C-Jexl commit 9dcf08d, removing deprecated code causes a compile error that
is resolved after two subsequent commits, at commit 0fae596. In another example,
Hama commit 893ba2a introduces a compile error while removing deprecated code,
which stays in the repository for 4 subsequent impactfuls. The issue is finally
resolved at 17e0cbb.
In some cases, such as Nutch commit 9714c44, a cleanup of old code introduces a
compile error which is immediately fixed in the next commit, 024b6a6.
Three impactfuls in C-JCS are broken. Interestingly, all of them are maintenance
related. A patch is applied to the main line of code at commit c2b9de1. A simple
refactoring happens at commit cfa533f. A change in the access modifiers of a class
is introduced at commit 0e55dab.
In C-BCEL commit 25d3c6f, a large number of classes undergo maintenance
to add proper annotations. This introduces a compile error because of a
mistaken double annotation. The extra annotation is removed after 5 subsequent
commits, at commit bd70827, in less than an hour.
After a major rewrite of one of the core classes at Nutch commit 45b5cf7, an incompatible
type comparison is introduced. This problem remains in the system
for 6 subsequent commits over a period of 2 days. It is finally resolved at commit
a17464e.
In summary, our analysis suggests that compile errors can last for several commits
in the development history. Interestingly, most of the compile errors that remain in the
repository for multiple commits happen during a maintenance task, such as removing
unused or deprecated code. Our observations indicate that developers do not necessarily
follow best practice when committing their code and that developers should compile the
code, even if the changes are minimal.
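The guideline above largely boils down to compiling what is about to be committed, in a clean environment, before the commit lands. As one illustration (not part of SQUAAD and not a prescription from the text), the following sketch of a Git pre-commit hook, written in Python and assuming a Maven build, exports the staged tree to a temporary directory and refuses the commit if it does not compile; the build command is an assumption and should be adapted to the project's build tool.

```python
#!/usr/bin/env python3
# Hypothetical pre-commit hook: save as .git/hooks/pre-commit and make it
# executable.  It compiles the *staged* tree in a temporary directory, so the
# check sees exactly what would be committed rather than the working copy.
import subprocess
import sys
import tempfile

BUILD_CMD = ["mvn", "-q", "-DskipTests", "compile"]   # assumed build command

def main() -> int:
    with tempfile.TemporaryDirectory() as tmp:
        # Export the index (staged content) into the temporary directory.
        subprocess.run(["git", "checkout-index", "-a", f"--prefix={tmp}/"],
                       check=True)
        build = subprocess.run(BUILD_CMD, cwd=tmp)
        if build.returncode != 0:
            print("pre-commit: staged tree does not compile; commit aborted.",
                  file=sys.stderr)
            return 1
    return 0

if __name__ == "__main__":
    sys.exit(main())
```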
Chapter 4
Compilability and Its Impact on Software Quality
Some static and dynamic analysis techniques (e.g., FindBugs [7] and test coverage [36])
depend on the availability of bytecode, and it is necessary to compile the software before
analyzing it with those techniques. Alexandru et al. [1] declare the unavailability of
compiled revisions to be the main unresolved source of manual effort in software
evolution analysis.
In addition, not only are compiled revisions unavailable, but there are also broken sequences
over which the software is completely uncompilable and bytecode generation is not possible.
For some of those sequences, even source code analysis is not possible, as
the software is not parsable. Consequently, there is a dearth of empirical data regarding
how software quality evolves when the software is uncompilable and the nature of change
to quality metrics measured by bytecode analysis techniques.
In this chapter, I take steps towards understanding how software quality evolves when
it is uncompilable. First, I analyze compilable revisions using source/bytecode analysis
techniques to answer the following research questions: 1) How important is analyzing
every commit from multiple perspectives? 2) How do quality metrics change when the
software is compilable? 3) How effective is my approach in identifying change in quality
metrics? Next, I analyze uncompilable broken sequences to answer the following research
question: 4) How do quality metrics change when the software is uncompilable?
The remainder of this chapter is organized as follows. I discuss related work in Section
4.1. I explain my method to scale the compilation and analysis of thousands of commits
in Section 4.2. I explain the research questions and data collection methods in Section
4.3 and the results of the empirical study in Section 4.4.
4.1 Related Work
I focus the related work discussion on two areas: software quality evolution analysis
approaches that 1) analyze the full commit history without employing bytecode analysis
[1, 19, 53, 54, 55, 59] or 2) employ bytecode analysis without being able to analyze the full
commit history [11, 17, 30, 46, 49].
Full commit history. Studies that analyze the full commit history (i.e., thousands of
revisions) to understand software quality evolution run static analysis on distinct revisions
of files. As a result, they are not capable of generating bytecode and employing bytecode
analysis.
Dyer et al. [19] introduce BOA, a query system developed specifically for mining
repositories. BOA parses each Java file to extract and store the Abstract Syntax Tree
(AST). Alexandru et al. [1] introduce Lisa, a tool for reducing redundancies in multi-revision
code analysis. Lisa runs the analysis on every file in the bare local Git repository
at the start of the analysis and loads the already analyzed results for each file when
analyzing every revision. Tufano et al. [59] introduce HistoryMiner, which mines every
commit in the history of a software system and applies a lightweight static analysis technique to
the files affected by each commit to check whether a code smell is introduced. Trautsch et al. [55]
introduce SmartShark, a distributed framework designed to address the problems with the
external validity of mining software repository studies. SmartShark runs static analysis
on the files affected by every commit, constructs the AST, and calculates metrics such as
the number of clones for each file.
Bytecode analysis. Studies that employ bytecode analysis have a limited scope, as
they only analyze releases or a relatively small subset of commits.
Spacco et al. [49] study warnings reported by FindBugs, which requires bytecode
analysis, and introduce techniques for tracking warnings across software releases. In
our work on software architecture evolution [11, 30], we analyze both source code and
bytecode to understand how software quality evolves across releases. Shahbazian et al.
[46] analyze 301 commits associated with software issues to understand when architectural
change happens. Dini et al. [17] conduct regression analysis using Ekstazi, which requires
bytecode, and analyze 800 commits from 8 software systems (100 commits from each
system).
In summary. The capability of reaching maximum compilation with a high ratio
over commit history enables me to conduct bytecode analysis and distinguishes my work
from other mining approaches that analyze the full commit history. In addition, my work
is unique in that it is capable of identifying the impact of each commit on compilability.
This is essential for understanding how software quality evolves when the software is
compilable/uncompilable.
4.2 Scalable Analysis
Compiling and analyzing commit history to assess software quality is computationally
expensive. For example, Tufano et al. [58] sequentially analyze the commit history of
200 systems. They only analyze the changed files using a lightweight static analysis
technique; still, the analysis takes several weeks. Trautsch et al. [55] address the problem
of scalability by distributing the analysis of changed files on HPC clusters. My approach
to achieving scale is to distribute the analysis of the distinct revisions of a module (the target)
over the cloud, which enables customizing the analysis environment and does not require
expensive on-premise infrastructure.
I have designed and implemented SQUAAD [10], a comprehensive cloud-based framework
to conduct large-scale empirical studies on software quality evolution. SQUAAD
targets a module and identifies its distinct revisions and the ancestry relationships between
them. It distributes the revisions over the cloud to be compiled and analyzed using static
(and dynamic) analysis. It collects quality metrics calculated by different program analysis
tools in a unified relational schema, which facilitates multi-perspective analysis. I
explain SQUAAD's architecture, as depicted in Figure 4.1, below.
Figure 4.1: Architecture of SQUAAD (Orchestrator, GitAnalyzer and GitHubAnalyzer, RecoveryUnits with HistoryCompiler, compile configuration, source/bytecode, and Pre-/Post-/AnalysisWrappers, ReportParsers, relational database and NoSQL datastore, repository host (GitHub), DataAnalyzer, and Plotter).
SQUAAD distributes the analysis over "RecoveryUnits". A RecoveryUnit can be
a cloud instance, a virtual machine (deployed on an on-premise server), or a process.
Each RecoveryUnit downloads the source code of multiple revisions, compiles the source
code, and runs program analysis on the source and bytecode. For complex analyses that need
either a server deployment or a specific environment to run, SQUAAD provisions the
RecoveryUnits over the cloud.
Each RecoveryUnit contains a "HistoryCompiler", which implements my method to
reach the maximum compilation discussed in Section 3.3. The HistoryCompiler receives
a "Compile Configuration" file for each subject system that contains its Git repository
address, target module location(s), and instruction(s) to compile the software and fix
environment issues. If there are multiple build commands, the HistoryCompiler executes
them in their declaration order. It is capable of compiling older revisions using
different versions of Java and build tools.
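The dissertation does not reproduce the Compile Configuration format, so the sketch below is only a hypothetical illustration of the idea: a per-system configuration naming the repository, the target module, and an ordered list of build commands, consumed by a HistoryCompiler-like routine that also tries older JDKs for older revisions. All field names, paths, and the first-success policy are assumptions rather than SQUAAD's actual behavior.

```python
# Hypothetical compile-configuration consumer (illustrative only).
import os
import subprocess

config = {
    "git_url": "https://github.com/apache/commons-io.git",   # example repository
    "target_modules": ["src/main/java"],                      # assumed field
    # Executed in declaration order; here the first command that succeeds wins.
    "build_commands": ["mvn -q -DskipTests compile", "ant compile"],
    # Older revisions may need older toolchains (assumed JDK paths).
    "java_homes": ["/usr/lib/jvm/java-8-openjdk", "/usr/lib/jvm/java-7-openjdk"],
}

def compile_revision(workdir: str) -> bool:
    """Try each build command under each configured JDK, in order."""
    for java_home in config["java_homes"]:
        env = dict(os.environ, JAVA_HOME=java_home)
        for cmd in config["build_commands"]:
            result = subprocess.run(cmd.split(), cwd=workdir, env=env)
            if result.returncode == 0:
                return True
    return False
```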
SQUAAD supports incorporating program analysis tools by providing a standard
"AnalysisWrapper" that serves as an interface. Each AnalysisWrapper is accompanied
by a "PreAnalysisWrapper" and a "PostAnalysisWrapper". A PreAnalysisWrapper
prepares the environment and data for running the analysis. An AnalysisWrapper is invoked
if and only if its PreAnalysisWrapper executes successfully. Some tools require extra
work after the analysis is done (e.g., retrieving data from their internal database). This
is done by a PostAnalysisWrapper, which then stores the artifacts generated by the analysis
into a "NoSQL Datastore".
SQUAAD retrieves quality metrics from the artifacts generated by program analysis
tools using "ReportParsers". A ReportParser receives a report in a standard format
(e.g., JSON) and transforms it into a map of metric values. This map, as well as the revision
ID, is stored in a relational database.
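To make the ReportParser idea concrete, here is a minimal Python sketch; the exact XML and JSON layouts of tool reports are not reproduced in the text, so the element and attribute names below (for example, a PMD-style `<violation rule="...">` element) are illustrative assumptions.

```python
# Sketch of two report parsers that flatten a tool report into a
# {metric: value} map, later stored alongside the revision ID.
import json
import xml.etree.ElementTree as ET
from collections import Counter

def parse_json_report(path: str) -> dict:
    """Assume a flat JSON object and keep only its numeric metric values."""
    with open(path) as fh:
        data = json.load(fh)
    return {k: v for k, v in data.items() if isinstance(v, (int, float))}

def parse_pmd_xml(path: str) -> dict:
    """Count violations per rule in a PMD-style XML report (assumed layout)."""
    counts = Counter()
    for elem in ET.parse(path).getroot().iter():
        if elem.tag.endswith("violation"):
            counts[elem.get("rule", "unknown")] += 1
    return dict(counts)
```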
SQUAAD distributes the analysis using its "Orchestrator". The Orchestrator interacts
with cloud infrastructures to perform cloud management operations, such as launching,
stopping, and terminating instances and setting up required software. The Orchestrator
receives a list of revisions, sorts them chronologically, and uses a round-robin algorithm
to schedule the analysis of each revision. This results in a relatively balanced distribution
of revisions over the RecoveryUnits and reduces the overall mining time.
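The round-robin scheduling itself is straightforward; the sketch below illustrates the idea with placeholder revision and unit identifiers (the data structures are assumptions, not SQUAAD's internals).

```python
# Round-robin scheduling sketch: chronologically sorted revisions are dealt out
# to RecoveryUnits in turn, keeping each unit's workload roughly balanced.
from itertools import cycle

def round_robin_schedule(revisions, units):
    """revisions: iterable of (commit_id, timestamp); units: list of unit names."""
    schedule = {u: [] for u in units}
    ordered = sorted(revisions, key=lambda r: r[1])        # chronological order
    for unit, (commit_id, _) in zip(cycle(units), ordered):
        schedule[unit].append(commit_id)
    return schedule

# Example: five revisions over two units -> the units get 3 and 2 revisions.
print(round_robin_schedule(
    [("c1", 1), ("c2", 2), ("c3", 3), ("c4", 4), ("c5", 5)],
    ["unit-a", "unit-b"]))
```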
The "GitAnalyzer" downloads a system's Git repository. It runs a lightweight
mining process to retrieve all commits' metadata and stores them in the database. It
implements my methods to detect impactfuls and their relationships, as explained earlier
in Sections 3.2.1 and 3.2.2. This information is later used by the Orchestrator to distribute
the revisions. The "GitHubAnalyzer" retrieves organizations' and projects' information via
the GitHub API and stores it in the relational database. This includes, but is not limited to,
the list of repositories and developers of an organization, as well as the number of forks
of a repository. The "DataAnalyzer" and the "Plotter" employ the data generated by the
other components, run data analytics, and visualize software quality evolution.
4.3 Empirical Study Setup
4.3.1 Research Questions
1. How important is analyzing every commit from multiple perspectives?
Uncompilability causes missing data (i.e., unanalyzed commits). Even a small unanalyzed
commit may contain changes in different quality metrics. We study the
compilable revisions using different techniques to understand the probability for a
quality metric to change when another one does not change. The result of this research
question shows the importance of analyzing every commit (even small ones)
from multiple perspectives.
2. How do quality metrics change when the software is compilable?
A software revision is expected to be compilable. To understand the difference
between the impact of commits on software quality based on their compilability,
we first study how quality metrics change when the software is compilable. The
result of this research question provides us with a ground truth to compare against
the results for uncompilable commits.
3. How effective is my approach in identifying change in quality metrics?
Reaching the maximum compilation results in a more complete analysis, as it minimizes
missing data and enables identifying all changes to quality metrics. We
analyze the compiled revisions to investigate whether we can identify a statistically
significant difference between two groups of commits in the same organization in
terms of changing quality metrics, especially for the metrics with a low change
frequency.
4. How do quality metrics change when the software is uncompilable?
There is a dearth of empirical data on how software quality evolves when the software
is uncompilable. We analyze each uncompilable sequence to understand how
software quality changes from the start to the end of the sequence. Comparing this
result against the ground truth obtained from the compilable commits sheds light
on the impact of uncompilability on software quality.
4.3.2 Data Collection
We analyze all compiled distinct revisions of Apache systems using three program analysis
techniques: PMD, FindBugs, and SonarQube. PMD depends on the availability of source
code. FindBugs depends on the availability of bytecode. Executing PMD and FindBugs
is straightforward: each is a one-line shell command that needs the path to the
source or binary files. Both tools generate an XML report, which can be parsed by an XML
ReportParser.
SonarQube does not require bytecode. However, it has its own analysis server and
saves the results in an internal database. It requires generating a configuration file for
each revision. To run SonarQube, we develop a PreAnalysisWrapper to deploy the SonarQube
server on each RecoveryUnit and wait until the server is available. To analyze each
revision, we develop an AnalysisWrapper to generate the SonarQube configuration file
and execute SonarQube on the revision. When the execution of an analysis is done, we
need to fetch the data using the SonarQube RESTful API. We face four technical challenges
in the process of executing and retrieving data from the SonarQube server: 1) the server
randomly crashes when concurrently analyzing more than three revisions; 2) the result of
the analysis is not available through the API immediately after the execution; 3) there
is a limitation on the number of data points returned by each request; and 4) the server
crashes after analyzing a few hundred revisions in a RecoveryUnit.
We resolve the first challenge by distributing the revisions over multiple RecoveryUnits
as opposed to analyzing multiple revisions concurrently on a single RecoveryUnit.
We resolve the next three challenges by developing a PostAnalysisWrapper that a) periodically
checks whether the new result is available, b) sends multiple requests to the server and
collects each response, c) merges all responses and creates a final report for the analyzed
revision, and d) removes the result from the SonarQube server to prevent a crash.
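The sketch below illustrates that PostAnalysisWrapper logic: poll until results are visible, page through the issues API, merge the pages, and delete the project to keep the embedded server small. The endpoint names and parameters reflect the SonarQube web API as the author of this sketch understands it and should be verified against the server version in use; authentication and the surrounding SQUAAD plumbing are omitted.

```python
# Hedged sketch of fetching and cleaning up SonarQube results over its REST API.
import time
import requests

BASE = "http://localhost:9000"     # assumed local SonarQube server
PAGE_SIZE = 500                    # assumed per-request page size

def fetch_all_issues(project_key: str, retries: int = 30) -> list:
    """Poll until results are available, then merge all result pages."""
    issues, page = [], 1
    while True:
        resp = requests.get(f"{BASE}/api/issues/search",
                            params={"componentKeys": project_key,
                                    "p": page, "ps": PAGE_SIZE})
        if resp.status_code != 200:
            if retries == 0:
                resp.raise_for_status()
            time.sleep(10)          # result may not be ready yet; wait and retry
            retries -= 1
            continue
        data = resp.json()
        issues.extend(data.get("issues", []))
        if page * PAGE_SIZE >= data.get("total", 0):
            return issues
        page += 1

def cleanup(project_key: str) -> None:
    """Remove the analyzed project so the server does not accumulate state."""
    requests.post(f"{BASE}/api/projects/delete", params={"project": project_key})
```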
Finally, using XML and JSON ReportParsers, we parse the results generated by the
three analysis tools and store the quality metrics in a unified relational database. We
consider three groups of metrics: Basic, Code Quality, and Security. Basic metrics are
simple to calculate in comparison with code quality and security metrics. Table 4.1 lists
the quality metrics selected for this study.
Basic metrics measure the size of a software system. For the basic group, we select
lines of code, functions, and classes. Lines of code (LC) is the number of physical lines,
excluding whitespace, tabulations, and comments. Functions (FN) is the number of
methods. Classes (CS) is the number of classes, including annotations, enums, interfaces,
and nested classes.
Table 4.1: Quality metrics.

Group          Abbr.   Tool        Description
Basic          LC      SonarQube   Physical lines excl. whitespace/comments
               FN      SonarQube   Functions
               CS      FindBugs    Classes
Code Quality   CX      SonarQube   Complexity (number of paths)
               SM      SonarQube   Code Smells
               PD      PMD         Empty Code, Naming, Braces, Import Statements, Coupling, Unused Code, Unnecessary, Design, Optimization, String and StringBuffer, Code Size
Security       VL      SonarQube   Vulnerabilities
               SG      PMD         Security Guidelines
               MC      FindBugs    Malicious Code, Security
Complexity, Code Smells, and a subset of PMD violations are our code quality metrics.
Complexity (CX) is the number of paths through the code; whenever the control flow
splits, complexity increases by one. Code Smells (SM) are pieces of code that make the
system hard to maintain. PMD defines a rich set of violations that covers a variety of
quality attributes; PD is the number of issues in the subset of those violations related to
code quality.
Security metrics are designed to identify potential security holes. We select the number
of Vulnerability (VL) issues from SonarQube as our first security metric and the number of Security Code Guidelines
(SG) violations from PMD as our second. FindBugs has two classes of
security issues: Malicious Code and Security. We consider the sum of the number of
issues in both of them (MC) as our third security metric.
4.4 Results
4.4.1 How important is analyzing every commit from multiple perspectives?
To understand the importance of analyzing every commit from multiple perspectives, we
calculate the probability for a metric to change while another one does not change. We
define Const(m) as the set of all impactfuls that do not change m, and Change(m) as
the set of all impactfuls that change m.
For each quality metric m_1, we count all impactfuls in which m_1 does not change
while another metric m_2 changes (Const(m_1) ∩ Change(m_2)). Then, we calculate the
ratio of the number of these commits to the total number of impactfuls in which m_1 does
not change:

\[ P(m_1, m_2) = \frac{|Const(m_1) \cap Change(m_2)|}{|Const(m_1)|} \times 100 \]
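As a small illustration (an assumption about the data layout, not the dissertation's code), the computation can be expressed over a DataFrame with one boolean column per metric marking whether an impactful changes that metric:

```python
# Sketch of P(m1, m2): percentage of impactfuls leaving m1 constant that still
# change m2.  Column names follow the table's metric abbreviations.
import pandas as pd

def p_const_change(deltas: pd.DataFrame, m1: str, m2: str) -> float:
    const_m1 = deltas[~deltas[m1]]          # impactfuls that do not change m1
    if const_m1.empty:
        return float("nan")
    return 100.0 * const_m1[m2].mean()      # share of those that change m2

# Toy example: LC constant in two impactfuls, SM changed in one of them -> 50.0
deltas = pd.DataFrame({"LC": [False, False, True], "SM": [True, False, True]})
print(p_const_change(deltas, "LC", "SM"))
```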
Table 4.2 summarizes the results of this analysis.
Table 4.2: Percentage of Const(X) ∩ Change(Y) to Const(X).

Const \ Change   LC     FN     CS     CX     SM     PD     VL    SG    MC
LC               -      1.2    0.4    5.5    13.3   17.0   0.7   0.2   0.6
FN               55.1   -      2.0    29.6   31.0   43.2   2.0   0.3   0.7
CS               65.1   24.5   -      45.7   38.4   54.0   3.1   0.9   1.2
CX               40.1   1.8    1.6    -      22.8   30.2   1.2   0.3   0.7
SM               52.6   17.1   3.9    33.5   -      38.8   1.3   0.5   0.6
PD               37.9   6.5    1.7    17.7   16.2   -      1.0   0.3   0.5
VL               69.1   32.8   13.8   51.5   43.7   58.8   -     1.6   1.4
SG               70.2   34.5   15.5   53.1   45.6   60.2   5.7   -     1.4
MC               70.1   34.4   15.2   53.0   45.3   60.0   4.9   0.7   -

(LC, FN, CS: Basic; CX, SM, PD: Code Quality; VL, SG, MC: Security)
For each pair of metrics, there exists at least one impactful in which one metric
changes while the other remains constant. This illustrates that no single software
quality metric alone suffices to show how software quality changes. Based on this finding,
we advise using a combination of software metrics to measure change and to ensure
software quality during the software development process.
Interestingly, code quality metrics may change frequently, even if LC remains constant.
For example, C-SCXML is under maintenance over a period of 7 days. In that period,
there are 11 commits (from 98050 to 947b18d) in which at least one of CX, SM, and
PD changes while LC remains the same. This suggests that even when the change
to the code is minimal, running static analysis techniques can reveal changes in quality
attributes.
Our results show that when the code quality metrics do not change, the probability
of change in one of the security metrics drops significantly. However, there is still a chance
that a security issue is added or removed. For example, in C-BCEL commit bfb12d3, the
developer changes one line to avoid calling an overridable method from a constructor.
However, this creates a security issue because a reference to an array is stored directly in
the object. None of the code quality metrics changes in this case, but a security issue
is created.
Based on our findings, we advise developers not to rely on a single static analysis
tool as the only quality indicator. Our results also indicate that every missing commit,
even the smallest one that does not change the size of the software, may contain changes
in different quality metrics.
4.4.2 How do quality metrics change when the software is compilable?
In order to understand how quality metrics change when the software is compilable, we
calculate the ratio of change to each quality metric in each subject system. We define
All(s) as the set of all impactfuls in system s and Change(s, m) as the set of all impactfuls
that change m in s. Then, we calculate the ratio of Change(s, m) to All(s):

\[ T(s, m) = \frac{|Change(s, m)|}{|All(s)|} \times 100 \]
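Under the same assumed data layout as before, with an additional `system` column, this per-system change ratio is a one-line aggregation:

```python
# Sketch of T(s, m): percentage of a system's impactfuls that change metric m.
import pandas as pd

def change_ratio(deltas: pd.DataFrame, metric: str) -> pd.Series:
    return deltas.groupby("system")[metric].mean() * 100.0

deltas = pd.DataFrame({"system": ["Avro", "Avro", "Drill"],
                       "LC": [True, False, True]})
print(change_ratio(deltas, "LC"))   # Avro -> 50.0, Drill -> 100.0
```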
Table 4.3 summarizes the result of this analysis.
Table 4.3: Percentage of impactfuls that change a quality metric to all impactfuls.
System   Compilability Ratio   LC   FN   CS   CX   SM   PD   VL   SG   MC
(LC, FN, CS: Basic; CX, SM, PD: Code Quality; VL, SG, MC: Security)
Avro 100.0 79.7 43.9 20.9 69.5 47.1 72.7 5.3 2.7 2.1
Calcite 96.6 89.3 57.0 33.6 75.4 58.1 82.8 4.1 0.9 0.9
C-Bcel 97.8 53.8 16.4 4.9 31.0 44.6 52.9 4.6 3.0 4.1
C-Beanutils 100.0 47.1 16.7 6.5 26.8 27.5 39.9 0.0 0.7 0.0
C-Codec 100.0 42.0 19.9 6.0 23.4 26.2 35.7 0.8 0.5 0.0
C-Collections 99.1 51.4 22.3 10.7 30.4 29.1 38.2 0.9 1.6 1.6
C-Compress 99.1 64.3 31.6 10.0 48.8 38.6 54.4 3.5 2.3 1.8
C-Configuration 97.9 65.4 34.3 9.2 36.7 27.8 42.5 0.0 0.9 0.9
C-CSV 98.7 38.7 15.9 3.2 25.0 19.6 32.3 0.7 1.2 0.7
C-DBCP 100.0 56.1 19.3 4.8 35.3 40.1 46.0 2.7 1.6 2.1
C-IO 96.9 54.8 29.2 8.2 38.3 34.5 44.8 1.4 1.8 1.9
C-JCS 97.8 78.8 47.0 16.7 56.8 47.7 65.9 8.3 3.0 3.8
C-Jexl 93.1 63.8 40.1 18.8 56.4 45.6 58.9 4.2 1.7 1.7
C-Net 97.2 58.7 20.6 6.3 39.0 38.9 50.5 4.5 1.4 0.3
C-Pool 100.0 44.4 18.4 7.9 27.4 27.4 35.7 3.6 0.7 0.7
C-SCXML 96.3 74.2 32.1 13.6 52.4 48.8 62.7 4.2 0.0 0.6
C-Validator 97.2 56.4 12.5 4.8 26.0 36.3 43.2 0.4 0.7 0.7
C-VFS 99.5 51.0 20.7 5.6 28.9 30.8 43.8 2.0 0.2 1.0
CXF-Fediz 90.0 77.6 38.5 18.6 62.1 50.3 68.3 9.3 1.2 3.1
Drill 96.9 87.6 48.0 28.7 73.7 57.7 79.6 8.3 1.5 3.2
Flume 98.6 87.5 46.6 29.4 66.2 58.8 71.5 7.4 1.2 1.8
Giraph 98.7 85.2 57.9 38.9 68.8 48.7 73.8 7.7 2.6 1.6
Hama 98.0 79.1 44.3 24.8 59.8 60.0 70.5 18.9 6.3 7.7
Helix 95.3 86.5 46.7 24.3 70.1 67.1 77.8 9.0 1.4 2.9
Httpclient 99.0 72.8 34.4 13.8 54.2 39.8 55.8 2.9 2.0 1.6
Httpcore 99.8 68.7 40.1 16.9 49.9 41.2 54.0 0.4 1.8 1.2
Mina 95.3 79.1 40.2 15.3 58.2 53.8 66.7 9.6 1.2 0.8
Mina-sshd 97.0 84.9 52.5 30.0 74.8 58.2 74.9 17.0 2.6 3.7
Nutch 96.4 81.2 32.6 17.4 62.0 58.6 73.8 10.1 3.2 4.9
OpenNLP 96.8 67.9 39.4 22.0 53.6 55.8 61.1 6.8 4.6 4.1
Parquet-MR 98.9 75.6 45.2 29.2 63.1 52.2 66.3 1.9 4.8 2.9
Phoenix 97.7 84.4 42.6 19.7 71.6 57.7 76.9 10.2 4.0 7.5
Qpid-JMS 100.0 75.1 32.6 14.2 60.2 43.0 60.0 5.8 1.9 2.3
Ranger 98.6 87.5 43.6 13.8 73.7 62.5 80.9 17.3 0.3 3.8
Santuario 96.8 77.0 35.4 17.4 66.3 54.1 72.5 6.7 3.0 4.3
Shiro 98.6 69.5 42.6 21.3 52.1 42.2 56.4 2.8 2.1 2.5
Tiles 97.3 82.3 53.5 28.4 66.8 56.1 67.7 7.1 0.6 2.9
Zookeeper 98.7 83.4 38.9 15.3 70.5 53.4 68.6 8.0 1.8 4.0
AVG 97.8 70.1 35.6 16.6 52.8 45.8 60.0 5.7 1.9 2.4
DEV 2.0 14.7 12.5 9.2 16.9 12.0 14.5 4.7 1.4 1.8
Our analysis reveals that, on average, only 70% of impactfuls change the number of
physical lines (LC). All systems with an exceptionally low LC ratio are Commons systems,
which are mainly libraries. We further investigate this observation by looking into these
systems.
The main reason for the low LC ratio in Commons systems may be that their Java
classes are heavily documented using JavaDocs. Many impactfuls only change this
documentation and impact code quality metrics, such as comment density, without changing
LC. In multiple instances, we observe that a small change in a class is followed by
several commits fixing JavaDocs. For example, at C-CSV commit c43e8fa, a logic change
is followed by 17 subsequent impactfuls, from 10b1110 to 3740067, refining the documentation.
On the other hand, there are some systems with very few impactfuls dedicated only
to documentation. For example, we study Calcite for the commits that do
not change LC and find that only 1.5% of commits change JavaDocs without affecting
LC. This suggests that the concentration of development in terms of changing code versus
documentation can vary significantly among subject systems depending on their
domain (e.g., framework, library, etc.).
One interesting observation is that, on average, the ratio of commits that change the
number of functions (FN) to those that change LC is 50%. All five systems with an
exceptionally lower ratio of FN to LC are Commons systems. The reason may be
that Commons systems have more stable APIs, as they are libraries. As a result,
most impactfuls change the code in the body of existing methods rather than
introducing or removing methods.
The ratio of commits that change the number of classes (CS) to those that
change FN is 44%. Similar to the ratio of FN to LC, all 7 systems with an exceptionally
lower ratio of CS to FN are Commons systems, which suggests that libraries typically
undergo fewer architectural changes. One explanation for why these ratios are lower for
libraries is that they are, in general, more mature, having been developed in the past
and re-used often. This reinforces my previous observation [11] on three industry-scale
library systems that their developers care about stability and backward compatibility
and are likely to keep the system's architecture stable across a single major release.
The complexity in our study is the number of paths in the core module. A change
in the number of paths means a change in the control flow of the program. On average,
53% of impactfuls change the complexity (CX) of the core module. This emphasizes the
importance of running regression tests after each commit, as a change in the control flow
may result in failing test cases.
Code smells are pieces of code that are confusing for maintainers or give them pause.
A change in the number of code smells (SM) has a direct impact on the maintainability
of a software system. On average, 46% of impactfuls either introduce new code smells or
resolve existing ones. This illustrates that developers frequently change the
number of code smells in their impactfuls. If they do not utilize proper tools to detect
and recognize new smells, they will need to dedicate more time and effort to maintaining the
system.
PMD has a broad set of issue categories. We select the subset of them that is related
to code quality; those categories are listed in Table 4.1. On average, 60% of
impactfuls change the issues detected by PMD (PD). We further investigate this by
studying our subject systems to see whether they actually use PMD. Interestingly, 64%
of our subject systems have the PMD plugin in their configuration files. Considering
that, the high average PD ratio may indicate that developers do not necessarily fix
PMD issues before every commit. We look at commit messages to investigate whether
developers mention either introducing new PMD issues in the code or removing
existing ones. Only 0.2% of the commits directly mention PMD in their message, most
of which indicate fixing issues.
We utilize all three tools to measure the number of security issues. VL is the ratio
of impactfuls that change the number of vulnerabilities detected by SonarQube to all
impactfuls. SG is the ratio of impactfuls that change the number of security code guideline
violations detected by PMD. MC is the ratio of impactfuls that change the number of
security and malicious-code bugs detected by FindBugs.
We observe that VL has a higher ratio on average in comparison with SG and MC.
However, there are instances where this does not hold. In two subject systems with zero
VL, at least one of the other two metrics shows a change in the number of issues. For
example, at C-Beanutils commit e307883, a setter method that stores an array directly
and a getter method that returns that array are introduced. This may become a security
threat if these methods are called by untrusted code, as the caller can keep a reference
to the array and change its content later. There are other instances in which one security
metric shows no change in the number of issues over the period we analyze a software
system while at least one other metric shows a change.
4.4.3 How effective is my approach in identifying change in quality metrics?
Some quality metrics (e.g., security) have a low change frequency ratio (i.e., less than a
few percent) over commit history [9]. Consequently, a statistical analysis of their change
1) can be significantly affected by missing data and 2) requires a large quantity of data.
We minimize missing data (by reaching the maximum compilation) and collect quality
metrics from a large number of commits.
To evaluate the effectiveness of our approach empirically, we design an experiment
to divide the developers of an organization into two groups and ask whether there is a
difference between them in terms of missing data and change frequency in quality metrics.
We extend our dataset and collect the quality metrics for more than 37k distinct revisions
of the core module of 68 subject systems from Apache, Google, and Netflix.
We use the affiliation status of developers to divide them into two groups. Studying
the difference between the developers of an organization based on their affiliation status,
by looking at their impact on software quality, is a novel research contribution in itself.
There are studies [44, 64] that perform quantitative analysis to understand the effects of
team and organization structure on software quality through empirical analysis; however,
they do not use static analysis and detailed code artifacts. Vasilescu et al. [64] examine
how the makeup of a team and its diversity can be quantified and how they influence the quality
of a project. They examine the composition of teams and its effects on software quality.
They use GHTorrent for data collection, which does not collect and analyze code artifacts.
Scholtes et al. [44] investigate how team size and collaboration affect the quality of
a project. They perform a detailed analysis using data from 580,000 commits and 58
open source projects without using static code analysis or code artifacts.
We assume a developer to be "affiliated" if they use the official email of the organization
while committing to a repository owned by that organization; otherwise, they are "external".
The official email domain is @apache.org for Apache, @google.com for Google,
and @netflix.com for Netflix. The ratio of affiliated developers is 80.6% for Apache,
68.7% for Google, and 42.5% for Netflix.
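The affiliation rule itself is simple to express; the sketch below uses the domains given in the text, while the commit-record layout is a placeholder assumption.

```python
# Sketch: classify a committer as affiliated or external by e-mail domain.
OFFICIAL_DOMAINS = {"apache": "apache.org",
                    "google": "google.com",
                    "netflix": "netflix.com"}

def affiliation(author_email: str, organization: str) -> str:
    domain = OFFICIAL_DOMAINS[organization.lower()]
    return "affiliated" if author_email.lower().endswith("@" + domain) else "external"

print(affiliation("dev@apache.org", "Apache"))   # affiliated
print(affiliation("dev@gmail.com", "Apache"))    # external
```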
We calculate the ratio of brokens to understand the extent of missing data in our
dataset for the affiliated and external developers of each organization. We also calculate the change
frequency ratio for five quality metrics by comparing the metric value in neutrals and their
parents: lines of code (LC), code complexity (CX), code smells (SM), malicious code and
security issues (MC) detected by FindBugs, and security vulnerabilities (VL) detected
by SonarQube.
Table 4.4: Ratio of change to quality metrics for affiliated and external.

Org.      Affil.   broken (%)   neutral (#)   LC      CX      SM      MC      VL
Apache    A        1.55         16810         69.80   53.39   45.42   2.77    5.97
          E        1.58         4966          85.11   71.07   57.34   3.25    8.46
          P        1            -             0       0       0       0.559   0
Google    A        0.45         8734          77.59   59.18   44.74   0.28    2.74
          E        3.09         2510          77.34   61.38   53.03   2.02    6.24
          P        0            -             1       0.429   0       0       0
Netflix   A        7.52         1610          78.45   62.05   54.60   1.30    8.82
          E        4.42         1511          77.22   60.73   49.80   1.32    8.87
          P        0.001        -             0.963   0.974   0.079   1       1

A: Affiliated   E: External   P: P-Value   (LC–VL columns are change ratios in %)
LC: Lines of Code   CX: Code Complexity   SM: Code Smells   MC: Malicious Code   VL: Vulnerabilities
In order to compare the ratios, we perform a Games-Howell statistical significance
test, as there are differences in the sample sizes and variances. We use a 5% significance
level (P-value less than 0.05). Nine (out of 18) tests identify a statistically significant
difference. Table 4.4 shows the ratio of brokens and the number of neutrals for affiliated
and external developers, and the results of the tests.
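For readers who want to reproduce this kind of comparison, the pingouin Python package provides a Games-Howell implementation; the sketch below uses toy binary change indicators, and the column layout is an assumption about how such data could be organized rather than the dissertation's dataset.

```python
# Hedged sketch of a Games-Howell comparison with pingouin (pip install pingouin).
import pandas as pd
import pingouin as pg

# One row per neutral commit: 1 if LC changed, 0 otherwise, tagged by group.
df = pd.DataFrame({
    "lc_changed": [1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0],
    "group": ["affiliated"] * 6 + ["external"] * 6,
})

# The returned table includes the pairwise p-value; reject at the 0.05 level.
result = pg.pairwise_gameshowell(data=df, dv="lc_changed", between="group")
print(result)
```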
In Apache, there is no statistically significant difference between affiliated and external
developers in the broken ratio. However, affiliated developers have a lower change frequency ratio in LC, CX, SM, and
VL. The MC ratio is also lower for affiliated developers, but the difference is not significant. As opposed
to Apache, there is no significant difference between the affiliated and external developers of Google and
Netflix in the LC and CX ratios. The broken ratios of Google external and Netflix affiliated
are significantly higher. For SM, VL, and MC, Google affiliated has significantly lower
ratios, but there is no significant difference between the affiliated and external developers of Netflix.
This analysis shows that our approach is capable of minimizing the number of missing
data points and collecting a large quantity of data on software quality. It also shows that
we can use the collected data to identify statistically significant differences between the
impact of two groups of commits/developers within an organization, even for quality
metrics with a low change frequency ratio.
4.4.4 How do quality metrics change when the software is uncompilable?
To understand how software quality evolves over uncompilability, we identify all broken
sequences in our dataset. Each broken sequence starts with a breaker and ends with
a fixer. For brokens involved in a broken sequence (i.e., one breaker, any number of
carriers, and one fixer), we cannot calculate the impact on software quality. However, we
can compare software quality between two revisions to understand how software quality
evolves over the sequence: the revision produced by the impact-parent of the breaker
and the one produced by the fixer. The former is the last solid revision before the
sequence begins and the latter is the first solid revision after the sequence ends.
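As an illustration of that bookkeeping (the role labels and list layout are assumptions, not SQUAAD's data model), a broken sequence can be extracted from an ancestry-ordered list of impactfuls labeled with the compilability roles used in the text:

```python
# Sketch: find broken sequences and report, for each, the pair of revisions to
# compare -- the breaker's parent (last solid before) and the fixer (first solid after).
def broken_sequences(impactfuls):
    """impactfuls: list of (commit_id, role) in ancestry order;
    role is one of 'neutral', 'breaker', 'carrier', 'fixer'."""
    sequences, start = [], None
    for i, (commit, role) in enumerate(impactfuls):
        if role == "breaker":
            start = impactfuls[i - 1][0] if i > 0 else None   # breaker's parent
        elif role == "fixer" and start is not None:
            sequences.append((start, commit))
            start = None
    return sequences

history = [("a", "neutral"), ("b", "breaker"), ("c", "carrier"), ("d", "fixer"),
           ("e", "neutral")]
print(broken_sequences(history))   # [('a', 'd')]
```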
Our most recent dataset contains 212 Apache, 43 Google, and 50 Netflix broken
sequences. For each sequence, we identify the impact-parent of the breaker and the fixer
and analyze their software quality. We compare these two revisions and calculate the
frequency of change in each quality metric over the broken sequences. Table 4.5 shows
the frequency of change over broken sequences in each organization. It also contains the
frequency of change over neutrals and the P-values of a Games-Howell test comparing
broken sequences and neutrals. We use a 5% significance level (P-value less than 0.05).
Our analysis shows that there is a statistically significant difference between broken
sequences and neutrals in all metrics (i.e., LC, CX, SM, MC, VL) for Apache and in LC,
CX, SM, and VL for Google and Netflix. This result was expected, since for a neutral we
calculate the impact of a single impactful, while for a broken sequence we calculate the
cumulative impact of at least two impactfuls: the breaker and the fixer. Our analysis in
the previous chapter shows that the majority of broken sequences contain only a breaker
and a fixer; however, a broken sequence may include multiple (in a few cases more than
10) carriers.
Table 4.5: Ratio of change to quality metrics for broken sequences and neutrals.

Org.      Set   #       LC      CX      SM      MC      VL
Apache    N     21844   71.48   55.01   47.02   2.81    6.1
          B     212     90.57   83.49   82.55   14.15   21.23
          P     -       0.001   0.001   0.001   0.001   0.001
Google    N     11259   77.8    59.7    46.9    0.6     3.36
          B     43      88.37   79.07   83.72   4.65    20.93
          P     -       0.033   0.002   0.001   0.212   0.005
Netflix   N     3303    78.02   61.34   52.16   1.3     8.99
          B     50      94      82      74      6       34
          P     -       0.001   0.001   0.001   0.167   0.001

N: neutrals   B: broken sequences   P: P-Value   (LC–VL columns are change ratios in %)
In order to address this issue, we identify two sets of sequences: 1) B1: all broken
sequences that contain a breaker followed by a fixer (i.e., no carrier) and 2) S2: all
sequences of two subsequent neutrals. Comparing software quality change over these two
sets of sequences provides better insight into the impact of uncompilability on software
quality, as they contain the same number of subsequent impactfuls. Table 4.6 shows the
results of this analysis.
Our analysis shows there is no statistically significant difference in LC change between
B1 and S2 in Apache, Google, and Netflix. The statistically insignificant difference
in LC implies that the compilability status and change in size do not necessarily affect
each other. This strengthens the previous chapter's conclusions and may further imply
that developers unconsciously commit broken code and that uncompilability is a symptom
of careless development.
Table 4.6: Ratio of change to quality metrics for B1 and S2.

Org.      Set   #       LC      CX      SM      MC      VL
Apache    S2    21350   86.76   73.7    66.22   5.05    11.01
          B1    149     89.26   81.88   76.51   10.07   18.12
          P     -       0.327   0.01    0.003   0.043   0.025
Google    S2    11093   91.63   78.61   66.84   1.14    6.16
          B1    25      80      72      76      0       16
          P     -       0.154   0.471   0.294   0.001   0.189
Netflix   S2    2759    91.77   81.26   71.4    2.5     15.26
          B1    35      94.29   80      71.43   5.71    31.43
          P     -       0.535   0.858   0.9     0.421   0.043

S2: a sequence of two neutrals   B1: a broken sequence with a breaker followed by a fixer (no carrier)   P: P-Value   (LC–VL columns are change ratios in %)
In Apache, the frequency of change to CX, SM, MC, and VL is significantly higher
in B1 than in S2. This indicates that when the software is uncompilable, its
complexity, maintainability (i.e., code smells), and security change more frequently. Considering
the challenges involved in analyzing uncompilable commits (e.g., the unavailability
of bytecode), the higher frequency of change in code quality and security metrics over
broken sequences reinforces the importance of understanding how software evolves when
it is uncompilable.
The size of B1 in Google and Netflix is small; still, in both organizations, the frequency
of change in a security metric differs significantly between B1 and S2. By collecting
more uncompilable sequences in these two organizations, we would have been able to gain
better insight into how uncompilability impacts software quality. However, we already select
all Java systems owned by Google and Netflix on GitHub that comply with our system
selection criteria (Section 3.4.2). This lack of data further reinforces the significance of
our approach in conducting large-scale analysis and assessing every possible data point.
Chapter 5
Discussions
My approach to reaching a high maximum compilation ratio over commit history is to focus
on the evolution of a software module. In this chapter, I discuss multiple other benefits
of focusing on a software module in commit-level software evolution analysis and
quantitatively analyze the risks of using my approach. I also discuss my tool-based
approach (SQUAAD) in further detail, as well as the replicability of, and threats to the validity of,
the analysis results presented in this dissertation. Finally, I explain potential future
directions for my research.
The remainder of this chapter is organized as follows. I discuss related work that
emphasizes analyzing software modules in Section 5.1.1 and the benefits and risks of
focusing on a software module in a commit-level study of software quality evolution
in Sections 5.1.2 and 5.1.3. I discuss SQUAAD's key differences from other state-of-the-art
tools for conducting commit-level mining software repository studies in Section 5.2.1 and
the replicability of the results presented in this dissertation in Section 5.2.2. I discuss
the threats to validity of my analysis results in Section 5.3 and future directions in
Section 5.4.
5.1 Focusing on a Module in Software Evolution Analysis
by Commit Level
Software developers do not contribute equally to different modules of a software system.
A case study [38] on Apache Server shows that only 15 developers (out of almost 400
involved developers) mainly contribute to the core of the project, and in any period
there are 4-6 active "core developers" per week. Multiple studies [39, 43, 47, 62, 63] have
encountered benefits of analyzing the evolution of modules instead of the whole software.
My approach to analyzing software quality evolution at the commit level is to target a
software module and analyze its evolution. A module can be as small as an extension of a
library or as large as a complete software system hosted in a repository. I have encountered
multiple benefits of focusing on a module when conducting large-scale empirical
studies on software evolution. For example, as I demonstrate in Chapter 3, focusing on
a software module (instead of the whole software) enables achieving a high maximum
compilation ratio over the commit history.
In this section, I discuss related work (that focuses on software modules) and explain
the benefits and risks of focusing on a module instead of the whole software in commit-level
software evolution analysis.
5.1.1 Related work
Some studies focus on the evolution of modules for visualization purposes. A recent
software evolution visualization (SEV) approach [39] emphasizes information about
software modules and subsystems, instead of presenting data as a whole, including all
available versions, and showing general information on the evolution process [5, 15, 20, 23]. This
work focuses on obtaining the detailed information about software modules and subsystems
that is necessary for most software engineering tasks. Another recent visualization study
[43] considers tracking changes in modules and their relationships in a software project to
analyze the software as it goes through several types of changes.
Some studies focus on modules to run expensive analyses more efficiently.
For example, a module-level regression test selection technique [63] can track dependencies
between project modules and skip tests whenever the dependencies of a module do not
change between the current and the last successful build. Another technique [62] can
classify different modules of a software system as fault-prone or non-fault-prone. This
approach is particularly beneficial from a managerial perspective, as managers can allocate
more resources to the modules/subsystems that contain more defects. Another study [47]
argues that distributing resources based on modules is more effective than the traditional
approach of allocating the budget to the whole system. For example, if a manager realizes
that modules with more than 5 developers tend to be more faulty, a team restructuring
can happen.
5.1.2 Benefits
I have identified multiple benefits of focusing on a module while conducting the study
presented in this dissertation.
It provides a more complete view of evolution. In several instances, we have observed
that a commit introduces a compile error in one module and causes the whole software to
be uncompilable over a period; however, other modules are compilable and are evolving.
For example, in Netflix-Spectator (https://github.com/netflix/spectator) commit fbeb719,
a compile error in a unit test breaks the build. This causes the unavailability of bytecode,
which results in missing data and an incomplete analysis. After focusing on the core
module and skipping tests, we achieve a 100% compilation ratio and obtain a complete
view of the evolution of the core module.
It facilitates manual inspection of individual data points, which is a labor-intensive
task. In Apache CXF-Fediz (https://github.com/apache/cxf-fediz), commit b06255a changes
the core module and uses a new interface, which causes a compile error. The error is fixed
in the next impactful (44d7340), which is a small commit that comments out the line
containing the error and says "Switching to WSS4J 2.0.0-rc1 + commenting some stuff
as a result". There are four other commits changing code in other modules between these
two commits that can be skipped when inspecting why the software is uncompilable.
It reduces the cost and complexity of analysis. Apache Tiles (https://github.com/apache/tiles)
is a templating framework for user interface development. It is designed to be integrated
with a variety of other systems. Its core module is accompanied by multiple smaller
components (e.g., sdk, agents, and plugins). Although the majority (70%) of developers
change the core module at least once, only 24% of commits are impactful. Instead of
analyzing all commits to understand the evolution of the core module, we can focus on
that smaller subset.
It provides a more accurate analysis by omitting irrelevant changes. Apache Santuario
(https://github.com/apache/santuario-java) provides an implementation of the primary
security standards for XML. Its core module ("src/main") is accompanied by a large
number of tests ("src/test") comprising 30% of all source files. Consequently, 34% of
commits do not change code in the core module, and including them may cause inaccuracy
when analyzing the frequency of change to a quality metric.
It helps in better understanding heterogeneous projects that are developed by different
development teams. Apache Parquet-MR (https://github.com/apache/parquet-mr)
contains different sub-projects in the same repository. Each sub-project has its own set
of developers and reviewers. Consequently, only 34% of developers change the core
sub-project at least once over the whole development history. Instead of focusing on the
whole project, we can focus on each sub-project to evaluate the performance of each team.
It helps in better understanding multi-language projects. For example, Apache
Avro (https://github.com/apache/avro) is implemented in 10 programming languages in
the same repository. Instead of analyzing the whole project and dealing with the issues
of analyzing multi-language projects [1, 6], we can run analysis on each module (e.g.,
"lang/java", "lang/c++") individually with a proper program analysis technique. Some
mining repository techniques only support one programming language [19] or limit their
mining task to one language per project [55].
It improves the understandability of data visualization when thousands of data
points are depicted. Apache Accumulo (https://github.com/apache/accumulo) is a
distributed data storage framework. Over a period of 5 years (10/2011 to 11/2016),
105 developers contribute 8513 commits to its repository. Only 1969 commits by 76
developers change Accumulo's core module. Figure 5.1 shows the evolution of the number
of unused code blocks in its core module using all commits (top) and only the ones that
change the core module (bottom) over that period. Both graphs show the same
evolutionary trend, but the top one contains 6544 (3.3x) more data points that are not
relevant to changes in the core module.
5.1.3 Risks
The benefits of focusing on a software module come at the price of skipping other modules
and artifacts. In order to understand to what extent this happens, we calculate the ratio
of commits and developers that change code in the core module. Figure 5.2 shows the
distribution of these ratios across systems in each organization. For example, it shows
that in 50% of Netflix systems, between 20% and 30% of commits are impactful, and in 44% of Google
systems, between 70% and 80% of developers are impactful.
Figure 5.1: Visualization of software quality evolution (all commits vs. impactfuls): the evolution of unused code blocks in the core module of Apache Accumulo, by commit date and colored by developer, using two sets of commits: 1) all 8513 commits (top) and 2) the 1969 commits that change the core module (bottom).
Figure 5.2: Impactful commits/developers ratio distribution: frequency of systems (%) by impactful commit ratio and impactful developer ratio (%), shown separately for Apache, Google, and Netflix.
This shows that focusing on a target module (e.g., the core module in this study)
may result in missing a large number of changes. 1) If those modules are irrelevant, then
including their changes may degrade the analysis. For example, in a study of the architectural
stability of a library, introducing a plug-in is not necessarily an architectural change.
However, architectural change metrics [11] detect it as a change, since new entities are
introduced. 2) If they are relevant, they can be analyzed either separately or in combination
with other modules. As explained in Section 5.1, a module can be as small as a
plug-in or as large as the whole software. For example, in a study of the compilability of the
whole software, one can target its core module first to fix the compilation-environment
issues and then gradually include other modules, instead of targeting the whole software and
fixing the issues in different modules all together.
We take further steps to understand the risks of focusing on the core module by studying Apache systems. We focus only on Apache systems as they are maintained by one organization and follow similar development guidelines. Table 5.1 summarizes the number of commits, impactful commits, developers, and impactful developers for each Apache system through Feb. 2017. On average, 48% of commits are impactful and 69% of developers have at least one impactful commit. This illustrates the centrality of the core module in the development process of our subject systems. Considering our system selection criteria, this observation is expected; however, there are some subject systems with exceptionally low ratios. We look into these subject systems to investigate this observation further.
Flume and Tiles are frameworks designed to be integrated with a variety of other systems. Hence, the core module in both systems is accompanied by multiple smaller components (e.g., sdk, agents, and plugins). Additionally, there are proportionally large test suites developed for the core module in both systems. These test suites are not analyzed in this study. As a result, the ratios of impactful commits for Flume (29%) and Tiles (24%) are lower than the average. However, in both systems, the high ratio of impactful developers (58% and 70%) suggests that the core module is the focus of the development.
Ranger is a comprehensive data security framework designed to be integrated with different tools across the Hadoop platform. Unlike Flume and Tiles, it has a relatively small test suite. However, its repository contains an exceptionally large number of small modules, agents, and plugins. A large number of commits affecting those modules are skipped in our analysis. As a result, the ratio of impactful commits for Ranger (29%) is lower than in other subject systems. The ratio of impactful developers (53%) also falls outside one standard deviation of the mean. However, it still shows that more than half of the developers are engaged in the development of the core module.
Avro is a data serialization system that is implemented in 10 different programming languages. All those implementations are located in one repository. Since we are interested in the Java implementation, we skip the commits that contain development only for the other programming languages. Consequently, the ratios of impactful commits (25%) and impactful developers (49%) are relatively low.
Table 5.1: Impactful developers and commits in Apache systems.
System | Domain | Timespan | Developers (All / Imp / %) | Commits (All / Imp / %)
Avro | Data Serialization | 07-11 to 01-17 | 39 / 19 / 48 | 757 / 188 / 24
Calcite | Data Management | 08-14 to 02-17 | 121 / 95 / 78 | 1201 / 673 / 56
C-Bcel | Bytecode Eng. | 06-06 to 12-16 | 13 / 8 / 61 | 900 / 589 / 65
C-Beanutils | Reflection Wrapper | 03-10 to 09-16 | 9 / 7 / 77 | 392 / 139 / 35
C-Codec | Encoder/Decoder | 09-11 to 09-16 | 6 / 5 / 83 | 706 / 368 / 52
C-Collections | Collections Ext. | 03-12 to 10-16 | 16 / 10 / 62 | 890 / 570 / 64
C-Compress | Compress Lib. | 07-08 to 02-17 | 30 / 22 / 73 | 2037 / 1240 / 60
C-Configuration | Conf. Interface | 04-14 to 01-17 | 4 / 4 / 100 | 637 / 340 / 53
C-CSV | CSV Library | 11-11 to 02-17 | 11 / 7 / 63 | 1032 / 606 / 58
C-DBCP | DB Conn. Pooling | 01-14 to 11-16 | 8 / 5 / 62 | 393 / 188 / 47
C-IO | IO Functionality | 01-02 to 12-16 | 42 / 32 / 76 | 1941 / 943 / 48
C-JCS | Java Caching | 04-14 to 02-17 | 7 / 4 / 57 | 401 / 139 / 34
C-Jexl | Expression Lang. | 08-09 to 10-16 | 8 / 4 / 50 | 675 / 317 / 46
C-Net | Clientside Protocol | 11-06 to 02-17 | 11 / 7 / 63 | 1588 / 902 / 56
C-Pool | Object Pooling | 04-12 to 02-17 | 11 / 8 / 72 | 546 / 278 / 50
C-SCXML | State Chart XML | 07-06 to 08-16 | 16 / 8 / 50 | 811 / 348 / 42
C-Validator | Data Verification | 08-07 to 02-17 | 16 / 10 / 62 | 639 / 287 / 44
C-VFS | Virtual File System | 11-06 to 01-17 | 21 / 13 / 61 | 1242 / 614 / 49
CXF-Fediz | Web Security | 04-12 to 03-17 | 12 / 6 / 50 | 1211 / 190 / 15
Drill | SQL Query Engine | 04-15 to 02-17 | 84 / 65 / 77 | 995 / 636 / 63
Flume | Data Collection | 08-11 to 11-16 | 45 / 26 / 57 | 1194 / 347 / 29
Giraph | Graph Processing | 11-12 to 01-17 | 33 / 26 / 78 | 585 / 392 / 67
Hama | BSP Computing | 06-08 to 04-16 | 23 / 16 / 69 | 1582 / 732 / 46
Helix | Cluster MNGMT | 01-12 to 06-16 | 31 / 21 / 67 | 1521 / 800 / 52
Httpclient | Client-side HTTP | 03-09 to 01-17 | 13 / 9 / 69 | 1916 / 934 / 48
Httpcore | HTTP Transport | 03-09 to 02-17 | 7 / 6 / 85 | 1354 / 566 / 41
Mina | Network | 11-09 to 03-15 | 13 / 9 / 69 | 628 / 276 / 43
Mina-sshd | SSH Protocols | 04-09 to 02-17 | 22 / 21 / 95 | 1092 / 843 / 77
Nutch | Web Crawler | 03-05 to 01-17 | 40 / 28 / 70 | 2221 / 926 / 41
OpenNLP | NLP Toolkit | 04-13 to 02-17 | 14 / 14 / 100 | 665 / 438 / 65
Parquet-MR | Storage Format | 02-13 to 01-17 | 119 / 41 / 34 | 1647 / 349 / 21
Phoenix | OLTP Analytics | 01-14 to 02-17 | 78 / 62 / 79 | 1908 / 1290 / 67
Qpid-JMS | Messaging | 02-15 to 02-17 | 5 / 3 / 60 | 921 / 431 / 46
Ranger | Data Security | 03-15 to 02-17 | 47 / 25 / 53 | 1412 / 415 / 29
Santuario | XML Security | 01-11 to 02-17 | 3 / 3 / 100 | 743 / 496 / 66
Shiro | Java Security | 03-09 to 11-16 | 22 / 13 / 59 | 700 / 289 / 41
Tiles | Web Templating | 07-06 to 07-16 | 10 / 7 / 70 | 1345 / 327 / 24
Zookeeper | Distributed Comp. | 06-08 to 02-17 | 31 / 25 / 80 | 1339 / 618 / 46
AVG (mean of %) | | | 69 | 48
DEV (std. dev. of %) | | | 15 | 14
Parquet-MR has exceptionally low ratios of impactful commits (21%) and impactful developers (34%). Its repository contains multiple subprojects that convert and integrate Parquet (a columnar storage format for Hadoop) with other technologies, such as Avro and Hive. Each subproject is assigned to a couple of reviewers and has its own developers. As a result, in comparison with other projects, a relatively smaller subset of developers contribute to the core module.
CXF-Fediz is a web security framework that has two relatively large and architecturally independent components: cxf.fediz.core (the core module) and cxf.fediz.service.idp. The first is a federation plugin for web applications. The second is an identity provider and security token service. Since our focus is on the core module of this system, we do not analyze a proportionally large number of commits that impact the second component. As a result, the ratio of impactful commits is relatively low (16%). While the ratio of impactful developers (50%) falls outside one standard deviation of the mean, it still shows a high level of engagement in this module.
C-SCXML and C-Jexl are the only two systems that have relatively low impactful developer ratios (50%, 50%) while their impactful commit ratios (41%, 47%) are within one standard deviation of the mean. By looking at C-SCXML's development history, we find that half of the developers have fewer than 5 commits within the period considered in this study. All of those commits are non-impactful changes such as adding dependencies or fixing the repository's configuration. On the other hand, the three most active developers have committed more than 86% of all commits. We observe a similar pattern in C-Jexl. The two most active developers have committed more than 89% of the commits, while half of the developers have 17 non-impactful commits combined. This shows a meaningful distinction between the role of the key developers and the others in these two subject systems.
Our analysis reveals that, on average, half of the commits have an impact on the source code of the core module. In addition, more than two-thirds of the developers are directly engaged in the development of the core module. Our results suggest that the architecture of the system, the level of its integration with other systems, and the distribution of tasks during development may significantly affect the ratios of impactful commits and developers.
5.2 A Tool-Based Approach
Due to the effort and scalability challenges involved in studying software evolution at a large scale, researchers are forced to narrow the scope, which oftentimes results in skipping source code facts, performing a coarse-grained analysis (i.e., studying official releases), or using simple static analysis techniques to analyze the commit history.
Over the past couple of years, multiple tools and techniques have been introduced to study software evolution at the commit level [16,19,25,29,40,41,48,53,54,55,59]. Some of them do not run analysis on source code [25,41]. Some of them are designed and implemented to run source code analysis sequentially [48,59]. Consequently, the execution of the study requires powerful on-premise resources and takes multiple weeks [59].
There are some mining techniques designed to scale by running static analysis on different revisions of each file in parallel [1,19,55]. These techniques are extremely effective in generating an Abstract Syntax Tree (AST) [1,19] and calculating file-based quality metrics, such as size, cohesion, and complexity [55]. They can also aggregate the results of individual file analyses to evaluate the quality of a project. However, this approach to achieving scale and avoiding redundancy is not suitable for understanding software quality evolution with program analysis techniques that:
1) analyze a module considering all of its source code entities and their relationships. For example, architecture recovery techniques generate clusters of source code entities by analyzing semantic and/or structural relationships between them [22,60]. It does not suffice to analyze the changed files separately and aggregate the results to recover the architecture of a new revision.
2) require bytecode. Some static and dynamic program analysis techniques depend on the availability of the compiled version (e.g., FindBugs [7] and test coverage [36]). A commit may change the version of a dependency; the new dependency is available and the build configuration (e.g., build.gradle) is syntactically correct, yet the new revision does not compile because the dependency is not backward compatible [9]. A recent study [1] identifies the unavailability of compiled versions as the main unresolved source of manual effort in software evolution analysis.
In addition, some tools rely on a specific environment to run. This includes dynamic techniques that need an execution environment and static techniques with specific requirements. For example, SonarQube requires deploying its own analysis server, generating a configuration file for each revision, executing the analysis, and fetching the results using the SonarQube API.
I take steps toward addressing this gap by developing Software Quality Understanding by Analysis of Abundant Data (SQUAAD) [10], a comprehensive framework including a cloud-based automated infrastructure accompanied by a data analytics toolset. SQUAAD is designed to target a module, compile its distinct revisions, and run static/dynamic analysis on them. Before conducting a large-scale analysis, SQUAAD runs a lightweight mining task on the software's Git repository to determine which commits change the module (impactful commits [9]) and the evolutionary relationships between those commits [12]. Then it automatically 1) distributes hundreds of revisions across multiple cloud instances, 2) compiles each revision using default and/or user-defined configurations, 3) provisions the environment and runs static/dynamic program analysis techniques on each revision, 4) collects the generated artifacts, 5) either parses them to extract quality attributes or compares them with each other to calculate differences (e.g., by architectural distance metrics), and 6) runs statistical analysis on software quality evolution. The entire analysis workflow is automated. As soon as the framework is configured for a subject system, a researcher can run the presented analyses on that system and study its evolution. The automation of the analysis conducted by SQUAAD enables replication of my study results, addressing a major threat to the external validity of repository mining studies [54]. I have also developed user interfaces to illustrate the evolution of different quality attributes and the impact of each developer on software quality.
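To make the mining step concrete, the sketch below shows, under simplifying assumptions, one way to identify the distinct revisions of a target module with plain Git commands by deduplicating the module's tree hash. The module path "core" is a placeholder, error handling is omitted, and SQUAAD's actual mining step is richer (for example, it also derives the evolutionary relationships between impactful commits).

```python
# Minimal sketch: identify the distinct revisions of a target module by
# deduplicating impactful commits on the module's Git tree id. Assumes the
# module path exists at every listed commit; no error handling is shown.
import subprocess

def git(repo, *args):
    return subprocess.run(["git", "-C", repo, *args],
                          capture_output=True, text=True, check=True).stdout.strip()

def distinct_module_revisions(repo, module_path="core"):
    # Impactful commits (those that touch the module), oldest first.
    shas = git(repo, "log", "--reverse", "--pretty=%H", "--", module_path).splitlines()
    seen_trees, distinct = set(), []
    for sha in shas:
        tree = git(repo, "rev-parse", f"{sha}:{module_path}")  # tree id of the module
        if tree not in seen_trees:      # identical tree => identical module source, skip
            seen_trees.add(tree)
            distinct.append(sha)
    return distinct
```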
5.2.1 Related Work
Over the past few years, several automated tools have been developed to collect data on software quality evolution at the commit level and to represent the data in the form of graphs and statistics. To the best of my knowledge, none of them is capable of compiling the software and running complex static/dynamic analysis that considers the relationships between files and/or requires an execution environment. Table 5.2 summarizes this discussion. Some tools, such as RepoGrams [41] and Lean GHTorrent [25], are excluded from this comparison as they skip source code facts.
Table 5.2: Comparison of related frameworks.
Technique | Year | Analysis Type | Compilation | Distributed | Redundancy Reduction | Binary Metrics
SQUAAD | 18 | Static/Dynamic | Yes | Cloud | Module | Yes
SmartShark | 16/17 | Static | - | HPC | File | -
HistoryMiner | 17 | Static | - | - | File | -
Candoia | 17 | Static | - | - | - | -
Lisa | 17 | Static | - | - | File | -
QualBoa | 16 | Static | - | - | - | -
Boa | 15/17 | Static | - | - | File Cluster | -
MetricMiner | 13 | Static | - | - | - | -
Boa [18,19] is a query system developed specifically for mining repositories. It can efficiently aggregate and query information by executing tasks on a Hadoop cluster. The original version of Boa indexes metadata about projects, authors, revision history, and files. It stores no source-code facts. The newer version parses Java files to extract and store the Abstract Syntax Tree (AST). Using the Boa language, one can design mining experiments, such as finding the most used programming language, the number of created projects per year, and the number of public methods per project. There is no support for running more complex analysis (e.g., architecture recovery) that requires analyzing multiple files together, compilation (e.g., bytecode and dynamic analysis), and COTS tools. Upadhyaya et al. [61] introduce an optimization of mining tasks in the context of Boa analysis. They develop a notion of mining-task-specific similarity to cluster the artifacts (i.e., files, classes, methods). It is sufficient to run the mining task for one of the artifacts in a cluster and apply the result to all members of the cluster. This optimization is at the level of artifacts, as opposed to ours, which optimizes the analysis by targeting the unique revisions of a module.
QualBoa [16] is a recommendation system, built on top of Boa, that considers both functional and quality aspects. It searches for candidate files whose functionality matches a given "signature", downloads those files, and runs static analysis to calculate quality metrics based on their AST. Similar to SQUAAD, it runs static analysis. Being based on Boa gives it the advantage of running customized metrics on the AST. On the other hand, it has all the limitations of Boa mentioned above.
Candoia [53] is a platform based on Boa for building and sharing mining-software-repositories tools as apps. Similar to SQUAAD, it retrieves information from software repositories (and issue repositories). The foundation of Candoia is Boa, which facilitates running analysis on source code using an AST abstraction, as well as running analysis on the results of mining, such as predicting the relationships between files. It depicts the results of the analysis to users in the form of customized interfaces. Similar to SQUAAD, and as opposed to the original Boa, it is designed to run analysis on a user's private projects and user-specific data sets. Candoia is more customizable than SQUAAD, in that users can develop mining logic in the Boa language. It also gives users the ability to develop their own customized charts. The fundamental limitation of Candoia in comparison with SQUAAD is that it is completely based on Boa, except that it retrieves issue information from issue repositories. This means that it is only capable of running static analysis algorithms on the AST of files.
Lisa [1] is a tool for reducing redundancies in multi-revision code analysis. It introduces low-level groundwork for efficiently running static analysis over the evolution of software. Lisa runs the analysis on every file in the bare local Git repository at the start of the analysis. It saves the artifacts and then loads the already analyzed results for each file when analyzing every revision, instead of checking out all files of that revision and running the analysis technique. This drastically reduces the amortized time of the analysis of each revision. However, this approach is limited to the analysis of source code in files, and is not applicable to bytecode analysis and dynamic analysis of the whole system. Lisa analyzes every Xth commit instead of analyzing every commit. This is called "sampling intervals" and is introduced to select a subset of commits that provide sufficient information about the evolution of a software system. X is 250 for Java, 150 for C#, and 25 for JavaScript. In contrast, our approach identifies the distinct revisions of the module under analysis. In a one-to-one comparison of approaches, Lisa is much faster for the types of analysis it is capable of, but it is incapable of running complex analysis involving multiple files (e.g., architecture recovery), bytecode, and dynamic analysis.
MetricMiner [48] is a web-based framework for studying the evolution of software systems. Similar to SQUAAD, it analyzes the whole development history, including commits. It uses a Java parser to create the AST of each file and calculates different metrics on each commit. It provides an API for adding new metrics to the framework, but all metrics are developed per file. As a result, MetricMiner is not capable of running more complex analysis at the module and architecture levels. It is also not capable of bytecode analysis.
HistoryMiner [58] is a tool for conducting a large-scale study on when and why code starts to smell bad. It mines every commit in the history of the software and applies a lightweight static analysis technique to the files affected by the commit to check if a code smell is introduced. It marks the first commit introducing a smell as a smell-introducing commit. It also mines Jira and automatically assigns issues to commits. Its approach to analyzing multiple commits is sequential, and the mining takes several weeks on 200 subject systems. Similar to Lisa, its optimization is to avoid analyzing redundant files, which works when the analysis technique is only applied to single files. HistoryMiner is used in a follow-up paper [59] to study when code smells disappear from the system.
SmartShark [54,55] is a framework that addresses the problems with the external validity of mining software repository studies. It is a distributed platform with a web interface. After a project is declared by its GitHub URL and programming language, SmartShark starts the mining task by running static analysis on the files affected by every commit. It constructs the AST and calculates metrics such as the number of clones for each file. Its distribution approach is designed to run on HPC clusters, while our approach relies on virtual machines orchestrated on an on-premise server or a private/public cloud.
The following tools either do not consider source code or are not designed to mine
software repositories to study commit history.
Xavier et al. [66] design a toolset to conduct a large-scale historical and impact analysis of API breaking changes. Similar to SQUAAD, their toolset uses the GitHub API for system selection. It runs a static analysis technique, essentially a Java parser, on each release. It uses a binary metric that compares two releases of a Java library to measure API differences. The dataflow of their analysis is comparable with SQUAAD's; however, they consider the evolution of a system at the release level, not the commit level.
Groundhog [40] is a framework developed to study the usage of Java's concurrent programming constructs. Similar to SQUAAD, it is capable of retrieving the official releases from source code repositories, running static analysis techniques on the code, and saving the metrics in a relational database. It uses the Java compiler to parse the source code, but it does not build the bytecode. Its scope is limited to the official releases.
GCC-Git [29] is a change classifier for the extraction and classification of changes in software systems. It classifies commits into three categories: bug repairing, feature introducing, and general. This classification is based solely on the commit description and does not consider source code facts.
RepoGrams [42] provides a convenient interface to show the evolution of software systems based on metrics defined on the quality of commits. Similar to SQUAAD, it considers evolution at the commit level and provides unifying views, called footprints, for all metrics. However, in calculating the metrics, it relies only on the repository's interface (e.g., the size of a commit) and does not consider source code.
Lean GHTorrent [25] is a lightweight tool to request GitHub data dumps on demand for any collection of GitHub repositories. Similar to our framework, it works with the GitHub API to retrieve the information. It has a more advanced schema in terms of the data it retrieves, and it saves snapshots of GitHub-specific data, such as pull request history. However, it does not deal with any code artifacts.
Sourcerer [8] is an infrastructure for large-scale collection and analysis of open-source code. Similar to SQUAAD, it crawls software repositories and downloads the latest version of projects. It runs an AST extractor and an automated dependency resolution technique to find the relationships between files and entities. Its code crawler is more advanced than SQUAAD's, as it covers more repositories. However, it does not mine the history (i.e., multiple commits). Its static analysis technique is limited to Java, as its metamodel is not language-agnostic. It does not build the source.
5.2.2 Replication
I assess the replicability of my study results in a recent publication [12] by extending the dataset used in my earlier papers [2,3,9] in terms of the number of subject systems and organizations.
Our first maintainability trends analysis [9] involves a total of 19,580 examined revisions from 38 Apache-family systems across a timespan from January 2002 through March 2017, comprising 586 MSLOC. In this analysis, to obtain software quality, we use three widely used open-source static analysis tools: FindBugs, PMD, and SonarQube. This dataset is used in two follow-up studies [2,3]. In a recent study [12], we extend this dataset to include 30 new subject systems from Google and Netflix, as well as revisions committed in 2017 and 2018 of the 38 Apache systems used in the first study. The extended dataset comprises more than 37k distinct software revisions and more than 1.5 billion lines of code. We replicate the analysis presented in the former study using the new dataset and extend it to reach the maximum commit compilability (98.4% for Apache, 99.0% for Google, and 94.3% for Netflix). We also analyze all compilable commits to understand the difference between affiliated and external developers of each organization in terms of their impact on different quality attributes.
The compilation and analysis of all revisions of the Google and Netflix systems are done by distributing the analysis over tens of AWS m5.large nodes (2 vCPUs, 8 GB memory). The analysis costs $102 and takes a total of 1884 hours across all nodes. Consequently, its replication using 20 cloud instances should take less than a week. Note that my approach to collecting data does not rely on an expensive on-premise infrastructure (e.g., an HPC cluster or a multi-processor server machine). As a result, any researcher with access to a public cloud can replicate the studies presented in this dissertation.
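For illustration, the sketch below shows one way such a distribution step could be organized; the worker host names and the remote analyze_revision.sh script are hypothetical placeholders, and real orchestration (as in SQUAAD) also handles provisioning, retries, and artifact collection.

```python
# Minimal sketch: partition distinct revisions across a fixed pool of worker
# instances and launch per-revision analysis remotely over SSH.
import subprocess
from itertools import cycle

WORKERS = [f"worker-{i}.example.internal" for i in range(20)]  # hypothetical hosts

def assign_round_robin(revisions, workers):
    """Return a mapping worker -> list of revision SHAs (simple round-robin)."""
    assignment = {w: [] for w in workers}
    for sha, worker in zip(revisions, cycle(workers)):
        assignment[worker].append(sha)
    return assignment

def dispatch(assignment, repo_url):
    """Start one remote batch job per worker (fire-and-forget, for illustration)."""
    for worker, shas in assignment.items():
        if shas:
            subprocess.Popen(["ssh", worker, "./analyze_revision.sh", repo_url, *shas])
```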
The detailed list of our subject systems, the impactfuls and their relationships, the compilability of each commit, and the quality metrics calculated for the compiled revisions are available in our replication package.⁸
⁸ https://figshare.com/s/7c3d3569932566649047
5.3 Threats to Validity
I discuss the threats to validity of the empirical study presented in this dissertation based on the guidelines by Wieringa [65].
External Validity. The main threats involve our subject systems: 1) We study a limited number of systems. However, we select 68 systems that vary along multiple dimensions, including size, number of commits and developers, owner, build tools used, and domain. 2) We study Java systems. However, our approach of focusing on a module is not limited to Java systems and can be applied to any evolving system hosted in Git. We have integrated tools capable of analyzing C++ into SQUAAD. 3) We study open-source systems. However, we select our systems from non-profit (Apache) and for-profit (Google and Netflix) organizations.
Another threat is that our algorithm to detect relationships between impactfuls is designed based on the Git branching model. Further research should be conducted to investigate whether our findings hold for other VCSs. Analyzing only a handful of software quality metrics is another threat. SQUAAD is capable of collecting more quality metrics, such as technical debt calculated by the SQALE method [31] (implemented in SonarQube). We have used those metrics in other studies.
Conclusion Validity. The main threat involves our manual inspection to confirm the uncompilability of commits. We inspect the output generated by the build tools to confirm the error for all brokens. The list of brokens is available in our replication package. Another threat involves the statistical analyses used to answer our research questions. We use multiple techniques to measure message similarity and to identify keywords that indicate a compile error. We also use the Games-Howell test, which is a well-established technique for comparing two populations with different sizes.
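As an illustration of this kind of comparison, the sketch below runs a Games-Howell test on synthetic per-commit metric deltas for two groups of unequal size; it assumes the pingouin Python package and hypothetical column names, and it is not the exact analysis pipeline used in the study.

```python
# Minimal sketch: Games-Howell comparison of per-commit quality-metric deltas
# between two unequal groups (e.g., affiliated vs. external developers).
# Assumes the pingouin package; data here is synthetic for illustration.
import numpy as np
import pandas as pd
import pingouin as pg

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "delta_code_smells": np.concatenate([rng.normal(0.0, 1.0, 400),    # "affiliated"
                                         rng.normal(0.3, 1.5, 120)]),  # "external"
    "group": ["affiliated"] * 400 + ["external"] * 120,
})

# Games-Howell does not assume equal variances or equal group sizes.
result = pg.pairwise_gameshowell(data=df, dv="delta_code_smells", between="group")
print(result)
```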
Internal Validity. Relying on static code analysis tools, which might have false positives and false negatives, to measure quality metrics is an internal threat to the validity of our results. However, we use three well-known static analysis tools that are widely used by the open-source community. Using multiple techniques designed to measure similar attributes (e.g., both UCC and SonarQube measure size) could mitigate this threat. Another threat involves depending on commit metadata that might be inaccurate or modified. We identify impactfuls that are older than their parent and consider this anomaly as a feature in our prediction model.
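For illustration, the following is a minimal sketch of a metadata-based uncompilability predictor of the kind referred to above, assuming scikit-learn; the features and the synthetic labels are placeholders rather than the dissertation's actual feature set or data.

```python
# Minimal sketch: predict whether a commit is uncompilable from commit metadata.
# Feature columns and labels below are synthetic and purely illustrative.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

rng = np.random.default_rng(0)
n = 2000
X = np.column_stack([
    rng.integers(0, 2, n),        # flag: commit timestamp older than its parent (anomaly)
    rng.exponential(3600.0, n),   # seconds between the commit and its parent
    rng.integers(1, 500, n),      # author's prior commit count
    rng.integers(1, 50, n),       # number of files changed
])
y = rng.integers(0, 2, n)         # 1 = uncompilable (synthetic labels for illustration)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = RandomForestClassifier(n_estimators=200, class_weight="balanced", random_state=0)
clf.fit(X_tr, y_tr)
print(classification_report(y_te, clf.predict(X_te)))
```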
Construct Validity. The main threat is that we do not study what compilation ratio we could achieve by analyzing the whole software (instead of the core module) to compare it with our results. However, in Section 5.1.2, we discuss one of the cases in which we achieve a 100% compilation ratio although there is a compile error in other artifacts. Also, in Section 3.1 we discuss related work that compiles the whole software and reaches a low compilation ratio.
5.4 Future Work
My approach to analyzing software quality evolution is not limited to studying the official releases; it takes a step further by analyzing the state of the software after each commit. One future direction is comparing the results of software evolution analysis at the commit level and at the release level to further highlight the benefits of my approach. For example, my analysis of the evolution history of 68 open-source software systems shows that, from one release to the next, software contains fewer lines of code, classes, code smells, and security vulnerabilities 8%, 4%, 14%, and 6% of the time, respectively. However, from one revision (produced by a commit) to the next, these ratios are 18% (↑), 3% (↓), 17% (↑), and 2% (↓). Analyzing the impact of each commit on software quality can reveal a wealth of information because commits carry fine-grained data on every stage of software evolution, such as the author, the time, and the intent (i.e., the commit message) of a change.
Another future direction is analyzing commit metadata to understand the relationships between development patterns and the impact on software quality. For example, two extensions [2,3] of commit impact analysis [9] study the impact of developers on technical debt in open-source software systems based on their level of involvement and the characteristics of their commits. We investigate whether there is any statistical difference in the amount of change to the technical debt that a commit imposes considering the seniority of the committer and the number of commits she has had by the time of the commit, the interval between the commit and its parent commit, and whether the committer is a core developer of the system based on her commit frequency. We are currently expanding the capabilities of SQUAAD in order to be able to predict different characteristics of a commit (e.g., committed by affiliated/external developers or authored on a weekday/weekend) based on its impact on software quality using deep learning techniques.
In our previous work [11,30], we conduct a large-scale empirical study of architectural change across different versions of a software system. Another future direction is extending that work and studying architectural evolution over commit history. In Chapter 3, I observe that uncompilability can happen when the software undergoes architectural change, as the developers may forget to include all modifications in their commit. I have already recovered the architecture of all revisions of the core module in 17 subject systems for which I achieve a 100% compilation ratio, using two architecture recovery techniques: ACDC and PKG. The next step is to analyze architectural change using architectural change metrics (i.e., a2a and c2c) to pinpoint when, why, and by whom architectural change happens.
In the analysis of broken commits, I frequently observe that uncompilability happens during specific maintenance tasks (e.g., code cleanup). I also frequently observe that uncompilability happens when the commit message does not correctly present what has changed. One future direction is studying commit messages and labeling commits based on a taxonomy of maintenance tasks, and then studying whether the software is more prone to uncompilability issues and other defects during some tasks in comparison with others. Towards this direction, we have developed a maintenance-task taxonomy and labeled 900 solid and broken commits based on it. We have also studied commit message consistency over brokens. Studying how commits impact software quality based on their purpose (i.e., maintenance task) and their message consistency helps developers incorporate better practices and reduces the amount of technical debt and the total cost of ownership.
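As an illustration of such labeling, the sketch below assigns coarse maintenance-task labels with simple keyword matching on the commit message; the categories and keyword lists are hypothetical and far cruder than the taxonomy developed in this work.

```python
# Minimal sketch: label commits with coarse maintenance-task categories via
# keyword matching. Categories and keyword lists are illustrative only.
import re

TASK_KEYWORDS = {
    "bug_fix":   [r"\bfix(e[sd])?\b", r"\bbug\b", r"\bpatch\b"],
    "cleanup":   [r"\bclean\s*up\b", r"\brefactor", r"\bremove unused\b"],
    "feature":   [r"\badd(ed|s)?\b", r"\bimplement", r"\bintroduce"],
    "build_dep": [r"\bdependenc", r"\bpom\.xml\b", r"\bbuild\.gradle\b", r"\bupgrade\b"],
}

def label_commit(message):
    """Return the set of task labels whose keywords occur in the message."""
    text = message.lower()
    labels = {task for task, patterns in TASK_KEYWORDS.items()
              if any(re.search(p, text) for p in patterns)}
    return labels or {"other"}

print(label_commit("Code cleanup: remove unused imports in core"))  # {'cleanup'}
print(label_commit("Fix NPE when config file is missing"))          # {'bug_fix'}
```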
Finally, applying the analysis presented in this dissertation to new subject systems, especially systems developed by government and for-profit organizations, is another future direction. Any organization can apply the analysis conducted by SQUAAD to improve its software and its software engineering, achieve customer satisfaction, and reduce the total cost of ownership. Managers can assess the quality of the organization's project types and divisions to understand which quality attributes are being achieved poorly or well. They can also understand which types of processes and project events correlate with which types of quality increase or decrease, and which types of personnel or projects contribute most to quality problems or excellence. Developers can continuously monitor the evolution of the software and evaluate their impact on software quality.
Several organizations (Mitre, Aerospace, SEI) are interested in applying and extending
SQUAAD to evaluate systems in their domains. I will be responding to a call for DoD
SERC proposals to expand the research.
Chapter 6
Conclusions
I successfully design, develop, and empirically evaluate an approach to achieve a high compilation ratio over commit history by focusing on a module instead of the whole software. I design methods to target a module and identify its distinct revisions and the evolutionary relationships between them. I explain how to reach the maximum compilation ratio and how to scale the analysis of thousands of revisions by distributing them over the cloud. I implement my approach in the SQUAAD framework and employ it to study the evolution of the core module in 68 industry-scale open-source software systems across Apache, Google, and Netflix.
I assess the compilability of every commit and analyze the broken sequences to understand how long the software is uncompilable and how many commits and developers are involved. I study whether it is possible to create a model to predict uncompilability based on commit metadata. I study each uncompilable sequence to understand why the software is uncompilable and provide a guideline that helps developers avoid committing broken code. I study the probability that a quality metric changes while another one does not, to show the importance of analyzing every commit from multiple perspectives. I analyze the compiled revisions to study how software quality evolves when the software is compilable and to obtain a ground truth. I evaluate the effectiveness of my approach in comparing two groups of commits (i.e., committed by affiliated/external developers) within an organization in terms of their impact on software quality. I study software quality evolution over uncompilable sequences and compare it against the ground truth obtained from the compilable commits to understand whether the software is more prone to issues when it is uncompilable.
One of the main conclusions of my dissertation is that targeting a module and analyzing its evolution enables achieving a high compilation ratio, provides a more complete and accurate analysis of software quality evolution, reduces the cost and complexity of analysis, and facilitates manual inspection. My analysis shows that when a commit breaks compilation, in most cases the same developer immediately resolves the problem in the next commit. However, there can be long periods over which multiple subsequent commits are uncompilable and multiple developers are actively committing code. It also shows that the metadata of the next commit can be used to create a model for predicting uncompilability.
Another broad conclusion of my dissertation is that analyzing every commit is critical, as even when the size does not change there is a chance for other quality attributes to change. My analysis shows that collecting software quality data on a large number of commits and minimizing missing data enables identifying statistically significant differences between two groups of commits/developers within the same organization in terms of their impact on software quality. It also shows that although the ratio of change in software size is not significantly different over compilable and uncompilable sequences, other quality attributes, such as complexity, maintainability, and security, change more frequently when compilability breaks. This further emphasizes that the presence of uncompilable code is a symptom of careless development, and that the software is more prone to error when developers commit broken code.
Achieving a high compilation ratio over commit history opens up multiple opportunities to extend the analysis presented in this dissertation in future work. My multi-perspective software quality evolution approach assesses different quality attributes such as software size, code quality, and security. It utilizes complex program analysis techniques (e.g., bytecode analysis using FindBugs) and COTS tools with complex environments (e.g., SonarQube). It enables analysis of the conflicts and synergies between different quality attributes and of the differences between developers in terms of their impact on software quality. My integrated tool-based approach has been documented in multiple research publications [2,3,4,9,11,12,13,30,35], empowering their empirical studies, and is used by a major government entity.
Tools such as SQUAAD can enable organizations to monitor their sources of technical
debt continuously and remove them quickly during development, rather than pay for them
with interest during maintenance. The analyses presented here can be applied by any
organization wishing to improve its software and its software engineering, by having its
projects gather and analyze the types of data presented here. At the organization level,
managers can determine which of its divisions and project types have better or worse
quality; which quality attributes are being achieved poorly or well; and how these correlate with customer satisfaction and total cost of ownership. At the department
or project level, managers can better understand which types of projects or personnel
contribute most to quality problems or excellence, and which types of project events
correlate with which types of quality increase or decrease.
Bibliography
[1] C. V. Alexandru, S. Panichella, and H. C. Gall. Reducing redundancies in multi-revision code analysis. In 2017 IEEE 24th International Conference on Software Analysis, Evolution and Reengineering (SANER), pages 148-159, Feb 2017.
[2] R. Alfayez, P. Behnamghader, K. Srisopha, and B. Boehm. How does contributors' involvement influence open source systems. In 2017 IEEE 28th Annual Software Technology Conference (STC), pages 1-8, Sept 2017.
[3] R. Alfayez, P. Behnamghader, K. Srisopha, and B. Boehm. An exploratory study on the influence of developers in technical debt. In Proceedings of the 2018 International Conference on Technical Debt, TechDebt '18, pages 1-10, New York, NY, USA, 2018. ACM.
[4] R. Alfayez, C. Chen, P. Behnamghader, K. Srisopha, and B. Boehm. An Empirical Study of Technical Debt in Open-Source Software Systems, pages 113-125. Springer International Publishing, Cham, 2018.
[5] G. Antoniol, G. Canfora, and A. D. Lucia. Maintaining traceability during object-oriented software evolution: a case study. In Software Maintenance, 1999 (ICSM '99) Proceedings, IEEE International Conference on, pages 211-219, 1999.
[6] T. Arbuckle. Measuring multi-language software evolution: A case study. In Proceedings of the 12th International Workshop on Principles of Software Evolution and the 7th Annual ERCIM Workshop on Software Evolution, IWPSE-EVOL '11, pages 91-95, New York, NY, USA, 2011. ACM.
[7] N. Ayewah, D. Hovemeyer, J. D. Morgenthaler, J. Penix, and W. Pugh. Using static analysis to find bugs. IEEE Software, 25(5):22-29, Sept 2008.
[8] S. Bajracharya, J. Ossher, and C. Lopes. Sourcerer: An infrastructure for large-scale collection and analysis of open-source code. Science of Computer Programming, 79:241-259, 2014. Experimental Software and Toolkits (EST 4): A special issue of the Workshop on Academic Software Development Tools and Techniques (WASDeTT-3 2010).
[9] P. Behnamghader, R. Alfayez, K. Srisopha, and B. Boehm. Towards better understanding of software quality evolution through commit-impact analysis. In 2017 IEEE International Conference on Software Quality, Reliability and Security (QRS), pages 251-262, July 2017.
[10] P. Behnamghader and B. Boehm. Towards better understanding of software maintainability evolution. In S. Adams, P. A. Beling, J. H. Lambert, W. T. Scherer, and C. H. Fleming, editors, Systems Engineering in Context, pages 593-603, Cham, 2019. Springer International Publishing.
[11] P. Behnamghader, D. M. Le, J. Garcia, D. Link, A. Shahbazian, and N. Medvidovic. A large-scale study of architectural evolution in open-source software systems. Empirical Software Engineering, 22(3):1146-1193, 2017.
[12] P. Behnamghader, P. Meemeng, I. Fostiropoulos, D. Huang, K. Srisopha, and B. Boehm. A scalable and efficient approach for compiling and analyzing commit history. In Proceedings of the 12th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, ESEM '18, pages 27:1-27:10, New York, NY, USA, 2018. ACM.
[13] B. Boehm and P. Behnamghader. Anticipatory development processes for reducing total ownership costs and schedules. Systems Engineering, 22(5):401-410, 2019.
[14] M. D'Ambros, H. Gall, M. Lanza, and M. Pinzger. Analysing Software Repositories to Understand Software Evolution, pages 37-67. Springer Berlin Heidelberg, Berlin, Heidelberg, 2008.
[15] M. D'Ambros and M. Lanza. Visual software evolution reconstruction. Journal of Software Maintenance and Evolution: Research and Practice, 21(3):217-232, 5 2009.
[16] T. Diamantopoulos, K. Thomopoulos, and A. Symeonidis. Qualboa: reusability-aware recommendations of source code components. In Proceedings of the 13th International Conference on Mining Software Repositories, pages 488-491. ACM, 2016.
[17] N. Dini, A. Sullivan, M. Gligoric, and G. Rothermel. The effect of test suite type on regression test selection. In 2016 IEEE 27th International Symposium on Software Reliability Engineering (ISSRE), pages 47-58, Oct 2016.
[18] R. Dyer, H. A. Nguyen, H. Rajan, and T. N. Nguyen. Boa: A language and infrastructure for analyzing ultra-large-scale software repositories. In Proceedings of the 2013 International Conference on Software Engineering, pages 422-431. IEEE Press, 2013.
[19] R. Dyer, H. A. Nguyen, H. Rajan, and T. N. Nguyen. Boa: Ultra-large-scale software repository and source-code mining. ACM Trans. Softw. Eng. Methodol., 25(1):7:1-7:34, Dec. 2015.
[20] H. Gall, M. Jazayeri, and C. Riva. Visualizing software release histories: the use of color and third dimension. In Software Maintenance, 1999 (ICSM '99) Proceedings, IEEE International Conference on, pages 99-108, 1999.
[21] A. Ganpati, A. Kalia, and H. Singh. A comparative study of maintainability index of open source software. Int. J. Emerg. Technol. Adv. Eng., 2(10):228-230, 2012.
[22] J. Garcia, I. Ivkovic, and N. Medvidovic. A comparative analysis of software architecture recovery techniques. In Automated Software Engineering (ASE), 2013 IEEE/ACM 28th International Conference on, pages 486-496. IEEE, 2013.
[23] M. Godfrey and Q. Tu. Growth, evolution, and structural change in open source software. In Proceedings of the 4th International Workshop on Principles of Software Evolution, IWPSE '01, pages 103-106, New York, NY, USA, 2001. ACM.
[24] W. H. Gomaa and A. A. Fahmy. A survey of text similarity approaches. International Journal of Computer Applications, 68:13-18, 2013.
[25] G. Gousios, B. Vasilescu, A. Serebrenik, and A. Zaidman. Lean ghtorrent: Github data on demand. In Proceedings of the 11th Working Conference on Mining Software Repositories, pages 384-387. ACM, 2014.
[26] F. Hassan, S. Mostafa, E. S. L. Lam, and X. Wang. Automatic building of java projects in software repositories: A study on feasibility and challenges. In 2017 ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM), pages 38-47, Nov 2017.
[27] F. Hassan and X. Wang. Change-aware build prediction model for stall avoidance in continuous integration. In 2017 ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM), pages 157-162, Nov 2017.
[28] H. He and E. A. Garcia. Learning from imbalanced data. IEEE Transactions on Knowledge and Data Engineering, 21:1263-1284, 2009.
[29] A. Kaur and D. Chopra. GCC-Git Change Classifier for Extraction and Classification of Changes in Software Systems, pages 259-267. Springer Singapore, Singapore, 2018.
[30] D. M. Le, P. Behnamghader, J. Garcia, D. Link, A. Shahbazian, and N. Medvidovic. An empirical study of architectural change in open-source software systems. In Proceedings of the 12th Working Conference on Mining Software Repositories, pages 235-245. IEEE Press, 2015.
[31] J. Letouzey and M. Ilkiewicz. Managing technical debt with the SQALE method. IEEE Software, 29(6):44-51, Nov 2012.
[32] D. Link, P. Behnamghader, R. Moazeni, and B. Boehm. Recover and relax: concern-oriented software architecture recovery for systems development and maintenance. In Proceedings of the International Conference on Software and System Processes, pages 64-73. IEEE Press, 2019.
[33] D. Link, P. Behnamghader, R. Moazeni, and B. Boehm. The value of software architecture recovery for maintenance. In Proceedings of the 12th Innovations on Software Engineering Conference (Formerly Known As India Software Engineering Conference), ISEC'19, pages 17:1-17:10, New York, NY, USA, 2019. ACM.
[34] C. Macho, S. McIntosh, and M. Pinzger. Automatically repairing dependency-related build breakage. In 2018 IEEE 25th International Conference on Software Analysis, Evolution and Reengineering (SANER), pages 106-117. IEEE, 2018.
[35] S. Mahajan, B. Li, P. Behnamghader, and W. G. Halfond. Using visual symptoms for debugging presentation failures in web applications. In Software Testing, Verification and Validation (ICST), 2016 IEEE International Conference on, pages 191-201. IEEE, 2016.
[36] Y. K. Malaiya, M. N. Li, J. M. Bieman, and R. Karcich. Software reliability growth with test coverage. IEEE Transactions on Reliability, 51(4):420-426, Dec 2002.
[37] B. Mexim and M. Kessentini. An introduction to modern software quality assurance. In Software Quality Assurance: In Large Scale and Complex Software-Intensive Systems, pages 19-46. Morgan Kaufmann, Waltham, 2015.
[38] A. Mockus, R. T. Fielding, and J. Herbsleb. A case study of open source software development: the apache server. In Proceedings of the 22nd International Conference on Software Engineering, pages 263-272. ACM, 2000.
[39] R. Novais, J. A. Santos, and M. Mendonça. Experimentally assessing the combination of multiple visualization strategies for software evolution analysis. J. Syst. Softw., 128(C):56-71, June 2017.
[40] G. Pinto, W. Torres, B. Fernandes, F. Castor, and R. S. Barros. A large-scale study on the usage of java's concurrent programming constructs. J. Syst. Softw., 106(C):59-81, Aug. 2015.
[41] D. Rozenberg, I. Beschastnikh, F. Kosmale, V. Poser, H. Becker, M. Palyart, and G. C. Murphy. Comparing repositories visually with repograms. In 2016 IEEE/ACM 13th Working Conference on Mining Software Repositories (MSR), pages 109-120, May 2016.
[42] D. Rozenberg, I. Beschastnikh, F. Kosmale, V. Poser, H. Becker, M. Palyart, and G. C. Murphy. Comparing repositories visually with repograms. In 2016 IEEE/ACM 13th Working Conference on Mining Software Repositories (MSR), pages 109-120, May 2016.
[43] S. Rufiange and G. Melançon. Animatrix: A matrix-based visualization of software evolution. In Software Visualization (VISSOFT), 2014 Second IEEE Working Conference on, pages 137-146. IEEE, 2014.
[44] I. Scholtes, P. Mavrodiev, and F. Schweitzer. From aristotle to ringelmann: a large-scale analysis of team productivity and coordination in open source software projects. Empirical Software Engineering, 21(2):642-683, 2016.
[45] H. Seo, C. Sadowski, S. Elbaum, E. Aftandilian, and R. Bowdidge. Programmers' build errors: A case study (at google). In Proceedings of the 36th International Conference on Software Engineering, ICSE 2014, pages 724-734, New York, NY, USA, 2014. ACM.
[46] A. Shahbazian, D. Nam, and N. Medvidovic. Toward predicting architectural significance of implementation issues. In Proceedings of the 15th International Conference on Mining Software Repositories, MSR '18, pages 215-219, New York, NY, USA, 2018. ACM.
[47] T. Siddiqui and A. Ahmad. Data mining tools and techniques for mining software repositories: A systematic review. In Big Data Analytics, pages 717-726. Springer, 2018.
[48] F. Z. Sokol, M. F. Aniche, and M. A. Gerosa. Metricminer: Supporting researchers in mining software repositories. In 2013 IEEE 13th International Working Conference on Source Code Analysis and Manipulation (SCAM), pages 142-146, Sept 2013.
[49] J. Spacco, D. Hovemeyer, and W. Pugh. Tracking defect warnings across versions. In Proceedings of the 2006 International Workshop on Mining Software Repositories, pages 133-136. ACM, 2006.
[50] K. Srisopha, P. Behnamghader, and B. Boehm. Do users talk about the software in my product? analyzing user reviews on iot products. In Proceedings of the 21st Ibero-American Conference on Software Engineering (CIBSE), Requirements Engineering (WER) Track, Bogotá, Colombia, 23-27 April 2018.
[51] K. Srisopha, B. W. Boehm, and P. Behnamghader. Do consumers talk about the software in my product? an exploratory study of iot products on amazon. CLEI Electron. J., 22(1), 2019.
[52] M. Sulír and J. Porubän. A quantitative study of java software buildability. In Proceedings of the 7th International Workshop on Evaluation and Usability of Programming Languages and Tools, PLATEAU 2016, pages 17-25, New York, NY, USA, 2016. ACM.
[53] N. M. Tiwari, G. Upadhyaya, H. A. Nguyen, and H. Rajan. Candoia: A platform for building and sharing mining software repositories tools as apps. In 2017 IEEE/ACM 14th International Conference on Mining Software Repositories (MSR), pages 53-63, May 2017.
[54] F. Trautsch, S. Herbold, P. Makedonski, and J. Grabowski. Adressing problems with external validity of repository mining studies through a smart data platform. In Proceedings of the 13th International Conference on Mining Software Repositories, MSR '16, pages 97-108, New York, NY, USA, 2016. ACM.
[55] F. Trautsch, S. Herbold, P. Makedonski, and J. Grabowski. Addressing problems with replicability and validity of repository mining studies through a smart data platform. Empirical Software Engineering, 23(2):1036-1083, Apr 2018.
[56] Q. Tu et al. Evolution in open source software: A case study. In Software Maintenance, 2000, Proceedings, International Conference on, pages 131-142. IEEE, 2000.
[57] M. Tufano, F. Palomba, G. Bavota, M. Di Penta, R. Oliveto, A. De Lucia, and D. Poshyvanyk. There and back again: Can you compile that snapshot? Journal of Software: Evolution and Process, 29(4):e1838, 2017.
[58] M. Tufano, F. Palomba, G. Bavota, R. Oliveto, M. Di Penta, A. De Lucia, and D. Poshyvanyk. When and why your code starts to smell bad. In Proceedings of the 37th International Conference on Software Engineering - Volume 1, ICSE '15, pages 403-414, Piscataway, NJ, USA, 2015. IEEE Press.
[59] M. Tufano, F. Palomba, G. Bavota, R. Oliveto, M. D. Penta, A. D. Lucia, and D. Poshyvanyk. When and why your code starts to smell bad (and whether the smells go away). IEEE Transactions on Software Engineering, 43(11):1063-1088, Nov 2017.
[60] V. Tzerpos and R. C. Holt. Acdc: An algorithm for comprehension-driven clustering. In Proceedings of the Seventh Working Conference on Reverse Engineering (WCRE '00), pages 258-, Washington, DC, USA, 2000. IEEE Computer Society.
[61] G. Upadhyaya and H. Rajan. On accelerating ultra-large-scale mining. In 2017 IEEE/ACM 39th International Conference on Software Engineering: New Ideas and Emerging Technologies Results Track (ICSE-NIER), pages 39-42, May 2017.
[62] O. Vandecruys, D. Martens, B. Baesens, C. Mues, M. D. Backer, and R. Haesen. Mining software repositories for comprehensible software fault prediction models. Journal of Systems and Software, 81(5):823-839, 2008. Software Process and Product Measurement.
[63] M. Vasic, Z. Parvez, A. Milicevic, and M. Gligoric. File-level vs. module-level regression test selection for .net. In Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering, ESEC/FSE 2017, pages 848-853, New York, NY, USA, 2017. ACM.
[64] B. Vasilescu, A. Serebrenik, and V. Filkov. A data set for social diversity studies of github teams. In Proceedings of the 12th Working Conference on Mining Software Repositories, pages 514-517. IEEE Press, 2015.
[65] R. Wieringa. Design science methodology for information systems and software engineering. Springer, 2014. 10.1007/978-3-662-43839-8.
[66] L. Xavier, A. Brito, A. Hora, and M. T. Valente. Historical and impact analysis of api breaking changes: A large-scale study. In 2017 IEEE 24th International Conference on Software Analysis, Evolution and Reengineering (SANER), pages 138-147, Feb 2017.