A Search-Based Approach for Technical Debt Prioritization
by
Reem Abdulaziz I Alfayez
A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(COMPUTER SCIENCE)
May 2021
Copyright 2021 Reem Abdulaziz I Alfayez
Dedication
To my parents, Abdulaziz and Dalal.
Acknowledgments
As a Ph.D. student at the University of Southern California (USC), my journey has been one of the most
enriching experiences of my life. Throughout my Ph.D. study, I have been fortunate enough to interact
with people whose support and encouragement aided in the completion of this dissertation.
First and foremost, I would like to thank my advisor, Professor Barry Boehm, for his support, encour-
agement, and confidence in me. His immense knowledge and work ethic inspired me during every step
of the Ph.D. journey. I would also like to thank my dissertation committee members, Professor Aiichiro
Nakano and Professor Paul Adler, for their invaluable advice and encouragement.
Besides my advisor and committee members, I would like to thank Professor Jeffrey Miller for his
support throughout my years at USC. I would also like to express my gratitude to Ms. Julie Sanchez for
facilitating every aspect of my Ph.D. study. I am also thankful to my co-authors Elaine Venson, Robert
Winn, and Wesam Alwehaibi for their numerous contributions to this dissertation. It has been a great
pleasure and privilege to work with all of them. I would also like to express my appreciation to my
labmates, and I am grateful for all the experiences we shared during my time at USC. Last but not least, I
would like to thank my family and friends for their endless love, support, and encouragement.
Table of Contents
Dedication ii
Acknowledgments iii
List of Tables vii
List of Figures viii
Abstract ix
Chapter 1: Introduction 1
1.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Major Challenges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.3 Insights and Hypothesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3.1 Insights . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3.2 Hypothesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.4 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
Chapter 2: Dissertation Overview 7
2.1 Systematic Literature Review (SLR) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.2 Investigative Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.3 Prioritization Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
Chapter 3: Background 10
3.1 Technical Debt (TD) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
3.1.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
3.1.2 Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
3.1.3 Management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
3.2 Static Code Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
3.3 Prioritization Techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
3.3.1 Weighted sum model (WSM) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
3.3.2 Analytic hierarchy process (AHP) . . . . . . . . . . . . . . . . . . . . . . . . . . 16
3.3.3 Cost-benefit analysis (CBA) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
3.3.4 Modern portfolio theory (MPT) . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
3.3.5 Real options analysis (ROA) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
3.4 The Knapsack Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
3.5 Multi-objective Optimization (MOO) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
3.5.1 Pareto optimality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
3.6 Search-Based Techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
3.6.1 Genetic algorithm (GA) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
3.6.2 Non-dominated sorting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3.6.3 Crowding distance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3.6.4 Non-dominated sorting genetic algorithm-II (NSGA-II) . . . . . . . . . . . . . . 22
Chapter 4: A Systematic Literature Review of Technical Debt Prioritization 24
4.1 Background and Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
4.2 Research Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
4.2.1 Research questions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
4.2.2 Search strategy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
4.2.3 Study selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
4.2.4 Data extraction and synthesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
4.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
4.4 Threats to Validity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
4.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
Chapter 5: Understanding How Software Practitioners Prioritize Technical Debt Under a Re-
source Constraint: An Investigative Study 52
5.1 Background and Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
5.2 Study Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
5.2.1 Research questions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
5.2.2 Subjects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
5.2.3 Experiment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
5.3 Data Analysis and Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
5.3.1 Data analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
5.3.2 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
5.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
5.5 Threats to Validity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
5.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
Chapter 6: A Search-Based Approach for Technical Debt Prioritization 78
6.1 Problem Formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
6.1.1 TD prioritization problem model . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
6.1.2 Multi-objective TD prioritization problem formulation . . . . . . . . . . . . . . . 79
6.2 Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
6.2.1 Representation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
6.2.2 Initial population . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
6.2.3 Evaluation function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
6.2.4 Genetic operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
6.2.5 Constraint handling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
6.2.6 Termination . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
6.3 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
6.3.1 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
6.3.2 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
6.3.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
6.4 Threats to Validity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
6.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
Chapter 7: Related Work 102
7.1 Secondary Studies on TD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
7.2 Empirical Studies on TD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
7.3 Search-Based Prioritization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
Chapter 8: Conclusion and Future Directions 113
8.1 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
8.2 Future Directions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
References 117
List of Tables
4.1 Selected studies and their respective reference numbers . . . . . . . . . . . . . . . . . . . 30
4.2 Technical debt (TD) venues and their corresponding study numbers . . . . . . . . . . . . 30
4.3 Summary of TD prioritization approaches . . . . . . . . . . . . . . . . . . . . . . . . . . 31
4.4 TD type definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
4.5 TD types addressed, decision factor categories considered, value estimation methods, and
cost estimation methods for each approach . . . . . . . . . . . . . . . . . . . . . . . . . . 39
4.6 Evaluation method definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
4.7 Software artifact dependencies, human involvement level, and evaluation method for each
approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
4.8 TD prioritization techniques, limitations, and corresponding studies . . . . . . . . . . . . 48
5.1 Participating software practitioners . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
5.2 Summary of software systems utilized in the investigative study . . . . . . . . . . . . . . 59
5.3 Value formulas utilized by software practitioners . . . . . . . . . . . . . . . . . . . . . . 68
5.4 TD prioritization approaches utilized by participating software practitioners . . . . . . . . 69
6.1 Experiment one system details . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
6.2 Software practitioners included in experiment two . . . . . . . . . . . . . . . . . . . . . . 92
6.3 Experiment one results for effectiveness of TDPrioritizer at improving repayment value
over random search . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
6.4 Summary of the approach and participating software practitioners prioritization results . . 96
List of Figures
2.1 Dissertation overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
3.1 Fowler’s TD quadrant . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
3.2 Technical debt management (TDM) activities . . . . . . . . . . . . . . . . . . . . . . . . 13
3.3 SonarQube dashboard . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
3.4 Pareto-front for repayment solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
3.5 Non-dominated sorting genetic algorithm-II (NSGA-II) steps . . . . . . . . . . . . . . . . 23
4.1 Selection procedure summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
4.2 Breakdown of studies based on publication year . . . . . . . . . . . . . . . . . . . . . . . 31
4.3 Breakdown of approaches based on TD type addressed . . . . . . . . . . . . . . . . . . . 36
4.4 Breakdown of approaches based on decision factor categories utilized . . . . . . . . . . . 37
4.5 Breakdown of approaches based on TD value estimation . . . . . . . . . . . . . . . . . . 37
4.6 Breakdown of approaches based on TD value estimation method . . . . . . . . . . . . . . 37
4.7 Breakdown of approaches based on TD cost estimation . . . . . . . . . . . . . . . . . . . 38
4.8 Breakdown of approaches based on TD cost estimation method . . . . . . . . . . . . . . . 38
4.9 Breakdown of software artifacts based on utilization of approaches . . . . . . . . . . . . . 40
4.10 Breakdown of approaches based on required human involvement level . . . . . . . . . . . 42
4.11 Breakdown of approaches based on evaluation method . . . . . . . . . . . . . . . . . . . 43
4.12 Breakdown of TD prioritization approaches based on prioritization techniques utilized . . 45
5.1 Breakdown of participants based on role . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
5.2 Breakdown of participants based on industry experience and affiliation type . . . . . . . . 59
5.3 Experiment phases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
5.4 Breakdown of valuation parameters based on their utilization by participants . . . . . . . 64
5.5 Breakdown of attributes selected to evaluate files based on their utilization by participants 66
5.6 Breakdown of TD prioritization approaches’ patterns utilized by participants . . . . . . . . 70
6.1 Approach overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
Abstract
Technical debt (TD) is a metaphor used to account for the added software system effort or cost resulting
from taking early software project shortcuts. This acquired debt accumulates interest and becomes more
challenging to repay over time. When not managed, TD can cause significant long-term issues, such as
high maintenance costs and eventual system failures. TD prioritization is the process of deciding which TD
items are to be repaid first and which items are to be delayed until further releases. With limited resources
at their disposal, individuals may struggle to decide which TD items should be repaid to achieve the highest
possible value, as there is typically a trade-off between the value of a TD item and its cost. Though the soft-
ware engineering community has developed several TD prioritization approaches, researchers have noted
several limitations in the existing approaches and have called for developing new, improved approaches.
The focus of this dissertation is TD prioritization. A systematic literature review (SLR) was first
conducted to identify and analyze the existing TD prioritization approaches. The SLR revealed a scarcity
of identified approaches that account for value, cost, and a resource constraint, in addition to a lack of
industry evaluations. Moreover, an investigative study was conducted with 89 software practitioners to gain
a better understanding of how practitioners prioritize TD in the presence of a resource constraint.
The study revealed three unique patterns: most of the participants balanced the trade-off between value and
cost, a smaller number of participants repaid higher value TD items first, and a single participant prioritized
lower cost TD items.
Aiming to address the limitations identified in the existing TD prioritization approaches, this disserta-
tion models the TD prioritization problem as a multi-objective optimization (MOO) problem and develops
a search-based TD prioritization approach that is capable of handling a resource constraint. The output of
the approach encompasses which TD items should be repaid to maximize the value of a given repayment
activity while minimizing its cost and satisfying its resource constraint. In its evaluation, the approach was
compared to random search using 40 open-source software (OSS) systems, and the approach surpassed
random search in terms of the best obtained solution’s value in all cases. The approach was also compared
to 66 software practitioners’ prioritized solutions and was able to obtain values similar to or greater than
the values obtained by the practitioners. Moreover, the approach required only an average running time of
3 minutes to generate the prioritized solution set.
Chapter 1
Introduction
Technical debt (TD) is a metaphor coined by agile software pioneer Ward Cunningham to account for the
added software system effort or cost resulting from taking early software project shortcuts [73]. The TD
metaphor reflects that such debt accumulates interest: the later it is paid, the more it costs. TD has become
a prevalent issue worldwide with associated costs of billions of dollars each year, as evidenced by
multiple studies [2, 3, 40, 43, 66, 74, 83, 85, 104, 110, 122, 131, 136, 174, 183, 187, 203, 212, 238].
A study conducted by CAST Software, in which researchers measured the TD of 700 software systems that were submitted by 158 organizations in the United States, Europe, and India and cumulatively contained 550 million lines of code (LOC), revealed that, on average, each LOC carries a TD amount of $3.61 [74]. Furthermore, another study, which investigated how TD built up over time in 66 Apache Java open-source
software (OSS) systems, found that there is a monotonic upward trend of TD over time in most of the
investigated systems [85].
While TD does not necessarily have a direct impact on the external behavior of a software system, the
accumulation of TD in a given software system can lead to a significant drop of overall system quality,
increased maintenance costs, and eventual system decay [66,122,174,183,212,238]. Large amounts of TD
may cause the inability to incorporate new features without compromising existing ones [66,122,136,174,
183, 238]. Eventually, the developers of a software system may not be able to make changes to the system
without the introduction, on average, of at least one bug [183]. Furthermore, neglecting TD can result in
doubling or nearly tripling the initial repayment cost of a given TD item [110]. The findings above are
concerning, as roughly 75% of software lifecycle costs are spent on maintenance activities [83]. Moreover,
in 2016, it was found that software defects had an associated cost of $1.1 trillion worldwide and affected
4.4 billion people [9].
Accumulating large amounts of TD is not only detrimental to the system itself, but can also have rippling, negative effects within a company. For instance, accumulated TD can have a negative
impact on the morale and productivity of software teams [40,43,87,104,131,136,174,187,203]. In 2016,
Stack Overflow conducted a survey, in which 56,033 developers from 173 different countries participated.
The survey found that 29.6% of the participants considered a fragile code base as their primary challenge
in the workplace [3]. Additionally, a report published jointly by Evans Data Corp, CIA Factbook, and
Stripe revealed that an estimated $85 billion in opportunity cost is lost annually worldwide as a result of developers’ time spent on “bad code”. The report found that the average developer spends more
than 13 hours a week repaying TD. Furthermore, nearly 80% of participants stated that paying down TD
negatively impacts their morale, and over half of the participants believe that TD hinders their productivity.
Lastly, about two-thirds of the participants called for a clear TD prioritization to improve their productivity
in the workplace [2].
1.1 Definition
TD prioritization is the process of deciding which TD items are to be repaid in a given repayment activity
and which TD items are to be delayed until later releases based on specific, predefined rules to support the
decision [141]. Typically, the objective is to maximize the value of a repayment activity while minimizing
its cost and satisfying its resource constraint [111].
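The full formulation of this objective is given in Section 6.1.2. As an illustrative sketch only (the symbols below are introduced here for exposition and are not the dissertation’s own notation), prioritizing n identified TD items under a repayment budget B can be written as a constrained bi-objective problem:

\[
\max_{x \in \{0,1\}^{n}} V(x) = \sum_{i=1}^{n} v_i x_i, \qquad
\min_{x \in \{0,1\}^{n}} C(x) = \sum_{i=1}^{n} c_i x_i, \qquad
\text{subject to } \sum_{i=1}^{n} c_i x_i \leq B,
\]

where \(x_i = 1\) if TD item \(i\) is selected for repayment, and \(v_i\) and \(c_i\) denote the item’s estimated value and cost.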
1.2 Major Challenges
TD prioritization poses several challenges for even competent and prudent individuals. In particular, both
software and business teams continually struggle to identify which TD items should be repaid [111, 185].
A naive approach would be attempting to repay all TD items existing in a system. However, this may
not be feasible, as typically there are limited allocated resources available for TD repayment activities
[29, 45, 56, 91, 95, 111, 237].
The scarcity of allocated resources for TD repayment stems from the innate nature of TD. Repaying
TD does not necessarily have a direct impact on the external behavior of a system; its effect is limited to the system’s internal quality. This poses a challenge in recognizing and acknowledging the benefits of
TD repayment [56, 95, 101]. As a result, both software and business teams prefer to spend available
resources on implementing new features or fixing bugs [111]. Therefore, generally, TD repayment is a
highly constrained activity with limited resources available [29, 45, 56, 91, 95, 111, 237].
Another challenge with TD prioritization is that with limited resources at their disposal, individuals
may struggle to decide which TD items should be repaid to achieve the highest possible value, as there
is typically a trade-off between the value obtained from repaying a given TD item and its associated
cost [111]. The value and cost of a given TD item tend to be conflicting objectives, as improvements in
one generally have adverse consequences for the other. Finding an ideal middle spot that balances the
trade-off between value and cost is a hard feat, especially when there is a tremendous number of TD items
to be repaid [168]. The TD prioritization problem grows exponentially with the number of identified TD
items. The decision space is enormous, as each TD item has two options: inclusion or exclusion from a
repayment activity. Due to the exponential nature of the TD prioritization problem, finding an optimal solution that maximizes value, minimizes cost, and satisfies a resource constraint is extremely difficult for large instances and is considered an NP-hard problem [128, 168].
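To make this growth concrete: with n identified TD items there are \(2^{n}\) candidate repayment subsets, so a system with only 50 items already admits \(2^{50} \approx 1.1 \times 10^{15}\) possibilities, ruling out exhaustive enumeration for realistic systems that contain thousands of items.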
1.3 Insights and Hypothesis
This section presents the key insights underlying the research and the hypothesis that this dissertation
revolves around.
1.3.1 Insights
Insight 1: The TD prioritization problem is a form of a multi-objective optimization
(MOO) problem
The first key insight is that the TD prioritization problem is a form of a MOO problem. In this type
of problem, more than one objective is sought to be optimized, and these objectives
may be conflicting [68, 169], which means that an improvement in one objective’s value might degrade
another objective’s value. As a result, in MOO type problems, there is no one optimal solution. On
the contrary, there is a set of potential optimal solutions, with varying degrees of trade-offs between the
objectives [68, 169]. The set of optimal solutions assists decision makers in the decision process, as it
provides them with an enumeration of the trade-offs between objectives. Solving this form of optimization
problems is an active area of study and research, and there exist many algorithms and techniques to
address it [68, 169]. Consequently, one can formulate the TD prioritization problem as a MOO problem
and apply one of the existing multi-objective evolutionary algorithms (MOEAs) to solve it.
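For reference, the standard notion underlying this set of solutions, elaborated in Section 3.5.1, is Pareto dominance. With \(V\) denoting the total value of a repayment solution (to be maximized) and \(C\) its total cost (to be minimized), a feasible solution \(x\) dominates a feasible solution \(y\) when

\[
V(x) \geq V(y) \;\wedge\; C(x) \leq C(y) \;\wedge\; \big(V(x) > V(y) \;\vee\; C(x) < C(y)\big),
\]

and the feasible solutions that no other feasible solution dominates form the Pareto-optimal set presented to decision makers.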
Insight 2: The TD prioritization problem is a form of a constrained, combinatorial-
optimization problem
The second key insight is that the TD prioritization problem can be considered a form of the canonical
restricted 0-1 knapsack problem, which is known to be NP-hard in its optimization form [116, 168]. Both
the TD prioritization problem and the knapsack problem have a constraint that needs to be satisfied and
objectives that are sought to be achieved and balanced. Despite the restricted 0-1 knapsack problem’s
challenges and complexity, researchers have applied many algorithms and constraint-handling techniques
to address it [116, 168]. One can consider utilizing some of these techniques that were previously applied
to address the 0-1 knapsack problem when designing a TD prioritization approach that satisfies a resource
constraint and aims to balance the trade-off between value and cost. It is important to note that the exist-
ing techniques only find near-optimal solutions, and there is no technique that claims to find an optimal
solution for large instances of the problem [128, 156, 168].
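For comparison, the canonical restricted 0-1 knapsack problem, stated here in its standard textbook form, is

\[
\max \sum_{i=1}^{n} p_i x_i \quad \text{subject to} \quad \sum_{i=1}^{n} w_i x_i \leq W, \quad x_i \in \{0,1\},
\]

where \(p_i\) and \(w_i\) are the profit and weight of item \(i\) and \(W\) is the knapsack capacity. Under the analogy above, TD value plays the role of profit, TD cost the role of weight, and the repayment budget the role of capacity.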
Insight 3: A near-optimal solution might be sufficient
Aiming to find an optimal solution to the TD prioritization problem may be an over-ambitious goal. It
is well-known that it is very difficult to find an optimal solution for NP-hard problems, especially those
of significant size [128, 156, 168]. The challenge lies not only in finding an optimal solution itself, but also in verifying that an obtained solution is indeed optimal [128, 156, 168]. No study in the
current literature that addresses NP-hard problems has claimed to have found an optimal solution, as all of
the identified solutions are instead near-optimal [21, 62, 138, 156, 157]. However, a near-optimal solution
may be sufficient if it satisfies the needs and objectives of the individuals who are aiming to solve
the given problem. In the context of TD prioritization, a near-optimal solution will suffice if the given
solution outperforms the competitive solution at hand. The competitive solution varies depending on the
objective of the evaluation. The evaluation of this dissertation’s approach and the goals of said evaluation
are described in more detail in Section 6.3.
1.3.2 Hypothesis
Based on the three insights described above, the hypothesis statement of this dissertation is as follows:
Search-based optimization techniques can prioritize TD items in the presence of a resource constraint with
high effectiveness.
To evaluate the hypothesis, the TD prioritization problem was modeled as a MOO problem. Subsequently,
a TD prioritization approach that maximizes TD value, minimizes TD cost, and satisfies a resource
constraint was developed and implemented. The approach utilizes a search-based optimization technique.
Specifically, the approach incorporates the non-dominated sorting genetic algorithm-II (NSGA-II), which
is a search-based MOEA, and a greedy-repair, constraint-handling technique.
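As a rough sketch of what a greedy-repair, constraint-handling step can look like (a common variant shown here for illustration only, not necessarily the exact operator detailed in Section 6.2.5; the TdItem type and its fields are hypothetical), an infeasible selection can be repaired by dropping the least profitable items until the budget is met:

    import java.util.Comparator;
    import java.util.List;

    // Hypothetical TD item with an estimated repayment value and cost (cost > 0 assumed).
    record TdItem(String id, double value, double cost) {}

    final class GreedyRepair {
        // Removes the items with the lowest value-to-cost ratio until the total
        // cost of the selection fits within the available budget.
        static void repair(List<TdItem> selected, double budget) {
            selected.sort(Comparator.comparingDouble(item -> item.value() / item.cost()));
            double totalCost = selected.stream().mapToDouble(TdItem::cost).sum();
            while (totalCost > budget && !selected.isEmpty()) {
                totalCost -= selected.remove(0).cost(); // drop the least profitable item first
            }
        }
    }

Within NSGA-II, such a repair can be applied to each offspring after crossover and mutation so that only budget-feasible selections are evaluated.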
The effectiveness of the approach was demonstrated through the means of two experiments. In the first
experiment, the approach outperformed random search on 40 OSS
systems. In the second experiment, the performance of the approach was compared to the performance
of 66 software practitioners. The approach was able to obtain values similar to or greater than the values
obtained by the software practitioners. Moreover, the approach was also able to produce prioritization
solutions within an average running time of 3 minutes. Lastly, the approach gained positive feedback from industry
software practitioners. The results of the evaluation confirmed the hypothesis of this dissertation and
illustrated that search-based techniques can be effective when prioritizing TD items in the presence of a
resource constraint.
1.4 Contributions
The contributions of this dissertation include a systematic literature review (SLR) that provides an in-
depth analysis of the currently existing TD prioritization approaches, an investigative study that explores
and identifies how software practitioners prioritize TD in the presence of a resource constraint, the design
and development of a TD prioritization approach that utilizes a search-based optimization technique, and
an evaluation of said approach to assess its effectiveness and efficiency.
1. A systematic literature review (SLR) of TD prioritization approaches: The SLR identified 24
unique prioritization approaches that utilize 10 different prioritization techniques. The SLR revealed
a scarcity of TD prioritization approaches that can be applied to any TD type while simultaneously
considering value, cost, and a resource constraint. Additionally, the SLR demonstrated the lack of
evidence regarding the effectiveness and applicability of a few of the identified TD prioritization
approaches in industry settings. The SLR is presented in Chapter 4.
2. An investigative study of TD prioritization approaches utilized by software practitioners under
a resource constraint: The investigative study explored how 89 software practitioners prioritize
TD items in the presence of a resource constraint. The study revealed three unique prioritization
patterns. The majority of participants aimed to balance the trade-off between value and cost in their
prioritization approaches. A smaller number of participants prioritized TD items based solely on
value by selecting TD items that have a higher value first. One participant prioritized TD items
solely based on cost, prioritizing lower cost TD items over others. Furthermore, the study confirmed
that value is subjective and lies in the eye of the beholder, and that value and cost might be context-dependent.
The investigative study is detailed in Chapter 5.
3. A TD prioritization approach: Initially, the TD prioritization problem was formulated as a MOO
problem. Subsequently, a novel, search-based TD prioritization approach was designed and imple-
mented. The approach incorporates the NSGA-II, which is a MOEA that is search-based, and it
handles a resource constraint through a greedy-repair, constraint-handling technique. The approach
selects TD items for repayment that will maximize the value of a repayment activity while minimiz-
ing its cost and satisfying its resource constraint. The approach was evaluated through the means
of two experiments to assess its effectiveness and efficiency. In the first experiment, the approach
performance was compared to random search using 40 OSS systems, and the approach surpassed
random search in all the cases. The second experiment compared the performance of the approach to
the performance of 66 software practitioners, and the approach was able to obtain solutions’ values
similar to or greater than the ones obtained by the practitioners. Moreover, the approach required
only an average running time of 3 minutes to generate the prioritized solution set. The approach and
its evaluation were presented to software practitioners, and it gained an overall positive feedback.
The approach and its evaluation are described in more detail in Chapter 6.
Chapter 2
Dissertation Overview
This chapter provides an overview of this dissertation. The goal of this dissertation is to develop an
approach for technical debt (TD) prioritization that maximizes value and minimizes cost in the presence
of a resource constraint while using a search-based optimization technique. To attain its primary goal, this
dissertation is divided into three phases: (1) a systematic literature review (SLR) that analyzes the current
TD prioritization approaches (Chapter 4), (2) an investigative study that explores and identifies patterns in
TD prioritization approaches utilized by software practitioners under a resource constraint (Chapter 5), (3)
and a prioritization approach that is novel and search-based (Chapter 6). Figure 2.1 summarizes the aims
and findings of each phase. The following sections provide a brief description of each phase and discuss
how each phase guided the subsequent one.
Systematic Literature Review (SLR)
Objective: a synthesized reference of the current TD prioritization approaches.
Method: conducting an SLR.
Results:
• Identifying and summarizing 24 unique prioritization approaches, including: TD type addressed, decision factors considered, software artifacts required, type of human involvement level needed, and type of evaluation employed
• A scarcity of approaches that consider value, cost, and a resource constraint while being applicable to any TD type
• Lack of evaluation and evidence of approaches’ applicability in industry settings
• Confirming that a TD item’s value and cost are context-dependent

Investigative Study
Objective: identifying and exploring patterns in TD prioritization approaches utilized by software practitioners under a resource constraint.
Method: a controlled experiment and a questionnaire with 89 software practitioners.
Results:
• Identifying three unique TD prioritization approaches’ patterns based on decision factor categories: balancing value and cost (66 participants), value only (22 participants), and cost only (1 participant)
• Confirming that a TD item’s value and cost are context-dependent and subjective

Prioritization Approach
Objective: designing and developing a TD prioritization approach that maximizes the value of a repayment activity, minimizes its cost, and satisfies its resource constraint.
Method:
• Formulating the TD prioritization problem as a multi-objective optimization (MOO) problem
• Applying the non-dominated sorting genetic algorithm-II (NSGA-II), a multi-objective evolutionary algorithm (MOEA) that is search-based
• Applying a greedy-repair, constraint-handling technique
Results:
• The approach acquired solutions’ values greater than the solutions’ values obtained by random search in an evaluation using 40 open-source software (OSS) systems
• The approach achieved repayment solutions’ values similar to or greater than the solutions’ values obtained by 66 software practitioners
• The average running time of the approach is 3 minutes
Figure 2.1: Dissertation overview
2.1 Systematic Literature Review (SLR)
The systematic mapping review (SMR) by Li et al. [141] and the SLR by Ampatzoglou et al. [29] have
pointed out a lack of researched TD prioritization approaches. However, neither of the two studies provided
an in-depth analysis on the currently researched TD prioritization approaches. The study by Li et al. [141]
addresses TD in general without a specific focus on TD prioritization, and the study by Ampatzoglou et
al. [29] studies TD only from a financial point of view. The lack of a study that analyzes and synthesizes
the currently researched TD prioritization approaches motivated the conducting of an SLR. The benefits
of having such a synthesized reference are two-fold. The reference can aid individuals when designing and
developing new TD prioritization approaches. Moreover, it can aid potential TD prioritization approach
users when aiming to find the available approaches and deciding which approach is the most suitable to
their needs. The SLR identifies 24 unique TD prioritization approaches and summarizes the steps of each
approach. Subsequently, the SLR identifies the TD types addressed by each approach, the decision factor
categories considered by each approach, the required software artifacts that each approach depends on,
the level of human involvement required by each approach, and how each approach was evaluated. The
SLR also identifies the prioritization techniques that each approach is built upon and summarizes a few
of the known limitations associated with these techniques. The findings of the SLR include: variations
in the decision factor categories considered by each approach; a scarcity of general (i.e., can be applied
to any TD type) TD prioritization approaches that consider value, cost, and a resource constraint; lack of
evidence concerning the applicability of some of the approaches in industry settings; and that value and
cost are context-dependent.
2.2 Investigative Study
The SLR revealed that not all TD prioritization approaches consider the same decision factor categories. A
few of the approaches solely consider value while others aim for a balance between the value and cost of a
given TD item. Additionally, most of the approaches are designed without any consideration of a resource
constraint, which conflicts with real-world scenarios, as TD repayment is typically
highly constrained [29,45,56,91,95,111,237]. Lastly, most of the approaches are assumed to be effective
without any form of industry evaluation. The lack of an industry evaluation diminishes the credibility of
an approach and signifies a lack of understanding of how prospective users perceive said approach.
The findings mentioned above inspired and motivated the conducting of an investigative study that
explores how software practitioners prioritize TD in the presence of a resource constraint. The study
consists of a controlled experiment in which 89 software practitioners were requested to select TD items to
include for repayment under a resource constraint. The study revealed three unique prioritization patterns
in the presence of a resource constraint. The majority of the participants aimed to balance the trade-off
between the value and cost of TD items in their prioritization. A smaller number of participants only
considered value and repaid higher value TD items first, and only one participant prioritized lower cost TD
items. The study also revealed that the value and cost of TD items are subjective to the individual and the
situation at hand.
2.3 Prioritization Approach
The investigative study demonstrated that a few software practitioners rely solely on value when prioritiz-
ing TD items, and that one participant prioritized TD items based solely on cost. Applying one of these
approaches only requires sorting the TD items based on the selected decision factor category. However,
the challenge with such approaches lies in assigning value or cost to the various TD items. Valuing TD items and estimating their cost are not trivial tasks, as both vary based on the situation at hand and the
perspective of the software practitioner. TD valuation and cost estimation are both parts of TD measure-
ment, which is a step in technical debt management (TDM) that is separate from TD prioritization. It is
important to note that the focus of this dissertation is TD prioritization. Therefore, the valuation and cost
estimation of TD are out of this dissertation’s scope.
The investigative study also revealed that the majority of the participants aim to achieve the high-
est possible overall value of a repayment activity by balancing the trade-off between the value and cost
of TD items. However, applying a balanced approach is complicated due to the challenges described
in Section 1.2. To address the needs of this set of practitioners, this dissertation’s work phase aims to
design and develop a search-based TD prioritization approach. The approach maximizes the value of a
repayment activity while minimizing its cost and satisfying its resource constraint. The approach utilizes
the non-dominated sorting genetic algorithm-II (NSGA-II), which is a multi-objective evolutionary algo-
rithm (MOEA) that is search-based, and a greedy-repair, constraint-handling technique [81, 168]. The
approach was implemented in Java as a prototype tool and is named: TDPrioritizer. The approach was
evaluated using two experiments. The first experiment was conducted with 40 open-source software (OSS)
systems to compare the performance of the approach against that of random search. The approach outper-
formed random search in all the cases. The second experiment was conducted to compare the performance
of the approach against the performance of 66 software practitioners. The approach was able to achieve re-
payment solutions’ values similar to or greater than those obtained by software practitioners. The average
running time of the approach for both experiments was 3 minutes.
Chapter 3
Background
This chapter provides necessary background information that is utilized throughout this dissertation. Sec-
tion 3.1 summarizes the fundamentals of technical debt (TD). Section 3.2 provides an overview of static
code analysis. Section 3.3 presents some techniques that are utilized in TD prioritization. Section 3.4
demonstrates the knapsack problem. Section 3.5 summarizes multi-objective optimization (MOO), and
Section 3.6 outlines the basics of search-based techniques.
3.1 Technical Debt (TD)
3.1.1 Definition
TD is a metaphor coined by agile software pioneer Ward Cunningham to account for the added software
system effort or cost resulting from taking early software project shortcuts [134]. The TD metaphor reflects
that such debt accumulates interest: the later it is paid, the more it costs. While TD can be treated as an
investment to enable rapid delivery in time-critical situations, such as meeting market windows for early
adopters or trying to build the riskiest parts first to determine whether continuing the project is feasible,
there are more and less responsible ways of incurring TD that suffer from a lack of foresight, such as in
building the easiest parts first, neglecting non-functional requirements, or skimping on software systems
engineering or maintainability preparation [94, 134].
Cunningham defines the metaphor as follows: “Shipping first-time code is like going into debt. A
little debt speeds development as long as it is paid back promptly with a rewrite. Objects make the cost
of this transaction tolerable. The danger occurs when the debt is not repaid. Every minute spent on not-
quite-right code counts as interest on that debt. Entire engineering organizations can be brought to a stand-
still under the debt load of an unconsolidated implementation, object-oriented or otherwise” [73]. Other
definitions of TD have also appeared to clarify and explain the metaphor. Steve McConnell defines TD as:
“A design or construction approach that’s expedient in the short term but that creates a technical context
in which the same work will cost more to do later than it would cost to do now (including increased cost
over time)” [162]. Additionally, a week-long Dagstuhl Seminar on Managing Technical Debt in Software
Engineering was held in 2017 [34]. The seminar produced a consensus definition for TD which is “In
software-intensive systems, technical debt is a collection of design or implementation constructs that are
expedient in the short term, but set up a technical context that can make future changes more costly or
impossible. Technical debt presents an actual or contingent liability whose impact is limited to internal
system qualities, primarily maintainability and evolvability” [34].
3.1.2 Types
The TD metaphor has recently gained the software engineering community’s attention even though it
was introduced 25 years ago [141]. The International Workshop on Managing Technical Debt (MTD)
series has brought together practitioners and researchers since 2010 to discuss TD and its management and
share emerging practices used in software development organizations [206]. Additionally, the Dagstuhl
Seminar on Managing Technical Debt in Software Engineering produced a consensus definition for TD, a
draft conceptual model, and a research road-map [34]. The seminar also resulted in graduating the MTD
workshop into a conference to accelerate progress on studying TD, and the first International Conference
on Technical Debt (TechDebt) was held in 2018.
When the TD metaphor started being used, it was usually associated with compromises and poor tech-
nical choices on the code level of software [73, 193, 213]. However, the software engineering community
took liberty with Cunningham’s metaphor and started to expand the TD metaphor to refer to any bad deci-
sions or issues that plague software systems, the software development lifecycle, and the software development process, without regard to the metaphor’s original and refined definitions that limit TD to in-
ternal system qualities [27, 193, 213, 217]. In the literature, TD started to be associated with requirements
debt [54], design debt [213, 237, 238], architectural debt [177], test debt [54], process debt [143], docu-
mentation debt [134], build debt [174], people debt [134], quality debt [213], configuration management
debt [213], social debt [215], defect debt [19], infrastructure debt [65], process debt [66], automated testing
debt [229], usability debt [239], service debt [28], and platform experience debt [213].
TD can also be categorized using McConnell’s categorization, which is based on the reasons behind
TD incurrence, into intentional and unintentional [162]. Intentional TD occurs when an organization
or developer makes a conscious decision to optimize for the present rather than the future [162, 193].
The reasons to take intentional TD include pressure to meet deadlines [193, 233], lowering the current
release cost [193], faster time-to-market [143, 193], and faster customer feedback [143]. On the other hand,
unintentional TD [162, 193] is incurred unknowingly as a result of team members’ incompetence, business immaturity, or process deficiencies that lead to poor engineering practices [162, 193]. Unintentional TD
is also known as reckless debt, naive debt, or mess. Furthermore, TD can also be categorized based
on Martin Fowler’s TD quadrant [14], as Figure 3.1 illustrates. The quadrant divides TD based on the
awareness (deliberate/inadvertent) of the debt occurrence and the reasons (reckless/prudent) for incurring
it.
Figure 3.1: Fowler’s TD quadrant [14]
3.1.3 Management
Technical debt management (TDM) is the process of acknowledging, identifying, measuring, and reducing
TD, which includes processes, techniques, and tools. It is essential to the software system evolution
[193, 213], and it should be conducted collaboratively between the technical and business teams to ensure
that their goals are addressed and balanced [213]. Li et al. [141] conducted an extensive study, in which the
researchers mapped 49 primary studies on TDM. The study concluded that TDM can be described in
eight distinctive activities, summarized in Figure 3.2 and defined below based on the definitions of Li et
al. [141]:
– Identification: Detecting TD resulting from unintentional or intentional technical decisions in a
software system using specific techniques, such as static code analysis.
– Measurement: Estimating the level of the overall TD in a system or measuring the cost and
benefits of the identified TD in a software system using estimation methods.
– Representation: Documenting TD uniformly while addressing the concerns of all stakeholders.
– Communication: Sharing identified TD and its effects among all stakeholders to facilitate its
discussion and management.
– Prioritization: Deciding which TD items are to be repaid in a repayment activity and which TD
items to delay until later releases based on specific, predefined rules to support the decision.
– Repayment: Resolving or mitigating identified TD in a software system using techniques, in-
cluding refactoring. Refactoring is the process of changing a software system by improving its
internal quality and structure without altering the external behavior [101].
– Monitoring: Observing the changes in the value and cost of unpaid TD over time.
– Prevention: Preventing incurring potential TD.
Figure 3.2: Technical debt management (TDM) activities
The focus of this dissertation is TD prioritization, and as TD identification and measurement are pre-
requisites to TD prioritization, a summary of the current state of the art approaches to identify and measure
TD is provided. The approaches are categorized into human and automated assessment approaches:
– Human assessment approaches
Guo and Seaman [109] developed a portfolio approach for TDM; the main component of this portfolio is
the “TD list” that contains “TD items”. Each item represents an incomplete task that may cause future
issues. Having the complete TD list helps in acknowledging TD’s existence, understanding its conse-
quences, and deciding which TD items should be repaid first. The approach is adopted from the finance
domain in which it is used as a risk reduction strategy for investors since it helps in determining the types
and amounts of assets to be invested or divested. A similar approach was created by Li et al. [142] for
managing architectural TD. Moreover, code reviews can also be utilized to identify TD; reviewers can
check other developers’ solutions against the architecture design and coding standards to detect possible
TD in the design and code [152]. Additionally, applying agile practices in a more restrictive manner, such
as having a strong “definition of done” in addition to adding the location, description, and potential cost of
TD to the “product backlog”, helps in identifying and measuring TD in the early stages of development [193, 213].
– Automated assessment approaches
Automated approaches for identifying and measuring TD mostly depend on static code analysis, defined
in Section 3.2. The tools analyze source code to identify violations of coding standards and guidelines,
lack of testing, and different software component dependencies [234]. These tools support a wide range
of programming languages and exist as free software, such as SonarQube [10], as well as commercial
software, such as CAST [1] and kiuwan [5]. Additionally, the software engineering research community
has developed many tools that target the identification and quantification of specific types of TD, such as
architectural TD [142,160,226,232], build TD [174], code TD [132,163], requirements and documentation
TD [77], defect TD [20], and test TD [219].
Another active area of TDM is identifying and measuring self-admitted technical debt (SATD), an in-
tentional TD acknowledged by developers [186], by studying source code comments and issue trackers.
To identify SATD in code comments, researchers extract code comments and examine each comment to
classify it based on whether it is an SATD comment or not. The examination process originated through
using a manual approach to determine patterns in the comments that indicate SATD [186] and advanced
into utilizing automated approaches that employ text mining [119] and natural language processing (NLP)
techniques [151]. The researchers improved their SATD predictions and developed a tool that helps soft-
ware system development teams be instantly aware of the SATD as soon as it is introduced to facilitate
its tracking and management [145]. Similarly, researchers have examined the existence of TD in issue
trackers through manual examination [44] and applying NLP techniques [75].
3.2 Static Code Analysis
Static code analysis is the process of analyzing a software system without executing it [213]. The analysis
can be performed on the source code, which is the exact reflection of the software system itself, or on the compiled bytecode [146]. The analysis reveals software issues and flaws that are not visible to the compiler,
such as bugs, violations of coding standards and guidelines, or design issues [213]. It can be considered
a machine-assisted code review that assists in detecting developers’ mistakes at early stages.
SonarQube is a static code analysis tool that was utilized throughout this dissertation to identify and
measure TD. SonarQube was selected because it is the only open-source software (OSS) tool that identifies and quantifies TD while also being widely used in industry and OSS communities alike [57, 140]. SonarQube helps developers continuously improve their source code by providing
measures on TD, bugs, security vulnerabilities, code complexity, code duplicates, unit tests, code cover-
age, and comments. While SonarQube provides a wide range of code analyzers that support over 25
programming languages, including Java, C, C++, C#, PHP, and JavaScript, not all of the analyzers are
released as free software. Users can integrate these analyzers into various development environments or
run the SonarQube analyzer, a Java-based command-line tool, to analyze their code. Such practice en-
sures continuous inspection throughout the development process and guarantees that code quality analysis
and reporting will be an integral part of the development lifecycle [57]. SonarQube analysis results are re-
ported in a dashboard, which can be customized and edited by users. An example of the default SonarQube
dashboard is provided in Figure 3.3.
Figure 3.3: SonarQube dashboard
Each SonarQube analyzer has numerous rules that detect general and language-specific quality issues.
In its early releases, SonarQube provided an implementation of the SQALE pyramid [139]. However, in
its recent releases, the tool no longer implements the SQALE pyramid. Instead, it implements a wide
range of coding standards and convention rules, such as some of the SQALE rules, Oracle Java Platform
standards [4], Motor Industry Software Reliability Association (MISRA) coding standards for C and C++
[7], SEI CERT Coding Standards [8], World Wide Web Consortium (W3C) standards [15], and Maven
conventions [6]. The complete list of the rules and their associated sources is publicly available in [11]. In
addition to the predefined rules, users can extend each analyzer by adding their own custom rules [57].
Each SonarQube rule that detects a TD item assigns it a severity score based on the SonarQube team’s
perception of the item’s probable impact on future maintenance activities. The severity of a given TD item
is categorized from lowest to highest as follows: info, minor, major, critical, or blocker. Additionally, each
rule has an associated remediation effort function that determines the required effort, in minutes, to repay
the identified TD item. For some rules, the remediation effort is constant for all the TD items. For other
rules, the remediation effort is calculated dynamically based on the specification of each TD item incident,
such as the number of lines of code (LOC) involved [84]. The remediation effort is utilized to compute the
total TD in a system by summing the remediation effort of every TD item.
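As a small illustration of that computation (the types and field names below are hypothetical and do not reflect SonarQube’s API), the total TD of a system is simply the sum of the per-item remediation efforts:

    import java.util.List;

    // Hypothetical representation of a TD item reported by the analyzer.
    record Issue(String ruleKey, String severity, long remediationMinutes) {}

    final class TdTotal {
        // Total TD of a system, in minutes, as the sum of all remediation efforts.
        static long totalMinutes(List<Issue> issues) {
            return issues.stream().mapToLong(Issue::remediationMinutes).sum();
        }
    }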
3.3 Prioritization Techniques
This section presents an overview of decision-making techniques that have been utilized for prioritization
in the currently existing TD prioritization approaches. The overview encompasses a summarized descrip-
tion of each technique, its history, strengths, and drawbacks. It is important to note that each technique’s
presented strengths and weaknesses are not comprehensive, as they are limited to the ones identified in the
reviewed literature.
3.3.1 Weighted sum model (WSM)
WSM is a technique that is applied to solve multiple-criteria decision problems. In this class of problems, there exist multiple alternatives that should be evaluated based on multiple criteria.
The WSM assigns a weight of importance to each criterion considered. Afterward, each alternative is
evaluated based on each criterion and assigned a score. Subsequently, the weighted scores are obtained by
multiplying each criterion’s score by its weight. Finally, the global score of each alternative is calculated
by summing the retrieved weighted scores of the criteria [222]. In 1967, Fishburn formulated WSM
in [97]. Since then, WSM has been one of the most applied multiple-criteria decision techniques due to its
effectiveness and simplicity. However, WSM has some drawbacks, including its inapplicability when the
decision criteria have different units of measurement, as summing such criteria is similar to adding apples
and oranges [222]. Additionally, there is a challenge in determining each criterion’s appropriate weight,
and the results are highly sensitive to these weights [222]. Furthermore, there usually exists a pool of
multiple feasible alternatives. However, WSM does not indicate which alternatives are to be selected to
achieve the highest possible goal in the presence of a resource constraint.
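In its standard form, the global score of alternative \(i\) over \(m\) criteria with weights \(w_j\) and criterion scores \(a_{ij}\) is

\[
S_i = \sum_{j=1}^{m} w_j\, a_{ij}, \qquad \sum_{j=1}^{m} w_j = 1,
\]

and the alternative with the largest \(S_i\) is preferred, which presupposes that all \(a_{ij}\) are expressed in commensurable units, per the limitation noted above.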
3.3.2 Analytic hierarchy process (AHP)
AHP is a practical approach to handle multiple-criteria decision problems. AHP considers a set of al-
ternatives that are prioritized based on a set of evaluation criteria. In AHP, each evaluation criterion is
assigned a weight according to pairwise comparisons of the considered criteria. The higher the weight, the
more important the corresponding criterion. After determining the weight of all evaluation criteria, each
alternative is evaluated based on each criterion and assigned a score based on alternatives’ pairwise com-
parisons. The higher the score, the better the alternative is based on each considered criterion. Eventually,
the weighted sum of each alternative is calculated to determine each alternative’s global score based on
all the considered criteria. The global score indicates each alternative’s rank, and the best alternative has
the most suitable trade-off among the different criteria [86, 196, 197, 222, 223, 231]. AHP was introduced
by Thomas Saaty in the 1970s to fulfill the need for sufficient means to handle weapon trade-offs while
working at the Arms Control and Disarmament Agency at the U.S. Department of State [197, 223]. Saaty
formulated and explained AHP in [196], and the technique became popular in use. The popularity of AHP
is owed to its straightforwardness, convenience, and ability to tackle complex decision problems. AHP de-
composes a complex problem into a series of pairwise comparisons to effectively capture multiple decision
criteria and compare alternatives concerning various criteria. AHP can also help decision makers organize
their thoughts and judgments to make more effective decisions [199]. However, identifying the ideal cri-
teria and weights in AHP can be challenging, and conducting pairwise comparisons in large hierarchies
can consume large amounts of time and effort [86,178,222,225]. Moreover, introducing a new alternative
at the end of an AHP analysis requires redoing the analysis to consider the newly added alternative in
the ranking, which might demand considerable effort [225, 231]. Additionally, the introduction of a new
alternative might result in a rank reversal, a change in the relative ranking of some of the original alterna-
tives [86, 222, 225, 231]. Another limitation of AHP is its lack of guidelines that aid when deciding which
alternatives to select from a pool of multiple feasible alternatives in the presence of a resource constraint
to obtain the highest possible desired outcome.
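As a rough illustration of how criterion weights can be derived from pairwise comparisons, the following Python sketch uses the normalized-column-average approximation of the principal eigenvector; the criteria and comparison values are hypothetical.

# Minimal AHP sketch: deriving criterion weights from a pairwise comparison
# matrix using the normalized-column-average approximation (a common shortcut
# for the principal eigenvector). The matrix values are hypothetical.
import numpy as np

criteria = ["value", "cost", "risk"]
# pairwise[i][j] = how much more important criteria[i] is than criteria[j]
pairwise = np.array([
    [1.0, 3.0, 5.0],
    [1 / 3, 1.0, 2.0],
    [1 / 5, 1 / 2, 1.0],
])

column_sums = pairwise.sum(axis=0)
normalized = pairwise / column_sums   # normalize each column
weights = normalized.mean(axis=1)     # average across rows to approximate priorities

for name, w in zip(criteria, weights):
    print(f"{name}: {w:.3f}")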
3.3.3 Cost-benefit analysis (CBA)
CBA is a decision-making technique that aids in determining the feasibility of an option. The process
starts by calculating the aggregated cost and benefit of an option and then quantitatively comparing its
total cost to its total benefit. In most cases, an option will only be implemented if its total benefit exceeds
its total cost [221]. CBA was initially developed by Jules Dupuit to determine tolls for a bridge that he
was working on. In 1848, Dupuit outlined the principles of his evaluation process in [88]. The process
was refined and popularized by the British economist Alfred Marshall in 1890 [155]. Even though CBA
sounds like a commonsensical and straightforward process for determining the worthiness of an option,
accurately estimating the actual cost and benefit of an option can be difficult. Accurate estimates are
crucial, as the results of CBA are highly dependent on them [173]. Additionally, when there is a pool of
multiple feasible options and a resource constraint, CBA does not provide any guidelines for deciding
which options to implement to maximize the obtained value.
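A minimal sketch of the comparison, assuming hypothetical cost and benefit figures expressed in person-hours, is shown below.

# Minimal CBA sketch with hypothetical cost/benefit figures (in person-hours).
options = {
    "Refactor module A": {"cost": 40, "benefit": 120},
    "Repay defect TD":   {"cost": 25, "benefit": 20},
}

for name, o in options.items():
    net = o["benefit"] - o["cost"]
    ratio = o["benefit"] / o["cost"]
    verdict = "worth implementing" if net > 0 else "not worth implementing"
    print(f"{name}: net benefit = {net}, benefit/cost = {ratio:.2f} -> {verdict}")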
3.3.4 Modern portfolio theory (MPT)
MPT is a mathematical model for compiling a portfolio of investments such that the expected return on
investment (ROI) is maximized for a given level of risk. A portfolio consists of a collection of investments
and the relationship among each other’s returns. It encompasses the expected ROI and the risk involved
for each investment [189]. Portfolio management refers to managing and selecting an investment policy
that maximizes the ROIs and minimizes risk. Portfolio management involves investment mix and policy
decisions, aligning investments with objectives, allocating assets, and balancing performance and risks.
The basic portfolio model was developed by Nobel prize winner Harry Markowitz, in 1952, to find the
optimal portfolio selection [154]. The model considers the expected rate of ROI, the risk of an investment,
and the interrelationship between investments as measured by the correlation between the returns on these
investments. The model assumes that an investor prefers higher ROIs and that an investor is risk-averse
[189]. Additionally, the model seeks a diversified portfolio to reduce risk, and a lack of correlation between
two investments is an indicator of diversity [154, 189]. The model ranks and prioritizes investments in a
holistic manner that simplifies the process of comparing multiple investments, reduces the overhead of
comparisons, and aids in making better strategic planning decisions [189]. However, the model depends
on risk measures, return measures, and correlations. These calculations might be challenging and
complicated for a large number of investments, and they require professional skills and historical
data, which makes adopting such a model expensive [147, 189]. The model might not be beneficial for a
small number of investments, and the assumption that an investor is risk-averse might not hold in all
situations [189].
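To illustrate the quantities the model works with, the following sketch computes the expected return and risk of a hypothetical two-investment portfolio; all returns, covariances, and weights are invented for demonstration.

# Minimal MPT sketch: expected return and risk (variance) of a two-asset
# portfolio. Returns, covariances, and weights are hypothetical.
import numpy as np

expected_returns = np.array([0.08, 0.12])        # expected ROI per investment
cov = np.array([[0.010, 0.002],                  # covariance of returns
                [0.002, 0.030]])
weights = np.array([0.6, 0.4])                   # portfolio allocation

portfolio_return = weights @ expected_returns
portfolio_variance = weights @ cov @ weights

print(f"Expected return: {portfolio_return:.3f}")
print(f"Risk (variance): {portfolio_variance:.4f}")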
3.3.5 Real options analysis (ROA)
A real option is the right of an investor to defer, expand, contract, abandon, or otherwise alter an economic
asset on fixed terms before they are no longer able to [25, 176]. ROA is examining and choosing between
such options to obtain the most beneficial outcome on a given investment [175,176,214]. ROA is useful for
identifying, understanding, valuing, prioritizing, selecting timing, optimizing, and managing the strategic
business and capital allocation decisions [53,175,176]. ROA was coined and formulated by Stewart Myers,
in 1977, to overcome the lack of flexibility in traditional valuation approaches [25, 176]. The strength of
ROA lies in the flexibility it provides to management, which, in turn, lowers the risk of potential
investments; ROA allows decision makers to respond as information becomes available and to observe
market conditions and how the development of the actual product progresses before investing large sums of
money [175]. However, when applying ROA, a hurdle might arise in obtaining accurate estimates for
value and cost. Obtaining accurate estimates is vital, as these estimates have a significant influence on
the quality of the results. Moreover, when applying ROA to software engineering problems, individuals should be aware
that some of the ROA’s assumptions, such as tradability and liquidity, might not apply to the software
engineering field. Additionally, individuals should expect a challenge when explaining ROA to software
practitioners and technical leaders, as it might be too complex to communicate [214].
3.4 The Knapsack Problem
The knapsack problem is a form of combinatorial optimization problem. In the knapsack problem, there
is a set of items, each with an associated value and weight. The problem aims to select a subset of these
items that has the maximum possible value while satisfying a weight capacity limit [156,210]. There exist
many forms of the knapsack problem, and some examples are provided below:
– Fractional knapsack: Items can be broken into fractions and added to the selected set [210].
– 0/1 knapsack: Each item must be entirely included or entirely excluded in the selected set
[156, 210].
– Bounded knapsack: There is an upper bound on the frequency of including each item [156].
– Unbounded knapsack: There is no upper bound on the frequency of including each item [156].
All forms of the knapsack problem except the fractional knapsack are considered NP-hard problems
[156]. There is no known polynomial-time algorithm to find an optimal solution for NP-hard problems, and
the time to find a solution for this type of problem grows exponentially with the problem size, which
makes finding an optimal solution for large instances of this type of problem extremely hard [70, 128].
Regardless of the knapsack problem complexity, the theoretical computer science community has put
considerable effort towards developing approaches to find approximately optimal solutions. This is due
to the applicability of the knapsack problem in real-world decision-making processes in a broad range of
fields, such as finance and industry [21, 62, 116, 128, 138, 157, 168].
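As an illustration of the 0/1 variant, the following Python sketch applies the standard dynamic-programming formulation; the item values, weights, and capacity are hypothetical, and in a TD setting the value could stand for saved interest and the weight for remediation cost.

# Minimal 0/1 knapsack sketch (dynamic programming) with hypothetical data.
def knapsack_01(values, weights, capacity):
    # dp[w] = best value achievable with remaining capacity w
    dp = [0] * (capacity + 1)
    for value, weight in zip(values, weights):
        # iterate capacity downwards so each item is used at most once
        for w in range(capacity, weight - 1, -1):
            dp[w] = max(dp[w], dp[w - weight] + value)
    return dp[capacity]

values = [60, 100, 120]
weights = [10, 20, 30]
print(knapsack_01(values, weights, capacity=50))  # -> 220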
3.5 Multi-objective Optimization (MOO)
Optimization is the process of finding the best solution among a set of alternatives, subject to zero or more
constraints [60, 79]. The set of all possible solutions composes the search space, and the goal of
optimization is to find the best solution or set of best solutions [60, 79].
While single objective optimization utilizes a single measure for determining the best solution, MOO
uses two or more measures. These measures are known as objectives, and they might be conflicting [79].
As a result, there is no one optimal solution; rather, there is a set of potential solutions with varying
degrees of trade-off between the objectives [79, 169]. This leads to a challenge that decision makers face
when determining which solution is to be implemented. Fortunately, many MOO techniques can facilitate
this decision process by adopting Pareto optimality which facilitates exploring and evaluating the set of
best feasible solutions [68, 79]. Generally, MOO techniques aim to find a diverse set of solutions that are
as close as possible to the Pareto-optimal solutions [68, 79].
3.5.1 Pareto optimality
Pareto optimality considers solutions that are not dominated by any other solutions, referred to as non-
dominated or Pareto-optimal solutions [68,79]. These are feasible solutions for which no objective can be
improved without detracting from at least one other objective. MOO techniques aim to find the Pareto-
front, which consists of the Pareto-optimal solution set [68].
As an example, Figure 3.4 demonstrates the available solutions for a repayment activity. The ideal
solution would have the minimum cost and the maximum value, which is equal to the minimum(value)
in Figure 3.4, as the value was negated to improve the intelligibility of the figure. However, since these
objectives are competing, it is impossible to identify one single optimal solution. If S1 and S2 are com-
pared, one cannot decide which solution is superior, as each solution is better in one objective. However,
comparing S2 and S3 reveals that S3 is superior to S2 in both objectives, and one can conclude that S3
dominates S2. Similarly, one can conclude that S5 dominates S4, and it is evident that all other solutions
dominate S6. Nonetheless, it is ambiguous which solution among S1, S3, and S5 is superior to others, as
all of these solutions have a trade-off between value and cost, where a decrease in cost leads to reduced
value. This set of solutions is the Pareto-optimal set. The solid line represents the Pareto-front, and solu-
tions on it are Pareto-optimal. When constraints are present, only solutions that satisfy these constraints
will be considered. In this example, the cost is constrained to 90; hence solutions beyond 90 are infeasible.
Figure 3.4: Pareto-front for repayment solutions
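The dominance comparisons described above can be expressed compactly in code. The following sketch checks pairwise dominance and filters the non-dominated set; the numeric coordinates chosen for S1 through S6 are hypothetical stand-ins that merely reproduce the dominance relationships described in the text, not the actual values in Figure 3.4.

# Minimal Pareto-dominance sketch for (cost, negated value) pairs.
# Both objectives are minimized; coordinates are hypothetical.
def dominates(a, b):
    # a dominates b if a is at least as good on every objective
    # and strictly better on at least one (minimization).
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

solutions = {
    "S1": (20, -40), "S2": (50, -55), "S3": (45, -70),
    "S4": (75, -60), "S5": (70, -85), "S6": (95, -30),
}

pareto_front = [
    name for name, point in solutions.items()
    if not any(dominates(other, point)
               for other_name, other in solutions.items() if other_name != name)
]
print("Non-dominated solutions:", pareto_front)  # -> ['S1', 'S3', 'S5']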
3.6 Search-Based Techniques
Search-based techniques aim to find optimal or approximately optimal solutions in a huge solution space
efficiently and intelligently by utilizing fitness functions to guide the search. Generally, a search-based
technique starts by selecting a set of initial solutions. A fitness function is then applied to evaluate
the quality of the selected solutions based on the objectives being optimized. The obtained fitness
values guide the search direction, as the search progresses in the direction that leads to improved fitness
values until stopping criteria are reached. When the search terminates, the set of solutions with the best
fitness values is output [114]. One of the most commonly used search-based algorithms is the genetic
algorithm [113], which is explained in more detail in the following section.
3.6.1 Genetic algorithm (GA)
A GA is a stochastic search-based optimization technique that is inspired by the process of natural selec-
tion. GAs were first developed at the University of Michigan by John Holland, his colleagues, and his
students in the early 1960s. Since their development, GAs of many forms have been applied to a variety of
optimization problems with great success [171].
GAs are commonly applied to find approximately optimal solutions to hard optimization and search
problems within a reasonable time [113]. Generally, GAs require a representation of a solution and a
fitness function [171]. Solutions can be represented in many forms, such as binary, integer, real value,
and permutation representations [114, 171]. Typically, the fitness function is derived from the objective or
objectives that the algorithm aims to optimize [114, 167, 171]. Furthermore, in GAs, there is a population
of possible solutions, and each solution is assigned a fitness value based on the developed fitness function.
These individual solutions iteratively evolve using bio-inspired operators until stopping criteria are reached
[114, 167, 171]. The genetic algorithm's bio-inspired operators are as follows (a minimal sketch follows the list):
– Selection: Selects solutions (i.e., parents) that contribute to the next generation population [114,
167, 171].
– Crossover: Combines two parents to generate offspring for the next generation population [114,
167, 171].
– Mutation: Applies random changes to solutions to maintain diversity [114, 167, 171].
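The sketch below is a minimal, self-contained GA with a binary representation and a toy fitness function (maximizing the number of one-bits); the problem, parameter values, and operator choices are illustrative only.

# Minimal GA sketch (binary representation, single objective, hypothetical parameters).
import random

random.seed(42)
GENOME_LEN, POP_SIZE, GENERATIONS, MUTATION_RATE = 20, 30, 50, 0.02

def fitness(genome):
    # Toy objective: maximize the number of 1-bits in the genome.
    return sum(genome)

def tournament_selection(population, k=3):
    # Pick k random individuals and return the fittest one as a parent.
    return max(random.sample(population, k), key=fitness)

def crossover(parent_a, parent_b):
    # Single-point crossover combining two parents into one offspring.
    point = random.randint(1, GENOME_LEN - 1)
    return parent_a[:point] + parent_b[point:]

def mutate(genome):
    # Flip each bit with a small probability to maintain diversity.
    return [1 - g if random.random() < MUTATION_RATE else g for g in genome]

population = [[random.randint(0, 1) for _ in range(GENOME_LEN)] for _ in range(POP_SIZE)]
for _ in range(GENERATIONS):
    population = [
        mutate(crossover(tournament_selection(population), tournament_selection(population)))
        for _ in range(POP_SIZE)
    ]

best = max(population, key=fitness)
print("Best fitness:", fitness(best), "of", GENOME_LEN)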
3.6.2 Non-dominated sorting
Non-dominated sorting classifies solutions based on their different non-domination levels [81]. Solutions
with lower levels are considered better than solutions with higher levels. In Figure 3.4, there are three non-
dominated fronts: first (F1), second (F2), and third (F3) fronts. While S1, S3, and S5 belong to the first
non-dominated front, S2 and S4 belong to the second front, and S6 belongs to the third front. Therefore,
S1, S3, and S5 are considered better than other solutions, and S2 and S4 are considered better than S6.
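A simple, if not the most efficient, way to obtain these fronts is to repeatedly peel off the current non-dominated set, as the following sketch demonstrates; the objective vectors are the same hypothetical stand-ins used earlier, and both objectives are minimized.

# Minimal non-dominated sorting sketch: repeatedly peel off the current
# non-dominated front. Objective vectors are hypothetical and minimized.
def dominates(a, b):
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def non_dominated_sort(points):
    remaining = dict(points)          # name -> objective vector
    fronts = []
    while remaining:
        front = [name for name, p in remaining.items()
                 if not any(dominates(q, p)
                            for other, q in remaining.items() if other != name)]
        fronts.append(front)
        for name in front:
            del remaining[name]
    return fronts

points = {"S1": (20, -40), "S2": (50, -55), "S3": (45, -70),
          "S4": (75, -60), "S5": (70, -85), "S6": (95, -30)}
print(non_dominated_sort(points))  # -> [['S1', 'S3', 'S5'], ['S2', 'S4'], ['S6']]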
3.6.3 Crowding distance
Crowding distance measures the density of solutions surrounding a particular solution, and a solution with
a higher crowding distance is considered better. For each objective function, the algorithm sorts solutions
based on the objective function value in increasing order. Then it assigns an infinite distance value for
solutions with the smallest and largest function value. The remaining solutions are assigned a distance
value equal to the absolute normalized difference in the objective values of the two adjacent solutions. The
overall crowding distance value of a solution is the sum of individual distance values corresponding to
each objective [81].
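The following sketch computes crowding distance for a single front following the description above; the objective vectors are hypothetical, and both objectives are treated as minimized.

# Minimal crowding-distance sketch for one front (hypothetical objective vectors).
def crowding_distance(front):
    n, m = len(front), len(front[0])
    distance = [0.0] * n
    for obj in range(m):
        order = sorted(range(n), key=lambda i: front[i][obj])
        lo, hi = front[order[0]][obj], front[order[-1]][obj]
        # boundary solutions receive an infinite distance
        distance[order[0]] = distance[order[-1]] = float("inf")
        if hi == lo:
            continue
        for rank in range(1, n - 1):
            i = order[rank]
            prev_v = front[order[rank - 1]][obj]
            next_v = front[order[rank + 1]][obj]
            # absolute normalized difference of the two adjacent solutions
            distance[i] += (next_v - prev_v) / (hi - lo)
    return distance

front = [(20, -40), (45, -70), (70, -85)]
print(crowding_distance(front))  # only the middle solution gets a finite distance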
3.6.4 Non-dominated sorting genetic algorithm-II (NSGA-II)
The non-dominated sorting genetic algorithm-II (NSGA-II) [81] is one of the most applied MOO algo-
rithms in software engineering due to its effectiveness and efficiency in comparison to other search-based
techniques, such as greedy [70], simulated annealing [210], and SPEA2 [241, 242] algorithms. The algo-
rithm proved its effectiveness in optimizing GUIs' energy consumption [144], estimating effort [202], and
solving multiple problems, including the test suite minimization problem [235, 236] and the next release
problem (NRP) [36, 241].
The NSGA-II finds the non-dominated solution in a search space by applying two concepts: elitism
and crowding distance. Elitism guarantees that the elite solutions (i.e., the previously found best solutions)
are preserved and used in subsequent generations, and crowding distance serves as an explicit diversity
preserving mechanism [80].
The NSGA-II starts by generating an initial solution population P_t of size N. The initial solution population
can be generated randomly, or it can also be injected with some non-random solutions [113].
Then the NSGA-II applies the fitness functions and, subsequently, sorts the initial population using non-
dominated sorting and crowding distance. Afterward, the NSGA-II applies the evolutionary operators
(selection, crossover, and mutation) to create a new offspring population Q_t of size N. Then the NSGA-II
repeatedly applies the steps illustrated in Figure 3.5 and explained in Algorithm 1 [81] until stopping criteria
are reached.
Figure 3.5: Non-dominated sorting genetic algorithm-II (NSGA-II) steps [81]
As demonstrated in Algorithm 1, the algorithm combines the parent population P_t and the offspring
population Q_t to create population R_t of size 2N. Afterward, it sorts the combined population R_t
using non-dominated sorting and starts filling a new population P_{t+1} of size N based on front ranks. The
lower front ranks have higher preference, and if a front rank is taken partially, such as F3 in Figure 3.5,
the algorithm will sort the solutions based on crowding distance and select solutions with higher crowding
distance. Afterward, the algorithm creates a new offspring population by applying the evolutionary operators
on P_{t+1}. These steps are repeated until the stopping criteria are met. When the stopping criteria are
reached, the algorithm terminates and outputs the solutions in the last generated population. The NSGA-II
complexity is O(mN²), where m is the number of objectives, and N is the population size.
Algorithm 1 The NSGA-II main loop [81]
1: while not stopping criteria do
2:     R_t = P_t ∪ Q_t
3:     F = non-dominated-sort(R_t)
4:     i = 1
5:     P_{t+1} = ∅
6:     while |P_{t+1}| + |F_i| ≤ N do
7:         P_{t+1} = P_{t+1} ∪ F_i
8:         i = i + 1
9:     end while
10:    crowding-distance-sort(F_i)
11:    P_{t+1} = P_{t+1} ∪ F_i[1 : (N − |P_{t+1}|)]
12:    Q_{t+1} = create-new-population(P_{t+1})
13:    t = t + 1
14: end while
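For readers who want to experiment with the algorithm, the sketch below shows one possible way to run NSGA-II on a standard bi-objective benchmark using the open-source pymoo library; the benchmark problem and parameters are illustrative only, and the exact import paths may vary across pymoo versions.

# Minimal sketch of running NSGA-II with the pymoo library on a standard
# bi-objective benchmark (ZDT1). Import paths may differ across pymoo versions.
from pymoo.algorithms.moo.nsga2 import NSGA2
from pymoo.problems import get_problem
from pymoo.optimize import minimize

problem = get_problem("zdt1")            # two minimization objectives
algorithm = NSGA2(pop_size=100)          # elitism and crowding distance built in
result = minimize(problem, algorithm, ("n_gen", 200), seed=1, verbose=False)

print(result.F[:5])                      # objective values of non-dominated solutions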
Chapter 4
A Systematic Literature Review of Technical Debt Prioritization
4.1 Background and Motivation
Secondary studies aim to integrate and synthesize evidence related to a specific topic by reviewing and
analyzing the current primary studies related to this topic [130, 184]. Primary studies, such as empirical
studies, investigate a specific topic [130]. There are two forms of secondary studies, which are systematic
mapping review (SMR) and systematic literature review (SLR) [130, 184].
SMR studies aim to provide a coarse-grained overview of a topic by performing a broad review of
primary studies on that specific topic. This form of secondary studies has a broad research question (i.e.,
what studies have been conducted). The question intends to identify what evidence is available and
provides a structure of the types of research reports and results that have been published by categorizing
the current primary studies, often providing a visual summary (i.e., map) of the results. This form of
secondary studies does not aggregate nor analyze the related primary studies. Rather, it categorizes and
clusters related primary studies. SMR studies are often conducted as a form of preliminary study to
assess whether there is scope for an SLR [130, 184].
SLR studies are means of identifying, evaluating, interpreting, and summarizing all available research
relevant to a specific phenomenon, research question, or topic area of interest. SLR studies apply a well-
defined methodology to identify, analyze, interpret, and summarize all available evidence related to the
subject under study in an unbiased and repeatable manner. The resulting summary aids in identifying gaps
in the current research literature and points to areas that need further research to position new research
activities appropriately [130].
The software engineering community has put great effort into understanding the current status of
technical debt (TD) and technical debt management (TDM). However, the existing literature relating to TD
lacks a study that summarizes the current literature regarding TD prioritization. The lack of a synthesized
reference on the currently available TD prioritization approaches poses an issue for the software engineer-
ing community. Prospective users of TD prioritization approaches may face difficulties when searching for
an appropriate TD prioritization approach and evaluating the identified approaches for fitness of purpose
and applicability. Additionally, the lack of a reference on TD prioritization may hinder the software engi-
neering community when developing new approaches. Having such a reference may identify gaps that the
software engineering community can contribute to fulfilling, as the reference may allow the community to
take advantage of new opportunities to apply new techniques to TD prioritization. Additionally, the review
can highlight a few of the limitations within the current TD prioritization approaches, which individuals
can seek to avoid when developing new approaches.
To fill this shortage in the TD literature, this chapter aims to analyze the existing TD prioritization
approaches through the means of an SLR [130]. Specifically, this SLR aims to identify studies that propose
TD prioritization approaches, summarize and evaluate such approaches, and synthesize the findings. The
analysis involves identifying the steps required in each approach, the type of TD addressed, the decision
factor categories considered, the essential artifacts, the type of human involvement required, and how
and in which setting each approach was evaluated. Additionally, the analysis identifies the prioritization
techniques that these approaches are based on and points out some of their limitations.
4.2 Research Method
This SLR is conducted based on the guidelines of [72, 130, 230, 240]. The SLR’s steps are detailed below.
4.2.1 Research questions
This section presents the research questions that were formulated to achieve the goal of this SLR. The
research questions are as follows:
RQ1: What are the current TD prioritization approaches?
RQ1.1: Which types of TD are addressed by the TD prioritization approaches?
RQ1.2: What decision factor categories are the TD prioritization approaches based on?
RQ1.3: What software artifacts do the TD prioritization approaches depend on?
RQ1.4: What type of human involvement is required in the TD prioritization approaches?
RQ1.5: How are the current TD prioritization approaches evaluated?
RQ2: What prioritization techniques are utilized by the identified TD prioritization approaches?
RQ2.1: What are the limitations of the identified prioritization techniques?
4.2.2 Search strategy
The search strategy of this SLR is a combination of a manual and automated search. The concept of
a quasi-gold standard was employed to assess the completeness of the search, as proposed by Zhang et
al. [240]. This dual search method improves the rigor of the search process [240]. The search strategy
for both the manual and automated search was constrained to published literature from 1992 to 2018. The
search strategy was constrained to this range because the TD metaphor was only introduced by Cunningham
in 1992 [73] and the search was conducted in early 2019. The International Workshop on Managing
Technical Debt (MTD) and the International Conference on Technical Debt (TechDebt) were selected as
basic venues for the manual search since both venues specialize in TD. Beginning in 2010, the
MTD workshop series has been a specialized congregation of researchers and practitioners with the aim
of discussing TD and TDM. In 2017, the workshop graduated into the TechDebt conference to
accelerate progress on studying TD, and the first TechDebt conference was held in 2018.
The manual search and the application of the selection criteria, which will be defined in the following
section, resulted in a set of 10 studies that constitutes the quasi-gold standard. The completeness of the
search was evaluated by comparing the primary studies in the quasi-gold standard set against the set of
studies identified by the automated search in the same venues considered in the manual search. This com-
parison was carried out through the application of sensitivity and precision metrics [130, 240]. Sensitivity
is the ratio of the relevant studies found by the automated search to the set of relevant studies found by
the manual search. Precision is the ratio of the relevant studies found by the automated search to the total
number of studies retrieved by the automated search. The two metrics are calculated as follows:
Sensitivity = (number of relevant studies found by the automated search / number of relevant studies found by the manual search) × 100
Precision = (number of relevant studies found by the automated search / number of studies retrieved by the automated search) × 100
To select the search string, a subjective approach was followed [240]. The search string was derived
through trial experiments, and the candidate search string was evaluated against the quasi-gold standard.
The experiment began with the term “debt”. However, searching using the term “debt” produced
an overwhelming number of results, and the majority of these results reference “debt” in a financial or
economic context. Consequently, the search string was instead tailored to the term “technical debt”.
To evaluate the performance of the “technical debt” search string, an automated search was conducted
on the two venues that were used in the manual search: MTD and TechDebt. From the automated search,
a total number of 129 studies were retrieved. From these 129 studies, a total number of 10 relevant
studies were identified by applying the selection criteria, which will be described in the following section.
The sensitivity and precision of the search string were calculated. The “technical debt” search string
resulted in 100% sensitivity and 7.75% precision. Though an ideal search string has 100% precision and
100% sensitivity, obtaining absolute values is challenging due to a trade-off between the two metrics.
A highly sensitive search string will retrieve the majority of the relevant studies, but also may retrieve
many irrelevant studies. In contrast, a highly precise search string will guarantee the retrieval of a smaller
number of irrelevant studies, but it might miss a large portion of the relevant ones. In an SLR context,
a high sensitivity search string is more desirable than a high precision one [240]. Therefore, “technical
debt” was selected as the search string for this SLR since it achieved a perfect sensitivity (i.e., 100%) by
retrieving all the studies in the quasi-gold standard.
Following the recommendations of [130, 240], the resources elicited for the automated search are as follows:
ACM Digital Library, Google Scholar, IEEE Xplore, Inspec, ScienceDirect, Scopus, Springer, and Web
of Science. For Google Scholar, only the top 1,000 results were included. The automated search was
performed using the “technical debt” search string in the eight venues listed above. This resulted in a total
number of 2,613 studies.
4.2.3 Study selection
This section summarizes the selection criteria that were established to identify relevant studies. It also
describes the procedure that was followed to apply these criteria.
Selection criteria
The selection criteria were derived from previous secondary studies to ensure following the standards of the
TD research community [26, 29, 41, 42, 95, 141, 191]. The inclusion criteria were limited to one criterion:
A study must propose a TD prioritization approach. A study will be disqualified if it meets any of the
following exclusion criteria.
– Studies that are published in a language other than English.
– Studies that were inaccessible (i.e., the full text was not available).
– Non-paper submissions, such as keynotes, tutorials, posters, abstract-only, panels, presentations,
books, book reviews, non-peer-reviewed submissions, or entire volumes of proceedings (i.e., relevant indi-
vidual papers from volumes were included).
– Duplicates of an included study; the most comprehensive version reporting the study was se-
lected.
– Studies published before 1992 or after 2018.
– Studies that propose approaches for prioritization outside the context of TD.
– Studies that decide between repaying TD items or implementing new features.
– Studies that prioritize the refactoring steps to repay a given TD item.
– Studies that categorize TD items based on their types or relevancy to a specific software attribute.
– Studies that only quantify TD items and do not provide guidance for the prioritization process.
Selection procedure
The selection procedure is summarized in Figure 4.1. Omitting duplicates resulted in a set of 1,550 studies
that were independently reviewed by two reviewers (i.e., I conducted the review in collaboration with
a computer science master’s student). The review process involved reading the title and the abstract of
each study to determine its relevance to the subject matter. The reading step was followed by a joint
meeting between the two reviewers in which the results were discussed. Disagreements were resolved by
independently reading the entirety of studies under disagreement and, subsequently, holding a discussion
to compare the results. If a disagreement persisted, a conservative approach was followed by including the
study under disagreement. These steps resulted in a set of 27 studies that were fully read independently
by three reviewers (i.e., another computer science master’s student contributed to the review process).
Afterward, group discussions among the three reviewers were held to discuss the results. The discussions
resulted in the exclusion of seven studies due to a lack of well-defined TD prioritization approaches in such
studies. The remaining 20 studies were passed to the snowballing (i.e., citation analysis) step. Forward and
backward snowballing were conducted on the selected set of studies while following Wohlin’s guidelines
[230]:
– Backward snowballing: Analyzing the set of studies that were cited by a specific study to
identify new relevant studies [230].
– Forward snowballing: Analyzing the set of studies that cited a specific study to identify new
relevant studies [230].
The aforementioned selection procedure was repeated to determine the relevancy of studies during the
snowballing process. Two iterations of snowballing were conducted. The first iteration resulted in three
studies (i.e., two from forward snowballing and one from backward snowballing). Performing snowballing
on the studies that resulted from the first iteration of snowballing did not result in any new studies. Conse-
quently, the search was concluded with a total of 23 studies.
Figure 4.1: Selection procedure summary (2,613 retrieved studies → duplicate removal: 1,550 → selection criteria: 27 → full study reading: 20 → snowballing: 23 → data extraction and synthesis)
4.2.4 Data extraction and synthesis
This section describes the procedure that was followed to extract and validate the data from the selected,
primary studies. Additionally, it summarizes the strategy of synthesizing the extracted data through sum-
marizing, integrating, and comparing the findings [130].
The data extraction procedure involved the three reviewers independently reading the full studies and
tabulating in a spreadsheet how each study addresses each of the research questions. Data was validated
by conducting joint meetings to discuss the results. When comparing the reviewers’ answers for each of
the research questions, no significant conflicts arose among the answers. This was due to the fact that
most of the included studies thoroughly describe the TD prioritization approaches and explicitly mention
their TD types, dependencies, evaluation methods, and techniques. However, there was a discrepancy in
the level of abstraction when summarizing the approaches. The discrepancy was resolved by taking the
shortest summary of each approach and, subsequently, asking two computer science master's students to
read these summaries. If there were any ambiguities present in the amassed summaries, then the missing
information and clarifications were added as necessary. A thematic analysis was employed to identify,
analyze, and report patterns within the included studies, following Cruzes and Dybå's guidelines [72].
Specifically, an integrative approach was applied to categorize the findings. Both a start list of categories
(i.e., deductive approach) and the development of new categories along the way (i.e., inductive approach)
were utilized. The initial list of categories was derived from the study’s research questions. Additionally,
visual representations were used to illustrate and summarize the findings [170].
4.3 Results
This section presents and discusses the findings of this SLR. The results are based on the set of 23 studies
resulting from the previously mentioned search and selection steps. An overview of the included studies is
presented, and the research questions are addressed. Table 4.1 summarizes the 23 studies that satisfied the
selection criteria. Additionally, Table 4.2 presents the venues in which these studies were published.
Table 4.1: Selected studies and their respective reference numbers
ID Author (year) Reference
S1 Schmid (2013) [203]
S2 Guo & Seaman (2011) [109]
S3 Ribeiro et al. (2017) [190]
S4 Almeida et al. (2018) [76]
S5 Sae-Lim et al. (2018) [200]
S6 Snipes et al. (2012) [211]
S7 Plösch et al. (2018) [185]
S8 Guimaraes et al. (2018) [108]
S9 Guo et al. (2016) [111]
S10 Aldaeej & Seaman (2018) [24]
S11 Vidal et al. (2015) [227]
S12 Letouzey & Ilkiewicz (2012) [140]
S13 Choudhary & Singh (2016) [61]
S14 Mensah et al. (2018) [164]
S15 Tornhill (2018) [220]
S16 Zazworka et al. (2011) [237]
S17 Albarak & Bahsoon (2018) [23]
S18 Codabux & Williams (2016) [67]
S19 Akbarinasaji (2015) [19]
S20 Fontana et al. (2015) [99]
S21 Harun & Lichter (2015) [115]
S22 Abad & Ruhe (2015) [17]
S23 Seaman et al. (2012) [205]
Table 4.2: TD venues and their corresponding study numbers
Venues Studies
Asia-Pacific Software Engineering Conference (APSEC) S13
Empirical Software Engineering Journal S9
Euromicro Conference on Software Engineering and Advanced Applications (SEAA) S3
IEEE Software Journal S12
International ACM Sigsoft conference on the Quality of software architectures (QoSA) S1
International Conference of the Chilean Computer Science Society (SCCC) S11
International Conference on Software Engineering Advances (ICSEA) S21
International Conference on Software Maintenance and Evolution (ICSME) S4
International Conference on Technical Debt (TechDebt) S7, S10, S15, S17
International Doctoral Symposium on Empirical Software Engineering (IDoESE) S19
International Requirements Engineering Conference (RE) S22
International Workshop on Managing Technical Debt (MTD) S2, S6, S16, S18, S20, S23
Journal of Software: Evolution and Process (J. Softw.: Evol. Process) S5
Journal of Systems and Software (J. Syst. Softw.) S14
Software: Practice and Experience Journal S8
Figure 4.2 illustrates the distribution of the studies over the last few years. As the figure demonstrates,
the interest in TD prioritization prior to 2012 was minimal but has increased significantly in
recent years. The first study that proposed a TD prioritization approach was only published as recently as
2011. Additionally, upon inspection of the obtained studies, 34.78% of the identified studies were found to
be published in 2018. In regards to venues, MTD is the venue in which most of the studies that proposed
a TD prioritization approach were published, with a total of six studies. TechDebt came in second, with
four studies being published there.
Figure 4.2: Breakdown of studies based on publication year
RQ1: What are the current TD prioritization approaches?
In the context of this SLR, a TD prioritization approach refers to a systematic methodology that is based
on a prioritization technique with the intent of prioritizing a group of TD items for repayment revolving
around a set of predefined goals. A total of 24 TD prioritization approaches were identified in the 23
selected studies. This imbalance in the number of approaches identified and the number of selected studies
is due to study S23, which identifies four unique approaches. One of these approaches has been discussed
in more detail in study S9, which is a previous study by the authors. Another approach from these four
approaches was detailed in study S2, a more recent study by the authors. Therefore, these duplicate
approaches were only included once in the total count of TD prioritization approaches. For each study,
the proposed TD prioritization approach was identified, and the specific process of each TD prioritization
approach is summarized in Table 4.3.
Table 4.3: Summary of TD prioritization approaches
Study Process
S1 1. Identify an optimal system using expert knowledge and the evolution steps required to achieve the optimal system
2. Identify the TD items and the cost to repay each TD item
3. Assign probabilities to each evolution step for the likelihood of including each step in the current iteration
4. Calculate expected savings from addressing each TD item for the full evolution sequence
5. Subtract the expected savings from each TD item’s cost and sort the TD items from largest to smallest based on
the result
6. Repay a given TD item if the expected savings exceed the calculated cost
S2 1. Identify TD items and their respective principal, expected interest, interest standard deviation, and correlations
with other items
2. Select a component X, which will be significantly worked on in the upcoming release
3. Extract TD items that are associated with component X
4. Adjust estimates for the extracted TD items based on the current release plan
5. Set constraints for the selected portfolio model and a preferred risk level
6. Run the model to generate the optimal portfolio and use the results for prioritization
Study Process
S3 1. Identify TD items
2. Identify decision criteria
3. Assign weights to each criterion
4. Evaluate each TD item based on the decision criteria using the weighted sum model (WSM)
5. Sort TD items based on the evaluation score and prioritize based on the sorted list
S4 1. Identify TD items
2. Identify software and/or infrastructure configuration items that are related to each TD item
3. Identify and model business processes based on the identified configuration items
4. Identify and prioritize business process activities based on urgency and criticality by consulting business stake-
holders
5. Use a business perspective to prioritize TD items
S5 1. Obtain a list of change information from an issue tracking system
2. Perform impact analysis on the system to identify modules that will most likely be affected by the change infor-
mation, and calculate the impact probability score of each module
3. Identify TD items in the form of code smells
4. For each TD item, calculate its context relevance index (CRI) value by taking the summation of the impact proba-
bility scores of the modules, which match the given TD item
5. Prioritize TD items by their CRI value, with the larger values having higher priority
S6 1. Identify TD items as defects present in a given system
2. For each TD item, calculate its principal (P), interest cost of defining a workaround (Iw), interest cost for customer
support of the workaround (Ic), batch cost (Ip), probability of requesting a batch (Ipr), and probability of fixing
the defect (Ifr)
3. Compute the following cost-benefit ratio: P / (Iw + Ic + Ip × Ipr + P × Ifr)
4. Prioritize TD items based on the cost-benefit ratio defined above
S7 1. Identify design best practices for a given system
2. Identify TD as violations of the identified design best practices
3. Calculate a quality index for each violation by assessing each violation using a benchmarking approach
4. Assess the importance of each design best practice
5. Map the design best practices into a portfolio matrix using the quality index and the importance assessment
6. Create a quality design model to assign the design best practices to product parts
7. Assign remediation costs to each product part
8. Define an improvement strategy
9. Considering the improvement strategy, business needs, remediation cost, and best practices in the portfolio matrix,
prioritize design best practice violations
S8 Using JSpIRIT and expert knowledge:
1. Identify TD items in the form of code smells
2. For each TD item:
(a) Calculate its component criterion by determining the set of classes affected by the given TD item
(b) Count the number of components of blueprints affected by the number of classes and normalize the resulting
score values
(c) Calculate the concern criterion using expert knowledge to identify the relationship between a given TD item
and the architectural concern provided by the architectural blueprint
(d) Count the number of concerns the TD item affects and normalize the score
(e) Calculate the scenario criterion by taking the sum of the importance value of the identified related scenarios
and normalizing their values
(f) Apply the voting criterion by defining individual thresholds for each of the criteria provided, and if the nor-
malized score of a TD item is above the threshold, consider the TD item as potentially critical
(g) If a TD item is considered potentially critical by more than one criterion, then take the average of those criteria
3. Prioritize the TD items based on their voting criterion value
Study Process
S9 1. Identify TD items
2. Estimate cost, which is based on principal, and value, which is based on the multiplication of interest amount and
interest probability, for each TD item
3. Categorize each TD item based on its value estimate of high/medium/low
4. Select a specific software component and extract a related list of TD items
5. Reevaluate the original high/medium/low value estimates for extracted items
6. Restrict extracted list of TD items to one of only high value items
7. Compare each item's cost and benefit in the restricted list and eliminate items whose value does not outweigh
their cost
8. Add up the remaining items' estimated cost, and if the total cost cannot be absorbed in the current release, a
decision should be made to eliminate some of the TD items from the current repayment
S10 1. Identify TD items in the form of defect TD
2. Identify the state of the software by calculating its degree of maintenance, which is calculated by dividing the
accumulation of code churn of defect fixing activities by the initial software size
3. Study historical data to identify the major release that has the largest code churn
4. Divide historical data into two parts, historical data (before the major release) and testing data (after the major
release)
5. Use testing data to apply the Markov chain model to simulate decisions at each release planning (RP)
6. At each RP, use the testing data up to the given RP time to determine the software state
7. At each RP, apply the model to estimate defect TD, principal, and interest
8. Use the principal and interest as inputs for real options analysis (ROA) to prioritize defect TD
S11 1. Identify TD items
2. Use JSpIRIT’s predefined criteria and/or practitioner’s added criteria
3. Assign weights to each criterion
4. A score for each TD item is calculated using WSM, which can be altered by the practitioner
S12 1. Using expert knowledge, create a quality model by making a list of non-functional requirements that define the
“right” code
2. Using expert knowledge, develop an estimation model that estimates the remediation cost to refactor non-compliant
code
3. Map each requirement to a software quality assessment based on lifecycle expectations (SQALE) quality charac-
teristic that would be affected in the case of a requirement violation
4. Follow the SQALE indicator pyramid to prioritize TD repayment from the bottom to the top
S13 1. Analyze historical data to identify frequently refactored classes and categorize such classes as refactoring prone
classes
2. From the identified refactoring prone classes, select classes containing code smells that directly relate to architec-
tural problems and categorize them as TD items
3. For each TD class, calculate its class score using the frequency score of the class (F), the severity score of the
class (S), the severity score of a particular code smell S(x_i), and the number of instances a code smell is present
in a given class I(x_i), as follows: ClassScore = F × S, where S = Σ(S(x_i) × I(x_i))
4. Prioritize TD classes based on the class score in decreasing order
S14 1. Identify self-admitted technical debt (SATD) items
2. Categorize each SATD item as a major or minor task based on urgency, seriousness, and significance
3. Analyze the gap between a given task’s significance and complexity
4. Categorize the tasks as expected or expedited tasks
5. Categorize tasks as vital few or trivial many
6. Identify plausible causes of key (i.e., buggy prone) SATD item tasks and calculate the rework effort for each task
7. Use the rework effort estimation metric as a guide in the prioritization process
S15 Using CodeScene:
1. Identify TD items as refactoring candidate files
2. Using a machine learning algorithm based on technical and social factors, prioritize and visualize the candidate
files
3. Using complexity trend analysis, determine whether the candidate files will continue to degrade in code quality
4. Using X-Ray analysis, prioritize individual functions inside the candidate files
Study Process
S16 1. Identify TD items in the form of god classes
2. Calculate the WMC, ATC, and ATFD metrics for each god class and thresholds, as proposed by Marinescu [153]
3. For each class and each metric, assign a rank (the closer a class is to a threshold, the lower its rank)
4. Calculate the sums of the three given ranks (i.e., one for each metric) for each god class
5. Rank the god classes based on the calculated total sum, which indicates the cost of repaying each god class
6. Calculate the change likelihood and the defect likelihood for each god class
7. For each god class and each likelihood, assign a separate rank
8. For each god class, calculate the sums of the two obtained ranks
9. Create a separate ranking for the god classes based on the calculated total sums of the likelihoods, which indicates
the benefits of repaying each god class
10. Compute a final ranking for each god class by computing a profitability measure by subtracting the cost from the
value
S17 1. Identify TD tables (i.e., tables below the fourth normal form)
2. Using a database monitoring system, determine the growth rate of TD tables
3. For TD tables with high growth rate, calculate their I/O cost
4. For tables with a high growth rate, calculate their values of the portfolio model variables (expected return =
1 / (I/O cost) and risk = table growth rate)
5. Run the model on the data to produce the optimal portfolio of the TD tables
S18 1. Identify TD items in the form of classes that are defect and change prone
2. For each identified class, extract class level metrics for defect and change proneness
3. Using a Bayesian based prediction model, determine the TD proneness of each class
4. Using a predefined classification scheme, categorize classes based on the TD proneness probability into high,
medium, or low
5. Apply analytic hierarchy process (AHP) to generate a prioritized list of defect and change prone classes based on
the prediction model’s result and a set of predefined prioritization criteria
S19 1. Identify TD in the form of defects
2. Identify each TD principal as the difference between the defect reporting time and the resolving time
3. For each TD item, calculate its interest amount as (real fixing time − principal) × severity
4. Formalize the TD problem as a reinforcement learning problem as follows:
(a) The goal is to eliminate all TD items or maximize the amount of saved interest
(b) The agents are the developers
(c) The states are points in time in the total time remaining before the upcoming release
(d) The set of actions is whether to repay TD (x) in the upcoming release or not
(e) The reward equals the amount of saved interest resulting from paying TD (x)
5. Apply reinforcement learning, where the output policy is the prioritization of TD items
S20 1. Set a threshold on chosen metrics for detecting TD items in the form of code smells and identify TD items
2. Assign an Intensity Value (IV) to each metric used in the TD detection strategy
3. Calculate the Intensity Index (II) of each TD item by taking the average of each metric’s IV
4. Calculate the exceeding ratio for each metric by dividing the metric value by the threshold value for each metric
5. Sum the ratios of each metric and round the value to obtain an index representing the II
6. Prioritize based on the II and refer to the exceeding ratio-based index if ties occur as an additional criterion
S21 1. Identify TD items
2. Calculate the impact, defect, and change likelihood for each TD item
3. Calculate total benefit by multiplying the raw benefit (i.e., saved future effort) with the characteristics mentioned
above
4. Calculate return on investment (ROI) of each TD item by dividing benefit by cost and prioritize TD items based on
ROI
S22 1. Identify TD items in the form of requirement TD
2. Formulate the problem as a binomial valuation model as follows:
(a) For each TD item, assign it an initial value in terms of its product value in product-driven software develop-
ment or its market payoff in market-driven software development
(b) Calculate the potential increase and decrease of the initial value to form the upward and downward branches
(c) For every terminal node, calculate its net value
(d) Recursively fold back the tree to calculate the present value of each node
(e) Calculate the net present value of each node by subtracting the given option’s exercise cost from the option’s
present value
3. Prioritize the TD items based on the output of the model
Study Process
S23:
S23.1 1. Identify TD items
2. Identify the criteria related to each TD item (e.g., principal and interest)
3. Apply AHP by:
(a) Assigning weights and scales to each criterion based on a series of pairwise comparisons
(b) Performing a series of pairwise comparisons between the alternatives against the various criteria
S23.2 1. Identify TD items and represent them as investment decisions
2. Identify the short term cost of each decision
3. Identify uncertain long term benefits of each decision
4. Apply options theory to determine the optimal timing of paying off each TD item
RQ1.1: Which types of TD are addressed by the TD prioritization approaches?
The identified TD prioritization approaches vary in terms of the TD types they address. Of the identi-
fied approaches, 70.83% address a specific type of TD, and 29.17% are general, which means they are
applicable to any type of TD. The various types of TD are defined in Table 4.4.
Table 4.4: TD type definitions
Type Definition
Architectural TD TD identified through the means of discovering issues in a given system’s architecture [26]
Code TD TD identified by revealing issues in the source code that might affect its legibility and main-
tainability [26, 99]
Database normalization TD TD identified in a database design through discovering violations of the optimal database
design best practices [22]
Defect TD TD associated with defects that are reported on bug tracking systems or revealed through
testing activities [26, 204]
Design TD TD identified through analyzing the source code and identifying violations of the best design
principles [26, 30]
Requirement TD TD identified during the requirement engineering process due to a trade-off in deciding which
requirements to implement or how requirements should be implemented [17, 26]
Self-admitted TD TD associated with incomplete or temporary fixes, which are intentionally committed and
admitted during software development [164, 186]
As Table 4.5 summarizes and Figure 4.3 displays, 33.33% of the approaches address the prioritization
of code TD. Moreover, 16.67% of the approaches can be applied to design TD, and 12.5% address defect
TD. The remaining four TD types, which are SATD, database normalization TD, requirement TD, and ar-
chitectural TD, were each respectively only found to have been addressed by one approach. The summary
above may be helpful for practitioners when deciding which TD prioritization approach to use based on
the TD type at hand. Additionally, the software engineering community can use this summary to assess
the applicability of one of the approaches to a different type of TD when developing new TD prioritization
approaches.
Figure 4.3: Breakdown of approaches based on TD type addressed
RQ1.2: What decision factor categories are the TD prioritization approaches based
on?
There is a multitude of decision factors to consider when deciding whether to repay a TD item or delay it
to a next release. Ribeiro et al. [191] identified 14 unique decision factors, which include TD’s severity,
TD’s impact on a customer, TD’s cost to repay, etc., that influence the decision of TD repayment. In this
study, these decision factors were classified into three “decision factor categories,” which are: value, cost,
and a resource constraint. A TD prioritization approach may utilize one of the defined decision factor
categories or a combination of them. Table 4.5 displays which decision factor categories are considered
in each approach, where filled circles represent the categories considered in a given study. The review of
the literature revealed that all the identified TD prioritization approaches consider value. However, only
70.83% of the approaches consider cost as an essential category in the prioritization process, and only a
mere 16.67% of the approaches consider a resource constraint in the prioritization process.
As Figure 4.4 illustrates, 29.17% of the approaches rely solely on value, 54.17% of the approaches
employ a combination of value and cost, and only 16.67% of the approaches utilize a combination of value,
cost, and a resource constraint. The level of guidelines regarding value and cost estimations varied among
the approaches. Some of the approaches provided exact methods for estimating value and cost, while others
suggested several methods that could be chosen at the practitioner's discretion. In contrast, some
approaches did not provide any methods or suggestions and assumed that these estimates are precalculated.
The lack of concise methods to estimate value and cost of TD items suggests that the value and cost of a TD
item are context-dependent, which was also noted in several previous studies [26,29,42,95,141,191,207].
Table 4.5 summarizes the estimation methods employed by each approach, where suggestions are denoted
in parentheses.
Figure 4.4: Breakdown of approaches based on decision factor categories utilized
As illustrated in Figure 4.5, the majority of the approaches (i.e., 70.83%) provided value estimation
guidelines. A few of the TD prioritization approaches (i.e., 12.5 %) did not specify a value estimation
method but suggested several methods that a practitioner can utilize. Four (i.e., 16.67 %) of the approaches
did not provide a specific value estimation method nor any suggestions. It should be noted that study S18
partially estimated the value and suggested other complementary estimation methods. Hence, it was only
included in the set of studies that provided suggestions.
As Figure 4.6 presents, of the approaches that specified exact value estimation methods, 35.29% rely
on calculation models, 23.53% utilize prediction models, and 17.65% depend on expert knowledge to
estimate value. Additionally, 23.53% of the approaches utilize a combination of calculation models and
expert knowledge in their value estimation.
Figure 4.5: Breakdown of approaches based on TD value estimation
Figure 4.6: Breakdown of approaches based on TD value estimation method
As demonstrated in Figure 4.7, of the approaches that considered cost as a decision factor category,
47.06% specified how to estimate cost, and 17.65% did not specify which method to utilize when
estimating cost, providing only some suggestions. In contrast, 35.29% of these approaches did not
provide any guidelines regarding cost estimation.
Of the approaches that specified how to estimate cost, the most applied cost estimation methods are as
follows: expert knowledge (37.5 %), a combination of calculation models and expert knowledge (25.0 %),
calculation models (12.5 %), prediction models (12.5 %), and a combination of calculation and prediction
models (12.5 %), as displayed in Figure 4.8.
Figure 4.7: Breakdown of approaches based on TD cost estimation
Figure 4.8: Breakdown of approaches based on TD cost estimation method
Table 4.5: TD types addressed, decision factor categories considered, value estimation methods, and
cost estimation methods for each approach
Study Type Value Cost Resource constraint Value estimation (suggestions) Cost estimation (suggestions)
S1 General Not specified Not specified
S2 General Not specified (expert knowledge,
historical data, or program analy-
sis)
Not specified (expert knowledge,
historical data, or program analy-
sis)
S3 General Expert knowledge Expert knowledge
S4 General Expert knowledge N/A
S5 Code TD Calculation model based on im-
pact analysis using information re-
trieval, mining software repository,
or dynamic analysis
N/A
S6 Defect TD Expert knowledge Expert knowledge
S7 Design TD A combination of calculation
model and expert knowledge
Expert knowledge
S8 Code TD Voting criterion based on compo-
nent, concern, and scenario criteria
calculated using expert knowledge
and JSpIRIT
N/A
S9 General Not specified (any value estimation
method, such as expert knowledge)
Not specified (any cost estima-
tion method, such as cost estima-
tion model or expert knowledge)
S10 Defect TD Prediction model using historical
data
Prediction model using historical
data
S11 Code TD JSpIRIT predefined criteria or
practitioner’s criteria
Not specified
S12 Code TD Predefined quality characteristic
based on expert knowledge
Calculation model based on ex-
pert knowledge
S13 Architectural TD Calculation model based on histor-
ical data, architecture design, and
severity of the class
N/A
S14 Self-admitted TD Prediction model Calculation model based on the
prediction model result
S15 Code TD Prediction model based on techni-
cal and social factors
N/A
S16 Design TD Calculation model based on code
repository and issue tracking sys-
tem data
Calculation model based on the
comparison of the calculated
metrics to their respective thresh-
olds
S17 Database normalization TD Calculation model based on risk
of data inconsistency and expected
return from decreasing I/O cost
N/A
S18 Code TD and design TD Prediction model result in addition
to suggesting other potential cri-
teria (expert knowledge, historical
data, or dependency analysis)
Not specified (dependency analy-
sis, expert knowledge, or histori-
cal data)
S19 Defect TD Not specified Not specified
S20 Code TD Calculation model based on prede-
fined software metrics
N/A
S21 Code TD and design TD Calculation model based on
change likelihood using historical
data, defect likelihood using his-
torical data, expert knowledge, and
impact using dependency analysis
Calculation model using Sonar-
Qube, which is based on expert
knowledge
S22 Requirement TD Historical data Not specified
S23:
S23.1 General Not specified Not specified
S23.2 General Not specified Not specified
RQ1.3: What software artifacts do the TD prioritization approaches depend on?
Software artifacts provide a wealth of information that can aid in TD prioritization [108,200]. This research
question aims to identify the software artifacts that are essential to each TD prioritization approach. Infor-
mation on software artifact dependencies can help prospective users identify which approaches are suitable
for their individual needs and available resources. In this SLR context, a software artifact is considered
indispensable to an approach if the approach will not work without the presence of the given software
artifact. For example, assume that a study’s TD prioritization approach requires each TD item’s value
and that the study suggested using an issue tracking system or a static analysis tool to estimate the value,
as opposed to specifying a single method to obtain the value of each TD item. In this case, it is concluded
that the study’s TD prioritization approach does not have any software artifact dependencies, as any value
estimation method will suffice. However, assume now that a given study’s approach requires each TD
item’s cost and that the study explicitly specifies that historical releases data is used to compute the cost of
each TD item. In that case, historical releases data would be considered an essential software artifact that
the approach requires in order to prioritize TD items.
Table 4.7 summarizes the software artifact dependencies of each TD prioritization approach. Many
of the approaches (i.e., 66.67%) depend on at least one type of software artifact, with some relying on
multiple artifacts. As Figure 4.9 illustrates, 58.33% of the 24 approaches rely on a system’s source code
to prioritize TD. Issue tracking systems are the second most common software artifact dependency, with
25% of the total TD prioritization approaches requiring it to work as intended. Additionally, 12.5% of
the approaches make use of historical releases data. Code repository and multiple releases data are core
requirements in 12.5% and 8.33% of the approaches, respectively. Furthermore, one approach requires the
use of architectural blueprints, and another approach requires a database monitoring system.
Figure 4.9: Breakdown of software artifacts based on utilization of approaches
It is important to note that some code repositories are capable of tracking and recording a portion of
a given system's historical releases data, such as the number of lines of code (LOC) changed by each
software practitioner. However, historical releases data encapsulates a wider range of data, such as release
plans and teams. Therefore, these two software artifacts are considered as separate entities.
RQ1.4: What type of human involvement is required in the TD prioritization approaches?
Human involvement can provide invaluable insights in the process of TD prioritization [108, 111, 200].
Human involvement may be essential for adding necessary information that can be overlooked
or impossible to measure automatically, such as understanding the business value of a TD item [111].
However, human involvement can also be burdensome, tedious, and unnecessary in some situations, such
as identifying the dependencies of a TD item [108, 200]. As the number of TD items increases, the effort
required to prioritize these items increases as well. Using a prioritization approach should not take longer
than the time required to repay TD items. Human involvement should be minimized as much as possible
to lower the workload on the associated software practitioners.
A few of the TD prioritization approaches identified in this SLR developed automated means to obtain
TD item information and aid in the prioritization process. However, other approaches demand human
involvement to determine TD item priorities. As Table 4.7 demonstrates, the approaches were categorized
based on the level of human involvement as follows:
– None: The approach is fully-automated and does not require any human intervention.
– Minor: The approach is semi-automated and only requires human involvement during the setup phase,
such as defining thresholds and assigning weights to various criteria. Once the setup is complete, human
involvement is no longer required, and the prioritization of TD can occur without human involvement.
– Major: The approach depends on a human’s continual presence to carry out the prioritization process.
If a human is absent, the approach will not work as intended.
As Table 4.7 summarizes and Figure 4.10 demonstrates, the largest share of the approaches (i.e., 41.67%)
are fully automated. Minor human involvement is required by 29.17% of the approaches. Similarly, 29.17%
of the approaches require continual human participation during the prioritization process.
Figure 4.10: Breakdown of approaches based on required human involvement level (none: 41.67%; minor: 29.17%; major: 29.17%)
RQ1.5: How are the current TD prioritization approaches evaluated?
Evaluation is the process of measuring the effectiveness or performance of an approach in achieving its
goals [64]. TD prioritization approaches might be challenging to evaluate objectively due to the difficulty
of replicating settings and comparing the effectiveness of one proposed approach to
another [26,95,203]. Nonetheless, each identified evaluation method and its respective definition
are presented in Table 4.6. Moreover, Table 4.7 displays the evaluation method of each approach and the
evaluation method's respective setting, in which said evaluation was conducted, in parentheses. As Table
4.7 illustrates, 62.5% of the identified TD prioritization approaches were evaluated.
Table 4.6: Evaluation method definitions
Evaluation method Definition
Case study An intensive study that involves a single unit and aims to generalize it across a larger set of units [105]
Experiment A study where conditions are under direct control of the investigator to test the effectiveness of an approach [172, 209]
Feasibility study A small study to assess the practicality of a proposed approach. It aims to objectively evaluate an approach by revealing its
weaknesses, strengths, and required resources [32]
Figure 4.11 demonstrates that 37.5% of the TD prioritization approaches were evaluated using a case
study. Additionally, 20.83% of the approaches were evaluated through the use of an experiment, and only
one approach utilized a feasibility study to assess its applicability. The evaluations were conducted in
various settings. Of the evaluated approaches, 46.67% were evaluated in industry settings, another 46.67%
were evaluated using open-source software (OSS) systems, and one approach was evaluated in an academic setting.
Based on these results, it can be concluded that some approaches were not evaluated by any means and
other approaches were not evaluated in industry settings. This lack of industry evaluations harms the
reliability of the unevaluated approaches, as there is no evidence of their respective performance in
industry settings.
Figure 4.11: Breakdown of approaches based on evaluation method (case study: 37.50%; experiment: 20.83%; feasibility study: 4.17%; none: 37.50%)
Table 4.7: Software artifact dependencies, human involvement level, and evaluation method for each
approach
Study Required software artifacts Human involvement Evaluation (setting)
S1 None Minor None
S2 None None None
S3 None Major Case study (academic)
S4 None Major Case study (industry)
S5 Issue tracking system and source code Major Experiment (industry)
S6 Source code Major Case study (industry)
S7 Source code Minor Case study (open-source)
S8 Architectural blueprints and source code Major Experiment (open-source)
S9 None Major Case study (industry)
S10 Historical releases data, issue tracking system, multiple releases data,
and source code
None None
S11 Source code Minor Case study (industry)
S12 Source code Minor Case study (industry)
S13 Historical releases data, multiple releases data, and source code None Experiment (open-source)
S14 Source code Minor Experiment (open-source)
S15 Historical releases data and source code None Case study (open-source)
S16 Code repository, issue tracking system, and source code None Feasibility study (industry)
S17 Database monitoring system None Case study (open-source)
S18 Code repository, issue tracking system, and source code Minor None
S19 Issue tracking system None None
S20 Source code None Experiment (open-source)
S21 Code repository, issue tracking system, and source code Major None
S22 None None None
S23:
S23.1 None Minor None
S23.2 None None None
RQ2: What prioritization techniques are utilized by the identified TD prioritization
approaches?
Each TD prioritization approach, while unique, is based on a core concept that is defined as a prioritization
technique. A prioritization technique is an overarching strategy with the aim of assigning values to distinct
prioritization objects that allow the establishment of a relative order between said objects in the set [231].
In this SLR, the objects are the TD items needed to be prioritized. This research question aims to identify
and define prioritization techniques, so that they can be utilized and built upon by the software engineer-
ing community during TD prioritization. When designing a new TD prioritization approach, individuals
should begin by selecting which prioritization technique best fits their situation. Once a prioritization
technique has been selected, a TD prioritization approach should be designed around the core ideals of the
given technique.
By examining the resulting literature pool, 10 unique prioritization techniques were identified. Table
4.8 displays the identified prioritization techniques, their respective definitions, limitations, and studies
that incorporated them. Figure 4.12 depicts the number of prioritization approaches that are based on
each prioritization technique. The utilized prioritization techniques, which are sorted based on their fre-
quency of utilization, are as follows: cost-benefit analysis (CBA), ranking, predictive analytics, real op-
tions analysis (ROA), analytic hierarchy process (AHP), modern portfolio theory (MPT), weighted sum
model (WSM), business process management (BPM), reinforcement learning, and software quality as-
sessment based on lifecycle expectations (SQALE). CBA weighs a given TD item’s benefit versus the
item's cost and prioritizes based on the resulting ratio [111,221]. Ranking refers to approaches in which
a calculation model is developed to generate a score that serves as the basis for TD item sorting and
prioritization [18, 61]. Predictive analytics uses data mining, predictive modeling, statistical, and machine
learning techniques to analyze current and historical TD data which are then used to identify patterns that
aid in prioritizing TD items [67, 133]. ROA is employed by calculating the value and risk of paying off
a TD item at certain points in a given time frame and prioritizing based on when it is most beneficial for
the user to pay off the TD item [25, 176, 205]. AHP calculates a score for each TD item by computing a
pairwise comparison matrix, which is based on predefined criteria and their respective weights [196, 205].
MPT is utilized by assembling a portfolio of TD items, which maximizes the expected return of paying off
a given item at a provided level of risk [189,205]. WSM is utilized by assigning weights to selected criteria,
summing a TD item’s weight values, and prioritizing TD items based on these summed values [225, 227].
BPM takes advantage of the business side of TD prioritization, where technical and business stakehold-
ers collaboratively work to prioritize TD items [76]. Reinforcement learning develops an optimal policy
that dictates TD repayment actions [19, 124]. The SQALE technique is based on predefined, modifiable
measures and rules to aid in the TD prioritization process [139]. It should be noted that, with the exception
of SQALE, the techniques are not limited to TD and have a wide variety of applications.
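To make the two most frequently utilized techniques more concrete, the following minimal sketch (in Python) scores a handful of TD items with CBA and WSM. The item names, benefit and cost figures, criteria, and weights are purely illustrative assumptions and are not drawn from any of the surveyed studies.

# Illustrative sketch only: hypothetical TD items as (name, estimated benefit, repayment cost in hours).
td_items = [
    ("duplicated-logic", 8.0, 2.0),
    ("god-class", 9.0, 12.0),
    ("missing-tests", 5.0, 4.0),
]

# Cost-benefit analysis (CBA): rank items by their benefit-to-cost ratio.
cba_ranking = sorted(td_items, key=lambda item: item[1] / item[2], reverse=True)

# Weighted sum model (WSM): score each item as a weighted sum of normalized criteria.
weights = {"benefit": 0.7, "urgency": 0.3}  # hypothetical criterion weights
urgency = {"duplicated-logic": 0.4, "god-class": 0.9, "missing-tests": 0.6}

def wsm_score(item):
    name, benefit, _cost = item
    return weights["benefit"] * (benefit / 10.0) + weights["urgency"] * urgency[name]

wsm_ranking = sorted(td_items, key=wsm_score, reverse=True)

print([name for name, *_ in cba_ranking])  # ['duplicated-logic', 'missing-tests', 'god-class']
print([name for name, *_ in wsm_ranking])  # ['god-class', 'duplicated-logic', 'missing-tests']

As the example output suggests, CBA favors items with a high benefit-to-cost ratio regardless of their absolute benefit, whereas WSM orders items purely by their weighted criterion scores, so the two techniques can produce different priorities for the same set of TD items.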
Figure 4.12: Breakdown of TD prioritization approaches based on prioritization techniques utilized
RQ2.1: What are the limitations of the identified prioritization techniques?
Each prioritization technique has its drawbacks, and being aware of them can be beneficial for
practitioners and researchers alike. When deciding whether to utilize a given technique, one must
weigh its drawbacks to avoid unintended consequences. Once such drawbacks
have been considered, users can decide whether to continue with the technique or switch to a prioritization
technique more suited to their scenario. This research question aims to provide more background on
the various existing prioritization techniques. The limitations of the existing prioritization techniques are
provided in Table 4.8. It is important to note that the identified limitations are not comprehensive and
there may exist other limitations for the identified prioritization techniques that were not included, as the
list is based solely on the acquired literature pool. The identified limitations have been further categorized
into groups, which can be observed in Table 4.8 in parentheses. The categorization groups help identify
which type of limitation is associated with which prioritization technique. In summary, these categories are
defined as follows:
– Applicability: A few of the prioritization techniques revolve around the use of a few core
concepts, which might be difficult to translate into a software engineering context. For example, ROA
is conducted through the use of various measures and metrics, which might not all be viable
for use with TD, such as liquidity and tradability [214].
– Communication among stakeholders: A few prioritization techniques are dependent on the
ability of all relevant stakeholders to understand the importance and impact of each TD item.
Without the input of relevant stakeholders, or if such stakeholders are uninformed, the given
technique is bound to output inaccurate results [18]. A communication among stakeholders
limitation would be encountered when utilizing the BPM prioritization technique. BPM requires
the technical stakeholders and the business stakeholders to jointly evaluate TD. If the
various stakeholders are unable to communicate or meet with each other, BPM will be unusable
[76].
– Computational complexity: The cost of applying a technique to prioritize a set of TD items,
as measured by the number of operations required, the amount of memory used, and the amount of
time required [18]. One example of a technique with a high computational complexity would be
reinforcement learning. Due to the innate nature of reinforcement learning, results have a degree
of variance depending on how long the agent is run [124].
– Error proneness: A few existing prioritization techniques are prone to errors. When using this
type of technique, small changes in variables or the accuracy of estimates can substantially alter
the results achieved [18]. Due to this restriction, one must be meticulous when employing a
prioritization technique that depends on estimates, such as CBA, which relies on the accuracy
of the value and cost estimates [173].
– Loss of information: An ordered list of TD items is occasionally displayed as the result of
a few prioritization techniques, such as ranking. However, while effective at displaying the
prioritization results, this ordered list might not convey the individual impact of each TD item.
It can be difficult to view an ordered list and fully understand the scope of each TD item. This
potential loss of information includes the cost of each TD item, the impact of each item on a
given system, and more [18]. An example of this limitation would be the use of ranking as a
prioritization technique. The resulting ranked list only displays the relative importance of each
TD item and leaves out all other information [188].
– Rank updates: A rank update limitation arises when a prioritization technique's results
are subject to change if an item is added or removed. With the inclusion or exclusion
of an item, the selected prioritization technique must be rerun to generate a new list of rankings.
Furthermore, the entire prioritized list of TD items is subject to change, leading to inaccurate
results and frustration [18]. MPT can have a rank update limitation, as the model must identify
the relationships a new TD item has with all of the preexisting items, which can result in a
completely new portfolio [189].
– Scalability: Describes the capacity of a technique to handle large amounts of TD. Techniques
that become unmanageable and require plenty of resources, such as effort, when dealing with
large amounts of TD are considered unscalable [18]. An example of such a technique would be
AHP, where TD items are prioritized based on n(n-1)/2 pairwise comparisons. AHP is unscalable
when dealing with many TD items or many criteria, as the effort and time required to compute
all the pairwise comparisons become unrealistic [125], as the sketch following this list illustrates.
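The following minimal Python sketch, offered here only as an illustration, shows the standard normalized-column approximation of AHP priority weights for a hypothetical three-criterion comparison matrix, together with the quadratic growth in the number of pairwise judgments that underlies the scalability concern; none of the numbers originate from the surveyed studies.

def pairwise_comparisons(n: int) -> int:
    # Number of pairwise judgments required for n items (per criterion).
    return n * (n - 1) // 2

print(pairwise_comparisons(10))   # 45
print(pairwise_comparisons(500))  # 124750, which quickly becomes impractical to elicit from experts

# Hypothetical reciprocal comparison matrix (Saaty-style judgments) for three criteria.
matrix = [
    [1.0, 3.0, 5.0],
    [1 / 3, 1.0, 2.0],
    [1 / 5, 1 / 2, 1.0],
]

# Approximate the AHP priority vector: normalize each column, then average across each row.
n = len(matrix)
col_sums = [sum(row[j] for row in matrix) for j in range(n)]
weights = [sum(matrix[i][j] / col_sums[j] for j in range(n)) / n for i in range(n)]
print([round(w, 3) for w in weights])  # approximately [0.648, 0.230, 0.122]

In a full AHP-based TD prioritization, a comparison matrix of this kind would be needed for the criteria and for the TD items under each criterion, which is why the number of judgments grows quickly with the size of the TD item pool.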
Table 4.8: TD prioritization techniques, limitations, and corresponding studies
Technique Description Limitations Studies
Analytic hier-
archy process
(AHP)
Considers multiple criteria and gen-
erates weights for each criterion
based on pairwise comparisons. A
score is assigned based on the pair-
wise comparisons to each criterion
and TD item. The scores are then
used to calculate the relative impor-
tance of each TD item [196, 205]
(Error proneness & scalability) Challenge in identifying ideal criteria and
weights, and the quality of the results depends on these weights [198,225]
(Computational complexity & scalability) Time and effort consuming to
complete pairwise comparisons in large hierarchies [178, 225]
(Rank updates) Difficulty in adding a new TD item, as all the analysis
must be redone [225]
(Rank updates) Rank reversal may occur after the addition of a new alter-
native [135]
S18,
S23.1
Business pro-
cess manage-
ment (BPM)
Utilizes BPM in TD prioritization
by conducting focus groups and
various interviews with technical
and business stakeholders in order
to account for the business perspec-
tive in TD prioritization [76]
(Communication among stakeholders) Depends on business and technical
expert knowledge [76]
(Communication among stakeholders) Lack of a mutual understanding
between technical and business stakeholders can lead to issues that may
influence and misdiagnose the level of urgency and criticality of a given
TD item [76]
(Scalability) Can be time consuming for large amounts of TD [76]
S4
Cost-benefit
analy-
sis (CBA)
Assigns a numerical value to the
cost and benefit of TD items to pri-
oritize them by weighing their po-
tential benefits against costs [111,
221]
(Error proneness) Difficulty in obtaining accurate estimates for benefit
and cost, especially for qualitative benefits, and the quality of the results
is highly dependent on the quality of value and cost estimates [48, 173]
S1, S6,
S9, S16 ,
S21
Modern
portfolio
theory (MPT)
Employs the use of financial MPT
techniques to prioritize TD by as-
sembling a portfolio of assets that
maximizes the expected return for a
given level of risk [189, 205]
(Computational complexity & rank updates) The approach depends on the
risk measures, return measures, and correlations, which might be chal-
lenging and complicated to measure for large amounts of TD [189]
(Error proneness) Requires past historical data to accurately assess return
and risk, which can lead to problems if new circumstances occur that were
not present in the past [147, 189]
(Applicability) Might not be beneficial for a small number of investments,
and the assumption that an investor is risk-averse might not hold in all the
situations [189]
S2, S17
Predictive ana-
lytics
Uses data mining, predictive mod-
eling, statistical, and machine learn-
ing techniques to analyze current
and historical TD data, which are
then used to identify patterns that
aid in prioritizing TD items [67,
133]
(Error proneness) Requires significant human effort to build training data
set and finding categorization patterns. The training data set is used to
train and build the model, and the validity of obtained results is highly
dependent on the quality of the data [46, 165]
(Applicability) The models and categorization patterns might be software
system or team dependent [46, 165]
S14, S15,
S18
Ranking Develops a calculation model to
generate a value that aids in TD
prioritization through sorting or
threshold-based approaches [18,61]
(Loss of information) The final rank may not display the relative differ-
ence between ranked TD items [188]
(Rank updates) Adding a new TD item may require a redo of the analysis
since these calculation models are generally based on the TD set [108,
200]
S5, S7,
S8, S13,
S20
Real op-
tions analy-
sis (ROA)
Employs financial ROA techniques
to value and prioritize TD items
while considering the possibility to
adjust them in the future [25, 176,
205]
(Error proneness) Expected results are subject to error based on the accu-
racy of the calculation of value, cost, and other option criteria [51]
(Applicability) Some real options’ assumptions (e.g., liquidity) are not
applicable in the software engineering field [214]
S10, S22,
S23.2
Reinforcement
learning
Develops an optimal policy that
determines TD repayment actions,
which aim to maximize a reward
[19, 124]
(Error proneness & computational complexity) Depending on how long
the agent is run, the results may vary due to the innate nature of rein-
forcement learning when it comes to exploitation versus exploration trade-
off [124]
(Scalability) Many reinforcement learning techniques struggle to properly
scale and accurately solve large problems [124]
(Rank updates) Requires rerunning the model when a new TD item is
added to the preexisting pool of TD items [19]
S19
Software qual-
ity assessment
based on life-
cycle expecta-
tions (SQALE)
Uses predefined, modifiable rules
and measures to identify, quantify,
and categorize TD using several in-
dicators that can aid in TD prioriti-
zation [139]
(Error proneness) Challenges arise in identifying “right code” and accu-
rately mapping each quality requirement into one of the SQALE quality
model characteristics [139]
(Loss of information) SQALE fails to consider the relative importance of
non-conformities for operations or business [139]
S12
Weighted sum
model (WSM)
A multi-criteria decision analysis
method for determining the impor-
tance of a TD item by summing the
weights of importance of the de-
fined criteria [225, 227]
(Error proneness) Criteria need to be additive, and difficulty arises when
dealing with criteria that have different units of measurement [222]
(Error proneness) Difficulty in determining the ideal weight for each cri-
terion, and the results are highly sensitive to these weights [98, 222]
S3, S11
4.4 Threats to Validity
The validity of this SLR is subject to construct, internal, conclusion, and external threats. In this section,
each threat and its respective mitigation strategy are discussed by following the guidelines of Kitchenham
et al. [130].
Construct validity
Construct validity is concerned with the design of a study. Primarily, construct validity focuses on the
adequacy of a study’s design to address its research questions [130]. This study aims to summarize the
current TD prioritization approaches and techniques by performing an SLR. Potential threats to construct
validity primarily encompass the search strategy and study selection. To mitigate construct threats related
to the search string, pilot trials were conducted to select an appropriate search string. The search string that
returned the largest number of relevant results and that was adopted by previous TD secondary studies [29, 41, 95]
was selected. Additionally, a quasi-gold standard was applied to assess the completeness of the search
strategy and the quality of the search string. To mitigate threats related to the study selection strategy, the
strategy was based on the software engineering systematic review guidelines [130] and previous secondary
studies on TD [26,29,41,42,95,141,191], and the inclusion and exclusion criteria were derived from these
secondary studies.
Internal validity
Internal validity is concerned with the conduct of a study. In the context of secondary studies, potential
threats to a study’s internal validity include weakness in one’s data synthesis and extraction methods [130].
To mitigate internal validity threats during the study selection process, the guidelines of Kitchenham et al.
[130] were followed to construct the search strategy. Two reviewers independently decided which studies
should be included by reading the title and abstract of each retrieved study and applying the predetermined
selection criteria to each study. The reviewers followed a conservative strategy, meaning that discrepancies
between the reviewers’ analysis were resolved by the reviewers reading the entirety of each study under
disagreement and discussing the findings. All studies where a consensus was not achieved were included
to strengthen the internal validity of this research. The lack of quality evaluations for the selected studies is
another threat to the internal validity of this study. This SLR deviated from the guidelines mentioned
above by not applying such an evaluation, due to the small number of studies regarding TD prioritization.
If such an evaluation were to be applied, it might result in the omission of a large portion of
the selected studies. The removal of a significant number of selected studies would lead to difficulties
when answering the research questions and understanding the current approaches and techniques for TD
prioritization. Kitchenham et al.’s [130] guidelines were followed during data extraction as well. Three
reviewers independently read and extracted the data from each selected study. Furthermore, the three
reviewers discussed the results through the means of collective discussions.
Conclusion validity
Conclusion validity is concerned with the reliability of the conclusions of a study. In the context of
secondary studies, potential threats to conclusion validity include the data extraction and synthesis ele-
ment [130]. Hence, there is no distinction between this aspect of validity and internal validity in this
case [130]. The internal validity section discusses in detail the steps that were followed to mitigate any
potential threats to the data extraction and synthesis element in this SLR.
External validity
External validity is concerned with the generalizability of the findings of a study. In the context of secondary
studies, it involves the range of primary studies covered [130]. To eliminate any threats to this aspect of
validity, the standard software engineering databases and publication venues were considered during the
search [130]. The search scope was also complemented with Google Scholar, which is believed to be the
“most comprehensive” source of literature available [41]. Moreover, multiple iterations of forward and
backward snowballing were performed to expand the search scope [130]. Though the aim was to cover a
representative body of the current TD prioritization literature, the findings may not be generalizable to TD
prioritization studies outside of this search scope. It is important to emphasize that the conclusions of this
study are only applicable to studies that were assessed.
4.5 Summary
This study conducts an SLR of the current TD literature to identify the existing TD prioritization ap-
proaches and the prioritization techniques that these approaches utilize. Several research questions were
formulated with this goal in mind regarding what current TD prioritization approaches exist, types of
TD addressed, decision factor categories considered, software artifact dependencies, human involvement
level required, and their evaluation methods. Additionally, this SLR explores the topic of prioritization
techniques and such techniques’ limitations.
The SLR identified a total of 23 studies that include at least one TD prioritiza-
tion approach. These studies proposed 24 unique TD prioritization approaches, which employed
10 unique prioritization techniques. The SLR found that many of the identified TD prioritization ap-
proaches do not consider a resource constraint, and only four of the identified approaches considered all
three decision factor categories of value, cost, and a resource constraint in their TD prioritization process.
Additionally, the SLR revealed that the evaluations of some of the identified approaches were lacking.
Of the identified approaches, 37.5% were not evaluated in any manner, and only seven of the approaches
were evaluated in industry settings. This lack of evaluation diminishes the credibility of the approaches.
Furthermore, the SLR unveiled that among the approaches that considered value, cost, and a resource
constraint, only two approaches were evaluated, and only one of these two approaches can be applied to
prioritize any type of TD.
The study of TD and specifically its prioritization is a relatively new topic of research. It is essential
to consolidate the current research on such a topic to spur new growth in the field and allow practitioners
to learn about the current state of the field. This SLR allows individuals to view the currently researched
TD prioritization approaches and several characteristics of such approaches. The prioritization techniques
are also presented, each with a description and several limitations. Practitioners can utilize this SLR
as a reference to give them an overview of the current TD prioritization approaches. Additionally, TD
researchers might find this review beneficial when developing new approaches for TD prioritization. Based
on the results of this SLR, more research needs to be conducted on creating TD prioritization
approaches that consider value, cost, and a resource constraint, in addition to evaluating such approaches
in industry settings to assess their applicability and effectiveness. Furthermore, the SLR
demonstrates that there is no one concise method to estimate value and cost of TD items, which suggests
that these estimates are context-dependent. Consequently, new approaches should be flexible enough to
incorporate various value and cost estimation methods.
Chapter 5
Understanding How Software Practitioners Prioritize Technical
Debt Under a Resource Constraint: An Investigative Study
5.1 Background and Motivation
Previous research on technical debt (TD) revealed a few ambiguities within the existing TD prioritization
approaches. The systematic literature review (SLR), in Chapter 4, highlighted an inconsistency regarding
the decision factor categories on which the current approaches base their prioritization. The
majority of the approaches base their prioritization on the value of each TD item and the cost required
to repay each TD item. A smaller number of approaches solely consider the value of TD items, and a
handful of approaches consider the value and cost of TD items in addition to accounting for a resource
constraint. Furthermore, the SLR and other previous research [26, 95, 141, 191] have raised a concern
regarding the effectiveness and applicability of the majority of the existing TD prioritization approaches
in real-world settings, as many of the approaches were not evaluated in industry settings. The lack of
industry evaluations makes it difficult to recognize which decision factors should be considered in the
TD prioritization process to satisfy the needs of software practitioners.
Having an understanding of software practitioners’ behaviors and perceptions regarding TD prioritiza-
tion is a crucial element when developing an effective TD prioritization approach [120]. Such information
is an important input to design TD prioritization approaches, and it is comparable to the first step in the
systems and software engineering life-cycle, in which the requirements of a system are elicited and an-
alyzed [45, 120]. Developing a TD prioritization approach without adequately understanding the needs
and behaviors of its prospective users can lead to an approach appearing to have been developed solely in
isolation, away from real-world settings, which may render the approach a failure that does not address
its users’ needs [45]. Moreover, it makes predicting an approach’s performance in real-world settings
challenging and signifies that researchers may have designed a TD prioritization approach without taking
software practitioners’ behaviors and perceptions into account [45, 64].
Though the software engineering community has gone to great lengths toward understanding TD and
its management in real-world settings, current empirical studies, as of 2019, on the subject matter have only
either focused on how TD is perceived and managed, without a specific focus on TD prioritization [49,66,
159], or have investigated the factors involved within the prioritization process, without focusing on how
such factors are utilized [66,158,201]. Furthermore, a few studies have explored how software practitioners
prioritize TD in specific, smaller contexts, such as TD-related scenarios [201], TD types [158, 159], and
software development methodologies [66, 158]. However, thus far, there has been no empirical study that
investigates how software practitioners prioritize TD in the presence of a resource constraint even though
having a resource constraint is a typical aspect of TD repayment [29, 45, 49, 91, 95, 237]. The lack of
understanding on how TD is prioritized by software practitioners in the presence of a resource constraint
hinders the development of effective TD prioritization approaches.
To fill this gap in the current TD literature and contribute to the body of TD prioritization knowledge,
this chapter presents an investigative study that aims to gain a better understanding of practitioners’ prac-
tices and motivations when prioritizing TD under a resource constraint. Having such knowledge can assist
in determining what decision factors software practitioners are basing their prioritization approaches on.
Moreover, observing how practitioners utilize such decision factor categories may be beneficial in guiding
the developments of new TD prioritization approaches.
5.2 Study Setup
This section presents the setup of the study: research questions, subjects, and an experiment. The study
was designed and conducted based on the software engineering empirical studies guidelines [129, 195].
5.2.1 Research questions
Aiming to gain a better understanding of practitioners' practices and motivations when prioritizing TD
under a resource constraint, this study addresses three specific research questions that act as the driving
factors behind the design of the study and aid in aligning the study’s goals and the study’s researchers’
interests (i.e., I collaborated with a computer science Ph.D. student and three computer science master’s
students to conduct this study).
Cost estimation
Software practitioners and researchers alike employ static code analysis tools to aid in TD related activities.
Typically, static code analysis tools that identify and quantify TD provide estimates regarding the cost
associated with repaying each TD item, which are known as a TD item’s principal [1, 5, 10]. These cost
estimates of TD are usually expressed in terms of time [5, 10] or monetary [1] units. However, previous
studies, including this dissertation’s SLR, have noted that costs associated with TD items may be context
dependent [26, 29, 42, 95, 141, 191, 207], which refers to TD having varying associated costs according
to the perspective at hand. Additionally, software practitioners who employ static code analysis tools
can have varying opinions on the estimated costs of TD that may affect the manner in which software
practitioners utilize estimated costs of TD items. Software practitioners may deem the tools’ estimates
fitting and appropriate for the prioritization process and fully depend on them, or they may judge them as
unfitting and take steps to modify such estimates based on their personal perspective. The observations
mentioned above lead to this study's first research question: RQ1: How do software practitioners
perceive TD items' estimated costs as calculated by static code analysis tools?
Value estimation
Many studies that distinguish successful software projects from failed ones found that most software
project failures are caused by value-oriented shortfalls [45, 63]. Specifically, value-based software
engineering (VBSE) research has shown that value-neutral approaches to software engineering can be the source
of failure for a software project [45], as value-neutral software engineering methods can cause software
projects to expend significant amounts of scarce resources on activities with negative return on invest-
ments (ROIs) [45,63]. Furthermore, VBSE argues that value should be integrated into the full range of
existing and emerging software engineering principles and practices [45].
In the context of TD, software practitioners can view TD items as value-neutral, where each TD item
is assumed to be equally important. Alternatively, practitioners can assign a value to each TD item, where
the benefit that would be achieved from repaying each TD item is realized. The value of a TD item can
be estimated based on a simple or complex notion of value and with respect to shared goals or to satisfy
personal objectives [45]. The software engineering community has developed a wide range of methods to
estimate a given TD item’s value [17, 23, 24, 61, 67, 76, 99, 108, 115, 140, 164, 185, 190, 200, 211, 220, 227,
237]. Unfortunately, many of the researched value estimation methods have not been evaluated in industry
settings, as observed by this dissertation’s SLR and previous research on TD [26, 95, 141, 191]. As a
result, it is unknown whether practitioners follow these estimation methods and the specific circumstances
in which these estimation methods are utilized. To aid future TD prioritization approach developers in
determining how to estimate the value of TD, this study seeks to understand TD value from two view
points. First, how do practitioners assign value to TD items? Second, is the value of a TD item constant
or does it change over time? With the previous questions in mind, the second research question is the
following: RQ2: How do software practitioners estimate TD items' values?
Prioritization decision factors
When prioritizing TD, there is a multitude of decision factors that one can consider, which may have a
substantial impact on the results of an individual's TD prioritization. These decision factors are identified
and discussed extensively by Ribeiro et al. [191]. In this dissertation’s SLR, the decision factors presented
in [191] were compiled into a set of three overarching categories, which are referred to as “decision factor
categories”: value, cost, and a resource constraint. The review found that a TD prioritization approach
may revolve around one of the categories listed above [23, 61, 76, 99, 108, 200, 220] or a combination of
them [17, 19, 24, 67, 109, 111, 115, 140, 164, 185, 190, 203, 205, 211, 227, 237]. Unfortunately, though each
TD prioritization approach is based on one or more of the decision factor categories mentioned above,
there is a lack of consistency concerning the factor categories being considered in the currently researched
TD prioritization approaches. In other words, there is no standard set of decision factor categories that TD
prioritization approaches follow. A few approaches solely focus on prioritizing based on a TD item’s value,
while others instead account for both the value and cost associated with each TD item or simultaneously
account for value, cost, and a resource constraint. Moreover, it is still unknown how software practitioners
conduct their prioritization of TD under a resource constraint and which decision factor categories are
accounted for by them, which is primarily due to a lack of empirical evidence and evaluations in the
industry with many of the currently researched TD prioritization approaches [26, 95, 141, 191]. However,
the current TD literature indicates that TD repayment is almost always a highly constrained endeavor,
with the aim of repaying all TD present in a given system typically being unrealistic [29, 45, 91, 95, 237].
Consequently, this study assumes that there will always be a resource constraint that practitioners need to
satisfy when prioritizing TD. With the information presented above in mind, this study seeks to understand
whether practitioners account for TD item’s value, cost, or a combination of them when prioritizing TD
and the specific ways these decision factor categories are being utilized. Therefore, the third research
question is: RQ3: How do software practitioners prioritize TD items?
5.2.2 Subjects
The objective of this study is to identify patterns in TD prioritization approaches utilized by software prac-
titioners in the presence of a resource constraint. The primary goal of this study was achieved through
the means of a controlled experiment. To conduct the experiment, a convenience sampling technique
was employed to select potential subjects. Convenience sampling is the most common form of non-
probabilistic sampling, in which samples are collected by including samples that are conveniently available
to researchers [123, 137]. The technique is commonly utilized when there is a hurdle in reaching subjects
or when the population is unknown [96, 123, 137]. Inferences and conclusions drawn from such samples
can not be used in generalizations pertaining to the entire population [123, 137]. Regardless of the limita-
tion pertaining to the generalizability of convenience sampling, the technique has proved its effectiveness
in preliminary explorations and observations of various phenomena and behaviors [96, 123, 137]. The fol-
lowing sections describe in more detail the participating software practitioners and the utilized software
systems within this study.
Software practitioners
The study involved 89 software practitioners, who have varying software roles and levels of industry
experience. The participants were recruited from five separate sources, which are denoted as “Affiliation”
in Table 5.1. The table also provides the participants’ information, including their industry experience in
years, roles, and the system that they performed the experiment on, where “P#” identifies each practitioner,
“Exp” denotes experience in years, and “Sys#” identifies the system, which each participant performed the
study on.
The participants are categorized into two categories based on their affiliation type. The first category
(i.e., 12.36% of the participants) contains six software practitioners from software company C1, three
practitioners from company C2, one practitioner from company C3, and one practitioner from company
C4. Company C1 is a mid-size software development company with nearly 50 employees. The company
develops and maintains web-based, information technology solutions for the healthcare insurance market.
Company C2 is an investment company that has a small-size software engineering team of six employees.
The responsibilities of the software team include continuously managing an internal software system and
maintaining the company’s website. Company C3 is a software company that specializes in providing
solutions for various government sectors to digitize and automate their operations. Company C4 is an
information and communication technology provider that provides software and hardware services. The
second category (i.e., 87.64% of the participants) consists of 78 students from the University of Southern
California (USC) CSCI 590 master’s level course. The course aims to provide students with real-world
software engineering experience by immersing them in a real-world software engineering setting. In the
course, students are assigned to work on software projects that have real clients, who are university af-
filiated or external. Client organizations can be identified as: for-profit, non-profit, government sector,
academic departments, or any other entity developing a software project.
Table 5.1: Participating software practitioners
P# Affiliation Exp Role Sys# P# Affiliation Exp Role Sys#
P1 C1 16 Project manager Sys1 P46 CSCI 590 < 1 Developer Sys8
P2 C1 9 Senior software engineer Sys1 P47 CSCI 590 < 1 Developer Sys8
P3 C1 5 Software engineer Sys1 P48 CSCI 590 2 Developer Sys8
P4 C1 3 Software engineer Sys1 P49 CSCI 590 < 1 Developer Sys9
P5 C1 5 Software engineer Sys1 P50 CSCI 590 - Developer Sys9
P6 C1 2 Software engineer Sys1 P51 CSCI 590 1 Developer Sys9
P7 C2 17 Developer Sys2 P52 CSCI 590 - Developer Sys9
P8 C2 13 Developer Sys2 P53 CSCI 590 1 Developer Sys9
P9 C2 7 Developer Sys2 P54 CSCI 590 < 1 Developer Sys9
P10 C3 5 Project manager Sys2 P55 CSCI 590 2 Developer Sys10
P11 C4 8 Integration engineer Sys2 P56 CSCI 590 2 Developer Sys10
P12 CSCI 590 5 Tester Sys3 P57 CSCI 590 - Developer Sys10
P13 CSCI 590 1 Developer Sys3 P58 CSCI 590 - Developer Sys10
P14 CSCI 590 3 Developer Sys3 P59 CSCI 590 6 Developer Sys10
P15 CSCI 590 5 Developer Sys3 P60 CSCI 590 - Developer Sys11
P16 CSCI 590 3 Developer Sys4 P61 CSCI 590 1 Developer Sys11
P17 CSCI 590 2 Developer Sys4 P62 CSCI 590 1 Developer Sys11
P18 CSCI 590 < 1 Tester Sys4 P63 CSCI 590 - Developer Sys11
P19 CSCI 590 - Developer Sys4 P64 CSCI 590 - Developer Sys11
P20 CSCI 590 < 1 Developer Sys4 P65 CSCI 590 1 Developer Sys11
P21 CSCI 590 2 Developer Sys4 P66 CSCI 590 - Developer Sys11
P22 CSCI 590 - Tester Sys4 P67 CSCI 590 < 1 Developer Sys11
P23 CSCI 590 2 Developer Sys4 P68 CSCI 590 2 Developer Sys11
P24 CSCI 590 < 1 Developer Sys4 P69 CSCI 590 1 Developer Sys11
P25 CSCI 590 1 Developer Sys4 P70 CSCI 590 1 Developer Sys11
P26 CSCI 590 - Developer Sys4 P71 CSCI 590 1 Project manager Sys11
P27 CSCI 590 - Developer Sys4 P72 CSCI 590 1 Developer Sys11
P28 CSCI 590 4 Developer Sys5 P73 CSCI 590 - Developer Sys11
P29 CSCI 590 7 Developer Sys5 P74 CSCI 590 - Developer Sys11
P30 CSCI 590 1 Tester Sys5 P75 CSCI 590 1 Tester Sys11
P31 CSCI 590 6 Developer Sys6 P76 CSCI 590 1 Developer Sys11
P32 CSCI 590 12 Developer Sys6 P77 CSCI 590 1 Developer Sys11
P33 CSCI 590 2 Developer Sys6 P78 CSCI 590 1 Developer Sys12
P34 CSCI 590 2 Developer Sys6 P79 CSCI 590 < 1 Developer Sys12
P35 CSCI 590 1 Developer Sys6 P80 CSCI 590 < 1 Developer Sys12
P36 CSCI 590 < 1 Developer Sys6 P81 CSCI 590 < 1 Developer Sys12
P37 CSCI 590 < 1 Developer Sys6 P82 CSCI 590 < 1 Developer Sys12
P38 CSCI 590 - Developer Sys6 P83 CSCI 590 - Developer Sys12
P39 CSCI 590 2 Developer Sys7 P84 CSCI 590 4 Developer Sys12
P40 CSCI 590 4 Developer Sys7 P85 CSCI 590 1 Developer Sys13
P41 CSCI 590 1 Developer Sys7 P86 CSCI 590 3 Developer Sys13
P42 CSCI 590 2 Developer Sys7 P87 CSCI 590 2 Developer Sys13
P43 CSCI 590 - Developer Sys7 P88 CSCI 590 4 Developer Sys13
P44 CSCI 590 < 1 Developer Sys8 P89 CSCI 590 < 1 Developer Sys13
P45 CSCI 590 - Developer Sys8
As presented in Table 5.1 and illustrated by Figure 5.1, the majority of the participants are developers
(i.e., 84.27% of the participants), followed by testers (i.e., 5.62% of the participants), software engineers
(i.e., 4.49% of the participants), and project managers (i.e., 3.37% of the participants). Additionally, one
senior software engineer and one integration engineer participated in the study. It is important to note that
software engineers have a broader set of tasks compared to that of developers. This set of tasks spans
the entire lifecycle of a given project, including design, development, maintenance, and testing [50].
Figure 5.1: Breakdown of participants based on role (developer: 84.27%; tester: 5.62%; software engineer: 4.49%; project manager: 3.37%; senior software engineer: 1.12%; integration engineer: 1.12%)
Additionally, Figure 5.2 depicts the level of industry experience of the participants. A stacked column
graph was utilized to differentiate between the participants based on their respective affiliations. Gray
indicates participants affiliated with the USC CSCI 590 master’s level course, and red indicates participants
affiliated with one of the participating companies. As Figure 5.2 depicts, the majority of the participants
have a year or more of industry experience (i.e., 61.8% of the participants). Moreover, 17.98% of the
participants have less than one year of industry experience, and 20.22% of the participants have no industry
experience.
Figure 5.2: Breakdown of participants based on industry experience and affiliation type
Software systems
This study involved 13 software systems, which were analyzed using SonarQube [10] to identify and
quantify their present TD items. Each participant performed the experiment on the list of TD items identified
for one of the involved systems. Table 5.1 indicates which system each participant performed the study
on, and Table 5.2 summarizes the subject software systems, where “Sys#” identifies each system. The
summary includes the utilized programming language, number of TD items identified, which is indicated
as “TD count” in the table, and time required to repay all the identified TD items in minutes, which is
displayed in column “TD time”.
Table 5.2: Summary of software systems utilized in the investigative study
Sys# Language TD count TD time
Sys1 Java 5,042 46,829
Sys2 Java 1,743 17,047
Sys3 Java 102 768
Sys4 Java 2,438 50,697
Sys5 PHP 399 4,199
Sys6 JavaScript 11,105 147,669
Sys7 JavaScript 5,179 86,942
Sys8 JavaScript 355 6,103
Sys9 JavaScript 1,290 17,859
Sys10 JavaScript 237 3,251
Sys11 Python 115 660
Sys12 JavaScript 26 520
Sys13 Java 210 1,523
The first analyzed software system is an in-house software system (i.e., “Sys1”), which was being
worked on by participants affiliated with company C1 at the time of the experiment. At that time,
the in-house software system had eight software engineers, had been under development for three years and
eight months, and had been actively in use for two years. The second analyzed software system, “Sys2” in
Table 5.2, is Apache VXQuery, which is an open-source software (OSS) system that was suggested to this
study's researchers at company C2. The experiment was performed on this system by participants affiliated
with companies C2, C3, and C4. Lastly, participants affiliated with the master's level course performed the
experiment on eleven different software systems, where each participant performed the experiment on the
system that they were working on at the time. The subject systems are written in the following programming
languages, ordered from most to least common: JavaScript (i.e., 46.15%), Java (i.e., 38.46%), followed by
PHP and Python (i.e., 7.69% each). Additionally, the number of identified TD items per system ranged
from 26 to 11,105, with an average of 2,172 and a median of 399. The time, in minutes, required to repay
all the TD items present in a given system ranged from 520 to 147,669, with an average of 29,544 and a
median of 6,103.
5.2.3 Experiment
This section describes the phases of the experiment in addition to summarizing the steps that were per-
formed to improve the design of the experiment. The experiment consisted of four phases, which are
displayed in Figure 5.3 and summarized below.
Figure 5.3: Experiment phases (training session, artifact preparation, experiment, and questionnaire)
– Training session: In the first phase, a training session that presented the primary concepts of the
experiment to the participants was performed. The training session involved introducing the concept of
TD and SonarQube. Defining TD and introducing SonarQube was necessary to ensure that all participants
understood the TD metaphor and the various capabilities of SonarQube. Cunningham's definition of TD
was provided, and the participants were referred to his video explanation [16] and Fowler's illustration of
TD [13]. Subsequently, SonarQube was demoed to the participants by presenting an OSS system that was
analyzed on SonarCloud, which is the online version of SonarQube. The demo encompassed displaying the
SonarQube dashboard to the participants and walking them through all of SonarQube's capabilities, with a
focus on its TD analysis functionality. The total amount of TD and the
details of an example for each TD severity type were presented to the participants. SonarQube categorizes
TD items based on their severity scale from lower to higher severities as follows: info, minor, major,
critical, blocker. The details for each TD severity type example included the following: the file in which a
given TD item exists, start line, end line, severity, time to repay in minutes, a message that suggests how
to repay a given TD item, and the rule which a given TD item has violated. Lastly, the list of SonarQube
rules [11] was presented to the participants.
– Artifact preparation: In the artifact preparation phase, all necessary artifacts needed to conduct
the study were prepared. To do so, the participants' software systems were analyzed locally using the
SonarQube analyzer. The results were then presented to the participants in SonarQube. Afterward, the
TD items and their details were extracted using the SonarQube API [12] (a sketch of this extraction step is
shown after this list). The list of the TD items and their details, including rule, message, severity, file, start
line, end line, and time to repay, were provided to the participants.
– Experiment: In the experiment phase, the participants were provided with the list of TD items and
their respective details. The participants were asked to select TD items for a given repayment activity that
had a resource constraint of 360 minutes. In other words, the total cost of the selected TD items could
not exceed 360 minutes. It was then explained to participants that the output of the experiment phase should
be the following: a list of the selected TD items to be repaid and the participant’s prioritization approach.
While completing the experiment phase, participants were requested to be as detailed as possible when
developing and reporting their TD prioritization approach.
– Questionnaire: In the final phase, the participants were requested to fill out a questionnaire, in which
the participants provided their information: the system they worked on, their role, and their industry experience.
Additionally, participants were asked to explain in detail their cost estimation method, value estimation
method, TD prioritization approach, and respective rationales. Once the questionnaire was completed,
follow-up, individual interviews were conducted to ensure that the researchers and participants share a
common understanding of all terminologies and definitions.
Design improvements
The experiment and the questionnaires were developed by the researchers based on the current TD prioriti-
zation literature while following the guidelines of [129,195]. The initial experiment and the questionnaire
design were drafted through several collaborative sessions among the researchers. The initial questionnaire
design was divided into two subsections: demographics and TD prioritization. It then went through multi-
ple revisions, where questions that were regarded as open to interpretation or potentially misunderstood
were rephrased. Once the initial designs had been finalized, the experiment and questionnaire were piloted
with four members of the target population. All four participants were invited to individually conduct the
experiment and, subsequently, fill out the questionnaire in a read-aloud session, in which participants read
questions out loud as well as externalize their thinking process. While the pilot sessions did not reveal any
points of improvement for the experiment, they did result in a few invaluable insights and revealed areas in
need of improvement in regard to the questionnaire. The participants suggested merging the demographic
and TD prioritization sections into one section, as it would provide the participants with a more comprehensive
overview of the questionnaire. The participants also recommended changing a question that revolved
around the rationalization of why some files should not be treated as equally important as others from
an open-ended question to a checkbox question. The checkbox version would include several potential
reasons as options in addition to adding an “other” option where participants can add more reasons. The
suggestions mentioned above were taken into account in the final revision of the questionnaire.
5.3 Data Analysis and Results
This section presents the data analysis procedure and results of the experiment.
5.3.1 Data analysis
The data analysis procedure involved both a qualitative and quantitative analysis. First, a qualitative ap-
proach was followed to analyze the open-ended questions related to the TD prioritization process. The
analysis was conducted through the means of a thematic analysis [72]. A thematic analysis involves im-
mersing in the data by reading it extensively. Subsequently, the data can be coded using descriptive labels
that segment and interpret individual responses. Codes derived from a thematic analysis can be generated
using an inductive approach, a deductive approach, or an integrative approach. The deductive approach utilizes
a code start list that has been created prior to the analysis, whereas in the inductive approach, the code list is
generated during the analysis. The integrative approach combines the inductive and deductive approaches.
After coding the data, code can be translated into themes by combining different codes into an overarching
theme to improve the elaboration of findings [72].
This study applied the integrative approach to code the data. The analysis was initiated by two re-
searchers (i.e., I collaborated with one of the master’s students to conduct the analysis). The two researchers
developed the deductive (a priori) coding list based on the research questions of the study. Subsequently, the two
researchers independently read and coded the responses and generated an inductive coding list during the
mapping of the data. Through multiple collaborative sessions, the two researchers discussed the results un-
til an agreement was reached. To verify and validate the resulting mapping, two additional researchers (i.e.,
two of the master’s students) were provided with the coding list and asked to map the data. Afterward, the
results were compared and discussed in a joint meeting among these two researchers and one of the pre-
vious researchers (i.e., I conducted this step with the two master’s students). Discrepancies were resolved
through collective discussions until a consensus was reached. A quantitative analysis was then conducted
on the closed-ended questions of the questionnaire and the results of the qualitative analysis. Descriptive
statistics, charts, and tables were utilized to convey and report the varying patterns in the results [170].
5.3.2 Results
RQ1: How do software practitioners perceive TD items’ estimated costs as calculated
by static code analysis tools?
As stated previously, this study investigates whether software practitioners rely on the cost estimations
of static code analysis tools during TD prioritization. To do so, the study utilized SonarQube as
an example static code analysis tool since the tool provides estimates for the effort, denoted in
minutes, required to repay each TD item. Participants were examined to determine whether they would
rely on the provided effort estimates or change them. The experiment revealed two patterns: reliance on
the provided cost estimations and altering of the provided cost estimations.
– Reliance on provided cost estimations: The experiment revealed that almost all of the participants
rely on the cost estimations provided by the static code analysis tool.
– Altering of provided cost estimations: Only one participant (i.e., participant P21) opted to not rely
on the cost estimations provided to them by the selected static code analysis tool during TD prioritization.
The participant altered the provided estimates by multiplying the cost estimates with a competency scalar.
The value of the competency scalar for each TD item was estimated based on the familiarity of the par-
ticipant with the given system’s components and the experience of the participant with repaying the TD
item at hand. Additionally, a separate participant commented that SonarQube may overestimate the effort
required to repay each TD item. However, the participant perceived this as an advantage and believes that
it is always better to overestimate than to underestimate, since underestimating may lead to delays or missed deadlines.
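To illustrate the cost-altering behavior reported by this participant, the following is a minimal, hypothetical Python sketch; the item identifiers, tool estimates, and scalar values are invented for illustration and are not taken from the study data.

# Hypothetical sketch: adjusting tool-provided cost estimates with a
# per-item competency scalar, as one participant reported doing.
# A scalar below 1.0 reflects high familiarity (faster repayment);
# a scalar above 1.0 reflects low familiarity (slower repayment).
tool_estimate_minutes = {"td1": 20, "td2": 45, "td3": 10}
competency_scalar = {"td1": 0.8, "td2": 1.5, "td3": 1.0}
adjusted_cost = {item: tool_estimate_minutes[item] * competency_scalar[item]
                 for item in tool_estimate_minutes}
print(adjusted_cost)  # {'td1': 16.0, 'td2': 67.5, 'td3': 10.0}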
RQ2: How do software practitioners estimate TD items’ values?
Given the importance of accounting for the value of a given TD item during the prioritization process,
one would assume that there are well-known, defined methods for estimating the value of TD. However,
the experiment revealed that there is no consensus among the participants on which method should be utilized to
associate a value with each TD item. The participants developed multiple valuation formulas,
and only one participant did not consider value in their TD prioritization process. Table 5.3 presents the
valuation formula created by each participant, where “P#” column identifies each participant. Moreover,
“Cost”, “File”, “Line”, “Rule”, and “Severity” columns indicate the potential valuation parameters, where
filled circles represent the parameters considered in the value formula. Several participants (i.e., 14 partic-
ipants) used one parameter in their valuation, while the majority of participants (i.e., 74 participants) used
multiple. When combining the valuation parameters, participants applied multiplication, summation, or
weighted sum model (WSM) functions. This information is denoted in Table 5.3, where a filled circle in-
dicates the usage of each function. Additionally, “+” indicates summation and “*” refers to multiplication.
The majority of the participants (i.e., 68 participants) used simple multiplication, four of the participants
used summation, and only two participants used WSM. Moreover, Figure 5.4 illustrates the valuation
parameters and the frequency of utilization of each parameter. The most utilized parameter was severity,
which was followed by file, rule, cost, and number of lines involved with TD items. The parameters are
discussed below in order of their frequency of utilization from highest to lowest.
Figure 5.4: Breakdown of valuation parameters based on their utilization by participants
– Severity: SonarQube assigns a severity score to each TD item based on the SonarQube team’s per-
ception of the item’s probable impact on future maintenance activities. The severity of a given TD item is
categorized from lowest to highest as follows: info, minor, major, critical, or blocker. When developing
a value formula, 92.05% of participants who accounted for value (i.e., 81 of the 88 participants) consid-
ered the severity of TD items. The previously mentioned set of participants mapped severity ranks using
three different methods: linear function, exponential function, or T-shirt sizing [161]. This information is
displayed in Table 5.3, where “S linear” indicates linear function, “S exp” refers to exponential function,
“S TS” indicates T-shirt sizing, and a filled circle indicates the usage of each method. Additionally, each
method is explained in more detail below, and a short illustrative sketch of these mappings follows the list of valuation parameters:
– Linear function: Mapping severity ranks using a linear function involves assigning numeric
values to each level of severity, where the values grow by the same amount as the level of
severity increases. An example of utilizing a linear function when mapping severity ranks is
the following: {info, minor, major, critical, blocker} → {1, 2, 3, 4, 5}. A total of 56 participants
employed the use of a linear function to map severity ranks. However, there was no consensus
on the initial numerical value for the lowest severity rank.
– Exponential function: Utilizing an exponential function to map severity ranks is achieved through
selecting a base constant, then raising the constant to a linearly increasing power, and subse-
quently assigning the resulting values to the severity ranks in an increasing order. For exam-
ple, a participant who utilizes an exponential function could map severity ranks as follows:
{info, minor, major, critical, blocker} → {1, 3, 9, 27, 81}. Only 8 participants employed the use
of an exponential function, and there was no consensus on the base value of the exponential
function. It should be noted that participants who utilize an exponential function do as such to
express greater differences among the severity ranks than that of a linear function. The value
assigned to the highest severity rank in a linear function is typically only marginally larger than
the one assigned to the lowest rank, while the assigned value to the highest severity rank in an
exponential function is significantly larger than that of the lowest rank.
– T-shirt sizing: When following a T-shirt sizing centered approach, participants do not utilize a
specific mathematical function when mapping severity ranks. In contrast, participants utilize
relative severity mappings. An example of applying a T-shirt sizing mapping is as follows:
{info, minor, major, critical, blocker} → {1, 3, 5, 8, 13}. A total of 17 participants utilized
T-shirt sizing when mapping severity ranks. Participants follow a T-shirt sizing approach since
they have a different perspective of the relative differences among severity ranks. As such, the
resulting mappings were unique to each participant.
– Relative importance of a file: System files can have relatively different values of importance that can
be utilized when prioritizing TD items. However, determining the importance of a file is not a trivial
task, as it varies among individuals and the situation at hand. Figure 5.5 summarizes the attributes that
the participants might consider when assigning value to files. Though only 65 participants considered
the relative importance of the file in which a TD item exists in their value formula, the figure’s data is
derived from the questionnaire, in which a few additional participants indicated that they did not consider the
importance of files when prioritizing TD during the experiment because all files in the system at hand had
a relatively similar level of importance. However, these participants provided the reasons that they would
typically consider when determining the relative values of files in other systems whose files do not share
a consistent level of importance. It is important to note that participants may have indicated multiple attributes to
evaluate the importance of a file, as they believe that the importance of a file depends on the purpose of a
repayment activity and varies from one situation to another.
Figure 5.5: Breakdown of attributes selected to evaluate files based on their utilization by participants
The questionnaire indicated that 78 participants (87.64% of participants) believe that different sys-
tem’s files may have varying levels of importance. Below, the attributes that the participants indicated are
explained based on their frequency of selection:
– File importance in a system: 66.29% of the participants believe that the importance of a file is
determined by its value to the system, which can be based on its business value, customer value,
or the set of features that the file implements.
– Change frequency: 37.08% of the participants perceive the importance of a file as its frequency
of change since repaying TD items in a file that is frequently changed may improve the file’s
maintainability and save future efforts.
– Ownership: 35.96% of the participants believe that files that a given participant owns and fre-
quently interacts with should have higher importance, as these participants surmise that repaying TD
items in such files is relatively easier. Additionally, this set of participants believes that repaying
TD items in such files improves the files’ maintainability, which may improve the participant’s
productivity.
– Maintainability: A few participants (i.e., 17.98% of the participants) assign greater importance
to files that suffer from maintainability issues. The maintainability of a file can be estimated
manually or automatically using software metrics, such as the maintainability index (MI) [179].
– TD frequency: 13.48% of the participants believe that files that contain higher numbers of TD
items should have a higher importance. They believe that reducing the number of TD items in a
file will ease the changeability of the file and improve its maintainability.
– Readability: A handful of the participants (i.e., 4.49% of the participants) consider files that
are difficult to read to be more important. A file’s readability may be assessed manually or
automatically using software metrics, such as the readability metric proposed in [55].
– Rule: A few participants utilized the rule that a given TD item has violated in their valuation
formulas. Based on previous experiences, these 14 participants deduced that different SonarQube rule
violations have different impacts on maintainability. Consequently, these participants believe that the rule
that each TD item violates indicates the value of said TD item.
– Cost: The cost of repaying each TD item was encompassed in the valuation formulas generated by 11
participants. These participants perceive cost as a proxy of the importance of each TD item, where TD
items with a higher cost to repay are to be assigned higher values.
– Line: Only one participant incorporated the number of lines involved with TD items in their valuation
formula. The participant believes that TD items that affect a larger number of lines should have a higher
value, as these TD items impact larger parts of the code.
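To make the severity-mapping methods and value formulas described above concrete, the following is a minimal Python sketch; the specific numeric mappings and the file weight are hypothetical examples rather than values taken from any single participant.

# Sketch of the three severity-mapping styles and a "file * severity"
# value formula of the kind most participants reported using.
SEVERITIES = ["info", "minor", "major", "critical", "blocker"]

linear_map = {s: i + 1 for i, s in enumerate(SEVERITIES)}         # 1, 2, 3, 4, 5
exponential_map = {s: 3 ** i for i, s in enumerate(SEVERITIES)}   # 1, 3, 9, 27, 81
tshirt_map = dict(zip(SEVERITIES, [1, 3, 5, 8, 13]))              # relative sizing

def td_value(file_importance, severity, severity_map):
    """Value formula of the form file * severity."""
    return file_importance * severity_map[severity]

# A TD item of severity "critical" in a file weighted 2.0:
print(td_value(2.0, "critical", linear_map))       # 8.0
print(td_value(2.0, "critical", exponential_map))  # 54.0
print(td_value(2.0, "critical", tshirt_map))       # 16.0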
Table 5.3: Value formulas utilized by software practitioners
P# Value formula Cost File Lines Severity Rule * + WSM S linear S exp S TS
P1 file*severity
P2 file*severity
P3 severity
P4 file*severity
P5 file*severity
P6 file*severity
P7 file*severity
P8 file*severity
P9 file*severity
P10 file*severity
P11 file*rule
P12 file*severity
P13 cost*severity
P14 cost*severity
P15 cost*severity
P16 severity
P17 cost*severity
P18 severity
P19 cost*file*severity
P20 file*severity
P21 file*severity
P22 severity
P23 (file + severity)^2
P24 file*severity
P25 file*severity
P26 file*severity
P27 file*severity
P28 severity
P29 cost*severity
P30 file*severity
P31 cost*file*severity
P32 cost*file*severity
P33 file*severity
P34 cost*file*severity
P35 file*severity
P36 lines*severity*rule
P37 file*severity
P38 file*severity
P39 file*severity
P40 file*severity
P41 severity
P42 (x*file) + (y*severity)
P43 (x*file) + (y*severity)
P44 severity
P45 file*severity*rule
P46 file*severity
P47 file
P48 file*severity
P49 file*severity
P50 file*severity
P51 file+severity
P52 file*severity
P53 cost*file*severity
P54 file*severity
P55 file*severity
P56 file*severity*rule
P57 file*severity
P58 file*severity
P59 severity*rule
P60 severity
P61 file*severity
P62 file*severity
P63 file*severity
P64 file*severity
P65 severity
P66 file*severity
P67 file+severity
P68 severity*rule
P69 severity*rule
P70 file*severity
P71 file*rule
P72 cost*file*rule
P73 severity+rule
P74 severity*rule
P75 file*severity
P76 file*severity
P77 severity
P78 N/A - - - - - - - - - - -
P79 severity
P80 file
P81 file*severity
P82 file*rule
P83 file*severity*rule
P84 severity
P85 file*severity
P86 file*severity
P87 file*severity
P88 file*rule
P89 file*severity
Total 11 65 1 81 14 68 4 2 56 8 17
RQ3: How do software practitioners prioritize TD items?
The data revealed three patterns in the prioritization approaches adopted by the participants. Table 5.4
displays which prioritization approach pattern is utilized by each participant, where “P#” identifies a par-
ticipant and a filled circle indicates the utilization of an approach.
Table 5.4: TD prioritization approaches utilized by participating software practitioners
P# Value Cost Value and cost Cumulative
Total 22 1 66 4
Additionally, Figure 5.6 depicts a breakdown of the three patterns based on their utilization by the
participants. The majority of the participants aimed to balance the trade-off between value and cost in
their prioritization, while a smaller number of the participants prioritized TD items based on each TD item’s
associated value, aiming to repay TD items with higher values first. Lastly, only one participant prioritized
TD items based on cost; the participant favored repaying lower-cost TD items first.
Figure 5.6: Breakdown of TD prioritization approaches’ patterns utilized by participants
The patterns are described below in order of their frequency of utilization from highest to lowest; in
addition, a discussion of context-switch overhead is presented.
TD prioritization approaches’ patterns
– Value and cost: The majority of the participants (i.e., 66 participants) accounted for the trade-off be-
tween value and cost and sought to balance these two factors during their prioritization. Participants who
account for the value and cost of each TD item believe that one should take into account the overall quality
of a system and distribute the available resources in a manner that improves the total obtained value of
a given repayment activity. They also believe that one should not spend all of the time designated for a
given repayment activity on the highest value TD item if it were to consume all of the allocated time. On
the contrary, these participants deduced that repayment activities should aim to optimize the allotted time
for repayment and seek to improve the net value of the repayment activity, which can be achieved through
considering the economic value of each TD item and being as cost-effective as possible. Furthermore,
participants in this set also believe that solely depending on the value of TD items is not realistic, as TD
repayment activities are almost always highly constrained. Therefore, individuals should be greedy when
spending the limited, available repayment time to maximize the overall value achieved. To attain the goals
mentioned above, participants applied a range of techniques, including but not limited to cost-benefit
analysis (CBA), WSM, and sorting (a short illustrative sketch of one such technique, a ratio-based greedy selection, follows this list of patterns).
– Value: A smaller number of participants (i.e., 22 participants) prioritized the identified TD items
based solely on the value of each TD item. These participants believe that a repayment activity should
first tackle the TD item with the highest value and that all available resources should be utilized to address
the highest value TD items regardless of their associated cost. For example, if a TD item were to have the
highest value and require 360 minutes to repay, all of the available 360 minutes should be utilized to repay
that TD item, even if repaying other TD items might provide more overall improvement to the quality of the
given system. The rationale of the value-oriented set of participants is that if a TD item is valuable, then it
should be repaid first no matter the cost.
– Cost: Only one participant (i.e., participant P78) prioritized TD items based solely on their repay-
ment cost. The participant calculated the total time required to repay all the TD items in each system’s file.
Subsequently, the participant sorted the files in increasing order based on the calculated total time required
to repay all the TD contained in each file. The participant then selected files to repay based on the generated
sorted list of files. The participant believed that starting with files that have lower cost reduces the number
of overall TD items in the system and reduces the overall number of tasks that the participant is supposed
to complete, which may have a positive impact on the participant’s morale and overall productivity.
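As referenced above, one simple way to balance value and cost under the 360-minute constraint is to sort TD items by value-to-cost ratio and select greedily until the budget is exhausted. The following sketch illustrates this idea; the item values and costs are hypothetical, and the sketch is not the exact technique of any particular participant.

# Hypothetical greedy selection of TD items by value-to-cost ratio under a
# 360-minute repayment budget, illustrating one way to balance value and cost.
items = [
    {"id": "td1", "value": 40, "cost": 60},
    {"id": "td2", "value": 15, "cost": 10},
    {"id": "td3", "value": 90, "cost": 240},
    {"id": "td4", "value": 25, "cost": 120},
]
BUDGET = 360  # minutes

selected, spent = [], 0
for item in sorted(items, key=lambda i: i["value"] / i["cost"], reverse=True):
    if spent + item["cost"] <= BUDGET:
        selected.append(item["id"])
        spent += item["cost"]
print(selected, spent)  # ['td2', 'td1', 'td3'] 310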
Context-switch overhead
Lastly, it is important to note that when prioritizing TD items, four participants considered the context-
switch overhead resulting from alternating between different files. These participants cumulatively pri-
oritize TD items in a file, which means that the participants prioritize each file instead of each TD item.
Column “Cumulative” in Table 5.4 identifies these participants, where a filled circle indicates that a par-
ticipant prioritizes files. Conversely, the majority of the participants (i.e., 85 participants) consider each
individual TD item independently, without taking the context-switch overhead into account.
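A minimal sketch of this cumulative, file-level view: TD items are first aggregated per file, and the files, rather than the individual items, are then ranked. The file names, values, costs, and the ratio-based ordering below are hypothetical illustrations.

# Sketch of cumulative, file-level prioritization: aggregate TD items per file
# and rank files instead of individual items.
from collections import defaultdict

items = [
    {"file": "a.py", "value": 10, "cost": 30},
    {"file": "a.py", "value": 5, "cost": 15},
    {"file": "b.py", "value": 20, "cost": 120},
]

per_file = defaultdict(lambda: {"value": 0, "cost": 0})
for item in items:
    per_file[item["file"]]["value"] += item["value"]
    per_file[item["file"]]["cost"] += item["cost"]

# One possible ordering: rank files by total value-to-cost ratio.
ranked = sorted(per_file.items(),
                key=lambda kv: kv[1]["value"] / kv[1]["cost"], reverse=True)
print(ranked)  # [('a.py', {'value': 15, 'cost': 45}), ('b.py', {'value': 20, 'cost': 120})]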
5.4 Discussion
This section discusses the results presented in the previous section. The discussion revolves around the
following topics: valuation of TD items; severity of TD items; rule violations of TD items; security issue,
defect, or TD item; and no TD prioritization approach is a silver bullet.
Valuation of TD items
This study revealed that participants employ a wide range of methods to estimate values of various TD
items. In particular, the questionnaire uncovered that not only do valuation methods vary among partic-
ipants, but they also vary from one situation to another. Moreover, as indicated by the questionnaire, a
participant may value a given TD item differently at various points in time depending on the situation at
hand, which aligns with this dissertation’s SLR and previous TD research findings that the value of a
TD item is context-dependent [26, 29, 42, 95, 141, 191, 207]. The identified variations in the utilized TD
valuation methods can serve as motivation for the software engineering community to develop TD prior-
itization approaches that are independent of valuation methods and are capable of adapting to any given
valuation method.
Severity of TD items
As mentioned previously, SonarQube categorizes TD items based on their severity scale from lowest to
highest as: info, minor, major, critical, blocker. The severity of each TD item is determined by SonarQube
team’s perception of the item’s probable impact on future maintenance activities. However, the results of
the questionnaire divulge that participants have conflicting perceptions of SonarQube severity categoriza-
tion. A few participants relied heavily on SonarQube categorization by prioritizing TD items based solely
on severity without considering the associated cost. They prefer to first repay all of the blocker TD items
prior to proceeding to the lower level severity items. A separate set of participants perceive SonarQube
severity categories only as probabilities since TD generally only has an impact on the internal quality of
a system and lacks a direct impact on a system’s functionality. They instead took into consideration the
overall obtained value of various TD repayment activities. For example, participants may prefer repaying
two TD items that have critical severities rather than repaying one blocker TD item if repaying the two
critical TD items will obtain a higher combined value than that of the blocker TD item and they can be repaid
at a cost less than or equal to that of repaying the blocker. The rationale of these participants is as follows:
repaying two critical TD items will provide more value and will improve the overall quality of a system in
comparison to repaying the blocker TD item. The findings presented above align with this dissertation’s
SLR’s findings, as the review found that some approaches focus solely on addressing the highest TD value
item first, while other approaches aim to maximize the overall value.
Rule violations of TD items
SonarQube detects TD items by comparing each piece of code to a set of coding and quality rules. The
details of each identified TD item include a description of the rule that the TD item has violated. The
results revealed that a few participants find rule descriptions to be an invaluable resource when prioritizing
TD. These participants utilize rules in their TD item valuation methods based on the participants’ personal
perceptions of the importance of each rule. Furthermore, these participants indicated that rules might have
different values at different times, as the values are subject to change based on the objective of a repayment
activity. For example, if the goal of a repayment activity is to improve the understandability of code, any
rule that is associated with cognitive complexity would be given a higher value. Additionally, this set of
participants have mentioned that one should consider the hidden implications of repaying some TD items.
They believe that rules may be a good indicator of such implications. For example, a participant mentioned
a TD item whose repayment only requires renaming a variable. Though the cost of repaying such a TD
item is relatively small, it has a ripple effect on testing that requires significant, costly changes. Moreover,
the participant stated that in some cases they have neither access nor authorization to change test cases,
so the participant will immediately exclude these TD items from the prioritization process. The findings
above suggest that human involvement is a crucial element in the TD prioritization process. While auto-
mating the TD prioritization process may save time and effort, one should not aim to eliminate human
involvement. A TD prioritization approach should allow room for a practitioner’s input and intervention,
such as allowing a practitioner to exclude TD items from the prioritization process.
Security issue, defect, or TD item
The results of the questionnaire uncovered that several participants deem a few of the TD items as secu-
rity issues or defects that need to be addressed and repaid prior to release. From past experiences, the
participants realize the consequences and implications of such TD items. As a result, said participants
deem that these TD items are to be addressed and removed from the system before production regardless of
their associated repayment costs. An example of a TD item that was included in the set mentioned pre-
viously is the return of null from a boolean method. Another example is the usage of wildcard imports,
as this may cause unexpected system behavior. It should be noted that the observation above is not just
specific to SonarQube and is based on the participants’ overall experience when using various static code
analysis tools. The findings above emphasize the importance of human involvement in TD prioritization
approaches. TD prioritization approaches should grant software practitioners the freedom to exclude TD
items from the prioritization process, as practitioners may be planning to repay a given TD item regardless
of the circumstances at hand.
No TD prioritization approach is a silver bullet
The results demonstrated that there is no silver bullet approach that addresses the various needs of software
practitioners when prioritizing TD. A few participants focused their efforts on addressing the most valuable
TD items regardless of their associated cost. In contrast, one participant focused their prioritization method
on TD items that are cheaper to repay regardless of the obtained value. This participant was cost-conscious
and aimed to reduce the number of instances of TD items in a system without considering the potential
returned value. Other participants’ objectives were instead to utilize the scarce resources allocated for
repayment activities and maximize the ROI of a given repayment activity. Specifically, they aimed to
balance the trade-off between the value and cost of TD items and select TD items based on that balance.
It is important to note that this study does not aim to support nor undermine any TD prioritization
approach. On the contrary, it aims to identify patterns in TD prioritization approaches utilized by software
practitioners without deeming an approach as right or wrong. Adopting any approach depends on the dif-
ferent needs of software practitioners. The results of this study should motivate the software engineering
community to continue supporting the various needs and objectives of software practitioners through de-
veloping more TD prioritization approaches that aid and support the repayment process without focusing
on one specific perspective. The results should also encourage software practitioners to be mindful of the
situation at hand when deciding which prioritization approach to apply.
5.5 Threats to Validity
This section discusses several threats to the validity of this study and their corresponding mitigation strate-
gies while following the guidelines in [194, 195]. The identified threats to validity are categorized into the
following three types: construct, internal, external.
Construct validity
The construct validity of a study reflects the degree to which the study assesses what it intends to study
[194, 195]. A potential threat to the construct validity of this study is the accuracy of the measurement of
TD. Aiming to mitigate this threat, this study relied on SonarQube, which identifies and measures TD.
SonarQube is the only OSS tool that identifies and measures TD while also being widely utilized in indus-
try and OSS communities alike [57, 140]. Additionally, another potential threat to the construct validity
of this study is the varying interpretations of value and cost among participants. This threat was mitigated
by conducting a questionnaire and subsequently a follow-up interview with each participant to ensure a
mutual understanding of value and cost. Another potential threat is the possibility that participants may
misinterpret questions in the questionnaire. However, to mitigate this threat, a pilot study was conducted
to identify potential shortfalls. After consulting with the pilot study participants, the questionnaire was
updated to reflect the feedback provided by them. The follow-up interviews with the study participants
also ensured a mutual understanding among researchers and participants.
Internal validity
The internal validity of a study refers to the extent to which the results are derived from the data [194, 195].
It is also concerned with assessing the meaningfulness of the utilized metrics to the acquired conclusions
and the adequacy of the employed measurements. This study relied on SonarQube to identify and mea-
sure TD, instead of using manual methods and human judgment, to avoid subjectivity and error since
using SonarQube ensures that all included subject systems were assessed with a unified set of rules. An
additional potential threat to the internal validity of this study involves the analysis of the study’s data
and its interpretation. This study followed the software engineering thematic synthesis guidelines [72]
to eliminate this threat. The initial list of codes was developed independently by two researchers (i.e., I
collaborated with one computer science master’s student to develop the list of codes and map the data)
and then verified through joint discussions. Furthermore, two additional researchers (i.e., two computer
science master’s students) independently analyzed and mapped the data. After analyzing and mapping the
data, the two researchers and one of the previously mentioned researchers (i.e., I participated in this step
with the two master’s students) held a joint meeting to revise and validate the results until a consensus was
reached.
External validity
The external validity of a study refers to the generalizability of its findings [194, 195]. This study aims
to unveil and explore patterns in TD prioritization approaches utilized by software practitioners without
aiming to draw any generalizable conclusions. As such, neither the comprehensiveness nor the generaliz-
ability of the findings can be claimed. It is acknowledged that the scope of the study in terms of systems
and participants is smaller than the population of all software systems and software practitioners, and there
may exist other TD prioritization patterns that this study failed to identify.
However, a broad set of sources for opinions regarding practitioners’ experience and team roles was
sought out. Furthermore, the limited pool of participants was due to the difficulty of convincing
companies to spare a portion of their software practitioners’ working hours and to reveal their internal
workings. To expand the limited pool of participants, the scope of the study was expanded by including
students of a master’s level course. The course mimics real-world development settings aiming to provide
students with a real-world software engineering experience. Though many of the participants are students,
the majority of participating students have experience in the software engineering industry. An additional
external validity threat is this study’s inclusion of only one industry system. Acquiring access to
industry systems was challenging due to legal and intellectual property concerns. This threat was mitigated
by making use of an OSS system suggested by one of the participating companies in addition to expanding
the exploration scope of the study by using the master’s level course’s projects, which are used in real-
world settings.
5.6 Summary
There has been a lack of empirical evidence concerning TD prioritization research in real-world settings,
as evidenced by this dissertation’s SLR and multiple studies [26, 95, 141, 191]. A lack of empirical evi-
dence can lead to a decrease in credibility of research and hinder future research on a selected topic [64].
To address this shortage in the TD prioritization literature concerning empirical evidence, this chapter
explores the mindsets and behaviors of software practitioners during TD prioritization in the presence of
a resource constraint. A controlled experiment was conducted with 89 software practitioners who were
requested to select TD items to repay under a resource constraint and, subsequently, tasked with complet-
ing a questionnaire, in which they detailed their approaches and rationales. The results revealed that the
majority of the participants balanced the trade-off between value and cost when they selected TD items to
repay. In contrast, a smaller number of the participants prioritized TD items based solely on value aiming
to repay higher value TD items first, and only one participant prioritized TD items based on their estimated
repayment cost by prioritizing lower cost TD items. The results also unveiled that the majority of the par-
ticipants rely on the cost estimates provided by static analysis tools, and there is no one particular method
to estimate TD items’ values. Furthermore, the questionnaire revealed that not only do valuation meth-
ods differ from one participant to another, but individual participants may also utilize varying valuation
methods depending on the current situation at hand when prioritizing TD.
The findings above suggest that there is no silver bullet approach to TD prioritization that suits all
software practitioners’ desires and needs. In addition, the results imply that TD prioritization is complex,
challenging to perform, and variable depending on the environment in which it is conducted. Given that
there is no “one size fits all” approach to TD prioritization, one should look to incorporate multiple value
and cost estimation methods when developing a TD prioritization approach, as the value and cost of a given
TD item can be subjective and context-dependent. Incorporating the ability to utilize multiple value and
cost estimation methods in an approach grants practitioners the ability to apply said approach in a greater
number of circumstances. In addition to diversifying the estimation aspects of a TD prioritization approach,
the software engineering community should also look to continue supporting the different requirements and
objectives of software practitioners through the development of more TD prioritization approaches without
concentrating on a specific prioritization perspective. Using the conclusions of this study as guidelines for
the future, researchers should look to progress the field by creating new TD prioritization approaches and
providing the necessary flexibility in each new one.
Chapter 6
A Search-Based Approach for Technical Debt Prioritization
In this chapter, the technical debt (TD) prioritization approach and its evaluation are presented. TD pri-
oritization is the process of deciding which TD items are to be repaid in a given repayment activity and
which TD items are to be delayed until later releases based on specific, predefined rules to support the
decision [141]. Unfortunately, previous research [29,141,191] and the systematic literature review (SLR),
in Chapter 4, have indicated a scarcity of TD prioritization approaches in general and a lack of priori-
tization approaches that specifically account for a resource constraint during their prioritization process.
Additionally, the currently researched TD prioritization approaches have some limitations, such as a lack
of industry evaluation. To address these limitations, this dissertation introduces a novel search-based ap-
proach for prioritizing TD. The approach specifies which TD items should be repaid to maximize the value
of a repayment activity while minimizing its cost and satisfying its resource constraint. The approach is
not dependent on any value or cost estimation methods, and it is flexible enough to work with any TD type as long
as the value and cost are precalculated.
6.1 Problem Formulation
In this section, the TD prioritization problem is formulated, and a multi-objective TD formulation for the
problem is presented.
6.1.1 TD prioritization problem model
In a TD prioritization problem, there is a set of independent TD items which are considered to be repaid in
a TD repayment activity. The set is represented as:
TD = {td_1, td_2, ..., td_n}
Each TD item has an associated value. The value represents the benefit obtained from repaying this TD
item. The resulting value vector, for the set of TD items TD, is denoted by:
Value = {value_1, value_2, ..., value_n}
Additionally, each TD item is associated with a cost. The cost represents the resources needed to repay a
particular TD item. The resulting cost vector, for the set of TD items TD, is denoted by:
Cost = {cost_1, cost_2, ..., cost_n}
Furthermore, the repayment activity has limited allocated resources, which can be transformed and defined
as cost. The resource constraint is represented as RC. Moreover, the decision vector x̃ = {x_1, x_2, ..., x_n} ∈
{0, 1}^n determines which TD items will be repaid in the current repayment activity. In this vector, x_i is 1 if
td_i is selected to be repaid and 0 otherwise.
6.1.2 Multi-objective TD prioritization problem formulation
In the formulation of the TD prioritization problem, there are two conflicting objectives: maximizing value
and minimizing cost of a TD repayment activity. Cost is treated as an objective to explore the whole set
of solutions in the Pareto-optimal solution set. Having the Pareto-optimal solution set is valuable since it
allows decision makers to evaluate the trade-offs and balance the two conflicting objectives. The following
objective function is considered for maximizing total value:
Maximize Σ_{i=1}^{n} value_i · x_i
The second objective function is to minimize the total cost required for repaying the TD items:
Minimize Σ_{i=1}^{n} cost_i · x_i
In order to convert the first objective to a minimization problem, the total value is multiplied by -1.
Therefore, the fitness functions are represented as follows:
Minimize f_1(x̃) = - Σ_{i=1}^{n} value_i · x_i
Minimize f_2(x̃) = Σ_{i=1}^{n} cost_i · x_i
Also, cost is added as a constraint to limit the solution set to solutions that are feasible within the resource
constraint:
subject to: Σ_{i=1}^{n} cost_i · x_i ≤ RC
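The formulation above maps directly to code. The following is a minimal Python sketch of the two fitness functions and the feasibility check; the value, cost, and RC data are hypothetical placeholders.

# Minimal sketch of the multi-objective formulation: a candidate solution is a
# 0/1 decision vector x over n TD items; f1 is the negated total value,
# f2 is the total cost, and feasibility requires the total cost not to exceed RC.
value = [40, 15, 90, 25]   # value_i of each TD item (hypothetical)
cost = [60, 10, 240, 120]  # cost_i of each TD item, in minutes (hypothetical)
RC = 360                   # resource constraint (hypothetical)

def f1(x):  # minimize the negated total value
    return -sum(v * xi for v, xi in zip(value, x))

def f2(x):  # minimize the total cost
    return sum(c * xi for c, xi in zip(cost, x))

def feasible(x):
    return f2(x) <= RC

x = [1, 1, 0, 1]  # repay the first, second, and fourth TD items
print(f1(x), f2(x), feasible(x))  # -80 190 True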
6.2 Approach
As described in Section 6.1, TD prioritization is a decision problem in which there are two conflicting
objectives that need to be balanced and a resource constraint that must be satisfied. Deciding which
TD items to include in a repayment activity is complicated by several challenges. The first challenge is
that value and cost are typically conflicting objectives, as improvement in value tends to have adverse
consequences on cost. The second challenge is that the decision space is exponential to the problem size,
as each TD item has two options: inclusion or exclusion from a repayment activity, which can result in
a colossal decision space. Finally, repayment activities tend to have a resource constraint that must be
satisfied. This resource constraint imposes a challenge since it restricts the decision space to solutions that
satisfy it.
Three insights into these challenges guide the design of the approach. The first insight is that it is
possible to consider the TD prioritization problem as a multi-objective optimization (MOO) problem in
which there exist many algorithms and techniques to solve it. In this type of problem, there is no one
optimal solution; rather, there is a set of potential optimal solutions with varying degrees of trade-off
between the objectives. This set of optimal solutions assists decision makers in the decision process since
it provides them with an enumeration of these trade-offs. The second insight is that it is possible to consider
this problem as a form of the canonical restricted 0-1 knapsack problem, which is known to be NP-hard
in its optimization form. Despite the complexity of the restricted 0-1 knapsack problem, researchers have applied
many evolutionary algorithms and constraint-handling techniques to address it [116,168]. Lastly, the third
insight is that while finding an optimal solution might be hard given the exponential nature of the problem,
a near-optimal solution might be sufficient since the objective is to maximize the obtained value while
minimizing cost and satisfying the resource constraint.
Figure 6.1 provides an overview of the approach. The inputs to the approach are a resource constraint
and a list of TD items that includes each item, its value, and its cost. The TD items can be provided either
automatically, such as using static code analysis tools [1,10], or manually using human estimation [109] or
solution comparisons [92]; automatically providing TD items lowers workload on the associated software
practitioners when prioritizing TD. The approach applies the non-dominated sorting genetic algorithm-
II (NSGA-II), which is explained in Section 3.6.4, on the TD list to generate the prioritized solution set. It
begins with an initialization process, where the initial solution population of size N is generated. It then
evaluates and ranks each solution in the initial population using an evaluation function. Next, it enters a
loop of applying genetic operators to select parent individuals from the population then applying crossover
and mutation operators on the selected parents to generate an offspring population. Finally, it replaces the
current population using a replacement strategy in which it combines the parent and offspring populations
and selects the best N solutions to form the next population. The selection criteria are
based on the dominance ranking and crowding distance. This loop is repeated until some termination
conditions are met. When the search terminates, the set of Pareto-optimal solutions, which represents
the TD prioritized solutions discovered during the search, is outputted. This includes the value of each
solution, its cost, and the set of TD items selected to be repaid. Along with that, the Pareto-front is also
graphed to facilitate the interpretation and evaluation of alternative solutions. The steps of the approach
are explained in more detail in the following subsections.
Figure 6.1: Approach overview
6.2.1 Representation
In the TD prioritization problem, each solution is a decision vector, as noted in Section 6.1.1. Each decision
vector will be represented as a binary string. The binary string length is equal to the number of TD items,
and each bit represents a TD item. The value of a bit indicates whether the TD item is included. A bit
value equal to one implies that this TD item is selected in the repayment activity. For example, if there are
five TD items, there will be a binary string with length five: V_1 = (10000), and only the first item would
be repaid in this solution. Each bit has two possible values (0 or 1), which results in 2^5 (i.e., 32) potential
solutions. As a result, the search space contains 32 solutions.
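A small sketch of this representation, assuming a solution is stored as a Python list of bits; it reuses the five-item example above.

# Binary-string representation of a TD prioritization solution.
n = 5
solution = [1, 0, 0, 0, 0]  # V_1 = (10000): only the first TD item is repaid

repaid_indices = [i for i, bit in enumerate(solution) if bit == 1]
print(repaid_indices)  # [0]
print(2 ** n)          # 32 candidate solutions in the search space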
6.2.2 Initial population
This step creates an initial population of candidate solutions that the approach performs the search on. The
step aims to create a feasible and diverse initial population that allows the search to explore different areas
of the solution space [90]. The quality of the initial population, generally, plays a vital role on the perfor-
mance of the search since all the populations in the iterative search process depend on it, and it influences
the final solution [38, 126, 149]. However, the size of the search space and the resource constraint of the
TD prioritization problem impose a challenge in generating a high-quality, initial population. The chal-
lenge arises from the difficulty in searching a vast search space to locate feasible solutions with acceptable
quality; there is a very low probability that only a random generator will create such solutions, which will
prevent the algorithm from converging [38, 126, 149]. Therefore, to achieve the highest possible chance
of solution feasibility and diversity in the initial population, the approach seeds the initial population with
greedy solutions, which increases the chance of the search converging on highly-fit solutions [103, 114, 181].
To do so, it applies a greedy initialization algorithm to generate the initial population [168], as Algorithm 2
demonstrates.
The approach only seeds half of the initial population with greedy solutions, as recommended by [181],
to avoid decreasing the exploration of the search and lowering the quality of the solutions resulting from
the excessive use of heuristic solutions in the initial population [103,181]. The remaining half of the initial
population will be randomly generated.
The algorithm divides the problem resource constraint value by half of the population size to obtain
an increment value that determines a local constraint capping cost in each iteration, with the aim of producing
an initial population that is uniformly distributed over the feasible region [149]. In each iteration, the
algorithm aims to include the most attractive solution in the feasible region [149], and it follows a greedy
strategy in order to accomplish that. The algorithm begins by sorting the TD item list in decreasing order
of value to cost ratio. In each iteration, it clones the sorted list and omits any item that violates the local
constraint from the cloned list. Next, it orderly adds the remaining items to the current solution until no
other item can be added without violating the local constraint. It then adds the current solution to the initial
population if the solution does not exist in the population. Afterward, it increases the local constraint by
the increment. The algorithm iterates until half the population size or the problem resource constraint is
reached.
When the iteration terminates, the rest of the solutions in the initial population will be randomly gener-
ated by assigning a value of 0 or 1 to each bit in the binary string. If a resulting solution violates the resource
constraint, it will be repaired using the greedy-repair function, described in the constraint-handling step
below.
Algorithm 2 Greedy initialization
Input: TD : TD item list
N : Population size
RC : Resource constraint
Output: P : An initial population of size N
1: increment ← RC / (N/2)
2: localConstraint ← 0
3: sort(TD) // sort TD items in decreasing order of value-to-cost ratio
4: while P.size() < N/2 and localConstraint ≤ RC do
5:   S ← ∅ // a new solution instance in which none of the TD items is included
6:   tempTDItems ← TD
7:   RMNG_Cost ← localConstraint
8:   tempTDItems.remove(RMNG_Cost) // remove from the list any TD item whose cost exceeds the remaining cost
9:   while tempTDItems.notEmpty() do
10:    TDItem ← tempTDItems.remove()
11:    S.add(TDItem) // include the TD item in the repayment activity
12:    RMNG_Cost ← RMNG_Cost − TDItem.Cost()
13:    tempTDItems.remove(RMNG_Cost)
14:  end while
15:  if P.notContain(S) then
16:    P.add(S)
17:  end if
18:  localConstraint ← localConstraint + increment
19: end while
20: P.addRandomSolutions(N − P.size())
21: return P
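A compact Python rendering of the greedy half of Algorithm 2 is sketched below; it assumes each TD item is simply a (value, cost) pair, omits the randomly generated half of the population, and is a simplification rather than the exact implementation used by the approach.

# Sketch of the greedy seeding in Algorithm 2: solutions are built greedily
# under a growing local cost cap so that the seeds spread over the feasible region.
def greedy_seeds(items, population_size, rc):
    increment = rc / (population_size / 2)
    local_constraint = 0.0
    # Item indices sorted by decreasing value-to-cost ratio.
    order = sorted(range(len(items)),
                   key=lambda i: items[i][0] / items[i][1], reverse=True)
    seeds = []
    while len(seeds) < population_size / 2 and local_constraint <= rc:
        solution = [0] * len(items)
        remaining = local_constraint
        for i in order:
            if items[i][1] <= remaining:  # the item still fits under the local cap
                solution[i] = 1
                remaining -= items[i][1]
        if solution not in seeds:
            seeds.append(solution)
        local_constraint += increment
    return seeds

items = [(40, 60), (15, 10), (90, 240), (25, 120)]  # hypothetical (value, cost) pairs
print(greedy_seeds(items, population_size=8, rc=360))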
6.2.3 Evaluation function
An evaluation function determines how fit a solution is to the problem in consideration. It rates the quality
of a solution based on its objective functions [47, 167]. In the TD prioritization problem, there are two
objective functions, as indicated in Section 6.1.2. Therefore, the evaluation function has two fitness func-
tions. The first fitness function, to be maximized, is the value function. Given a solution S, represented as
a binary string V, it calculates its value based on the following equation:
f_1(S) = Σ_{i=1}^{n} value_i · v_i
However, to make the Pareto-front representations of the solution more intelligible, the approach trans-
forms this objective into a minimization objective by multiplying the solution’s total value by -1. Accord-
ingly, the fitness function to be minimized is:
f_1(S) = - Σ_{i=1}^{n} value_i · v_i
The second fitness function to be minimized is the cost function. Given a solution S, represented as a
binary string V, it computes its cost based on the following equation:
f_2(S) = Σ_{i=1}^{n} cost_i · v_i
6.2.4 Genetic operators
In order to evolve solutions, the approach uses the following genetic operators: selection, crossover, and
mutation:
Selection
The goal of the selection step is to determine which solutions will be passed to the crossover step to create
offspring solutions for the next generation [47]. There are many selection operators, such as linear ranking,
exponential ranking, truncation, proportional, and tournament selection [47].
The approach uses the binary tournament selection described by Deb et al. [81] because it proved its
effectiveness in the original implementation of NSGA-II, and it works with negative fitness values [81].
A binary tournament selection randomly selects two solutions from a population and holds a competition
between the two selected solutions. The winner of a tournament is the best solution in terms of non-
dominance ranking. If the two solutions have the same non-dominance rank, then the winner is the solution
with higher crowding distance. If both solutions are winners, one of them is returned with equal probability.
The approach applies this step two times in order to select two parents that will be passed to the crossover
operator.
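A minimal sketch of binary tournament selection, under the assumption that each candidate already carries the non-dominance rank and crowding distance computed by NSGA-II; the candidate data shown is hypothetical.

# Sketch of binary tournament selection: pick two candidates at random and keep
# the one with the better (lower) non-dominance rank; break ties by the larger
# crowding distance; resolve full ties with equal probability.
import random

def binary_tournament(population):
    a, b = random.sample(population, 2)
    if a["rank"] != b["rank"]:
        return a if a["rank"] < b["rank"] else b
    if a["crowding"] != b["crowding"]:
        return a if a["crowding"] > b["crowding"] else b
    return random.choice([a, b])

population = [
    {"bits": [1, 0, 1], "rank": 0, "crowding": 0.7},
    {"bits": [0, 1, 1], "rank": 1, "crowding": 0.9},
    {"bits": [1, 1, 0], "rank": 0, "crowding": 0.2},
]
parent1, parent2 = binary_tournament(population), binary_tournament(population)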
Crossover
The crossover operator is used to vary solutions from one generation to the next. It is an analogy to
the biological reproduction crossover. In this operator, more than one parent is selected, and offspring
solutions are produced by merging the two parents into one or more offspring solutions [90]. There are
different types of crossover operators for binary strings, such as single-point [171, 228], two-point [228],
uniform [171, 228], and half-uniform (HUX) crossover [93].
The approach applies the HUX operator to the two selected parents. This ensures that the offspring so-
lutions are equidistant between the two parents, which serves as a diversity-preserving mechanism [93]. In
this operator, half of the non-matching bits are randomly swapped between the two parents. The Hamming
distance [150] (i.e., the number of differing bits) is calculated and it is divided by two. The result indicates
how many of the non-matching bits between the two parents will be exchanged, and these bits are chosen
randomly. The operator is applied to the two parents based on a randomly calculated probability. If the
probability limit is reached, then crossover occurs. Otherwise, the resulting offspring solutions are simply
copies of the parents. The two resulting offspring solutions will be passed to the mutation operator.
For example, consider two solutions: V_1 = (11001) and V_2 = (10110). The approach generates a
random number in the range [0, 1]. If the generated number is less than or equal to the crossover rate, then
the operator will be applied. In this example, the crossover rate was set to 1.0, so the crossover operator
will always be applied. The Hamming distance equals 4. As a result, 2 bits will be exchanged between
the two parents. These bits are randomly selected from the non-matching bits and are indicated by “∗”:
V_1 = (1∗00∗) and V_2 = (1∗11∗). This results in two offspring solutions, which are: V_1′ = (10000) and
V_2′ = (11111).
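A minimal sketch of the half-uniform crossover described above, including the crossover-rate check; the parent bit strings reuse the example values, and the function name is an assumption made for illustration.

# Sketch of half-uniform crossover (HUX): swap a random half of the
# non-matching bits between two parent bit strings.
import random

def hux(parent1, parent2, crossover_rate=1.0):
    child1, child2 = parent1[:], parent2[:]
    if random.random() > crossover_rate:
        return child1, child2  # no crossover: offspring are copies of the parents
    differing = [i for i in range(len(parent1)) if parent1[i] != parent2[i]]
    for i in random.sample(differing, len(differing) // 2):
        child1[i], child2[i] = child2[i], child1[i]
    return child1, child2

c1, c2 = hux([1, 1, 0, 0, 1], [1, 0, 1, 1, 0])
print(c1, c2)  # e.g. [1, 0, 0, 0, 0] [1, 1, 1, 1, 1]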
Mutation
The mutation operator aims to introduce and maintain diversity in the population. It is an analogy to the
biological mutation. The operator alters a solution by perturbing it, and it is usually applied with a low
probability. There are several mutation operators for binary strings, including interchanging, reversing,
and flipping [171].
The approach uses the bit-flipping mutation operator because it is the most effectively employed op-
erator for binary strings [171]. This operator acts independently on each bit in a solution and switches its
value from 1 to 0 and vice versa, with a mutation rate that indicates the probability that a bit is flipped.
For example, consider solution V1' = (10000). The approach generates a random number in the range
[0, 1] for each bit: R = {0.3, 0.7, 0.9, 0.6, 0.1}. If the value of the random number associated with a bit is
less than or equal to the mutation rate, the bit will be flipped. In this example, the mutation rate = 0.2.
Consequently, only the last bit will be flipped, since 0.1 ≤ 0.2. This results in V1'' = (10001).
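A minimal Java sketch of bit-flipping mutation is shown below; it is illustrative only and assumes the same boolean-array representation used in the crossover sketch above.

import java.util.Random;

// A minimal sketch of bit-flipping mutation: each bit is flipped independently
// with probability equal to the mutation rate.
class BitFlipMutation {
    private final Random random = new Random();
    private final double mutationRate;

    BitFlipMutation(double mutationRate) {
        this.mutationRate = mutationRate;
    }

    boolean[] apply(boolean[] solution) {
        boolean[] mutated = solution.clone();
        for (int i = 0; i < mutated.length; i++) {
            if (random.nextDouble() <= mutationRate) {
                mutated[i] = !mutated[i];  // flip the bit
            }
        }
        return mutated;
    }
}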
6.2.5 Constraint handling
In most cases, the TD prioritization problem is severely constrained, as the majority of the search space
is infeasible. This makes it challenging for the NSGA-II to find a feasible solution. In the literature,
there are many constraint-handling techniques, which can be categorized into three categories: penalty,
elimination, and repair [166–168]. In the penalty technique, solutions are generated without regard to
constraints, and the infeasible solutions are penalized in the evaluation function by degrading their
fitness. However, this technique has several drawbacks. First, the search algorithm will spend the majority
of its time evaluating infeasible solutions. Second, selecting the appropriate penalties is challenging. If
moderate penalties are imposed, the algorithm might converge on infeasible solutions since they might
have better fitness values than the feasible ones. On the contrary, if one imposes a high penalty and a
feasible solution is found, the algorithm might converge on this particular solution without considering
other feasible solutions. The algorithm might fail to consider other feasible solutions because generating
these feasible solutions might depend on the infeasible ones, and the high penalties make selecting the
infeasible solutions for reproduction unlikely [168, 192]. The elimination technique applies the “death
penalty” by simply eliminating all the infeasible solutions from the population. This technique also has its
drawbacks, as it wastes time evaluating infeasible solutions that will not contribute to any generated
population [35, 168]. Lastly, the repair technique applies a special repair function to transform any infeasible
solution into a feasible one. When designing the TD prioritization approach, a repair technique was chosen
because repair techniques have been empirically shown to outperform other constraint-handling techniques
when applied to constrained, combinatorial-optimization problems [116, 166–168, 180]. Specifically, the
approach utilizes a greedy-repair technique [168], as illustrated by Algorithm 3.
The algorithm aims to repair an infeasible solution by eliminating expensive TD items that contribute
the least value. The algorithm starts by sorting the TD items included in a solution in increasing order
of value-to-cost ratio. It then removes TD items, in that order, from the sorted list and the solution until the
cost of the solution satisfies the resource constraint. Resource constraint violations may occur in
the process of generating random solutions and new offspring solutions. To guarantee the exclusion of
infeasible solutions, the approach checks the feasibility of solutions after these steps and applies this repair
function to the infeasible solutions.
Algorithm 3 Greedy repair for infeasible solutions
Input: S: a prioritized TD solution; RC: resource constraint
Output: S': a repaired prioritized TD solution
1: constraintViolated ← true
2: S' ← S
3: includedItems ← S'.getIncludedTD()
4: sort(includedItems) // sort the included TD items in increasing order of value-to-cost ratio
5: while constraintViolated do
6:   TD ← includedItems.remove() // remove an item from the list based on the list order
7:   S'.remove(TD) // remove the item from the solution
8:   if S'.getCost() ≤ RC then
9:     constraintViolated ← false
10:  end if
11: end while
12: return S'
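The following Java sketch mirrors Algorithm 3 under assumed data structures (a TDItem with a value and a cost, and a solution represented as the list of its included items); it is an illustrative rendering, not the tool's exact implementation.

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// An illustrative Java rendering of the greedy repair in Algorithm 3.
// TDItem and the list-based solution representation are assumptions for this sketch.
class TDItem {
    final double value;
    final double cost;

    TDItem(double value, double cost) {
        this.value = value;
        this.cost = cost;
    }
}

class GreedyRepair {
    // Removes the items with the lowest value-to-cost ratio until the
    // total cost of the included items satisfies the resource constraint.
    static List<TDItem> repair(List<TDItem> includedItems, double resourceConstraint) {
        List<TDItem> repaired = new ArrayList<>(includedItems);
        // Sort in increasing order of value-to-cost ratio.
        repaired.sort(Comparator.comparingDouble((TDItem item) -> item.value / item.cost));

        double totalCost = repaired.stream().mapToDouble(item -> item.cost).sum();
        while (totalCost > resourceConstraint && !repaired.isEmpty()) {
            TDItem removed = repaired.remove(0);  // drop the least valuable item per unit of cost
            totalCost -= removed.cost;
        }
        return repaired;
    }
}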
6.2.6 Termination
The algorithm has one termination condition: when the total number of fitness evaluations reaches a predefined
maximum number of evaluations. This condition bounds the search execution time and guarantees
that the algorithm stops rather than running for a long time without converging.
6.3 Evaluation
The approach was evaluated through the means of two experiments to assess its effectiveness and overall
running time. Particularly, the evaluation of the approach answers the following research questions:
RQ1: How effective is the approach at improving the value of a TD repayment activity as
opposed to random search?
RQ2: How does the approach’s prioritization perform in comparison to software practitioners’
prioritization?
RQ3: How long does the approach take to generate a prioritized solution set?
6.3.1 Implementation
The approach was implemented in Java as a prototype tool named TDPrioritizer to conduct the evalua-
tion. To implement the search algorithm, the MOEA Framework, which is an open-source Java library
for developing multi-objective evolutionary algorithms (MOEAs), was utilized [112]. Moreover, for the
search technique described in Section 6.2, the following parameter values were used: population size =
200, crossover rate = 1.0, mutation rate = 0.01, and max number of evaluations = 1,000,000. Finally, all
experiments were run on Amazon EC2 t2.xlarge instances pre-installed with Ubuntu 16.04.
6.3.2 Experiments
Two experiments were conducted to address the research questions. The first experiment answered RQ1,
and the second experiment focused on answering RQ2. To answer RQ3, the running time of TDPrioritizer
was measured and recorded for all the runs in both experiments. Each experiment is described in more
detail below, including its subjects and methodology.
Experiment one
A novel TD prioritization approach must outperform baseline methods to prove its usefulness. Random
search is the simplest form of search algorithms and is commonly used as a benchmark to evaluate the
performance of other search-based techniques [33, 114]. Thus, the aim of the first experiment is to com-
pare the performance of the approach to that of random search in terms of convergence and quality of
resulting solutions. Given that a generated data set will suffice for the purpose of this experiment, the
experiment was run on a semi-generated data set. It is important to note that utilizing a generated data
set for this sort of comparison is common in search-based software engineering, as evidenced by multiple
studies [52, 58, 59, 71, 89, 106, 117, 148, 208, 241] and is in accordance with the search-based software en-
gineering guidelines [114]. The experiment was conducted as follows: TDPrioritizer and three variations
of TDPrioritizer, where each replaced the NSGA-II search aspect with random search, were run 100 times
on each subject system in order to mitigate the effects of random variation resulting from the stochastic
nature of optimization algorithms, as recommended by [33, 114].
– Subjects
The first experiment comprises 40 randomly selected Apache Java open-source software (OSS) systems.
Apache was chosen as the source of subject systems for the following reasons: Apache systems fall
under multiple domains, are widespread in use, and their source code is readily available. Furthermore, the
experiment was limited to Java systems because the SonarQube Java analyzer is freely available and there are
many Apache Java systems, as opposed to Python or PHP systems. Each system is listed in Table 6.1, where column
“Sys#” refers to the system number and column “Sys name” indicates the name of the system. Column
“TD count” refers to the total number of TD items, and column “TD time” indicates the total time required
to repay all TD items in minutes. Both “TD count” and “TD time” column values were calculated using
SonarQube.
Table 6.1: Experiment one system details
Sys# Sys name TD count TD time Resource constraint
Sys1 Atlas 2,233 29,251 5,774
Sys2 Avro 1,162 13,768 1,186
Sys3 Axiom 3,087 116,402 2,065
Sys4 BCEL 2,130 23,475 4,452
Sys5 C-collections 801 11,261 1,718
Sys6 C-JCS 751 8,878 2,375
Sys7 C-math 6,269 61,466 5,262
Sys8 C-NET 1,392 12,474 764
Sys9 Cayenne 5,719 101,911 15,949
Sys10 Chemistry 2,989 65,872 8,788
Sys11 cTAKES 13,004 135,575 5,281
Sys12 Directory 3,917 41,486 4,148
Sys13 Doxia 1,760 13,349 1,870
Sys14 Empire-DB 2,347 35,761 4,186
Sys15 Felix 390 3,879 863
Sys16 Flume 2,750 26,212 4,837
Sys17 Fortress 889 9,527 999
Sys18 Hama 1,851 16,427 2,144
Sys19 Isis 7,134 111,892 3,706
Sys20 James 3,652 45,721 9,637
Sys21 Jclouds 4,568 68,716 16,931
Sys22 JSPWiki 2,364 16,603 6,152
Sys23 jUDDI 8,224 80,946 3,572
Sys24 Juneau 2,974 61,312 518
Sys25 Karaf 5,073 57,740 19,523
Sys26 Kerby 1,414 86,816 5,434
Sys27 Knox 2,306 23,389 2,306
Sys28 Kylin 6,517 54,682 7,029
Sys29 Lens 2,040 22,897 4,036
Sys30 MyFaces 1,941 31,604 6,857
Sys31 OpenNLP 5,008 72,591 7,786
Sys32 Qpid 10,379 85,807 10,021
Sys33 RocketMQ 2,598 32,627 12,371
Sys34 Roller 2,205 23,716 4,890
Sys35 Shiro 1,251 26,965 3,754
Sys36 Storm 7,001 80,160 10,264
Sys37 Tika 4,118 37,390 3,330
Sys38 UIMA 3,649 34,454 1,765
Sys39 Wicket 3,856 205,070 30,007
Sys40 XML Graphics 1,014 13,857 10,801
– Methodology
SonarQube was used to analyze the source code of each subject system to obtain the set of TD items
contained within each system. Specifically, the SonarQube API [12] was utilized to extract each TD
item and its respective characteristics, which include: the rule that each TD item violated, a message that
suggests how to repay each TD item, the file in which each TD item was identified, the start line of each
TD item, the end line of each TD item, the estimated time in minutes to repay each TD item, and the
severity of each TD item.
For each system, a resource constraint value in the range [360, TD time] was randomly generated,
as displayed by column “Resource constraint” in Table 6.1. Resource constraint values less than 360
minutes were excluded to ensure that multiple TD items can be repaid within the provided resource constraint,
which allows the search to be performed and keeps the setting consistent with the investigative study in Chapter 5.
Subsequently, each TD severity type was mapped onto a number using the following mapping:
{info, minor, major, critical, blocker} → {1, 2, 3, 4, 5}. Additionally, a weight representing the importance
of a file was randomly assigned in the range [1, 10] to each system file. The mapped severities and
assigned file weights were used to calculate TD items’ values using the following equation:
Value(td_i) = severity(td_i) × file_weight(td_i)
To measure the cost of each TD item, SonarQube estimation for the time required to repay each TD item
was used:
Cost(td_i) = repay_time(td_i)
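To make these calculations concrete, the following is a minimal Java sketch; the class and method names, and the use of a severity-name-to-number map, are assumptions for illustration only.

import java.util.Map;

// A small sketch of the value and cost calculations used in experiment one.
// The severity-to-number mapping and the use of SonarQube's remediation time follow the text above;
// the method and field names are assumptions for illustration.
class TDItemValuation {
    static final Map<String, Integer> SEVERITY = Map.of(
            "info", 1, "minor", 2, "major", 3, "critical", 4, "blocker", 5);

    // Value(td_i) = severity(td_i) * file_weight(td_i)
    static double value(String severity, double fileWeight) {
        return SEVERITY.get(severity) * fileWeight;
    }

    // Cost(td_i) = repay_time(td_i), i.e., SonarQube's estimated remediation time in minutes.
    static double cost(double repayTimeMinutes) {
        return repayTimeMinutes;
    }
}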
The obtained values, costs, and resource constraint were used across all the experiment’s runs and
variations. To assess the significance of applying the NSGA-II in the approach, experiment runs were
conducted with three variations of TDPrioritizer. TDPrioritizer and the three variations of TDPrioritizer
were run 100 times on each system in order to mitigate the effects of random variation resulting from the
stochastic nature of optimization algorithms, per the recommendations of [33, 114].
The first variation (V1) replaces the NSGA-II search with a pure random search. In V1, an initial
population of solutions is randomly generated, no repair function is applied to the infeasible solutions,
and random search is applied. Upon termination, any solution that violates the resource constraint will
be eliminated from the results. Solutions of V1 serve as an assessment of the benefits of injecting the
initial population with greedy solutions, repairing the infeasible solutions, and applying the NSGA-II.
The second variation (V2) applies random search on the initial, greedy population that is generated using
TDPrioritizer. However, it skips the repair function; any infeasible solution will be eliminated from the
results. V2 evaluates the usefulness of applying the repair function and NSGA-II. Lastly, the third variation
(V3) applies all the components of TDPrioritizer except NSGA-II. Instead, it applies random search on the
initial, greedy, repaired population that is generated using TDPrioritizer to assess the necessity of applying
NSGA-II. The best obtained solutions’ values of each run of TDPrioritizer and the three variations were
recorded.
All acquired values were then used in the comparisons. Solutions’ costs were not considered in the
comparisons because, in general, decision makers prefer a solution that maximizes the value of a repayment
activity over one with less value and less cost, as long as the value-maximizing solution
does not exceed the constraint. Additionally, it was found that in most cases TDPrioritizer and its three
variations obtained similar costs for solutions with the best obtained values.
Experiment two
To evaluate the approach in comparison to software practitioners, the second experiment was conducted
with the data from a subset of the software practitioners who participated in the investigative study in
Chapter 5. Specifically, the subset of practitioners included those who balanced the value and cost of
TD items in their prioritization. The experiment aimed to compare the performance of the selected set
of software practitioners with the performance of TDPrioritizer in terms of the best obtained repayment
solution’s value. TDPrioritizer was run 100 times on each practitioner’s input to mitigate the effects
of random variation resulting from the stochastic nature of optimization algorithms, as recommended
by [33, 114].
– Subjects
As stated above, experiment two was conducted with the software practitioners from the investigative
study who balanced the value and cost of TD items during TD prioritization. Information on the software
practitioners whose data were used in the second experiment can be viewed in Table 6.2. Each software
practitioner is identified using “P#”, and “Expe” refers to each practitioner’s industry experience in years.
Additionally, each system that each practitioner performed the experiment on, in the investigative study,
is referred to as “Sys#”. Each practitioner’s identification number is consistent with the one used in the
investigative study. Lastly, the systems’ details are also summarized in Table 5.2.
Table 6.2: Software practitioners included in experiment two
P# Affiliation Expe Role Sys# P# Affiliation Expe Role Sys#
P1 C1 16 Project manager Sys1 P45 CSCI 590 - Developer Sys8
P2 C1 9 Senior software engineer Sys1 P46 CSCI 590 <1 Developer Sys8
P3 C1 5 Software engineer Sys1 P47 CSCI 590 <1 Developer Sys8
P4 C1 3 Software engineer Sys1 P48 CSCI 590 2 Developer Sys8
P5 C1 5 Software engineer Sys1 P49 CSCI 590 <1 Developer Sys9
P6 C1 2 Software engineer Sys1 P50 CSCI 590 - Developer Sys9
P7 C2 17 Developer Sys2 P51 CSCI 590 1 Developer Sys9
P8 C2 13 Developer Sys2 P54 CSCI 590 <1 Developer Sys9
P9 C2 7 Developer Sys2 P58 CSCI 590 - Developer Sys10
P10 C3 5 Project manager Sys2 P60 CSCI 590 - Developer Sys11
P11 C4 8 Integration engineer Sys2 P61 CSCI 590 1 Developer Sys11
P12 CSCI 590 5 Tester Sys3 P63 CSCI 590 - Developer Sys11
P16 CSCI 590 3 Developer Sys4 P64 CSCI 590 - Developer Sys11
P17 CSCI 590 2 Developer Sys4 P65 CSCI 590 1 Developer Sys11
P18 CSCI 590 <1 Tester Sys4 P67 CSCI 590 <1 Developer Sys11
P19 CSCI 590 - Developer Sys4 P68 CSCI 590 2 Developer Sys11
P20 CSCI 590 <1 Developer Sys4 P69 CSCI 590 1 Developer Sys11
P21 CSCI 590 2 Developer Sys4 P70 CSCI 590 1 Developer Sys11
P22 CSCI 590 - Tester Sys4 P72 CSCI 590 1 Developer Sys11
P23 CSCI 590 2 Developer Sys4 P73 CSCI 590 - Developer Sys11
P24 CSCI 590 <1 Developer Sys4 P74 CSCI 590 - Developer Sys11
P26 CSCI 590 - Developer Sys4 P75 CSCI 590 1 Tester Sys11
P32 CSCI 590 12 Developer Sys6 P76 CSCI 590 1 Developer Sys11
P34 CSCI 590 2 Developer Sys6 P77 CSCI 590 1 Developer Sys11
P35 CSCI 590 1 Developer Sys6 P79 CSCI 590 <1 Developer Sys12
P36 CSCI 590 <1 Developer Sys6 P81 CSCI 590 <1 Developer Sys12
P37 CSCI 590 <1 Developer Sys6 P82 CSCI 590 <1 Developer Sys12
P38 CSCI 590 - Developer Sys6 P83 CSCI 590 - Developer Sys12
P39 CSCI 590 2 Developer Sys7 P84 CSCI 590 4 Developer Sys12
P40 CSCI 590 4 Developer Sys7 P85 CSCI 590 1 Developer Sys13
P42 CSCI 590 2 Developer Sys7 P86 CSCI 590 3 Developer Sys13
P43 CSCI 590 - Developer Sys7 P87 CSCI 590 2 Developer Sys13
P44 CSCI 590 <1 Developer Sys8 P89 CSCI 590 <1 Developer Sys13
– Methodology
In the investigative study, the subject systems were analyzed using SonarQube and their TD items were
extracted. Afterward, each software practitioner was requested to select TD items that they would select
to repay in a given repayment activity without exceeding the resource constraint, which was set to 360
minutes.
For each participating software practitioner, the same resource constraint (i.e., 360 minutes) and the
list of all TD items in the system, including each TD item’s value and cost as determined by that practitioner,
were input into TDPrioritizer. The list of TD items is identical to the one on which each practitioner
originally conducted the investigative study.
TDPrioritizer was run 100 times on each practitioner’s input to ensure that the variations between
practitioners’ obtained values and TDPrioritizer’s obtained values are not random, and the value of the
best obtained solution for each run was recorded. The total value of the TD items selected by each software
practitioner was then calculated. Once the total value had been calculated, the practitioner’s total value was
compared to the value of the best obtained solution for each TDPrioritizer run.
6.3.3 Results
RQ1: How effective is the approach at improving the value of a TD repayment activity as opposed
to random search?
To compare the performance of the approach to the performance of random search, TDPrioritizer and
the three variations of random search were each run 100 times on each subject software system, and,
subsequently, the value of the best obtained solution for each run was recorded. Afterward, the best values
obtained by TDPrioritizer were compared to the best values obtained by each of the three variations. For
each system, the differences among the best values obtained by TDPrioritizer and the best values obtained
by each of the three variations of TDPrioritizer were calculated then averaged over the 100 runs.
Table 6.3 summarizes the results. Columns “AVG(V1)”, “AVG(V2)”, and “AVG(V3)” display the
mean of best solutions’ values of V1, V2, and V3 of the 100 runs, respectively. Additionally, columns
“AVG(TDPrioritizer-V1)”, “AVG(TDPrioritizer-V2)”, and “AVG(TDPrioritizer-V3)” depict the mean
of variations in best solutions’ values between TDPrioritizer and V1, V2, and V3 of the 100 runs, respectively.
Table 6.3: Experiment one results for effectiveness of TDPrioritizer at improving repayment value
over random search
Sys# AVG(TDPrioritizer) AVG(V1) AVG(V2) AVG(V3) AVG(TDPrioritizer-V1) AVG(TDPrioritizer-V2) AVG(TDPrioritizer-V3)
Sys1 9,549 - 7,384 7,419 - 2,165 2,130
Sys2 2,881 - 2,548 2,568 - 333 313
Sys3 6,230 - 5,345 5,369 - 885 861
Sys4 7,792 - 6,098 6,126 - 1,694 1,666
Sys5 4,383 - 3,178 3,204 - 1,205 1,179
Sys6 3,679 - 2,695 2,715 - 984 964
Sys7 12,619 - 10,813 10,835 - 1,806 1,784
Sys8 2,140 - 1,881 1,895 - 259 245
Sys9 21,864 - 18,706 18,757 - 3,158 3,107
Sys10 14,050 - 11,232 11,272 - 2,818 2,778
Sys11 12,972 - 12,421 12,436 - 551 536
Sys12 9,529 - 8,847 8,881 - 682 648
Sys13 3,517 - 3,474 3,477 - 43 40
Sys14 7,797 - 6,585 6,615 - 1,212 1,182
Sys15 1,811 - 1,452 1,464 - 359 347
Sys16 10,317 - 8,755 8,792 - 1,562 1,525
Sys17 2,785 - 2,428 2,434 - 357 351
Sys18 5,278 - 4,449 4,473 - 829 805
Sys19 12,290 - 11,045 11,049 - 1,245 1,241
Sys20 18,443 - 14,541 14,593 - 3,902 3,850
Sys21 22,021 - 17,938 17,979 - 4,083 4,042
Sys22 12,179 - 9,053 9,094 - 3,126 3,085
Sys23 7,044 - 6,257 6,272 - 787 772
Sys24 4,061 - 3,135 3,153 - 926 908
Sys25 24,408 - 19,680 19,720 - 4,728 4,688
Sys26 6,826 - 5,065 5,093 - 1,761 1,733
Sys27 6,348 - 5,666 5,679 - 682 669
Sys28 15,006 - 14,510 14,518 - 496 488
Sys29 7,941 - 6,490 6,529 - 1,451 1,412
Sys30 16,644 - 10,774 10,830 - 5,870 5,814
Sys31 14,094 - 12,594 12,616 - 1,500 1,478
Sys32 22,015 - 21,070 21,098 - 945 917
Sys33 14,873 - 10,994 11,029 - 3,879 3,844
Sys34 9,948 - 7,567 7,605 - 2,381 2,343
Sys35 5,478 - 4,003 4,023 - 1,475 1,455
Sys36 18,775 - 16,852 16,893 - 1,923 1,882
Sys37 7,715 - 7,261 7,273 - 454 442
Sys38 5,375 - 4,684 4,700 - 691 675
Sys39 22,264 - 15,161 15,222 - 7,103 7,042
Sys40 7,721 5,137 5,121 5,135 2,584 2,600 2,586
To consider TDPrioritizer effective, TDPrioritizer must obtain higher best solutions’ values than the
ones obtained by the three variations. It should be noted that, to compare the performance of two approaches,
one cannot draw definite conclusions without first conducting a statistical test, as the obtained
results may be the consequence of chance. To ensure the drawn conclusions are not derived from chance,
the search-based software engineering guidelines [33, 114] call for the application of the two-tailed non-parametric
Mann-Whitney U-test and the reporting of the resulting p-values to assess the significance of
the findings. In particular, the following null hypothesis should be tested: the two approaches exhibit
stochastic equality. Therefore, the alpha value was first set to 0.05, and the suggested test was then performed.
However, as advised by [33], solely reporting the resulting p-values is insufficient. One also needs to
calculate Vargha and Delaney’s Â12 statistic to measure the effect size and ensure that the difference between
the compared approaches is not negligible. The Vargha and Delaney Â12 statistic is a non-parametric
effect size that measures the probability that one approach yields better results than the other [224].
The value of the measure ranges over [0, 1]. A value of Â12 > 0.5 means TDPrioritizer is better than
the other variation more than 50% of the time, a value of Â12 < 0.5 means the other variation is better
than TDPrioritizer more than 50% of the time, and a value of Â12 = 0.5 means the two approaches are
equivalent.
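For reference, a minimal Java sketch of the Â12 computation is shown below, using its probability interpretation rather than rank sums; it is an illustrative helper and not necessarily the implementation used in the evaluation.

// A minimal sketch of Vargha and Delaney's A12 effect size, computed from its
// probability interpretation: the probability that a value from the first sample
// exceeds a value from the second sample (ties count as 0.5).
class VarghaDelaney {
    static double a12(double[] first, double[] second) {
        double wins = 0.0;
        for (double x : first) {
            for (double y : second) {
                if (x > y) wins += 1.0;
                else if (x == y) wins += 0.5;
            }
        }
        return wins / (first.length * (double) second.length);
    }
}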
As illustrated in Table 6.3, V1 failed to find a feasible solution for 39 systems. Though V1 was able to
find feasible solutions for XML Graphics, TDPrioritizer outperformed V1 in terms of the best solutions’
values and improved the best value of repayment by 2,584 units of value, on average, over the 100 runs.
Also, the results of the Mann-Whitney U-test and Vargha and Delaney’s Â12 were a p-value < 0.001 and an
Â12 = 1.0, which confirm that TDPrioritizer significantly outperformed V1. The results demonstrate
that solely applying random search is not sufficient for TD prioritization.
Moreover, TDPrioritizer obtained greater best repayment values than V2 for all the 100 runs on all
systems. The statistical tests’ results were a p-value < 0.001 and an Â12 = 1.0. The results indicate that
TDPrioritizer significantly surpassed V2 in terms of obtained best solutions’ values. Therefore, one can
conclude that applying random search on the initial, greedy population is inadequate to prioritize TD.
Similarly, TDPrioritizer obtained higher best repayment values than V3 for all the runs on all
systems. The statistical tests resulted in a p-value < 0.001 and an Â12 = 1.0, which confirm that
TDPrioritizer significantly surpassed V3 and that applying random search on the initial, greedy, repaired
population is insufficient to prioritize TD.
As TDPrioritizer significantly outperformed V1, V2, and V3, it can be concluded that it is necessary
to apply NSGA-II in the approach and that the approach outperforms random search.
RQ2: How does the approach’s prioritization perform in comparison to software practitioners’
prioritization?
To compare the performance of the approach to the performance of software practitioners, TDPrioritizer
was run 100 times on each practitioner’s input, and the value of the best obtained solution for each run was
recorded. Afterward, the values obtained by TDPrioritizer were compared to the values obtained by each
software practitioner. To consider the approach effective, TDPrioritizer must obtain values similar to or
greater than the ones obtained by the software practitioners.
Table 6.4 summarizes the results of TDPrioritizer and the software practitioners. Column “P#” identifies
each practitioner, and column “Practitioner value” represents the value obtained by each practitioner.
The average of the best solutions’ values over the 100 runs obtained by TDPrioritizer is displayed in column
“AVG(TDPrioritizer value)”, and column “AVG(diff)” displays the average variation between each software
practitioner’s obtained value and the best solutions’ values over the 100 runs obtained by TDPrioritizer. As the
table demonstrates, TDPrioritizer was able to obtain similar values to the ones obtained by 14 software
practitioners (i.e., 21.21% of 66 practitioners), and it outperformed 52 practitioners’
prioritization (i.e., 78.79% of 66 practitioners), on average.
One should keep in mind that when there is a variation between a software practitioner’s result and
TDPrioritizer’s result, a statistical test should be conducted to ensure that the variations between the prac-
titioners’ values and TDPrioritizer’s values are not random. Consequently, the One-Sample Wilcoxon test
[118] was applied to measure the significance of variations in practitioners’ and TDPrioritizer’s repayment
values, as recommended by [33]. Specifically, the following null hypothesis was tested: TDPrioritizer’s
values are symmetric about a practitioner’s value, and the alpha value was set to 0.05. To measure the
effect size, the suggestion of [182] was followed and the effect size was calculated by dividing the test
statistic Z by the square root of the number of observations. As for an interpretation of the results, a value
≥ 0.80 is considered as very high; a value ≥ 0.60 and < 0.80 is considered as strong; a value ≥ 0.40 and
< 0.60 is considered as moderate; a value ≥ 0.20 and < 0.40 is considered as low; and a value < 0.20 is
considered as very low, following the guidelines presented in [39].
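As a small illustration of this effect-size calculation, the Java sketch below computes r = Z / sqrt(N) and applies the interpretation thresholds above; the Z statistic is assumed to come from an external statistics library, and the helper is not the evaluation's actual code.

// A small sketch of the effect size used with the One-Sample Wilcoxon test:
// r = Z / sqrt(N), where N is the number of observations. The Z statistic is
// assumed to be provided by an external statistics library.
class WilcoxonEffectSize {
    static double effectSize(double z, int numberOfObservations) {
        return Math.abs(z) / Math.sqrt(numberOfObservations);
    }

    // Interpretation following the thresholds described above.
    static String interpret(double r) {
        if (r >= 0.80) return "very high";
        if (r >= 0.60) return "strong";
        if (r >= 0.40) return "moderate";
        if (r >= 0.20) return "low";
        return "very low";
    }
}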
The results of the statistical test are presented in Table 6.4, where column “P-value” presents the p-value
of the One-Sample Wilcoxon test, and “ES” refers to the obtained effect size. It should be noted
that in the “P-value” and “ES” columns, an “NA” value was used to refer to the rows with no variation
between the approach’s values and the practitioner’s obtained value.
In all the cases with variations, the p-value was < 0.001 and the effect size was ≥ 0.80, which provides
sufficient evidence that the improvements in value obtained by TDPrioritizer are not results of chance.
Since TDPrioritizer obtained solutions’ values similar to those acquired by practitioners or significantly
outperformed practitioners’ solutions, it can be concluded that TDPrioritizer’s prioritization is similar to or better than
software practitioners’ prioritization.
Table 6.4: Summary of the approach and participating software practitioners prioritization results
P# Practitioner value AVG(TDPrioritizer value) AVG(diff) P-value ES
P1 1,718 1,722 4 < 0:001 0:8
P2 1,840 2,084 244 < 0:001 0:8
P3 1,360 1,540 180 < 0:001 0:8
P4 882 2,070 1,188 < 0:001 0:8
P5 2,384 2,763 379 < 0:001 0:8
P6 2,044 2,584 540 < 0:001 0:8
P7 5,110 5,155 45 < 0:001 0:8
P8 3,574 3,645 71 < 0:001 0:8
P9 2,851 2,868 17 < 0:001 0:8
P10 2,728 2,736 8 < 0:001 0:8
P11 3,289 3,321 32 < 0:001 0:8
P12 1,060 1,214 154 < 0:001 0:8
P16 860 860 0 NA NA
P17 1,440 2,940 1,500 < 0:001 0:8
P18 209 506 297 < 0:001 0:8
P19 52,288 57,600 5,312 < 0:001 0:8
P20 510 867 357 < 0:001 0:8
P21 4,588 7,442 2,854 < 0:001 0:8
P22 774 1,012 238 < 0:001 0:8
P23 3,773 3,781 8 < 0:001 0:8
P24 955 2,340 1,385 < 0:001 0:8
P26 87,650 87,650 0 NA NA
P32 1,080 1,740 660 < 0:001 0:8
P34 8,585 8,743 158 < 0:001 0:8
P35 11,792 11,804 12 < 0:001 0:8
P36 20,400 95,520 75,120 < 0:001 0:8
P37 6,708 6,712 4 < 0:001 0:8
P38 16,300 16,405 105 < 0:001 0:8
P39 12,725 12,750 25 < 0:001 0:8
P40 696 704 8 < 0:001 0:8
P42 312 452 140 < 0:001 0:8
P43 337 349 12 < 0:001 0:8
P44 132 157 25 < 0:001 0:8
P45 1,780 2,772 992 < 0:001 0:8
P46 750 774 24 < 0:001 0:8
P47 2,463 2,791 328 < 0:001 0:8
P48 1,240 1,313 73 < 0:001 0:8
P49 1,077 1,084 7 < 0:001 0:8
P50 19,690 19,700 10 < 0:001 0:8
P51 1,766 1,770 4 < 0:001 0:8
P54 830 970 140 < 0:001 0:8
P58 994 998 4 < 0:001 0:8
P60 99 129 30 < 0:001 0:8
P61 414 1,095 681 < 0:001 0:8
P63 7,955 7,957 2 < 0:001 0:8
P64 78,418 78,418 0 NA NA
P65 290 300 10 < 0:001 0:8
P67 33,400 33,400 0 NA NA
P68 1,596 1,666 70 < 0:001 0:8
P69 4,616 4,616 0 NA NA
P70 1,879 1,897 18 < 0:001 0:8
P72 3,933 3,977 44 < 0:001 0:8
P73 831 831 0 NA NA
P74 805 805 0 NA NA
P75 28 28 0 NA NA
P76 1,208 1,208 0 NA NA
P77 270 270 0 NA NA
P79 55 55 0 NA NA
P81 480 480 0 NA NA
P82 1,600 1,760 160 < 0:001 0:8
P83 1,800 1,800 0 NA NA
P84 54 55 1 < 0:001 0:8
P85 208 210 2 < 0:001 0:8
P86 652 977 325 < 0:001 0:8
P87 277 277 0 NA NA
P89 259 343 84 < 0:001 0:8
RQ3: How long does the approach take to generate a prioritized solution set?
In both experiments, the running time of TDPrioritizer was recorded for all the runs. The running time of
TDPrioritizer ranged from 17 seconds to 25 minutes, with an average of 3 minutes and a median of less
than 2 minutes. As of July 2020, an Amazon EC2 t2.xlarge instance was priced at $0.3712 per hour. Thus,
with an average running time of 3 minutes, the average cost of running TDPrioritizer was $0.01856 per
subject.
Software practitioners’ perception of TDPrioritizer
The approach and its results were presented to the 11 participating software practitioners affiliated with the
companies that participated in the investigative study in Chapter 5, to gain insight into the applica-
bility and potential of the approach in real-world settings. The practitioners appreciated how TDPrioritizer
generates the output in a reasonable time frame, which lowers manual workload on the associated software
practitioners. The practitioners valued that TDPrioritizer produces a set of solutions with varying values
and costs that demonstrates the value and cost trade-offs. Not only does demonstration of value and cost
trade-offs aid software practitioners in determining the amount of cost required to gain a value, but it also
aids in justifying spending more resources on TD repayment to higher management, as the tool optimizes
over value and cost and demonstrates the amount of value improvement that can be achieved by spend-
ing more resources. Furthermore, the practitioners appreciated how TDPrioritizer’s generated solutions
specify which exact TD items should be repaid and the flexibility that TDPrioritizer provides in terms of
valuing TD items and estimating costs to fix.
The practitioners suggested that adding an automated filtering scheme that facilitates excluding TD
items from prioritization based on the file in which the item exists or the rule that the item violates may
be of benefit. The filtering scheme would allow practitioners to easily exclude items that they intend
to repay regardless of their cost from the prioritization process. Moreover, it would enable practitioners
to omit items that they will not consider for repayment, as these items might not fit the purpose of the
repayment activity, or the practitioner does not have authorization to modify such items. Furthermore,
the practitioners suggested adding the capability of automatically calculating file weights by connecting
TDPrioritizer with different version control systems and issue tracking systems. Based on the information
stated thus far, the results are promising and reveal the potential of the approach in industry settings.
The suggested points of improvement can be incorporated into TDPrioritizer in future work.
6.4 Threats to Validity
Several factors can threaten the validity of this study. These factors and their corresponding mitigations
are discussed in this section based on five types of threats: construct, internal, conclusion, external, and
reliability, following the guidelines in [78, 195].
Construct validity
Construct validity reflects the degree to which the measures used assess what the study intends to measure
[195]. The study aims to provide a TD prioritization approach that maximizes the value of a repayment
activity while minimizing its cost and satisfying its resource constraint. One potential threat to construct
validity in this study involves the accuracy of measuring TD. However, the approach is not dependent
on a specific TD measurement method and can be adapted to any TD measurement method since it only
requires the value and cost of each TD item. Additionally, in the evaluation of the approach, SonarQube
was utilized to identify and measure TD items. SonarQube is the only OSS tool that identifies and measures
TD, and it is widely utilized in both industry and OSS communities [57, 140]. Another potential threat is
that value and cost are subjective, and individuals might have different interpretations of value and cost.
However, in the evaluation of the approach, the estimated value and cost of each TD item were used across
all the comparisons. In the first experiment, the assigned value and cost of each TD item were utilized
across all the runs and variations of the experiment. Similarly, in the second experiment, the value and
cost of each TD item that were input to the approach were identical to the ones assigned by each software
practitioner.
Internal validity
Internal validity relates to whether the results in a study truly follow from the data [195]; primarily,
whether the metrics are meaningful to the conclusions and whether the measurements are adequate. To
avoid bias and error, manual methods and human judgment were avoided when identifying and measuring
TD in the study. The study relied on SonarQube to identify and measure TD, which ensured that all of
the systems were assessed using one unified set of rules and eliminated any subjectivity. Additionally, to
mitigate threats caused by confounding factors, the approach and random search were compared under
identical parameter settings, and the same value and cost of TD items in addition to resource constraint
values were used across all the runs in the evaluation. Having identical value and cost of TD items in
addition to the resource constraint was also ensured when comparing the performance of the approach to
the performance of the software practitioners.
Conclusion validity
Conclusion validity is concerned with random variations and adequate application of statistical tests [121].
To mitigate these threats, the standard guidelines and best practices in search-based software engineering
[33, 114] were followed. The recommended tests for statistical testing and effect sizes were meticulously
applied in addition to thoroughly verifying all the required assumptions. Furthermore, the approach was
run 100 times on each input during the evaluation, as recommended by [33].
External validity
External validity refers to the generalizability of the findings of a study [195]. There are several limitations
to the generalizability of the conclusions of this study. First, in the evaluation, comparing the approach
with random search was based solely on 40 OSS systems. Undoubtedly, the size of the subject systems
in the evaluation is smaller than the population of all software systems. To mitigate this threat, these
systems were randomly selected from one of the leading OSS communities. Additionally, the selected
systems vary along several dimensions, such as domain, size, time frame, and the number of
versions. Second, in the evaluation of the approach, comparing the performance of the approach to the
performance of the software practitioners was limited to a sample of 66 software practitioners. Indeed,
having a larger number of practitioners would strengthen the evaluation. However, recruiting software
practitioners to participate in the study was difficult due to privacy concerns and hesitation to spare
software practitioners’ time. The study started by including industry software practitioners.
Then the set of industry practitioners was expanded by including students from the University of Southern
California (USC) CSCI 590 master’s level course. The set of 66 practitioners includes practitioners with
varying levels of experience and various team roles, which mitigates this threat to the external validity of
the study. Third, the prototype tool and its results were only presented to the set of practitioners affiliated
with industry. Presenting the tool and its results to the set of practitioners affiliated with the USC CSCI
590 course might have revealed many other interesting improvements to the prototype tool. Nonetheless,
reaching this set of practitioners was difficult, as these students had graduated.
Reliability
The reliability of a study is measured by the reliability and reproducibility of its findings [195]. Threats
to this aspect of validity were mitigated by presenting a detailed, formal description of the approach
and its evaluation and by including all the parameter settings that were used.
6.5 Summary
In this chapter, the TD prioritization problem is formulated as a decision problem in which there are two
conflicting objectives (i.e., value and cost) that need to be balanced and a resource constraint that must be
satisfied. The inputs of the approach are a resource constraint value that serves as a maximum value for
cost in a given repayment activity and a list of TD items, which are to be repaid. The list of TD items
includes information on the value of each TD item and the cost required to repay each TD item. The
approach utilizes the NSGA-II and a greedy-repair, constraint-handling technique to effectively identify
which TD items are to be included in a given repayment activity to maximize its value and minimize its
cost without exceeding its respective resource constraint. The approach represents each repayment solution
as a bit string, where the value of each bit specifies whether the item should be considered in the current
repayment activity. Half of the initial population of solutions is generated randomly, and the other half
is generated using a greedy algorithm to facilitate search convergence. The quality of
a solution is measured using an evaluation function that determines the quality of a solution based on its
value and cost. Solutions evolve through a set of genetic operators, and the approach terminates when the
total number of fitness evaluations reaches a predefined maximum number of evaluations. When the search
terminates, the approach outputs the set of Pareto-optimal solutions, which represents the TD prioritized
solutions discovered during the search. The set of Pareto-optimal solutions includes the value of each
solution, its cost, and the set of TD items selected to be repaid. Along with that, the Pareto-front is also
graphed to facilitate the interpretation and evaluation of alternative solutions.
The approach was evaluated through the means of two experiments to compare its performance to ran-
dom search and software practitioners, in addition to identifying its running time. The approach was compared
to random search using 40 OSS systems, and the evaluation confirmed the approach’s effectiveness over
random search, as the approach surpassed random search in terms of best obtained solution’s value in all
the cases. Additionally, the approach was compared to the performance of 66 software practitioners, and
it obtained best solutions’ values similar to or greater than the solutions’ values acquired by the software
practitioners. The approach achieved greater values than the values acquired by software practitioners in
52 out of 66 cases, and it obtained similar values to the ones obtained by 14 practitioners. Furthermore,
the approach required an average of only 3 minutes to generate the prioritized solution set in both ex-
periments. The approach and its results were presented to 11 software practitioners from the participating
companies. The feedback of the practitioners was positive and highlighted the potential of the approach in
industry.
Chapter 7
Related Work
This chapter presents related work to secondary studies on technical debt (TD), empirical studies on TD,
and approaches that utilized search-based techniques for prioritization in software engineering.
7.1 Secondary Studies on TD
Secondary studies refer to studies that aim to synthesize evidence related to a specific topic by reviewing all
the primary studies relating to that specific topic [127,130]. The software engineering research community
has conducted several secondary studies to investigate the current TD and technical debt management
(TDM) literature from multiple perspectives.
Tom et al. [216] were the first to conduct a systematic literature review (SLR) on the TD academic literature
available at the time. The aim of the study is to establish boundaries of the TD metaphor and develop
a comprehensive theoretical framework to facilitate future research in the field. The study included all
publications after 1992 up to the time of the search from the Scopus, Inspec, and Web of Science databases,
with forward and backward snowballing also being applied. The search and selection criteria resulted in
a total of 19 papers. The SLR found that while code decay and architectural deterioration are commonly
recognized to be major elements of TD, other aspects, such as a lack of documentation, are yet to be widely
recognized as a source of TD. This suggests that there is no clear boundary regarding the classification of
TD. Additionally, the study unveiled that time pressure is one of the reasons for incurring TD in addition
to other constraints, such as budgeting or resourcing. The SLR also revealed that decreasing the cost of a
current release and accelerating short term development are the most recognized benefits for taking on TD.
In contrast, the most recognized consequences of taking on TD include an increasing cost of maintenance,
the inability to accurately estimate effort, difficulty in repayment decisions, increasing risks and costs of
changes, and an overall decrease in system quality. The authors developed an initial theoretical framework
of TD to aid in TDM and to steer future empirical studies on TD. Additionally, the authors noted that the
current studies on TD address the topic from an abstracted viewpoint and that there is a lack of empirical
studies that examine TD. Consequently, it was concluded that future research should incorporate empirical
studies to validate techniques and heuristics to aid in TDM.
To build upon the results of their SLR and expand their search scope, Tom et al. [217] performed a
multivocal literature review (MLR), supplemented with 11 software practitioners’ interviews. MLRs are
a type of literature review that includes all accessible writing, including non-academic, on a given topic.
Examples of literature included in an MLR are internet blogs and white papers. The authors complemented
their prior search in their SLR by searching for “technical debt,” “design debt,” and “debt metaphor”
in Google and reviewed the 50 most relevant results. The authors conducted multiple iterations with
other search strings which were inferred from the initial search. The MLR found that TD has multiple
dimensions: code, design, documentation, and testing. Additionally, it was found that TD can be described
using different attributes: monetary cost, amnesty, bankruptcy, interest, principal, leverage, repayment,
and withdrawal. The MLR also revealed that TD precedents include pragmatism, prioritization, process,
attitude, oversight, and ignorance. Additionally, the MLR also unveiled that the outcomes of TD lead to
a downward spiral in software system quality, team morale, team productivity, and risk prevention. The
study summarized its findings in a theoretical framework that provides a comprehensive understanding of
TD from academic and industry perspectives alike. The results of the MLR framework were evaluated
by three informant reviewers to increase the validity of the study. Furthermore, the authors stated that
future research should work towards further quantifying the associations for different forms of TD and
establishing metrics to identify and quantify TD.
Unfortunately, in 2014, there were many ambiguities in the use of TD as a term and, consequently, in
how to manage it. Li et al. [141] aimed to address this gap in knowledge in the field of TD and
conducted a systematic mapping review (SMR). The objective of this SMR is to acquire a comprehensive,
general understanding of TD and TDM and to display promising future research directions. The authors
reviewed studies published from 1992 to 2013, which were collected from the following databases: ACM
Digital Library, CiteSeer, IEEEXplore, Inspec, ISI Web of Science, Science Direct, Scopus, SpringerLink,
and Wiley InterScience. The search and filtering criteria of the study resulted in 94 studies, with only
backward snowballing being conducted. The SMR stated that there are 10 types of TD mentioned in the
reviewed literature, which are as follows: architectural, build, code, defect, design, documentation, in-
frastructure, requirement, test, and versioning debt. Of the identified types of TD, code TD was the most
frequently studied. Additionally, the authors identified 24 notions, referring to any term that has a direct
relationship with TD and is employed to explain or describe TD. Interest, risk, and principal were found to
be the most frequently identified notions. Moreover, five categories were formed to aid in differentiating
between notions and are as follows: metaphor, property, uncertainty, effect, and cause. The SMR also
revealed that most of the current studies on the various consequences of TD indicate that TD has a severe
negative impact on a software system maintainability. Among the retrieved studies, only three studies
specifically addressed the limitations with the TD metaphor. The limitations identified in such studies in-
clude: insufficient application of the TD metaphor in modern development approaches, a lack of standard
units of measurement, difficulty in equating TD to an interest rate, and that individuals who take on TD are
usually not the ones to repay it. In addition, the study of Li et al. [141] presented TDM as a combination of
eight separate activities: identification, measurement, prioritization, prevention, monitoring, repayment,
representation, and communication. Additionally, 29 tools were identified to aid in the TDM process.
Furthermore, the authors called for more empirical studies on the TDM process and on the application
of TDM in industry settings. Similar to the SLR in this dissertation, the SMR was able to identify six
studies that each proposes a TD prioritization approach [109, 139, 140, 204, 211, 237]. This dissertation’s
SLR did not include study [204], as the decision was made to instead include study [111], which is a more
comprehensive version of the same approach and includes an evaluation of said approach.
Ampatzoglou et al. [29] conducted an SLR specifically on the financial aspects of TD. The study aims
to identify the financial approaches utilized in TDM and create a glossary of financial terms and definitions
related to TDM in the software engineering field. To examine the existing literature pool, Ampatzoglu et
al. first conducted a manual search on the following venues: International Workshop on Managing Tech-
nical Debt (MTD), IEEE Software (SW), and Journal of Software Quality (SQJ). The authors created a
quasi-gold standard from the results of the manual search that was used to validate the accuracy and com-
pleteness of the automated search. Using this quasi-gold standard, the authors then performed an auto-
mated search into seven digital libraries: ACM Digital Library, IEEExplore, ScienceDirect, SpringerLink,
Scopus, Web of Science, and Google Scholar, without conducting any form of snowballing to expand the
search scope. The search and application of the search criteria resulted in 69 primary studies, which were
all published prior to September 2013. The authors presented a detailed glossary of financial terminology
and a classification schema of the financial approaches incorporated in TDM. These results revealed that
principal and interest are the most common financial terms referenced in TD research and that the most
frequently applied financial approaches for TDM are the following: real options analysis (ROA), modern
portfolio theory (MPT), cost-benefit analysis (CBA), and value-based analysis. Researchers found a lack
of consistency among the evaluated studies when applying these approaches and an absence of a clear and
concise mapping between the identified financial and software engineering concepts. Unfortunately, the
review only addressed the financial approaches utilized in TDM in general (i.e., identifying, prioritizing,
repaying, and monitoring), without specific emphasis on TD prioritization. However, the gap in research
regarding TD prioritization approaches was emphasized. The SLR was only able to identify two stud-
ies [205, 237] that proposed a total of five TD prioritization approaches. Similarly, this dissertation’s SLR
identified current TD prioritization approaches and the techniques these approaches employed. However,
the focus of this dissertation’s SLR is TD prioritization, and this dissertation’s SLR is not constrained to ap-
proaches that employ financial concepts. By displaying the results of the SLR, the authors aim to provide
the resources needed to help improve communication between software practitioners and non-technical
managers. Additionally, the authors called for paying more attention to the interdisciplinary nature of TD
and to verify the correct application of economic and financial approaches in TDM.
Building upon the last two secondary studies presented previously in this section, Alves et al. [26] con-
ducted an SMR to address the current strategies, at the time, that have been proposed to identify or manage
TD in software projects. Specifically, the SMR focused on defining different TD types, identifying indica-
tors of TD existence, identifying TDM strategies, understanding the maturity level of each identification
and management technique, and identifying the available TD visualization techniques. The authors’ search
for primary studies included publications from 2010 to 2014. The authors found such publications from
the following libraries: ACM Digital Library, IEEEXplore, Science Direct, Engineering Village, Springer-
Link, Scopus, Citeseer, and DBLP. After applying backward snowballing and subsequently the search
criteria, the authors reviewed a total of 100 studies. The SMR resulted in an initial taxonomy of 15 TD
types: architectural, build, code, defect, design, documentation, infrastructure, people, process, require-
ment, service, test, test automation, usability, and versioning debt. The SMR also identified 36 indicators
of TD existence and which TD types each indicator might aid in its identification. The imbalance of the
indicators and TD types was highlighted, as some TD types have many indicators while other types have
little to none. Unfortunately, few of the identified indicators were found to be evaluated. Additionally, the
SMR addresses TDM, which in the context of this study refers to strategies that measure the quantity of
TD, value of TD, and determine the best time to repay TD. The authors identified 20 different TDM strate-
gies, highlighted the lack of evaluation for such strategies, and concluded that TDM is context-dependent.
Lastly, the SMR revealed that 22 of the 100 primary studies proposed a software visualization technique
in the context of TDM. The most proposed techniques were the following: dependency matrix, bar graph,
and pie chart format. As displayed by the results of this SMR, the number of studies on TD types and TD
indicators is growing while the evaluations of these proposals are lacking. The SMR noted that TD was in
a phase of exploration and expansion. However, as more and more people research and evaluate TD, the
field of TD research is becoming narrower and more focused, with the study and evaluation of the most
effective practices becoming more popular.
Due to the importance of the decision criteria in TDM and the lack of a comprehensive view on the
topic, Ribeiro et al. [191] conducted an SMR study to identify the decision criteria utilized in the repayment
of TD. The study conducted an SMR over studies published up to 2014. The evaluated studies were
selected from the following databases: ACM Digital Library, IEEEXplore, and Scopus. Unfortunately,
snowballing was not applied to expand the search scope. The SMR included any study that explored
decision-making criteria in its theory, practice, or approach, without any distinction between these types
of studies. The resulting literature pool consisted of a total of 38 studies. From these studies, the authors
provided a summary of 14 decision criteria used in the TD repayment process. Most of the decision criteria
were only applicable for use in the repayment of defect debt or design debt. The SMR also highlighted
the fact that none of the selected studies conducted empirical evaluations to assess the identified decision
criteria. This dissertation’s SLR focus is TD prioritization approaches, as opposed to repayment decision
factors, and it provides a deep analysis of the current TD prioritization approaches.
Fernández-Sánchez et al. [95] performed an SMR, with the focus being on the elements required to
manage TD effectively. The search included all papers published until and including 2015, from the follow-
ing databases: IEEEXplore, ACM digital library, Scopus, ScienceDirect, Web of Science, and Springer-
Link. Applying the search criteria and snowballing resulted in a literature pool that consists of 63 pa-
pers. The authors summarized the elements that are considered in TDM in general and categorized them
into three categories: essential decision factors, cost estimation techniques, and techniques for decision-
making. The identified essential decision factors include: TD item, TD principal, TD interest, TD interest
probability, and TD impact. The identified cost estimation techniques include automated and expert techniques.
Techniques for decision-making include: scenario analysis, time-to-market, when to implement
decisions, TD evolution, and TD visualization. The SMR revealed that stakeholders are essential in TDM
and that various stakeholders are involved in TDM. These stakeholders can be classified into engineers,
engineering managers, and business organizational managers. Furthermore, the SMR summarized what
elements are considered by each class of stakeholders and concluded that TDM is context-dependent.
Lastly, the SMR assessed the relevancy of the selected studies that proposed elements in the industry by
identifying whether the identified elements were evaluated and assigning a rigor and relevance score for
each considered study. The results demonstrate the lack of evaluation in earlier studies and an increasing
trend to include evaluation in recent studies.
Behutiye et al. [42] conducted an SLR on the concept of TD in the context of agile software devel-
opment. The researchers aimed to analyze and synthesize TD, its causes and consequences, and various
TDM strategies, all within the context of agile software development. Researchers referenced publications
from the following databases: ACM Digital Library, IEEEXplore, ProQuest, Scopus, Web of Science, and
Google Scholar. Applying the selection criteria and backward snowballing resulted in a literature pool
that consists of 38 studies. The review determined five research areas within the context of agile software
development from the literature pool: TDM in agile software development, architecture in agile software
development and its relationship with TD, TD know-how in agile software development teams, TD in
rapid fielding development, and TD in distributed agile software development. The study found that, in
agile development, TD occurs primarily due to the following reasons: over-emphasis on rapid delivery,
design and architectural issues, inadequate test coverage, unclear requirements, overlooked and delayed
solutions or estimates, lack of refactoring, and code duplication. Once the causes of TD had been
identified, the authors reviewed the specific consequences that can occur when TD is accumulated in agile
software development. The most recognized TD consequences in agile development were reduced pro-
ductivity, degradation in system quality, increased maintenance costs, system rebuilds, and market share
losses. Additionally, the SLR revealed that refactoring and improving the visibility of TD are the most
proposed TDM strategies in the context of agile software development.
Becker et al. [41] conducted an SLR that concentrates on the trade-off decisions across time in TDM.
The authors searched ACM Digital Library, IEEEXplore, and Scopus databases for potential candidate
studies. The search criteria resulted in 240 studies being selected to review and analyze. Results from
the review revealed that only nine studies explicitly used empirical methods to study specific trade-off
decisions. In these studies, decisions are assumed to have been made by weighing and evaluating multiple
alternatives against different criteria. Moreover, these studies focus on obtaining these various measures.
The SLR also revealed that these nine studies have different notions of time and none of them account for
intertemporal choices (i.e., “Decisions involving trade-offs among costs and benefits occurring at different
times [102]”).
7.2 Empirical Studies on TD
Empirical studies in software engineering refer to the collection and analysis of data based on direct or
indirect observations or experiences. These studies aim to evaluate, characterize, and reveal relationships
between software development practices, technologies, and deliverables through empirical evidence [69,
130]. Software engineering researchers have conducted multiple empirical studies aiming to understand
and observe TD prioritization. However, no study has yet addressed TD prioritization in the presence of
a resource constraint. Nonetheless, this section summarizes a few related studies that investigate TD and
their respective findings.
Codabux and Williams [66] interviewed 28 engineers in a software development division of a mid-sized
company to understand their perceptions of agile software adoption and how TD affects the development
process. Researchers found that there is no consensus on the terminology of TD, specifically that the
definition of TD coincides with the work that a given engineer is currently completing. Additionally, the
interviews demonstrated that test debt, code debt, and defect debt are the most frequently encountered
types of TD. Moreover, it was found that architectural debt is the most difficult debt to repay, as it requires
a large amount of cooperation between teams. The study also pointed out that the participants believe
that customer requests are the primary driver for TD prioritization, as customer requests determine the
availability of resources to repay TD. Furthermore, a smaller number of participants indicated that TD severity
can aid in determining which TD items should be repaid. The study also found that there exist dedicated
teams for TD reduction that focus on reducing different TD types in several parts of a project. Additionally,
the study revealed that a few teams contribute to TD reduction by repaying TD in the part of the project
that they currently work on. Unfortunately, the study found that some TD remains unpaid indefinitely,
because teams prefer spending available resources on developing new features.
Bomfim Jr. and Santos [49] interviewed six technical leaders from four different software companies
to understand how agile teams deal with TD in their daily work. The study revealed that the majority
of the participating teams rely on manual methods to identify TD and that only one team depends on
SonarQube to identify TD. In addition, the study unveiled that daily and weekly meetings, code reviews,
and retrospectives are utilized to discuss TD. However, the study found that four teams add TD to their
primary backlog, without explicitly referring to the items as TD. Furthermore, the researchers were able
to observe a few TD repayment strategies, and they summarized the observed standard decision
flows for TD repayment in a flowchart. The study found that decisions regarding TD repayment are based
on the impact of TD, the cost of repaying TD, and the opinions of product owners or customers. The study also
highlighted the challenge of allocating resources for TD repayment, as TD repayments tend to lack a clear,
tangible value.
Martini et al. [159] conducted multiple case studies on architectural debt at a selected software com-
pany. The researchers utilized Arcan [100] to analyze four software systems, developed within the com-
pany, to detect their respective architectural smells. Subsequently, for each system, the researchers selected
a sample of 22 smells from the detected architectural smells. These smells have the highest severity and
belong to three architectural smell types. Afterward, the researchers interviewed nine practitioners from
the teams responsible for the projects containing the identified smells. During the interviews, the re-
searchers introduced Arcan, displayed a graphical representation of the sampled architectural smells, and
asked the participants to identify the architectural issues related to the selected smells in addition to filling
out a questionnaire. The results revealed that 50% of the architectural smells were associated with re-
duced development speed when adding new functionalities. Furthermore, in more than 60% of the cases,
participants associated cyclic dependency smells with an increased number of bugs and pointed out that
the negative impact of architectural smells, which grows over time, can extend to writing test cases and
fixing conflicts during merging. Moreover, participants indicated that conducting refactoring, generally,
would not create any negative side effects. Additionally, the results revealed a strong correlation between
a smell’s negative impact and its priority. The results also found a medium-strong positive correlation
between cost and priority, as practitioners prefer to repay higher-cost TD items first.
Sae-Lim et al. [201] conducted an investigative study to identify factors that practitioners consider
in the code smell filtration and prioritization process during the prefactoring phase of development, the phase in
which developers refactor source code before implementing their code. The study involved 10 profes-
sional developers, whose working experience ranged from 2 to 13 years. The researchers analyzed an
open-source software (OSS) project to detect code smells and selected five issues from the issue tracking
system. Subsequently, researchers provided the participating developers with the list of five issues. The list
of issues included the summary and description of each issue, the solution for each issue, and a list of 22
code smells. The code smells included the following: blob class, data class, god class, and schizophrenic
class. The researchers then requested the participants to select code smells to refactor, provide their re-
spective selection criteria, and describe their respective prioritization processes. The study revealed that
task relevance, smell severity, task implementation cost, testability, co-located smells, module importance,
readability, smell false positives, smell coupling, maintainability, and understandability are the factors that
participants consider when prioritizing code smells, where the list order reflects the frequency of the fac-
tors in the participants’ responses from highest to lowest. The study did not find a significant difference
between the filtration and prioritization factors of junior or senior participants.
Martini and Bosch [158] investigated information needed by agile product owners and architects to
prioritize architectural TD through the means of interviews, observations, and surveys. The researchers
also explored the differences between architects and product owners when prioritizing architectural TD.
The study involved six cases in four large companies and included three portfolio managers, three prod-
uct owners, five architects responsible for a system level, and four architects responsible for sub-systems.
The researchers began the study by conducting a 30-minute workshop to present the considered prioritiza-
tion aspects to the participants. This workshop aimed to ensure that both parties had a mutual
understanding of the concepts of the study. After conducting the workshop, the researchers divided the
participants into two-factor groups to analyze two concrete architectural TD prioritization cases. The par-
ticipants were then gathered to summarize and discuss the results, and the effects of architectural TD were
presented and analyzed. Subsequently, the participants were asked to complete a questionnaire on the
information needs of architectural TD. The results of the questionnaire revealed that when prioritizing ar-
chitectural TD, the participants consider the following aspects, sorted based on the average of importance
and frequency from highest to lowest: competitive advantage, specific customer value, market attractive-
ness, lead time, maintenance cost, customer long-term satisfaction, risk penalty, and volatility. Moreover, the
results unveiled that product owners rely on specific customer values and the attractiveness of said TD item
for the market when prioritizing architectural TD items. Regarding the usefulness of the effects of archi-
tectural TD in prioritization, the study found that product owners are more concerned with big deliveries.
On the contrary, architects are more concerned with code changes. Furthermore, both architects and prod-
uct owners consider information regarding contiguous architectural TD to be crucial in the prioritization
process, and both consider probable, hidden architectural TD trivial for prioritization.
7.3 Search-Based Prioritization
Search-based techniques have been widely utilized for prioritization in software engineering, specifically,
to solve the next release problem (NRP) [36]. The problem involves deciding which software requirements
are to be included in the next release. This section presents studies that applied search-based techniques to
solve the NRP.
Baker et al. [37] formulated the NRP as a single objective optimization problem to maximize value.
The authors applied greedy and simulated annealing algorithms on a real-world data set from Motorola.
The study found that the simulated annealing algorithm outperformed the greedy algorithm. Additionally, the
study revealed that both algorithms surpassed expert ranking. While the study does not optimize over cost,
it accounts for a cost constraint.
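To make this formulation concrete, the following sketch (illustrative only; the value-to-cost heuristic and the names used here are assumptions, not details taken from [37]) shows a greedy selection that maximizes value under a cost constraint:

def greedy_select(items, budget):
    # items: list of (name, value, cost) tuples; budget: the cost constraint.
    # Greedily pick the items with the highest value-to-cost ratio that still
    # fit within the remaining budget.
    ordered = sorted(items, key=lambda item: item[1] / item[2], reverse=True)
    selected, remaining = [], budget
    for name, value, cost in ordered:
        if cost <= remaining:
            selected.append(name)
            remaining -= cost
    return selected

# Example usage with hypothetical requirements (name, value, cost).
candidates = [("R1", 8, 3), ("R2", 5, 4), ("R3", 6, 2), ("R4", 4, 5)]
print(greedy_select(candidates, budget=7))  # prints ['R3', 'R1']

A simulated annealing variant would instead start from some selection and repeatedly accept or reject random neighboring selections, occasionally accepting worse ones to escape local optima.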
Del Sagrado et al. [82] applied ant colony optimization to solve the NRP. The researchers formulated
the problem as maximizing customer satisfaction without exceeding an effort constraint. Customer satis-
faction is measured by summing, over all customers, the product of each customer's weight and the value
that the customer assigns to each selected requirement. The researchers applied ant colony, genetic algorithm (GA), and simulated annealing
techniques to a real software project problem obtained from [107] and compared the performance of the
three algorithms. The results indicate that ant colony surpassed the other techniques in terms of customer
satisfaction.
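For clarity, the weighted-sum satisfaction measure described above can be written as follows (a sketch in notation introduced here rather than taken from [82], where S is the set of selected requirements, C the set of customers, w_c the weight of customer c, v_{c,r} the value customer c assigns to requirement r, e_r the effort of requirement r, and E the effort constraint):

satisfaction(S) = \sum_{r \in S} \sum_{c \in C} w_c \cdot v_{c,r}, \qquad \text{subject to} \quad \sum_{r \in S} e_r \le E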
Zhang et al. [241] formulated the NRP as a multi-objective optimization (MOO) problem to maximize
value and minimize cost. The authors utilized four search techniques, namely: random search, single-
objective GA, Pareto GA, and non-dominated sorting genetic algorithm-II (NSGA-II). The authors gen-
erated test problems for use in two empirical studies. The test problems were created by assigning
random values for value and cost. Generated values range from 0 to 5, and generated costs range from 1
to 9. In the first empirical study, the researchers compared the performance of NSGA-II with Pareto GA,
random search, and single-objective GA. The study demonstrated that NSGA-II outperformed the other
techniques in terms of quality of solutions and finding a large and important part of the Pareto-front in
large problems. The study provided evidence of the suitability of NSGA-II for solving the NRP. In the sec-
ond empirical study, the researchers examined the number of requirements that makes the NRP non-trivial.
The results revealed that when the number of requirements exceeds about 20, the NRP becomes non-trivial
and applying a search-based technique becomes worthwhile. The study neither considered a cost constraint nor
provided any guidance on handling one.
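As an illustration of the test-problem setup described above, the sketch below (with assumed names; the exact generation procedure in [241] may differ) creates random requirements with values in the range 0 to 5 and costs in the range 1 to 9, and evaluates the two objectives of a candidate selection:

import random

def generate_test_problem(num_requirements, seed=0):
    # Each requirement receives a random value in [0, 5] and a random cost in [1, 9].
    rng = random.Random(seed)
    return [(rng.randint(0, 5), rng.randint(1, 9)) for _ in range(num_requirements)]

def objectives(solution, requirements):
    # solution is a 0/1 list over requirements; the two objectives are
    # total value (to be maximized) and total cost (to be minimized).
    total_value = sum(v for x, (v, c) in zip(solution, requirements) if x)
    total_cost = sum(c for x, (v, c) in zip(solution, requirements) if x)
    return total_value, total_cost

requirements = generate_test_problem(25)
print(objectives([1] * 25, requirements))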
Tonella et al. [218] proposed an interactive GA to generate a requirement order that com-
plies with existing technical constraints and requirement dependencies in addition to accounting for users’
relative preferences. The authors represented requirements as precedence graphs constructed from require-
ment documents and, subsequently, applied the GA to minimize disagreement among ordered requirements.
In cases of ties among requirements, the interactive GA asks for user input to resolve these ties. The al-
gorithm neither optimizes for the cost of requirements nor accounts for a cost constraint. The algorithm was
evaluated through a real case study, as part of the project Ambient Aware Assistance (ACube) [31]. The
performance of the interactive algorithm was compared to the performance of regular GA, which does not
support user input. The results revealed that the interactive GA outperformed the regular GA,
and that the better ordering was achieved by eliciting between 50 and 100 pairwise comparisons from the
user.
Durillo et al. [89] formulated the NRP as a problem of minimizing cost and maximizing total customer
satisfaction. Customer satisfaction is computed by summing the products of each customer's requirement
values and that customer's weight. The approach accounts for requirements' dependencies, and it
does not allow the inclusion of a requirement in a solution if its dependencies were excluded from the so-
lution. However, the approach does not provide any mechanism to support a cost constraint. The researchers
applied NSGA-II, MOCell, and random search algorithms on the problem and compared the performance
of these algorithms in two empirical studies with randomly generated test problems. The results of the
first empirical study demonstrated that MOCell outperformed NSGA-II in terms of solution spread, an
indicator of solution diversity. Additionally, the results indicated that NSGA-II surpassed MOCell in
terms of Hypervolume (HV), an indicator of the volume in the solution space covered by members of
a non-dominated set of solutions, when the number of requirements exceeds 40. However,
there was no significant difference between the two algorithms for smaller numbers of requirements. The
second empirical study demonstrated the effectiveness of NSGA-II over MOCell and random search in terms of
finding more Pareto-front solutions.
Chapter 8
Conclusion and Future Directions
This chapter concludes the dissertation and discusses future directions for the work.
8.1 Conclusion
Technical debt (TD) prioritization is a crucial step that facilitates technical debt management (TDM).
TD prioritization is particularly important, as repaying all existing TD in a system may be unviable due
to the typical abundance of TD items in a software system and shortage of resources allocated for TD
repayment. Deciding which TD items to include in a repayment activity is challenging, as each TD item
has an associated cost for repaying and an associated value that indicates the benefits that will be gained
from repaying the TD item. Typically, individuals seek to maximize the overall value of a given repayment
activity while minimizing its cost and satisfying its resource constraint. However, identifying which TD
items should be included in a repayment activity to achieve such a goal is extremely hard, as it requires
a tremendous number of comparisons among TD items, and this type of problem is known to be
NP-hard.
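Stated more formally (a sketch in notation introduced here, assuming the resource constraint is expressed as a budget B on the total repayment cost of the selected items), the selection problem over n TD items can be written as:

\max_{x} \sum_{i=1}^{n} v_i x_i, \qquad \min_{x} \sum_{i=1}^{n} c_i x_i, \qquad \text{subject to} \quad \sum_{i=1}^{n} c_i x_i \le B, \quad x_i \in \{0, 1\},

where x_i indicates whether TD item i is selected, v_i denotes its value, and c_i its repayment cost. Under this reading, the problem is a variant of the 0-1 knapsack problem, which is known to be NP-hard.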
With the challenges mentioned above in mind, this research aims to facilitate the prioritization of TD.
Specifically, this dissertation attains its goal by formulating the TD prioritization problem as a multi-
objective optimization (MOO) problem to develop an effective search-based TD prioritization approach
that aligns with the needs of software practitioners. The hypothesis statement of this dissertation is:
Search-based optimization techniques can prioritize TD items in the presence of a resource constraint with
high effectiveness.
To prove this hypothesis, a systematic literature review (SLR) and an investigative study were conducted
in addition to the design and development of a search-based TD prioritization approach.
An important aspect of this research is to develop an effective TD prioritization approach that satis-
fies a resource constraint while simultaneously addressing the needs of software practitioners. Therefore,
to provide context to this research, the currently researched TD prioritization approaches were reviewed
through the means of an SLR. The SLR aided in guiding the development of the search-based TD prioriti-
zation approach by ensuring that the approach adheres to the standards set by the current TD prioritization
approaches and avoids their respective limitations. The results of the SLR revealed a scarcity of TD pri-
oritization approaches in general. Furthermore, among the identified approaches, a few inconsistencies were
unveiled in regard to the decision factors that each approach considered during the prioritization process.
In the SLR, the decision factors considered by the approaches were compiled into a set of three overarch-
ing categories, which are referred to as “decision factor categories”: value, cost, and a resource constraint.
The SLR found that a TD prioritization approach may revolve around one of the categories listed above or
a combination of them. Unfortunately, though each TD prioritization approach is based on one or more
of these categories, there is a lack of consistency concerning the categories being considered in the cur-
rently researched TD prioritization approaches. In other words, there is no standard set of decision factor
categories that TD prioritization approaches follow. Moreover, the review found that the majority of the
approaches do not account for a resource constraint even though a resource constraint is typically present
in TD repayment activities. Lastly, the SLR raised a concern regarding the effectiveness of the identified
approaches in industry settings, as the majority of the approaches lack an evaluation in such settings
and are assumed to be effective in industry without any supporting empirical evidence.
Knowing which decision factor categories should be considered when prioritizing TD items under a re-
source constraint is a crucial element in developing an effective TD prioritization approach. Unfortunately,
the SLR failed to supply such information, as it instead revealed an ambiguity regarding the utilization of
decision factor categories. Moreover, the current TD literature lacks a study that investigates and sum-
marizes how software practitioners prioritize TD under a resource constraint. Aiming to understand how
software practitioners prioritize TD when there is a resource constraint present and to provide guidance
to the software engineering community on developing an effective TD prioritization approach, an inves-
tigative study was conducted. The investigative study sought to unveil the behaviors and perceptions of
software practitioners during the prioritization of TD under a resource constraint. The study involved a
controlled experiment in which 89 software practitioners were requested to select TD items to include
for repayment under a resource constraint and, subsequently, tasked with completing a questionnaire, in
which they detailed their approaches and rationales. The study unveiled three unique prioritization patterns
among the participants. The majority of the participants balanced the trade-off between value and cost,
with an aim to maximize the overall obtained value when selecting TD items for repayment. A smaller
number of the participants prioritized higher value TD items, and only one participant prioritized lower
cost TD items. The results of the study provided several valuable insights, with the most notable being
that the value and cost of a TD item are subjective and context-dependent. Given that value and cost can
change depending on the individual, a TD prioritization approach should be developed to be independent
of value and cost estimation methods.
As stated previously, the objective of this dissertation is to facilitate the prioritization of TD items by
developing an effective TD prioritization approach under a resource constraint to satisfy various needs
of software practitioners. The investigative study revealed that the majority of the participating software
practitioners aimed to balance the trade-off between the value and cost of TD items (i.e., maximize the
overall obtained value while minimizing cost under a resource constraint). However, finding such a bal-
ance is typically extremely difficult, as it requires a large number of comparisons that grows exponentially
with the number of TD items considered. The large number of required comparisons poses a challenge
in finding the optimal solution and confirming that a given solution is the best one. Moreover,
the numerous comparisons also act as a hurdle for individuals when manually performing TD prioriti-
zation. Therefore, to develop a search-based TD prioritization approach, the TD prioritization problem
was formulated as a MOO problem in which there are two primary objectives: maximizing value and
minimizing cost while satisfying a resource constraint. The approach utilizes the non-dominated sorting
genetic algorithm-II (NSGA-II) to optimize over the two objectives and utilizes a greedy-repair, constraint-
handling technique to ensure that the identified prioritization solutions are within the predefined resource
constraint. Additionally, the approach applies a greedy initialization algorithm to generate a set of initial
solutions to improve the convergence of the search. The effectiveness and efficiency of the approach were
evaluated through the means of two experiments. The results of the evaluation proved the effectiveness
of the approach in improving the value of found solutions over those of random search. Furthermore,
the results revealed the ability of the approach to obtain repayment solutions with values similar to or
greater than the values acquired by software practitioners. The approach was able to generate solutions
with an average running time of 3 minutes. The approach gained positive feedback from industry software
practitioners, which highlights the potential of the approach in industry settings.
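To illustrate the constraint-handling and initialization steps in isolation, the sketch below is a simplified illustration rather than the dissertation's actual implementation; the drop-lowest-ratio heuristic and all names are assumptions. It shows a greedy repair that removes TD items from an infeasible candidate until the resource constraint is satisfied, and a greedy seed solution that can be injected into the initial population:

def greedy_repair(candidate, values, costs, budget):
    # candidate is a 0/1 list over TD items. If the total cost exceeds the
    # budget, repeatedly drop the selected item with the lowest value-to-cost
    # ratio until the candidate becomes feasible.
    repaired = list(candidate)
    selected = [i for i, x in enumerate(repaired) if x]
    selected.sort(key=lambda i: values[i] / costs[i])  # worst ratio first
    total_cost = sum(costs[i] for i in selected)
    while total_cost > budget and selected:
        worst = selected.pop(0)
        repaired[worst] = 0
        total_cost -= costs[worst]
    return repaired

def greedy_seed(values, costs, budget):
    # One greedy solution for seeding the initial population: take TD items in
    # decreasing value-to-cost order while the budget allows.
    solution = [0] * len(values)
    remaining = budget
    ordered = sorted(range(len(values)), key=lambda i: values[i] / costs[i], reverse=True)
    for i in ordered:
        if costs[i] <= remaining:
            solution[i] = 1
            remaining -= costs[i]
    return solution

In a complete pipeline, greedy_repair could be applied to each offspring produced by NSGA-II's crossover and mutation operators before fitness evaluation, and greedy_seed (together with perturbed copies of it) could be mixed into the randomly generated initial population.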
The search-based TD prioritization approach has demonstrated high effectiveness in prioritizing TD
items while maintaining a resource constraint, which thereby confirms the hypothesis of this dissertation
and indicates the usability and suitability of search-based techniques when prioritizing TD items.
8.2 Future Directions
Addressing and prioritizing TD will continue to be a pressing issue in the future for researchers and practi-
tioners alike. This dissertation identifies and addresses several real-world challenges that may occur when
prioritizing TD. Such challenges motivated the direction of this research and acted as the foundation for
developing a search-based TD prioritization approach. Given that the TD prioritization problem was
formulated as a MOO problem, a possible future direction for research would be to examine the possibil-
ity of applying other multi-objective evolutionary algorithms (MOEAs). This would pose new research
challenges, such as identifying which algorithm achieves the best results in terms of effectiveness and
efficiency.
A second possible direction for future research would be extending the search-based TD prioritization
approach by implementing it as a graphical user interface (GUI) tool that allows its users to specify their
own value and cost formulas in addition to a resource constraint. Ideally, the tool would be flexible enough
to incorporate the value and cost formulas identified in this dissertation as well as allow
users to define their own formulas. Other extensions of the tool include the capability of connecting to
different version control systems (e.g., Git and Apache Subversion) or issue tracking systems (e.g., Jira
and Bugzilla). These connectivity capabilities would allow the tool to automatically calculate multiple
metrics that can be utilized to automatically determine the importance of the files of a system, such as the
frequency of changing a file and the number of bugs in a file. Furthermore, the GUI tool could allow users
to filter out TD items, files, and rules that a user would like to exclude from their prioritization process.
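As a sketch of how such connectivity might compute one of these metrics, the snippet below (an assumed design for the hypothetical tool, not an existing feature) counts how often each file has changed in a Git repository by parsing the output of git log:

import subprocess
from collections import Counter

def change_frequency(repo_path):
    # List the files touched by each commit; an empty pretty format suppresses
    # commit headers, so the output contains only file names and blank lines.
    output = subprocess.run(
        ["git", "-C", repo_path, "log", "--name-only", "--pretty=format:"],
        capture_output=True, text=True, check=True,
    ).stdout
    files = [line for line in output.splitlines() if line.strip()]
    return Counter(files)

# Example: the ten most frequently changed files of a repository.
# print(change_frequency("/path/to/repo").most_common(10))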
A third potential direction could be to build upon the GUI tool and conduct a large-scale study with
software practitioners from various domains to identify additional value and cost formu-
las. Such formulas can then be utilized to expand the options of the predefined value and cost formulas
currently set. Having such a tool would, hopefully, lower the amount of human effort required when
utilizing the approach.
Lastly, another possible direction for future work is to utilize the search-based prioritization approach
in software domains other than TD, particularly in software requirements. Applying the approach to pri-
oritize requirements would pose new research challenges, such as assessing its suitability for requirement
prioritization and comparing its performance with the performance of currently available requirement pri-
oritization approaches.
References
[1] Cast. www.castsoftware.com/products/health-dashboard.
[2] The developer coefficient: Software engineering efficiency and its $3 trillion impact on global GDP.
https://stripe.com/files/reports/the-developer-coefficient.pdf.
[3] Developer survey results 2016. https://insights.stackoverflow.com/survey/2016/.
[4] Java platform standard edition 8. https://docs.oracle.com/javase/8/docs/api/index.
html.
[5] kiuwan. https://www.kiuwan.com.
[6] Maven conventions. https://maven.apache.org/developers/conventions.
[7] MISRA coding standards for C and C++. https://www.perforce.com/resources/qac/
what-misra-overview-misra-standard.
[8] SEI CERT coding standards. https://wiki.sei.cmu.edu/confluence/display/seccode.
[9] Software fail watch: 2016. https://www.tricentis.com/resource-assets/
software-fail-watch-2016/.
[10] Sonarqube. http://www.sonarqube.org.
[11] Sonarqube rules. https://rules.sonarsource.com/.
[12] Sonarqube web api. https://docs.sonarqube.org/display/DEV/Web+API.
[13] Technical debt. https://martinfowler.com/bliki/TechnicalDebt.html.
[14] Technical debt quadrant. https://martinfowler.com/bliki/TechnicalDebtQuadrant.
html.
[15] W3c standards. https://www.w3.org/standards.
[16] Ward explains debt metaphor. http://wiki.c2.com/?WardExplainsDebtMetaphor. Last accessed May 19, 2019.
[17] Zahra Shakeri Hossein Abad and Guenther Ruhe. Using real options to manage technical debt in
requirements engineering. In 2015 IEEE 23rd International Requirements Engineering Conference
(RE), pages 230–235. IEEE, 2015.
[18] Philip Achimugu, Ali Selamat, Roliana Ibrahim, and Mohd Naz’ri Mahrin. A systematic litera-
ture review of software requirements prioritization research. Information and software technology,
56(6):568–585, 2014.
[19] Shirin Akbarinasaji. Toward measuring defect debt and developing a recommender system for their
prioritization. In Proceedings of the 13th International Doctoral Symposium on Empirical Software
Engineering, pages 15–20, 2015.
[20] Shirin Akbarinasaji, Ayse Basar Bener, and Atakan Erdem. Measuring the principal of defect debt.
In 2016 IEEE/ACM 5th International Workshop on Realizing Artificial Intelligence Synergies in
Software Engineering (RAISE), pages 1–7. IEEE, 2016.
[21] Yalçın Akçay, Haijun Li, and Susan H Xu. Greedy algorithm for the general multidimensional
knapsack problem. Annals of Operations Research, 150(1):17, 2007.
[22] Mashel Al-Barak and Rami Bahsoon. Database design debts through examining schema evolution.
In 2016 IEEE 8th International Workshop on Managing Technical Debt (MTD), pages 17–23. IEEE,
2016.
[23] Mashel Albarak and Rami Bahsoon. Prioritizing technical debt in database normalization using
portfolio theory and data quality metrics. In Proceedings of the 2018 International Conference on
Technical Debt, pages 31–40. ACM, 2018.
[24] Abdullah Aldaeej and Carolyn Seaman. From lasagna to spaghetti: A decision model to manage
defect debt. In 2018 IEEE/ACM International Conference on Technical Debt (TechDebt), pages
67–71. IEEE, 2018.
[25] Franklin Allen, Sudipto Bhattacharya, Raghuram Rajan, and Antoinette Schoar. The contributions
of stewart myers to the theory and practice of corporate finance. Journal of Applied Corporate
Finance, 20(4):8–19, 2008.
[26] Nicolli SR Alves, Thiago S Mendes, Manoel G de Mendonça, Rodrigo O Spínola, Forrest Shull, and
Carolyn Seaman. Identification and management of technical debt: A systematic mapping study.
Information and Software Technology, 70:100–121, 2016.
[27] Nicolli SR Alves, Leilane F Ribeiro, Vivyane Caires, Thiago S Mendes, and Rodrigo O Spínola.
Towards an ontology of terms on technical debt. In Managing Technical Debt (MTD), 2014 Sixth
International Workshop on, pages 1–7. IEEE, 2014.
[28] Esra Alzaghoul and Rami Bahsoon. Cloudmtd: Using real options to manage technical debt in
cloud-based service selection. In 2013 4th International Workshop on Managing Technical Debt
(MTD), pages 55–62. IEEE, 2013.
[29] Areti Ampatzoglou, Apostolos Ampatzoglou, Alexander Chatzigeorgiou, and Paris Avgeriou. The
financial aspect of managing technical debt: A systematic literature review. Information and Soft-
ware Technology, 64:52–73, 2015.
[30] Areti Ampatzoglou, Apostolos Ampatzoglou, Alexander Chatzigeorgiou, Paris Avgeriou, Pekka
Abrahamsson, Antonio Martini, Uwe Zdun, and Kari Systa. The perception of technical debt in the
embedded systems domain: an industrial case study. In 2016 IEEE 8th International Workshop on
Managing Technical Debt (MTD), pages 9–16. IEEE, 2016.
[31] Renzo Andrich, Francesco Botto, Valerio Gower, Chiara Leonardi, Oscar Mayora, Lucia Pigini,
Valentina Revolti, Luca Sabatucci, Angelo Susi, and Massimo Zancanaro. Acube: User-centred
and goal-oriented techniques. Fondazione Bruno Kessler-IRST, Tech. Rep, 2010.
[32] Mubashir Arain, Michael J Campbell, Cindy L Cooper, and Gillian A Lancaster. What is a pi-
lot or feasibility study? a review of current practice and editorial policy. BMC medical research
methodology, 10(1):67, 2010.
[33] Andrea Arcuri and Lionel Briand. A hitchhiker’s guide to statistical tests for assessing randomized
algorithms in software engineering. Software Testing, Verification and Reliability, 24(3):219–250,
2014.
[34] Paris Avgeriou, Philippe Kruchten, Ipek Ozkaya, and Carolyn Seaman. Managing Technical Debt
in Software Engineering (Dagstuhl Seminar 16162). Dagstuhl Reports, 6(4):110–138, 2016.
[35] Thomas Back, Frank Hoffmeister, and Hans-Paul Schwefel. A survey of evolution strategies. In
Proceedings of the fourth international conference on genetic algorithms, volume 2. Morgan Kauf-
mann Publishers San Mateo, CA, 1991.
[36] Anthony J. Bagnall, Victor J. Rayward-Smith, and Ian M Whittley. The next release problem.
Information and software technology, 43(14):883–890, 2001.
[37] Paul Baker, Mark Harman, Kathleen Steinhofel, and Alexandros Skaliotis. Search based approaches
to component selection and prioritization for the next release problem. In Software Maintenance,
2006. ICSM’06. 22nd IEEE International Conference on, pages 176–185. IEEE, 2006.
[38] Abu SSM Barkat Ullah, Ruhul Sarker, and David Cornforth. Search space reduction technique for
constrained optimization with tiny feasible space. In Proceedings of the 10th annual conference on
Genetic and evolutionary computation, pages 881–888. ACM, 2008.
[39] AE Bartz. Basic statistical concepts. New York: Macmillan. Devore, J., and Peck, 1994.
[40] Raja Bavani. Distributed agile, agile testing, and technical debt. IEEE software, 29(6):28–33, 2012.
[41] Christoph Becker, Ruzanna Chitchyan, Stefanie Betz, and Curtis McCord. Trade-off decisions
across time in technical debt management: a systematic literature review. In Proceedings of the
2018 International Conference on Technical Debt, pages 85–94. ACM, 2018.
[42] Woubshet Nema Behutiye, Pilar Rodríguez, Markku Oivo, and Ayşe Tosun. Analyzing the con-
cept of technical debt in the context of agile software development: A systematic literature review.
Information and Software Technology, 82:139–158, 2017.
[43] Stephany Bellomo, Robert L Nord, and Ipek Ozkaya. A study of enabling factors for rapid field-
ing: combined practices to balance speed and stability. In Proceedings of the 2013 International
Conference on Software Engineering, pages 982–991. IEEE Press, 2013.
[44] Stephany Bellomo, Robert L Nord, Ipek Ozkaya, and Mary Popeck. Got technical debt?: surfacing
elusive technical debt in issue trackers. In Proceedings of the 13th International Conference on
Mining Software Repositories, pages 327–338. ACM, 2016.
[45] Stefan Biffl, Aybuke Aurum, Barry Boehm, Hakan Erdogmus, and Paul Grünbacher. Value-based
software engineering. Springer Science & Business Media, 2006.
[46] Christian Bird, Tim Menzies, and Thomas Zimmermann. The art and science of analyzing software
data. Elsevier, 2015.
[47] Tobias Blickle and Lothar Thiele. A comparison of selection schemes used in evolutionary algo-
rithms. Evolutionary Computation, 4(4):361–394, 1996.
[48] Anthony E Boardman, David H Greenberg, Aidan R Vining, and David L Weimer. Cost-benefit
analysis: concepts and practice. Cambridge University Press, 2017.
[49] Marcelo M Bomfim and Viviane A Santos. Strategies for reducing technical debt in agile teams. In
Brazilian Workshop on Agile Methods, pages 60–71. Springer, 2016.
[50] Pierre Bourque, Richard E Fairley, et al. Guide to the software engineering body of knowledge
(SWEBOK (R)): Version 3.0. IEEE Computer Society Press, 2014.
[51] Edward H Bowman and Gary T Moskowitz. Real options analysis and strategic decision making.
Organization science, 12(6):772–777, 2001.
[52] Michael Bowman, Lionel C Briand, and Yvan Labiche. Solving the class responsibility assignment
problem in object-oriented analysis with multi-objective genetic algorithms. IEEE Transactions on
Software Engineering, 36(6):817–837, 2010.
[53] Richard A Brealey, Stewart C Myers, and Franklin Allen. Brealey, myers, and allen on real options.
Journal of Applied Corporate Finance, 20(4):58–71, 2008.
[54] Nanette Brown, Yuanfang Cai, Yuepu Guo, Rick Kazman, Miryung Kim, Philippe Kruchten, Erin
Lim, Alan MacCormack, Robert Nord, Ipek Ozkaya, et al. Managing technical debt in software-
reliant systems. In Proceedings of the FSE/SDP workshop on Future of software engineering re-
search, pages 47–52. ACM, 2010.
[55] Raymond PL Buse and Westley R Weimer. A metric for software readability. In Proceedings of the
2008 international symposium on Software testing and analysis, pages 121–130. ACM, 2008.
[56] Yuanfang Cai, Rick Kazman, Carlos V A Silva, Lu Xiao, and Hong-Mei Chen. A decision-support
system approach to economics-driven modularity evaluation. In Economics-Driven Software Archi-
tecture, pages 105–128. Elsevier, 2014.
[57] G. Ann Campbell and Patroklos P. Papapetrou. SonarQube in Action. Manning Publications Co.,
Greenwich, CT, USA, 1st edition, 2013.
[58] Lei Cao, Jian Cao, and Minglu Li. Genetic algorithm utilized in cost-reduction driven web service
selection. In International Conference on Computational and Information Science, pages 679–686.
Springer, 2005.
[59] Lei Cao, Minglu Li, and Jian Cao. Cost-driven web service selection using genetic algorithm. In
International Workshop on Internet and Network Economics, pages 906–915. Springer, 2005.
[60] Edwin KP Chong and Stanislaw H Zak. An introduction to optimization. John Wiley & Sons, 2004.
[61] Aabha Choudhary and Paramvir Singh. Minimizing refactoring effort through prioritization of
classes based on historical, architectural and code smell information. In QuASoQ/TDA@ APSEC,
pages 76–79, 2016.
[62] Paul C Chu and John E Beasley. A genetic algorithm for the multidimensional knapsack problem.
Journal of heuristics, 4(1):63–86, 1998.
[63] Tom Clancy. The standish group chaos report. Project Smart, 2014.
[64] Alan Clarke. Evaluation research: An introduction to principles, methods and practice. Sage,
1999.
[65] Andrew Clay Shafer. Infrastructure debt: Revisiting the foundation. Cutter IT Journal, 23(10):36,
2010.
[66] Zadia Codabux and Byron Williams. Managing technical debt: An industrial case study. In Pro-
ceedings of the 4th International Workshop on Managing Technical Debt, pages 8–15. IEEE Press,
2013.
[67] Zadia Codabux and Byron J Williams. Technical debt prioritization using predictive analytics. In
Proceedings of the 38th International Conference on Software Engineering Companion, pages 704–
706. ACM, 2016.
[68] Carlos A Coello Coello, Gary B Lamont, David A Van Veldhuizen, et al. Evolutionary algorithms
for solving multi-objective problems, volume 5. Springer, 2007.
[69] Paul R Cohen. Empirical methods for artificial intelligence, volume 139. MIT press Cambridge,
MA, 1995.
[70] Thomas H Cormen, Charles E Leiserson, Ronald L Rivest, and Clifford Stein. Introduction to
algorithms. MIT press, 2009.
[71] Vittorio Cortellessa, Ivica Crnkovic, Fabrizio Marinelli, and Pasqualina Potena. Experimenting
the automated selection of cots components based on cost and system requirements. J. UCS,
14(8):1228–1255, 2008.
[72] Daniela S Cruzes and Tore Dyba. Recommended steps for thematic synthesis in software engineer-
ing. In 2011 International Symposium on Empirical Software Engineering and Measurement, pages
275–284. IEEE, 2011.
[73] Ward Cunningham. The wycash portfolio management system. ACM SIGPLAN OOPS Messenger,
4(2):29–30, 1993.
[74] Bill Curtis, Jay Sappidi, and Alexandra Szynkarski. Estimating the principal of an application’s
technical debt. IEEE software, 29(6):34–42, 2012.
[75] Ke Dai and Philippe Kruchten. Detecting technical debt through issue trackers. In QuASoQ@
APSEC, pages 59–65, 2017.
[76] Rodrigo Rebouc ¸as de Almeida, Uir´ a Kulesza, Christoph Treude, Aliandro Higino Guedes Lima,
et al. Aligning technical debt prioritization with business objectives: A multiple-case study. In 2018
IEEE International Conference on Software Maintenance and Evolution (ICSME), pages 655–664.
IEEE, 2018.
[77] Amanda F de O Passos, Mário André de Freitas Farias, Manoel G de Mendonça Neto, and Ro-
drigo Oliveira Spínola. A study on identification of documentation and requirement technical debt
through code comment analysis. In Proceedings of the 17th Brazilian Symposium on Software
Quality, pages 21–30. ACM, 2018.
[78] Márcio de Oliveira Barros and Arilo Cláudio Dias-Neto. 0006/2011-Threats to validity in search-
based software engineering empirical studies. RelaTe-DIA, 5(1), 2011.
[79] Kalyanmoy Deb. Multi-objective optimization using evolutionary algorithms, volume 16. John
Wiley & Sons, 2001.
[80] Kalyanmoy Deb and Deb Kalyanmoy. Multi-Objective Optimization Using Evolutionary Algo-
rithms. John Wiley & Sons, Inc., New York, NY , USA, 2001.
[81] Kalyanmoy Deb, Amrit Pratap, Sameer Agarwal, and TAMT Meyarivan. A fast and elitist multiob-
jective genetic algorithm: Nsga-ii. IEEE transactions on evolutionary computation, 6(2):182–197,
2002.
[82] José Del Sagrado, Isabel María Del Águila, and Francisco Javier Orellana. Ant colony optimization
for the next release problem: A comparative study. In Search Based Software Engineering (SSBSE),
2010 Second International Symposium on, pages 67–76. IEEE, 2010.
[83] B Dhillon. Life cycle costing: techniques, models and applications. Routledge, 2013.
[84] Georgios Digkas, Mircea Lungu, Paris Avgeriou, Alexander Chatzigeorgiou, and Apostolos Am-
patzoglou. How do developers fix issues and pay back technical debt in the apache ecosystem?
In 2018 IEEE 25th International Conference on Software Analysis, Evolution and Reengineering
(SANER), pages 153–163. IEEE, 2018.
[85] Georgios Digkas, Mircea Lungu, Alexander Chatzigeorgiou, and Paris Avgeriou. The evolution of
technical debt in the apache ecosystem. In European Conference on Software Architecture, pages
51–66. Springer, 2017.
[86] John S Dodgson, Michael Spackman, Alan Pearman, and Lawrence D Phillips. Multi-criteria anal-
ysis: a manual. 2009.
[87] Paulo Sérgio Medeiros dos Santos, Amanda Varella, Cristine Ribeiro Dantas, and Daniel Beltrão
Borges. Visualizing and managing technical debt in agile development: An experience report. In
International Conference on Agile Software Development, pages 121–134. Springer, 2013.
[88] Jules Dupuit. On the measurement of the utility of public works. International Economic Papers,
2(1952):83–110, 1844.
[89] Juan J Durillo, YuanYuan Zhang, Enrique Alba, and Antonio J Nebro. A study of the multi-objective
next release problem. In Search Based Software Engineering, 2009 1st International Symposium
on, pages 49–58. IEEE, 2009.
[90] Agoston E Eiben, James E Smith, et al. Introduction to evolutionary computing, volume 53.
Springer, 2003.
[91] Hakan Erdogmus. Comparative evaluation of software development strategies based on net present
value. In International Workshop on Economics-Driven Software Engineering Research EDSER,
volume 1, 1999.
[92] Neil A Ernst. On the role of requirements in understanding and managing technical debt. In Pro-
ceedings of the Third International Workshop on Managing Technical Debt, pages 61–64. IEEE
Press, 2012.
[93] Larry J Eshelman. The chc adaptive search algorithm: How to have safe search when engaging
in nontraditional genetic recombination. In Foundations of genetic algorithms, volume 1, pages
265–283. Elsevier, 1991.
[94] Davide Falessi and Andreas Reichel. Towards an open-source tool for measuring and visualizing
the interest of technical debt. In Managing Technical Debt (MTD), 2015 IEEE 7th International
Workshop on, pages 1–8. IEEE, 2015.
[95] Carlos Fernández-Sánchez, Juan Garbajosa, Agustín Yagüe, and Jennifer Perez. Identification and
analysis of the elements required to manage technical debt by means of a systematic mapping study.
Journal of Systems and Software, 124:22–38, 2017.
[96] Leon Ed Festinger and Daniel Ed Katz. Research methods in the behavioral sciences. 1953.
[97] Peter C Fishburn. Letter to the editor additive utilities with incomplete product sets: application to
priorities and assignments. Operations Research, 15(3):537–542, 1967.
[98] Carlos M Fonseca and Peter J Fleming. An overview of evolutionary algorithms in multiobjective
optimization. Evolutionary computation, 3(1):1–16, 1995.
[99] Francesca Arcelli Fontana, Vincenzo Ferme, Marco Zanoni, and Riccardo Roveda. Towards a
prioritization of code debt: A code smell intensity index. In 2015 IEEE 7th International Workshop
on Managing Technical Debt (MTD), pages 16–24. IEEE, 2015.
[100] Francesca Arcelli Fontana, Ilaria Pigazzini, Riccardo Roveda, Damian Tamburri, Marco Zanoni, and
Elisabetta Di Nitto. Arcan: a tool for architectural smells detection. In 2017 IEEE International
Conference on Software Architecture Workshops (ICSAW), pages 282–285. IEEE, 2017.
[101] Martin Fowler. Refactoring: improving the design of existing code. Addison-Wesley Professional,
2018.
[102] Shane Frederick, George Loewenstein, and Ted O'Donoghue. Time discounting and time prefer-
ence: A critical review. Journal of economic literature, 40(2):351–401, 2002.
[103] Tobias Friedrich and Markus Wagner. Seeding the initial population of multi-objective evolutionary
algorithms: A computational study. Applied Soft Computing, 33:223–230, 2015.
[104] Israel Gat and John D Heintz. From assessment to reduction: how cutter consortium helps rein in
millions of dollars in technical debt. In Proceedings of the 2nd Workshop on Managing Technical
Debt, pages 24–26. ACM, 2011.
[105] John Gerring. What is a case study and what is it good for? American political science review,
98(2):341–354, 2004.
[106] Heather J Goldsby and Betty HC Cheng. Automatically generating behavioral models of adaptive
systems to address uncertainty. In International Conference on Model Driven Engineering Lan-
guages and Systems, pages 568–583. Springer, 2008.
[107] Des Greer and Guenther Ruhe. Software release planning: an evolutionary and iterative approach.
Information and software technology, 46(4):243–253, 2004.
[108] Everton Guimaraes, S Vidal, A Garcia, JA Diaz Pace, and Claudia Marcos. Exploring architecture
blueprints for prioritizing critical code anomalies: Experiences and tool support. Software: Practice
and Experience, 48(5):1077–1106, 2018.
[109] Yuepu Guo and Carolyn Seaman. A portfolio approach to technical debt management. In Proceed-
ings of the 2nd Workshop on Managing Technical Debt, pages 31–34. ACM, 2011.
[110] Yuepu Guo, Carolyn Seaman, Rebeka Gomes, Antonio Cavalcanti, Graziela Tonin, Fabio QB
Da Silva, Andre LM Santos, and Clauirton Siebra. Tracking technical debt an exploratory case
study. In 2011 27th IEEE international conference on software maintenance (ICSM), pages 528–
531. IEEE, 2011.
[111] Yuepu Guo, Rodrigo Oliveira Sp´ ınola, and Carolyn Seaman. Exploring the costs of technical debt
management–a case study. Empirical Software Engineering, 21(1):159–182, 2016.
[112] David Hadka. Moea framework-a free and open source java framework for multiobjective optimiza-
tion. Version 2.12. URL http://www.moeaframework.org, 2018.
[113] Mark Harman, S Afshin Mansouri, and Yuanyuan Zhang. Search based software engineering: A
comprehensive analysis and review of trends techniques and applications. Department of Computer
Science, King’s College London, Tech. Rep. TR-09-03, page 23, 2009.
[114] Mark Harman, Phil McMinn, Jerffeson Teixeira De Souza, and Shin Yoo. Search based software
engineering: Techniques, taxonomy, tutorial. In Empirical software engineering and verification,
pages 1–59. Springer, 2012.
[115] Muhammad Firdaus Harun and Horst Lichter. Towards a technical debt-management framework
based on cost-benefit analysis. In Proceedings of the 10th International Conference on Software
Engineering Advances, 2015.
[116] Jun He and Yuren Zhou. A comparison of gas using penalizing infeasible solutions and repairing
infeasible solutions on restrictive capacity knapsack problem. In Proceedings of the 9th annual
conference on Genetic and evolutionary computation, pages 1518–1518. ACM, 2007.
[117] Pei He, Lishan Kang, and Ming Fu. Formality based genetic programming. In 2008 IEEE Congress
on Evolutionary Computation (IEEE World Congress on Computational Intelligence), pages 4080–
4087. IEEE, 2008.
[118] Myles Hollander, Douglas A Wolfe, and Eric Chicken. Nonparametric statistical methods, volume
751. John Wiley & Sons, 2013.
[119] Qiao Huang, Emad Shihab, Xin Xia, David Lo, and Shanping Li. Identifying self-admitted technical
debt in open source projects using text mining. Empirical Software Engineering, pages 1–34, 2017.
[120] Elizabeth Hull, Ken Jackson, and Jeremy Dick. Requirements Engineering. Springer Science &
Business Media, 2011.
[121] Ray Hyman. Quasi-experimentation: Design and analysis issues for field settings (book). Journal
of Personality Assessment, 46(1):96–97, 1982.
[122] Clemente Izurieta, Antonio Vetr` o, Nico Zazworka, Yuanfang Cai, Carolyn Seaman, and Forrest
Shull. Organizing the technical debt landscape. In 2012 Third International Workshop on Managing
Technical Debt (MTD), pages 23–26. IEEE, 2012.
[123] Sherri L Jackson. Research methods and statistics: A critical thinking approach. Cengage Learn-
ing, 2015.
[124] Leslie Pack Kaelbling, Michael L Littman, and Andrew W Moore. Reinforcement learning: A
survey. Journal of artificial intelligence research, 4:237–285, 1996.
[125] Joachim Karlsson, Claes Wohlin, and Björn Regnell. An evaluation of methods for prioritizing
software requirements. Information and software technology, 39(14-15):939–947, 1998.
[126] Borhan Kazimipour, Xiaodong Li, and A Kai Qin. A review of population initialization techniques
for evolutionary algorithms. In Evolutionary Computation (CEC), 2014 IEEE Congress on, pages
2585–2592. IEEE, 2014.
[127] Staffs Keele et al. Guidelines for performing systematic literature reviews in software engineering.
Technical report, Ver. 2.3, EBSE Technical Report. EBSE, 2007.
[128] Hans Kellerer, Ulrich Pferschy, and D. Pisinger. Knapsack problems. Springer, 2011.
[129] Barbara A Kitchenham, Shari Lawrence Pfleeger, Lesley M Pickard, Peter W Jones, David C.
Hoaglin, Khaled El Emam, and Jarrett Rosenberg. Preliminary guidelines for empirical research
in software engineering. IEEE Transactions on software engineering, 28(8):721–734, 2002.
[130] Barbara Ann Kitchenham, David Budgen, and Pearl Brereton. Evidence-based software engineer-
ing and systematic reviews, volume 4. CRC press, 2015.
[131] Supannika Koolmanojwong and Jo Ann Lane. Enablers and inhibitors of expediting systems engi-
neering. Procedia Computer Science, 16:483–491, 2013.
[132] Makrina Viola Kosti, Apostolos Ampatzoglou, Alexander Chatzigeorgiou, Georgios Pallas, Ioannis
Stamelos, and Lefteris Angelis. Technical debt principal assessment through structural metrics.
In 2017 43rd Euromicro Conference on Software Engineering and Advanced Applications (SEAA),
pages 329–333. IEEE, 2017.
[133] Vijay Kotu and Bala Deshpande. Predictive analytics and data mining: concepts and practice with
rapidminer. Morgan Kaufmann, 2014.
[134] Philippe Kruchten, Robert L Nord, and Ipek Ozkaya. Technical debt: From metaphor to theory and
practice. IEEE software, 29(6):18–21, 2012.
[135] Edouard Kujawski. Multi-criteria decision analysis: Limitations, pitfalls, and practical difficulties.
In INCOSE International Symposium, volume 13, pages 1169–1176. Wiley Online Library, 2003.
[136] Jo Ann Lane, Supannika Koolmanojwong, and Barry Boehm. Affordable systems: Balancing the
capability, schedule, flexibility, and technical debt tradespace. In INCOSE International Sympo-
sium, volume 23, pages 1385–1399. Wiley Online Library, 2013.
[137] Paul J Lavrakas. Encyclopedia of survey research methods. Sage Publications, 2008.
[138] Eugene L Lawler. Fast approximation algorithms for knapsack problems. Mathematics of Opera-
tions Research, 4(4):339–356, 1979.
[139] Jean-Louis Letouzey. The sqale method for evaluating technical debt. In 2012 Third International
Workshop on Managing Technical Debt (MTD), pages 31–36. IEEE, 2012.
[140] Jean-Louis Letouzey and Michel Ilkiewicz. Managing technical debt with the sqale method. IEEE
software, 29(6):44–51, 2012.
[141] Zengyang Li, Paris Avgeriou, and Peng Liang. A systematic mapping study on technical debt and
its management. Journal of Systems and Software, 101:193–220, 2015.
[142] Zengyang Li, Peng Liang, and Paris Avgeriou. Architectural technical debt identification based
on architecture decisions and change scenarios. In 2015 12th Working IEEE/IFIP Conference on
Software Architecture, pages 65–74. IEEE, 2015.
[143] Erin Lim, Nitin Taksande, and Carolyn Seaman. A balancing act: what software practitioners have
to say about technical debt. IEEE software, 29(6):22–27, 2012.
[144] Mario Linares-Vásquez, Gabriele Bavota, Carlos Eduardo Bernal Cárdenas, Rocco Oliveto, Mas-
similiano Di Penta, and Denys Poshyvanyk. Optimizing energy consumption of guis in android
apps: a multi-objective approach. In Proceedings of the 2015 10th Joint Meeting on Foundations of
Software Engineering, pages 143–154. ACM, 2015.
[145] Zhongxin Liu, Qiao Huang, Xin Xia, Emad Shihab, David Lo, and Shanping Li. Satd detector: A
text-mining-based self-admitted technical debt detection tool. In Proceedings of the 40th Interna-
tional Conference on Software Engineering: Companion Proceeedings, pages 9–12. ACM, 2018.
[146] Panagiotis Louridas. Static code analysis. IEEE Software, 23(4):58–61, 2006.
[147] Rand Kwong Yew Low, Robert Faff, and Kjersti Aas. Enhancing mean–variance portfolio selection
by modeling distributional asymmetries. Journal of Economics and Business, 85:49–72, 2016.
[148] Rudi Lutz. Evolving good hierarchical decompositions of complex systems. Journal of systems
architecture, 47(7):613–634, 2001.
[149] Heikki Maaranen, Kaisa Miettinen, and Marko M Mäkelä. Quasi-random initial population for
genetic algorithms. Computers & Mathematics with Applications, 47(12):1885–1895, 2004.
[150] David JC MacKay and David JC Mac Kay. Information theory, inference and learning algorithms.
Cambridge university press, 2003.
[151] Everton Maldonado, Emad Shihab, and Nikolaos Tsantalis. Using natural language processing
to automatically detect self-admitted technical debt. IEEE Transactions on Software Engineering,
2017.
[152] Mika V Mäntylä and Casper Lassenius. What types of defects are really discovered in code reviews?
IEEE Transactions on Software Engineering, 35(3):430–448, 2009.
[153] Radu Marinescu. Detection strategies: Metrics-based rules for detecting design flaws. In 20th IEEE
International Conference on Software Maintenance, 2004. Proceedings., pages 350–359. IEEE,
2004.
[154] Harry Markowitz. Portfolio selection. The journal of finance, 7(1):77–91, 1952.
[155] Alfred Marshall. Principles of economics: unabridged eighth edition. Cosimo, Inc., 2009.
[156] Silvano Martello. Knapsack problems: algorithms and computer implementations. Wiley-
Interscience series in discrete mathematics and optimization, 1990.
[157] Silvano Martello, David Pisinger, and Paolo Toth. Dynamic programming and strong bounds for
the 0-1 knapsack problem. Management Science, 45(3):414–424, 1999.
[158] Antonio Martini and Jan Bosch. Towards prioritizing architecture technical debt: information needs
of architects and product owners. In 2015 41st Euromicro Conference on Software Engineering and
Advanced Applications, pages 422–429. IEEE, 2015.
[159] Antonio Martini, Francesca Arcelli Fontana, Andrea Biaggi, and Riccardo Roveda. Identifying and
prioritizing architectural debt through architectural smells: a case study in a large software company.
In European Conference on Software Architecture, pages 320–335. Springer, 2018.
[160] Antonio Martini, Erik Sikander, and Niel Madlani. A semi-automated framework for the identifica-
tion and estimation of architectural technical debt: A comparative case-study on the modularization
of a software component. Information and Software Technology, 93:264–279, 2018.
[161] Steve McConnell. Software estimation: demystifying the black art. Microsoft press, 2006.
[162] Steve McConnell. Managing technical debt. Construx Software Builders, Inc, pages 1–14, 2008.
[163] Thiago S Mendes, Felipe GS Gomes, David P Gonçalves, Manoel G Mendonça, Renato L Novais,
and Rodrigo O Spínola. Visminertd: a tool for automatic identification and interactive monitoring
of the evolution of technical debt items. Journal of the Brazilian Computer Society, 25(1):2, 2019.
[164] Solomon Mensah, Jacky Keung, Jeffery Svajlenko, Kwabena Ebo Bennin, and Qing Mi. On the
value of a prioritization scheme for resolving self-admitted technical debt. Journal of Systems and
Software, 135:37–54, 2018.
[165] Tim Menzies, Ekrem Kocaguneli, Burak Turhan, Leandro Minku, and Fayola Peters. Sharing data
and models in software engineering. Morgan Kaufmann, 2014.
[166] Zbigniew Michalewicz. A survey of constraint handling techniques in evolutionary computation
methods. Evolutionary programming, 4:135–155, 1995.
[167] Zbigniew Michalewicz. Evolution strategies and other methods. In Genetic Algorithms + Data
Structures = Evolution Programs, pages 159–177. Springer, 1996.
[168] Zbigniew Michalewicz and Jarosław Arabas. Genetic algorithms for the 0/1 knapsack problem. In
International Symposium on Methodologies for Intelligent Systems, pages 134–143. Springer, 1994.
[169] Kaisa Miettinen. Nonlinear multiobjective optimization, volume 12. Springer Science & Business
Media, 2012.
[170] Mattheu B Miles and A Michael Huberman. Qualitative data analysis: A sourcebook of new meth-
ods. In Qualitative data analysis: a sourcebook of new methods. Sage publications, 1984.
132
[171] Melanie Mitchell. An introduction to genetic algorithms. MIT press, 1998.
[172] Douglas C Montgomery. Design and analysis of experiments. John wiley & sons, 2017.
[173] John L Moore. Cost-benefit analysis: Issues in its use in regulation. Congressional Research
Service, Library of Congress Washington, DC, 1995.
[174] J David Morgenthaler, Misha Gridnev, Raluca Sauciuc, and Sanjay Bhansali. Searching for build
debt: Experiences managing technical debt at google. In 2012 Third International Workshop on
Managing Technical Debt (MTD), pages 1–6. IEEE, 2012.
[175] Johnathan Mun. Real options analysis: Tools and techniques for valuing strategic investments and
decisions, volume 137. John Wiley & Sons, 2002.
[176] Stewart C Myers. Determinants of corporate borrowing. Journal of financial economics, 5(2):147–
175, 1977.
[177] Robert L Nord, Ipek Ozkaya, Philippe Kruchten, and Marco Gonzalez-Rojas. In search of a metric
for managing architectural technical debt. In Software Architecture (WICSA) and European Con-
ference on Software Architecture (ECSA), 2012 Joint Working IEEE/IFIP Conference on, pages
91–100. IEEE, 2012.
[178] Senay Oguztimur. Why fuzzy analytic hierarchy process approach for transport problems? 2011.
[179] Paul Oman and Jack Hagemeister. Metrics for assessing a software system’s maintainability. In
Proceedings Conference on Software Maintenance 1992, pages 337–344. IEEE, 1992.
[180] David Orvosh and Lawrence Davis. Shall we repair? genetic algorithmscombinatorial optimizatio-
nand feasibility constraints. In Proceedings of the 5th International Conference on Genetic Algo-
rithms, page 650. Morgan Kaufmann Publishers Inc., 1993.
[181] Eneko Osaba, Roberto Carballedo, Fernando Diaz, Enrique Onieva, P Lopez, and Asier Perallos.
On the influence of using initialization functions on genetic algorithms solving combinatorial op-
timization problems: a first study on the tsp. In 2014 IEEE Conference on Evolving and Adaptive
Intelligent Systems (EAIS), pages 1–6. IEEE, 2014.
[182] Julie Pallant. SPSS survival manual. McGraw-Hill Education (UK), 2013.
133
[183] David Lorge Parnas. Software aging. In Proceedings of the 16th international conference on Soft-
ware engineering, pages 279–287. IEEE Computer Society Press, 1994.
[184] Kai Petersen, Robert Feldt, Shahid Mujtaba, and Michael Mattsson. Systematic mapping studies in
software engineering. In Ease, volume 8, pages 68–77, 2008.
[185] Reinhold Pl¨ osch, Johannes Br¨ auer, Matthias Saft, and Christian K¨ orner. Design debt prioritization:
a design best practice-based approach. In 2018 IEEE/ACM International Conference on Technical
Debt (TechDebt), pages 95–104. IEEE, 2018.
[186] Aniket Potdar and Emad Shihab. An exploratory study on self-admitted technical debt. In 2014
IEEE International Conference on Software Maintenance and Evolution, pages 91–100. IEEE,
2014.
[187] Ken Power. Understanding the impact of technical debt on the capacity and velocity of teams and
organizations: Viewing team and organization capacity as a portfolio of real options. In Managing
Technical Debt (MTD), 2013 4th International Workshop on, pages 28–31. IEEE, 2013.
[188] Muhammad Ramzan, M Arfan Jaffar, and Arshad Ali Shahid. Value based intelligent requirement
prioritization (virp): expert driven fuzzy logic based prioritization technique. International Journal
Of Innovative Computing, Information And Control, 7(3):1017–1038, 2011.
[189] Frank K Reilly and Keith C Brown. Investment analysis and portfolio management. Cengage
Learning, 2002.
[190] Leilane Ferreira Ribeiro, Nicolli Souza Rios Alves, Manoel Gomes de Mendonca Neto, and Ro-
drigo Oliveira Sp´ ınola. A strategy based on multiple decision criteria to support technical debt
management. In 2017 43rd Euromicro Conference on Software Engineering and Advanced Appli-
cations (SEAA), pages 334–341. IEEE, 2017.
[191] Leilane Ferreira Ribeiro, M´ ario Andr´ e de Freitas Farias, Manoel G Mendonc ¸a, and Rodrigo Oliveira
Sp´ ınola. Decision criteria for the payment of technical debt in software projects: A systematic
mapping study. In ICEIS (1), pages 572–579, 2016.
[192] Jon T Richardson, Mark R Palmer, Gunar E Liepins, and Mike R Hilliard. Some guidelines for
genetic algorithms with penalty functions. In Proceedings of the 3rd international conference on
genetic algorithms, pages 191–197. Morgan Kaufmann Publishers Inc., 1989.
134
[193] Kenneth S. Rubin. Essential Scrum: A Practical Guide to the Most Popular Agile Process.
Addison-Wesley Professional, 1st edition, 2012.
[194] Per Runeson and Martin H¨ ost. Guidelines for conducting and reporting case study research in
software engineering. Empirical software engineering, 14(2):131, 2009.
[195] Per Runeson, Martin Host, Austen Rainer, and Bjorn Regnell. Case study research in software
engineering: Guidelines and examples. John Wiley & Sons, 2012.
[196] Thomas L Saaty. How to make a decision: the analytic hierarchy process. European journal of
operational research, 48(1):9–26, 1990.
[197] Thomas L Saaty. Decision making the analytic hierarchy and network processes (ahp/anp). Journal
of systems science and systems engineering, 13(1):1–35, 2004.
[198] Thomas L Saaty. Decision making with the analytic hierarchy process. International journal of
services sciences, 1(1):83–98, 2008.
[199] Thomas L Saaty and Luis G Vargas. Models, methods, concepts & applications of the analytic
hierarchy process, volume 175. Springer Science & Business Media, 2012.
[200] Natthawute Sae-Lim, Shinpei Hayashi, and Motoshi Saeki. Context-based approach to prioritize
code smells for prefactoring. Journal of Software: Evolution and Process, 30(6):e1886, 2018.
[201] Natthawute Sae-Lim, Shinpei Hayashi, and Motoshi Saeki. An investigative study on how de-
velopers filter and prioritize code smells. IEICE TRANSACTIONS on Information and Systems,
101(7):1733–1742, 2018.
[202] Federica Sarro, Alessio Petrozziello, and Mark Harman. Multi-objective software effort estimation.
In 2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE), pages 619–
630. IEEE, 2016.
[203] Klaus Schmid. A formal approach to technical debt decision making. In Proceedings of the 9th
international ACM Sigsoft conference on Quality of software architectures, pages 153–162. ACM,
2013.
[204] Carolyn Seaman and Yuepu Guo. Measuring and monitoring technical debt. In Advances in Com-
puters, volume 82, pages 25–46. Elsevier, 2011.
135
[205] Carolyn Seaman, Yuepu Guo, Clemente Izurieta, Yuanfang Cai, Nico Zazworka, Forrest Shull, and
Antonio Vetr` o. Using technical debt data in decision making: Potential decision approaches. In
Proceedings of the Third International Workshop on Managing Technical Debt, pages 45–48. IEEE
Press, 2012.
[206] Carolyn Seaman, Robert L Nord, Philippe Kruchten, and Ipek Ozkaya. Technical debt: Beyond
definition to understanding report on the sixth international workshop on managing technical debt.
ACM SIGSOFT Software Engineering Notes, 40(2):32–34, 2015.
[207] Forrest Shull, Davide Falessi, Carolyn Seaman, Madeline Diep, and Lucas Layman. Technical debt:
Showing the way for better transfer of empirical results. In Perspectives on the Future of Software
Engineering, pages 179–190. Springer, 2013.
[208] Christopher L Simons and Ian C Parmee. Agent-based support for interactive search in concep-
tual software engineering design. In Proceedings of the 10th annual conference on Genetic and
evolutionary computation, pages 1785–1786, 2008.
[209] Sandra
ˇ
Sipeti´ c Grujiˇ ci´ c. Experimental StudiesExperimental studies, pages 421–424. Springer
Netherlands, Dordrecht, 2008.
[210] Steven S Skiena. The algorithm design manual: Text, volume 1. Springer Science & Business
Media, 1998.
[211] Will Snipes, Brian Robinson, Yuepu Guo, and Carolyn Seaman. Defining the decision factors for
managing defects: a technical debt perspective. In Proceedings of the Third International Workshop
on Managing Technical Debt, pages 54–60. IEEE Press, 2012.
[212] Rodrigo O Sp´ ınola, Nico Zazworka, Antonio Vetr` o, Carolyn Seaman, and Forrest Shull. Investigat-
ing technical debt folklore: Shedding some light on technical debt opinion. In Proceedings of the
4th International Workshop on Managing Technical Debt, pages 1–7. IEEE Press, 2013.
[213] Chris Sterling. Managing Software Debt: Building for Inevitable Change. Addison-Wesley Profes-
sional, 2010.
[214] Paul P Tallon, Robert J Kauffman, Henry C Lucas, Andrew B Whinston, and Kevin Zhu. Using real
options analysis for evaluating uncertain investments in information technology: Insights from the
icis 2001 debate. Communications of the Association for Information Systems, 9(1):9, 2002.
136
[215] Damian A Tamburri, Philippe Kruchten, Patricia Lago, and Hans van Vliet. What is social debt
in software engineering? In Cooperative and Human Aspects of Software Engineering (CHASE),
2013 6th International Workshop on, pages 93–96. IEEE, 2013.
[216] Edith Tom, Aybuke Aurum, and Richard Vidgen. A consolidated understanding of technical debt.
2012.
[217] Edith Tom, Ayb¨ uKe Aurum, and Richard Vidgen. An exploration of technical debt. Journal of
Systems and Software, 86(6):1498–1516, 2013.
[218] Paolo Tonella, Angelo Susi, and Francis Palma. Using interactive ga for requirements prioritization.
In 2nd International Symposium on Search Based Software Engineering, pages 57–66. IEEE, 2010.
[219] Adam Tornhill. Assessing technical debt in automated tests with codescene. In 2018 IEEE Inter-
national Conference on Software Testing, Verification and Validation Workshops (ICSTW), pages
122–125. IEEE, 2018.
[220] Adam Tornhill. Prioritize technical debt in large-scale systems using codescene. In Proceedings of
the 2018 International Conference on Technical Debt, pages 59–60. ACM, 2018.
[221] Jacopo Torriti and Eka Ikpe. Cost–benefit analysis. Encyclopedia of Law and Economics, pages
1–8, 2014.
[222] Evangelos Triantaphyllou. Multi-criteria decision making methods: A comparative study. In Multi-
criteria decision making methods: A comparative study, pages 5–21. Springer, 2000.
[223] Ricardo Viana Vargas and PMP IPMA-B. Using the analytic hierarchy process (ahp) to select and
prioritize projects in a portfolio. In PMI global congress, pages 1–22, 2010.
[224] Andr´ as Vargha and Harold D Delaney. A critique and improvement of the cl common language
effect size statistics of mcgraw and wong. Journal of Educational and Behavioral Statistics,
25(2):101–132, 2000.
[225] Mark Velasquez and Patrick T Hester. An analysis of multi-criteria decision making methods.
International Journal of Operations Research, 10(2):56–66, 2013.
137
[226] Roberto Verdecchia. Identifying architectural technical debt in android applications through auto-
mated compliance checking. In Proceedings of the 5th International Conference on Mobile Soft-
ware Engineering and Systems, MOBILESoft ’18, pages 35–36, New York, NY , USA, 2018. ACM.
[227] Santiago Vidal, Hernan Vazquez, J Andres Diaz-Pace, Claudia Marcos, Alessandro Garcia, and
Willian Oizumi. Jspirit: a flexible tool for the analysis of code smells. In 2015 34th International
Conference of the Chilean Computer Science Society (SCCC), pages 1–6. IEEE, 2015.
[228] Thomas Weise. Global optimization algorithms-theory and application. Self-published, 2, 2009.
[229] Kristian Wiklund, Sigrid Eldh, Daniel Sundmark, and Kristina Lundqvist. Technical debt in test
automation. In 2012 IEEE Fifth International Conference on Software Testing, Verification and
Validation, pages 887–892. IEEE, 2012.
[230] Claes Wohlin. Guidelines for snowballing in systematic literature studies and a replication in soft-
ware engineering. In Proceedings of the 18th international conference on evaluation and assess-
ment in software engineering, page 38. Citeseer, 2014.
[231] Claes Wohlin et al. Engineering and managing software requirements. Springer Science & Busi-
ness Media, 2005.
[232] Lu Xiao, Yuanfang Cai, Rick Kazman, Ran Mo, and Qiong Feng. Identifying and quantifying
architectural debt. In Proceedings of the 38th International Conference on Software Engineering,
pages 488–498. ACM, 2016.
[233] Jesse Yli-Huumo, Andrey Maglyas, and Kari Smolander. The benefits and consequences of
workarounds in software development projects. In International Conference of Software Business,
pages 1–16. Springer, 2015.
[234] Jesse Yli-Huumo, Andrey Maglyas, and Kari Smolander. How do software development teams
manage technical debt?–an empirical study. Journal of Systems and Software, 120:195–218, 2016.
[235] Shin Yoo and Mark Harman. Pareto efficient multi-objective test case selection. In Proceedings of
the 2007 international symposium on Software testing and analysis, pages 140–150. ACM, 2007.
[236] Shin Yoo and Mark Harman. Regression testing minimization, selection and prioritization: a survey.
Software Testing, Verification and Reliability, 22(2):67–120, 2012.
138
[237] Nico Zazworka, Carolyn Seaman, and Forrest Shull. Prioritizing design debt investment oppor-
tunities. In Proceedings of the 2nd Workshop on Managing Technical Debt, pages 39–42. ACM,
2011.
[238] Nico Zazworka, Michele A Shaw, Forrest Shull, and Carolyn Seaman. Investigating the impact of
design debt on software quality. In Proceedings of the 2nd Workshop on Managing Technical Debt,
pages 17–23. ACM, 2011.
[239] Nico Zazworka, Rodrigo O Sp´ ınola, Antonio Vetro, Forrest Shull, and Carolyn Seaman. A case
study on effectively identifying technical debt. In Proceedings of the 17th International Conference
on Evaluation and Assessment in Software Engineering, pages 42–47. ACM, 2013.
[240] He Zhang, Muhammad Ali Babar, and Paolo Tell. Identifying relevant studies in software engineer-
ing. Information and Software Technology, 53(6):625–637, 2011.
[241] Yuanyuan Zhang, Mark Harman, and S Afshin Mansouri. The multi-objective next release problem.
In Proceedings of the 9th annual conference on Genetic and evolutionary computation, pages 1129–
1137. ACM, 2007.
[242] Eckart Zitzler, Marco Laumanns, and Lothar Thiele. Spea2: Improving the strength pareto evolu-
tionary algorithm. TIK-report, 103, 2001.
139
Abstract
Technical debt (TD) is a metaphor used to account for the added software system effort or cost resulting from taking early software project shortcuts. This acquired debt accumulates interest and becomes more challenging to repay over time. When not managed, TD can cause significant long-term issues, such as high maintenance costs and eventual system failures. TD prioritization is the process of deciding which TD items are to be repaid first and which are to be delayed until later releases. With limited resources at their disposal, practitioners may struggle to decide which TD items should be repaid to achieve the highest possible value, as there is typically a trade-off between the value of a TD item and its cost. Although the software engineering community has developed several TD prioritization approaches, researchers have noted limitations in the existing approaches and have called for new, improved ones.

The focus of this dissertation is TD prioritization. A systematic literature review (SLR) was first conducted to identify and analyze existing TD prioritization approaches. The SLR revealed a scarcity of approaches that account for value, cost, and a resource constraint, as well as a lack of industry evaluations. An investigative study was then conducted with 89 software practitioners to better understand how practitioners prioritize TD in the presence of a resource constraint. The study revealed three distinct patterns: most participants balanced the trade-off between value and cost, a smaller number repaid higher-value TD items first, and a single participant prioritized lower-cost TD items.

To address the limitations identified in existing TD prioritization approaches, this dissertation models the TD prioritization problem as a multi-objective optimization (MOO) problem and develops a search-based TD prioritization approach capable of handling a resource constraint. The approach outputs which TD items should be repaid to maximize the value of a given repayment activity while minimizing its cost and satisfying its resource constraint. In its evaluation, the approach was compared against random search on 40 open-source software (OSS) systems and surpassed random search in the value of the best obtained solution in every case. The approach was also compared against the prioritized solutions of 66 software practitioners and obtained values similar to or greater than those obtained by the practitioners, while requiring an average running time of only 3 minutes to generate the prioritized solution set.
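For illustration only, the following minimal sketch shows one way the constrained value/cost trade-off described above can be expressed in code; it is not the dissertation's implementation. The TDItem fields, the example backlog, and the brute-force Pareto enumeration are assumptions chosen for brevity, whereas the dissertation relies on a search-based approach to handle realistically sized backlogs.

from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class TDItem:
    """A single technical debt item with estimated repayment value and cost (hypothetical)."""
    name: str
    value: float  # estimated benefit of repaying the item, illustrative units
    cost: float   # estimated effort of repaying the item, illustrative units


def pareto_front(items, budget):
    """Enumerate every repayment subset that fits the budget and keep the
    Pareto-optimal ones (maximize total value, minimize total cost)."""
    candidates = []
    for r in range(len(items) + 1):
        for subset in combinations(items, r):
            cost = sum(i.cost for i in subset)
            if cost <= budget:
                candidates.append((subset, sum(i.value for i in subset), cost))
    front = []
    for subset, value, cost in candidates:
        dominated = any(
            v2 >= value and c2 <= cost and (v2 > value or c2 < cost)
            for _, v2, c2 in candidates
        )
        if not dominated:
            front.append((subset, value, cost))
    return front


if __name__ == "__main__":
    # Hypothetical TD backlog; names, values, and costs are illustrative only.
    backlog = [
        TDItem("god-class refactoring", value=8.0, cost=5.0),
        TDItem("missing unit tests", value=5.0, cost=3.0),
        TDItem("outdated dependency", value=3.0, cost=1.0),
        TDItem("duplicated module", value=6.0, cost=4.0),
    ]
    for subset, value, cost in pareto_front(backlog, budget=7.0):
        names = ", ".join(i.name for i in subset) or "(repay nothing)"
        print(f"value={value:.1f} cost={cost:.1f}: {names}")

Exhaustive enumeration of subsets is exponential in the number of TD items, so a sketch like this is workable only for toy backlogs; a search-based technique of the kind developed in this dissertation trades exhaustiveness for scalability on real repayment decisions.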