Calibrating COCOMO® II for Functional Size Metrics
by
Anandi V. Hira
A Dissertation Presented to the
FACULTY OF THE GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(COMPUTER SCIENCE)
August 2020
Copyright 2020 Anandi V. Hira
Dedication
With love and appreciation, to God,
my Dad, Mom, and Sister
Acknowledgments
First and foremost, I thank God for His many blessings. I am grateful for my parents and
sister for supporting me through the PhD process. Special thanks to my advisor, Dr. Barry
Boehm, for accepting me as his PhD student, mentoring and guiding me. I thank my committee
(Dr. Shang-hua Teng and Dr. Behrokh Khoshnevis) for their feedback and support.
While I must keep the company that provided me with data for this research confidential,
I thank them for giving me the resources I needed to collect the data. Finally, I thank all
colleagues that have supported me.
Table of Contents
Dedication ....................................................................................................................................... ii
Acknowledgments .......................................................................................................................... iii
List of Tables ................................................................................................................................. vi
List of Figures ............................................................................................................................... vii
Abstract ........................................................................................................................................ viii
Chapter 1 Introduction ............................................................................................................. 1
1.1 Motivation ...................................................................................................................................... 1
1.2 Research Overview ......................................................................................................................... 6
1.3 Research Contributions ................................................................................................................. 10
1.3.1 Body of Knowledge ............................................................................................................................. 10
1.3.2 Practical Use in Industry ...................................................................................................................... 12
Chapter 2 Background ............................................................................................................ 13
2.1 Terms and Definitions .................................................................................................................. 13
2.2 COCOMO® II .............................................................................................................................. 14
2.3 IFPUG Function Points (FPs) ....................................................................................................... 16
2.4 COSMIC Function Points (CFPs) ................................................................................................ 18
Chapter 3 Related Work ......................................................................................................... 21
3.1 Parametric versus Expert Judgment Estimation ........................................................................... 21
3.2 Software Size Metrics ................................................................................................................... 22
3.3 Existing Parametric Models .......................................................................................................... 25
3.3.1 TruePlanning® by PRICE® Systems .................................................................................................. 25
3.3.2 SEER®-SEM by Galorath ................................................................................................................... 26
3.4 Functional Size Metrics Research ................................................................................................ 26
Chapter 4 Datasets .................................................................................................................. 36
4.1 Unified Code Count (UCC) .......................................................................................................... 37
4.1.1 Dataset Summary ................................................................................................................................. 37
4.1.2 Dataset Attributes ................................................................................................................................. 38
4.2 Industry New Development Projects ............................................................................................ 42
4.2.1 Dataset Summary ................................................................................................................................. 42
4.2.2 Dataset Attributes ................................................................................................................................. 43
Chapter 5 Research Methodology .......................................................................................... 47
5.1 Research Questions and Hypotheses ............................................................................................ 47
5.2 Calibration Technique .................................................................................................................. 49
5.2.1 Effort Normalization ............................................................................................................................ 52
5.3 Goodness of Fit and Prediction Accuracy Statistics ..................................................................... 53
5.3.1 Cross Validation ................................................................................................................................... 54
5.3.2 Residuals’ Properties ............................................................................................................................ 55
5.4 Threats to Validity and Limitations .............................................................................................. 55
5.4.1 Internal Validity Concerns ................................................................................................................... 55
5.4.2 External Validity Concerns .................................................................................................................. 57
Chapter 6 Calibrated COCOMO® II Model .......................................................................... 60
6.1 Calibration Step Results ............................................................................................................... 60
6.1.1 Step 1 Results ....................................................................................................................................... 60
6.1.2 Step 2 Results ....................................................................................................................................... 62
6.1.3 Final Model Residuals .......................................................................................................................... 63
6.2 Model Details ............................................................................................................................... 65
6.2.1 Original COCOMO® II Model ............................................................................................................ 65
6.2.2 IFPUG Function Points (FPs) .............................................................................................................. 66
6.2.3 COSMIC Function Points (CFPs) ........................................................................................................ 67
Chapter 7 Effort Estimation Effectiveness ............................................................................. 69
7.1 Calibrated COCOMO® II vs Alternative Methods ...................................................................... 69
7.2 Calibrated COCOMO® II for Software Types ............................................................................ 71
Chapter 8 Conclusions ........................................................................................................... 80
8.1 General Conclusions ..................................................................................................................... 80
8.2 IFPUG versus COSMIC Function Points ..................................................................................... 81
8.3 Summary of Contributions ........................................................................................................... 85
8.4 Future Work .................................................................................................................................. 86
References ..................................................................................................................................... 89
Appendices .................................................................................................................................... 98
Appendix A: Unified Code Count (UCC)’s Dataset ............................................................................... 98
Appendix B: Calculation Summary of FPs and CFPs for Unified Code Count (UCC)’s Dataset ........ 100
Appendix C: COCOMO® II Effort Factors Description ...................................................................... 109
Appendix D: Comparison of COCOMO® II Calibration Step 1 Options ............................................ 111
Appendix E: Linear and Nonlinear Equations, and Conversion Ratios used for Research Question 1 113
List of Tables
Table 1 IFPUG Function Points (FPs) Calculation Complexity Multiplier Factors .................................................... 16
Table 2 Complexity Levels Criteria for External Inputs (EI) ...................................................................................... 17
Table 3 Complexity Levels Criteria for External Outputs (EO) and External Inquiries (EQ) .................................... 17
Table 4 Complexity Levels Criteria for Internal Logical Files (ILF) and External Interface Files (EIF) ................... 17
Table 5 Summary of Empirical Research Studies with IFPUG Function Points (FPs) ............................................... 29
Table 6 Summary of Empirical Research Studies with COSMIC Function Points (CFPs) ......................................... 31
Table 7 Unified Code Count (UCC)'s Dataset Attributes ............................................................................................ 38
Table 8 Industry's Dataset Attributes ........................................................................................................................... 44
Table 9 Project Types Identified in Datasets ............................................................................................................... 48
Table 10 Calibration Step 1 Compare R² of Calibration Options ................................................................................ 61
Table 11 Standard Error, t statistic, and p-values for Calibration Step 1 Selected Model’s coefficients .................... 61
Table 12 Residual Properties’ Statistics for Calibration Step 1 Selected Model ......................................................... 61
Table 13 Standard Error, t statistic, and p-values for Calibration Step 2 coefficients ................................................. 62
Table 14 Residual Properties’ Statistics for Calibration Step 2 ................................................................................... 62
Table 15 Residual Properties’ Statistics for Calibrated COCOMO® II with FPs and CFPs ...................................... 63
Table 16 Residuals Properties Statistics for Magnitude of Relative Error (MRE) of Calibrated COCOMO® II
with FPs and CFPs ....................................................................................................................................................... 64
Table 17 Original COCOMO® II (for SLOC) Constants’ Values .............................................................................. 65
Table 18 Original COCOMO® II (for SLOC) Product Complexity (CPLX) parameter values ................................. 65
Table 19 Original COCOMO® II (for SLOC) Exponent range .................................................................................. 66
Table 20 Calibrated COCOMO® II for FPs Constants’ Values .................................................................................. 66
Table 21 Calibrated COCOMO® II for FPs Product Complexity (CPLX) parameter values ..................................... 67
Table 22 Calibrated COCOMO® II for FPs Exponent range ...................................................................................... 67
Table 23 Calibrated COCOMO® II for CFPs Constants’ Values ............................................................................... 67
Table 24 Calibrated COCOMO® II for CFPs Product Complexity (CPLX) parameter values .................................. 68
Table 25 Calibrated COCOMO® II for CFPs Exponent range ................................................................................... 68
Table 26 Prediction Accuracy of Methods suggested by research and Calibrated COCOMO® II for FPs ................ 70
Table 27 Prediction Accuracy of Methods suggested by research and Calibrated COCOMO® II for CFPs ............. 70
Table 28 Prediction Accuracy and Correlation Statistics for UCC's Low Parsing Projects ........................................ 72
Table 29 Prediction Accuracy and Correlation Statistics for UCC's High Parsing Projects ....................................... 73
Table 30 Prediction Accuracy and Correlation Statistics for Industry’s Data Transfer Projects ................................. 75
Table 31 Prediction Accuracy and Correlation Statistics for Industry’s Record, Encrypt, Decrypt Data Projects ..... 77
Table 32 Prediction Accuracy and Correlation Statistics for Input and Output Projects ............................................. 77
Table 33 Summary of Research Question 2 Results .................................................................................................... 79
Table 34 Calibrated COCOMO® II performance across FPs and CFPs for UCC's and Industry's Datasets
Individually .................................................................................................................................................................. 81
Table 35 Comparison of Correlation across Project Types ......................................................................................... 83
Table 36 Comparison of Prediction Accuracy across Project Types ........................................................................... 83
Table 37 Unified Code Count (UCC)'s Dataset ........................................................................................................... 98
Table 38 SLOC, FPs, and CFPs counts for functionality in UCC's Dataset .............................................................. 100
Table 39 COCOMO® II Effort Factors Definitions per Rating Level ...................................................................... 109
Table 40 Prediction Accuracy Comparison of Calibration Step 1 Options for FPs ................................................... 112
Table 41 Prediction Accuracy Comparison of Calibration Step 1 Options for CFPs ................................................ 112
Table 42 Linear and Nonlinear Models to Estimate Effort with FPs and CFPs ........................................................ 113
Table 43 Conversion Ratios used in Research Question 1 ........................................................................................ 114
List of Figures
Figure 1 Cone of Uncertainty constructed by Barry Boehm in (B. W. Boehm 1981) demonstrating the uncertainty
and error in estimating SLOC across different phases of the lifecycle .......................................................................... 2
Figure 2 Effort against FPs (left) and CFPs (right) for 2 teams developing similar products from the same
organization, demonstrating differences in growth trends ............................................................................................. 3
Figure 3 Visual demonstration of how size and effort factors work together to estimate development effort .............. 4
Figure 4 Methodology used to develop COCOMO® II effort estimation model from (B. Boehm et al. 2000) ......... 15
Figure 5 Software Model used by IFPUG to calculate Function Points (FPs) ............................................................ 16
Figure 6 Software Model used by COSMIC to calculate COSMIC Function Points (CFPs) ...................................... 19
Figure 7 Software Lifecycle Development High-level Phases, Sizing Aids available during the Phases, and
corresponding Size Metrics. Initial image from (Malik 2010), with additional information from (Fenton and
Bieman 2014; Whigham, Owen, and Macdonell 2015; Cohn 2004). .......................................................................... 24
Figure 8 Actual or Simulated Data with FPs (top) and CFPs (bottom) from empirical research papers. 3 outliers
hidden from images to better show variation among datasets. .................................................................................... 33
Figure 9 Calibrated COCOMO® II Residuals for FPs (left) and CFPs (right) ........................................................... 63
Figure 10 Calibrated COCOMO® II Magnitude of Relative Errors (MREs) for FPs (left) and CFPs (right) ............ 64
Figure 11 Normalized Effort vs FPs (Left) and CFPs (Right) for UCC’s Low Parsing Projects ................................ 72
Figure 12 Normalized Effort vs FPs (Left) and CFPs (Right) for UCC’s High Parsing Projects ............................... 74
Figure 13 Normalized Effort vs FPs (Left) and CFPs (Right) for Industry’s Data Transfer Projects ......................... 75
Figure 14 Normalized Effort vs FPs (Left) and CFPs (Right) for Industry’s Record, Encrypt, Decrypt Data
Projects ......................................................................................................................................................................... 76
Figure 15 Normalized Effort vs FPs (Left) and CFPs (Right) for Input and Output Projects ..................................... 78
Figure 16 FPs (top) and CFPs (bottom) mapped against SLOC for UCC’s Dataset ................................................. 115
Abstract
Accurate software development effort and cost estimates help teams and managers
perform trade-off analysis, manage resources, objectively track progress, and have sufficient
time to build high-quality products. Since the 1960s, source lines of code (SLOC) has been the
predominant size metric for software estimation models, because it is the most tangible metric of
the software development effort. Barry Boehm performed behavior analysis to identify factors
that would have positive or negative effects on the required software development effort. The
first version of the resulting estimation model, COCOMO (Constructive Cost Model)® (also
referred to as COCOMO® ‘81), and its successor, COCOMO® II, take SLOC and effort factor
ratings as inputs to estimate the required effort. However, SLOC is difficult to estimate until a
project is nearly complete.
Functional size metrics (FSMs), such as IFPUG Function Points (FPs) and COSMIC
Function Points (CFPs), represent software size using functional processes visible to a user –
inputs, outputs, and accessing storage. FSMs allow better communication between the
development team and users on the scope of the project, how it changes over time, and the cost
of building the software. To date, a generalizable effort estimation model with FSMs does not
exist. The most common method used to estimate effort with FSMs is to perform linear or
nonlinear regressions. The research papers proposing such regressions do not account for additional effort factors and their
effects on development effort. Another suggested method is to convert FPs to SLOC and use
existing cost models, provided as an option by COCOMO® II. However, the conversion between
FPs and SLOC assumes that both size metrics grow in a similar manner and thus, additional
estimation errors can be introduced in the estimation process. This dissertation provides a
generalizable effort estimation model by calibrating the COCOMO® II model to use either FPs
or CFPs directly as size parameters. The calibrated COCOMO® II model estimated within 25%
of the actuals 68% of the time for FPs and 70% of the time for CFPs. In comparison, the best of
the alternative solutions provided estimates within 25% of the actuals 36% of the time for FPs
and 38% of the time for CFPs.
FPs and CFPs have been found to work well in different scenarios: FPs are well-suited
for Management Information Systems (MIS) or data-driven applications, while CFPs are also
well-suited for embedded, real-time, and web applications. Even within application domains,
applications can differ from one another by the number and complexity of algorithms
implemented. No empirical study has attempted to characterize software attributes and how
FSMs behave differently with respect to them. Five types of software attributes were identified
in the datasets used for this dissertation based on the number and complexity of operations and
algorithms. The results show that the correlation between FPs/CFPs and effort depends on the
amount of complexity operations required with respect to the functional processes.
This dissertation addresses gaps in the existing research by calibrating COCOMO® II to
allow FSMs (functional size metrics) as size parameters and analyzes how the model behaves
differently for various types of projects. The gaps addressed in this research are: 1) a
generalizable estimation model that accounts for functional size and effort factors, and 2)
characterizing the types of projects for which FSMs provide better effort estimates. Software
development teams, managers, cost estimators, and organizations will be able to use the results
provided here to make informed decisions, as they will be able to generate estimates based on the
personnel, product, and environmental factors. Additionally, they will know how well the model
performs on the type of projects and tasks they are estimating.
Chapter 1 Introduction
1.1 Motivation
Software cost and effort estimation is a crucial step in software development. It is
necessary for managing resources, making decisions early in the software development lifecycle,
performing trade-off analysis, and tracking progress of the project. Though software cost
estimation has been a topic of research since at least 1965 (Nelson 1967), research has not yet
provided a generalizable cost estimation model based on functional size metrics (FSMs).
Most widely used cost models use source lines of code (SLOC) as the software size input
metric, due to its quantifiability and high correlation with effort (Albrecht and Gaffney 1983;
Kemerer 1987). SLOC cannot be accurately estimated until the project is nearly complete, as
seen in the Cone of Uncertainty (Figure 1) (B. W. Boehm 1981). While estimating the required
SLOC size of new projects is difficult, the task is more daunting when existing code is being
used or reused. An extensive amount of time is needed to understand and learn the source code,
and which “parts of the (source code) are relevant to (the) current task” (Ko et al. 2006; Singer et
al. 2010). Additionally, the source code requiring changes may be distributed among the
“system’s components and modules” making the learning process “both time-consuming and
difficult” (Ko et al. 2006; Eick et al. 2001; LaToza, Venolia, and DeLine 2006). Therefore,
estimating the size of the modules and code that need to be changed, and the amount of changes
that will be required in SLOC becomes very difficult early in a task’s or project’s lifecycle.
Two predominant functional size metrics (FSMs), IFPUG Function Points (FPs) and
COSMIC Function Points (CFPs), represent software based on its functional processes.
Functional processes consist of inputs from users, reading from or writing to memory, and
outputting results. Because project stakeholders can define such functional processes earlier in
the project’s lifecycle, FSMs are easier to estimate early in the lifecycle. The software’s functions
provide a means to measure the software project size, thus making it a problem-based metric. On
the other hand, SLOC represents the output of the development process (code), making it a
solution-based metric. Research on the benefits and effectiveness of FSMs for cost estimation
has been conducted for many years. Research papers reporting on the effectiveness of FSMs to
estimate effort/cost generally do not account for other effort drivers (personnel, product, and
environmental factors) or they include ad hoc parameters to improve estimates. The models and
ad hoc parameters proposed by the existing studies only fit the data used in the paper. IFPUG
originally defined General System Characteristics to adjust the functional size with complexity
attributes, which were removed in 2010 due to their ineffectiveness (Albrecht and Gaffney 1983;
(IFPUG) 2010). In summary, a generalizable cost estimation model with FSMs does not exist.
Figure 1 Cone of Uncertainty constructed by Barry Boehm in (B. W. Boehm 1981) demonstrating the
uncertainty and error in estimating SLOC across different phases of the lifecycle
Figure 2 Effort against FPs (left) and CFPs (right) for 2 teams developing similar products from the
same organization, demonstrating differences in growth trends
Organizations can locally calibrate custom effort estimation models with FSMs, and
potentially get accurate effort estimates. While the calibrated model may work well as long as
the software development team builds similar types of software, it will not work as accurately if
there are changes in the development team or in the type of software being built. As an example,
Figure 2 displays data collected from 2 teams in the same organization (details on dataset in
Chapter 4 Datasets). Though the 2 teams built similar types of software, the growth trends are
significantly different. Performing linear regression on both teams’ data would not lead to
accurate effort estimates (as demonstrated in Chapter 7 Effort Estimation Effectiveness).
A more generalizable effort estimation model allows an organization to make accurate
effort estimates in situations where locally calibrated models cannot. The required software
development effort can vary based on effort factors. A few examples include:
1. Experience levels: more experienced programmers will be able to produce more code in
less time compared to less experienced programmers.
2. Software complexity: more complex software will require more time and effort than less
complex software.
3. Team location: teams located within the same office can resolve questions more quickly
than teams across buildings and/or time zones.
Figure 3 visually demonstrates how size and effort drivers would work together to provide more
accurate and generalizable effort estimates.
Figure 3 Visual demonstration of how size and effort factors work together to estimate development effort
COCOMO (Constructive Cost Model)® II is a parametric cost estimation model that
requires size (KSLOC, which is equivalent to 1000 SLOC), personnel, product, and
environmental attributes as input parameters, and returns the estimated effort in Person-Months
(PM) as the output. Function Points (FPs) can be converted into SLOC using conversion ratios
published by Capers Jones (Jones 1996). This metric can then be used in COCOMO® II as a size
input to get estimates. The author of (Rollo 2006) found that the conversion led to inaccurate
estimates due to errors in the conversion. I derived a custom conversion ratio between FPs and SLOC through regression and concluded that the conversion still adds an extra layer of error, leading to inaccurate effort estimates (Hira and Boehm 2016). This dissertation provides a
generalizable effort estimation model by calibrating the COCOMO® II model to use either FPs
or CFPs directly as size parameters. This solution will not introduce conversion errors leading to
inaccurate effort estimates, while still utilizing COCOMO® II’s existing knowledge base (see
Chapter 7 Effort Estimation Effectiveness).
Empirical studies found that FPs are well-suited for Management Information Systems
(MIS) or data-driven applications, while CFPs are also well-suited for embedded, real-time, and
web applications (Xunmei, Guoxin, and Hong 2006). These are high-level descriptions and
groupings of applications. Applications from the same domain may differ significantly by the
number and complexity levels of algorithmic, mathematical, and other types of operations. The
Product Complexity (CPLX) parameter in COCOMO® II has the second-highest productivity
range (B. Boehm et al. 2000). This implies that complexity has one of the strongest effects on
required effort. Since complexity operations in software do not affect size in terms of FSMs, the
effectiveness of FSMs may differ if projects are characterized by complexity types. No empirical
study has attempted to characterize software attributes based on complexity and determine how
FSMs behave differently with respect to them. Five types of projects were identified in the
datasets used for this dissertation based on the number and complexity of operations and
algorithms. The results show that the correlation between FPs/CFPs and effort depends on the
amount of complexity operations required with respect to the functional processes that are
counted for size. Software development teams, cost estimators, managers, and organizations will
not only have a method to get generalizable effort estimates but evaluate how well the model
works on specific types of software.
1.2 Research Overview
Since COCOMO® II’s model is calibrated with respect to source lines of code (SLOC),
the model needs to be adjusted to accept functional size metrics (FSMs) as a size parameter. The
COCOMO® II model is described in detail in Section 2.2 COCOMO® II, but the general
mathematical model, or equation, is:
$\text{Effort} = A \times \text{Size}^{E} \times \prod_{i=1}^{17} EM_i \quad \text{where} \quad E = B + C \times \sum_{j=1}^{5} SF_j$
Equation 1 Generalized COCOMO® II Effort Estimation Model
A, B, and C are constants determined through the calibration process. EM stands for Effort
Multipliers and SF stands for Scale Factors, which are two categories of the model’s parameters.
Effort Multipliers have a multiplicative effect on effort, while Scale Factors have an exponential
effect. Since FSMs represent software size at a higher level of abstraction than SLOC, 3
constants/parameters need to be updated:
1. The constant representing the productivity rate (A)
2. The Product Complexity (CPLX) parameter (an EM (Effort Multiplier))
3. The constants in the exponent that affect the rate at which effort grows with respect to
size (B and C)
If every term except the constant A in the COCOMO® II model is moved to the left-hand side, the units of constant A are Person-Months per KSLOC (1 KSLOC is equivalent to 1000 SLOC). Hence, A represents the productivity rate: the amount of effort it takes to develop 1 KSLOC. Since
SLOC is a solution-based metric and FSMs are problem-based metrics, a productivity rate
defined in terms of SLOC cannot be used with FSMs. The constant A, which represents the
average productivity rate in the COCOMO® II model, needs to be redefined for FSMs.
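For illustration, rearranging the generalized model in Equation 1 to isolate A (this restates the model above rather than introducing anything new):

$A = \dfrac{\text{Effort}}{\text{Size}^{E} \times \prod_{i=1}^{17} EM_i}$

With Effort in Person-Months and Size in KSLOC, A therefore carries units of Person-Months per KSLOC.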
Most development tasks require using and modifying existing software. COCOMO® II
provides a reuse model to calculate the equivalent SLOC of a project that uses or modifies
existing code. In other words, COCOMO® II uses parameters, such as software understanding,
to make the SLOC of a reuse project equivalent to that of a new development project. Therefore,
the productivity rate for any kind of development task is represented by a single constant. Both
FPs and CFPs have methods to size projects that modify existing software, but they do not make
them equivalent to new development size. Therefore, not only does the productivity rate need to
be adjusted for FPs and CFPs, but productivity rates for new development and enhancement
projects need to be identified.
There are 3 types of enhancement tasks: adding new features or functionality, modifying
the existing features/functionality, and fixing defects. FSMs are not recommended for sizing and
estimating bug fixes ((IFPUG) 2010; (COSMIC) 2014), and hence, bug fixes are excluded from this research. Even though adding new features and modifying existing functionality both require understanding the existing software and the required changes, the productivity rates may well differ between the two types of tasks. While calibrating for the productivity rate constant A in the COCOMO® II model, I will compare: 1)
having a single constant for productivity; 2) having productivity rates for new development
versus enhancement projects; and 3) having productivity rates for new development, adding new
features, and modifying existing features.
When calibrating the original COCOMO® II parameters, subject matter experts provided the parameters' numerical effects on effort through Wideband Delphi sessions (B. Boehm et al. 2000). The final parameter values were determined by combining the values derived from data analysis with the subject matter experts' values through Bayesian analysis; therefore, the final COCOMO® II parameter values are calibrated with respect to SLOC. Hence, the effect of Product Complexity (CPLX) has
a relationship with the size metric being used. It is very possible that more complex software
requires more lines of code. For instance, an output of a report without requiring any algorithms
(i.e., just a plain read statement) will require fewer lines of code than an output that is a result of
an algorithm. Hence, more complex code may cause an increase in lines of code and
corresponding effort. Note that this assumption does not necessarily hold for scientific software or algorithms that require an extensive amount of research yet only a few lines of code. For FSMs, an output is given the same size whether or not it requires algorithms,
because size is based on the data involved in the functional process and not any required
algorithms. Therefore, Product Complexity (CPLX) may have a stronger effect on effort with
respect to FSMs since size does not correspondingly increase with complexity (as it did for
SLOC). The effect other product-based effort factors have on effort may also vary based on
FSMs. The data collected for this research does not have enough variation in the other product-
based effort factors to evaluate whether there is a difference in effect or provide a quantitative
value for the difference. Therefore, while this is outside of the scope of this research, it is a
planned future step to further enhance the calibrated COCOMO® II model for FSMs.
A few sources state that software development effort grows more quickly than size
(nonlinearly) due to diseconomies of scale (B. W. Boehm 1981; McConnell 2006). No study has evaluated whether diseconomies of scale apply to FSMs. While earlier studies evaluating
FPs occasionally use nonlinear equations, more recent studies (including studies evaluating
CFPs) mostly assume that the relationship between FPs/CFPs and effort is linear (see Chapter 3
Related Work). The COCOMO® II model identifies five scale factors that have an exponential
effect on effort, which could reduce the exponent to 0.91 or increase it to 1.23 (B. Boehm et al.
2000). However, the exponent is adjusted with 2 constants: one that is multiplied with the scale
factors’ parameter values (C), and one that is added to that product (B). Constant C determines how strongly the scale factors’ ratings influence the exponent, while constant B determines the minimum possible exponent value. To determine effort’s rate of growth with respect to FSMs, I calibrate for both
the B and C constants in the COCOMO® II model.
After adjusting the COCOMO® II effort estimation model to allow FSMs as size inputs, I
evaluate whether the updated model leads to more accurate effort estimates compared to methods
currently suggested through research (a brief regression sketch for the first two methods follows the list):
• Run linear regression
• Run nonlinear regression (log transformation)
• Convert FSM to SLOC using Capers Jones ratios and use COCOMO® II (FPs only)
• Locally calibrate FSM to SLOC ratio and use COCOMO® II
• Locally calibrate CFPs to FPs ratio and use FP linear and nonlinear models (CFPs only)
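As a rough sketch of the first two baselines above (linear regression and log-transformed, i.e. power-law, regression of effort on functional size), assuming a small, invented set of size/effort pairs in place of a real dataset:

```python
import numpy as np

# Hypothetical (size, effort) observations; the real datasets are described in Chapter 4.
size = np.array([25.0, 60.0, 110.0, 180.0])         # FPs or CFPs
effort = np.array([300.0, 800.0, 1500.0, 2600.0])   # person-hours

# Baseline 1: linear regression, Effort ≈ m * Size + b.
m, b = np.polyfit(size, effort, 1)

# Baseline 2: log-log regression, log(Effort) ≈ k * log(Size) + c,
# equivalently Effort ≈ exp(c) * Size**k.
k, c = np.polyfit(np.log(size), np.log(effort), 1)

print(f"linear:    Effort ≈ {m:.2f} * Size + {b:.2f}")
print(f"nonlinear: Effort ≈ {np.exp(c):.2f} * Size ** {k:.2f}")
```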
Since the causal effects of personnel, product, and environmental factors are accounted
for in COCOMO® II’s model, the effort estimation accuracy is expected to improve. Though the
Product Complexity (CPLX) parameter will be adjusted, the correlation between FSMs and
effort may vary within groups of projects with similar numbers and types of complexity
operations. FSMs do not account for complexity caused by algorithms and operations, since their
calculations are based on data moved through inputs, outputs, and transactions that access
storage. For example, an output that is produced without any computational operations is the
same size as an output produced as a result of computational operations. This may cause large
variation in effort with respect to FSMs even among projects with similar attributes based on
complexity operations. Five groups were identified in the datasets used for this dissertation based
on the number and complexity of operations and algorithms. The correlation and effort
estimation effectiveness will be compared across the 5 groups to determine whether or not FSMs
and a complexity parameter can sufficiently estimate effort across various types of projects.
1.3 Research Contributions
1.3.1 Body of Knowledge
In order to calibrate COCOMO® II and allow size to be represented with FSMs (such as
FPs and CFPs), a fundamental question must be addressed: the rate at which effort grows with
respect to FSMs. As will be reviewed in Chapter 3 Related Work, the existing body of
knowledge in software effort estimation has not fully addressed whether the relationship between
FSMs and effort is linear or not. While earlier studies perform log or natural log transformations
on data to account for the nonlinear relationship between size and effort, more recent papers
mostly assume linear relationships. Currently, the relationship between size and effort depends
on the dataset at hand. However, the true relationship between any 2 variables can only be
determined if other effects on the dependent variable (effort) have been normalized. In this
dissertation, while accounting for effort factors through the parameters defined in COCOMO®
II, I adjust the model’s constants in the exponent to determine the true rate at which effort grows
with respect to FSMs.
Another shortcoming in the existing research is the limited amount of data collected and
information provided. Most current studies evaluate whether FSMs can estimate effort within an
organization developing similar types of applications. It is very possible that the development
team remains mostly constant throughout the projects represented in the datasets. The effort
estimation models produced in these research studies cannot be used in other environments.
While there is a push for generalizable results and solutions in software engineering research,
high levels of prediction accuracy cannot be achieved between size and effort if other causes are
not accounted for in effort estimation. One can have a general solution or high accuracy, but not both, unless the effort estimation model accounts for the most relevant causes of software development effort. The developers of the COCOMO® II software development effort
estimation model collaborated with subject matter experts (cost estimators in industry) to get the
most prevalent effort drivers and their numerical effects on effort. To date, this existing
knowledge base has not been utilized to develop an effort estimation model using FSMs. This
research will adjust the COCOMO® II model to allow size represented by FSMs and
demonstrate that the adjusted model can provide sufficiently accurate predictions across 2
different teams and even organizations. Therefore, the main contribution of this research is a
generalizable and effective effort estimation model using FSMs (functional size metrics).
1.3.2 Practical Use in Industry
Calibrating COCOMO® II to allow size to be represented with FSMs (such as FPs and
CFPs) will provide cost estimators and managers with a generalizable, free, and openly available
software effort estimation model that accounts for personnel, product, and environmental factors
without needing to estimate SLOC. The ways this specifically benefits estimators and managers
are:
• It is commonly known and understood in the software cost estimation community that
locally calibrated cost models will outperform models built to provide effort estimates
across multiple environments and application types. Since the models provided in this
dissertation are free and openly available, organizations can calibrate it to perform more
accurately in their environment.
• Effort models based on the assumption that size is the only determining factor for effort
will not only provide poor statistical performance but also cannot work in new situations, for example, when the personnel changes; this is demonstrated in this dissertation, as the datasets consist of projects developed by different teams within the same environments.
• Since FSMs are easier to calculate early in the software development lifecycle, compared
to SLOC, the above-mentioned contribution leads to estimators and managers being able
to make more accurate and informed decisions, perform trade-off analyses, track
progress, and manage resources earlier in the lifecycle.
• When management is able to accurately predict effort, they will provide software
development teams sufficient time to test the software more completely, leading to better-
written and tested code (in other words, higher quality code).
Chapter 2 Background
2.1 Terms and Definitions
Software Size Metric
A software size metric or measurement attempts to describe or represent the size of a
project, without considering how much time or effort is required to develop it. Though size is a
relevant input for effort estimation, effort is also affected by personnel, product, and
environmental factors.
Functional Size Metric (FSM)
A functional size metric or measurement (FSM) attempts to represent the size of a project
based on its functional processes and attributes, without considering how much time or effort is
required to develop it. IFPUG Function Points (FPs) and COSMIC Function Points (CFPs) are
the 2 most prominent and wide-spread functional size metrics. Though many other functional
size metrics exist, they are either variants of FPs or used in very specific environments.
Productivity
In software development, productivity is calculated as effort/size. Therefore, productivity
represents the amount of effort required to build a size unit. The productivity rate depends on the
units used for both effort and size.
Product Complexity
Complexity in software can increase due to algorithms or operations needed to perform
specific functions. The developers of the COCOMO® II model identified 5 types of operations
that increase software complexity: control, computational, device-dependent, data management,
and user interface operations (B. Boehm et al. 2000).
Diseconomies of Scale
Diseconomies of scale refer to the disproportionate increase in effort required as the amount of work increases. In other words, while developing a 100 SLOC project may take 3 months, a 200 SLOC project may require more than 6 months. Therefore, additional effort is required per unit of work as the total size grows.
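For a hypothetical illustration in exponential form (the exponent value below is chosen only for demonstration; COCOMO® II's calibrated range is given in Section 2.2): with a scale exponent of $E = 1.10$, doubling the size multiplies the required effort by $2^{1.10} \approx 2.14$ rather than 2.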
2.2 COCOMO® II
COCOMO (Constructive Cost Model)® II is a parametric software development effort
estimation model that requires size, personnel, product, and environmental attributes as input and
returns the estimated effort in Person-Months (PM) as the output. This cost model is calibrated
with 16 organizations' data and projects of size 2,000 - 1,292,000 SLOC (source lines of code)
(Yang et al. 2011). The form of the COCOMO® II model is (B. Boehm et al. 2000):
$\text{Effort} = 2.94 \times \text{Size}^{E} \times \prod_{i=1}^{17} EM_i \quad \text{where} \quad E = 0.91 + 0.01 \times \sum_{j=1}^{5} SF_j$
Equation 2 COCOMO® II Effort Estimation Model for SLOC
Size is in terms of KSLOC, or 1000 SLOC. EM stands for Effort Multipliers and SF stands for
Scale Factors, which are two categories of the model’s parameters. Effort Multipliers have a
multiplicative effect on effort, while Scale Factors have an exponential effect. The ratings of the
5 scale factors can set the exponent between 0.91 and 1.23 (B. Boehm et al. 2000).
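As a minimal illustration of Equation 2, the sketch below computes the estimated effort in Python. The function name and the example ratings are hypothetical (all effort multipliers set to nominal, scale-factor values chosen only for demonstration); they are not values taken from this dissertation's datasets.

```python
from math import prod

def cocomo_ii_effort(ksloc, effort_multipliers, scale_factors, a=2.94, b=0.91, c=0.01):
    """Estimate effort in Person-Months (PM) using the COCOMO® II form of Equation 2."""
    exponent = b + c * sum(scale_factors)             # E = B + C * sum(SF_j)
    return a * (ksloc ** exponent) * prod(effort_multipliers)

# Hypothetical example: a 50 KSLOC project, all 17 effort multipliers nominal (1.0),
# and illustrative ratings for the 5 scale factors.
effort_pm = cocomo_ii_effort(ksloc=50,
                             effort_multipliers=[1.0] * 17,
                             scale_factors=[3.72, 3.04, 4.24, 3.29, 4.68])
print(round(effort_pm, 1))   # estimated Person-Months
```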
The 8-step methodology used to develop COCOMO® II is summarized in Figure 4
(image originally from (B. Boehm et al. 2000)). Subject matter experts (cost estimators in
industry) used behavioral analyses to identify the set of the most relevant effort drivers that
should be represented in COCOMO® II. The subject matter experts then continued to work with
the model developers to define how to rate the effort drivers and their numerical effects using
Wideband Delphi (B. Boehm et al. 2000). Delphi is a group-consensus technique that originated
at Rand in 1948 (Helmer, Brown, and Gordon 1966). The method is slightly modified to allow
discussion between votes to reduce the variance among the numeric values given to the effort
drivers (B. Boehm et al. 2000).
In Step 5, subject matter experts were asked to estimate the numerical effects effort
drivers have on effort while assuming all other effort drivers are nominal. The effort multipliers’
(EMs’) effects also should not depend on the size of the project. The above steps result in the a
priori model, which is enhanced with actual data using Bayesian analysis. In other words, the
experts’ opinions of the numerical effects the drivers have on effort are balanced with the
numerical effects inferred from regression. The EMs’ values form a geometric sequence, while
the Scale Factors (SFs) values follow an arithmetic/linear sequence.
Figure 4 Methodology used to develop COCOMO® II effort estimation model from (B. Boehm et al.
2000)
2.3 IFPUG Function Points (FPs)
Each project in the datasets used in this research has been sized using version 4.3.1 of
the International Function Point User Group (IFPUG)’s Function Points (FPs) method, which is
described briefly here. FPs describes software by the number and complexity of transactions and
file types. Figure 5 displays the software model used to calculate FPs, and Table 1 shows a
simplistic representation of how to calculate FPs ((IFPUG) 2010).
Figure 5 Software Model used by IFPUG to calculate Function Points (FPs)
Table 1 IFPUG Function Points (FPs) Calculation Complexity Multiplier Factors

| Type of Component | Low | Average | High | Total |
|---|---|---|---|---|
| External Inputs (EI) | 3 | 4 | 6 | |
| External Outputs (EO) | 4 | 5 | 7 | |
| External Inquiries (EQ) | 3 | 4 | 6 | |
| Internal Logical Files (ILF) | 7 | 10 | 15 | |
| External Interface Files (EIF) | 5 | 7 | 10 | |
| Total Number of Function Points (FPs) | | | | |
Table 2, Table 3, and Table 4 display how the complexity levels are determined for the
component types ((IFPUG) 2010):
Table 2 Complexity Levels Criteria for External Inputs (EI)

| File Types Referenced (FTRs) | 1 – 4 DETs | 5 – 15 DETs | > 15 DETs |
|---|---|---|---|
| 0 – 1 | Low | Low | Average |
| 2 | Low | Average | High |
| > 2 | Average | High | High |

(DETs = Data Element Types)
Table 3 Complexity Levels Criteria for External Outputs (EO) and External Inquiries (EQ)

| File Types Referenced (FTRs) | 1 – 5 DETs | 6 – 19 DETs | > 19 DETs |
|---|---|---|---|
| 0 – 1 | Low | Low | Average |
| 2 – 3 | Low | Average | High |
| > 3 | Average | High | High |
Table 4 Complexity Levels Criteria for Internal Logical Files (ILF) and External Interface Files (EIF)

| Record Element Types (RETs) | 1 – 19 DETs | 20 – 50 DETs | > 50 DETs |
|---|---|---|---|
| 1 | Low | Low | Average |
| 2 – 5 | Low | Average | High |
| > 5 | Average | High | High |
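To make Tables 1–4 concrete, the following sketch classifies an External Input's complexity from its DET/FTR counts (per Table 2) and sums weighted component counts (per Table 1). The function and variable names, and the example component list, are hypothetical illustrations rather than part of the IFPUG specification.

```python
# Table 1 weights: component type -> complexity level -> Function Points contributed.
WEIGHTS = {
    "EI":  {"Low": 3, "Average": 4, "High": 6},
    "EO":  {"Low": 4, "Average": 5, "High": 7},
    "EQ":  {"Low": 3, "Average": 4, "High": 6},
    "ILF": {"Low": 7, "Average": 10, "High": 15},
    "EIF": {"Low": 5, "Average": 7, "High": 10},
}

def ei_complexity(dets, ftrs):
    """Classify an External Input (EI) per Table 2, from DET and FTR counts."""
    if ftrs <= 1:
        return "Low" if dets <= 15 else "Average"
    if ftrs == 2:
        return "Low" if dets <= 4 else ("Average" if dets <= 15 else "High")
    return "Average" if dets <= 4 else "High"

def total_fp(components):
    """Sum FPs over (component type, complexity level) pairs, e.g. ("EO", "Average")."""
    return sum(WEIGHTS[ctype][level] for ctype, level in components)

# Hypothetical example: one EI with 6 DETs referencing 2 FTRs, one Average EO, one Low ILF.
ei_level = ei_complexity(dets=6, ftrs=2)                                # -> "Average"
print(total_fp([("EI", ei_level), ("EO", "Average"), ("ILF", "Low")]))  # 4 + 5 + 7 = 16
```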
The Development Project Function Point Count (DFP) sizes new development projects,
calculated as ((IFPUG) 2010):
DFP = ADD + CFP
Equation 3 Development Project Function Point Count (DFP)
ADD and CFP are sizes of the added and data conversion functions, respectively ((IFPUG)
2010).
Maintenance tasks can be sized using the Enhancement Project Function Point Count
(EFP), calculated as ((IFPUG) 2010):
EFP = ADD + CHGA + CFP + DEL
Equation 4 Enhancement Project Function Point Count (EFP)
ADD, CHGA, CFP, and DEL are the sizes for new functions, changed functions after the
modifications, data conversion functions, and deleted functions, respectively ((IFPUG) 2010).
2.4 COSMIC Function Points (CFPs)
The Common Software Measurement International Consortium (COSMIC) group
reviewed the existing functional size methods, such as IFPUG Function Points (FPs), to develop
a size metric based on “the basic principles” that applies to a wider range of application domains
(Abran, Oligny, and Symons 2000). Described here is version 4.0 of COSMIC’s Function Points
(CFPs) method, which is used to size the projects in this research. Calculating a software’s size
with CFPs requires 3 phases ((COSMIC) 2014):
1. The Measurement Strategy Phase: The measurers define the purpose, scope, and level of
granularity of the measurement and the functional users of the software being measured
((COSMIC) 2014).
2. The Mapping Phase: The functional user requirements (FURs) should be identified and
follow these guidelines: each FUR should map to a unique functional process, each
functional process has sub-processes, and each sub-process is either a data movement or
data manipulation. CFPs do not size data manipulation, and the 4 defined types of data
movements that are counted for CFP size estimates are Entry (E), Exit (X), Read (R), and
Write (W) ((COSMIC) 2014).
3. The Measurement Phase: Each data movement receives a count equal to the number of
data groups being moved by the transaction, where a data group is a “distinct, non-empty
and non-ordered set of data attributes” (Di Martino et al. 2016). The size of the software
is the sum of all data movement counts across all the functional processes ((COSMIC)
2014).
Figure 6 is a graphical representation of the functional sub-processes defined for CFPs
((COSMIC) 2014).
Figure 6 Software Model used by COSMIC to calculate COSMIC Function Points (CFPs)
Therefore, the total size of a functional process is the equivalent of all individual data
movements ((COSMIC) 2014):
$\text{Size}(\text{functional process}_i) = \sum \text{size}(\text{Entries}_i) + \sum \text{size}(\text{Exits}_i) + \sum \text{size}(\text{Reads}_i) + \sum \text{size}(\text{Writes}_i)$
Equation 5 COSMIC calculation of the total size of a functional process
Changes to a functional process are sized in the following manner ((COSMIC) 2014):
$\text{Size}(\text{Change}(\text{functional process}_i)) = \sum \text{size}(\text{added data movements}_i) + \sum \text{size}(\text{modified data movements}_i) + \sum \text{size}(\text{deleted data movements}_i)$
Equation 6 COSMIC calculation of the changes to a functional process
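As a minimal sketch of Equations 5 and 6 (the functional processes and counts below are invented for illustration, not drawn from this research's datasets), a CFP total is simply the sum of Entry, Exit, Read, and Write data-movement counts over all functional processes:

```python
def cfp_size(functional_processes):
    """Equation 5: sum Entry/Exit/Read/Write data-movement counts over all processes."""
    return sum(fp["E"] + fp["X"] + fp["R"] + fp["W"] for fp in functional_processes)

def cfp_change_size(added, modified, deleted):
    """Equation 6: size of a change = added + modified + deleted data movements."""
    return added + modified + deleted

# Hypothetical example: two functional processes of a small system.
processes = [
    {"E": 1, "X": 1, "R": 2, "W": 1},   # e.g., "record a transaction"
    {"E": 1, "X": 2, "R": 1, "W": 0},   # e.g., "produce a report"
]
print(cfp_size(processes))                               # 5 + 4 = 9 CFP
print(cfp_change_size(added=2, modified=1, deleted=1))   # 4 CFP
```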
Chapter 3 Related Work
3.1 Parametric versus Expert Judgment Estimation
Project managers or development teams make estimates based on their judgment or a
subjective comparison to previous projects and software features if they do not use a formal
effort estimation method. Through several experiments, Jørgensen found that expert-based and
judgment-based estimates are subject to bias with respect to the order in which projects are
estimated (Jørgensen specifically used Story Points in his work). Jørgensen found that estimators
tend to estimate larger tasks better after estimating smaller tasks. After estimating larger tasks,
estimators are more likely to give optimistic (less effort) estimates for smaller tasks.
Additionally, estimators may have the tendency to think that of 2 similarly sized projects, the
second one (despite the order in which the projects are evaluated) will seem larger than the first
one (Jørgensen 2004).
The authors of (Nolan et al. 2017) performed an experiment to gain insight into how well people can guess and to compare that accuracy with their confidence in their answers. Of the 3,760 responses
received, about 65% of the population’s estimation accuracy and confidence matched (in other
words, they knew whether their estimates were accurate or not). Another 28% of the responders
were better estimators than they realized, 6% were overly confident and 1% were very confident
and had very wrong responses. Without a systematic, reliable method to estimate effort,
estimators can fall in any one of these 4 categories for any given estimate. Unfortunately, one
cannot rely on the estimators’ confidence, as 7% are likely to be over-confident and 28% under-
confident (Nolan et al. 2017).
Having a formal way to measure the project and estimate the associated effort will
provide more reliable and replicable estimates. To ensure that effort estimates are also accurate,
one must use a size metric that correctly describes and measures the size of the software being
built, use a cost model that accounts for various effort drivers, and potentially calibrate the model
further. Therefore, the next section will present an overview of the size metrics that exist.
3.2 Software Size Metrics
Source lines of code (SLOC) has been used for effort estimation because it can easily be
collected for completed projects, has high correlation with effort, and represents the output of
software development. However, they are difficult to objectively estimate early in the software
development life cycle. Figure 1 in Chapter 1 Introduction (on page 2) shows the amount of
variation in the SLOC estimates by experienced cost estimators. Inexperienced estimators may
have larger variations in their estimates. Since the 1970s, many software size metrics have been
developed to use various aspects of software or their requirements to size software more
objectively early in the software development life cycle. Figure 7 lists these metrics and maps
them to the software development life cycle phases, level of requirements (Cockburn 2000) and
deliverables defined during the phases.
Two effort estimation books, Software Estimation Best Practices, Tools & Techniques: A
Complete Guide for Software Project Estimators and Software Project Effort Estimation:
Foundations and Best Practice Guidelines for Success (Chemuturi 2009; Trendowicz and Jeffery
2014), suggest using functional size metrics (FSMs). FSMs use information that can be estimated
before and further clarified while constructing the software’s architecture. FSMs size the types of
transactions software systems are typically required to perform (inputs and outputs to and from
interfaces and storage).
Object-Oriented applications generally have internal data transfers between objects, and
these types of transactions are not accounted for in FSMs. While some web applications rely on
data transfers between a user and storage, many websites primarily share static information or
dynamic data with viewers. Researchers developed many object- and GUI-based metrics in order
to size these types of projects: Objects, Predictive Object Points, Components, Object Points,
Screens, Reports, and Files.
Use Case Points (UCPs) can be estimated with higher level requirements (as shown in
Figure 7), but many practitioners criticize the method because the complexity weights were not
constructed or validated by data. Additionally, use cases are very tedious to construct, and
therefore in many environments, have been replaced by Story Points. Many Agile development
teams use Story Points to “size” features and determine which and how many features they will
implement within a sprint. However, Story Points are not intended to be used to estimate effort.
Compared to the alternative options, FSMs are an objective way to calculate a project’s
size early in the lifecycle and potentially lead to acceptable effort estimates. IFPUG Function
Points (FPs) and COSMIC Function Points (CFPs) are the 2 most relevant functional size
metrics. While FPs is the first functional size metric developed and, therefore, has gained a lot of
popularity, CFPs is a second-generation functional size metric that uses the foundation of FPs
while being more applicable to modern types of software and improving some reported
drawbacks of FPs. The next section describes the existing parametric models that allow FSMs as
inputs to estimate effort/cost followed by another section describing the existing research that
use FSMs for effort estimation.
Figure 7 Software Lifecycle Development High-level Phases, Sizing Aids available during the Phases, and corresponding Size Metrics.
Initial image from (Malik 2010), with additional information from (Fenton and Bieman 2014; Whigham, Owen, and Macdonell 2015;
Cohn 2004).
3.3 Existing Parametric Models
Though parametric effort/cost estimation models that accept FSMs as size parameters
exist, both are proprietary. Therefore, the details of the models' parameter values, and
whether those values depend on the type of size input, are not publicly available.
3.3.1 TruePlanning® by PRICE® Systems
The TruePlanning® software model is PRICE® Systems’ software estimation model that
takes activities, resources, programming languages, size, and cost drivers as inputs. Data has
been gathered across various domains: business systems, military, avionics, flight and space
software, and commercial software. The model estimates effort using the following formula:
Effort = Size × Baseline Productivity × Productivity Adjustments
Equation 7 PRICE® Systems' TruePlanning® Effort Estimation Model
Baseline productivity varies by the activity and size metric used; it is calculated from
existing data and/or research results. Productivity adjustments
are the numerical effects of cost drivers on productivity. Size can be in terms of Source Lines of
Code (SLOC), Function Points (FPs), Predictive Object Points (POPs), or Use Case Conversion
Points (UCCP).
The system then asks for details on the functional complexity of the software being built.
The estimator must explain the percentage of the code that consists of statistical/mathematical
operations, string manipulation, data storage and retrieval operations, etc. Additionally, the user
needs to provide details for the operating specification, project constraints, internal integration
complexity, development team complexity, organizational productivity, and other cost drivers to
further adjust the productivity rate and the resulting effort estimate (PRICE Systems L.L.C.
2011).
3.3.2 SEER®-SEM by Galorath
Another proprietary software cost estimation model, SEER®-SEM from Galorath, also
allows source lines of code (SLOC) and multiple functional input size metrics: IFPUG Function
Points, COSMIC Function Points, Mark II Function Points, Fast Function Points, Function
Based Sizing, and Unadjusted Function Points. The model “equates” the functions to “effort
units” and uses cost drivers (such as platform application and complexity) to get the effort
estimate (Galorath Incorporated, n.d.):
Effort = Lx × (AdjFactor × FSM)^(Entropy / 1.2)
Equation 8 Galorath’s SEER®-SEM Effort Estimation Model
Lx is the effort units, which are based on the analysis of actual project data; AdjFactor is the
product of the complexity, count phase, platform and application adjustments; Entropy ranges
from 1.04 to 1.2 depending on the type of software being estimated.
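To make the shape of Equation 8 concrete, the brief sketch below evaluates it for one
hypothetical project; the values used for Lx, AdjFactor, and Entropy are invented placeholders,
since the model's actual calibration values are proprietary.

# Illustrative evaluation of the SEER-SEM-style effort equation above.
# All numeric values are hypothetical placeholders, not Galorath's calibration data.

def seer_sem_effort(fsm_size, lx, adj_factor, entropy):
    """Effort = Lx * (AdjFactor * FSM) ** (Entropy / 1.2)."""
    return lx * (adj_factor * fsm_size) ** (entropy / 1.2)

# Example: 120 function points with hypothetical effort units and adjustments.
print(seer_sem_effort(fsm_size=120, lx=3.5, adj_factor=0.9, entropy=1.12))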
Along with software cost estimates, SEER®-SEM also provides risk information as
output. SEER-SEM’s data repository consists of thousands of data points that come from
Department of Defense (DoD) projects and commercial software products (Galorath
Incorporated, n.d.).
3.4 Functional Size Metrics Research
IFPUG Function Points (FPs) represents software size by functions or modifications to
functions, and therefore, may be used as early in the lifecycle as the required functionality is
known. Hence, it becomes “highly desirable to (have) a relationship between functional size and
effort to correctly estimate the effort” (Gencel and Demirors 2008). Though Albrecht and
Kemerer found FPs to be reliable in estimating effort (Albrecht and Gaffney 1983; Kemerer
1987), several studies found that FPs do not represent complexity sufficiently (Fenton and
Bieman 2014; Ferens 1988; Symons 1988) and “can result in a disproportional measurement of
size” (Hastings and Sajeev 2001). One study found that the correlation between FPs and effort
was weak and led to a curve with R
2
=0.43 (Hastings and Sajeev 2001). Another study found that
the linear regression between FPs and effort led to R
2
=0.65 and stated that “even in concert with
other predictors, (FPs) seem only helpful in software development cost estimation” (Matson,
Barrett, and Mellichamp 1994). The authors evaluating whether FPs can be used for Object-
Oriented systems found that FPs are not applicable to applications without heavy data
processing. The FPs counting practices require a large number of data element types (described
in Chapter 2 Background) to reach average or high complexity (Caldiera et al. 1998). Jørgensen
states that FPs are not helpful on small maintenance tasks, or tasks that make changes not
concerned with user functionality (Jørgensen 1995).
The software model and measurement rules defined for FPs cannot be applied to software
outside the business application domain. In 2007, Functional Size Measurement (FSM)
practitioners found applying FPs was becoming difficult, as stated by the author of (Heeringen
2007): “In this era of web applications and service-oriented architectures, the guidelines to
identify logical files for instance are sometimes hard to apply. Many applications don't even
work with permanent data anymore, so there will be no EIF's (External Interface Files) and ILF's
(Internal Logical Files) present”. Lavazza and Morasca found that FPs can be applied to
embedded software “via a careful interpretation of FP counting rules”, however, “bending of the
rules” is not required when using COSMIC Function Points (CFPs) for software in this domain
(Lavazza and Morasca 2013). The authors of (Di Martino et al. 2016) made customized rules to
use FPs to estimate effort for Web applications but discovered that CFPs led to better predictive
statistics compared to their variation of FPs (Di Martino et al. 2016). FPs have not only been
criticized for solely being applicable to business applications, but also for their limiting complexity
factors. For example, an External Input (EI) function in (IFPUG) FPs “gets 3, 4 or 6 function
points. A complex one gets 6 points, but a very complex EI also gets 6 points” (Jørgensen 1995).
CFPs does not limit the size of a function, as each data group involved in the functionality
receives an individual count. The authors of (Abran, Symons, and Oligny 2001) sized 6 real-time
projects within an organization and discovered that the “granularity of COSMIC allows much
better to capture the functional size variation within individual functional processes.”
Table 5 and Table 6 summarize previous empirical research using FPs and CFPs,
respectively, for effort/cost estimates. The goodness of fit and prediction accuracy statistics used
in effort/cost estimation are explained in Section 5.3 Goodness of Fit and Prediction Accuracy
Statistics. While many performed a linear or nonlinear regression between effort/cost and
FPs/CFPs, some studies introduced additional parameters to improve estimates. Figure 8 displays
the datasets used in the empirical studies listed in Table 5 and Table 6 demonstrating how
different the datasets are from one another. The effort estimation model in each study only fits its
own data and cannot be generalized to other datasets. The data in Figure 8 is either actual data or
simulated with the information provided in the paper. Some papers did not provide sufficient
information to simulate data, while 2 papers in Table 5 used the public ISBSG dataset. Hence,
these studies are not included in Figure 8.
Table 5 Summary of Empirical Research Studies with IFPUG Function Points (FPs)

(Abran and Robillard 1996). Model: Work-days = 30.853 + 1.956 * FP. Development Type: Unknown.
R² = 0.49. Notes: Data from a major Canadian financial organization. After removing 16 data
points, R² = 0.81.

(Ahn et al. 2003). Model: Weeks = 0.054 * FPs^1.353. Development Type: Maintenance/Enhancement.
R² = 0.9664. Notes: Data comes from 4 system management organizations in Korea. Adjusts the FPs
size by rating factors related to modifying existing software.

(Albrecht and Gaffney 1983). Model: Hours = -13390 + 54.5 * FPs. Development Type: New
Development. R² = 0.874. Notes: Data from IBM Data Processing Systems.

(Hastings and Sajeev 2001). Model: Hours = 1991.7 + 5.1588 * FPs. Development Type: Unknown.
R² = 0.4392. Notes: Gathered data from different organizations and application domains. 8 data
points. Used a vector-based approach to quantify size and showed that it worked better than
using only FPs.

(Jeng et al. 2011). Model: Man-days = -0.1476 + 0.2962 * FPs. Development Type: Add features.
MMRE = 11.28%. Notes: 5 data points. Made complexity weights for the following:
inter-communication parameter sets, connected subsystems, user departments, connected utility
programs, updated files. Ratings of these adjusted the size.

(Jiang, Naudé, and Jiang 2007). Model: Hours = 38.3 * FPs^0.736. Development Type: New
Development, Enhancement. R² = 0.475. Notes: Used ISBSG dataset, as of 2007. The ISBSG is a
public dataset consisting of data from various organizations, application domains, and
application types.

(Kemerer 1987). Model: Man-months = -122 + 0.341 * FPs. Development Type: Unknown. R² = 0.553.
Notes: Data comes from a computer consulting and services firm that designs and develops data
processing software.

(Koh, Selamat, and Ghani 2008). Model: Hours = 54.82 * FPs^0.711. Development Type: New
Development, Enhancement. R² = 0.3296, MMRE = 0.049. Notes: Used ISBSG dataset, as of 2008
(Release 9).

(Matson, Barrett, and Mellichamp 1994). Model: Hours = 85.7 + 15.12 * FPs. Development Type:
Unknown. R² = 0.652. Notes: Data comes from a large organization, but ranges across many
application types: systems that run on MVS and UNIX operating systems; programmed in COBOL,
PL/I, and C; and database management systems.

(Niessink and Van Vliet 1997). Model: Hours = 6.01 + 8.96 * FPs. Development Type: Maintenance.
R² = 0.40, MMRE = 110%, PRED(25) = 26%. Notes: Found that the size of the components that need
to be changed leads to better results than the size of the change itself. This study proposes
corrective factors to improve effort estimates. Data comes from a large financial information
system.

(Wittig and Finnic 1994). Model: Neural Networks. Development Type: Unknown. MMRE = 24.71%.
Notes: Data collected from 15 commercial systems.

(Zheng et al. 2009). Model: Man-days = 25.7401 + 0.7098 * FPs. Development Type: New
Development. R² = 0.9942. Notes: Data comes from a software company that provides information
service (IT).
Table 6 Summary of Empirical Research Studies with COSMIC Function Points (CFPs)

(Abran, Silva, and Primera 2002). Models: Organization A: Hours = 8.97 + 0.65 * CFPs + 163.54 *
Difficulty + 0.16 * CFPs * Difficulty; Organization B: Hours = 26.94 + 1.25 * CFPs + 56 *
Difficulty + 3.24 * CFPs * Difficulty. Development Type: A: Adding and Modifying Features; B:
Enhancements. A: R² = 0.83, MMRE = 45%, PRED(25) = 53%; B: R² = 0.84, MMRE = 40%, PRED(25) =
55.5%. Notes: Data comes from 2 organizations that were modifying or adding features to
existing Web-based applications, and real-time applications. Both datasets are analyzed
separately. Considered personnel experience and difficulty level, which improved estimation
accuracy.

(Costagliola et al. 2004). Model: Hours = 72.254 + 0.232 * CFPs. Development Type: New
Development. R² = 0.878. Notes: Dynamic web applications, client-server applications. Data
comes from projects developed by students during a course on Web Engineering that spans 2
academic years.

(Di Martino et al. 2016). Model: Hours = 15.53 * CFPs^0.8. Development Type: Unknown. R² = 0.87,
MMRE = 29%, PRED(25) = 68%. Notes: Data comes from an Italian company that builds enterprise
information systems accessible through the Web.

(Lévesque and Bevo 2001). Model: Man-Months = 43402 + 0.26 * CFPs. Development Type: Unknown.
R² = 0.6262. Notes: Data comes from an outsourcing company that develops and maintains
management information systems for a telecommunication company.

(Nagano et al. 2002). Model: Cost = -0.34 + 0.17 * CFPs. Development Type: Unknown. Overall
R² = 0.986; Small R² = 0.292. Notes: Data represents telecommunication services: discount,
network, and Internet-related services.

(Stern and Guetta 2010). Model: Unknown. Development Type: Body Control Module (BCM) data: New
Development; Engine Control Module (ECM) data: New Development, Enhancements. BCM data: R² =
0.61; ECM data: R² = 0.81. Notes: Data comes from Renault and the efforts required to develop
software in cars. Suggests that the data consists of new development and enhancement tasks.
Most studies only reported the R² of the regression, which is not a strong indicator of
prediction accuracy. Of the empirical studies with FPs, (Jeng et al. 2011), (Zheng et al. 2009),
and (Ahn et al. 2003) reported the highest performances with MMRE being less than 25% or R²
being greater than 0.95. The datasets in (Jeng et al. 2011) and (Zheng et al. 2009) come from a
single organization and only consist of 5 and 7 data points, respectively. The data in (Ahn et al.
2003) comes from 4 organizations within the same application domain and the dataset consists of
26 data points. For CFPs, the studies with the best prediction accuracy are (Costagliola et al.
2004) with 32 data points and (Di Martino et al. 2016) with 25 data points. Again, the data points
in both studies represent a single development environment. In particular, note that the
PRED(25) is 68% in (Di Martino et al. 2016). As shown in Chapter 7 Effort Estimation
Effectiveness, the calibrated COCOMO® II model is able to get PRED(25) at 68% with FPs and
70% with CFPs for data that comes from 2 different organizations and 50 data points.
Figure 8 Actual or Simulated Data with FPs (top) and CFPs (bottom) from empirical research
papers. 3 outliers hidden from the images to better show variation among the datasets.

Most of the studies assume a linear relationship between FSMs and effort, while a few used
nonlinear regressions. Kitchenham did an in-depth comparison and analysis in 2002 to determine
why software engineering research has not been able to provide answers on the relationship
between functional size metrics and effort. She attributes the differences in conclusions
across multiple empirical studies to inconsistent methods and assumptions, and to the lack of
publicly available data (Kitchenham 2002). However, these studies do not account for any
factors that can have positive or negative effects on effort. Normalizing effort with respect
to personnel, product, and environmental factors allows for a more objective analysis of the
relationship between size and effort; otherwise, data points with higher productivity rates
due to, for example, higher levels of personnel experience will introduce noise into the
analysis. Hence, I am using a cost model (COCOMO® II) that describes a set of cost drivers to
account for changes in effort caused by external factors, for an objective and generalizable
analysis.
While the ratios between FPs and SLOC proposed by Jones (Jones 1996) have been
successfully used and adopted (and also included in the COCOMO® II model), some research
studies found that the ratios from Jones do not perform well in all environments. Instead, the
conversion ratio should be calculated with homogenous data coming from a single environment
(Rollo 2006; Henderson 1992; Desharnais and Abran 2003). The authors of (Dekkers and Gunter
2000) note that SLOC and FPs represent “different dimensional attributes of the software”. FPs
count inputs, outputs, queries, and storage while SLOC is every line of code that is written to
satisfy system requirements. While algorithms might increase the size in terms of SLOC, they do
not increase the size in terms of FPs. Hence, FPs and SLOC do not grow at similar rates (Gencel, Heldal, and Lind 2009).
A few studies criticize the use of conversion ratios between FPs and SLOC for effort estimates as
additional layers of error are introduced (Santillo 2006; Neumann and Santillo 2006; Hira and
Boehm 2016).
FPs and CFPs are both functional size metrics and therefore, represent software from
similar points of view. Of course, both have different counting rules (see Chapter 2 Background).
Two motivations for researchers to evaluate the conversion between FPs and CFPs were: 1)
CFPs' counting rules are easily applied to modern software and a larger set of application domains,
and 2) the desire to use the existing data with FPs for benchmarks and comparisons. Therefore,
managers or cost estimators can size a project in CFPs, but then convert to FPs to use the
existing data and benchmarks. A paper reviewed previous conversion studies (Ho, Abran, and
Fetcke 1999; Vogelezang, Lesterhuis, and others 2003; Desharnais, Abran, and Cuadrado 2006)
and noticed that the conversion between FPs and CFPs typically has a coefficient of 0.9 to 1.1. In
other words, there is an almost 1-to-1 conversion between FPs and CFPs (Cuadrado-Laborde et
al. 2010). The authors of (Ferrucci, Gravino, and Sarro 2014) found that identifying the
conversion rates with data local to an organization/company performs better than data coming
from multiple organizations/companies. On the other hand, researchers noticed that information
was lost in the conversion due to the differences in counting rules (Abualkishik et al. 2012;
Vogelezang, Lesterhuis, and others 2003; Rabbi, Natraj, and Kazeem 2009; Lavazza 2014;
Gencel and Bideau 2012; Di Martino et al. 2016).
Despite the number of empirical research papers studying the effectiveness of using
FSMs for effort estimation and the methods to make using FSMs simpler or use existing data and
knowledge, a generalizable and accurate solution does not exist. The existing solutions either
perform regressions between size and effort, which ignore the effects of personnel, product, and
environmental effort factors; add ad hoc parameters to the model which account for some of the
effort factors; or convert one size metric into another one, adding an additional layer of error.
Chapter 4 Datasets
In exploring types of research at USC (University of Southern California), I wanted to
pursue quantitative models for improving software development productivity, performance, and
quality, but I was having difficulty finding adequate sources of data. Fortunately, an opportunity
came when I got the job of managing the evolution of USC’s Unified Code Count (UCC) system.
UCC is the most widely used tool for software customers and developers in consistently and
accurately measuring the size aspects of computer programs. UCC is continuously evolving:
each year there are requests to support more programming languages (now up to 30), or
additional metrics, such as complexity metrics.
Each year, performers are selected from a pool of USC Masters-level Computer Science
Directed Research students, with varying annual workload from 40 to 80 performers. As their
manager, I was able to get them to report progress, hours worked, and percent complete of their
projects. These data were valuable to me in determining how many students I would need for the
next year’s desired capabilities. I could also collect alternative size parameters such as source
lines of code (SLOC), IFPUG Function Points (FPs), and COSMIC Function Points (CFPs).
Since UCC versions developed by students constitute a rather unique class of software
projects, I then looked for alternative sources of commercial and industrial software project data
to collect and analyze. I was fortunate to find a company that was looking for better ways to
estimate their software project sizes and costs, and was willing to enable me to analyze their
project data to see what alternative combinations of SLOC, FPs, and CFPs would best predict
their projects’ budgets, schedules, and team size needs at which points in their projects’
development. The resulting analyses constitute the results of this Ph.D. dissertation.
4.1 Unified Code Count (UCC)
4.1.1 Dataset Summary
University of Southern California (USC) maintains Unified Code Count (UCC) with the
help of Masters’ Computer Science students. Unified Code Count (UCC) is an object-oriented,
modularized project written in C++ that provides source lines of code (SLOC) counting metrics
for about 30 programming languages, such as logical SLOC (Park 1992) and cyclomatic
complexity (McCabe 1976). UCC is a command-line program¹, updated each year with new
language parsers, additional code-based metrics, and/or additional input/output options (such as,
allowing additional input parameters). Bug fixes are also released every year, but the effort
for these fixes is not included in this study, as FSMs are not recommended for sizing bug fixes
((IFPUG) 2010; (COSMIC) 2014).
Data for analysis came from the weekly timesheets, requirements and project description,
test documentation with corresponding test data, completed code, and explanatory reports
summarizing the steps taken and the results of projects that began and completed between 2010
and 2017. UCC's current dataset covers recent projects: 20 projects researching and adding
cyclomatic complexity metrics, 8 new language parsers, 1 project enhancing the differencing
functionality, and 3 projects enhancing input/output options in UCC. These data points fall
into the following logical groups:
• Low Parsing Projects (20): New code-based metrics, such as cyclomatic complexity, require
some additional logic in the existing parsers (up to 3 additional algorithms). The Product
Complexity (CPLX) rating ranges from Very Low to Low.
• High Parsing Projects (9): Language parsers parse through input files 3 times in order to
determine logical SLOC, comments, blank lines, and occurrences of keywords. These tasks
require 4-5 algorithms with nested logical operations - equivalent to the Low rating for the
Product Complexity (CPLX) parameter in COCOMO® II. The project that enhances the differencing
functionality is similar to the language parsers with respect to the number of times the input
is parsed and the number of algorithms required. The only exception is that CPLX is rated at
Nominal (versus Low), due to the increased complexity of control operations required.
• Inputs/Outputs (3): Adding input/output options may require some logical operations, but no
computations or algorithms. Hence, the Product Complexity (CPLX) rating ranges from Very Low
to Nominal.

¹ A GUI interface was developed for UCC, but the data point for its development has been
removed from this dataset, primarily because its size in FPs is 0 (no new functional processes
were added). This caused problems with the calibration process (the log or ln of 0 cannot be
taken), and therefore, the data point was removed for this analysis.
4.1.2 Dataset Attributes
A summary of the attributes included in UCC's dataset of maintenance tasks is as follows,
where rows 5 onward are effort factors defined by COCOMO® II (B. Boehm et al. 2000):
Table 7 Unified Code Count (UCC)'s Dataset Attributes
Metric/Effort Factor Range Explanation
Equivalent Logical
Source Lines of Code
(ESLOC)
45 to 1425
ESLOC
Logical SLOC was developed and defined by the
Software Engineering Institute (SEI) to standardize
SLOC measurement. Logical SLOC counts
executable lines of code, ignoring lines that are
blank, contain code comments, or non-executable
characters (Park 1992). Equivalent Logical SLOC
(ESLOC) makes modifications to reused code
equivalent to new code, which has been calculated
using Nguyen's modification to COCOMO® II's
reuse model (B. Boehm et al. 2000; Nguyen 2010).
IFPUG Function Points
(FPs)
4 to 15 FPs
The sizes were calculated with the help of a colleague who is certified in the IFPUG FPs
method.² While the counting rules described in the Counting Manual ((IFPUG) 2010) are used,
they are complemented with IFPUG's Sizing Component-Based Development Using Function Points
manual ((IFPUG) 2009).
COSMIC Function Points
(CFPs)
2 to 12
CFPs
The sizes were calculated with the help of a colleague who is certified in the COSMIC FPs
method.³
Total Effort
166.5 to
1500.62 hrs
Effort in terms of hours, including time spent on
training, requirements gathering, coding, testing, and
documenting.
Precedentedness (PREC)
Low to
High
Team members have some experience with similar
types of projects. Hence, most projects are given a
Nominal rating for this factor. The Word/Text
counter received a Low rating due to developers
having little experience processing languages prior
to this project. A couple projects are rated High due
to building input and output options are common
programming assignments.
Development Flexibility
(FLEX)
Nominal
Specific requirements are not provided but assigned
goals must be met unless it is not possible.
Architecture/Risk
Resolution (RESL)
Nominal
Every week, team members are probed to evaluate
the architecture and risks, and make necessary plans
and adjustments.
Team Cohesion (TEAM) Nominal
Generally, the teams work as well as they can,
resolving conflicts as they come up.
Process Maturity (PMAT) Nominal
Required documentation exists to support the high
personnel turnover.
² The sizes, and therefore the effort model and results, differ from the previously published
paper (Hira and Boehm 2016). Since publishing the paper, I met the colleague who has been
certified in IFPUG's FPs method and who helped me correct the sizes. Additionally, I found
mistakes in the requirements specifications for a few projects. The differences may also affect
the results published in (Hira et al. 2018a, 2018b).
³ The sizes, and therefore the effort model and results, differ from the previously published
paper (Hira and Boehm 2018). While I had worked with a colleague certified in COSMIC's method
before publishing the paper, I found mistakes in the requirements specifications for a few
projects since publishing it. The differences may also affect the results published in
(Hira et al. 2018a, 2018b).
Required Software
Reliability (RELY)
N/A
No specific reliability constraints exist in the
requirements, though, unavailability or errors cause
slight inconveniences.
Test Database Size
(DATA)
N/A
While a test database exists, little extra effort is
required to maintain or expand it.
Product Complexity
(CPLX)
Very Low
to Nominal
Product Complexity (CPLX) is divided into 5
categories, described in the following 5 rows.
Control Operations
Very Low
to High
Language parsers typically are rated Low, function-
level differencing is High, and cyclomatic
complexity and input/output projects typically rated
between Very Low and Low.
Computational
Operations
N/A or
Very Low
Language parsers, cyclomatic complexity, and
function-level differencing projects have Very Low
ratings (though UCC performs computational
operations, they are simple additions and
subtractions), whereas computational operations are
not applicable for other projects.
Device-Dependent
Operations
N/A This category does not apply to UCC.
Data Management
Operations
Nominal
UCC and all of its functions take multiple files as
inputs and outputs a single file with results.
User Interface
Management Operations
Very Low
All projects are rated as Very Low, since UCC is a
primarily command-line program.
Developed for
Reusability (RUSE)
Nominal
While the code is well documented to allow
reusability, extensive testing for generalizability is
not performed.
Documentation Match to
Lifecycle Needs (DOCU)
Low to
High
All teams are required to document the requirements
of the projects, summarize the work completed and
decisions made. However, a couple of projects had
substantially more documentation with respect to the
requirements, and earned a High rating, and one
project had less than the required documentation,
earning a Low rating.
Execution Time
Constraint (TIME)
N/A No constraints are in the requirements.
Main Storage Constraints
(STOR)
N/A No constraints are in the requirements.
Platform Volatility
(PVOL)
Nominal
The platform(s) UCC is built on change in
accordance with the Nominal rating.
Analyst Capability
(ACAP)
Low to
Very High
Some teams showed high and very high analyst
capability, whereas very few teams showed low
capability in analysis and design.
Programmer Capability
(PCAP)
Low to
High
Though most of the programming personnel were
sufficiently capable, some developers had especially
proficient programming skills.
Personnel Continuity
(PCON)
Extra Low
COCOMO® II's highest personnel turnover rating is
48% per year. On average, UCC faces a 90%
turnover over 4 months. This parameter had to be
adjusted for this environment in a previous study
(Hira, Sharma, and Boehm 2016).
Applications Experience
(APEX)
Low
Most of the personnel that join UCC's development
team do not have prior industry experience in similar
application types, though they have sufficient
Computer Science education. Therefore, the Low
rating is used for APEX on all data points (as
opposed to Very Low or Nominal, etc.).
Language and Tools
Experience (LTEX)
Nominal
Team members have at least a couple years of
experience in C++ programming, in accordance with
the Nominal rating.
Platform Experience
(PLEX)
Low
The development personnel have little experience in
building cross-platform applications. Hence, a Low
rating for PLEX best describes the development
teams for all data points.
Use of Software Tools
(TOOL)
Very Low
The UCC development environment uses tools to
develop and debug code and manage tasks.
Multisite Development
and Communications
(SITE)
Nominal
Team members may be in different cities, and their
primary method of communication is email.
The applicable COCOMO® II effort factors and their corresponding values for each of
the projects were evaluated by reviewing the source code, deliverables, and weekly notes on
teams' progress at the time of data collection. The UCC dataset is provided in Appendix A, a
summary of how FPs and CFPs are calculated (and how they differ from each other and SLOC)
is in Appendix B, and definitions of the dataset attributes are in Appendix C.
4.2 Industry New Development Projects
4.2.1 Dataset Summary
A company that develops a wide range of commercial and defense products has allowed
me to gather data for this research objective, as they would like to improve their estimation
capabilities. The company implements firmware and software that interact with custom-built
hardware using C, Verilog, and VHDL. A member in the upper management provided me access
to the campus and assigned a middle manager and a software engineer to provide the completed
projects’ requirements, architecture designs, interface design documents, and test documentation.
Additionally, a firmware engineer and a member in technical upper management helped me get
access to similar documents from another team. The head of project scheduling and a member of
Human Resources (HR) provided the corresponding project plans and effort reports. I talked
with team members to understand the personnel attributes, product characteristics, and
development environment. With the deliverables I was provided, I was able to calculate both FPs
and CFPs. The company required that I keep the identity and data confidential.
I received data from 2 teams for a total of 4 projects. These 4 projects primarily record,
save, and transfer data among locally made hardware devices, primarily using a command-line
interface to get inputs. The functionality and effort for just the firmware and software
implementation were extracted and used for this research. Functionally, these 4 projects
essentially provide various input options; output logs, status information, error messages, and
requested information; capture, save, and transfer data; and encrypt and decrypt transferred data.
For 3 of the projects, task and effort tracking was available for smaller components of the
projects, adding 14 data points to the dataset. The 18 data points fall into the following logical
groups:
• Data Transfers, Hardware Interaction (10): These data points require control operations, a
few computational operations, a Nominal level of data management, and a Nominal level of
device-dependent operations. These projects differ from the Inputs/Outputs group by generally
requiring higher levels of data management operations and some computational operations. The
Product Complexity (CPLX) ratings for these projects are all Nominal.
• Record, Encrypt, Decrypt Data (3): The projects in this group have a High rating for
Product Complexity, as High level of computational operations are required, with
everything else being the same as the Data Transfers group.
• Inputs/Outputs (5): Require some control and data management operations (either Very Low or
Nominal level), but no computational operations or advanced algorithms. Since the firmware and
software interact with hardware, a Nominal level of device-dependent operations is also
required. Hence, the Product Complexity (CPLX) ratings for these projects are all Low.
For other projects in the company, sufficient documentation did not exist, thus limiting
the data points that could be gathered.
4.2.2 Dataset Attributes
A summary of the attributes included in the Industry dataset is as follows, where rows 4
onward are effort factors defined by COCOMO® II (B. Boehm et al. 2000) and repeated in
Appendix C:
Table 8 Industry's Dataset Attributes
Metric/Effort Factor Range Explanation
IFPUG Function Points
(FPs)
6 to 196
FPs
COSMIC Function Points
(CFPs)
2 to 156
CFPs
Total Effort
56 to
6142.5 hrs
Effort in terms of hours, including time spent on
requirements gathering, coding, testing, and
documenting. Only effort for firmware and software
gathered (not hardware).
Precedentedness (PREC)
Low to
Very High
Development Flexibility
(FLEX)
Nominal
Since the requirements only allow some relaxation,
this parameter is rated as Nominal across all data
points.
Architecture/Risk
Resolution (RESL)
Nominal to
High
Methods used depended on the experience levels of
the team members.
Team Cohesion (TEAM)
Low to
Extra High
Process Maturity (PMAT) Nominal
The organization produces documents and track the
projects in accordance to the Nominal rating.
Required Software
Reliability (RELY)
Nominal
Effect of software failure would cause losses that are
recoverable.
Test Database Size
(DATA)
Low
The amount of test data required is equivalent to the
Low rating for all projects.
Product Complexity
(CPLX)
Low to
High
Product Complexity (CPLX) is divided into 5
categories, described in the following 5 rows.
Control Operations Nominal
All projects in the industry dataset require a
Nominal level of control operations.
Computational
Operations
N/A, Very
Low, or
High
The input/output interface components did not
require any computational operations, while the
components responsible for encrypting and
decrypting data requires advanced computations.
The remaining data points required Very Low level
of computational operations (simple math
operations).
Device-Dependent
Operations
Nominal
Since the software and firmware interact with
hardware devices, Nominal level of Device-
Dependent operations are required for all data points
in the dataset.
Data Management
Operations
Nominal
All data points of the dataset work with data
corresponding to the Nominal level.
User Interface
Management Operations
Very Low
or High
The projects represented in the Industry’s dataset
have command-line interfaces, while the
components recording data deal with video and
audio.
Developed for
Reusability (RUSE)
Nominal
Since the development teams do not (currently)
implement and document code for reuse across the
teams, this parameter is rated as Nominal across all
data points.
Documentation Match to
Lifecycle Needs (DOCU)
Nominal
Since the documentation matched the lifecycle needs
for all projects, all data points received a Nominal
rating.
Execution Time
Constraint (TIME)
N/A No constraints were in the requirements.
Main Storage Constraints
(STOR)
N/A No constraints were in the requirements.
Platform Volatility
(PVOL)
Low
The company builds products for platforms that have
major changes once every year and/or minor
changes every month; corresponding to the Low
rating.
Analyst Capability
(ACAP)
Nominal to
Very High
Programmer Capability
(PCAP)
Nominal to
High
Personnel Continuity
(PCON)
Very High
This company faces little personnel turnover every
year.
Applications Experience
(APEX)
Low to
Very High
Language and Tools
Experience (LTEX)
High to
Very High
Platform Experience
(PLEX)
Nominal to
Very High
Use of Software Tools
(TOOL)
Nominal
The organization used tools to integrate
development with project plans and tracking tools.
Multisite Development
and Communications
(SITE)
Very High
All team members are collocated within the same
building; hence, SITE is rated as Very High across
all data points.
Chapter 5 Research Methodology
5.1 Research Questions and Hypotheses
Research Question 1
The first research question is whether calibrating COCOMO® II for functional size
metrics (FSMs) performs better than other options currently suggested through research:
• Run linear regression (approach primarily used in empirical research)
• Run linear regression on the log transformation of the data (another approach used in
empirical research to account for the nonlinear relationship between size and effort)
• Convert FPs to SLOC with ratios published by Capers Jones then using COCOMO® II
(how COCOMO® II currently supports FPs)
• Locally calibrate the FSMs to SLOC ratio and using COCOMO® II (suggested for better
accuracy; though not suggested, do for CFPs, too)
• Locally calibrate CFPs to FPs ratio and use FP linear and nonlinear models (approach
suggested in empirical research)
Null Hypothesis (H0): The calibrated COCOMO® II for FSMs does not perform better
than the currently available options (listed above) in terms of MMRE and PRED(25) across data
from 2 different environments.
Approach: To demonstrate the generalizability of the calibrated COCOMO® II model,
the methods suggested in empirical research are performed on the UCC and Industry datasets
combined and compared to the performance of the COCOMO® II model calibrated on both
datasets. The prediction accuracy (MMRE and PRED(25) - see Section 5.3 Goodness of Fit and
Prediction Accuracy Statistics) across the above-listed and the calibrated COCOMO® II model
are compared.
Research Question 2
The second research question addressed is: do FSMs, along with the calibrated
COCOMO® II model, perform better for some types of software compared to others? FSMs do
not account for complexity caused by algorithms and operations, since they count inputs,
outputs, and transactions that access storage. For example, an output that is produced without
any computational operations is the same size as an output produced as a result of computational
operations. This may cause effort estimation effectiveness to vary across types of projects based
on complexity operations. The null hypothesis, however, is to assume that FSMs, with the
calibrated COCOMO® II model, performs well for all types of projects. In order to compare
objectively, the comparison is done with respect to correlation and prediction accuracy statistics.
Null Hypothesis (H0): MMRE ≤ 25%, PRED(25) ≥ 75%, and Spearman’s Correlation
Coefficient (rho) is ≥ 0.8 for all types of projects, categorized by complexity operations.
Approach: The prediction accuracy statistics (MMRE and PRED(25) -see Section 5.3
Goodness of Fit and Prediction Accuracy Statistics) of the calibrated COCOMO® II model and
the correlation⁴ between functional size and normalized effort are evaluated for the groups of
projects identified in the UCC and Industry datasets:
Table 9 Project Types Identified in Datasets

Low Parsing. Dataset: UCC. Sample Size: 20. Description: Parse through the input files once to
return an output. Types of algorithmic operations required: control and data management
operations, 1-3 simple computational operations.

High Parsing. Dataset: UCC. Sample Size: 9. Description: Parse through the input files 3 times
to return outputs. Types of algorithmic operations required: control and data management
operations, 4-5 simple computational operations.

Data Transfers. Dataset: Industry. Sample Size: 10. Description: Data transferred between
different hardware instruments. Types of algorithmic operations required: control, data
management, device-dependent, and simple computations.

Record, Encrypt, Decrypt. Dataset: Industry. Sample Size: 3. Description: Record, encrypt, and
decrypt data. Types of algorithmic operations required: control, data management,
device-dependent, and complex computations.

Inputs and Outputs. Datasets: UCC, Industry. Sample Size: 8. Description: Input and output
processes without any computational operations. Types of algorithmic operations required:
control and data management. Industry data points require some device-dependent operations.

⁴ Spearman's Correlation test is used because it does not assume the variables have a linear
relationship or that they are normally distributed (software development data is rarely
normally distributed (Whigham, Owen, and Macdonell 2015)).
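For reference, the Spearman rank-order correlation used in this approach can be computed as
sketched below; the values shown are placeholders rather than data from the UCC or Industry
datasets.

# Minimal illustration of the Spearman rank-order correlation between size and
# normalized effort for one project group (placeholder values).
from scipy.stats import spearmanr

sizes = [4, 6, 7, 9, 12, 15]                      # e.g., FPs for one project group
normalized_effort = [180, 210, 260, 300, 420, 500]  # effort after normalization
rho, p_value = spearmanr(sizes, normalized_effort)
print(rho, p_value)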
5.2 Calibration Technique
FSMs (FPs and CFPs) define software size at a higher granularity compared to SLOC,
thus requiring 3 types of constants/parameters of the original COCOMO® II model to be
recalibrated:
1. Productivity Factor (constant A)
2. Product Complexity parameter (CPLX)
3. Exponent Growth Rate (constants B and C)
While the original COCOMO® II effort estimation model is a power equation (see below), it can
be handled as a linear equation after taking the logarithm (log) of both sides. When the
parameter values of an effort multiplier are unknown or need to be calibrated, then the rating
level is represented as a number scale between -2 and 3 (where -2 is Very Low and 3 is Extra
High) and the weight of the rating is calibrated through regression.

Effort = A × Size^E × ∏(i=1..17) EM_i,  where E = B + C × Σ(j=1..5) SF_j

log(Effort) = log(A) + (E × log(Size)) + Σ(i=1..17) log(EM_i)

log(Effort) = log(A) + (E × log(Size)) + Σ(i=1..17) (log(EM_i_weight) × EM_i_rating)

Equation 9 Convert COCOMO® II Model into Linear Equation using log Transformation

Recalibrating the 3 above-listed constants/parameters requires 2 steps:

Step 1. Calibrate A and CPLX (the exponent is simplified to E for this step - the calibrated
value is thrown away in this step because it is calibrated in the next step):

log(Effort) − Σ(i=1..16) log(EM_i) = log(A) + (E × log(Size)) + log(CPLX)

log(Effort) − Σ(i=1..16) log(EM_i) = log(A) + (E × log(Size)) + (log(CPLX_weight) × CPLX_rating)

Equation 10 Calibration Step 1 for Productivity Rate and Product Complexity parameter

To calibrate COCOMO® II for FSMs, it is important to only adjust the necessary constants and
parameters. Therefore, I perform a stepwise type of analysis by running this step with the
following variations:

• Single Productivity Rate
• Productivity Rate for New Development/Enhancement
• Productivity Rate for New Development/Add Features/Modify Features
• Single Productivity Rate & CPLX
• New Development/Enhancement & CPLX
• New Development/Add Features/Modify Features & CPLX
Equation 10 works to calibrate the single productivity rate A and the CPLX parameter. To
use the original CPLX parameter values, it would be included with the sum of EMs on the left
side of the equation. To have separate productivity rates for the 3 types of development tasks, the
last equation in Equation 10 changes to:
log(Effort) − Σ(i=1..16) log(EM_i) = log(A) + Σ(log(DevType_weight) × DevType_rating)
    + (E × log(Size)) + (log(CPLX_weight) × CPLX_rating)

Equation 11 Calibration Step 1 for Productivity Rate by Development Type and Product Complexity
parameter
The calibration that returned the highest R² was used to continue with the second step (New
Development/Add Features/Modify Features & CPLX).⁵

⁵ Note, I also performed step 2 on the other options to complete calibrating a model and
compared the prediction accuracy - the selected model performed better. See Appendix D.

Step 2. Calibrating B and C requires moving everything but the exponent over to the left side
of the equation, so that a linear regression determines the required coefficients. A_DevType
represents the productivity rate for the development task type (new development, adding or
modifying features) and CPLX represents the parameter rating values determined by Step 1 of
the calibration process.

log(Effort) − Σ(i=1..16) log(EM_i) − log(A) − log(A_DevType) − log(CPLX) = E × log(Size)

[log(Effort) − Σ(i=1..16) log(EM_i) − log(A) − log(A_DevType) − log(CPLX)] / log(Size) = E

[log(Effort) − Σ(i=1..16) log(EM_i) − log(A) − log(A_DevType) − log(CPLX)] / log(Size)
    = B + C × Σ(j=1..5) SF_j

Equation 12 Calibration Step 2 for Exponent constants

5.2.1 Effort Normalization

To visually see the true relationship between size and the effort caused by size alone, effort
needs to be normalized with respect to the effort factors and their numerical effects. Taking
the COCOMO® II equation, the product of EMs (Effort Multipliers) and sum of SFs (Scale Factors)
need to move to the left side of the equation where effort is:

Effort = A × A_DevType × Size^(B + C × Σ(j=1..5) SF_j) × ∏(i=1..17) EM_i

Effort / ∏(i=1..17) EM_i = A × A_DevType × Size^(B + C × Σ(j=1..5) SF_j)

Effort / (A_DevType × ∏(i=1..17) EM_i × Size^(C × Σ(j=1..5) SF_j)) = A × Size^B

Equation 13 Normalizing Effort with respect to COCOMO® II's Effort Factors

Normalizing effort helps one to analyze how data points are behaving differently from others
and to identify groups or factors that need to be taken into account.
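Both calibration steps reduce to ordinary least-squares fits on the log-transformed model. The
sketch below illustrates the mechanics in Python under stated assumptions: the dataframe column
names (effort, size, sum_log_em, sum_sf, dev_type, cplx_rating) are hypothetical, development-
type dummy variables stand in for the A_DevType productivity rates (with new development as the
baseline), and the code is not intended to reproduce this dissertation's numeric results.

# A minimal sketch of the two-step calibration, assuming hypothetical column names
# and that all sizes are greater than 1 (so log(Size) is non-zero).
import numpy as np
import pandas as pd

def step1_calibrate(df):
    # Left-hand side of Equations 10/11: log(Effort) minus the summed log effort multipliers.
    y = np.log(df["effort"]) - df["sum_log_em"]
    X = pd.DataFrame({
        "intercept": 1.0,                                       # coefficient = log(A)
        "log_size": np.log(df["size"]),                         # coefficient = provisional E (discarded)
        "dev_add": (df["dev_type"] == "add").astype(float),     # coefficient = log(A_Add)
        "dev_mod": (df["dev_type"] == "modify").astype(float),  # coefficient = log(A_Modify)
        "cplx_rating": df["cplx_rating"].astype(float),         # coefficient = log(CPLX_weight)
    })
    coef, _, _, _ = np.linalg.lstsq(X.values, y.values, rcond=None)
    return dict(zip(X.columns, coef))

def step2_calibrate(df, s1):
    # Equation 12: move everything except the exponent to the left-hand side,
    # then regress on the sum of scale factors to obtain B and C.
    log_a_devtype = (s1["dev_add"] * (df["dev_type"] == "add")
                     + s1["dev_mod"] * (df["dev_type"] == "modify"))
    lhs = (np.log(df["effort"]) - df["sum_log_em"] - s1["intercept"]
           - log_a_devtype - s1["cplx_rating"] * df["cplx_rating"]) / np.log(df["size"])
    X = np.column_stack([np.ones(len(df)), df["sum_sf"].astype(float)])
    (b, c), _, _, _ = np.linalg.lstsq(X, lhs.values, rcond=None)
    return b, c

# Usage sketch (the CSV file name is hypothetical):
# df = pd.read_csv("calibration_dataset.csv")
# s1 = step1_calibrate(df)
# B, C = step2_calibrate(df, s1)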
5.3 Goodness of Fit and Prediction Accuracy Statistics
The R² determines how closely the regression curves fit the data points. The following
statistics describe the accuracy of the model, since their calculations are based on the difference
between actuals and estimates, and will determine whether the effort models lead to acceptable
accuracy: MAR (Mean of Absolute Residuals), MdAR (Median of Absolute Residuals), MMRE
(Mean Magnitude of Relative Error), MdMRE (Median Magnitude of Relative Error), and
PRED(25) (Prediction Quality Indicator).
Absolute Residuals (AR) is the absolute value of the difference between estimates and
actuals:
AR = |y_i − ŷ_i|

Equation 14 Absolute Residual (AR) Calculation

In Equation 14, y_i and ŷ_i are the actual and estimated values, respectively. MAR (Mean of
Absolute Residuals) and MdAR (Median of Absolute Residuals) give cost estimators and project
managers the average error of effort estimates (in terms of hours, for this analysis). Unless the
ARs are normally distributed, these results can be heavily influenced by extreme values.
MRE, or Magnitude of Relative Error, is defined as:
MRE = |y_i − ŷ_i| / y_i

Equation 15 Magnitude of Relative Error (MRE) Calculation

Where y_i and ŷ_i are the actual and estimated values, respectively. MMRE (Mean Magnitude of
Relative Error) and MdMRE (Median Magnitude of Relative Error) give cost estimators and
project managers the average accuracy of effort estimates. Again, unless the MREs are normally
distributed, the MMRE and MdMRE values can be heavily influenced by extreme values.
The PRED(x) prediction quality index gives cost estimators and project managers the
ability to state how often estimates can be expected to be within an acceptable margin of error.
The PRED(x) prediction quality index is calculated as (where n is the number of data
points/observations):
PRED(x) = (100 / n) × Σ(i=1..n) [ 1 if MRE_i ≤ x, 0 otherwise ]
Equation 16 PRED(x) Calculation
Cost estimators generally want effort and cost estimates to come within 25% of the actuals at
least 75% of the time (Conte et al. 1986; Jørgensen 2004). Hence, the ideal result for MMRE is
less than or equal to 25% and the ideal result for PRED(25) is 75% or higher.
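The statistics above can be computed directly from paired actual and estimated efforts; the
following is a minimal sketch, with illustrative variable names and example values only.

# Computes the fit/accuracy statistics defined in Equations 14-16 from parallel
# lists of actual and estimated effort (hours).
import numpy as np

def accuracy_stats(actuals, estimates, pred_level=0.25):
    actuals = np.asarray(actuals, dtype=float)
    estimates = np.asarray(estimates, dtype=float)
    ar = np.abs(actuals - estimates)        # absolute residuals (Equation 14)
    mre = ar / actuals                      # magnitude of relative error (Equation 15)
    return {
        "MAR": ar.mean(),
        "MdAR": np.median(ar),
        "MMRE": mre.mean(),
        "MdMRE": np.median(mre),
        f"PRED({int(pred_level * 100)})": 100.0 * np.mean(mre <= pred_level),  # Equation 16
    }

# Example: estimates within 25% of actuals for 3 of 4 projects -> PRED(25) = 75%.
print(accuracy_stats([100, 200, 400, 800], [110, 170, 390, 1300]))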
5.3.1 Cross Validation
Validating an effort estimation model with the same data used to calibrate the model will
lead to biased results - typically, in favor of the model. Though good practice would split an
existing dataset of past projects into a training and validation set, this is impractical when the
dataset is small. Each subset would be too small for statistical significance. K-fold cross-
validation provides a method to validate an effort model built with a small dataset. This method
splits the dataset into k subsets, and the following is run k times: a calibration is performed with
k-1 subsets, and 1 subset validates the resulting model. K-fold cross-validation considers the
variation among data points and provides less biased predictive accuracy results. Leave-one-out
cross-validation is a specific variation of k-fold cross-validation, typically recommended when
building an effort estimation model as it is reproducible (Kocaguneli and Menzies 2013). A
single data point is used to validate the calibrated model using the remaining data points. This is
repeated until each data point has served as the validation set. The final prediction accuracy
statistics are the averages of the leave-one-out models’ statistics.
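As an illustration, the sketch below applies leave-one-out cross-validation to a toy log-log
regression between size and effort; scikit-learn's LeaveOneOut iterator is used for brevity,
and the data values are invented, so this stands in for (rather than reproduces) the
dissertation's calibration procedure.

# Leave-one-out cross-validation of a simple log-log size-effort regression.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut

def loocv_mmre(sizes, efforts):
    log_sizes = np.log(np.asarray(sizes, dtype=float)).reshape(-1, 1)
    log_efforts = np.log(np.asarray(efforts, dtype=float))
    mres = []
    for train_idx, test_idx in LeaveOneOut().split(log_sizes):
        model = LinearRegression().fit(log_sizes[train_idx], log_efforts[train_idx])
        predicted = np.exp(model.predict(log_sizes[test_idx]))[0]
        actual = np.exp(log_efforts[test_idx])[0]
        mres.append(abs(actual - predicted) / actual)
    return np.mean(mres)   # MMRE averaged over the held-out points

# Invented size/effort pairs, for illustration only.
print(loocv_mmre([10, 25, 40, 80, 120, 150], [300, 650, 900, 2000, 2800, 3500]))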
5.3.2 Residuals’ Properties
To ensure that the result of the regression represents reality, the residuals (the differences
between actuals and estimates) must have the following properties (a sketch of these checks
follows this list):
• The residuals are normally distributed, verified by the Shapiro-Wilk test. The null
hypothesis of the Shapiro-Wilk test is that the variable from which the sample was
extracted follows a normal distribution. Therefore, a p-value of greater than 0.05 is
considered statistically significant to not reject the null hypothesis.
• The residuals should be homoscedastic, meaning that the residuals have constant variance
across the independent variables. This property is verified by the Breusch-Pagan test, for
which the null hypothesis being tested is that the residuals are homoscedastic. Therefore,
a p-value of greater than 0.05 is considered statistically significant to not reject the null
hypothesis.
• Lastly, the residuals should be independent of each other, or are not autocorrelated. This
property is verified by the Durbin-Watson test. The null hypothesis of the test is that the
residuals are not autocorrelated. Therefore, if the p-value is greater than 0.05, one cannot
reject the null hypothesis.
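One way to run the three checks above is sketched below, assuming scipy and statsmodels are
available; the model fitted here is synthetic rather than the dissertation's calibrated model.
Note that statsmodels' Durbin-Watson implementation returns only the test statistic (a value
near 2 suggests no autocorrelation), not a p-value.

# Residual diagnostics: Shapiro-Wilk, Breusch-Pagan, and Durbin-Watson.
import numpy as np
import statsmodels.api as sm
from scipy import stats
from statsmodels.stats.diagnostic import het_breuschpagan
from statsmodels.stats.stattools import durbin_watson

def check_residuals(residuals, exog):
    shapiro_p = stats.shapiro(residuals)[1]       # normality (p > 0.05 -> cannot reject)
    bp_p = het_breuschpagan(residuals, exog)[1]   # homoscedasticity (LM test p-value)
    dw = durbin_watson(residuals)                 # statistic near 2 -> no autocorrelation
    return {"shapiro_p": shapiro_p, "breusch_pagan_p": bp_p, "durbin_watson": dw}

# Example with a toy OLS fit on synthetic data.
rng = np.random.default_rng(0)
x = rng.uniform(1, 10, 40)
y = 3 + 2 * x + rng.normal(scale=0.5, size=40)
exog = sm.add_constant(x)
fit = sm.OLS(y, exog).fit()
print(check_residuals(fit.resid, exog))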
5.4 Threats to Validity and Limitations
5.4.1 Internal Validity Concerns
IFPUG and COSMIC both have certification programs to ensure that practitioners learn
the counting rules and how to apply them to various types of software correctly. Though I am
not certified in IFPUG's or COSMIC's methods, I calculated the sizes for both UCC's and the
Industry's data points. Two collaborators who together have been certified in both methods
volunteered to review the calculations for UCC, to ensure the FSM rules were correctly applied. Due to
confidentiality clauses with the Industry dataset, I used analogies and examples to ensure the
reliability of the size metrics.
The authors of (B. Boehm et al. 2000) suggest having at least 10 data points to locally
calibrate the COCOMO® II model. Through empirical analysis, the authors of (Menzies et al.
2005) determined that the accuracy mean stabilizes between 5 and 10 data points. Recalibrating
COCOMO® II for FSMs requires the adjustment of 6 parameters/constants: productivity rate for
new development, adding features, and modifying features; product complexity parameter; and 2
constants for the exponent. Since the 2 datasets combined provide 50 data points, that is about 8
data points per variable which is within the 5-10 suggested range.
Many of the statistical tests for residuals’ properties require larger datasets for statistical
significance. Spearman’s rank-order correlation test does not assume the variables to have
normal distributions or for the relationship between them to be linear. However, for 95%
confidence interval with a coefficient determinant ranging from 0.55 to 0.85, Spearman’s test
requires 149 data points (Bonett and Wright 2000). Normality tests, including Shapiro-Wilk, are
not sensitive for small sample sizes (n = 30) leading to the results suggesting that the data points
can follow a normal distribution most times (Le Boedec 2016). No literature discussing the
minimum sample size for statistical significance could be found regarding the Breusch-Pagan
test for homoscedasticity of the residuals. The authors of (DeCarlo and Tryon 1993) found that
the power of the Durbin-Watson test is low for sample sizes of less than 50, and the author of
(Rayner 1994) discovered that though the performance of auto-correlation tests on small sample
sizes was weak (including Durbin-Watson), Durbin-Watson performed better than the others.
While the 2 datasets combined yield 50 data points, some of the tests expect a larger
sample size. Hence, the performance of the effort estimation models may not give the most
accurate results. In most cases, software development data cannot be increased due to sample
requirements, but instead depend on the amount of data available or provided. If sufficient
documents are not maintained and data is not collected regularly, it is very difficult or impossible
to get additional data. However, the calibration analysis is performed on data from 2 different
software development environments on different types of software, making the results more
generalizable than analyses completed on a single dataset.
5.4.2 External Validity Concerns
Due to the needs of this research, it was pertinent that I collect data from environments that
allowed me to get enough information about the projects to: calculate the size in FPs and CFPs,
get the COCOMO® II parameter ratings, and know the application domains and identify logical
groups of projects. I was able to get a dataset from a bank, but the size was provided only in FPs
and I was not given information to calculate CFPs. While I knew the data consisted of
enhancement projects, documentation did not exist to differentiate between new features and
modified features. From my analyses, I found determining the different types of enhancement
tasks led to significant improvement in the prediction accuracy. The International Software
Benchmarking Standards Group (ISBSG) is a not-for-profit organization that maintains a few
software development related data repositories for organizations to use for benchmarking and
data analyses. ISBSG’s Development and Enhancement Repository (as of February 27, 2018)
contains data from many organizations across 32 countries. Though the dataset consists of a large
number of data points, one cannot get the details of the development environments in which the
software was developed or the values of all effort drivers impacting the software development to
try to understand how data points differ from one another. Hence, this dataset also could not be
used to accurately calibrate COCOMO® II for FSMs.
Though UCC is being maintained by Masters-level students, the data points are not mere
student assignments. Each data point signifies a maintenance project completed by Master’s
students on a software product used by several types of users around the world (such as,
project managers, developers, researchers, etc.). Every year, improvements and features
completed by the students are released as updates to all users. Most students take this course
during their last semester, after which, many are hired to work at big software companies. While
there is concern whether student data can generalize to industry work, 3 studies found that
student data is similar to industry professionals when introduced to new practices or concepts
(Runeson 2003; Salman, Misirli, and Juristo 2015) or when students are in a project setting
(versus a classroom setting) (Berander 2004). In this study, I applied COCOMO® II’s effort
factors that represent the difference between students and industry professionals on UCC’s
dataset: very low levels of experience, lower levels of capability, and less complex software.
Normalizing effort with the effort factors made the student data points similar to industry’s data
points.
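For illustration, the normalization referred to above can be sketched as dividing actual effort by the product of the relevant effort multipliers. This is a minimal sketch, not the exact procedure or ratings used in this research; the multiplier values below are placeholders.

    # Minimal sketch of COCOMO-style effort normalization (illustrative values only).
    from math import prod

    def normalize_effort(actual_hours, effort_multipliers):
        """Divide out the effort multipliers so data points from different
        environments (e.g., student vs. industry teams) become comparable."""
        return actual_hours / prod(effort_multipliers)

    # Hypothetical multiplier ratings for a student team (placeholders).
    student_multipliers = [1.22, 1.12, 0.87]
    print(normalize_effort(742.0, student_multipliers))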
The original COCOMO® II model provides a reuse model for sizing changes to existing
software. While it accounts for the percentage change being made to the existing software, it also
has parameters that represent how familiar the programmers are with the existing code, and how
easily the existing code can be understood with the help of documentation. While the calibration
process will define the productivity rates for enhancements that add new features and modify
existing ones, the productivity rates may be different in other environments due to the effects of
familiarity with and understandability of the existing code. In this research, UCC’s dataset
consists of maintenance tasks. The developers are given training on UCC’s architecture and
design, requirements, and code during the first 3-4 weeks of a project. Development
teams are also required to document their code and decisions made throughout the project.
Hence, the familiarity and understanding parameters can be rated the same across all data points
in UCC’s dataset. More data with varying team structures and documentation availability would
be a good next step to define a generalizable way to estimate maintenance tasks with FSMs.
The datasets from Industry and UCC represent a small sector of industry and
application domains. Though the software domains and development environments covered by
the datasets in this analysis might not be representative of all domain areas, the 2 selected
datasets represent 2 different domain areas. Most of the differences can be explained by
COCOMO® II’s effort factors, leading to a generalizable effort estimation model. My plan is to
continue collecting data from various sources to further enhance the calibrated COCOMO® II
model for FSMs.
Chapter 6 Calibrated COCOMO® II Model
6.1 Calibration Step Results
6.1.1 Step 1 Results
In the first step of the calibration process, I compare the R² of various options for the
productivity rate constant A and the Product Complexity (CPLX) parameter to select the model
with the best performance. The variations are:
• Single Productivity Rate (Prod Rate)
• Productivity Rate for New Development/Enhancement (New/Enh)
• Productivity Rate for New Development/Add Features/Modify Features (New/Add/Mod)
• Single Productivity Factor & CPLX (Prod & CPLX)
• New Development/Enhancement & CPLX (New/Enh & CPLX)
• New Development/Add Features/Modify Features & CPLX (New/Add/Mod & CPLX)
Table 10 displays the R² for each of the variations for FPs and CFPs, demonstrating that
the last option has the highest R² in both cases.⁶ In other words, productivity differs between
developing a new project, enhancing a project by adding new features, and enhancing a project
by modifying existing features; and the Product Complexity (CPLX) parameter also requires
adjustment for FSMs. Table 11 contains the coefficients, standard error, t statistic, and p-value for
the New/Add/Mod & CPLX model with FPs and CFPs.
⁶ I performed Step 2 on the other options and display the resulting effort estimation accuracy results in Appendix D,
without cross validation. The analysis confirms that the selected model does indeed perform better than the other
options.
Table 10 Calibration Step 1: Compare R² of Calibration Options

Option                 FPs     CFPs
Prod Rate              0.404   0.631
New/Enh                0.655   0.748
New/Add/Mod            0.804   0.777
Prod & CPLX            0.921   0.954
New/Enh & CPLX         0.922   0.973
New/Add/Mod & CPLX     0.957   0.975
Table 11 Standard Error, t statistic, and p-values for Calibration Step 1 Selected Model’s coefficients
Variable Coefficient Standard Error t P > |t|
FPs
log(A) 1.642 0.087 18.768 < 0.0001
log(FP) 1.031 0.068 15.205 < 0.0001
log(New) 0.079 0.082 0.963 0.341
log(Add) 0.360 0.060 6.045 < 0.0001
log(Mod) 0.000 0.000
log(CPLX_weight) 0.12 0.034 3.567 0.001
CFPs
log(A) 1.883 0.057 32.761 < 0.0001
log(FP) 0.877 0.042 20.995 < 0.0001
log(New) 0.340 0.055 6.126 < 0.0001
log(Add) 0.095 0.046 2.082 0.043
log(Mod) 0.000 0.000
log(CPLX_weight) 0.137 0.025 5.429 < 0.0001
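The regression behind Table 11 can be illustrated with an ordinary least-squares fit on the log-transformed model. The sketch below is an approximation of that idea, assuming a pandas DataFrame with the named columns and 0/1 indicator variables for the project types (Modify is the baseline); it is not the exact script used for this research.

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm

    def fit_step1(df):
        """Fit log(effort) = log(A) + E*log(size) + c_new*New + c_add*Add + w*log(CPLX),
        where New and Add are 0/1 indicators and Modify is the baseline project type."""
        X = pd.DataFrame({
            "log_size": np.log(df["size"]),            # size in FPs or CFPs
            "new": df["is_new"].astype(float),         # 1 if new development, else 0
            "add": df["is_add"].astype(float),         # 1 if enhancement adding features, else 0
            "log_cplx": np.log(df["cplx_weight"]),
        })
        X = sm.add_constant(X)                         # the intercept corresponds to log(A)
        return sm.OLS(np.log(df["effort_hrs"]), X).fit()

    # result = fit_step1(dataset)   # 'dataset' is a hypothetical DataFrame with these columns
    # print(result.rsquared, result.params, result.pvalues)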
Table 12 Residual Properties' Statistics for Calibration Step 1 Selected Model

                  Shapiro-Wilk   Breusch-Pagan   Durbin-Watson
FPs   Statistics  0.964          0.888           1.659
      p-value     0.127          0.346           0.186
CFPs  Statistics  0.978          0.475           1.913
      p-value     0.458          0.491           0.706
Table 12 displays the residuals’ properties test results (Shapiro-Wilk for normality,
Breusch-Pagan for homoscedasticity, and Durbin-Watson for autocorrelation) for the selected
models using FPs and CFPs. The tests' p-values indicate that the null hypotheses cannot be
rejected. Therefore, the preferred residual properties hold for the calibration's first step.
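The three residual checks reported in Table 12 are standard tests; a minimal sketch of how they can be run on a fitted regression model (assuming scipy and statsmodels, not the exact scripts used here) is shown below. Note that statsmodels returns only the Durbin-Watson statistic, so p-values for that test would have to be obtained separately.

    from scipy.stats import shapiro
    from statsmodels.stats.diagnostic import het_breuschpagan
    from statsmodels.stats.stattools import durbin_watson

    def residual_diagnostics(fitted_ols):
        """Normality, homoscedasticity, and autocorrelation checks on OLS residuals."""
        resid = fitted_ols.resid
        sw_stat, sw_p = shapiro(resid)                                        # Shapiro-Wilk
        bp_stat, bp_p, _, _ = het_breuschpagan(resid, fitted_ols.model.exog)  # Breusch-Pagan
        dw_stat = durbin_watson(resid)                                        # Durbin-Watson
        return {"shapiro_wilk": (sw_stat, sw_p),
                "breusch_pagan": (bp_stat, bp_p),
                "durbin_watson": dw_stat}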
6.1.2 Step 2 Results
Step 2 of the calibration process determines the constants' values in the exponent of the
model. Table 13 displays the R² of the model and the coefficients, standard errors, and p-values
for the constants, while Table 14 displays the residuals' properties test results.
Table 13 Standard Error, t statistic, and p-values for Calibration Step 2 coefficients

        R²      Variable   Coefficient   Standard Error   t       P > |t|
FPs     0.026   B          0.833         0.183            4.543   < 0.0001
                C          0.011         0.010            1.127   0.265
CFPs    0.03    B          0.629         0.211            2.978   0.005
                C          0.014         0.012            1.222   0.228
Table 14 Residual Properties' Statistics for Calibration Step 2

                  Shapiro-Wilk   Breusch-Pagan   Durbin-Watson
FPs   Statistics  0.974          0.276           1.681
      p-value     0.325          0.599           0.226
CFPs  Statistics  0.890          0.093           2.116
      p-value     0.0002         0.761           0.729
6.1.3 Final Model Residuals
Using the results from the calibration steps, Figure 9 displays the distribution of residuals
(differences between actual and estimated effort) for both FPs (left) and CFPs (right).
Figure 9 Calibrated COCOMO® II Residuals for FPs (left) and CFPs (right)
Table 15 Residual Properties' Statistics for Calibrated COCOMO® II with FPs and CFPs

                  Shapiro-Wilk   Breusch-Pagan   Durbin-Watson
FPs   Statistics  0.885          26.805          2.209
      p-value     0.0002         < 0.0001        0.259
CFPs  Statistics  0.641          29.459          1.334
      p-value     < 0.0001       < 0.0001        0.994
Though the residuals for both calibration steps mostly held the expected properties, Table
15 shows that the residuals for the final model are not necessarily homoscedastic or normally
distributed. One reason is that as the size and required development effort of software grow, so
does the error. The residual analysis measures error in terms of hours: a 10-hour error is a 50%
error on a project requiring 20 hours, but only a 10% error on a project requiring 100 hours.
Hence, in terms of hours, the residuals grow as the project grows, which by definition makes the
residuals heteroscedastic (the error changes with respect to the independent variable).
If, instead, the Magnitude of Relative Error (MRE) were tested for normality,
homoscedasticity, and autocorrelation (see Table 16), then the p-values are high enough that the
null hypotheses cannot be rejected (except for the normality test for FPs). Figure 10 displays the
distribution of the error percentages (MRE) for the calibrated COCOMO® II models for FPs and
CFPs.
Table 16 Residuals Properties Statistics for Magnitude of Relative Error (MRE) of Calibrated
COCOMO® II with FPs and CFPs

                  Shapiro-Wilk   Breusch-Pagan   Durbin-Watson
FPs   Statistics  0.847          0.256           1.578
      p-value     < 0.0001       0.613           0.947
CFPs  Statistics  0.973          0.050           1.945
      p-value     0.312          0.823           0.604
Figure 10 Calibrated COCOMO® II Magnitude of Relative Errors (MREs) for FPs (left) and CFPs
(right)
6.2 Model Details
6.2.1 Original COCOMO® II Model
To compare with the changes made for FPs and CFPs, the constants and Product
Complexity (CPLX) values from the original COCOMO® II model are listed in this section. The
equation of the COCOMO® II model is:
Effort = A × Size^E × ∏_{i=1}^{17} EM_i,    where    E = B + C × Σ_{j=1}^{5} SF_j

Equation 17 Generalized COCOMO® II Effort Estimation Model
The values of constants A, B, and C are listed in Table 17 below. The COCOMO® II
model was calibrated for person-months as the measure of effort, but the data collected for this
research tracked effort in terms of hours. The authors of (B. Boehm et al. 2000) determined that a
person-month is equivalent to 152 person-hours. Hence, the original constant A is multiplied by
152 in Table 17.
Table 17 Original COCOMO® II (for SLOC) Constants’ Values
A B C
446.88 0.91 0.01
The rating values for the CPLX parameter are provided in Table 18 below:

Table 18 Original COCOMO® II (for SLOC) Product Complexity (CPLX) parameter values

Very Low   Low    Nominal   High   Very High   Extra High
0.73       0.87   1         1.17   1.34        1.74
The range of the exponent in Table 19 is described in terms of setting all the Scale
Factors (effort factors that can have an exponential effect on effort) to Very Low, Nominal, and
Extra High. The range displays the default exponent (when the Scale Factors are Nominal), the
lowest possible exponent (when the Scale Factors are Extra High), and the highest possible
exponent (when the Scale Factors are Very Low).
Table 19 Original COCOMO® II (for SLOC) Exponent range
Lowest Default Highest
0.91 1.0997 1.2262
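To make Equation 17 and the constants above concrete, the sketch below evaluates the original (SLOC-calibrated) model with A already converted to person-hours. The nominal scale-factor values are taken from the published COCOMO II.2000 calibration; the example project itself is hypothetical, and all effort multipliers other than CPLX are assumed nominal (1.0).

    from math import prod

    # Nominal scale-factor values (PREC, FLEX, RESL, TEAM, PMAT) from COCOMO II.2000;
    # their sum (18.97) reproduces the default exponent 0.91 + 0.01 * 18.97 = 1.0997 in Table 19.
    NOMINAL_SCALE_FACTORS = [3.72, 3.04, 4.24, 3.29, 4.68]

    def cocomo_ii_effort(size_ksloc, scale_factors, effort_multipliers,
                         A=446.88, B=0.91, C=0.01):
        """Effort (person-hours) = A * Size^E * product(EM_i), with E = B + C * sum(SF_j)."""
        E = B + C * sum(scale_factors)
        return A * size_ksloc ** E * prod(effort_multipliers)

    # Hypothetical project: 2.5 KSLOC, nominal scale factors, CPLX rated High (1.17).
    print(cocomo_ii_effort(2.5, NOMINAL_SCALE_FACTORS, [1.17]))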
6.2.2 IFPUG Function Points (FPs)
The constants A, B, and C’s values determined by the calibration process are below in
Table 20 when using IFPUG Function Points (FPs) as the size parameter in COCOMO® II.
Note that there are 3 values for constant A, depending on whether the project being estimated is a
new development, an enhancement adding new features, or an enhancement modifying existing
features.
Table 20 Calibrated COCOMO® II for FPs Constants' Values

A (New Development)    52.602
A (Add New Features)   100.51
A (Modify Features)    43.84
B                      0.833
C                      0.011
The rating values for the CPLX parameter are provided in Table 21 below. Notice that the
multiplier for Very Low complexity is lower than in the original (SLOC-based) COCOMO® II,
while the multiplier for Extra High complexity is much higher.
Table 21 Calibrated COCOMO® II for FPs Product Complexity (CPLX) parameter values

Very Low   Low    Nominal   High   Very High   Extra High
0.57       0.76   1         1.32   1.74        2.30
The range of the exponent is listed in Table 22. The exponent range is not very different
from the original COCOMO® II model. A software project's size in FPs may grow more slowly
than effort, because a transaction's size does not increase until its number of Data Element Types
(DETs) crosses a threshold. Hence, the rate of growth between effort and size is expected to be
generally nonlinear.
Table 22 Calibrated COCOMO® II for FPs Exponent range
Lowest Default Highest
0.833 1.0511 1.1963
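A minimal sketch of applying the FP-calibrated constants follows. The productivity rates and CPLX weights come from Table 20 and Table 21; the example project, the nominal scale-factor sum, and the omission of all other effort multipliers are my assumptions. Because B and C are rounded above, the computed default exponent (about 1.042) differs slightly from the 1.0511 listed in Table 22.

    A_FP = {"new": 52.602, "add": 100.51, "modify": 43.84}           # Table 20
    CPLX_FP = {"Very Low": 0.57, "Low": 0.76, "Nominal": 1.0,
               "High": 1.32, "Very High": 1.74, "Extra High": 2.30}  # Table 21

    def estimate_effort_fp(size_fp, project_type, cplx_rating,
                           scale_factor_sum=18.97, B=0.833, C=0.011):
        """Effort (hrs) = A * size^(B + C * sum(SF)) * CPLX weight;
        all other effort multipliers are assumed nominal (1.0)."""
        E = B + C * scale_factor_sum
        return A_FP[project_type] * size_fp ** E * CPLX_FP[cplx_rating]

    # Hypothetical: a 10-FP enhancement that adds new features, Nominal complexity.
    print(estimate_effort_fp(10, "add", "Nominal"))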
6.2.3 COSMIC Function Points (CFPs)
The constants A, B, and C’s values determined by the calibration process are below in
Table 23 when using COSMIC Function Points (CFPs) as the size parameter in COCOMO® II.
As with FPs, there are 3 values for constant A, depending on whether the project being estimated
is a new development, an enhancement adding new features, or an enhancement modifying
existing features.
Table 23 Calibrated COCOMO® II for CFPs Constants' Values

A (New Development)    166.94
A (Add New Features)   95.04
A (Modify Features)    76.32
B                      0.629
C                      0.014
The rating values for the CPLX parameter are provided in Table 24 below. As with FPs, the
multiplier for Very Low complexity is lower than in the original (SLOC-based) COCOMO® II,
while the multiplier for Extra High complexity is much higher.
Table 24 Calibrated COCOMO® II for CFPs Product Complexity (CPLX) parameter values

Very Low   Low    Nominal   High   Very High   Extra High
0.53       0.73   1         1.37   1.88        2.57
The range of the exponent is listed in Table 25. For CFPs, the exponent, and therefore
effort with respect to size, starts lower and grows at a slower rate than for SLOC and FPs. CFPs
provides more granularity in size compared to FPs, since each data group in a functional process
is counted. Therefore, size grows more quickly with respect to effort compared to FPs.
Table 25 Calibrated COCOMO® II for CFPs Exponent range
Lowest Default Highest
0.629 0.9015 1.0829
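The same sketch applies for CFPs by swapping in the constants from Table 23 and Table 24 (again, the example inputs and the nominal scale-factor sum are my assumptions, and the rounded constants give an exponent slightly below the default in Table 25):

    A_CFP = {"new": 166.94, "add": 95.04, "modify": 76.32}            # Table 23
    CPLX_CFP = {"Very Low": 0.53, "Low": 0.73, "Nominal": 1.0,
                "High": 1.37, "Very High": 1.88, "Extra High": 2.57}  # Table 24
    B_CFP, C_CFP = 0.629, 0.014

    # Hypothetical: a 10-CFP enhancement adding new features, Nominal complexity,
    # nominal scale factors (sum of about 18.97 in the original model).
    E = B_CFP + C_CFP * 18.97
    print(A_CFP["add"] * 10 ** E * CPLX_CFP["Nominal"])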
Chapter 7 Effort Estimation Effectiveness
7.1 Calibrated COCOMO® II vs Alternative Methods
The first research question compares the effort estimation accuracy of the calibrated
COCOMO® II model with the existing methods for using FSMs, to determine which is more
generalizable and accurate. Current research on FSMs suggests the following methods for effort
estimation:
• Run linear regression (the approach primarily used in empirical research)
• Run linear regression on the log transformation of the data (another approach used in
empirical research to account for the nonlinear relationship between size and effort)
• Convert FPs to SLOC with ratios published by Capers Jones and then use COCOMO® II
(how COCOMO® II currently supports FPs)
• Locally calibrate the FSM-to-SLOC ratio and use COCOMO® II (suggested for better
accuracy; though not suggested in the literature, CFPs are included here as well)
• Locally calibrate the CFP-to-FP ratio and use the FP linear and nonlinear models (an
approach suggested in empirical research)
The null hypothesis is: Calibrating COCOMO® II for functional size metrics does not
perform better in terms of MMRE and PRED(25) than the options currently available in
research/empirical studies (listed above).
Table 26 and Table 27 display the prediction accuracy statistics of the various methods for
FPs and CFPs, respectively. (Note: Details of the linear and nonlinear equations, and conversion
ratios used for FPs and CFPs are provided in Appendix E.) The calibrated COCOMO® II model
(explained in Chapter 6 Calibrated COCOMO® II Model) has the highest PRED(25) and lowest
MMRE results, showing that the calibrated COCOMO® II model does perform better than the
other options currently suggested in research. Therefore, the null hypothesis stated above is
rejected.
Table 26 Prediction Accuracy of Methods suggested by research and Calibrated COCOMO® II for FPs

Statistic    Linear    Nonlinear   Jones Conversion   Local Conversion   Calibrated
                                   to SLOC            to SLOC            COCOMO® II
MAR (hrs)    436.23    457.88      421.65             635.38             214.51
MdAR (hrs)   198.54    190.75      143.86             242.94             89.401
MMRE         89.87%    72.86%      47.92%             72.07%             31.14%
MdMRE        57.14%    43.6%       49%                71.32%             19.46%
PRED(25)     20%       36%         14%                2%                 68%
Table 27 Prediction Accuracy of Methods suggested by research and Calibrated COCOMO® II for CFPs

Statistic    Linear    Nonlinear   Local Conversion   Conversion to      Conversion to      Calibrated
                                   to SLOC            FPs - Linear       FPs - Nonlinear    COCOMO® II
MAR          480.8     421.47      536.74             470.21             443.69             147.53
MdAR         237.51    131.08      276.48             220.02             171.45             68.57
MMRE         96.2%     56.02%      80.05%             89.38%             70.39%             20.94%
MdMRE        67.53%    39.49%      85.28%             59.1%              51.41%             18.22%
PRED(25)     18%       38%         4%                 22%                20%                70%
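For reference, the accuracy statistics reported in Table 26 and Table 27 can be computed as in the sketch below, where MRE is taken as the absolute relative error |actual - estimated| / actual (a sketch, not the evaluation scripts used for this research):

    import numpy as np

    def accuracy_stats(actual_hours, estimated_hours, pred_level=0.25):
        """MAR, MdAR, MMRE, MdMRE, and PRED(25) for a set of effort estimates."""
        actual = np.asarray(actual_hours, dtype=float)
        estimated = np.asarray(estimated_hours, dtype=float)
        ar = np.abs(actual - estimated)      # absolute residuals (hrs)
        mre = ar / actual                    # magnitude of relative error
        return {"MAR": ar.mean(),
                "MdAR": np.median(ar),
                "MMRE": mre.mean(),
                "MdMRE": np.median(mre),
                "PRED(25)": np.mean(mre <= pred_level)}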
Though the data used in this research comes from 2 different organizations and
environments, the calibrated COCOMO® II model performs better than the other options for 2
reasons:
1. The calibrated COCOMO® II model accounts for most of the causal factors of effort
(through the effort drivers), making it a more generalizable and accurate model.
2. Additional layers of error are not added by converting one size metric to another. FPs and
CFPs are used directly in the COCOMO® II model (with some adjustments - see Chapter
6 Calibrated COCOMO® II Model).
7.2 Calibrated COCOMO® II for Software Types
The second research question addressed is: do functional size metrics, along with the
calibrated COCOMO® II model, perform better for some types of projects compared to others?
The null hypothesis is: MMRE ≤ 25%, PRED(25) ≥ 75%, and Spearman’s Correlation
Coefficient (rho) is ≥ 0.8 for all types of projects.
The prediction accuracy statistics of the calibrated COCOMO® II model and the
correlation between functional size and normalized effort (see Section 5.2.1 Effort
Normalization) are evaluated for the 5 types of projects identified in the UCC and Industry
datasets:
1. Low Parsing - UCC
UCC’s Low Parsing projects add new code-based metrics to the existing code base, such
as cyclomatic complexity. These projects add some logic to the existing language parsers, which
could require up to 3 additional algorithms. Figure 11 displays the size in terms of
FPs and CFPs against normalized effort. While the correlation between size and effort is low, the
prediction accuracy is within the acceptable range (PRED(25) ≥75% and MMRE ≤ 25% - see
Table 28).
One of the major reasons the correlation between FSMs and effort is low for this group
of projects is that the projects are functionally similar to one another. From a user's perspective,
the projects look similar in terms of functionality and features. However, the number of control
and computational operations depends on the specific feature being implemented and how it
relates to the existing code.
Figure 11 Normalized Effort vs FPs (Left) and CFPs (Right) for UCC’s Low Parsing Projects
Table 28 Prediction Accuracy and Correlation Statistics for UCC's Low Parsing Projects
Statistic FPs CFPs
MAR 49.85 hrs 50.38 hrs
MdAR 28.65 hrs 42.13 hrs
MMRE 16.21% 19.27%
MdMRE 14.83% 18.54%
PRED (25) 80% 75%
rho (Correlation Coefficient) 0.378 0.328
Correlation p-value 0.101 0.158
UCC is a code metrics tool. Most of the projects in this group consist of adding
Cyclomatic Complexity metrics for the languages supported by UCC at the time. Since UCC’s
architecture is highly modularized and uses Object-Oriented concepts, some Cyclomatic
Complexity implementations may require more algorithms than others. If the feature is being
implemented for a language that is similar to the parent object, then the code for the output may
not need to be re-implemented. Hence, FSMs are not well-suited for such situations.
CFPs show a little more granularity and more variation among the data points compared to
FPs. For FPs, the number of Data Element Types (DETs) output by UCC places these
transactions in the Low complexity group. The one project of size 12 FPs exists because 3 very
similar projects (Midas, XMidas, and NeXtMidas) were grouped and assigned to one team; since
the 3 languages produce individual output reports, and each output is rated Low complexity, the
total is 3 * 4 = 12. With CFPs, each data group can be counted individually, allowing for some
variation in the sizes across the projects.
2. High Parsing - UCC
UCC’s High Parsing projects parse through input files 3 times in order to return the
required output. These tasks require 4-5 algorithms with nested logical operations. While the
types of operations may be similar to UCC’s Low Parsing projects (with some increase in
complexity level), the major difference is in the number of complex operations required.
These projects are either size 4 or 5 in terms of FPs, but span a range of sizes with respect to
CFPs (see Figure 12). Given the difference in granularity levels, the correlation and prediction
accuracy with FPs are lower than with CFPs (see Table 29).
Table 29 Prediction Accuracy and Correlation Statistics for UCC's High Parsing Projects
Statistic FPs CFPs
MAR 274.03 hrs 165.5 hrs
MdAR 235.69 hrs 162.5 hrs
MMRE 33.51% 21.76%
MdMRE 22.75% 17.56%
PRED (25) 66.67% 77.78%
rho (Correlation Coefficient) 0.693 0.882
Correlation p-value 0.05 0.005
Figure 12 Normalized Effort vs FPs (Left) and CFPs (Right) for UCC’s High Parsing Projects
The reason a few of the data points are sized as 5 FPs instead of 4 is that the
number of required DETs increased due to adding Cyclomatic Complexity outputs along with the
usual parsing outputs (which are rated Low). The projects sized 5 FPs were added to UCC
after Cyclomatic Complexity was added to the existing language parsers; therefore, new
language parsers need to include Cyclomatic Complexity in their requirements, rating the outputs
as Average complexity. With CFPs, there are more size options because each data group gets its
own count, as long as the groups of data hold consistently throughout the system. The SLOC
counting metrics that are output depend on whether they are applicable to the language being
parsed: logical, physical, compiler directives, executable instructions, and data declarations. Since
these outputs depend on the language being parsed, they can be considered individual logical
groups, causing the variation in sizes with respect to CFPs.
3. Data Transfers - Industry
This group consists of projects from Industry’s dataset that transfer data between different
hardware instruments. These tasks also require some computational operations, which
differentiates these tasks from the Inputs and Outputs group (see group #5 below). While the
correlation between size and normalized effort is the same for FPs and CFPs, the prediction
accuracy is higher for CFPs (see Table 30). Figure 13 displays that though there are strong
positive trends between size and normalized effort for both FPs and CFPs, the amount of
variation among the data points is less for CFPs - leading to the higher prediction accuracy.
Figure 13 Normalized Effort vs FPs (Left) and CFPs (Right) for Industry’s Data Transfer Projects
Table 30 Prediction Accuracy and Correlation Statistics for Industry’s Data Transfer Projects
Statistic FPs CFPs
MAR 418.96 hrs 387.65 hrs
MdAR 166.9 hrs 123.86 hrs
MMRE 31.85% 19.36%
MdMRE 28.26% 17.39%
PRED (25) 50% 80%
rho (Correlation Coefficient) 0.973 0.973
Correlation p-value < 0.0001 < 0.0001
Though the correlation between FPs and normalized effort was high for Data Transfer
projects, the effort estimation accuracy is weak. While the trend among the data points is
generally moving upward, effort plateaus between 50 and 150 FPs, with effort growing before
and after this range. This behavior causes the reduced prediction accuracy.
4. Record, Encrypt, Decrypt - Industry
Some components in Industry’s projects were responsible for recording, encrypting, and
decrypting data. The encryption and decryption algorithms increase the complexity of the
projects compared to the other existing groups. While the correlation coefficient between size
and normalized effort is strong for both FPs and CFPs (0.866), the prediction accuracy is higher
for FPs. See Figure 14 and Table 31 for the visual and statistical results.
One reason the prediction accuracy is lower for CFPs could be the larger difference in
size between the 2 groups of projects. The size difference between the smaller project and the 2
larger ones is almost a factor of 3 (12 versus 35 CFPs). Even though other projects do not appear
in Figure 14, the trend between 0 and 35 CFPs suggests that other projects of around 12 CFPs
would require much less effort than this data point. In this case, the increased granularity CFPs provides
compared to FPs did not lead to better effort estimation. On the other hand, the size difference
between the smaller and 2 larger data points is only 10% in terms of FPs. Only 3 data points
from Industry’s dataset fall into this category, limiting the ability to generally determine how
effort grows with respect to size for this type of project.
Figure 14 Normalized Effort vs FPs (Left) and CFPs (Right) for Industry’s Record, Encrypt, Decrypt
Data Projects
Table 31 Prediction Accuracy and Correlation Statistics for Industry’s Record, Encrypt, Decrypt Data
Projects
Statistic FPs CFPs
MAR 316.16 hrs 107.73 hrs
MdAR 279.84 hrs 93.96 hrs
MMRE 21.53% 15.45%
MdMRE 22.56% 5.99%
PRED (25) 100% 66.67%
rho (Correlation Coefficient) 0.866 0.866
Correlation p-value 1 1
5. Inputs and Outputs - Industry, UCC
This group consists of data points from both the Industry and UCC datasets. The tasks in
this group consist of input and output processes without any computational operations. While the
correlation coefficient between CFPs and normalized effort is strong (0.868), the prediction
accuracy statistics do not reach the acceptable threshold (PRED(25) is < 75%). The correlation
and prediction accuracy results are weak for FPs. See Figure 15 and Table 32 for the trends and
statistics.
Table 32 Prediction Accuracy and Correlation Statistics for Input and Output Projects
Statistic FPs CFPs
MAR 265.48 hrs 84.94 hrs
MdAR 44.58 hrs 63.09 hrs
MMRE 68.5% 28.22%
MdMRE 46% 26.93%
PRED (25) 50% 37.5%
rho (Correlation Coefficient) 0.683 0.868
Correlation p-value 0.083 0.011
Figure 15 Normalized Effort vs FPs (Left) and CFPs (Right) for Input and Output Projects
Before cross validation, the calibrated COCOMO® II model for CFPs returns PRED(25)
of 62.5%, which corresponds to the high correlation. Since the 3 groups of 2 data points are
distant from one another, cross validation decreases prediction performance tremendously. Since
FSMs account for data being moved through transactions and not the associated algorithms, one
would expect FSMs to perform well for input and output processes that simply move data across
the software boundary. With more data points across various application domains, I can further
analyze why the prediction accuracy is lower for this group.
Summary of Results
The hypothesis for this research question is: MMRE ≤ 25%, PRED(25) ≥ 75%, and
Spearman’s Correlation Coefficient (rho) is ≥ 0.8 for all types of projects. In other words, it is
assumed that FSMs has high correlation with effort and can provide high prediction accuracy for
all types of projects. However, the results explained above, and summarized in Table 33 below,
display that neither FPs nor CFPs performs equally well across the varying project types. Green
checkmarks in Table 33 mean that the MMRE, PRED(25), and Spearman’s Correlation
Coefficient (rho) were within the stated ranges, and red X’s mean at least 1 of the statistics is not
within the stated range.
Table 33 Summary of Research Question 2 Results

                                      FPs   CFPs
Low Parse – UCC                       ✘     ✘
High Parse – UCC                      ✘     ✓
Data Transfer – Industry              ✘     ✓
Record, Encrypt, Decrypt – Industry   ✓     ✘
Inputs/Outputs – UCC, Industry        ✘     ✘
In some cases, either the correlation between size and effort was high or the prediction
accuracy was high, but not both; this is why there are so few green checkmarks in Table 33. The results suggest
that the correlation between FSMs and normalized effort does vary based on the amount of
complexity operations required with respect to the functional processes that are counted for size.
The correlation between FPs and normalized effort is low for the groups with Very Low, Low,
and Nominal ratings for Product Complexity (CPLX) (Low Parsing, High Parsing, and
Inputs/Outputs). The correlation between CFPs and normalized effort is low only for the Low
Parsing projects in UCC’s dataset. Prediction accuracy, however, depends on the variation of
effort compared to size among the groups and across all the projects. This analysis, however,
helps cost estimators and managers determine how accurate the effort estimate is likely to be
based on the type of project, and to recognize the risk associated with the estimate as well as the
reason for that risk. Additionally, cost estimators and managers can adjust the estimate by
evaluating how effort behaves compared to size for each project type.
Chapter 8 Conclusions
8.1 General Conclusions
This research develops a generalizable effort estimation model for functional size metrics
(FSMs), which was done by calibrating the COCOMO® II model for IFPUG Function Points
(FPs) and COSMIC Function Points (CFPs). The first research question demonstrated that the
calibrated COCOMO® II model performs better than the other currently available methods to
use FSMs (see Section 7.1 Calibrated COCOMO® II vs Alternative Methods). The data used in
this research come from 2 different organizations and environments, and for the combined
dataset, the calibrated COCOMO® II model performs better than the other options for 2 reasons:
1. The calibrated COCOMO® II model accounts for most of the causal factors of effort
(through the effort drivers), making it a more generalizable and accurate model.
2. Additional layers of error are not added by converting one size metric to another. FPs and
CFPs are used directly in the COCOMO® II model (with some adjustments - see Chapter
6 Calibrated COCOMO® II Model).
The second research question helps cost estimators, managers, and teams determine how
accurately the calibrated COCOMO® II model will predict effort for specific types of projects.
This allows estimators, managers, and teams to adjust the effort estimate by observing the
relationship between size and effort for the project type or be aware of and account for the risk
associated with the estimate. This type of analysis differs from all research performed on FSMs.
To date, research has found that FSMs work well for specific application domains but has not
evaluated whether FSMs work well within sub-groups of these application domains or more
general software attributes/types. Five project groups were identified from the data gathered for
this research: low parsing, high parsing, data transfers, record/encrypt/decrypt, and
inputs/outputs. These projects varied from one another based on the complexity and number of
operations and algorithms required. The analysis in Section 7.2 Calibrated COCOMO® II for
Software Types found that FSMs do not perform similarly well on all of these project types.
While FPs had high correlation and led to accurate effort estimates for record/encrypt/decrypt,
CFPs did for high parsing and data transfer projects.
8.2 IFPUG versus COSMIC Function Points
The results of Research Questions 1 and 2 can also be a platform to discuss whether cost
estimators, managers, and development teams should use IFPUG Function Points (FPs) or
COSMIC Function Points (CFPs) for effort estimates. The prediction accuracy rates for the
calibrated COCOMO® II model across the combined UCC’s and Industry’s datasets were higher
for CFPs than FPs (see Table 26 and Table 27). Looking at the correlations and prediction
accuracy rates among the project types, addressed in Research Question 2 (see Table 33), CFPs
generally has more instances of better performance compared to FPs. However, there are cases
in which FPs performs better than CFPs. In this section, I summarize the scenarios or attributes for
which FPs performs better than CFPs and vice versa.
Table 34 Calibrated COCOMO® II performance across FPs and CFPs for UCC's and Industry's
Datasets Individually

                 UCC                  Industry
Statistic        FPs       CFPs       FPs       CFPs
MAR (hrs)        170.39    86.97      292.93    255.17
MdAR (hrs)       58.83     59.80      292.93    58.83
MMRE             33.35%    20.15%     27.20%    22.35%
MdMRE            16.43%    17.81%     21.93%    19.80%
PRED(25)         68.75%    75%        66.67%    61.11%
As mentioned before, CFPs performed better than FPs when evaluating the prediction
accuracy across the combination of UCC’s and Industry’s data. Table 34 above shows the
prediction accuracy of the calibrated COCOMO® II model for UCC’s and Industry’s datasets
individually. The results indicate that while the increased granularity of CFPs helps improve
estimates overall, they are particularly effective on UCC’s data. In UCC’s data, small changes in
the number and types of data being transferred in transactions can lead to more significant
changes in effort. Additionally, UCC’s dataset consists of maintenance tasks. Since maintenance
tasks may add a smaller amount of change to existing functional processes, CFPs seems to
provide better granularity for effort estimates. On the other hand, FPs performs slightly better
than CFPs on the Industry’s dataset overall – projects primarily concerned with data storage and
movement.
When evaluating the performance of FPs and CFPs for specific project types in Research
Question 2, the acceptable thresholds were set to ideal settings for both correlation and
prediction accuracy. However, in some cases, the correlation between normalized effort and size
was strong, but the prediction accuracy statistics were weak and vice versa. A comparison of the
correlations across the types of projects is below in Table 35 and a comparison of the prediction
accuracy is below in Table 36. For Table 35, green checkmarks are given if the correlation
coefficient is ≥ 0.8 and red X’s otherwise. In Table 36, green checkmarks indicate that both
MMRE ≤ 25% and PRED(25) ≥ 75%.
Table 35 and Table 36 evaluate the performance of FPs and CFPs with more relaxed
settings/threshold expectations compared to Table 33 in the previous section. This allows cost
estimators to determine which size metric to use if either high correlation or high prediction
accuracy is sufficient. High correlation can indicate the trend towards the estimated project
effort, even if the prediction accuracy is low. In other words, if either correlation or prediction
accuracy is low, cost estimators can still gauge and adjust for the risk associated with the estimate
by looking at the trend for whichever measure performs better.
Table 35 Comparison of Correlation across Project Types

                                      FPs   CFPs
Low Parse – UCC                       ✘     ✘
High Parse – UCC                      ✘     ✓
Data Transfer – Industry              ✓     ✓
Record, Encrypt, Decrypt – Industry   ✓     ✓
Inputs/Outputs – UCC, Industry        ✘     ✓
Table 36 Comparison of Prediction Accuracy across Project Types

                                      FPs   CFPs
Low Parse – UCC                       ✓     ✓
High Parse – UCC                      ✘     ✓
Data Transfer – Industry              ✘     ✓
Record, Encrypt, Decrypt – Industry   ✓     ✘
Inputs/Outputs – UCC, Industry        ✘     ✘
In summary, the above analyses provide the following insights:
• Across multiple datasets with mixed project types, CFPs leads to better effort
estimates compared to FPs.
• CFPs are better able to size and provide accurate effort estimates for maintenance
tasks and other environments where smaller changes in the number of data groups
transferred can affect effort significantly (example: UCC’s dataset).
• FPs performs slightly better than CFPs (PRED(25) is 66.67% versus 61%) for new
development projects related to recording and transferring several data element types
(DETs).
• FSMs, generally, do not perform well for Object-Oriented Design. The reason is that
some functionality might not need to be re-implemented for particular inherited
classes/objects unless the functions in the parent class/object needs to be further
customized. The sizes do vary more in terms of CFPs compared to FPs since CFPs
describe size at a lower granularity compared to FPs.
• While correlations with effort are similar for both FPs and CFPs, CFPs leads to better
effort estimates for: UCC’s high parsing projects, data transfers, and inputs/outputs.
All of these groups transfer data, requiring different levels of complexity operations.
The lower granularity of size provided by CFPs allows for improved estimates.
• Finally, the granularity of CFPs does not help for estimating high complexity projects,
such as those that record, encrypt, and decrypt data in Industry’s dataset. FPs had a
more consistent size-effort relationship for these data points, leading to better effort
estimates.
These observations provide cost estimators, managers, and development teams with more
information that can help them make better effort estimates. While cost models must provide a
solution that best fits all data points for generalizable results, the analysis by project type allows
evaluators to determine how projects with similar attributes tend to perform in comparison to
other projects, allowing the effort estimate to be adjusted as necessary.
8.3 Summary of Contributions
In order to calibrate the COCOMO® II model for FSMs, 3 changes were required:
1. The constant representing productivity needed to be defined separately for new
development projects, enhancements adding new features, and enhancements modifying
existing features.
2. The effect of Product Complexity (CPLX) on effort is significantly stronger for FSMs
compared to SLOC (source lines of code). The reason for this is that the existence or lack
of complex operations does not affect the size in FSMs but does have at least a small
effect on the size in SLOC.
3. While the rate at which effort grows with respect to FPs is similar to SLOC, effort grows
at a slower rate for CFPs. The reason is due to granularity level differences among the
size metrics. Since SLOC represents size at a very low level of granularity, large changes
in size requires proportionally more effort to develop. Size in FPs grows more slowly,
due to the complexity groups within each transaction. However, effort grows more
steeply with respect to size in FPs. On the other hand, size grows more quickly with
respect to CFPs, as each data group is being counted for all functional transactions.
Therefore, effort grows more gradually as size grows.
The calibrated COCOMO® II model provides a generalizable effort estimation model
that allows FSMs as the size parameter. FSMs are easier to calculate and estimate earlier in the
software development lifecycle compared to SLOC, as they depend on the understanding of the
functional processes required and the data being transferred. Though many empirical studies have
analyzed the effectiveness of FSMs for effort estimation, a generalizable model did not exist to
date. The calibrated COCOMO® II model performs better than the methods available prior to
this research, allowing cost estimators, managers, and teams to make more accurate and
informed decisions earlier in the lifecycle. The analysis on project type (research question 2)
further allows cost estimators, managers, and teams to be aware of how accurate the estimates
might be for specific components or projects. Section 8.2 IFPUG versus COSMIC Function
Points discusses scenarios/attributes for which FPs performed better than CFPs and vice versa.
Having better estimates allows organizations to make realistic bids on projects, manage resources
for better quality products, track progress more accurately, perform trade-off analyses, and
increase morale. Since the calibrated COCOMO® II model details are provided in this
dissertation, cost estimators and managers can also locally calibrate the model to fit the
productivity of their environment, further improving the estimation accuracy.
8.4 Future Work
This dissertation takes the first step towards a generalizable effort estimation model for
FSMs, without requiring conversions between 2 size metrics. The calibrated COCOMO® II
model could be further improved in the following ways (requiring more data from various
development environments):
• COCOMO® II provides a reuse model to make reuse project size equivalent to new
development size in SLOC. The reuse model accounts for how easily the existing code
can be understood and how familiar the programmers are with the existing code, as well
as the amount of change being made to determine the equivalent size (B. Boehm et al.
2000). Developing a similar reuse model for FSMs may lead to more generalizable and
accurate estimates for maintenance tasks, as understanding and familiarity with the
existing code will have a big impact on the productivity.
• Recall that the developers of the COCOMO® II model identified 5 types of operations
that account for product complexity (CPLX): control, computational, device-dependent,
data management, and user interface operations. The average of the ratings for these
operations is used to determine the overall product complexity (CPLX) level. However,
it is possible that each of these complexity types has a different effect on effort. For
instance, computational operations may require more research and effort not necessarily
accounted for in the size of the software (in terms of SLOC or FSMs) compared to the
other types of operations. I would like to analyze whether separating out these 5 types of
complexity operations with their individual parameter values would lead to better effort
estimates. I ran preliminary analyses with Industry’s and UCC’s datasets (individually
and combined), which led to some unintuitive/incorrect results for some of the
complexity operations.
• In this dissertation, FSMs did not perform well on projects requiring few or no algorithms
or computational operations. The reason for this could be that size does not change for
these projects (or not significantly), though effort does. To improve the use of FSMs for
these projects, one may want to complement the size with a metric that accounts for
changes in algorithms and operations. IFPUG developed the Software Non-functional
Assessment Process (SNAP) to complement FPs for a more complete representation of
size and therefore, better effort estimates. Preliminary analyses found that using SNAP
with both FPs and CFPs did improve effort estimates. However, I would need to
determine how to use both FPs/CFPs and SNAP with the COCOMO® II model.
One of the biggest challenges in developing generalizable effort estimation models is finding
data sources that collect, or allow a researcher to collect, effort, size, and the corresponding
effort factors. Most datasets provide effort and size, but little to no information regarding the
personnel, product, and environmental effort factors. Of course, the more information that is
required, the more time and effort it takes to collect the data, which is one of the biggest hurdles
to getting data from industry. However, this dissertation shows that collecting effort factors and
details on the projects and tasks, along with size and effort, leads to generalizable and
well-performing effort estimation models.
References
(COSMIC), Common Software Measurement International Consortium. 2014. The COSMIC Functional
Size Measurement Method-Version 4.0 Measurement Manual. The COSMIC Implementation Guide
for ISO/IEC. Vol. 19761.
(IFPUG), International Function Point Users Group. 2009. Sizing Component-Based Development Using
Function Points, Version 1.0.
———. 2010. Function Point Counting Practices Manual, Release 4.3.1.
Abran, Alain, Serge Oligny, and Charles Symons. 2000. “COSMIC FFP and the World-Wide Field Trials
Strategy.” New Approaches in Software Measurement, October, 125–34.
Abran, Alain, and Pierre N Robillard. 1996. “Function Points Analysis: An Empirical Study of Its
Measurement Processes.” IEEE Transactions on Software Engineering 22 (12): 895–910.
Abran, Alain, Ilionar Silva, and Laura Primera. 2002. “Field Studies Using Functional Size Measurement
in Building Estimation Models for Software Maintenance.” Journal of Software Maintenance and
Evolution: Research and Practice 14 (1): 31–64.
Abran, Alain, C Symons, and S Oligny. 2001. “An Overview of COSMIC-FFP Field Trial Results.” In
12th European Software Control and Metrics Conference--ESCOM, 2–4.
Abualkishik, Abedallah Zaid, Jean-Marc Desharnais, Adel Khelifi, Abdul Azim Abd Ghani, Rodziah
Atan, and Mohd Hasan Selamat. 2012. “An Exploratory Study on the Accuracy of FPA to COSMIC
Measurement Method Conversion Types.” Information and Software Technology 54 (11): 1250–64.
Ahn, Yunsik, Jungseok Suh, Seungryeol Kim, and Hyunsoo Kim. 2003. “The Software Maintenance
Project Effort Estimation Model Based on Function Points.” Journal of Software: Evolution and
Process 15 (2): 71–85.
Albrecht, Allan J, and John E Gaffney. 1983. “Software Function, Source Lines of Code, and
Development Effort Prediction: A Software Science Validation.” IEEE Transactions on Software
Engineering, no. 6: 639–48.
Berander, Patrik. 2004. “Using Students as Subjects in Requirements Prioritization.” In Proceedings.
2004 International Symposium on Empirical Software Engineering, 2004. ISESE’04., 167–76.
Boedec, Kevin Le. 2016. “Sensitivity and Specificity of Normality Tests and Consequences on Reference
Interval Accuracy at Small Sample Size: A Computer-Simulation Study.” Veterinary Clinical
Pathology 45 (4): 648–56.
Boehm, Barry, Chris Abts, A. Winsor Brown, Sunita Chulani, Bradford K. Clark, Ellis Horowitz, Ray
Madachy, Donald J. Reifer, and Bert Steece. 2000. Software Cost Estimation with COCOMO II.
Prentice Hall PTR.
Boehm, Barry W. 1981. Software Engineering Economics. Vol. 197. Prentice-hall Englewood Cliffs (NJ).
Bonett, Douglas G, and Thomas A Wright. 2000. “Sample Size Requirements for Estimating Pearson,
Kendall and Spearman Correlations.” Psychometrika 65 (1): 23–28.
Caldiera, Gianluigi, Giuliano Antoniol, Roberto Fiutem, and C Lokan. 1998. “Definition and
Experimental Evaluation of Function Points for Object-Oriented Systems.” In Software Metrics
Symposium, 1998. Metrics 1998. Proceedings. Fifth International, 167–78.
Chemuturi, Murali. 2009. Software Estimation Best Practices, Tools & Techniques: A Complete Guide
for Software Project Estimators. J. Ross Publishing.
Cockburn, Alistair. 2000. Writing Effective Use Cases. Addison-Wesley Professional.
Cohn, Mike. 2004. User Stories Applied: For Agile Software Development. Addison-Wesley
Professional.
Conte, Samuel Daniel, Hubert E Dunsmore, Vincent Y Shen, and Y E Shen. 1986. Software Engineering
Metrics and Models. Benjamin-Cummings Publishing Co., Inc.
Costagliola, Gennaro, Filomena Ferrucci, Carmine Gravino, Genoveffa Tortora, and Giuliana Vitiello.
2004. “A COSMIC-FFP Based Method to Estimate Web Application Development Effort.” In
International Conference on Web Engineering, 161–65.
Cuadrado-Laborde, C, A Díez, J L Cruz, and M V Andrés. 2010. “Experimental Study of an All-Fiber
Laser Actively Mode-Locked by Standing-Wave Acousto-Optic Modulation.” Applied Physics B 99
(1–2): 95–99.
DeCarlo, Lawrence T, and Warren W Tryon. 1993. “Estimating and Testing Autocorrelation with Small
Samples: A Comparison of the C-Statistic to a Modified Estimator.” Behaviour Research and
Therapy 31 (8): 781–88.
Dekkers, Carol, and Ian Gunter. 2000. “Using Backfiring to Accurately Size Software: More Wishful
Thinking Than Science?” IT Metrics Strategies 6 (11): 1–8.
Desharnais, Jean-Marc, and Alain Abran. 2003. “Approximation Techniques for Measuring Function
Points.” In 13th International Workshop on Software Measurement--IWSM, 23–25.
Desharnais, Jean-Marc, Alain Abran, and Juan Cuadrado. 2006. “Convertibility of Function Points to
COSMIC-FFP: Identification and Analysis of Functional Outliers.” ENSUR A, 190.
Eick, Stephen G, Todd L Graves, Alan F Karr, J Steve Marron, and Audris Mockus. 2001. “Does Code
Decay? Assessing the Evidence from Change Management Data.” IEEE Transactions on Software
Engineering 27 (1): 1–12.
Fenton, Norman, and James Bieman. 2014. Software Metrics: A Rigorous and Practical Approach. CRC
Press.
Ferens, Daniel V. 1988. “Software Size Estimation Techniques.” In Aerospace and Electronics
Conference, 1988. NAECON 1988., Proceedings of the IEEE 1988 National, 701–5.
Ferrucci, Filomena, Carmine Gravino, and Federica Sarro. 2014. “Conversion from IFPUG FPA to
COSMIC: Within-vs without-Company Equations.” In Software Engineering and Advanced
Applications (SEAA), 2014 40th EUROMICRO Conference On, 293–300.
Galorath Incorporated. n.d. “Function Based Sizing in SEER for Software (SEER-SEM).”
https://www.yumpu.com/en/document/view/32023312/function-based-sizing-in-seer-sem-galorath.
Gencel, Cigdem, and Carl Bideau. 2012. “Exploring the Convertibility between IFPUG and COSMIC
Function Points: Preliminary Findings.” In Software Measurement and the 2012 Seventh
International Conference on Software Process and Product Measurement (IWSM-MENSURA), 2012
Joint Conference of the 22nd International Workshop On, 170–77.
Gencel, Cigdem, and Onur Demirors. 2008. “Functional Size Measurement Revisited.” ACM
Transactions on Software Engineering and Methodology (TOSEM) 17 (3): 15.
Gencel, Cigdem, Rogardt Heldal, and Kenneth Lind. 2009. “On the Relationship between Different Size
Measures in the Software Life Cycle.” In 2009 16th Asia-Pacific Software Engineering Conference,
19–26.
Hastings, T E, and A S M Sajeev. 2001. “A Vector-Based Approach to Software Size Measurement and
Effort Estimation.” IEEE Transactions on Software Engineering 27 (4): 337–50.
Heeringen, Harold van. 2007. “Changing from FPA to COSMIC A Transition Framework.” In
Proceedings of Software Measurement European Forum, 143–54.
Helmer, Olaf, Bernice Brown, and Theodore J. Gordon. 1966. Social Technology. Basic Books.
Henderson, Garland S. 1992. “The Application of Function Points to Predict Source Lines of Code for
Software Development.”
Hira, Anandi, and Barry Boehm. 2016. “Function Point Analysis for Software Maintenance.” In
Proceedings of the 10th ACM/IEEE International Symposium on Empirical Software Engineering
and Measurement, 48.
———. 2018. “COSMIC Function Points Evaluation for Software Maintenance.” In Proceedings of the
11th Innovations in Software Engineering Conference, 4.
Hira, Anandi, Barry Boehm, Robert Stoddard, and Michael Konrad. 2018a. “Further Causal Search
Analyses With UCC’s Effort Estimation Data.” In 15th Annual Acquisition Research Symposium.
Naval Postgraduate School, Monterey, California.
———. 2018b. “Preliminary Causal Discovery Results with Software Effort Estimation Data.” In
Proceedings of the 11th Innovations in Software Engineering Conference, 6.
Hira, Anandi, Shreya Sharma, and Barry Boehm. 2016. “Calibrating COCOMO®II for Projects with
High Personnel Turnover.” In Proceedings of the International Workshop on Software and Systems
Process, 51–55.
Ho, Vinh T, Alain Abran, and T Fetcke. 1999. “A Comparative Study Case of COSMIC-FFP, Full
Function Point and IFPUG Methods.” Department of Informatics, University of Quebec at Montreal,
Canada.
Jeng, Bingchiang, Dowming Yeh, Deron Wang, Shu-Lan Chu, Chia-Mei Chen, and others. 2011. “A
Specific Effort Estimation Method Using Function Point.” Journal of Information Science and
Engineering 27 (4): 1363–76.
Jiang, Zhizhong, Peter Naudé, and Binghua Jiang. 2007. “The Effects of Software Size on Development
Effort and Software Quality.” International Journal of Computer and Information Science and
Engineering 1 (4): 230–34.
Jones, Capers. 1996. “Applied Software Measurement: Assuring Productivity and Quality.” McGraw-
Hill, New York 17 (1): 2.
Jørgensen, Magne. 1995. “An Empirical Study of Software Maintenance Tasks.” Journal of Software:
Evolution and Process 7 (1): 27–48.
———. 2004. “A Review of Studies on Expert Estimation of Software Development Effort.” Journal of
Systems and Software 70 (1–2): 37–60.
Kemerer, Chris F. 1987. “An Empirical Validation of Software Cost Estimation Models.”
Communications of the ACM 30 (5): 416–29.
Kitchenham, Barbara A. 2002. “The Question of Scale Economies in Software—Why Cannot
Researchers Agree?” Information and Software Technology 44 (1): 13–24.
Ko, Andrew J, Brad A Myers, Michael J Coblenz, and Htet Htet Aung. 2006. “An Exploratory Study of
How Developers Seek, Relate, and Collect Relevant Information during Software Maintenance
Tasks.” IEEE Transactions on Software Engineering 32 (12).
Kocaguneli, Ekrem, and Tim Menzies. 2013. “Software Effort Models Should Be Assessed via Leave-
One-out Validation.” Journal of Systems and Software 86 (7): 1879–90.
Koh, T W, M H Selamat, and A Ghani. 2008. “Exponential Effort Estimation Model Using Unadjusted
Function Points.” Information Technology Journal 7 (6): 830–39.
LaToza, Thomas D, Gina Venolia, and Robert DeLine. 2006. “Maintaining Mental Models: A Study of
Developer Work Habits.” In Proceedings of the 28th International Conference on Software
Engineering, 492–501.
Lavazza, Luigi. 2014. “An Evaluation of the Statistical Convertibility of Function Points into COSMIC
Function Points.” Empirical Software Engineering 19 (4): 1075–1110.
Lavazza, Luigi, and Sandro Morasca. 2013. “Measuring the Functional Size of Real-Time and Embedded
Software: A Comparison of Function Point Analysis and COSMIC.” In 8th Int. Conf. on Software
Engineering Advances--ICSEA.
Lévesque, Ghislain, and Valéry Bevo. 2001. “Measuring Size for the Development of A Cost Model: A
Comparison of Results Based on COSMIC FFP and SLIM Back-Firing Function Points.” In
Proceedings of the 11th International Workshop on Software Measurement, Montreal, Canada.
Malik, Ali Afzal. 2010. “Quantitative and Qualitative Analyses of Requirements Elaboration for Early
Software Size Estimation.” University of Southern California.
Martino, Sergio Di, Filomena Ferrucci, Carmine Gravino, and Federica Sarro. 2016. “Web Effort
Estimation: Function Point Analysis vs. COSMIC.” Information and Software Technology 72: 90–
109.
Matson, Jack E, Bruce E Barrett, and Joseph M Mellichamp. 1994. “Software Development Cost
Estimation Using Function Points.” IEEE Transactions on Software Engineering 20 (4): 275–87.
McCabe, Thomas J. 1976. “A Complexity Measure.” IEEE Transactions on Software Engineering, no. 4:
308–20.
McConnell, Steve. 2006. Software Estimation: Demystifying the Black Art. Microsoft press.
Menzies, Tim, Dan Port, Zhihao Chen, Jairus Hihn, and Sherry Stukes. 2005. “Validation Methods for
Calibrating Software Effort Models.” In Proceedings of the 27th International Conference on
Software Engineering, 587–95.
Nagano, Shin-ichi, Ken-ichi Mase, Yasuo Watanabe, Takaichi Watahiki, and Shigeru Nishiyama. 2002.
“Validation of Application Results of COSMIC-FFP to Switching Systems.” Journal of Information
Processing Society of Japan, IPSJ SIG Notes 35: 1–7.
Nelson, Edward Axel. 1967. “Management Handbook for the Estimation of Computer Programming
Costs.”
Neumann, R, and Luca Santillo. 2006. “Experiences with the Usage of COCOMOII.” In Proc. of
Software Measurement European Forum, 2006:269–80.
Nguyen, Vu. 2010. “Improved Size and Effort Estimation Models for Software Maintenance (Software
Engineering).” Ph. D. Dissertation. University of Southern California, Los Angeles, CA. UTI Order.
Niessink, Frank, and Hans Van Vliet. 1997. “Predicting Maintenance Effort with Function Points.” In
Software Maintenance, 1997. Proceedings., International Conference On, 32–39.
Nolan, Andy, Olimpia Vlad, Andrew C Pickard, and Richard Beasley. 2017. “Fortune Telling, Estimating
& Systems Engineering.” In INCOSE International Symposium, 27:1682–98.
Park, Robert E. 1992. “Software Size Measurement: A Framework for Counting Source Statements.”
PRICE Systems L.L.C. 2011. “Software Estimating Model for TruePlanning Version 4.0.”
https://www.pricesystems.com/resource/software-estimating-model-for-trueplanning/.
Rabbi, Md Forhad, Shailendra Natraj, and Olorisade Babatunde Kazeem. 2009. “Evaluation of
Convertibility Issues between IFPUG and COSMIC Function Points.” In 2009 Fourth International
Conference on Software Engineering Advances, 277–81.
Rayner, Robert K. 1994. “The Small-Sample Power of Durbin’s h Test Revisited.” Computational
Statistics & Data Analysis 17 (1): 87–94.
Rollo, T. 2006. “Functional Size Measurement and COCOMO--A Synergistic Approach.” In Proc. of
Software Measurement European Forum, 2006:259–67.
Runeson, Per. 2003. “Using Students as Experiment Subjects--an Analysis on Graduate and Freshmen
Student Data.” In Proceedings of the 7th International Conference on Empirical Assessment in
Software Engineering, 95–102.
Salman, Iflaah, Ayse Tosun Misirli, and Natalia Juristo. 2015. “Are Students Representatives of
Professionals in Software Engineering Experiments?” In 2015 IEEE/ACM 37th IEEE International
Conference on Software Engineering, 1:666–76.
Santillo, Luca. 2006. “Error Propagation in Software Measurement and Estimation.” In IWSM/Metrikon
2006 Conference Proceedings, Potsdam, Berlin, Germany, 2–3.
Singer, Janice, Timothy Lethbridge, Norman Vinson, and Nicolas Anquetil. 2010. “An Examination of
Software Engineering Work Practices.” In CASCON First Decade High Impact Papers, 174–88.
Stern, S, and O Guetta. 2010. “Manage the Automotive Embedded Software Development Cost by Using
a Functional Size Measurement Method (COSMIC).” In European Congress of ERTS.
Symons, Charles R. 1988. “Function Point Analysis: Difficulties and Improvements.” IEEE Transactions
on Software Engineering 14 (1): 2–11.
Trendowicz, Adam, and Ross Jeffery. 2014. Software Project Effort Estimation: Foundations and Best
Practice Guidelines for Success. Springer.
Vogelezang, Frank, Arlan Lesterhuis, and others. 2003. “Applicability of COSMIC Full Function Points
in an Administrative Environment: Experiences of an Early Adopter.” In Proceedings of the 13th
International Workshop on Software Measurement--IWSM 2003.
Whigham, Peter A, Caitlin A Owen, and Stephen G Macdonell. 2015. “A Baseline Model for Software
Effort Estimation.” ACM Transactions on Software Engineering and Methodology (TOSEM) 24 (3):
20.
Wittig, Gerhard E, and G R Finnie. 1994. “Using Artificial Neural Networks and Function Points to
Estimate 4GL Software Development Effort.” Australasian Journal of Information Systems 1 (2).
Xunmei, Gu, Song Guoxin, and Zheng Hong. 2006. “The Comparison between FPA and COSMIC-FFP.”
In Proceedings of Software Measurement European Forum (SMEF) Conference, 113–14.
Yang, Ye, Lang Xie, Zhimin He, Qi Li, Vu Nguyen, Barry Boehm, and Ricardo Valerdi. 2011. “Local
Bias and Its Impacts on the Performance of Parametric Estimation Models.” In Proceedings of the
7th International Conference on Predictive Models in Software Engineering, 14.
Zheng, Yinhuan, Beizhan Wang, Yilong Zheng, and Liang Shi. 2009. “Estimation of Software Projects
Effort Based on Function Point.” In Computer Science & Education, 2009. ICCSE’09. 4th
International Conference On, 941–43.
Appendices
Appendix A: Unified Code Count (UCC)’s Dataset
Table 37 contains UCC’s dataset used for the analyses in this research (excluding, to save space, the factors
that remain constant across all of the data points; the ratings for those factors can be found in Section 4.1.2).
A summary of how IFPUG Function Points (FPs) and COSMIC Function Points (CFPs) were calculated for
UCC’s projects is provided in Appendix B, and the definitions of the effort drivers, taken from
(B. Boehm et al. 2000), are repeated in Appendix C.
Table 37 Unified Code Count (UCC)'s Dataset

Project | Actual Effort (hrs) | SLOC | FPs | CFPs | PREC | CPLX | CPLX-Control | CPLX-Computational | DOCU | ACAP | PCAP
Assembly - Microprocessor x86 | 742.083 | 318 | 4 | 6 | Nominal | Low | Nominal | Very Low | Nominal | Nominal | Nominal
Cobol | 1420.960 | 460 | 5 | 10 | Nominal | Low | Nominal | Very Low | Nominal | Low | Nominal
Objective C | 1209.833 | 448 | 5 | 12 | Nominal | Very Low | Low | Very Low | High | Nominal | High
Word/Text | 1500.617 | 89 | 4 | 6 | Low | Nominal | High | Very Low | Nominal | Low | Nominal
Assembly - MIPS | 1350.950 | 442 | 5 | 7 | Nominal | Low | Nominal | Very Low | Nominal | Nominal | Low
Makefile | 785.950 | 192 | 4 | 5 | Nominal | Low | Nominal | Very Low | Nominal | Low | Nominal
DOS Batch Counter | 916.700 | 498 | 5 | 11 | Nominal | Low | Nominal | Very Low | High | High | Very High
Function-level Differencing | 838.000 | 491 | 4 | 4 | Nominal | Nominal | High | Very Low | Nominal | Nominal | Nominal
extfile Feature | 465.167 | 67 | 10 | 3 | Nominal | Nominal | Nominal | Very Low | Nominal | Nominal | Nominal
Convert Main Components to Java | 462.700 | 116 | 8 | 5 | High | Very Low | Low | N/A | Nominal | Very High | High
Convert C/C++ Counter to Java | 321.700 | 184 | 4 | 5 | High | Low | Nominal | Very Low | Nominal | High | High
Cyclomatic Complexity - C++ | 166.500 | 104 | 4 | 5 | Nominal | Very Low | Very Low | N/A | Nominal | Very High | Very High
Cyclomatic Complexity - Java | 166.500 | 23 | 4 | 2 | Nominal | Low | Nominal | Very Low | Nominal | Very High | High
Cyclomatic Complexity - C# | 250.100 | 38 | 4 | 3 | Nominal | Low | Nominal | Very Low | Nominal | High | Nominal
Cyclomatic Complexity - Perl | 250.100 | 38 | 4 | 3 | Nominal | Very Low | Low | Very Low | Nominal | High | Nominal
Cyclomatic Complexity - VisualBasic | 251.100 | 51 | 4 | 4 | Nominal | Very Low | Low | Very Low | Nominal | High | Nominal
Cyclomatic Complexity - Ada | 230.750 | 32 | 4 | 3 | Nominal | Very Low | Low | Very Low | Nominal | Nominal | Nominal
Cyclomatic Complexity - Fortran | 230.750 | 29 | 4 | 3 | Nominal | Very Low | Low | Very Low | Nominal | Nominal | Nominal
Cyclomatic Complexity - Pascal | 219.150 | 39 | 4 | 4 | Nominal | Very Low | Low | Very Low | Nominal | High | High
Cyclomatic Complexity - Python | 216.150 | 117 | 4 | 4 | Nominal | Very Low | Low | Very Low | Nominal | High | High
Cyclomatic Complexity - Ruby | 249.750 | 86 | 4 | 4 | Nominal | Very Low | Low | Very Low | Nominal | Nominal | High
Cyclomatic Complexity - Bash | 204.667 | 65 | 4 | 3 | Nominal | Very Low | Low | Very Low | Nominal | Nominal | High
Cyclomatic Complexity - CFScript | 252.167 | 82 | 4 | 3 | Nominal | Very Low | Low | Very Low | Nominal | Nominal | Nominal
Cyclomatic Complexity - C Shell | 195.600 | 41 | 4 | 2 | Nominal | Very Low | Low | Very Low | Nominal | High | High
Cyclomatic Complexity - Cold Fusion | 183.600 | 96 | 4 | 4 | Nominal | Very Low | Low | Very Low | Nominal | Very High | High
Cyclomatic Complexity - PHP | 333.400 | 70 | 4 | 3 | Nominal | Low | Nominal | Very Low | Nominal | Nominal | Nominal
Cyclomatic Complexity - MIDAS | 619.200 | 51 | 12 | 7 | Nominal | Very Low | Low | Very Low | Nominal | Nominal | Nominal
Cyclomatic Complexity - Verilog | 381.500 | 81 | 4 | 4 | Nominal | Very Low | Low | Very Low | Nominal | Nominal | Nominal
Cyclomatic Complexity - VHDL | 381.500 | 88 | 4 | 3 | Nominal | Very Low | Low | Very Low | Nominal | Nominal | Nominal
Cyclomatic Complexity - Javascript | 243.500 | 119 | 4 | 4 | Nominal | Very Low | Very Low | N/A | Nominal | High | High
Cyclomatic Complexity - P/L SQL | 374.000 | 6 | 4 | 2 | Nominal | Very Low | Nominal | N/A | Nominal | Low | Nominal
Extfile Enhancements | 419.250 | 438 | 15 | 7 | Nominal | Very Low | Nominal | N/A | Nominal | High | Nominal
Appendix B: Calculation Summary of FPs and CFPs for Unified Code Count
(UCC)’s Dataset
The tables below describe the major functionality added by the enhancement projects in UCC’s dataset
and how SLOC, FPs, and CFPs are associated with that functionality. As the tables demonstrate, CFPs
size projects at a finer level of granularity than FPs, and SLOC varies even more because it represents
size at a still finer level of granularity.
Table 38 SLOC, FPs, and CFPs counts for functionality in UCC's Dataset
Assembly - x86
Function SLOC FPs CFPs
Send Language-specific keywords to Main
Components
115
3
Output Directive Lines
203 4
1
Output Exec Lines 1
Output Logical Lines
1
Output Nested Loops
Output Complexity
Parse Functions
Parse for Cyclomatic Complexity
Total
318 4 6
Cobol
Function SLOC FPs CFPs
Send Language-specific keywords to Main
Components
239
5
Output Directive Lines
144
5
1
Output Exec Lines
1
Output Data Lines 1
Output Logical Lines
1
Output Nested Loops 13 1
Output Complexity
Parse Functions
Parse for Cyclomatic Complexity
Total
460 5 10
Objective C
Function SLOC FPs CFPs
Send Language-specific keywords to Main
Components
200
5
Output Directive Lines
175
5
1
Output Exec Lines
1
Output Data Lines
1
Output Logical Lines
1
Output Nested Loops
73 1
Output Complexity
Parse Functions 45 2
Parse for Cyclomatic Complexity
Total 493 5 12
Assembly - MIPS
Function SLOC FPs CFPs
Send Language-specific keywords to Main
Components
147
2
Output Directive Lines
256
5
1
Output Exec Lines 1
Output Data Lines 1
Output Logical Lines
1
Output Nested Loops 39 1
Output Complexity
Parse Functions
Parse for Cyclomatic Complexity
Total
442 5 7
DOS Batch
Function SLOC FPs CFPs
Send Language-specific keywords to Main
Components
197
4
Output Exec Lines
191
5
1
Output Data Lines 1
Output Logical Lines
1
Output Nested Loops
Output Complexity 80 1
Parse Functions
30
2
Parse for Cyclomatic Complexity 1
Total
498 5 11
Word/Text
Function SLOC FPs CFPs
Output Logical Lines 27
4
1
Output Physical Lines 6 1
Output Keywords 56 4
Total 89 4 6
Cyclomatic Complexity – C++
Function SLOC FPs CFPs
Send Language-specific, Cyclomatic Complexity
keywords to Main Components
22
1
Parse Functions 26
1
Parse for Cyclomatic Complexity 40
4
2
Return Level of complexity
16 1
Total
104 4 5
Cyclomatic Complexity – Java
Function SLOC FPs CFPs
Send Language-specific, Cyclomatic Complexity
keywords to Main Components
23
2
Parse Functions
4
Parse for Cyclomatic Complexity
Total
23 4 2
Cyclomatic Complexity – C#
Function SLOC FPs CFPs
Send Language-specific, Cyclomatic Complexity
keywords to Main Components
7
1
Parse Functions 31
4
2
Parse for Cyclomatic Complexity
Total
38 4 3
Cyclomatic Complexity – Perl
Function SLOC FPs CFPs
Send Language-specific, Cyclomatic Complexity
keywords to Main Components
9
1
Parse Functions 29
4
2
Parse for Cyclomatic Complexity
Total
38 4 3
Cyclomatic Complexity – Visual Basic
Function SLOC FPs CFPs
Send Language-specific, Cyclomatic Complexity
keywords to Main Components
9
1
Parse Functions
30
4
2
Parse for Cyclomatic Complexity 12 1
Total
51 4 4
Cyclomatic Complexity – Ada
Function SLOC FPs CFPs
Send Language-specific, Cyclomatic Complexity
keywords to Main Components
7
1
Parse Functions
25
4
2
Parse for Cyclomatic Complexity
Total
32 4 3
Cyclomatic Complexity – Fortran
Function SLOC FPs CFPs
Send Language-specific, Cyclomatic Complexity
keywords to Main Components
4
1
Parse Functions 25
4
2
Parse for Cyclomatic Complexity
Total
29 4 3
Cyclomatic Complexity – Pascal
Function SLOC FPs CFPs
Send Language-specific, Cyclomatic Complexity
keywords to Main Components
5
1
Parse Functions 25
4
2
Parse for Cyclomatic Complexity 9 1
Total
39 4 4
Cyclomatic Complexity – Python
Function SLOC FPs CFPs
Send Language-specific, Cyclomatic Complexity
keywords to Main Components
5
1
Parse Functions
47
4
2
Parse for Cyclomatic Complexity 65 1
Total
117 4 4
Cyclomatic Complexity – Ruby
Function SLOC FPs CFPs
Send Language-specific, Cyclomatic Complexity
keywords to Main Components
9
1
Parse Functions
39
4
2
Parse for Cyclomatic Complexity 38 1
Total
86 4 4
Cyclomatic Complexity – Bash
Function SLOC FPs CFPs
Send Language-specific, Cyclomatic Complexity
keywords to Main Components
7
1
Parse Functions 58
4
2
Parse for Cyclomatic Complexity
Total
65 4 3
Cyclomatic Complexity – CF Script
Function SLOC FPs CFPs
Send Language-specific, Cyclomatic Complexity
keywords to Main Components
7
1
Parse Functions 75
4
2
Parse for Cyclomatic Complexity
Total
82 4 3
Cyclomatic Complexity – C Shell
Function SLOC FPs CFPs
Send Language-specific, Cyclomatic Complexity
keywords to Main Components
6
1
Parse Functions
4
Parse for Cyclomatic Complexity 35 1
Total
41 4 2
Cyclomatic Complexity – Cold Fusion
Function SLOC FPs CFPs
Send Language-specific, Cyclomatic Complexity
keywords to Main Components
13
1
Parse Functions
49
4
2
Parse for Cyclomatic Complexity 34 1
Total
96 4 4
Cyclomatic Complexity – PHP
Function SLOC FPs CFPs
Send Language-specific, Cyclomatic Complexity
keywords to Main Components
13
1
Parse Functions 57
4
2
Parse for Cyclomatic Complexity
Total
70 4 3
Cyclomatic Complexity – MIDAS (3 variations)
Function SLOC FPs CFPs
Send Language-specific, Cyclomatic Complexity
keywords to Main Components
10
3
Parse Functions
41 12
2
Parse for Cyclomatic Complexity 2
Total
51 12 7
Cyclomatic Complexity – Verilog
Function SLOC FPs CFPs
Send Language-specific, Cyclomatic Complexity
keywords to Main Components
6
1
Parse Functions
33
4
2
Parse for Cyclomatic Complexity 42 1
Total
81 4 4
Cyclomatic Complexity – VHDL
Function SLOC FPs CFPs
Send Language-specific, Cyclomatic Complexity
keywords to Main Components
6
1
Parse Functions
39
4
2
Parse for Cyclomatic Complexity 43
Total
70 4 3
Cyclomatic Complexity – JavaScript
Function SLOC FPs CFPs
Send Language-specific, Cyclomatic Complexity
keywords to Main Components
6
1
Parse Functions 56
4
2
Parse for Cyclomatic Complexity 57 1
Total
119 4 4
Cyclomatic Complexity – P/L SQL
Function SLOC FPs CFPs
Send Language-specific, Cyclomatic Complexity
keywords to Main Components
6
2
Parse Functions
4
Parse for Cyclomatic Complexity
Total
6 4 2
Convert C++ Counter to Java
Function SLOC FPs CFPs
Send Language-specific, Cyclomatic Complexity
keywords to Main Components
184 4 5
Total
184 4 5
Convert Main Components to Java
Function SLOC FPs CFPs
Output Counting Results
116
4 1
Output Differencing Results
4 4
Total 116 8 5
extfile Feature
Function SLOC FPs CFPs
Language to Extension Map (called extfile)
58 7 2
Check input against map
9 3 1
Total
67 10 3
Function-level Differencing
Function SLOC FPs CFPs
Output Function-level Differencing Results
491 4 4
Total
491 4 4
extfile Enhancements
Function SLOC FPs CFPs
Allow comments in extfile
135 7 1
Input custom header
299
0 1
Output module-level reports
4 4
Only show differenced files
4 4 1
Total
438 15 7
Appendix C: COCOMO® II Effort Factors Description
Detailed explanations of the COCOMO® II effort factors and their ratings are provided
here in Table 39 for easy reference. They come directly from (B. Boehm et al. 2000).
Table 39 COCOMO® II Effort Factors Definitions per Rating Level

Factor | Very Low | Low | Nominal | High | Very High | Extra High
PREC (Precedentedness) | Thoroughly unprecedented | Largely unprecedented | Somewhat unprecedented | Generally familiar | Largely familiar | Thoroughly familiar
FLEX (Development Flexibility) | Rigorous | Occasional relaxation | Some relaxation | General conformity | Some conformity | General goals
RESL (Architecture/Risk Resolution) | Little (20%) | Some (40%) | Often (60%) | Generally (75%) | Mostly (90%) | Fully (100%)
TEAM (Team Cohesion) | Very difficult interactions | Some difficult interactions | Basically cooperative interactions | Largely cooperative | Highly cooperative | Seamless interactions
PMAT (Process Maturity) | CMM/CMMI Level 1 Lower | CMM/CMMI Level 1 Upper | CMM/CMMI Level 2 | CMM/CMMI Level 3 | CMM/CMMI Level 4 | CMM/CMMI Level 5
RELY (Required Software Reliability) | Slight inconvenience | Low, easily recoverable losses | Moderate, easily recoverable losses | High financial loss | Risk to human life |
DATA (Test Database Size) | | Testing DB bytes/Program SLOC < 10 | 10 < D/P < 100 | 100 < D/P < 1000 | D/P > 1000 |
CPLX-Control (Product Complexity, Control Operations) | Straight-line code with few non-nested structured programming operations | Straightforward nesting of structured programming operators; mostly simple predicates | Mostly simple nesting; some inter-module control; decision tables, simple callbacks or message passing | Highly nested structured programming operators with many compound predicates; queue and stack control | Reentrant and recursive coding; fixed-priority interrupt handling; task sync, complex callbacks | Multiple resource scheduling
CPLX-Computational (Product Complexity, Computation Operations) | Evaluation of simple expressions, e.g., A = B + C*(D-E) | Evaluation of moderate-level expressions, e.g., D = SQRT(B**2 - 4*A*C) | Use of standard math and statistical routines; basic matrix/vector operations | Basic numerical analysis | Difficult but structured numerical analysis | Difficult and unstructured numerical analysis
CPLX-Devices (Product Complexity, Device-Dependent Operations) | Simple read, write statements with simple formats | No cognizance needed of particular processor or I/O device characteristics | I/O processing includes device selection, status checking and error processing | Operations at physical I/O level; optimized I/O overlap | Routines for interrupt diagnosis, servicing, masking; communication line handling | Device timing-dependent coding, micro-programmed operations
CPLX-Data (Product Complexity, Data Management Operations) | Simple arrays in main memory | Single file subsetting with no data structure changes, no edits, no intermediate files | Multi-file input and single file output; simple structural changes, simple edits | Simple triggers activated by data stream contents | Distributed database coordination; complex triggers; search optimization | Highly coupled, dynamic relational and object structures
CPLX-Interfaces (Product Complexity, User Interface Management Operations) | Simple input forms, report generators | Use of simple GUI builders | Simple use of widgets | Widget development and extension; simple voice I/O, multimedia | Moderately complex 2D/3D, dynamic graphics, multimedia | Complex multimedia, virtual reality, natural language interface
RUSE (Developed for Reusability) | | None | Across project | Across program | Across product line | Across multiple product lines
DOCU (Documentation Match to Lifecycle Needs) | Many lifecycle needs uncovered | Some lifecycle needs uncovered | Right-sized to lifecycle needs | Excessive for lifecycle needs | Very excessive for lifecycle needs |
TIME (Execution Time Constraint) | | | < 50% of available execution time | 70% use of available execution time | 85% use of available execution time | 95% use of available execution time
STOR (Main Storage Constraint) | | | < 50% of available storage | 70% use of available storage | 85% use of available storage | 95% use of available storage
PVOL (Platform Volatility) | | Major change every year; minor change every month | Major change every 6 mo.; minor change every 2 weeks | Major change every 2 mo.; minor change every week | Major change every 2 weeks; minor change every 2 days |
ACAP (Analyst Capability) | 15th percentile | 35th percentile | 55th percentile | 75th percentile | 90th percentile |
PCAP (Programmer Capability) | 15th percentile | 35th percentile | 55th percentile | 75th percentile | 90th percentile |
PCON (Personnel Continuity) | 48%/year | 24%/year | 12%/year | 6%/year | 3%/year |
APEX (Applications Experience) | < 2 months | 6 months | 1 year | 3 years | 6 years |
LTEX (Language and Tool Experience) | < 2 months | 6 months | 1 year | 3 years | 6 years |
PLEX (Platform Experience) | < 2 months | 6 months | 1 year | 3 years | 6 years |
TOOL (Use of Software Tools) | Edit, code, debug | Simple, frontend, backend CASE, little integration | Basic lifecycle tools, moderately integrated | Strong, mature lifecycle tools, moderately integrated | Strong, mature, proactive lifecycle tools, well integrated with processes, methods, reuse |
SITE-Collocation (Multisite Development) | International | Multi-city and multi-company | Multi-city or multi-company | Same city or metro area | Same building or complex | Fully collocated
SITE-Communications | Some phone, mail | Individual phone, FAX | Email | Electronic communication | Electronic communication, occasional video conf. | Interactive multimedia
SCED (Required Development Schedule) | 75% of nominal | 85% of nominal | 100% of nominal | 130% of nominal | 160% of nominal |
Appendix D: Comparison of COCOMO® II Calibration Step 1 Options
While the option to calibrate separate productivity factors for new development, adding new features, and
modifying existing features, together with re-calibrating the Product Complexity (CPLX) factor, was shown
to have the highest R² compared to the alternatives in Chapter 6 Calibrated COCOMO® II Model, this
Appendix provides the prediction accuracy statistics without cross-validation (hence the difference in
prediction accuracy for the selected model compared to the results shown in Chapter 7 Effort Estimation
Effectiveness). The purpose of this Appendix is to show that the selected model does indeed out-perform
the other options in terms of prediction accuracy, not just R².
Recall that the options for calibrating COCOMO® II for FSMs are listed below; an illustrative sketch of how such productivity rates can be locally calibrated follows the list:
• Single Productivity Rate (Prod Rate)
• Productivity Rate for New Development/Enhancement (New/Enh)
• Productivity Rate for New Development/Add Features/Modify Features (New/Add/Mod)
• Single Productivity Factor & CPLX (Prod & CPLX)
• New Development/Enhancement & CPLX (New/Enh & CPLX)
• New Development/Add Features/Modify Features & CPLX (New/Add/Mod & CPLX)
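For illustration, the sketch below shows one simple way such per-type productivity rates could be locally calibrated from historical projects, by averaging hours per functional size unit within each development type. The project tuples, grouping scheme, and helper names are illustrative assumptions only; the actual calibration procedure is the one described in Chapter 6 Calibrated COCOMO® II Model.

```python
# Illustrative sketch only: derive a local productivity rate (hours per FP or CFP)
# per development type (new development, adding features, modifying features).
# The example project tuples are dummy values, not data points from this research.
from collections import defaultdict

# (development type, functional size in FPs or CFPs, actual effort in hours)
projects = [
    ("new", 10, 600.0),
    ("add", 4, 250.0),
    ("modify", 7, 420.0),
    ("add", 5, 320.0),
]

size_by_type = defaultdict(float)
effort_by_type = defaultdict(float)
for dev_type, size, effort in projects:
    size_by_type[dev_type] += size
    effort_by_type[dev_type] += effort

# Productivity rate = total hours / total functional size, calibrated per type.
prod_rates = {t: effort_by_type[t] / size_by_type[t] for t in size_by_type}

def estimate_effort(dev_type: str, size: float) -> float:
    """Estimate effort (hours) as functional size times the locally calibrated rate."""
    return prod_rates[dev_type] * size

print(prod_rates)
print(estimate_effort("add", 6))
```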
The prediction accuracy statistics of the options for FPs are given in Table 40. Note that these statistics
are computed without cross-validation; hence, the prediction accuracy for the New/Add/Mod & CPLX
option differs from the results reported in Chapter 7 Effort Estimation Effectiveness.
Table 40 Prediction Accuracy Comparison of Calibration Step 1 Options for FPs

Statistic | Prod Rate | New/Enh | New/Add/Mod | Prod & CPLX | New/Enh & CPLX | New/Add/Mod & CPLX
MAR (hrs) | 546.41 | 646.8 | 659.03 | 232.86 | 229.21 | 183.92
MdAR (hrs) | 267.9 | 213.12 | 226.31 | 84.38 | 84.95 | 76.74
MMRE | 76.36% | 68.39% | 88.15% | 33.96% | 33.8% | 26.48%
MdMRE | 79.34% | 65.47% | 72.21% | 24.31% | 18.82% | 16.24%
PRED(25) | 18% | 10% | 6% | 54% | 54% | 68%
The prediction accuracy statistics of the options for CFPs are given in Table 41. As with Table 40, these
statistics are computed without cross-validation; hence, the prediction accuracy for the New/Add/Mod &
CPLX option differs from the results reported in Chapter 7 Effort Estimation Effectiveness.
Table 41 Prediction Accuracy Comparison of Calibration Step 1 Options for CFPs

Statistic | Prod Rate | New/Enh | New/Add/Mod | Prod & CPLX | New/Enh & CPLX | New/Add/Mod & CPLX
MAR (hrs) | 475.52 | 585.12 | 674.29 | 304.35 | 124.69 | 124.48
MdAR (hrs) | 232.03 | 293.05 | 244.07 | 114.2 | 56.16 | 61.08
MMRE | 63.17% | 81.19% | 84.1% | 34.4% | 19.14% | 18.49%
MdMRE | 61.09% | 74.4% | 74.66% | 22.1% | 20.13% | 16.44%
PRED(25) | 18% | 10% | 8% | 62% | 70% | 78%
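The accuracy statistics in Tables 40 and 41 follow their standard definitions. The sketch below, assuming NumPy arrays of actual and estimated effort in hours (the example numbers are illustrative only, not taken from the datasets), shows how MAR, MdAR, MMRE, MdMRE, and PRED(25) can be computed.

```python
# Minimal sketch (not the dissertation's scripts) of the prediction accuracy
# statistics used in Tables 40 and 41, computed from paired actual and estimated
# efforts in hours.
import numpy as np

def accuracy_stats(actuals: np.ndarray, estimates: np.ndarray) -> dict:
    abs_residuals = np.abs(actuals - estimates)   # AR_i = |actual_i - estimate_i|
    mre = abs_residuals / actuals                 # MRE_i = AR_i / actual_i
    return {
        "MAR (hrs)": abs_residuals.mean(),        # mean absolute residual
        "MdAR (hrs)": np.median(abs_residuals),   # median absolute residual
        "MMRE": mre.mean(),                       # mean magnitude of relative error
        "MdMRE": np.median(mre),                  # median magnitude of relative error
        "PRED(25)": np.mean(mre <= 0.25),         # fraction of estimates within 25% of actuals
    }

# Example with illustrative numbers only:
actuals = np.array([742.1, 1421.0, 250.1])
estimates = np.array([690.0, 1300.0, 310.0])
print(accuracy_stats(actuals, estimates))
```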
Appendix E: Linear and Nonlinear Equations, and Conversion Ratios used for
Research Question 1
The models and conversion ratios used for the estimation methods inspired by the research literature and
compared in Chapter 7 Effort Estimation Effectiveness are described in this Appendix. The methods are:
• Run linear regression on the raw data (the approach primarily used in empirical research)
• Run linear regression on the log transformation of the data (another approach used in empirical research
to account for the nonlinear relationship between size and effort)
• Convert FPs to SLOC with the ratios published by Capers Jones and then use COCOMO® II (how
COCOMO® II currently supports FPs)
• Locally calibrate the FSM-to-SLOC ratios and use COCOMO® II (suggested for better accuracy; CFPs
are included even though not suggested)
• Locally calibrate the CFP-to-FP ratio and use the FP linear and nonlinear models (an approach suggested
in empirical research)
Linear and Nonlinear Regression
Table 42 shows the linear and nonlinear equations and their R² for both FPs and CFPs.

Table 42 Linear and Nonlinear Models to Estimate Effort with FPs and CFPs

Size | Equation/Model | R²
FPs | Effort (hrs) = 279.22 + 21.456 * FPs | 0.673
FPs | Effort (hrs) = 136.62 * FPs^0.5301 | 0.404
CFPs | Effort (hrs) = 341.94 + 28.173 * CFPs | 0.539
CFPs | Effort (hrs) = 114.446 * CFPs^0.7029 | 0.631
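As an illustration of how the Table 42 models are applied, the sketch below encodes the four fitted equations as functions and evaluates them for a hypothetical enhancement size; the example inputs are assumptions, not data points from either dataset.

```python
# Sketch of the fitted effort models from Table 42. Coefficients are copied from the
# table; the sizes used in the example call are illustrative only.
def effort_fp_linear(fps: float) -> float:
    return 279.22 + 21.456 * fps       # linear FP model, R^2 = 0.673

def effort_fp_power(fps: float) -> float:
    return 136.62 * fps ** 0.5301      # nonlinear (power) FP model, R^2 = 0.404

def effort_cfp_linear(cfps: float) -> float:
    return 341.94 + 28.173 * cfps      # linear CFP model, R^2 = 0.539

def effort_cfp_power(cfps: float) -> float:
    return 114.446 * cfps ** 0.7029    # nonlinear (power) CFP model, R^2 = 0.631

# Example: a hypothetical enhancement of 5 FPs / 6 CFPs
print(effort_fp_linear(5), effort_fp_power(5))
print(effort_cfp_linear(6), effort_cfp_power(6))
```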
Conversion Ratios
All projects in UCC’s dataset were developed in either C++ or Java, and Capers Jones found that the
average SLOC per FP is 53 for both programming languages. The Industry projects were developed in C,
Verilog, and VHDL; the SLOC/FP ratio is 128 for C and 19 for VHDL, and Capers Jones does not provide
a conversion ratio for Verilog (Jones 1996). Hence, I took the average of 19 and 128 (73.5) to convert the
Industry FPs to SLOC.
Since I did not collect SLOC for the Industry data, I used UCC’s data to locally calibrate the SLOC/FP
and SLOC/CFP ratios. The correlation between FPs and SLOC is very weak for UCC’s data (see the top of
Figure 16), so I took the average of the individual data points’ SLOC/FP ratios. The bottom of Figure 16
shows that a power equation fits the relationship between CFPs and SLOC slightly better than a linear
equation, so I used the power equation to convert CFPs to SLOC. All the conversion ratios are listed in
Table 43.
Table 43 Conversion Ratios used in Research Question 1

Conversion | Ratio/Equation
SLOC/FP by Capers Jones | UCC: 53; Industry: 73.5
SLOC/FP locally calibrated | 32.625
SLOC/CFP locally calibrated | SLOC = 6.9127 * CFPs^1.818
FP/CFP locally calibrated | FPs = 1.3125 + 1.4065 * CFPs
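The sketch below illustrates how the Table 43 ratios can be applied to express FPs or CFPs as SLOC (or CFPs as FPs) before estimating effort with COCOMO® II; the function names and example inputs are illustrative assumptions.

```python
# Sketch of the size conversions from Table 43. Constants come from the table; the
# function names and the example call are assumptions for illustration only.
def fp_to_sloc_jones(fps: float, ratio: float) -> float:
    """Capers Jones ratio: 53 SLOC/FP for UCC (C++/Java), 73.5 for the Industry data."""
    return fps * ratio

def fp_to_sloc_local(fps: float) -> float:
    """Locally calibrated average ratio of 32.625 SLOC/FP (UCC data)."""
    return fps * 32.625

def cfp_to_sloc_local(cfps: float) -> float:
    """Locally calibrated power fit: SLOC = 6.9127 * CFPs^1.818 (UCC data)."""
    return 6.9127 * cfps ** 1.818

def cfp_to_fp_local(cfps: float) -> float:
    """Locally calibrated linear fit: FPs = 1.3125 + 1.4065 * CFPs."""
    return 1.3125 + 1.4065 * cfps

# Example: a hypothetical 6-CFP enhancement
print(cfp_to_sloc_local(6), cfp_to_fp_local(6))
```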
Figure 16 FPs (top) and CFPs (bottom) mapped against SLOC for UCC’s Dataset
[Two scatter plots of Source Lines of Code (SLOC) against IFPUG Function Points (top panel, “UCC FPs vs SLOC”) and COSMIC Function Points (bottom panel, “UCC CFPs vs SLOC”). Top panel linear fit: y = 13.455x + 88.523, R² = 0.04657. Bottom panel fits: linear y = 50.87x - 83.826, R² = 0.61909; power y = 6.9127x^1.818, R² = 0.64098.]