ORGANIZING COMPLEX PROJECTS AROUND CRITICAL SKILLS, AND THE
MITIGATION OF RISKS ARISING FROM
SYSTEM DYNAMIC BEHAVIOR
by
Neil Gilbert Siegel
A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(INDUSTRIAL AND SYSTEMS ENGINEERING)
August 2011
Copyright 2011 Neil Gilbert Siegel
Dedication
To my wife, Robyn.
In paraphrase of Maulana:
Acknowledgements
I first would like to acknowledge my committee chairman, Professor Barry
Boehm, for his guidance, patience, and insight. It was a desire to study with Professor
Boehm that motivated me to undertake a mid-career Ph.D. in systems engineering, while
continuing to work at a full-time job. The experience turned out to be everything I
wanted it to be, for which Professor Boehm deserves principal credit.
I would like to acknowledge Professor James Moore, who was chairman of the
USC systems engineering department at the time of my entering the program, and who
provided much encouragement and guidance in getting through the nuances of the
application and admission process. He also kindly served as a member of my committee.
I would also like to acknowledge my other committee members – Professors Azad
Madni, Ann Majchrzak, and Stan Settles. Professors Madni and Majchrzak were
generous in spending considerable time reading successive iterations of my proposal and
analyses, and, along with Professor Boehm, exerted a decisive influence in helping me to
sharpen my thinking. Professor Settles was department head at the time of my qualifying
exam and defense, and provided support and encouragement through those processes.
I entered private industry more than 30 years ago, having trained as a
mathematician, and started work as a computer programmer. I quickly discovered
systems engineering, and found that that field would become my professional passion and
my life’s work. TRW, my original employer, provided numerous formal and informal
opportunities to learn about systems engineering, for which I am grateful. It is certainly
the case that almost everything I had learned about systems engineering up to the time I
re-entered USC at the age of 54, I had learned at TRW (and its successor company,
Northrop Grumman).
My career at TRW / Northrop Grumman has provided me with tremendous
intellectual stimulation from my peers, with an intense sense of purpose from our
customers, and with life-changing mentoring from a series of brilliant supervisors. I
could form a very long list of specific individuals, but I will let Jack Distaso, Joseph
Mason, Peter Karacsony, and Lieutenant General (retired) Bill Campbell stand for the
total set of these wonderful, talented, and generous people.
I must express specific gratitude to Northrop Grumman, and to a series of recent
supervisors, who let me pursue my Ph.D. while simultaneously holding a demanding and
responsible position within the company, who let me have access to the necessary
company data, and who also kindly paid for my studies. They could easily have decided
that such an educational venture was too late in my career to be of value to the company,
and too likely to interfere with my professional responsibilities. Instead, I received
unwavering support from three consecutive sector presidents, my direct supervisors.
I would be remiss if I did not mention a debt to my parents, both of whom were
aerospace engineers, and from whom I undoubtedly learned both a love of science, and
about the opportunities available in the aerospace industry.
And always my wife Robyn, my companion of (so far) 37 years, who makes
everything worth doing.
Table of Contents
Dedication
Acknowledgements
List of Tables
List of Figures
Abbreviations
Abstract
Chapter 1: Background and Motivation
    1.1 Background of the problem
    1.2 Statement of the problem
    1.3 Summary of the approach
    1.4 Importance of the topic
    1.5 Definition of terms
    1.6 Survey of the literature
        1.6.1 Understanding the problem domain of the first four cases
        1.6.2 Understanding the problem domains of the remaining cases
        1.6.3 Understanding the independent variables
            1.6.3.1 The system design process / dynamic control of systems
            1.6.3.2 The role of people in the process; critical-skill based project organizations
        1.6.4 Problems in the system development process
        1.6.5 Understanding the dependent variables
        1.6.6 The cases and their data
        1.6.7 Methodology
        1.6.8 Other sources and references
    1.7 Organization of the study
Chapter 2: Hypotheses
    2.1 Scope of the systems-of-interest
    2.2 Statement of the hypotheses
    2.3 Design-based technique for centralizing the control of the dynamic behavior
Chapter 3: Methodology
    3.1 Overview of the methodology
    3.2 Selection and discussion of the cases
        3.2.1 Initial case
        3.2.2 Remaining cases
        3.2.3 Summary of the cases
    3.3 Preparatory Steps and Study Methodology
    3.4 Remaining limitations and risks in the study
Chapter 4: Research Results
    4.1 Analysis of hypothesis a
    4.2 Analysis of hypothesis b
    4.3 Analysis of hypothesis c
    4.4 Analysis of hypothesis d
Chapter 5: Interpretations and Conclusions
Bibliography
Appendices
    Appendix A: Provenance of the data
    Appendix B: The Forward-Area Air Defense Command-Control-and-Intelligence System
    Appendix C: The Force-XXI Battle Command, Brigade-and-Below System
    Appendix D: Partitioning into skill bins employed on two exemplar projects
        D.1 On the FAAD C2I project
        D.2 On the FBCB2 / BFT project
    Appendix E: Vitæ – Neil Gilbert Siegel
List of Tables
Table 2.2-1. SAS – required features – methodology
Table 2.2-2. SAS – required features – implementing tools
Table 3.1-1. Nomenclature – experiments and non-experiments
Table 3.1-2. Addressing Yin’s four problems in case-study design
Table 3.2-1. Cases versus system characteristics
Table 3.2-2. Cases versus independent and dependent variables
Table 3.2-3. Control & Treatment for Dependent Variable 4
Table 3.3-1. Satisfaction of Yin’s four tests
Table 3.3-2. Expected results
Table 3.3-3. Threats / plausible rival explanations / predicted data shapes
Table 4.1-1. Comparison of 2004 and 2004 Standish Group survey findings
Table 4.4-1. Scoring for hypothesis d
Table 4.4-2. Excursions for hypothesis d
List of Figures
Figure 2.2-1. Hypotheses and dependent variables
Figure 2.2-2. Procedural description of the system architectural skeleton
Figure 3.1-1. A repeated-treatment design
Figure 3.1-2. Flow-of-work through the study
Figure 3.2-1. Dependent variable coverage for the first independent variable
Figure 3.2-2. Dependent variable coverage for the second independent variable
Figure 3.3-1. Case study protocol – hypothesis a
Figure 3.3-2. Decomposition of hypothesis b
Figure 3.3-3. Case study protocol – hypothesis b
Figure 3.3-4. Case study protocol – hypothesis c
Figure 3.3-5. Case study protocol – hypothesis d
Figure 3.3-6. Scalar metric for measuring for hypothesis d
Figure 3.3-7. Method for formulating an interpretation
Figure 4.1-1. Period I contractor test, attributable problem reports by month
Figure 4.1-2. Period II contractor test, attributable problem reports by month
Figure 4.1-3. Period III contractor test, attributable problem reports by month
Figure 4.1-4. Periods I-III contractor tests, attributable problem reports by month
Figure 4.1-5. Periods I-III contractor tests, attributable problem reports by month, adjusted for peer-review and unit-test procedure change during period II
Figure 4.1-6. Cost performance in each of the three project periods
Figure 4.1-7. FAAD C2I, hypothesis a, with the design-based technique
Figure 4.1-8. Project AAAA, hypothesis a, without the design-based technique
Figure 4.1-9. Project BBBB, hypothesis a, without the design-based technique
Figure 4.1-10. All six hypothesis a data sets on a single plot
Figure 4.2-1. Allocation of the FAAD C2I system MTBF target to the 3 categories
Figure 4.2-2. FAAD C2I allocated and predicted / achieved MTBFs
Figure 4.2-3. FBCB2 measured software MTBF
Figure 4.3-1. FAAD C2I on-the-move fire unit
Figure 4.3-2. FAAD C2I end-to-end timing thread
Figure 4.3-3. Distribution of port-to-port timing, without SAS and with SAS
Figure 4.4-1. Data for hypothesis d
Figure B-01. The Forward-Area Air Defense System
Figure B-02. OV-1 diagram for FAAD C2I
Figure B-03. OV-2 diagram for FAAD C2I
Figure C-01. FBCB2 computer mounted at the commander’s position in a Bradley Infantry Fighting Vehicle
Figure C-02. FBCB2 screen image
Figure C-03. Critical FBCB2 user functions
Figure C-04. Planning display image
Figure C-05. FBCB2 in a mobile command post
Abbreviations
A2C2 Army Airspace Command-and-Control
ABMOC Air Battle Management Operations Center
ACM Association for Computing Machinery
AFATDS Advanced Field Artillery Tactical Data System
ANSI American National Standards Institute
AWACS Airborne Warning and Control System
BGP-4 Border Gateway Protocol, number 4
BM/C3I Battle Management / Command, Control, Communications, and Intelligence
BFT Blue-Force Tracker
BPEL Business Process Execution Language
C4ISR Command, Control, Communications, Computers, Intelligence, Surveillance, and Reconnaissance
CCPDS-R Command Center Processing and Display System – Replacement
CMM Capability Maturity Model
CMMI Capability Maturity Model – Integration
C-RAM C2 Counter Rocket-Artillery-Mortar Command-and-Control
C/S Cost / Schedule
CSSCS Combat Service Support Control System
COCOMO Constructive Cost Model
CPAF Cost Plus Award Fee
CPIF Cost Plus Incentive Fee
DoD Department of Defense
DODAF Department of Defense Architectural Reference Framework
ERP Enterprise Resource Planning
FAAD C2I Forward-Area Air Defense Command, Control, and Intelligence
FBCB2 Force XXI Battle-Command, Brigade-and-Below
FFP Firm Fixed Price
hIPC Heterogeneous Inter-Process Communications (note: the lower-case ‘h’ is intentional)
HMMWV High-Mobility Multi-purpose Wheeled Vehicle
IEEE Institute of Electrical and Electronics Engineers
INCOSE International Council on Systems Engineering
IP Internet Protocol
ISO International Organization for Standardization
JCRV Joint Capability Release – Vehicles
JDN Joint Distribution Network
LNPL Lower Natural Process Limit
MANPADS Man-Portable Air Defense System
MTBF Mean Time Between Failure
NAS Network Architecture Services
OASIS Organization for the Advancement of Structured Information Standards
OPTEC Operational Test and Evaluation Command
OSPF Open Shortest Path First
OV Operational View
PATRIOT Phased Array Tracking and Intercept of Tracks
PEO Program Executive Officer
PMI Project Management Institute
SAS System Architecture Skeleton
SEI Software Engineering Institute
SERC Systems Engineering Research Center
SLAMRAAM Surface-Launched Advanced Medium-Range Air-to-Air Missile
TCP Transmission Control Protocol
THAAD Theater High-Altitude Area Defense
TSG Tactical Systems Gateway
TRW Thompson Ramo Wooldridge
UNAS Universal Network Architecture Services
UNPL Upper Natural Process Limit
Abstract
Many of the key products and services that improve the lives of people, and/or are
vital to the defense of our Nation, are the result of large-scale engineering projects.
Despite decades of theoretical and practical work in the art of systems engineering and
project management, project execution results remain somewhat inconsistent, in the sense
that many projects fail to produce a product that meets the original specifications, and
many more projects achieve some measure of technical success only after taking
significantly more time and/or money than originally expected.
Many of the most interesting and important such systems that are needed by
society, by their very size, require large teams to perform the necessary work. In any
large set of people, however, there will be a distribution of skills and capabilities across
the individuals forming the team. Yet current technical systems engineering design
techniques do not explicitly account for this distribution of personnel skill, and hence
provide no method of explicitly partitioning the work into “skill bins” whose
distribution might match the skill distribution of the team. Such a mismatch of job
demands against personnel skills is a potential root cause of project failures, schedule
delays, and cost over-runs.
The research described herein assessed (through case studies on large system
development programs) the potential of a candidate design-based improvement technique
for performing such a partitioning of the design / implementation of a complex system
into such skill bins. The specific technique examined aims to reduce unplanned and
adverse dynamic behavior in the resulting system through design-phase actions that
centralize control of the eventual system’s dynamic behavior, and implement that
centralization as a particular instance of a partitioning of the work. This approach could
lead to increased success on future major system development projects, through a better
matching of work to personnel, and better insight into one method for instituting better
control of the dynamic behavior of such a system.
The study examined a particular method of implementing the partitioning-by-skill
of the work on a large, complex system development project into separate portions,
where most are explicitly intended to be easier than average, while a small portion are
intended to be more difficult than average. The study showed that use of this technique
resulted in better project outcomes, as measured by an indicated set of metrics. This
particular method was examined first in one application problem domain, and then in four
additional application problem domains. This result is likely to be of value, given the
high failure rate still experienced on such system development projects.
Chapter 1: Background and Motivation
1.1 Background of the problem.
This dissertation describes a case study in the field of industrial systems
engineering. Many of the key products and services that improve the lives of people,
and/or are vital to the defense of our Nation, are the result of large-scale engineering
projects. Despite decades of theoretical and practical work in the art of systems
engineering for such projects, and in the art of project management, project execution in
fact remains inconsistent, in the sense that many projects fail to produce a product that
meets the original specifications, and many more projects achieve some measure of
technical success only after taking significantly more time and/or money than originally
expected [1]. In the extreme, many such projects are cancelled before completion, delivering no useful return on the (at times) significant expenditure made before cancellation [2].
Many of the most interesting and important such systems that are needed by
society, due to their size and complexity, require relatively large teams to perform the
necessary work. My own industrial experience is primarily on systems that required
teams of between 100 and 1,000 people, and took 3 to 7 years to develop. The members
of such teams specify the system, and validate that specification. They create a system-
level design, and validate that design through modeling, benchmarking, and other
techniques. They decompose the system into smaller functional entities; synthesize those functional entities into implementation entities (hardware and software); design, construct, and test all of those individual implementation entities; perform various forms of step-wise integration of those components, eventually resulting in the complete system; perform verification and validation on the system at various levels; and create the support infrastructure needed to train, operate, and maintain the system in its operational context.

[1] For example, (Glass 2001) cites data indicating that only about 16% of the system development projects that he surveyed were listed as successful by their own developers.
[2] The literature is full of examples. See, for example, (Glass 2001), (Flowers 1996), and (Wired 2008).
1.2 Statement of the problem
In any such large set of people, there is a distribution of skills and capabilities
across the individuals forming the team. Yet current technical systems engineering
design techniques do not explicitly account for this distribution of skill, and provide no
design-based method of explicitly partitioning the work into “skill bins” whose
distribution might match the skill distribution of the team. Implied in the hypothesis
(chapter 2) to be investigated by this study is the possibility that use of a design-based
technique that performs such a partitioning could improve system development
outcomes, manifested as fewer system development failures, schedule delays, and cost
over-runs.
I have developed such a technique to address this issue, in the context of a
specific class of systems, within a specific application problem domain in which I have
industrial experience. Through this industrial experience, I have applied this technique to
multiple large projects over a period of nearly 20 years, with some apparent success.
This doctoral dissertation will describe this problem, describe the technique I developed
and deployed to correct it, use the data from a number of these completed real-world
project applications to assess the efficacy of the technique (using research methods
grounded in the literature), and discuss its improvement, application, and potential
extension to additional problem domains.
There is prior work that recognizes the difference in skill level among system-
and (especially) software-development practitioners, but this work is focused almost
exclusively on the use of such a factor in determining the duration and cost of a
development effort. This is a thread of work that in some sense started at TRW in the
early 1970’s, when Robert Walquist asked Barry Boehm to develop a method to estimate
the cost of software for satellite systems. As Dr. Boehm tells it [3], Mr. Walquist indicated
that they had methods based on actual experience for estimating the cost and duration of
every other element of a satellite system, but the estimate for the entire software element
was based solely on “engineering judgment”, which Mr. Walquist described as
“essentially a guess”. This tasking led to the development by Dr. Boehm of COCOMO
(“constructive cost model”), which included factors related to the skill level of the
personnel involved; the “knob settings” concerning personnel would influence the effort
required (estimated by the model in terms of man-months) and the schedule required
(estimated by the model in terms of calendar months) [4]. A number of successors and
competitors to COCOMO have since emerged, many of which have been extended to estimate many elements of a system’s development cost, not just the software portion of the system, e.g., systems engineering, integration, test, training, and so forth.

[3] Personal communication, 2009.
[4] See (Boehm 1981).
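To make these “knob settings” concrete, here is a minimal sketch in the style of the 1981 Intermediate COCOMO model; the coefficients and capability multipliers are the commonly published values for that model, quoted here for illustration rather than taken from (Boehm 1981) directly, so treat the numbers as approximate:

```python
# Illustrative Intermediate-COCOMO-style estimate. Coefficients and
# multipliers are the commonly published 1981 values (approximate).

def nominal_effort(kloc: float) -> float:
    """Nominal person-months for a 'semi-detached' project: 3.0 * KLOC^1.12."""
    return 3.0 * kloc ** 1.12

# Two of the personnel "knobs": analyst capability and programmer capability.
ACAP = {"very_low": 1.46, "nominal": 1.00, "very_high": 0.71}
PCAP = {"very_low": 1.42, "nominal": 1.00, "very_high": 0.70}

def effort(kloc: float, acap: str, pcap: str) -> float:
    """Estimated person-months, scaled by the personnel multipliers."""
    return nominal_effort(kloc) * ACAP[acap] * PCAP[pcap]

if __name__ == "__main__":
    for team in ("very_low", "nominal", "very_high"):
        pm = effort(100.0, team, team)  # a hypothetical 100-KLOC system
        print(f"{team:>9} capability team: {pm:7.1f} person-months")
```

For a 100-KLOC system this spans roughly 260 to 1,080 person-months, about a factor of four. The point made in the text is that such models run “forward”, from assumed staff skill to estimated cost and schedule; they provide no “backward” path from a fixed staff to a matching partitioning of the work.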
But neither COCOMO, nor any of its many successors and competitors, has ever looked at the problem the other way around – i.e., given a particular set of people with a
particular set of skills and limitations, how can one use the design process to create
partitions of work at different skill levels, so that one can have an explicit basis for
allocating these people to tasks in a way that can increase the chance of success (this is
not a defect of COCOMO and its competitors; this was not their purpose). Managers
have always tried to assign personnel to tasks that they believe match the skill levels and
abilities of those personnel, but the literature is apparently silent on the question of using
the design process itself to create partitions of work at different skill levels to aid this
process. The value of such an approach, if it were feasible, would be to lessen the
dependence on judgment as the sole basis regarding what work should be assigned to
which people, through the creation of segments of work known to be harder (or at least,
to require certain specific, and perhaps rare, skills), and thereby to decrease the average skill level required in the remainder (and hopefully, the majority) of the program technical tasks.
An analogy to the well-known “Pareto principle” can be drawn here; a Pareto diagram [5] is a histogram of the relative contribution of elements to a total effect; for example, a Pareto diagram might bin various root-causes of problems reported with a product. The Pareto principle is that, in general, a large portion of the effect is caused by a small portion of the elements. The implication for the matter under discussion herein is that, in many system development efforts, a large portion of the complexity and difficulty may come from a relatively small portion of the total effort, and that an endeavor – grounded in the design process – to separate the more-difficult from the merely routine elements of the work can form a basis for isolating much of that complexity and risk into a relatively small portion of the development tasks, which can then be assigned to a small, but suitably skilled, portion of the team.

[5] See (Juran 1951).
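As a toy numerical illustration of such a diagram (the root-cause categories and counts below are invented, not data from the cases):

```python
# Toy Pareto computation over invented root-cause counts for problem reports.
causes = {
    "inter-process timing / dynamic behavior": 140,
    "resource contention under load": 95,
    "interface format mismatches": 30,
    "algorithm defects": 20,
    "user-interface defects": 10,
    "documentation errors": 5,
}

total = sum(causes.values())
cumulative = 0
# Sort categories from largest to smallest contribution, then accumulate.
for name, count in sorted(causes.items(), key=lambda kv: -kv[1]):
    cumulative += count
    print(f"{name:42s} {count:4d}   cumulative share: {cumulative / total:5.1%}")
```

In this invented example, two of the six categories account for nearly 80% of the reports; the argument in the text is that difficulty on large projects concentrates in the same way, and can be gathered by the design process into a small number of work packages.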
In this dissertation, the subject of the hypothesis is one specific such technique
that I developed in the late 1980’s and early 1990’s, and applied to a series of major
Department of Defense system-development programs (that shared a set of system-level
characteristics that will be described) at and since that time. In these systems, disparate
devices, processes, and people are being brought together to perform in new and more
useful configurations. These large-scale systems display complex emergent behaviors [6].
These include complex dynamic behavior – behavior that changes over time, in response
to various internal and external stimuli. In large systems, the number of and variation in these stimuli, and the large number of possible internal states, may give rise to combinations of dynamic behavior that have undesirable effects – adversely affecting, for example, timing, capacity, or reliability. It is desirable to control system dynamic behavior through the design and implementation process, so that there is little such “unplanned dynamic behavior”.

[6] The following defines what is meant by ‘emergent behavior’ in the context of such a system: “The whole (of a system) is greater in some sense than the sum of the parts, that is, the system has properties beyond those of the parts. Indeed, the purpose of the building systems is to gain those properties” (Rechtin 1991).
The specific “hard” portion of the system design and implementation effort for the
systems that will be considered herein, therefore, is the problem of controlling and
managing system dynamic behavior. This will be seen to be a key element of system
design defects and difficulties. A technique to address this difficulty will be described,
one that centralizes the control and management of the system’s dynamic behavior during
system operation, through a series of actions taken during the design and implementation
phases of the system’s development. The goal of this centralization is to partition the
design and implementation work, so that only a small portion of the development team
need be expert at the task of implementing this mechanism for controlling the system’s
dynamic behavior.
This specific technique involves the use of a certain class of “middleware”, used
to implement a particular system development, performance-prediction, and system
integration methodology. I will demonstrate that these techniques isolate one specific
type of difficult technical problem for these systems into a relatively small portion of the
implementation. This thereby provides a basis for allocating work to specific individuals
in a manner that allows the assignment of work of average (or less) difficulty to the
majority of the team (who, of course, will, on average, have average-or-less skill). The
specific hypotheses (chapter 2) simply state that such matching of degree-of-difficulty to
skill will increase the chance of program success.
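The dissertation’s actual middleware, the “system architecture skeleton” (SAS) listed among the abbreviations, is described in chapter 2. Purely to illustrate the division of labor being claimed here, the following is a minimal sketch, with all names invented, in which one central layer owns every run-time control decision (routing, queue depths, pacing), and the application components contain only routine logic:

```python
import queue
import threading
import time

class ControlLayer:
    """Invented stand-in for the centralized middleware: it alone owns
    routing, queue depths, and pacing, i.e., the system's dynamic behavior."""

    def __init__(self, max_queue: int = 64, min_interval_s: float = 0.001):
        self._queues = {}
        self._max_queue = max_queue
        self._min_interval_s = min_interval_s

    def register(self, name):
        self._queues[name] = queue.Queue(maxsize=self._max_queue)

    def send(self, dest, message):
        time.sleep(self._min_interval_s)   # centralized flow control (pacing)
        self._queues[dest].put(message)    # blocks (back-pressure) if saturated

    def receive(self, name):
        return self._queues[name].get()

def sensor(bus):
    # "Routine" component: application logic only; no timing, queueing,
    # or routing code. All of that behavior comes from the ControlLayer.
    for i in range(3):
        bus.send("display", f"track {i}")
    bus.send("display", None)              # sentinel ending the demo

def display(bus):
    while (msg := bus.receive("display")) is not None:
        print("display got:", msg)

if __name__ == "__main__":
    bus = ControlLayer()
    bus.register("display")
    worker = threading.Thread(target=display, args=(bus,))
    worker.start()
    sensor(bus)
    worker.join()
```

Only the small team that builds the ControlLayer needs to reason about saturation, pacing, and ordering; everyone else writes components like sensor and display.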
1.3 Summary of the approach
The work takes the form of an observational case study. Drawing on my personal
and professional experience, informed by the literature, I scope the problem, and then
formulate specific, testable hypotheses. Drawing upon methodology in the literature, I
then formulate measurement instruments and a case study protocol. A few specific
aspects of the study take the form of a quasi-experiment; drawing upon methodology
from the literature, I define the risks to the study, and the methods to assess those risks.
Using data drawn from ten real-world projects and personal / professional experience, I collect and organize the case documentation. I then apply the measurement instruments to the cases. I apply tests drawn from the methodological literature to assess the
validity of the analyses. Finally, I formulate an interpretation of what the cases indicate
about the hypotheses.
1.4 Importance of the topic
Insight that provides understanding of some root causes of common sources of
development difficulties on large-scale, software-intensive systems development projects
(to say nothing of solutions for some of those root causes) would be valuable to
practitioners, and valuable to a society that depends on the products and systems
produced by those practitioners. Moreover, as noted by (Ramo & Booton 1984), there is
a major trend towards an “increase in the complexity of systems being routinely
developed”; thus, such improvements in system development practice are needed by a
society that depends on these increasingly-complex systems.
The application problem domain selected – tactical military command-and-
control, and military decision-support systems – is appropriate because (a) the domain is
one that contributes to National security – and hence, improving performance in this
domain serves the public purpose of improving National security; (b) the domain is one
that involves the expenditure of significant amounts of public funds – billions of dollars
every year – and hence, improving performance in this domain serves a significant public
fiscal purpose; and (c) there may be reason to believe that the results hereby obtained
could be applied to other application problem domains, and thereby, provide additional
benefits to society.
The problem of achieving consistent success on large-scale, complex system
development projects is clearly a difficult topic, and one that is worthy of study; the
continuing occurrence of failures on large, complicated system development efforts
indicates that the problem is difficult (else such failures, which at times can cost billions
of dollars in a single instance, would no longer occur). It is worthy of study because such
complicated systems are important for society, and the failures can involve the loss of
significant amounts of public funds. Even the narrow, specific approach to this problem
proposed in this study, e.g., centralizing responsibility for management of dynamic
behavior, is a hard problem: a large portion of the system may be involved in dynamic
behavior, yet the goal is to limit the control and mechanization of that dynamic behavior
to only a small portion of the implementation. This requires that most system
components be able to “inherit” a set of control behaviors without having specifically to
implement them, or even be aware of them. Furthermore, not all dynamic behavior can
be predicted and planned during initial design (especially as most systems these days are
implemented in “spirals” [7] of incremental capability), and thus the centralizing process
needs to allow for flexibility and adaptability, and to do so without having these
adaptations “ripple” into large portions of the design and implementation. This involves
having a function with a high level of internal complexity (e.g., controlling dynamic
exhibit a low level of external complexity (e.g., the process through which the remaining
system components “inherit” these controls).
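Here is a minimal sketch of what such “inheritance without reimplementation” could look like (the class names and the pacing policy are invented for illustration): the internally complex control logic lives in one base class, and changing its policy adapts every component’s dynamic behavior without any ripple into the components themselves:

```python
import time

class ControlledComponent:
    """All control behavior lives here; subclasses never see it."""
    max_rate_hz = 100.0   # class-level policy: one change here adapts
                          # the dynamic behavior of every component

    def __init__(self):
        self._last_step = 0.0

    def run_once(self):
        # Internally complex control (pacing here; fault handling, state
        # reporting, and the like would also live here). Externally simple:
        # the subclass is just called at the right time.
        wait = (1.0 / self.max_rate_hz) - (time.monotonic() - self._last_step)
        if wait > 0:
            time.sleep(wait)
        self._last_step = time.monotonic()
        self.step()

    def step(self):
        raise NotImplementedError  # the only thing a component author writes

class FuelGauge(ControlledComponent):
    def step(self):
        print("fuel level nominal")   # routine application logic only

if __name__ == "__main__":
    gauge = FuelGauge()
    for _ in range(3):
        gauge.run_once()
```

Changing max_rate_hz (or whatever the real policy is) adapts the dynamic behavior of every component, while each component author writes only step().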
1.5 Definition of terms
Case study – A method of performing research, preferred when the investigator
has little control over events. It draws on direct observation, data collection,
and/or interviews of participants, in addition to assessment of data. A case study
design specifies what questions to study, what data are relevant, what data to
collect, and how to analyze the results [8].
Case study protocol – A case study protocol is more than a data collection
instrument; per (Yin 1994) “the protocol contains the instrument, but also contains
the procedures and general rules that should be followed in using the instrument”.
Quasi-experiment – According to (Cook & Campbell 1979), the term quasi-
experiment has come into use to denote “experiments that have treatments,
outcome measures, and experimental units, but do not use random assignments to
create the comparisons from which treatment-caused change is inferred”. Not all situations lend themselves to the use of randomization; critical elements might not, for example, be under the control of the experimenter, and therefore, the option of random assignment might not be available. Earlier works, e.g., (Campbell & Stanley 1963), use the term ‘pre-experimental’, rather than ‘quasi-experiment’; they distinguish this from ‘true experimental’ designs, e.g., those that include the use of randomization.

[7] See (Boehm 1988) for a description of the spiral model.
[8] Drawn from (Yin 1994).
Hypothesis – herein, a hypothesis is simply the key question whose correctness
(or lack thereof) is being assessed via the case study methodology. It is framed as
an affirmative statement, and therefore susceptible of a “yes / no” answer, but it is
also desirable to draw out narrative (in addition to quantitative analysis) that can provide insight into the “hows” and “whys” of the
phenomena.
System -- An integrated composite of people, products, and processes that
provide a capability to satisfy a stated need or objective [9]. Per (Rechtin 1991), a
system is a set of parts that displays emergent behavior, through the interaction
amongst those parts.
Large-scale system – A system whose creation is a significant undertaking, in
terms of time, number of participants, and funding. In terms of absolute scale,
this might mean upwards of 1,000 man-years of effort.
[9] A U.S. Department of Defense definition, quoted from (Jackson & Hines 2009).
Project – A project is a temporary endeavor undertaken to create a unique
product, service, or result [10].
Phases of a system-development project – The literature cited by this study
provides guidance and discussion of various aspects of the problem of developing
large-scale, complicated systems. All of the citations adopt an approach of
dividing the activity of developing such a system into separate, approximately
sequential steps, which are often called “phases” of the development activity.
They typically include activities such as requirements definition, design,
implementation, integration, test, operations and support, and so forth. The
phases may be assembled into an actual work-plan in a variety of ways:
sequentially (usually called “waterfall”), iteratively (for example, Boehm’s
“Spiral Method” [11]), and others.
Design phase of a project – That portion of the system-development activity
concerned with translating the requirements (e.g., what the system must do) into a
specific and tangible method of accomplishment (e.g., how the system will
perform so as to meet the requirements). The design phase includes steps such as
functional analysis and decomposition, synthesis, performance prediction and
design validation, and so forth.
[10] Quoted from Ursula Knopp-McKendree, from her class notes for ISE-515, Introduction to Project Management, 2008. She attributed this definition to the Program Management Body of Knowledge.
[11] Described in (Brooks 2010), among other places.
Implementation phase of a project – That portion of the system-development
activity concerned with constructing the elements of the system, in accordance
with the boundaries, strategies, logic, and other factors defined during the design
phase.
Design-based technique – A methodology to achieve a goal whose
accomplishment is based on activities performed during the design-phase of a
project. Herein, this phrase is used specifically to refer to a technique to
centralize the control of the dynamic behavior of a system through actions that
largely are taken during the design phase of the project, rather than during the later phases of the project.
Dynamic behavior of a system – As defined above, a system is a set of parts that
displays emergent behavior, through the interaction amongst those parts. These
emergent behaviors include dynamic behavior – that is, behavior that changes
over time, in response to various stimuli and state transitions.
Unplanned dynamic behavior – In large systems, the number of and variation in these stimuli, and the large number of possible states, may give rise to combinations of dynamic behavior that were not planned for in advance. These constitute
unplanned dynamic behavior. Such unplanned dynamic behavior could have
either desirable or undesirable effects – for example, an adverse effect on timing,
capacity, or reliability. To avoid such undesirable unplanned dynamic behavior, it
is desirable to control system dynamic behavior through the design and
implementation process.
Skill bins – One of the ideas being investigated in this study is the idea of using
the design process to centralize the control mechanisms for influencing the
dynamic behavior of a large-scale system. This is a specific instance of the more
general idea of using the design process to partition the implementation of a system into separable units of work with identifiable harder and easier portions, or at least, to segregate those units that require scarce or special skills that might not be available in the general engineering population. These partitioned
separable units of work are referred to herein as skill bins.
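As a toy illustration of the bookkeeping that this definition implies (all task names, labels, and headcounts are invented): the output of the design process is a set of work units already labeled by required skill, which can be checked directly against the team’s skill distribution:

```python
# Toy skill-bin bookkeeping; all names and numbers are invented.
from collections import Counter

tasks = {
    "centralized control of dynamic behavior": "scarce",   # the hard bin
    "message format converters": "routine",
    "map display symbology": "routine",
    "report generation": "routine",
    "database loaders": "routine",
}

team = {"scarce": 1, "routine": 9}      # a ten-person team with one expert

demand = Counter(tasks.values())        # tasks required per skill bin
for skill in demand:
    print(f"{skill:8s}: {demand[skill]} task(s) vs. {team[skill]} staff")
```

The dissertation’s claim, assessed in the later chapters, is that emitting such labels from the design process itself, rather than from managerial judgment alone, improves project outcomes.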
1.6 Survey of the literature
The relevant literature is reviewed in eight sections, as follows, in order to
establish a baseline of knowledge, and to identify gaps in the literature that can be
addressed in this work:
Understanding the problem domain of the first four cases
Understanding the problem domains of the remaining cases
Understanding the independent variables
Problems in the system development process
Understanding the dependent variables
The cases and their data
Methodology
Other sources and references
1.6.1 Understanding the problem domain of the first four cases
As noted above, the first four cases are all drawn from the same application
problem domain, that of large Government tactical command-and-control system
acquisitions. The following are key topics and literature that address this problem
domain.
C4ISR systems, large-scale projects
(DoD 2004) U.S. Department of Defense 2004, “Net-Centric Operations
and Warfare Reference Model (NCOW RM)”,
http://akss.dau.mil/dag/Guidebook/IG_c7.2.1.4.asp and “Net-Centric
Enterprise Solutions for Interoperability 2005, Part 2: ASD (NII) Checklist
Guidance, Net-Centric Enterprise Solutions for Interoperability”,
http://nesipublic.spawar.navy.mil/files/NESI_Part2_v1_1.pdf
(DODAF 2010) U.S. Department of Defense, DoD Architecture Reference
Framework, version II, 2010.
(Hughes 1998) Hughes, Thomas P., Rescuing Prometheus, Vintage Books,
1998.
(Madni & Moini 2007) Madni, A.M., and Moini, A. “Viewing Enterprises
as Systems-of-Systems (SoS): Implications for SoS Research,” Journal of
Integrated Design and Process Science, Vol. 11, No. 2, June 2007.
Software-intensive systems
(Browning 2001) Browning, Tyson R., “Applying the Design Structure
Matrix to System Decomposition and Integration Problems: A Review and New Directions”, IEEE Transactions on Engineering Management, volume 48, number 3, 2001.
(Browning 2008) Browning, Tyson R., “The Many Views of a Process:
Toward a Process Architecture Framework for Product Development”,
Published Online: 2008, Wiley InterScience
(www.interscience.wiley.com)
(Cockburn 2007) Cockburn, Alistair, Agile Software Development,
Addison Wesley, 2007.
(Dijkstra 1988) Dijkstra, Edsger, “On the Cruelty of Really Teaching
Computer Science”, an open letter, 1988.
(Haberfellner & de Week 2005) Haberfellner, R and de Week, O, “Agile
SYSTEMS ENGINEERING versus AGILE SYSTEM engineering”,
Proceedings of the Fifteenth Annual International Symposium of the
International Council on Systems Engineering (INCOSE), 2005.
(Medvidovic 2003) Medvidovic, Nenad; et al. “The Role of Middleware in
Architecture-Based Software Development”, International Journal of
Software Engineering and Knowledge Engineering, 2003.
(Medvidovic 2010) Medvidovic, Nenad, “Software Architecture and
Mobility: A Perfect Marriage or an Uneasy Alliance?”, lecture (with
briefing charts) delivered 14 September 2010.
Project management, systems engineering
(ANSI 2004) American National Standards Institute, A Guide to the
Project Management Body of Knowledge, Project Management Institute,
3rd edition, 2004.
(Austin et al) Austin, Simon; Newton, Andrew; Steele, John; & Waskett,
Paul, “Modeling and Managing Project Complexity”, Loughborough
University.
(Bertelsen & Koshela 2004) Bertelsen, Sven & Koskela, Lauri, “Avoiding
and Managing Chaos in Projects”, 2004.
(Clements et al 2003) Clements, Paul; Bachmann, Felix; Bass, Len;
Garlan, David; Ivers, James; Little, Reed; Nord, Robert; Stafford, Judith;
Documenting Software Architectures, Addison-Wesley, 2003.
(INCOSE 2007) INCOSE, Systems Engineering Handbook, August 2007
(Meredith & Mantel 2006) Meredith, Jack R. & Mantel, Samuel J., Project
Management, John Wiley & Sons, 2006.
(Pich et al 2002) Pich, Michael T.; Loch, Christoph H.; & De Meyer,
Arnoud, “On Uncertainty, Ambiguity, and Complexity in Project
Management”, Management Science, 2002.
(PMI 2004) Project Management Institute, A Guide to the Project
Management Body of Knowledge, ANSI standard ANSI/PMI 99-001-
2004, Project Management Institute, 2004.
(Ramo & Booton 1984), Ramo, Simon & Booton, Richard, “The
Development of Systems Engineering”, IEEE Transactions on Aerospace
and Electronics Systems, July 1984.
(Ramo & St. Clair, 1998) Ramo, Simon; and St. Clair, Robin, The Systems
Approach, KNI, 1998.
(Rechtin 1991) Rechtin, Eberhardt; Systems Architecting, Prentice Hall,
1991
(Ring & Madni 2005) Ring, J. and Madni, A.M. “Key Challenges and
Opportunities in ‘System of Systems’ Engineering,” Proceedings of the
2005 IEEE International Conference on Systems, Man, and Cybernetics,
October 10-12, 2005, Hawaii.
Government acquisition & development standards
(DoD 2001) U.S. Department of Defense, Systems Engineering
Fundamentals, Defense Acquisition University Press, January 2001.
(GSA 2005) The Federal Acquisition Regulation, United States General
Services Administration, 2005.
Industry development standards
(Northrop 2008) Northrop Grumman Mission Systems, Engineering for
Integration Guidance, 2008
(Northrop 2009) Northrop Grumman Mission Systems, Standard Process
Manual, revision of April 2009
(TRW 1976) TRW Incorporated, Systems Engineering & Integration
Division Software Development Standards (training manual), TRW, 1976
(TRW 1989) TRW, Ada Process Model, TRW Systems Engineering and
Development Division, 1989
(TRW 1994) TRW, Army Systems Organization Engineering Process
Document, 1994
1.6.2 Understanding the problem domains of the remaining cases
As noted above, the remaining cases are drawn from four separate application
problem domains. The following literature provides background on these problem domains.
Large-scale information storage and retrieval
(Apers & Weiderhold, 1989) Apers, Peter M.G. & Weiderhold, Gio
(editors), Proceedings 1989 International Conference on Very Large Data
Bases, Morgan Kaufman Publishers, 1989.
(Veeravalli & Barlas 2006) Veeravalli, Bharadwaj; & Barlas Gerassimos,
Distributed Multimedia Retrieval Strategies for Large Scale Networked
Systems, Springer, 2006.
Logistics automation
(Horn et al 1998) Horn, Will H.; Clark, Dorothy M.; Browne, John W., Jr.;
& Bienlien, James P., Handbook for Army Logistics Automation ,
Logistics Management Institute, 1998.
(Silver et al 1998) Silver, Edward A.; Pyke, David F.; & Peterson, Rein,
Inventory Management and Production Planning and Scheduling, John
Wiley & Sons, 1998.
Radar
(Gould 1972) Gould, Karl; Understanding Radar, privately published,
1972.
Air Traffic Control
(Nolan 1999) Nolan, Michael S., Fundamentals of Air Traffic Control,
Wadsworth Publishing Company, 1999.
1.6.3 Understanding the independent variables
1.6.3.1 The system design process / dynamic control of systems
The literature reviewed in this area is of three broad sorts: (a) standards and
procedure documents (from standards organizations, from the Government, and from
private industry); (b) books and articles about the system design process; and (c)
academic papers on elements of the system development process related to this study
(e.g., about reliability, resilience, etc.).
The first category of literature reviewed in this area, the standards and procedure
documents from standards organizations, from the Government, and from private
industry, is fairly consistent. They are largely procedural guides – sequences of steps to
be performed, the rationale for those steps, the products of each step, criteria for
discerning whether each step is done well, and examples of representations or analyses to
be produced. Some cover systems engineering procedures; some cover software
engineering procedures; some cover both. According to (Hughes 1998), credit for the
origins of modern systems engineering as a formal, rigorous discipline belongs largely to
Dr. Simon Ramo, founder of TRW, in conjunction with his leadership of the development
of the U.S. intercontinental ballistic missile in the 1950’s. These origins are discussed in
(Ramo & Booton 1984), while (Ramo and St. Clair, 1998) discusses Dr. Ramo’s
philosophical underpinnings for what he terms the “systems approach”. (Hughes 1998)
and (Neal 1962) describe the original application of this systems approach to the
intercontinental ballistic missile program, for which Dr. Ramo served as the initial
program manager.
The earliest systematic approach to software development policies and procedures of which I have become aware was undertaken by TRW in the mid-1970’s; the resulting artifact is (TRW 1976).
The examples of standards and procedures documents reviewed herein are
suitable for this study, because they represent (a) the principal customer for whom the
work in the case studies was performed, (b) the prime contractor who performed that
work, and (c) the principal standards bodies used by this segment of the industry.
Both Government and industry (separately, but also together, via standards bodies
such as INCOSE [12], ACM [13], IEEE [14], etc.) have widely adopted this approach of codifying desired policies and procedures for performing systems engineering and software development; examples include (ANSI 2004), (INCOSE 2007), (Northrop 2009), (OASIS 2007), (PMI 2004), (TRW 1976), (TRW 1989), (TRW 1994), (DoD 2001), and (DoD 2004). As noted in (Boehm 2010), the inclusion of large amounts of software in most of today’s complicated systems has created a movement to blur and/or combine these two disciplines. Other recent trends are the inclusion of (or reference via link to) example artifacts [e.g., (Northrop 2009)], and a standardization on a set of “views”, e.g., representations to be created by the specified analyses; earlier approaches had attempted to standardize on the tools to be used for these analyses, rather than on the representations, but it is now generally recognized that tools become obsolete too fast to form a suitable basis for standardization. The result is the adoption instead of standard views [e.g., (Northrop 2008), (DODAF 2010), etc.].

[12] International Council on Systems Engineering; see http://www.incose.org/.
[13] Association for Computing Machinery; see http://www.acm.org/.
[14] Institute of Electrical and Electronics Engineers; see http://ieee.org/index.html.
Early process guidance documents in this area [e.g., (TRW 1976)] tended to be
content with specifying the steps and products desired. Another important trend is to
attempt to specify the goals of each activity, so as to provide clarity of purpose, and to
permit tailoring and adaptation of the process guidance to each business situation. For
example, (Northrop 2009) and (TRW 1994) define a set of steps or phases for systems
engineering and software development, and then for each such phase, specify goals, such
as these for the design phase of a system development effort:
Develop the top-level execution architecture for systems
Define the top-level system organizational structure
For each entity within the structure, develop threads, define interfaces, develop
design representations, and prototype critical design items
Verify the overall structure and flow
Check constructs for correctness
The second category includes books and articles that discuss the system / software
development process, either as a general discipline, or as descriptions of specific activities
[such as (Hughes 1998) and (Neal 1962), already cited above].
The third category – academic papers – includes philosophical ideas and
experimental results that the authors hope will guide future developments of the process
guidance documents, and future actual practice. (Dijkstra 1988), (Friedman 2005), and
(Rechtin 1991) are examples of this; I will discuss these in more detail below.
Somewhat in between categories 2 and 3 are (Brooks 1975) and (Brooks 2010).
While not academic in orientation, these are serious and important works in the field by
someone who has been both a highly-effective practitioner and an academic. He is
concerned with how to achieve good designs for our systems. Whereas (Brooks 1975)
focuses on tools and processes, by the time the author wrote (Brooks 2010), he was
concerned almost entirely with good designers, and how to grow them. This is consistent with my own industrial experience; in (Siegel 2010) I emphasize the role of good designers, and cast the role of process as achieving repeatability and scale, but not as being, in and of itself, capable of leading us to good designs.
The literature exposes key threads of thought in the field, some of which conflict.
Attempting to resolve one aspect of this conflict leads directly to my own proposed
contribution. I will try to explain.
What is the key philosophy of systems engineering? (Jackson & Hines 2009) and
(Jackson 2009) answer “hierarchies”: the system itself is decomposed into a hierarchy;
the implementation takes place via this hierarchy (e.g., most of the implementation time
is spent building “small” components that will later be integrated to realize the system);
the process of managing the creation of the system is a hierarchy [think of the “V”
diagram, used in (Jackson & Hines 2009) and many other places: “down” to perform
decomposition, and then “up” again to integrate]; most of the accompanying artifacts
(e.g., specifications, test documents, etc.) are created in hierarchies; and so forth.
(Ramo and Booton 1984) disagree – they say that the key philosophy of systems
engineering is thinking about the system as a whole: “ . . . the design of the whole as
distinguished from the design of the parts”. To them, the key role is for someone to step
forward and be the advocate for the behavior that the system will display as a whole, to
take responsibility for figuring out what this behavior needs to be, and then ensuring that
the activities executed on the project will lead (in some sense that can be validated) to
achieving this system-level behavior. They see hierarchies not as the essence, but as a
tool. (Rechtin 1991) adds to this the insight of emergent behavior as the key
characteristic of a system, and thereby, provides the systems engineer with a tangible
system-level behavior with which to concern him or herself.
This is not the only philosophical disagreement in the literature; another concerns
how to proceed. As indicated above, (Jackson & Hines 2009), (TRW 1976), (Ramo &
Booton 1984), and (Ramo & St. Clair, 1998) among many others, emphasize
decomposition as the principal technique by which to design and implement a system.
This has proven to have a lot of merits; for example, it does seem to allow people to work
past what in my own writings I have termed the “platitude barrier”, wherein a team
reaches consensus on a top-level goal, but has a difficult time determining what tangible
steps should be undertaken in order actually to realize that goal. Decomposition seems to
be reasonably effective at working past that point. The problem, in the view of some, is
that decomposition allows, or even encourages, the systems engineer to place his (or her)
focus on something other than the behavior of the system as a whole. This school of
thought, as exemplified by (Dijkstra 1988) and (Friedman 2005), instead advocates an
approach based on detailed mathematical modeling of the complete system, rather than
on decomposition; Dr. Friedman even speaks of the need to “declare independence from
functional decomposition” [15]. Their view is that systems display complex, high-
dimensional behavior, and that decomposition forces the systems engineer to use
judgment and/or intuition to “shed” dimensions as they undertake a sequential set of low-
dimensional analyses in order to understand and predict eventual system behavior, rather
than trying to model the entire system in a single analysis. The problem, as they see it, is
that in the absence of a method to guide such shedding, people will choose poorly in
designing such a sequence of low-dimensional problems to analyze, and the result will be
15
Personal communication, 27 September 2010
25
a poor predictor of the behavior of the eventual system. Their view that there is no
substitute for complete, high-dimensional modeling. This is an interesting and
compelling vision, but in my view there are two issues associated with it: (a) there are not
nearly enough skilled practitioners at present to do this consistently across all major
systems developments; and (b) the process and representations involved are inherently
complex, and do not provide transparency to all of the stakeholders.
With regard to (a), the situation is probably correctable. (Dijkstra 1988) is
precisely a plea to change computer-science education so that all computer scientists are
adept in this approach (although I observe that this call, despite Dijkstra’s stellar
reputation, seems to have been ignored), and I would guess that the 82-year-old Dr.
Friedman is still teaching exactly so as to increase the number of systems engineers who
are adept at his mathematical modeling approach, and to guide graduate students who are
extending his theory and approach.
Unfortunately, (b) may be a harder problem. The job of the systems engineer is
inherently a mixture of technical and social aspects. On the technical side, the systems
engineer is responsible for defining the needed system-level behavior of the resulting
product, and for ensuring that the design and implementation process lead to the creation
of those behaviors. On the social side, the systems engineer is responsible for creating
consensus around those goals. To accomplish this, the system engineer must use
persuasion, which is aided by tangible artifacts that convey and express key messages,
that is, by artifacts that are transparent enough to be understood not just by expert
practitioners, but by the large number of non-technical stakeholders usually involved in a
system development effort. It is my view that, unfortunately, at least at present, the
representations of whole-system mathematical models are too complex to be effective for
this important purpose (a point Dr. Friedman also recognizes: such “modeling has
fundamental problems; valuable (yet) anti-intuitive results are suspect”; SAE-542 class
notes, 4 October 2010). Over time, this may be addressable via the creation of a set of
what Dr. Friedman terms “projections”, which select a set of dimensions from the high-
dimensional math model, project the high-dimensional object into those dimensions, and
thereby create a lower-dimensional view of the system model that can be used to
illustrate some particular feature or characteristic of the model.
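To make the notion of a projection concrete, the following minimal sketch (in Python; the dimension names and data are invented for illustration, and are not drawn from Dr. Friedman's theory) selects two named axes out of a five-dimensional model, producing the kind of lower-dimensional view described above:

    import numpy as np

    # Invented example: a system "model" sampled as points in a
    # high-dimensional space, one named axis per modeled quantity.
    dimensions = ["ingest_rate", "queue_depth", "cpu_load", "latency", "error_rate"]
    samples = np.random.default_rng(0).random((1000, len(dimensions)))

    def project(samples, dimensions, keep):
        """Project model samples onto a chosen subset of named axes."""
        columns = [dimensions.index(name) for name in keep]
        return samples[:, columns]

    # A two-dimensional "view" suitable for discussion with stakeholders:
    view = project(samples, dimensions, keep=["ingest_rate", "latency"])
    print(view.shape)  # (1000, 2)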
(Rechtin 1991) tries to get out of this dilemma through the use of heuristics to
convey and communicate the top-level goals of the system, and also to guide the systems
engineer through the key decisions necessary to define and decompose the system. While
having great respect for Dr. Rechtin (whom I knew), I respectfully disagree with his faith
in heuristics for this purpose. In my own work, I have found that heuristics are effective
after-the-fact in conveying what was decided, but during the actual engineering process
are a poor guide for decision-making; there are too many potential heuristics [I myself
created several in (Siegel 2009 a) ], and in the end, one needs to find other methods of
guiding decision-making.
This starts to get closer to my own proposed work: (Rechtin 1991) points out the
central role of emergent behavior in a system, much of which is associated with
dynamics; (Siegel 1993 b) points out that not all of this emergent behavior is desirable.
In fact, my own lesson-learned is that explicitly defining the desirable emergent behavior,
and structuring the control logic of the system to support that behavior and no other
(which might be undesirable emergent / dynamic behavior) is a key systems engineering
role. (Dijkstra 1988) and (Friedman 2005) propose to accomplish this work through their
high-dimensional mathematical models, but as I have noted, there may not be enough
practitioners always to do this, and the accompanying representations are weak for the
social purposes of systems engineering. Most of the literature is content with weaker
methods. This is a gap that I will try to address in this work, through the definition of
what I believe is a feasible and effective way to identify which portions of system
behavior can effectively be analyzed separately (e.g., the sequence of low-dimensional
models to which reference was made in the previous chapter), and which portions must
be kept unified; this is my attempt to avoid the problem cited by (Friedman 2005) that
without such guidance, people will make poor decisions about separating the dimensions
for modeling. The key to my approach, however, is to identify a core of the system that
is responsible for dynamic behavior that I treat as indivisible, and in fact, do not attempt
to model; I prototype and exercise it, obtaining benchmarks and insight through
instrumentation. I define this core through a method that starts with the identification of
every independently-schedulable entity within the system. This is the system architecture
skeleton discussed in chapter 2.
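A minimal sketch of the flavor of this identification step follows (in Python; the entity names and classification flags are hypothetical, invented for illustration, and the actual method, described in chapter 2, operates on the full set of independently-schedulable entities in a real design):

    # Hypothetical illustration: enumerate the independently-schedulable
    # entities (processes, threads, interrupt handlers, etc.), and flag
    # those that participate in system-wide dynamic behavior.  The flagged
    # set forms the indivisible "core" that is prototyped and instrumented,
    # rather than modeled; the rest can be analyzed separately.
    entities = [
        {"name": "track_correlator", "touches_dynamics": True},
        {"name": "message_router",   "touches_dynamics": True},
        {"name": "map_display",      "touches_dynamics": False},
        {"name": "report_formatter", "touches_dynamics": False},
    ]

    core      = [e["name"] for e in entities if e["touches_dynamics"]]
    separable = [e["name"] for e in entities if not e["touches_dynamics"]]

    print("prototype and instrument as one unit:", core)
    print("candidates for separate low-dimensional analysis:", separable)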
Another key point in relation to this study is the fact that although the role of
people is recognized in several of these references, none of them link the role of people to
tangible steps that can or should be taken during the design process to optimize, in some
sense, the use of those people. None of them approach the central idea behind this study:
that an additional goal can and should be added to the technical design phase of a system
development, one that establishes the desirability of using the technical design phase to
partition the work into “skill bins” that provide specific guidance in determining how best
to assign work to specific individuals. This forms another relevant gap in this literature,
one that I will attempt to address by this study.
If this study is successful, not only could it represent a contribution to this
literature, but the approach embodied herein may become a portion of standards
documents, such as those cited above. Other authors may be able to extend and improve
the approach, while others may challenge it, allowing the technique to be sharpened and
improved. The following literature teaches about these problem domains:
(ANSI 2004) American National Standards Institute, A Guide to the Project
Management Body of Knowledge, Project Management Institute, 3rd edition, 2004.
(Brooks 1975) Brooks, Frederick P., The Mythical Man-Month: Essays on Software
Engineering, Addison-Wesley, 1975.
(Brooks 2010) Brooks, Frederick P., The Design of Design, Addison Wesley, 2010.
(Browning 2001) Browning, Tyson R., “Applying the Design Structure Matrix to
System Decomposition and Integration Problems: A Review and New
Directions”, IEEE Transactions on Engineering Management, volume 48,
number 3, 2001.
(Browning 2008) Browning, Tyson R., “The Many Views of a Process: Toward a
Process Architecture Framework for Product Development”, Published Online:
2008, Wiley InterScience (www.interscience.wiley.com)
(Cockburn 2007) Cockburn, Alistair, Agile Software Development, Addison
Wesley, 2007.
(Cureton 2010) Cureton, Kenneth, SAE-550 class notes, USC School of
Engineering, 2010.
(Dijkstra 1988) Dijkstra, Edsger, “On the Cruelty of Really Teaching Computer
Science”, an open letter, 1988.
(Friedman 2005) Friedman, George, Constraint Theory: Multidimensional
Mathematical Model Management, Springer, 2005
(Haberfellner & de Week 2005) Haberfellner, R and de Week, O, “Agile
SYSTEMS ENGINEERING versus AGILE SYSTEM engineering”, Proceedings
of the Fifteenth Annual International Symposium of the International Council on
Systems Engineering (INCOSE), 2005.
(Hughes 1998) Hughes, Thomas P., Rescuing Prometheus, Vintage Books, 1998.
(INCOSE 2007) INCOSE, Systems Engineering Handbook, August 2007
(Jackson & Hines 2009) Jackson, Scott, & Hines, James, lecture notes for SAE-
541, University of Southern California, 2009
(Lu 2009) Lu, Stephen C-Y, “Complexity in the Design of Technical Systems”,
Annals of the CIRP, volume 58, 2009
(Madni 2006) Madni, A.M. “The Intellectual Content of Systems Engineering: A
definitional Hurdle or Something More?” Fellows’ Insight, INCOSE INSIGHT,
Vol. 9, No.1, 2006.
(Madni & Moini 2007) Madni, A.M., and Moini, A. “Viewing Enterprises as
Systems-of-Systems (SoS): Implications for SoS Research,” Journal of Integrated
Design and Process Science, Vol. 11, No. 2, June 2007.
(Madni 2008 a) Madni, A.M. “AgileTecting™: A Principled Approach to
Introducing Agility in Systems Engineering and Product Development
Enterprises,” Journal of Integrated Design and Process Science, Vol. 12, No. 4,
December 2008.
(Madni 2008 b) Madni, A.M. “Agile Systems Architecting: Placing Agility
Where it Counts,” Conference on Systems Engineering Research (CSER), 2008.
(Madni 2008 c) Madni, A.M. “Architecture Follies: Common Misconceptions and
Erroneous Assumptions”, Fellows’ Insight, INCOSE INSIGHT, Vol. 11, No. 1,
pp. 33-34, January 2008.
(Madni & Jackson 2008) Madni, A.M., and Jackson, S. “Towards a Conceptual
Framework for Resilience Engineering,” IEEE Systems Journal, Special issue on
Resilience Engineering, Paper No. 132, 2008 (accepted for publication).
(Medvidovic 2003) Medvidovic, Nenad; et al. “The Role of Middleware in
Architecture-Based Software Development”, International Journal of Software
Engineering and Knowledge Engineering, 2003.
(Medvidovic 2010) Medvidovic, Nenad, “Software Architecture and Mobility: A
Perfect Marriage or an Uneasy Alliance?”, lecture (with briefing charts) delivered
14 September 2010.
(Neal 1962) Neal, Roy, Ace in the Hole, Doubleday & Company, 1962.
(Northrop 2008) Northrop Grumman Mission Systems, Engineering for
Integration Guidance, 2008
(Northrop 2009) Northrop Grumman Mission Systems, Standard Process
Manual, revision of April 2009
(OASIS 2007) OASIS, Web Services Business Process Execution Language
Version 2.0, Organization for the Advancement of Structured Information
Standards, 2007
(OMG various dates) Object Management Group (OMG), CORBA middleware
specifications
(http://www.omg.org/technology/documents/spec_catalog.htm#Middleware),
various dates
(PMI 2004) Project Management Institute, A Guide to the Project Management
Body of Knowledge, ANSI standard ANSI/PMI 99-001-2004, Project
Management Institute, 2004.
(Ramo & Booton 1984), Ramo, Simon & Booton, Richard, “The Development of
Systems Engineering”, IEEE Transactions on Aerospace and Electronics Systems,
July 1984.
(Ramo & St. Clair, 1998) Ramo, Simon; and St. Clair, Robin, The Systems
Approach, KNI, 1998.
(Rechtin 1991) Rechtin, Eberhardt; Systems Architecting, Prentice Hall, 1991
(Ring & Madni 2005) Ring, J. and Madni, A.M. “Key Challenges and
Opportunities in ‘System of Systems’ Engineering,” Proceedings of the 2005
IEEE International Conference on Systems, Man, and Cybernetics, October 10-
12, 2005, Hawaii.
(Roshandel et al 2004) Roshandel, Roshanak; Medvidovic, Nenad, et al, “Mae –
A System Model and Environment for Managing Architectural Evolution”, ACM
Transactions on Software Engineering and Methodology. 2004.
(Salasin & Madni 2007) Salasin, J., and Madni, A.M. “Metrics for Service
Oriented Architecture (SOA) Systems: What Developers Should Know,” Journal
of Integrated Design and Process Science, Vol. 11, No. 2, June 2007.
(Siegel 1993 b) Siegel, Neil, “Lessons-Learned with Ada in Building the
Forward-Area Air Defense Command, Control, and Intelligence System”, a
chapter in Ada: Lessons Learned in Development and Management, TRW,
February 1993
(Siegel 2009 a) Siegel, Neil, “Architecting a System for Flexibility: A case study
of the U.S. Army’s Forward-Area Air Defense Command-Control-and-
Intelligence (FAAD C2I) system”, summer 2009
(Siegel 2010) Siegel, Neil, “An Engineering Career in Private Industry”, lecture to
the USC undergraduate honors colloquium, March 2010
(Sprott 2004) Sprott, D., and Wilkes, L. “Understanding Service-Oriented
Architecture,” The Architecture Journal, January 2004.
(TRW 1976) TRW Incorporated, Systems Engineering & Integration Division
Software Development Standards (training manual), TRW, 1976
(TRW 1989) TRW, Ada Process Model, TRW Systems Engineering and
Development Division, 1989
(TRW 1994) TRW, Army Systems Organization Engineering Process Document,
1994
(DoD 2001) U.S. Department of Defense, Systems Engineering Fundamentals,
Defense Acquisition University Press, January 2001
(DoD 2004) U.S. Department of Defense 2004, “Net-Centric Operations and
Warfare Reference Model (NCOW RM)”,
http://akss.dau.mil/dag/Guidebook/IG_c7.2.1.4.asp and “Net-Centric Enterprise
Solutions for Interoperability 2005, Part 2: ASD (NII) Checklist Guidance, Net-
Centric Enterprise Solutions for Interoperability”,
http://nesipublic.spawar.navy.mil/files/NESI_Part2_v1_1.pdf
(DODAF 2010) U.S. Department of Defense, DoD Architecture Reference
Framework, version II, 2010.
(Williams 1999) Williams, T M, “The need for new paradigms for complex
projects”, International Journal of Project Management, 1999.
(Xia & Lee 2005) Xia, Weidong & Lee, Gwanhoo, “Complexity of Information
Systems Development Projects: Conceptualization and Measurement
Development”, Journal of Management Information Systems, 2005.
1.6.3.2 The role of people in the process; critical‐skill based project organizations
The study hypothesis postulates a method of improving the assignment of specific
work tasks to specific people, enabled by a design-based technique. This portion of the
literature survey describes the literature that has been drawn upon to understand the work
done to-date on the role of people in the system-development process.
One thread through this literature is personal mastery – what one author terms
“the personal software process strategy” (Humphrey 1995), a methodology for self-
improvement of a single person’s skill as a software developer and systems engineer.
The literature reviewed is mostly not about “tricks” of programming or engineering
(although there is plenty of literature about that, too), but rather, per (Demarco & Lister
1987), (Humphrey 1995), and others, about organizing one’s own work, creating
estimates, how to organize and conduct reviews, how to improve one’s ability to organize
and plan work, how to communicate effectively with others, how to build and maintain
consensus, and other process and relational aspects of the work. This is useful to create
potential categories and approaches to improving individual capability, but it is mostly
qualitative, rather than quantitative. The focus is on motivation, teamwork, collaboration
at a distance, communications between team members, communications with other
stakeholders, alignment of goals, personal skill acquisition, and so forth.
Another thread through this literature is organizational mastery – methodologies
for “implementing workforce practices that continuously improve the capability of an
organization’s workforce”, as discussed in (Curtis et al 2002), (Turner 2004), and so
forth. Recently, it has become common to structure such an effort along the lines of a
process maturity framework; as, for example, in (Curtis et al 2002), (Clements &
Northrop 2001), and (SEI 2010). A well-known approach that is particularly relevant to
organizations that do business for the U.S. Federal Government is the Capability Maturity
Model (CMM) and its variants, created by the Software Engineering Institute (SEI), a
federally-funded research and development center affiliated with Carnegie-Mellon
University. The SEI has applied this method on behalf of the U.S. Department of
Defense to assess the maturity of the organizations performing software development on
their behalf, and to stimulate improvements within those organizations. This literature
provides structured, comparable methods for talking about the capability of teams of
people within a software or system development organization, using practice areas such
as staffing, communication and coordination, work environment, performance
management, training and development, compensation, competency analysis and
development, career development, empowered work-groups, qualitative performance
management, mentoring, etc. Note that most of the cases examined herein are activities
performed for the U.S. Federal Government. I think, however, that it should be pointed
out that the relationship between higher CMM maturity levels and better project
outcomes is not unambiguously established [although (Keane 2010) and some other
sources try to do so].
There is analysis in this portion of the literature, too, of how a lack of items such
as these can become obstacles to projects; this portion of the work is in some instances
quantitative, in the sense of characterizing the adverse impact of the lack of such qualities
on project performance.
There is no effort in either of the above bodies of work explicitly to connect
people skills, or assignment of people to tasks based on their skills, to the design process
of a project; although there is lots of discussion of the value of well-motivated and
suitably-skilled personnel, none of this literature seeks ways of using the design process
to partition the work into bins of different skills, and thereby provide a strategy for
achieving better assignment of personnel to work, and of isolating certain difficult (or
specialized) pieces of work into a smaller, definable portion of the team. This is a gap
that I will try to address in my study.
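To suggest what such a strategy might look like in operation, the following sketch (in Python; the bins, skills, and staff names are invented, and a real project would demand a more careful matching procedure than the naive greedy pass shown here) assigns people to design-derived skill bins:

    # Invented illustration: once the design has partitioned the work into
    # skill bins, staffing becomes a matching problem between the skills a
    # bin demands and the skills each person offers.
    bins = {
        "dynamic-behavior core": {"real-time", "concurrency"},
        "user interface":        {"ui", "general"},
        "report generation":     {"general"},
    }
    staff = {
        "engineer_a": {"real-time", "concurrency", "general"},
        "engineer_b": {"ui", "general"},
        "engineer_c": {"general"},
    }

    for bin_name, needed in bins.items():
        # Naive greedy match: first person whose skills cover the bin.
        match = next((p for p, skills in staff.items() if needed <= skills), None)
        print(f"{bin_name}: {match or 'no qualified person available'}")

Even this toy version exhibits the intended effect: the difficult, specialized work is isolated within a small, identifiable portion of the team.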
(Boehm 1981) started a thread of dealing with people skills (at the team and
project level) in a quantitative manner, focused on the purpose of resource estimation,
e.g., how much time and effort will be required to develop a specified piece of software
[continued in (Boehm et al 2000), (Boehm 2006), and others ]. This work has developed
predictive relationships between a team’s skill levels and experiences, and the resources
(in terms of cost and schedule) that a task will require. This is being extended by Boehm
[ e.g., (Boehm 2009) ] and others to the entire system development process. In addition
to its intrinsic value in guiding informed resource estimation, it is interesting and valuable
in pointing out that people and their skills can be dealt with quantitatively.
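The flavor of these predictive relationships can be suggested with a small sketch (in Python; the constants and multiplier values below are placeholders chosen for illustration, and are not the calibrated values published in (Boehm et al 2000)):

    # Illustration only: effort grows roughly as a power of product size,
    # scaled by a personnel-capability effort multiplier.  The constants
    # here are placeholders, not calibrated COCOMO II values.
    def effort_person_months(ksloc, personnel_multiplier, a=3.0, b=1.1):
        return a * (ksloc ** b) * personnel_multiplier

    # A highly-capable team (multiplier < 1) versus an average team:
    print(effort_person_months(100, personnel_multiplier=0.8))  # ~380 PM
    print(effort_person_months(100, personnel_multiplier=1.0))  # ~475 PM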
People in the process
(Boehm 1981) Boehm, Barry W., Software Engineering Economics, Prentice
Hall, 1981.
(Boehm et al 2000) Barry W. Boehm (with C. Abts, A.W. Brown, S. Chulani,
B.K. Clark, E. Horowitz, R. Madachy, D. Reifer, and B. Steece), Software
Cost Estimation with COCOMO II, Prentice Hall, 2000
(Boehm 2006) Boehm, Barry W., “Value-Based Software Engineering:
Overview and Agenda”, Barry Boehm, a chapter in Value-Based Software
Engineering, Boehm (with Biffl, Aurum, Erdogmus, and Grunbacher, editors),
Springer, 2006
(Boehm 2009) Boehm, Barry et al, “Early Identification of SE-Related
Program Risks”, Final Technical Report A013, Systems Engineering Research
Center, 2009
(Boehm 2010) Boehm, Barry W., “Some Future Software Engineering
Opportunities and Challenges”, pre-publication copy, 2010.
(Curtis et al 2002) Curtis, Bill; Hefley, William E.; and Miller, Sally A.; The People
Capability Maturity Model: Guidelines for Improving the Workforce, Addison
Wesley, 2002.
(Demarco & Lister 1987) DeMarco, Tom; and Lister, Timothy; Peopleware,
Dorset House Publishing, 1987.
(Fairley 1985) Fairley, Richard, Software Engineering Concepts, McGraw Hill,
1985.
(Folds 2007) Dennis J. Folds, Human Systems Integration, Georgia Tech
Research Institute, August 2007
(www.incose.org/.../Human_Systems_Integration_Seminar_INCOSE_August
_2007.ppt)
(Humphrey 1995) Humphrey, Watts, A Discipline for Software Engineering,
Addison Wesley, 1995.
(Keane 2010), “Keane Outsourcing and the Capability Maturity Model
(CMM)”, from
http://www.keane.com/resources/pdf/WhitePapers/OutsourcingCMM.pdf,
2010
(Lu 2007) Lu, S C-Y, “A Scientific Foundation of Collaborative
Engineering”, University of Southern California, 2007 CIRP General
Assembly, Dresden Germany
(Madni 2010 a) Madni, A.M. “Integrating Humans with Software and
Systems: Technical Challenges and a Research Agenda,” INCOSE Journal of
Systems Engineering, 2010.
(Madni 2010 b) Madni, A.M., “Integrating Humans with Software and
Systems: Technical Challenges and a Research Agenda”, accepted for
publication in the INCOSE Journal of Systems Engineering, 2010.
(Majchrzak and Malhotra 2003 a) Majchrzak, Ann, & Malhotra Arvind,
Deploying Far-Flung Teams: A Guidebook for Managers, 2003.
(Majchrzak and Malhotra 2003 b) Majchrzak, Ann, & Malhotra Arvind,
Deploying Far-Flung Teams: A Checklist for Managers, 2003.
(Majchrzak & Jarvenpaa, 2008 a) Majchrzak, A. & Jarvenpaa, S.,
“Knowledge Collaboration Among Professionals Protecting National
Security: Role of Transactive Memories in Ego-Centered Knowledge
Networks”, Organization Science, 2008.
(Majchrzak & Jarvenpaa, 2008 b) Majchrzak, A. & Jarvenpaa, S.,
“Knowledge Collaboration Among Professionals Protecting National
Security”, Articles in Advance, 2008.
(Majchrzak undated a) Majchrzak, Ann, “How Emergent Groups
Collaboratively Create New Knowledge”, publication history not specified,
date not specified.
(Majchrzak undated b) Majchrzak, Ann, “Technology Support for Virtual
Collaboration for Innovation in Synchronous and Asynchronous Interaction
Modes”, publication history not specified, date not specified.
(Clements & Northrop 2001) Clements, Paul; Northrop, Linda, Software
Product Lines: Practices and Patterns, Addison-Wesley, 2001
(SERC 2009) Systems Engineering Research Center, Personnel
Competencies Instrument (draft), 2009. [The SERC (http://www.sercuarc.org/)
is a National Science Foundation research center, jointly managed by The
Stevens Institute of Technology and The University of Southern California.]
(Siegel 2002) Siegel, Neil, “Digitization of the Battlefield”, a chapter in the
book Fateful Lightning, Perspectives on IT in Defense Transformation, ITAA,
2002.
(Siegel 2009 b) Siegel, Neil, “Defense of the Free World” (n.b.: the editors
chose the chapter title), a chapter in the book Beautiful Teams, O’Reilly, 2009.
(SEI 2010) Carnegie-Mellon Software Engineering Institute, “Capability
Maturity Model – Integration” (http://www.sei.cmu.edu/cmmi/), 2010.
(Turner 2004) Turner, R., Balancing Agility and Discipline: A Guide for the
Perplexed, Addison Wesley, 2004.
Critical-skill-based project organizations:
Critical skill definitions
(Noll & Wilkins 2002) Noll, Cheryl L. & Wilkins, Marilyn,
“Critical Skills of IS Professionals”, Journal of Information
Technology Education, 2002.
(Seffah 1999) Seffah, Ahmed, “Training Developers in Critical
Skills”, IEEE Software, 1999
Critical skill certifications
IEEE and other organizational certifications for individuals
Commercial product certification documents
SEI / CMMI documents
ISO documents regarding organizational certifications
Critical skills applied to projects / skill integration into team and project
contexts
(Akkermans & van Helden 2002) Akkermans, H. & K. van
Helden, “Vicious and virtuous cycles in ERP implementation: a
case study of interrelations between critical success factors”,
European Journal of Information Systems, 2002.
(Bresnen et al 2004), Bresnen, Mike; Goussevskaia, Anna; &
Swan, Jacky, “Embedding New Management Knowledge in
Project-Based Organizations”, Organization Studies, Sage
Publications, 2004.
(Curtis et al 2002) Curtis, Bill; Hefley, William E.; & Miller, Sally A., The
People Capability Maturity Model: Guidelines for Improving the
Workforce, Addison Wesley, 2002.
(Demarco & Lister 1987) DeMarco, Tom; and Lister, Timothy;
Peopleware, Dorset House Publishing, 1987.
(Folds 2007) Dennis J. Folds, Human Systems Integration, Georgia
Tech Research Institute, August 2007
(www.incose.org/.../Human_Systems_Integration_Seminar_INCO
SE_August_2007.ppt)
(Humphrey 1995) Humphrey, Watts, A Discipline for Software
Engineering, Addison Wesley, 1995.
(Nah & Lau 2001) Nah, Fiona Fui-Hoon; & Lau, Janet Lee-Shang,
“Critical factors for successful implementation of enterprise
systems”, Business Process Management Journal, 2001
(SERC 2009) Systems Engineering Research Center, Personnel
Competencies Instrument (draft), 2009.
(Sukhoo et al 2008) Sukhoo, Aneerav; Barnard, Andries; Eloff,
Mariki M.; Van der Poll, John A., “Accommodating Soft Skills in
Software Project Management”, Issues in Informing Science and
Information Technology, 2008.
(Sumner 2000) Sumner, Mary, “Risk factors in enterprise-wide
ERP projects”, Journal of Information Technology, 2000.
(Boehm & Turner 2004) Boehm, B., and Turner, R., Balancing
Agility and Discipline: A Guide for the Perplexed, Addison
Wesley, 2004.
Project organization
(Baccarini 1996) Baccarini, David, “The concept of project
complexity”, International Journal of Project Management, 1996.
(Hobday 2000) Hobday, Mike, “The project-based organisation:
an ideal form for managing complex products and systems?”,
Research Policy, 2000.
(Markus & Robey 1988) Markus, M. Lynee & Robey, Daniel,
“Information Technology and Organizational Change”,
Management Science, 1988.
(Thiry 2008) Thiry, Michel, “Creating Project-Based
Organizations to Deliver Value”, PM World Today, 2008.
(Turner & Keegan 1999) Turner, J. Rodney & Keegan, Anne, “The
Versatile Project-based Organization: Governance and Operational
Control”, European Management Journal, 1999.
Assigning tasks on projects
(Duggan et al 2004) Duggan, Jim; Byrne, Jason; & Lyons, Gerard
J., “A Task Allocation Optimizer for Software Construction”,
IEEE Software Magazine, 2004.
(Jalote & Jain) Jalote, Pankaj & Jain, Gourav, “Assigning Tasks in
a 24-Hour Software Development Model”, Proceedings of the 11th
Asia-Pacific Software Engineering Conference, 2004.
(Junjie et al 2009) Junjie, Chen; Chuxiu, Yun; Zhen, Wang,
“Multi-dimensional Model Method for the Human Resource
Allocation in Multiproject”, 2009 International Conference on
Information Management, Innovation Management and Industrial
Engineering, 2009.
(Lamersdorf et al 2009) Lamersdorf, Ansgar; Münch, Jürgen;
Rombach, Dieter, “A Survey on the State of the Practice in
Distributed Software Development: Criteria for Task Allocation”,
2009 Fourth IEEE International Conference on Global Software
Engineering, 2009.
(Lepak & Snell 1999), Lepak, David P. & Snell, Scott A., “The
Human Resource Architecture: Toward a Theory of Human
Capital Allocation and Development”, Academy of Management
Review, 1999.
(Paris et al 2000) Paris, Carol R.; Salas, Eduardo; & Cannon-
Bowers, Janis A., “Teamwork in multi-person systems”,
Ergonomics, 2000.
(Tatikonda & Rosenthal 2000), Tatikonda, Mohan V. & Rosenthal,
Stephen R., “Technology Novelty, Project Complexity, and
Product Development Project Execution Success: A Deeper Look
at Task Uncertainty in Product Innovation”, IEEE Transactions on
Engineering Management, 2000.
(Vaziri et al 2005) Vaziri, Kabeh; Nozick, Linda K.; Turnquist,
Mark A., “Resource Allocation and Planning for Program
Management”, Proceedings of the 2005 IEEE Winter Simulation
Conference, 2005.
(Wang, XU, & Zhang 2009) Wang, Ying; Xu, Yitai; Zhang,
Xiaodong, “Task-Allocation Algorithm for Collaborative Design
based on Negotiation Mechanism”, Proceedings of the 2009 13th
International Conference on Computer Supported Cooperative
Work in Design, 2009.
(Wang et al 2009) Wang, Shao-Qiang; Gong, Li-Hua; Shi-Liang
Yan, “The Allocation Optimization of Project Human Resource
Based on Particle Swarm Optimization Algorithm”, 2009 IITA
International Conference on Services Science, Management and
Engineering, 2009.
(Zhou 2008) Zhou, Lixin, “A Project Human Resource Allocation
Method Based on Software Architecture and Social Network”,
IEEE, 2008.
Training / skill development
DoD and industry training guides
Academic papers on effective training / learning methods
1.6.4 Problems in the system development process
The motivation for performing this study arose from the recurrence of problems
during large-scale system development programs. This portion of the literature review
therefore looks at documented examples of such problems, and offers analyses of those
examples.
The literature discusses a wide range of project failure mechanisms, and was
reviewed to look for work along the lines that I have proposed herein, e.g., for failures
attributed to lack of control of dynamic system behavior. In general, this literature [
(Glass 2001), (Flowers 1996), etc. ] provided discussion of social, budgetary, and inter-
personal explanations of project failures, but relatively little discussion about examining
the role of weak or insufficient design as a cause of project problems. I believe that this
is a gap in the literature. I note that one example cited in the literature concerns a project
that I was called upon in a professional capacity to review; our review team found one or
more design weaknesses as the underlying cause of the (for example) budgetary problem.
That is, a weak design resulted in the effort requiring much more
money (and time) than originally planned, and that in fact, a design that could have met
the original budget was available but not selected. In such a circumstance, I would call
the budget overrun a symptom, rather than a cause. It seems to me that this is likely the
case in many of the examples in the literature; for example (Glass 2001) examines the
failure of several “dot.com” businesses. Often, he attributes the failure to lack of
adequate capital. But it is clear from his own explanations that in most of these cases, the
business was executing a single project (e.g., to bring their eBusiness site on-line), and
when that project got into difficulties, they found themselves in a situation where there
was a material difference between their original estimate of cost and schedule to
complete the project, and their current estimate. If they were unable to raise additional
capital at that point, one could fairly argue that that inability is a symptom of their project
performance problem, rather than a problem with the capitalization process. Although
most of these articles do not provide enough technical information to do a proper
assessment of the root cause of these project difficulties, many of them describe slower
than expected performance, inconsistent behavior, frequent and unexplained crashes, and
use phrases like “complexity underestimated” (Flowers 1996) . . . all of which are typical
symptoms of the sort of unplanned dynamic behavior that I will analyze in this study.
A second type of literature examined herein analyzes failures, and in particular
software defects – how to model defect rates [e.g., (Clark & Zubrow 2001), (Keene
1999), and (Keene 2000) ], occurrence rates under various scenarios [e.g., (Kulkarni
2006) and (Li et al 2005) ], root causes of defects [e.g., (Rams 2010) ], and so forth.
(Boehm & Basili 2001), Boehm, Barry, & Basili, Victor, “Defect Reduction –
top-ten list”, downloaded from
http://www.cs.umd.edu/projects/SoftEng/ESEG/papers/82.78.pdf, 2001.
(Clark & Zubrow 2001), Clark, Brad; & Zubrow, Dave, “How Good is the
Software – A review of Defect Prediction Techniques”, Carnegie Mellon
Software Engineering Institute, 2001.
(Fenton et al) Fenton, Norman; Neil, Martin; Marsh, William; Hearty, Peter;
Radliski, Łukasz; Krause, Paul; “Project Data Incorporating Qualitative Factors
for Improved Software Defect Prediction”, downloaded from
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.63.8739 in 9-2010, date
not specified.
(Fenton & Neil 1999) Norman E. Fenton, Norman E. & Neil, Martin, “A Critique
of Software Defect Prediction Models”, IEEE Transactions on Software
Engineering, 1999.
(Flowers 1996) Flowers, Stephen, Software Failure: Management Failure, John
Wiley and Sons, 1996
(Glass 2001) Glass, Robert L., ComputingFailure.com, Prentice Hall, 2001
(Keene 1999) Keene, Samuel, “Progressive Software Reliability Modeling”,
publication history not specified, 1999
(Keene 2000) Keene, Samuel, “The Process-Based Early Prediction Software
Reliability Model”, publication history not specified, 2000.
(Kulkarni 2006), Kulkarni, Aniruddha P., “Software Defect Data - Predictability
and Exploration” (MS thesis), Texas Tech University, 2006.
(Li et al 2005) Li, Paul Luo; Shaw, Mary; Herbsleb, Jim; Santhanam, P.; Ray,
Bonnie, “An Empirical Comparison of Field Defect Modeling Methods”, Institute
for Software Research International, 2005
(Rams 2010) rams.de, “Software Fault Tree / Software FMEA”,
downloaded from
http://www.rams.de/beratung/reliability/analysis/software_reliability.html, 2010.
(Sunita 1998) Devnani-Chulani, Sunita, “Modeling Software Defect
Introduction”, USC - Center for Software Engineering, 1998
(Wired 2008) Wired.com, “Rogue Satellite’s Rotten, $10 Billion Legacy”,
http://www.wired.com/dangerroom/2008/02/that-satellite/#ixzz0xrJjyrs2, 2008
1.6.5 Understanding the dependent variables
The following literature teaches about the principal dependent variables used in
this study.
Defect rates for systems
(Boehm & Basili 2001), Boehm, Barry, & Basili, Victor, “Defect
Reduction – top-ten list”, downloaded from
http://www.cs.umd.edu/projects/SoftEng/ESEG/papers/82.78.pdf,
2001.
(Clark & Zubrow 2001), Clark, Brad; & Zubrow, Dave, “How Good is
the Software – A review of Defect Prediction Techniques”, Carnegie
Mellon Software Engineering Institute, 2001.
(Fenton et al) Fenton, Norman; Neil, Martin; Marsh, William; Hearty,
Peter; Radliski, Łukasz; Krause, Paul; “Project Data Incorporating
Qualitative Factors for Improved Software Defect Prediction”,
downloaded from
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.63.8739 in
9-2010, date not specified.
(Fenton & Neil 1999) Norman E. Fenton, Norman E. & Neil, Martin,
“A Critique of Software Defect Prediction Models”, IEEE
Transactions on Software Engineering, 1999.
(Li et al 2005) Li, Paul Luo; Shaw, Mary; Herbsleb, Jim; Santhanam,
P.; Ray, Bonnie, “An Empirical Comparison of Field Defect Modeling
Methods”, Institute for Software Research International, 2005
(Rams 2010) rams.de, “Software Fault Tree / Software FMEA”,
downloaded from
http://www.rams.de/beratung/reliability/analysis/software_reliability.h
tml, 2010.
(Sunita 1998) Devnani-Chulani, Sunita, “Modeling Software Defect
Introduction”, USC - Center for Software Engineering, 1998
Reliability of systems
(Keene 1999) Keene, Samuel, “Progressive Software Reliability
Modeling”, publication history not specified, 1999.
(Keene 2000) Keene, Samuel, “The Process-Based Early Prediction
Software Reliability Model”, publication history not specified, 2000.
(Kulkarni 2006), Kulkarni, Aniruddha P., “Software Defect Data -
Predictability and Exploration” (MS thesis), Texas Tech University,
2006.
(Lu 2009) Lu, Stephen C-Y, “Complexity in the Design of Technical
Systems”, Annals of the CIRP, volume 58, 2009
(Madni & Jackson 2008) Madni, A.M., and Jackson, S. “Towards a
Conceptual Framework for Resilience Engineering,” IEEE Systems
Journal, Special issue on Resilience Engineering, Paper No. 132, 2008
(accepted for publication).
(Tamura & Yamada 2007), “Software Reliability Assessment and
Optimal Version-upgrade Problem for Open Source Software”, IEEE,
2007.
(Yamada & Osaki 1985) Yamada, Shigeru, & Osaki, Shunji,
“Software Reliability Growth Modeling: Models and Applications”,
IEEE Transactions on Software Engineering, 1985.
(Zhao et al 2005) Zhao, Jing; Liu, Hong-Wei; Cui, Gang; & Yang,
Xiao-Zong, “A Software Reliability Growth Model from Testing to
Operation”, Proceedings of the 21st IEEE International Conference on
Software Maintenance, 2005.
Port-to-port timing in large-scale systems
(Chung et al 1994) Chung, Hak-yeong; Bien, Zeungnam; Park, Joo-
hyun; Seong, Poong-hyun; “Incipient Multiple Fault Diagnosis in Real
Time with Applications to Large-Scale Systems”, IEEE Transactions
on Nuclear Science, 1994.
(Dubey et al 2006) Dubey, Abhishek; Nordstrom, Steve; Keskinpala,
Turker; Neema, Sandeep; Bapty, Ted; “Verifying Autonomic Fault
Mitigation Strategies in Large Scale Real-Time Systems”, Proceedings
of the Third IEEE International Workshop on Engineering of
Autonomic & Autonomous Systems, 2006.
(Hatley & Pirbhai 1988) Hatley, Derek J. & Pirbhai, Imtiaz, Strategies
for Real-Time System Specification, Dorset House Publishing, 1988.
(Northrop 2008) Northrop Grumman Mission Systems, Engineering
for Integration Guidance, 2008
(Ye et al 2005) Ye, Jianming; Loyall, Joe; Schantz, Rick; Duzan,
Gary; “Detection and Reaction to Unplanned Operational Events in
Large Scale Distributed Real-Time Embedded Systems”, Proceedings
of the 19th IEEE International Parallel and Distributed Processing
Symposium, 2005.
1.6.6 The cases and their data
The study methodology identifies ten cases from which the data will be drawn for
this study. This portion of the literature survey identifies the principal sources for these
cases.
(OPTEC 1994) OPTEC (U.S. Army Operational Test and Evaluation
Command), Forward-Area Air Defense Command-Control-and-Intelligence System
– Report from system operational testing, 1994.
(Siegel 1993 a) Siegel, Neil, “BM/C3I software technology”, a conference
presentation, January 1993.
(Siegel 1993 b) Siegel, Neil, “Lessons-Learned with Ada in Building the
Forward-Area Air Defense Command, Control, and Intelligence System”, a
chapter in Ada: Lessons Learned in Development and Management, TRW,
February 1993
(Siegel et al 1993) Siegel, Neil; Bebb, Joan; Royce, Walker; and Andres, Don,
“Ada and the Management of Transition”, appeared in TRW Technology Series
Journal, TRW-TS-93-01, February 1993.
(Siegel et al 1993 b) Siegel, Neil; Bebb, Joan; Royce, Walker; Royce, Winston;
Andres, Don; Gerhardt, “Ada: Lessons Learned in Development and
Management”. Briefing given in multiple forums, in the U.S., Holland, Belgium,
and France. January / February 1993.
(Siegel 2009 a) Siegel, Neil, “Architecting a System for Flexibility: A case study
of the U.S. Army’s Forward-Area Air Defense Command-Control-and-
Intelligence (FAAD C2I) system”, summer 2009
(TRADOC 1998) TRADOC (U.S. Army Training and Doctrine Command),
FBCB2 Limited User Test – Interim Test Report, 1998
TRW / Northrop Grumman – project documents, project records / metrics – for
the FBCB2 / BFT project. About 2,700 computer files, from the project librarian
and the company’s office of cost estimation.
Key documents: These are formal records from the project data librarian,
unless otherwise specified.
Listing of project problem reports. This is the “raw” input to the
project error-reporting-and-resolution process – a problem report
title / description, and an assigned sequence number.
Listing of project problem report outcomes. This is an Excel
spreadsheet that for every project problem report, keyed by the
assigned sequence number, has basic outcome information, such as
date opened, date closed, name of the program module(s) affected,
severity code assigned, and so forth.
Monthly program management reports, which include
identification and discussion of key problem reports
Monthly program cost reports
Monthly program metric reports, which include problem report
status and statistical summaries
Formal records from the company’s Office of Cost Estimation
Program manager’s contemporaneous notes and memos (from the
program manager’s records)
Supplemental documents:
Requirements documents
Interface control documents
Design documents
Test documents (plans, procedures, etc.)
Operational concept documents
Training documents
Maintenance documents
Lessons-learned reports
Bill of materials
Contractor technical reports for external events (e.g., data packages
for formal reviews, such as the critical design review, etc.)
Internal management reports (e.g., monthly project reviews with
senior company management, etc.)
Externally-generated reports (e.g., Government test reports, etc.).
TRW / Northrop Grumman – project documents, project records / metrics – for
the FAAD C2I project. About 850 computer files, from the project librarian and
the company’s office of cost estimation.
Key documents: These are formal records from the project data librarian,
unless otherwise specified.
Monthly program management reports, which include
identification and discussion of key problem reports
Monthly program cost reports
Monthly program metric reports, which include problem report
status and statistical summaries
Formal records from the company’s Office of Cost Estimation
Chief engineer’s notes and memos (from the chief engineer’s
records)
Supplemental documents:
Requirements documents
Interface control documents
Design documents
Operational concept documents
Lessons-learned reports
Bill of materials
Contractor technical reports for external events (e.g., data packages
for formal reviews, such as the critical design review, etc.)
Internal management reports (e.g., monthly project reviews with
senior company management, etc.)
Externally-generated reports (e.g., Government test reports, etc.).
TRW / Northrop Grumman – project documents, project records / metrics – for
the AAAA project. From the company’s office of cost estimation.
Software source-lines-of-code counts, by product element
Problem report summaries, by month and severity
Other metric data
Narrative data by the program leadership
TRW / Northrop Grumman – project documents, project records / metrics – for
the BBBB project. From the company’s office of cost estimation.
Software source-lines-of-code counts, by product element
Problem report summaries, by month and severity
Other metric data
Narrative data by the program leadership
Interview notes – Neil Siegel interviewing Eric Deets, former deputy project
manager, and John Williams, former project manager, for the prime contractor
(Magnavox, later purchased by Hughes, later purchased by Raytheon), U.S. Army
Advanced Field Artillery Tactical Data System, 22 December 2010.
Interview notes – Neil Siegel interviewing Arthur Hawking, former systems
engineering manager for the prime contractor (Magnavox), U.S. Army Advanced
Field Artillery Tactical Data System. Date 30 January 2011.
Interview notes – Neil Siegel interviewing David Bixler, former chief engineer for
the prime contractor (TRW), U.S. Army Combat Service Support Control System
(logistics automation), 17 January 2011.
Interview notes – Neil Siegel interviewing John Dowdee, former program manager for the
prime contractor (TRW), THAAD radar software project (originally known as
Ground-Based Radar, GBR), 17 January 2011.
Interview notes – Neil Siegel interviewing Jeff Steiner, former manager for the
CCCC program. Date 2009.
(Royce 1998) Royce, Walker, Software Project Management: A Unified
Framework, Addison-Wesley, 1998. Appendix D is about CCPDS-R, one of my
cases.
1.6.7 Methodology
The study was conducted as an observational case study, using quantitative
methods, combined with elements that take the form of a quasi-experiment. This portion
of the literature survey lists the methodological literature that has been drawn upon to
design the methodology for this study. (Yin 1994) is the primary source of case study
methodology employed; this book not only provides a methodology, but provides criteria
for assessing the efficacy of the resulting case study design. (Campbell & Stanley 1966)
is the primary methodological source for the quasi-experimental portions of the work.
(Campbell & Stanley 1966) Campbell D. & Stanley J., Experimental and quasi-
experimental designs for research, Rand McNally, 1966
(Cook & Campbell 1979) Cook, Thomas; & Campbell, Donald T., Quasi-
Experimentation – Design & Analysis Issues for Field Settings, Houghton Mifflin
Company, 1979.
(Creswell 1998) Creswell, John W., Qualitative Inquiry and Research Design,
Sage Publications, 1998.
(Dogain & Pellasy 1990) Dogan M. & Pellasy D, How to compare nations:
strategies in comparative politics, Chatham House, 1990
(Eisenhardt 1989) Eisenhardt, Kathleen M., “Building Theories from Case Study
Research”, The Academy of Management Review, Academy of Management,
1989
(Flyvbjerg 2006) Flyvbjerg, Bent, “Five Misunderstandings About Case Study
Research”, Qualitative Inquiry, vol. 12, no. 2, April 2006.
(Giddens 1982) Giddens, A., Profiles and Critiques in Social Theory, University
of California Press, 1982.
(Kardos 1979) Kardos, “Engineering Cases in the Classroom”, National
Conferences on Engineering Case Studies, March 1979.
(Kazeef 2009) Kazeef, Michael, class notes for ISE-517, spring 2009.
(Kidler & Judd 1994) Kidler and Judd, 1986
(Kuper & Kuper 1985) Kuper A. and Kuper J. (editors), The social science
encyclopedia, Routledge, 1985
(Miler & Dingwall 1997) Miler, Gale & Dingwall, Robert, Context & Method in
Qualitative Research, Sage Publications, 1997
(Shepard & Green 2003) Shepard, Jon and Green, Robert, Sociology and You,
Glencoe McGraw-Hill, 2003.
(Yin 1994) Yin, Robert, Case Study Research: Design and Methods, SAGE
Publications, 1994 (2nd edition), 2002 (3rd edition), and 2009 (4th edition).
(Wheeler 2000) Wheeler, Donald J., Understanding Variation, 2nd edition, SPC
Press, 2000.
(Walton 1992) Walton, J, “Making the Theoretical Case”, From Ragin and
Becker (editors) What is a Case? Exploring the Foundations of Social Inquiry,
Cambridge University Press, 1992.
(Webb & Webb 1932) Webb S. and Webb B., Methods of Social Science,
Cambridge University Press, 1932 (re-issued 1975).
1.6.8 Other sources and references
(Juran 1951) Juran, J. M., Quality-control handbook, McGraw-Hill, 1951. This is
the source for Pareto analysis and the “Pareto principle”.
1.7 Organization of the study
Chapter 2 presents the hypothesis, and describes the specific design-based
technique. Chapter 3 describes the case study methodology. Chapter 4 presents the
quantitative findings that result from applying the measurement instruments and the case
study protocol to the case files. Chapter 5 presents interpretations and conclusions. The
appendices include detailed descriptions of two of the cases, and other supporting
materials.
Chapter 2: Hypotheses
2.1 Scope of the systems‐of‐interest
The use of systems seems to be on the increase, per sources such as (Ramo &
Booton 1984) and (Boehm 2010). These systems perform important roles for society,
running our power grid, our banking system, our air-traffic control system, our traffic
signals, and so forth. They also play important roles in the National defense.
Herein, however, I am interested only in systems with the following specific
characteristics:
Complex emergent behavior, as described by (Rechtin 1991)
Interactions with physical devices (moving missile turrets, other time-
sensitive mechanical devices, etc.)
Stressing asynchronous stimuli (such as extraordinarily high data-ingest rates,
or highly-stressed communications structures)
Extraordinarily high reliability requirements
Development efforts of large size
Systems that need to display lots of early progress through prototyping and re-
use, but need, at the same time, to avoid having their design “locked in” to a
pattern that will be ineffective over the life of the development effort
Within this study, I will term systems that display these characteristics “large-
scale complex systems”. This forms the subset of all systems that I will consider in this
study. As will be seen, such systems are still “interesting”, in the sense that such systems
experience development-phase failures, and this subset spans a meaningful range of
application problem domains.
Let us consider this list of characteristics. Within any system, there will be
variability in timing, cueing, and transaction service rate. Long “tails” (e.g., large
variances) in these distributions can, for example, prevent effective and safe control of
physical devices. So reduction of such variance is likely to be a subject of interest in
systems that interact with physical devices. Note that many common trends in software-
intensive systems tend to create long “tails” in distributions, so systems may not acquire
the required low level of variance without dedicated effort. For example, the use of
carrier-sense media access methods (such as is used by Ethernet) provides good
average performance, but does not prevent long “tails” (outliers) in delivery distribution
times.
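A small Monte Carlo sketch (in Python; the delay distributions are invented purely for illustration) shows why a healthy mean does not bound the tail:

    import random

    # Invented model: most deliveries are fast, but occasional contention
    # (as with carrier-sense media access) produces long outliers.
    random.seed(0)
    delays = [random.expovariate(1 / 2.0) if random.random() < 0.99
              else random.expovariate(1 / 50.0)   # rare contention event
              for _ in range(100_000)]
    delays.sort()

    mean = sum(delays) / len(delays)
    p999 = delays[int(0.999 * len(delays))]
    print(f"mean delay: {mean:.1f} ms; 99.9th percentile: {p999:.1f} ms")
    # The mean looks fine while the tail is far worse -- precisely the
    # variance that can defeat safe control of physical devices.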
Within large systems, there is a material possibility of a high degree of a
combinatorial fan-out of options along the system processing threads, which can create
system state-spaces that are larger than might be traversed by a nominal test activity; this
increases the likelihood of the system inadvertently progressing into a state where some
aspect of system performance becomes unacceptable. A typical behavior experienced
when a system thus progresses into an unintended state is the creation of race conditions,
or other unexpected sequencing, resulting in various sorts of errors, including the
cessation of meaningful operations. This of course lowers system reliability. Systems
where asynchronous stimuli are significant, either through exceedingly high stimulation
rates, or through communications-media-usage being near the theoretical limit (which
tends to introduce non-linear degradations akin to turbulence in water flow), are
particularly vulnerable to this type of problem.
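A back-of-the-envelope computation (in Python, using the standard multinomial count of interleavings, with toy numbers) illustrates how quickly such state-spaces outgrow any plausible test campaign:

    from math import factorial

    def interleavings(n_tasks, steps_each):
        # Distinct interleavings of n concurrent k-step sequences:
        # (n*k)! / (k!)^n, a standard multinomial count.
        return factorial(n_tasks * steps_each) // factorial(steps_each) ** n_tasks

    for n in (2, 4, 6):
        print(n, "tasks of 10 steps each:", interleavings(n, 10))
    # Grows from about 1.8e5 (2 tasks) to about 3.6e42 (6 tasks) -- far
    # more states than a nominal test activity could ever traverse.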
In this study, I will propose a candidate corrective method, and measure its
efficacy. There is a cost associated with implementing this method, and therefore, I limit
my assessment to larger systems (e.g., total non-recurring development effort greater than
1,000 man-years); projects that are materially smaller may not have the technical and
social complexity that would justify the investment in the cost of implementing the
design-based technique described herein. It is not, however, a goal of this study to
determine a specific value for this boundary.
The element of social complexity induced into large projects is worth special
mention: most projects of this size have far more constraints placed on them than smaller
projects (social complexity seems to increase far more than linearly with project budget
size). These social constraints may take the form of requirements to use a specific off-
the-shelf product or products as a part of the implementation, to co-exist with legacy
systems and their interfaces for large portions of their operational life-time, constraints
about specific contractors / vendors to use (including geographic and social contracting
goals), and so forth. These social constraints tend to close off portions of the design trade
space, and often leave projects with designs that have that large fan-out of potential states
described above. Techniques like that considered in this study can help capture system
performance otherwise lost to such constraints.
Projects of this size often have a political or sales imperative to focus on quickly
showing progress (often, even during the pre-award time-period, through prototyping and
re-use); this leads (perhaps inadvertently) to an emphasis at the beginning of the
development effort on implementing the easy or most visible portions of the problem
(e.g., those that are inherently easy, those portions of the system for which partial
solutions are available through re-use, or those portions of the system that provide the
user interfaces). Focusing on these portions of the problem does indeed allow for a lot of
apparent progress to be demonstrated, and therefore, to meet the sales-oriented objective
mentioned above. But so placing initial focus on these easy and visible portions of the
system may not be a path to a viable total system – because focusing on the easy / re-use-
based / visible portions of the system may invoke early design decisions that might
inadvertently preclude achieving important system goals as the total system progresses.
For example, an early decision to re-use some element from another program may cause
the new system to inherit a measure of variability in port-to-port timing along some
critical thread that is unacceptably large; although requirements for such timings are often
specified, they are almost always specified in a way that does not bound “tails” in their
distribution, and this can cause undesirable dynamic behavior as the system nears
completion. A very common problem involves the mandatory incorporation of a large
number of off-the-shelf or otherwise existing components; this approach can lead to
artificially-high external complexity at the interfaces [Rechtin’s heuristic (Rechtin
1991) for good system design calls for low external (e.g., at the interfaces) complexity,
while permitting high internal complexity], which in turn leads to the problems noted
above.
This is particularly the case for behavior that appears only as the system nears
completion, that appears only as the induced load approaches its design limits, or that
appears only as the system gets used over long periods of time. The design-based
technique discussed herein is in some sense a “hard-part-first” strategy, and is intended
explicitly to address this issue. [ Per (Royce 1998), appendix D, the project that he
analyzes, CCPDS-R, attempted to “confront the most important risks first”. His
discussion makes it clear that he believes that doing so was not the norm in the industry
at the time of his writing. ] Much of the literature regarding early prototyping, however,
has in fact focused on re-use and prototyping the user interfaces. Yet, given data from
sources like Forman and Cureton about the likelihood of large, long programs being
cancelled before completion (their thesis is that the long development schedules
associated with large programs create temptations for competitors to arise, and when
significant technical problems arise mid-project, especially those that seem fundamental
to the selected design, programs are cancelled either to try one of the competing
approaches, or to allocate the funding to another mission), a more balanced approach that
allows front-end demonstrations of quick progress combined with efforts explicitly
intended to lower the risk of material problems arising mid-program (such as those cited
above) seems appropriate, or at least, worth investigating.
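One inexpensive guard against the unbounded timing “tails” described above is to state, and check, port-to-port timing requirements as percentile bounds rather than as means; a hypothetical sketch follows (in Python; the bound, percentile, and measurements are invented for illustration):

    # Hypothetical check: "99.9% of threads must complete within 250 ms" --
    # a tail-bounding form of requirement, rather than a bound on the mean.
    def meets_requirement(measured_ms, bound_ms=250.0, percentile=0.999):
        ranked = sorted(measured_ms)
        return ranked[int(percentile * (len(ranked) - 1))] <= bound_ms

    measurements = [120.0] * 990 + [900.0] * 10   # healthy mean, bad tail
    print(sum(measurements) / len(measurements))  # about 128 ms: looks fine
    print(meets_requirement(measurements))        # False: the tail fails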
The characteristics described above form the basis for defining the types of
systems of interest to this study. My professional experience includes systems for the
military and the Federal Government; for the entertainment industry; for real-time
process control in manufacturing settings; for electronic medical records, Enterprise
Resource Planning, enterprise information, logistics automation, radar and other sensors,
airplanes, robots; and others. This experience suggests that the findings of this study will
apply across all of these problem domains, e.g., there will be some material number of
systems in all of these application problem domains that incorporate the characteristics
listed above. Interactions with colleagues, participation on industry and Government
senior advisory panels, and readings lead me to believe in the likelihood that these
findings will be applicable in additional domains with which I do not have personal
experience. Assessing these assertions is, of course, a part of the purpose of this study.
2.2 Statement of the hypotheses
During the development phase of a large-scale, complex computer-based system,
the use of a design-based technique that centralizes the control of the dynamic behavior
of a system will produce more consistent and better project outcomes.
In order to show that this hypothesis is testable, and then to proceed to test it, one
needs to form a relationship between effective control (or lack thereof) of the dynamic
behavior of the system, and some item or items that can be directly measured. I started
this process by noting that design flaws that permit unplanned dynamic behavior will
show up as errors in an appropriately-designed system test program. One item that will
be measured in this study is the incidence rate of these errors – the errors that can be
directly attributed to lack of control of the dynamic behavior of a system.
The literature recognizes defect density as a key indicator of quality; see, for
example, (Fenton & Neil 1999). At the same time, defect density is a metric that has
strong resonance in customer assessments; for example, system acceptance-test
planning documents often require that defect density / defect occurrence for the most
serious defect categories be below a certain threshold value before a test event can
commence. For this study, therefore, the density of those defects that are directly
attributable to the control and management of system dynamic behavior represents the
intersection of three important characteristics: (a) it is a measure that has resonance in
the customer / user community for these types of systems, as evidenced by its inclusion
in acceptance-test entrance criteria; (b) it is a widely-recognized measure in the literature
for technical merit; and (c) it relates in a direct and unambiguous manner to the treatment
proposed in this study (i.e., the use or non-use of a technique, described in chapter 2.3,
to improve the control of the dynamic behavior of a system). The case for a causal
relationship between the treatment and defect density is strong.
At the same time, (Boehm 2000) and other sources show that more capable
designers and programmers will make fewer errors than average personnel, in the areas of
their expertise.
These two factors – density of defects directly attributable to design flaws that
permit unplanned dynamic behavior, and that more capable designers and programmers
will make fewer errors than average personnel – combine to provide the starting point for
forming the necessary relationship between effective control (or lack thereof) of the
dynamic behavior of the system, and some item or items that can be directly measured.
Error-report logs from a real project will be examined. Through such examination, the
error reports that are due to lack of control of the system’s dynamic behavior (that is,
those that are directly attributable to the problem addressed by the hypothesis) will be
separated from those that are due to other causes, allowing measurement of the
occurrence density of these types of errors both when the design-based technique
(described below)
is used, and when it is not used. This will permit the development of a statistical
“signature” that quantifies the density of those errors that are due to lack of control of the
system’s dynamic behavior. The difference in signature between the cases where the
design-based technique is used, and where it is not used, will be shown to have statistical
significance. For example, error density for a portion of a project when the technique
was applied will be compared to a subsequent portion of the same project when the
technique was not applied; and again, to a third portion of the same project where the
technique was re-applied. If the hypothesis is correct, a statistically-significant increase
in these metrics would be expected during the second of these project periods. This
forms the first (and primary) case of this study; additional cases will be carried out, as
well, as is described in chapter 3.
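
To make the intended comparison concrete, the following sketch (in Python) illustrates one way such a defect-density comparison could be computed. All names and numbers here are hypothetical illustrations, not data from this study; a normal-approximation test for the difference of two Poisson rates is one standard choice for comparing defect counts normalized by exposure.

    import math

    def poisson_rate_z(defects_a, size_a, defects_b, size_b):
        # Two-sample z-test for equality of two Poisson rates.
        # defects_*: counts of defects attributable to unplanned dynamic
        # behavior in each project period; size_*: exposure for the period
        # (e.g., thousands of source lines developed, or labor-months).
        rate_a = defects_a / size_a
        rate_b = defects_b / size_b
        pooled = (defects_a + defects_b) / (size_a + size_b)
        se = math.sqrt(pooled / size_a + pooled / size_b)
        z = (rate_a - rate_b) / se
        return rate_a, rate_b, z   # |z| > 1.96 ~ significant at the 0.05 level

    # Hypothetical counts: technique applied, suspended, then re-applied.
    print(poisson_rate_z(12, 400.0, 57, 350.0))   # period 1 vs. period 2
    print(poisson_rate_z(57, 350.0, 15, 380.0))   # period 2 vs. period 3
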
This relationship then enables the following chain of reasoning:
Within the problem domain that I have examined, if control of the dynamic
behavior of a large-scale complicated system is implemented poorly, the system
will perform poorly.
Therefore, we take a design decision to centralize the control of the dynamic
behavior during the design process. We accomplish this through the use of the
technique explained below.
Having been centralized, the design and implementation of that control
mechanism is thereby separated from the majority of the development work.
The design-phase and implementation-phase work to implement the control of the
system’s dynamic behavior, having been so separated from the remainder of the
design-phase and implementation-phase work, can be assigned to a small team of
specialists, which relieves the majority of the implementation team from having to
become expert in this matter, and in fact, the majority of the project personnel
need not be cognizant of dynamic behavior as a consideration in the course of
their own work assignments. This is made feasible by the creation of a
mechanism that allows the work of the majority of the development team to
“inherit” the control mechanism from the work of the small team that is
implementing the control logic.
Since the implementation of the control of the system’s dynamic behavior is now
being performed solely by a small team of personnel who are expert in this
matter, per (Boehm 2000) fewer errors caused by unplanned dynamic behavior
will be incorporated in the system design and in the system implementation,
because this implementation is being performed by experts (e.g., personnel who
would have higher “settings” in this particular skill set in Boehm’s models for
factors such as “analyst capability” and “programmer capability”).
Since errors due to these causes are, on average, more serious and take far more
time and effort to correct [footnote 20: (Siegel 1993 b) makes the following statement:
"Errors caused by unplanned dynamic behavior tend to be more serious than the average
error, and take far more effort and time to correct". (Rechtin 1991) similarly observes
that "Conceptual design failures are catastrophic more often than not". (Siegel 1993 b)
also says "Control of the dynamic behavior of the system, however, is among the most
difficult aspect of the system to implement"; (Rechtin 1991) concurs: such design errors,
"because so many piece parts and elements are involved, are the most common causes of
system failures".], having fewer such errors decreases the likelihood of significant design
re-work being required, and in general, decreases the likelihood that cost and/or schedule
over-runs will be encountered.
Therefore, density of those defects that are directly attributable to unplanned
adverse dynamic system behavior is proposed as the first dependent variable in this study,
and the use or non-use of the design-based technique for centralizing the control of the
system’s dynamic behavior is proposed as the first independent variable.
In addition, two other dependent variables will be examined, in terms of their
behavior in relation to the same independent variable. The first of these is system
reliability. System reliability is an interesting dependent variable for this study because
the literature (and my own benchmarking activities) have shown an unusually-wide range
of outcomes across similar systems: up to a factor of 1,000. For a recognized and
frequently-measured quality metric to show such a wide “spread” of outcomes is highly
unusual. Even our every-day experience reflects this; for example, newly-built, 3,000-
pound family sedans sold in the United States show a range of gasoline mileage of far
less than a factor of 2. Something “interesting” is driving that factor-of-1,000 variation!
If the hypothesis is true, the use of the design-based technique in the characterized
systems ought to result in outcomes that are well toward the “good” end of the observed
range. Studies that can elucidate the causes of (and provide potential correctives to) such
a wide variance have a particularly high potential for useful contribution.
The third item that will be measured as a dependent variable in terms of its
behavior in relation to the same treatment is variance on certain critical port-to-port
timing relationships. If the design-based technique is effective, its use ought materially
to reduce the range of variance in the distribution of port-to-port timings, especially at the
high-end of the distribution (I have heard this described as “cutting off the tail” of the
distribution).
This study will also consider a second independent variable: the use or non-use of
techniques to place critical skills within the team that is performing the centralization of
the control of the dynamic behavior of the system. The expectation is that the use of a
design-based technique that centralizes the control of the dynamic behavior of a system is
less effective if the team implementing the centralization lacks critical skills. A scalar
metric of project performance will be used as the dependent variable.
The complete set of independent and dependent variables is summarized in Figure
2.2-1, below:
Figure 2.2-1. Hypotheses and dependent variables.

Overall hypothesis: During the development phase of a large-scale complicated
computer-based system, the use of a design-based technique that centralizes the control
of the dynamic behavior of a system will produce more consistent and better project
outcomes.

Major independent variable: use or non-use of the design-based technique.
Dependent variables (one hypothesis per dependent variable):
- a. Density of defects attributable to errors in managing system dynamic behavior
- b. System reliability
- c. Variance of selected port-to-port timing measurements

Secondary independent variable: placing / not placing appropriate skills.
Corresponding dependent variable: scalar metric of project performance.

Null hypothesis: if this design-based technique is used, programs will continue to
experience problems at the historical rate.
The following paragraphs provide the four specific hypotheses that will be used in
this study. The first names the major independent variable, and the principal dependent
variable:
a. During the development phase of a large-scale, complex computer-based system,
the use of a design-based technique that centralizes the control of the dynamic
behavior of a system will lower the density of those defects that are attributable to
unplanned adverse dynamic system behavior.
The second hypothesis introduces the notion of reliability in the system design:
b. During the development phase of a large-scale, complex computer-based system,
the use of a design-based technique that centralizes the control of the dynamic
behavior of a system will produce better reliability for key system capabilities.
The third hypothesis introduces the notion of port-to-port timing in the system
design:
c. During the development phase of a large-scale, complex computer-based system,
the use of a design-based technique that centralizes the control of the dynamic
behavior of a system will reduce the variance for critical port-to-port timing
relationships.
For each of these three hypotheses, I will assess the validity by comparing data
from examples where the design-based technique was used to examples where the
design-based technique was not used, and then showing that any positive correlation was
in fact due to the cause embodied in the specific hypothesis, rather than due to some
alternative explanation.
The fourth hypothesis introduces the notion of the critical skills necessary to
exercise this technique:
d. During the development phase of a large-scale, complex computer-based system,
the use of a design-based technique that centralizes the control of the dynamic
behavior of a system is less effective if the team implementing the centralization
lacks critical skills.
This hypothesis will be assessed through an examination of six cases, where I have
access to data concerning projects that attempted similar technical interventions, but
where some also focused on critical-skills acquisition and deployment within the team,
while others did not.
The following paragraphs discuss the basis in the literature and theory for the
operation of the hypotheses.
Hypothesis a: There are many sources in the literature describing the range of
variation in the productivity of engineers (especially programmers); typical values
reported (mean to best) form a linear relationship, ranging from a factor of 3 more
productive to a factor of 10 more productive; a few sources report even larger numbers.
Curtis and Thayer, in particular, report on the range of variation in the number of defects
in the work of individual programmers, citing ranges (mean to best) of 5 to 10 times
fewer defects. The operation of these reported findings in terms of my hypothesis is that
if my design-based technique can in fact isolate the work of centralizing the control of
dynamic system behavior to a small team of experts, and one then assigns that work to
those experts, the literature suggests that those experts will make 5 to 10 times fewer
defects than average implementers would. Since through that isolation, I have removed
such work from the responsibility of the average and less-than-average members of the
team, the overall result for the project is superior.
Hypothesis d: I discuss hypothesis d next, both because, like hypothesis a, it is
linear in its effects, and also because there is interaction between them in the operation of
my approach. Hypothesis d relates skill level of individual practitioners to project
metrics, such as cost and schedule. Works such as (Boehm 2010) are relevant to
hypothesis d for two reasons: first, because he is one who has quantified the scope of the
impact on cost and schedule that derives from the variability in individual skill (linear, a
factor of about 8 from worst to best). Second, he is one who has quantified the
relationship between latent defects and cost / schedule performance (so have others, but
he pioneered this work); he has found that not only will each additional defect cost
additional money and time to correct (a linear effect), but having a large number of
defects has the additional effect of causing the correction of some of these defects to be
deferred to a later stage of the development cycle. He (and many others) have developed
data that shows that such deferral results in a very significant increase in the cost and
schedule required to implement each correction (on the order of a factor of 3 increase for
each development phase, e.g., (a) requirements, (b) design, (c) implementation, (d)
integration, (e) test, (f) deployment, (g) operations, so that a deferral of a correction from
the design to the test phase of a development, for example, raises the cost of each
correction so deferred by a factor of 27). In the context of my hypothesis, this effect of
later-correction-costs-more is very significant: errors in implementing the control of
dynamic system behavior tend not to be noticed until late in the development cycle, as
they involve the interaction of multiple components, and such interaction is not exercised
until fairly late in the integration and test portion of the typical development cycle. This
is probably the root cause underlying the well-known "90-90 rule" of project schedules:
"the first 90% of the project takes the first 90% of the schedule; the last 10% of the
project takes the next 90% of the schedule". As noted above, errors not noted until late in the
development cycle cost a lot more to correct than errors noted earlier in the cycle, so if
my approach can reduce the number of such defects whose correction would otherwise
occur in later development phases, the cost (and schedule) savings are considerable.
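
Since the factor-of-3 escalation compounds per phase, the cost of a deferred correction grows geometrically. A minimal sketch of that arithmetic, using the phase list given above (the base factor of 3 is the approximate value cited; real projects vary):

    PHASES = ["requirements", "design", "implementation",
              "integration", "test", "deployment", "operations"]

    def deferral_multiplier(introduced_in, fixed_in, per_phase_factor=3.0):
        # Cost multiplier when a defect introduced in one phase is not
        # corrected until a later phase, at ~3x per phase deferred.
        return per_phase_factor ** (PHASES.index(fixed_in)
                                    - PHASES.index(introduced_in))

    # A design-phase defect deferred to the test phase costs ~3^3 = 27x.
    print(deferral_multiplier("design", "test"))   # 27.0
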
In this context, the design-based technique offers two specific mechanisms that
could provide improvement: (a) as noted in hypothesis a, the assignment of this
responsibility to expert practitioners decreases the total number of such defects, and (b) the
use of the design-based technique, through its system architecture skeleton component,
allows for the integration process to start much earlier in the development cycle, thereby
allowing the finding and fixing of such defects to start in an earlier development phase,
where the literature says that the cost for each correction will be less.
Boehm’s data also indicate that if one holds practitioner skill level fixed and
simultaneously decreases the difficulty of the work to be accomplished, the required cost
and schedule will decrease. In terms of the entire team, the design-based technique
allows me to assign the hardest work to a small portion of the team, and ensure that those
difficult portions do not “leak out” to the rest of the team. As noted in the discussion of
hypothesis a, above, these people are 3x to 10x more productive, and incorporate 5x
fewer defects into their work. The rest of the team, having work to accomplish that is
easier than average (since the hard portions have been isolated away from them by the
design-based technique) will, per Boehm, perform their work at a (linearly) better-than-
average cost and schedule. The combined result is beneficial to the project as a whole.
Hypotheses b and c both involve non-linear relationships, so I discuss them
together. Keene is an important source for hypothesis b, because he reports that having
fewer latent defects in a product results in higher levels of reliability. Sha et al. are a
source for hypothesis c, because they describe a theory of scheduling events within a
system that predicts that simpler control structures and avoidance of unplanned dynamic
behavior will result in less variance in system performance metrics, including port-to-port
timing. A discussion and explanation of the root causes of the non-linearity of these
metrics is included in chapter 2.3, below. The literature notes examples of factor of
1,000 variation due to these causes. I will in fact show data almost of that magnitude in
hypothesis b, and on the order of a factor of 100 for hypothesis c.
Having cited Sha et al., I feel it necessary to show that the systems used in my
10 cases conform to the assumptions of their theory, of which there are three key ones.
First, the selected scheduling mechanism must implement strict pre-emption. All
10 of my cases do this.
Second, the run-model must allow no re-entrancy; each entity, once started, is run
to completion. All 10 of my cases do this, too.
Third, scheduling priorities must in general be assigned according to a scheme that
they call rate-monotonic, wherein system activities that occur at a higher repetition rate
are assigned a higher scheduling priority. All 10 of my cases do this, too. Within my 10
cases, system activities fall within three categories of repetition rate: (a) administrative
activities (e.g., an external interface cannot generate an interrupt, so that device has to be
polled, usually at a high rate such as 100 times per second); (b) machine-generated traffic
(e.g., output messages from a radar, automatically-generated messages saying that a
vehicle has moved its position, and so forth; typical rates of 5-20 per second); and (c)
human-generated traffic, with typical rates per user of one event every 10 seconds.
Events of both category b and c may “converge” on an individual operator, and create
complex variations in that person’s work-load. Across these 3 categories, the 10 systems
used as my cases apply a strict implementation of rate monotonic scheduling, that is, the
scheduling priority for the category a items is higher than for the category b items, which
is higher than for the category c items. Within a single category, especially the category
of human-generated traffic, some exceptions to strict rate-monotonic scheduling were
implemented. This is because rather than blindly following a theoretical scheduling
construct, these were real projects that had actual business-process rules, and minor
adjustments to strict rate-monotonic scheduling could be advantageously made to take
advantage of known user priorities among tasks. The theory allows (and even
encourages) such minor deviations (on the basis of actual, specific business-process
rules) in its application to real systems.
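
For illustration, the following is a minimal sketch (in Python) of the rate-monotonic assignment just described; the activity names and rates are representative of the three categories above, not taken from any one of the 10 cases:

    # Rate-monotonic assignment: higher repetition rate -> higher priority.
    activities = [
        ("poll non-interrupting external interface", 100.0),  # category (a)
        ("process radar / position messages",         10.0),  # category (b)
        ("handle operator input",                      0.1),  # category (c)
    ]

    # Sort by rate, descending; priority 0 is the highest.
    for priority, (name, rate_hz) in enumerate(
            sorted(activities, key=lambda a: a[1], reverse=True)):
        print(f"priority {priority}: {name} ({rate_hz} events/second)")
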
2.3 Design‐based technique for centralizing the control of the dynamic behavior
(TRW 1994), (Northrop Grumman 2009), (PMI 2004), (DoD 2001), (Humphrey
1995), (INCOSE 2002), and other sources identify the goals of the design phase of a
large-scale complicated system, including such factors as:
- Develop the top-level execution architecture for systems
- Define the top-level system organizational structure
- For each entity within the structure, develop threads, define interfaces, develop design representations, and prototype critical design items
- Verify the overall structure and flow
- Check constructs for correctness
Note that none of these sources identifies as a goal of the design phase anything
that has to do with using the design process to partition the work into separable "bins" of
specified degrees of difficulty, or to facilitate the matching of the pieces of work (and
their degrees of difficulty) to the skill capabilities and distribution of the intended
design / implementation team. It would appear that such a goal for the design phase is
not widely recognized.
Several sources (e.g., Boehm 1981, etc.) talk about the importance of individual
or team skill level to the accomplishment of the project (Boehm develops a quantifiable
model of the likely impact on cost and schedule performance, as a function of the skill
level of the personnel involved), but none of these sources appear to have taken the step
of making such matching of work assignment to individual or team skill level an explicit
goal of the design process. Many sources (e.g., Demarco and Lister 1977) make
statements such as:
- Get the right people
- Make them happy so they don't want to leave
- Turn them loose
whereas (Curtis et al 2002) talks about the value of having work assignments be defined
in consultation with the people involved, and for work assignments to enhance the
individual’s growth in skills. (Cockburn 2007) talks about the potential for work
assignments to act as what he terms an “information radiator” which can help enhance
team alignment. Other sources make qualitative statements about the importance of
hiring good people, aligning their view of the goals of the team, avoiding unnecessary
level of personnel turnover, and so forth. (Boehm et al 2009) provides instruments for
assessing whether a particular project is operating in a manner that would tend to increase
or decrease risks, and includes several factors regarding personnel, but does not include
guidance about using the systems engineering process to optimize the assignment of
personnel to tasks. Overall, little specific guidance about how to assign people to specific
elements of work is provided other than general heuristics such as assigning one’s best
people to the hardest parts of the problem.
The process described herein takes the next step of creating an explicit goal for
the design process of a large-scale complicated system of using the design process to
partition off one particular type of difficult task – the control and management of the
system’s dynamic behavior – so that its implementation can be centralized to a small
team, and therefore, the majority of the implementation team need not become proficient
in that technique. Specific skill bins used in this work are described in Appendix D.
The above would be one particular instance of a more general strategy of using
the design process to partition the work into “bins of difficulty”, so as to facilitate the
matching of work assignment to individual or team skill level, for each of several
different types of “difficult” or specialized tasks.
Consistent with the hypothesis, above, herein we look at the efficacy of a specific
design-based technique that centralizes the control of the dynamic behavior. Many
portions of such a system will be involved in dynamic behavior, so the technique makes
the control mechanism governing dynamic behavior of the system an automatically-
inherited characteristic of the development environment and system implementation.
This allows the developers for the majority of the system to have their portion of the
system exhibit the desired dynamic behavior, but to do so without their having explicitly
to account for this in their efforts; rather, their portion of the system inherits these
controls from the work of the centralized team. This last item is likely the most
technically-challenging aspect of the problem; it is not sufficient for the implementers of
most of the system not to have to deal with constructs that affect dynamic behavior; their
components of the system must still participate in the system’s dynamic behavior in the
desired way, even though there is nothing in the design or implementation of their
component that explicitly addresses such behavior.
Through such inheritance, all elements of the design avoid the unplanned dynamic
behavior cited above as a cause for difficult and costly errors in these types of systems.
This provides a viable mechanism for partitioning the work on a real large-scale system
engineering program into one type of set of “skill bins”, whose distribution can be
matched to the skill distribution within the team, e.g., those who deal with the dynamic
behavior of the system, and those who do not. (Siegel 1993 a) states this as: “Process
architecture design is a ‘programming-in-the-large’ activity best localized to a small,
senior group of software engineers”, and “Utilizing interprocess and interprocessor
protocols requires specialized expertise. How can this expertise be isolated in the code to
reduce personnel training and reduce errors?”
Before I describe the specific technique to be assessed by this study, I wish to
specify what I mean in this context by the term “dynamic behavior of a system”. Per
(Rechtin 1991), a system is a set of parts that displays emergent behavior, through the
interaction amongst those parts. Some of that emergent behavior is planned and desired.
Some of it might be unplanned, and some of that unplanned emergent behavior could be
undesirable; (Ramo & Booton 1984) use the phrase "... may produce unexpected
phenomena far from what the designer intended". These emergent behaviors include
dynamic behavior – that is, behavior that changes over time, in response to various
stimuli and state transitions. In large systems, the number and variation in these stimuli,
and the large number of possible states, may include combinations of dynamic behavior
that have undesirable effects – adversely affecting, for example, timing, capacity, or
reliability. It is desirable to control system dynamic behavior through the design and
implementation process, so that there is little such “unplanned dynamic behavior”.
Among the most complex sorts of emergent dynamic behavior are those
associated with timing of interactions; this area is a source of a large class of unplanned /
adverse dynamic behaviors.
Consider a simple example: a disk drive operates by having a disk coated with a
storage medium spinning at a planned, constant rate. The surface of the disk is organized
as concentric rings of storage medium, each called a track, which are partitioned into
angular sectors of data, with control information placed in between each sector. If the
design goal is to read all of the data on a track in a single revolution of the physical disk,
there is a clear timing budget for performing the necessary data transfer and processing
for each sector, derived from the rate of revolution of the disk and the angular size of the
sector and inter-sector control data. If the process to read and process the data from
sector 1 takes even a small amount longer than this timing budget, the system will not be
ready to start reading the data from sector 2 when the head is over the appropriate
location, and the disk will have to be allowed to spin all the way around again before the
data from sector 2 can start to be read and processed. As a result, to read the data from
sector 1 and sector 2 would take much longer than budgeted, because the time for the
platter to spin all the way around intervenes between the two sector reads. The result is a
highly non-linear degradation of performance from the expected level, due to what could
be a relatively minor overage in a timing budget.
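
A worked version of this example (the rotation rate and sector count are illustrative) shows the non-linearity: exceeding the per-sector budget by less than 10% slows the track read by a factor equal to the number of sectors.

    def time_to_read_track_ms(sectors, rev_time_ms, per_sector_ms):
        # Per-sector budget: the time the head spends passing one sector.
        budget_ms = rev_time_ms / sectors
        if per_sector_ms <= budget_ms:
            return rev_time_ms                 # whole track in one revolution
        # Missing each sector's start forces ~one full extra revolution
        # before the next sector can be read.
        return sectors * rev_time_ms

    # 7,200 rpm disk -> ~8.33 ms per revolution; 64 sectors per track.
    print(time_to_read_track_ms(64, 8.33, 0.12))   # within budget: 8.33 ms
    print(time_to_read_track_ms(64, 8.33, 0.14))   # ~8% over budget: ~533 ms
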
This potential for extreme non-linearity in degradation of performance due to an
unexpected dynamic in system behavior, even a minor such variation, is what makes the
management of system dynamic behavior (and the avoidance of unplanned dynamic
behavior) such a fruitful area for improving system development. The undesirable behavior could
manifest itself as a timing / performance / capacity degradation, as in our simple example,
or as a reliability / mean-time-between-failure degradation, or in some other fashion.
Often, the causal relationships are far more complex and subtle (and hence, harder to
find) than in our simple disk-drive example.
Digital hardware designers have recognized this problem for years, and the state-
of-the-art in digital hardware design addresses this problem head-on through techniques
such as DC timing diagrams and the associated analyses. But notice that a digital
hardware component only interacts directly with the components with which it is
physically connected by a wire – the combinatorial nature of the analysis can be
rigorously bounded.
A modern software-based system, perhaps with hundreds (or even thousands) of
computer processors operating in a distributed fashion, each of which has 500 to 1,000
independently-scheduled software entities, and many forms of asynchronous stimulation
(including from people), can have many more opportunities for such unplanned / adverse
dynamic behavior. But the adverse effect, as in the simple disk-drive example cited
above, can still be highly non-linear – that is, seemingly small instances of unplanned
dynamic behavior can seriously degrade and compromise a seemingly well-designed and
well-tested system.
The introduction of ever-larger amounts of software into large-scale systems
[described by (Boehm 2010)] introduces an additional significant increase in the potential
interconnectedness. Not only are there many more “parts” – the systems under
consideration herein have a few tens of thousands of physical parts, but millions of lines
of software – but unlike hardware components which only directly interact with other
parts with whom they are physically interconnected (via a wire or near-field
electromagnetic effects, for electronic parts; via mechanical motion, for physical parts
like gears and levers), software “parts” can directly interact (via various sorts of calls,
etc.) with almost any other software “part”. So, in software there are both many more
“parts”, and the potential for a much denser interconnection topology. This has led to a
situation where unplanned adverse dynamic behavior is a significant factor in the failure
mechanisms of large-scale, software-intensive system development, as evidenced by the
quotes from (Siegel 1993 b) and (Rechtin 1991) cited above.
Modern software practices have (unintentionally) tended to increase the
likelihood of such problems; for example, the increasing use of service-oriented
structures in software systems, in which hundreds of software entities are made available
to be called and used by a large set of applications, has led to a significant increase in
such unplanned dynamic behavior – the developers of these software services may have
no credible way to understand all of the ways in which their service may be used.
Yet there is little perceivable momentum towards society wanting to give up the
functionality and richness achievable through such large software-intensive systems –
these perform vital roles for society that cannot, at present, be performed in any other
manner. It would, therefore, be desirable to address and correct the problem, so that
society can realize the benefits of such systems, yet the development process for such
systems will succeed more often.
Dynamic behavior is implemented in such large-scale systems through a chain of
events such as the following:
- An asynchronous stimulation (e.g., from a person, from a communication channel, from a sensor, etc.), or a synchronous stimulation (e.g., from a timer), occurs, forming an input to the system. This can be mechanical or electrical.
- This stimulation causes the spawning of a software or hardware action, to respond to and process the event or input.
- This can cause a thread (a planned sequence of activity). Note that any step along the thread may spawn additional, concurrent threads.
- Eventually the original thread completes all of its steps, and these software / hardware components then remain quiescent until the next instance of stimulation occurs.
There are at least two classes of problems that might occur in the context of such
a thread execution: firstly, many of these threads have hundreds or even thousands of
potential branches and alternatives, which often include the spawning of separate and
concurrent additional threads. So even within the context of a single thread initiation,
unplanned dynamic behavior can occur, through the interaction between the spawned
threads.
But the more common and more serious instances generally arise because the
stimuli for the above threads are often intrinsically asynchronous, and an indeterminate
number of combinations of such threads may be initiated and running concurrently.
The complexity associated with analyzing the above is far beyond the capacity of
the tools used by digital circuit designers [(Medvidovic 2010) describes the large
combinatorial complexity of a similar analysis problem], and in general, the instances
that prove the most troublesome are precisely those instances of dynamic system
behavior that were not foreseen (e.g., were unplanned). Some [such as (Dijkstra 1998)]
call for solving this by programming only through the artifice of formally-proven
mathematical models; even if desirable, however, this is not a typical current practice,
and there are no signs of its being adopted on a large scale. Others, such as (Madni 2008)
call for attempting to address this sort of problem through what he terms “resilience
engineering”.
This problem is made more difficult by the fact that the typical software
environment has many mechanisms for implementing concurrent processing. For
example, usually the operating system has more than one such mechanism, usually each
layer of the system’s middleware provides at least one, and most modern programming
languages have one or more such mechanisms built right into their language definition.
Furthermore, the typical system development methodology distributes the
decision-making about the implementation and control of concurrency (and other forms
of dynamic behavior) across the entire population of developers; every software
developer (and many of the hardware designers) can trigger additional concurrent
processing threads within the system.
The process fails, therefore, because:
- Having the responsibility for the implementation of dynamic behavior distributed across the members of the development team allows the potential for unplanned dynamic behavior to be introduced by design decisions; the individual designers may not understand all of the ways in which their components can participate in processing, and may therefore not provide adequate controls to ensure that their components always perform as planned.
- Such distributed responsibility for the implementation of dynamic behavior makes an implicit decision that any member of the development team is adequately skilled in the area of controlling dynamic behavior to be able to make appropriate design decisions. Given the wide range of skills and experience on a large team (the example projects considered in this study each had on the order of 300 to 500 developers), this is likely an incorrect assumption.
As noted in the previous chapter, (Dijkstra 1988) and (Friedman 2005) advocate
high-dimensional mathematical modeling of a complete system, which in principle ought
to include the modeling of the dynamic behavior of the system. Most other sources of
system-design guidance [e.g., (PMI 2004), (DoD 2001), etc.], however, allow for analysis
and prediction of the system’s behavior through far less complete methods, usually
associated with low-dimensional “views” of the system, such as those specified in
(DODAF 2010) and (Northrop 2008). One key and novel aspect of the design-based
technique assessed herein is that it seems to provide a feasible and effective way to
identify which portions of system behavior can effectively be analyzed separately, and
which portions must be kept unified. In essence, it provides guidance for how to split the
dimensionality of the system into separately-analyzable domains; this is an attempt to
avoid the problem cited by (Friedman 2005) that without such guidance, people will
make poor decisions about separating the dimensions for modeling. This is done through
(a) the identification of every independently-schedulable entity within the system, (b) the
forming of those entities into threads of execution, (c) implementing the actual control
structure (not a model) [footnote 21: I term this the system architecture skeleton, or
SAS.] that mechanizes those threads, and then (d) executing those threads across a large
range of scenarios, timing conditions, and stimuli, and in quantities large enough to drive
significant stochastic range. Experience has demonstrated that the
implementation of the above can be done with an effort that is usually 2 orders of
magnitude smaller than the entire system implementation effort, allowing it to be
accomplished early in the project life-cycle. Implementing such a system architecture
skeleton so early in the project life-cycle necessarily involves the use of stubs,
prototypes, and early versions of application functionality, but the key is that it can
incorporate the actual hardware and software that implements the control structure.
Note that this is backwards from the approach usually offered in the literature for
how to construct a system; sources such as (ANSI 2004), (Brooks 1975), (Cockburn
2007), (INCOSE 2007), (Jackson & Hines 2009), (Northrop 2009), (PMI 2004), (DoD
2001), and (DoD 2004) call for prototyping of applications, algorithms, external
interfaces, and human-machine interface, but do not call for the sort of prototyping /
assessment described above of those elements of the system that are responsible for
controlling and managing the dynamic behavior of the system. This is a gap that will be
addressed by this study.
Framed in terms of the language of (Friedman 2005), in partitioning the system
into lower-dimensional entities for analysis and decomposition, I treat all of the elements
involved in the control and management of the system’s dynamic behavior as indivisible,
and in fact, do not attempt to model them; I prototype and exercise this element (the
SAS), obtaining benchmarks and insight through instrumentation. If the hypothesis is
correct, system development efforts that follow this approach should do better, directly
via fewer errors in the control and management of dynamic system behavior during the
development phases, and (if these errors are material relative to the entire scope of the
project), indirectly through measures of cost and schedule performance during the
incremental development cycle.
The design-based technique assessed herein to correct this problem consists of
both a methodology, and a set of implementing / supporting tools.
The methodology portion of the design-based technique must: (a) allow for the
description and documentation of system dynamic behavior (which implies that this
specification must reach to the level of every independently-schedulable entity within the
system [footnote 22: It is telling that relatively few system- or software-development
guidance documents even address the concept of independently-schedulable
entities.]); (b) allow for the recognition of every mechanism available to a developer that
could create concurrency and other forms of dynamic behavior within the system, and
train the developers outside of those implementing the control kernel that they are not to
use these mechanisms; (c) enforce that proscription; (d) allow for the specification of the
threads and allowable interconnects / interactions amongst the components (whether
hardware or software or people); (e) allow for the instrumented exercising and analysis of
the system threads under realistic stimuli, and do so in a way that allows the dynamic
behavior to be observed and adjusted separately from the specific functionality of the
system.
The implementing / supporting tools must: (a) allow for a small portion of the
implementation to specify and control the dynamic behavior of the system, e.g., provide
for a sort of “control kernel”; (b) allow for the entities within the system that are designed
and implemented without reference to dynamic behavior in some fashion to “inherit” a
set of behavioral controls in accordance with the parameters and algorithms implemented
within the control kernel; (c) automatically translate the specification of the threads and
allowable interconnects / interactions amongst the components (whether hardware or
software or people) into an executable mechanism (the scale and complexity is usually
far too large to do this credibly and reliably by hand); (d) provide an execution and
instrumentation framework for the dynamic system threads, even prior to the availability
of actual system components (hardware and software).
The following figure provides a procedural description of the design-based system
architecture skeleton (SAS) technique. The key step (marked as such in the figure) is the
step that accomplishes the actual de-coupling of the "hard" work from the more normal work.
Figure 2.2-2. Procedural description of the system architecture skeleton.

- Describe the system's desired dynamic behavior, to the level of every independently-schedulable entity
- Take a design decision to centralize control of system dynamic behavior into a small portion of the design and implementation (the "control kernel")
- Use ICDs and programming manuals to catalogue every mechanism available to a developer that could create concurrency and other forms of dynamic behavior
- Designate a small team with suitable expertise as the implementers of the control kernel
- Train the developers outside of those implementing the control kernel that they are not to use these mechanisms, and create ways to enforce that proscription
- Provide a mechanism for specification and implementation of the threads and allowable interconnects / interactions amongst the components
- Provide a mechanism that allows a small portion of the implementation to specify and control the dynamic behavior of the system, e.g., implement the control kernel
- Provide a mechanism that supports the instrumented exercising and analysis of the threads under realistic stimuli, in a way that allows the dynamic behavior to be observed and adjusted separately from the specific functionality of the system, even prior to the availability of actual system components
- Allow the entities within the system that are designed and implemented without reference to dynamic behavior to in some fashion "inherit" a set of behavioral controls in accordance with the parameters and algorithms implemented within the control kernel (the key step)
- Automatically translate the specification of the threads and allowable interconnects / interactions amongst the components (whether hardware or software or people) into an executable mechanism
Figure 2.2-2 starts with the step of creating a description of the system’s desired
dynamic behavior, down to the level of every independently-schedulable entity and event
within the system. These independently-schedulable entities – hardware and software –
form the building-blocks from which system dynamic behavior is formed. Any
reasonable endeavor to control and improve system dynamic behavior must include a step
of capturing a list of these building blocks.
The figure continues with a design decision – to centralize control of system
dynamic behavior into a small portion of the design and implementation, and one that
will therefore be implementable by a small, expert team.
The process continues with a cataloging of all of the forms by which processing
initiation and, especially, concurrency can be introduced into a system; e.g., each of the
levels of software (operating system, programming languages, middleware scripts, etc.),
hardware interrupts, true simultaneous external stimulation, and so forth.
For concurrency created through hardware mechanisms, one starts this process by
consulting the system’s interface control documents (internal and external); they will
describe every hardware-oriented stimulation and data-oriented stimulation to the system.
One can then build a comprehensive list of such stimulations, and create a representation
of the thread of processing spawned by each. Such a comprehensive definition of
processing threads, while not nearly universal in practice, is well documented in the
literature [e.g., (Northrop 2008)] as a best practice.
The next step is not, however, a common one: for concurrency created through
software mechanisms, one goes through the programming manuals for each of the
selected pieces of infrastructure software in one's system (e.g., operating systems,
middleware, programming languages) and finds every programming construct that can
spawn concurrent processing; such constructs are easy for an expert practitioner to find,
since they have readily-observable, described properties such as "create a process",
"wait for a rendezvous signal", and so forth. The combination of these two lists forms
the desired catalogue.
One then selects an appropriate subset of these mechanisms as being the ones that
will be available to the development team, e.g., we will use the provisions of the
middleware for thread dispatch and rendezvous, but not use the similar features built into
the operating system and programming languages. One must then train the developers
regarding these decisions, and create mechanisms (such as code auditors and peer-review
procedures) for enforcing those proscriptions.
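
A minimal sketch of the kind of automated code auditor contemplated here; the proscribed-construct patterns would in practice come from the catalogue described above (those shown are illustrative only):

    import re
    import sys

    # Concurrency-spawning constructs proscribed outside the control kernel;
    # illustrative patterns, standing in for the project-specific catalogue.
    FORBIDDEN = [
        r"\bfork\s*\(",
        r"\bpthread_create\s*\(",
        r"\bCreateThread\s*\(",
        r"\bthreading\.Thread\s*\(",
    ]

    def audit(path):
        # Return (line number, text) for each proscribed construct found.
        violations = []
        with open(path) as source:
            for lineno, line in enumerate(source, start=1):
                if any(re.search(p, line) for p in FORBIDDEN):
                    violations.append((lineno, line.strip()))
        return violations

    if __name__ == "__main__":
        for path in sys.argv[1:]:
            for lineno, text in audit(path):
                print(f"{path}:{lineno}: proscribed construct: {text}")
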
Having catalogued both the independently-schedulable entities and the forms for
introducing stimulation and concurrency, one can then form processing thread definitions
that can be carried through to the level of capturing the desired invocation conditions for
every independently-schedulable entity within the system. For most real systems, this is
complicated enough that some sort of computer-readable representation of these data is
probably required, allowing this representation to become an actual specification of the
desired processing threads and sequences. This is accomplished through a process that
was later adopted as an industry standard [cited in (OASIS 2007)]; at the time we
called this representation "system management scripts", but it is now called
"business-process execution language". This is a formal semantic representation of
allowable work-flow within the
system. We have created implementations that require textual inputs, but have also
created front-end tools that support graphical input.
One must then provide an executable mechanism that can read this computer-
readable representation of processing threads, and turn it into processing instructions,
using only the selected mechanisms for initiating processing and concurrency. The intent
is that (a) this control kernel is a small portion of the overall system implementation, and
(b) is the only mechanism that can invoke processing or concurrency. It must literally
block non-authorized invocations. It takes experience and some trial-and-error to reach a
balance between achieving a small implementation with rigorous blocking of non-
authorized behaviors, and sufficient richness of expression to support the range of
processing required in a real-world system. In the case described in this study, we did not
reach a satisfactory balance until the third iteration.
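
To make the idea concrete, the following is a deliberately tiny sketch of such a mechanism; the registration scheme, script format, and entity names are invented for illustration, and a production control kernel would be far richer (and would also actively block unauthorized invocation paths):

    # Registry of independently-schedulable entities; each is a pure
    # input -> output transform that runs to completion.
    REGISTRY = {}

    def entity(name):
        def register(fn):
            REGISTRY[name] = fn
            return fn
        return register

    @entity("parse_message")
    def parse_message(data):
        return {"parsed": data}

    @entity("update_display")
    def update_display(data):
        print("display:", data)
        return data

    def run_thread(script, stimulus):
        # The kernel interprets a thread specification (a sequence of
        # entity names) and is the only component that initiates
        # processing; entities themselves cannot spawn concurrency.
        data = stimulus
        for step in script:
            data = REGISTRY[step](data)
        return data

    # One "system management script" for one processing thread.
    run_thread(["parse_message", "update_display"], "radar contact 42")
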
Having achieved all of the above in advance of undertaking the (presumably large)
effort to develop all of the system's actual applications software and specialized hardware
devices, one can still implement a system architecture skeleton that implements the actual
processing threads – using the actual intended control mechanisms – in a manner that can
have credible timing, capacity, and other characteristics that will reflect the end-state of
the eventual system. In the case described in this study, this involved only about 1% of
the total system development effort, so this step credibly could precede most of the
development effort.
The key step (marked as such in Figure 2.2-2) is the step that accomplishes
the actual de-coupling of the "hard" work from the more normal work – allowing the
entities within the system that are designed and implemented by those outside of the
small team implementing the control kernel to implement their elements of the system
without reference to the desired system-level dynamic behavior, yet to have their
components in some fashion inherit a set of behavioral controls in accordance with the
parameters and algorithms in the control kernel. One approach to achieving such
inheritance is the use of pre-written shells and stubs that control processing invocation
and concurrency for software applications. These shell programs interact at run-time
with the control kernel to mediate inputs and outputs, and provide the control of system
dynamic behavior, but the personnel who write the applications that populate the shells
need have no cognizance of these matters; their requirements specifications simply
specify inputs, outputs, and transformation methods. They are limited to run-to-completion behavior.
Other approaches are possible, as well.
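
One way such a shell might look, as a sketch only (the class and method names are invented; the run-to-completion restriction stated above is assumed):

    class ControlKernel:
        # Stand-in for the control kernel; routes outputs per the scripts.
        def __init__(self):
            self.delivered = []

        def deliver(self, outputs):
            self.delivered.append(outputs)

    class ApplicationShell:
        # Pre-written shell that mediates all inputs and outputs through
        # the control kernel. The application author supplies only the
        # transform; invocation, sequencing, and concurrency are inherited
        # from the kernel, with no dynamic-behavior constructs in the
        # application code itself.
        def __init__(self, kernel, transform):
            self.kernel = kernel
            self.transform = transform   # pure function: inputs -> outputs

        def invoke(self, inputs):
            # Called only by the control kernel, never by application code.
            outputs = self.transform(inputs)   # runs to completion
            self.kernel.deliver(outputs)

    kernel = ControlKernel()
    shell = ApplicationShell(kernel, lambda msg: msg.upper())
    shell.invoke("track update")
    print(kernel.delivered)   # ['TRACK UPDATE']
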
With the addition of instrumentation and stubs, the threads can be exercised, and
system performance measured, even in the absence of actual hardware devices and target
application software.
Per (Royce 1998), appendix D, the project he analyzed (CCPDS-R) also made use
of the "core team" concept; he describes this as "unique" for the time [footnote 23: There
were other projects doing this at this time; in particular, FAAD C2I, one of the other
cases studied herein. But this practice certainly was rare.]. He characterized
the core team approach as focusing on leveraging the skills of a few experts across the
entire team, and specifically, making them responsible for the “development of the
highest leverage components (mostly within the NAS)”, where NAS is his terminology
for what I call herein the SAS. He also says “As a result of encapsulating these complex
issues in a small number of high-leverage components, the mainstream components were
simpler and far less dependent on expert personnel”. CCPDS-R, however, was primarily
a software-development project; it did not develop a large amount of specialized
hardware, and so the use of the core team applied to its software, not to the entire system.
The following two tables show how the design-based technique I developed –
termed the “system architecture skeleton”, or “SAS” – satisfies these requirements. The
first shows this in terms of the methodological elements of the SAS, and the second
shows this in terms of the implementing tool elements of the SAS:
Required feature (methodology): Allow for the description and documentation of system
dynamic behavior (down to the level of every independently-schedulable entity within
the system).
SAS element that provides this feature: A specialized representation has been created,
with appropriate semantics. The representation can be created manually (e.g., using a
tool like Visio), but automation support has also been created.

Required feature: Allow for the recognition of every mechanism available to a developer
that could create concurrency and other forms of dynamic behavior within the system,
and train the developers outside of those implementing the control kernel that they are
not to use these mechanisms.
SAS element: Auditing checklists have been created that provide insight about how to
prepare a comprehensive version of such a list for a specific project and its environments.

Required feature: Enforce that proscription.
SAS element: A combination of (a) guidance for project reviews (e.g., reviewing this
matter in peer reviews and code audits, with guidance provided about topics and
questions), and (b) automated means (e.g., code auditors, disabling operating system and
compiler features, etc.).

Required feature: Allow for the specification of the threads and allowable interconnects /
interactions amongst the components (whether hardware or software or people).
SAS element: A specialized representation has been created, with appropriate semantics.
The representation can be created manually (e.g., using a tool like Visio), but automation
support has also been created.

Required feature: Allow for the instrumented exercising and analysis of the system
threads under realistic stimuli, in a way that allows the dynamic behavior to be observed
and adjusted separately from the specific functionality of the system.
SAS element: A tool has been created for this purpose (see below), and a method for
using that tool.

Table 2.2-1. SAS – required features – methodology.
Required feature (implementing tools): Allow for a small portion of the implementation
to specify and control the dynamic behavior of the system, e.g., a sort of "control
kernel".
SAS element that provides this feature: This is performed by a tool we call the "software
backplane", and by shell templates for system applications. In combination, these items
can implement, control, or disable dynamic behavior of hardware, software, and human
interactions. The specific behavior is defined through scripts, in a syntax that we
defined. This scripting is similar to BPEL (business-process execution language
[footnote 24: The web-site searchSOA.com defines Business Process Execution
Language (BPEL) as "an executable dialect of XML that facilitates the modeling of
interactions between Web services in a cloud computing environment." BPEL is defined
in (OASIS 2007).]), but more complete with regard to controlling dynamic behavior.

Required feature: Allow for the entities within the system that are designed and
implemented without reference to dynamic behavior in some fashion to "inherit" a set of
behavioral controls in accordance with the parameters and algorithms implemented
within the control kernel.
SAS element: The shell templates (we call them the "system manager programmer
interface") perform this function.

Required feature: Automatically translate the specification of the threads and allowable
interconnects / interactions amongst the components (whether hardware or software or
people) into an executable mechanism (the scale and complexity is usually far too large
to do this credibly and reliably by hand).
SAS element: The system management scripts are automatically turned into executable
commands and initialization parameters for the relevant operating system, compiler,
loader, and attached hardware devices.

Required feature: Provide an execution and instrumentation framework for the dynamic
system threads, even prior to the availability of actual system components (hardware and
software).
SAS element: The execution environment created by the system management scripts is
built up out of the shell templates. The contents of the shell templates can include real
production software and hardware, but can also be prototypes, models, and stubs, in any
combination. This allows the dynamic behavior of the system to be observed,
instrumented, and corrected separately from the functionality of the system.

Table 2.2-2. SAS – required features – implementing tools.
In this design-based technique, the boundary between systems engineering and
software development is blurred; this is consistent with the findings of (Boehm 2010),
who cites efforts and recommendations for more integration between the processes and
approaches of the two fields.
The design-based technique evaluated herein has been brought to full-scale
maturity and application on real-world projects. Some of the cases in this study are real,
"go-to-war" products, produced using the candidate design-based technique by
professional industrial teams consisting of hundreds of people.
This specific design-based technique will only work, of course, for such systems
where the dynamic behavior can be separated and centralized. Cases, however, have
already been identified where this technique has been used in additional application
problem domains, indicating that there may be real-world value across multiple
application problem domains for this approach.
Chapter 3: Methodology
3.1 Overview of the methodology
This work principally takes the form of an observational case study. (Yin 2009)
and (Shepard 2003) define a case study as an in-depth investigation of a single individual,
group, or event, intended to explore causation, in order to find underlying principles.
(Cresswell 1998) cites a set of features of a good case study, including: (a) case
adequately defined, for example bounded by time, place, or other appropriate attributes;
(b) a sense of “story” in the presentation; (c) sufficient raw data presented; (d)
confirmation through what he terms “convergence of information”, or “triangulation”; (e)
point of view of the researcher made clear.
(Flyvbjerg 2006) asserts that one can “generalize on the basis of a single case”.
(Walton 1992) states “case studies are likely to produce the best theory”. Both find that the relatively in-depth focus on a single real example (or a small number of them) is conducive to producing understanding²⁵. Similarly, Beveridge [as quoted in (Kuper, 1985)] states
“More discoveries have arisen from intense observation than from statistics applied to
large groups”. (Giddens 1982) speaks to the value that comes from descriptions of events
in which the describer had been a participant. (Flyvbjerg 2006) goes farther, stating that
“the proximity to reality, which the case study entails, and the learning process that it
generates for the researcher will often constitute a prerequisite for advanced
understanding”. (Miller 1997) stresses the importance of being what he calls a “participant observer”.

²⁵ There are dissenters, too. For example, (Dogan & Pellasy 1990) and (Campbell & Stanley 1966) raise limitations and concerns regarding the use of case studies, although when one compares the details of what Campbell calls a case study and what Yin calls a case study, the differences are profound, and it is not clear that the terms as used are directly comparable. Campbell may agree with this latter point, as he wrote the highly complimentary foreword to (Yin 1994).
(Kardos 1979) states that case studies allow for the description of what was
actually done in the context of a development or acquisition.
The observational case studies proposed herein are intended to assess the validity of the hypotheses, which state that by centralizing the responsibility for the portion of a system design that controls dynamic behavior (through the use of the specific technique described), unplanned dynamic behavior can be reduced, and that such reduction improves the outcomes of the development of large-scale, complicated, software-intensive systems.
The case-study methodology used herein has been adapted from (Yin 1994), (Yin
2002), (Yin 2009), and the application of this method by (Latimer 2008). The hypotheses
(see previous chapter) have been generated. Metrics to be collected and statistics to be
calculated for each case are defined. A principal case is evaluated, examining data that
represents a long period of project performance. Variation in behavior during the case
leads to different measurement results at different portions of the case; the defined
metrics are collected and the defined statistics calculated, thereby permitting the
evaluation of the hypothesis for this case. Nine additional cases are then evaluated. The selection of the combination of cases is intended to match the strategy for selection of samples and cases that (Flyvbjerg 2006) calls “maximum variation cases: To obtain information about the significance of various circumstances for case process and outcome (e.g., three or four cases that are very different in one dimension)”. In this case,
the dimension in which these cases are different is the use or non-use of the technique
described in the hypothesis; in particular, two cases that used the technique, and two that
did not, all in a single application problem domain, and then six additional cases that
bring in four additional application problem domains. In a similar vein, (Yin 1994) recommends that, in multiple-case studies,
“each case must be carefully selected so that it either (a) predicts similar
results, or (b) produces contrasting results but for predictable reasons”
The case selection in this study follows this recommendation. (Yin 1994) also points out that cases in a multiple-case study ought not to be thought of as statistical samples, but rather as what he terms replication, which he treats as analogous to the use of multiple experiments.
As I laid out the methodology for this study, it was pointed out to me that portions of it resemble a quasi-experiment. I have therefore created a hybrid methodology that combines case-study and quasi-experimental elements.
According to (Cook & Campbell 1979), the term quasi-experiment has come into
use to denote “experiments that have treatments, outcome measures, and experimental
units, but do not use random assignments to create the comparisons from which
treatment-caused change is inferred”. Not all situations lend themselves to the use of
randomization; critical elements might not, for example, be under the control of the
experimenter, and therefore, the option of random assignment might not be available.
(Campbell & Stanley 1963) use the term ‘pre-experimental’, rather than ‘quasi-experiment’; they distinguish this from ‘true experimental’ designs, i.e., those that include the use of randomization:

Methodological reference | Nomenclature for experimental designs that do not include randomization | Nomenclature for experimental designs that do include randomization
(Campbell & Stanley 1963) | ‘Pre-experimental’ | ‘True experimental’
(Cook & Campbell 1979) | ‘Quasi-experimental’ | ‘Experimental’
Table 3.1‐1. Nomenclature – experiments and non‐experiments.
There are three elements of this study that, although structured as part of the case
study design, can be considered quasi-experiments under the above definitions, in that,
while several factors are held approximately constant, the independent variable (use /
non-use of the design-based technique) was adjusted, observations were made, and
conclusions will be drawn from the observation of the change in dependent variables, but
no opportunity for introducing randomization was available. To follow the nomenclature
introduced above, there is a treatment (use / non-use of the design-based technique), an
outcome measure (the dependent variables), and some comparison from which change
can be inferred and hopefully attributed to the treatment (e.g., the increase or decrease of
the dependent variable may be attributed to the use / non-use of the design-based
technique).
The first of these three elements that can be considered a quasi-experiment is the 3-period behavior on the FBCB2 project (see chapter 3.2.1, below). The second is the 2-period behavior on the FAAD C²I project (see chapter 3.2.2, below). The third is the AFATDS / CCPDS-R comparison (see chapter 3.2.2, below). These are each described in the following paragraphs.
As noted above, the first of these three elements that can be considered a quasi-
experiment is the 3-period behavior on the FBCB2 project, where during period I, the
design-based technique was employed; in period II, it was not employed; and in period
III, its use was re-introduced. It is known from the surviving project documentation that many aspects of the project were left approximately constant across these three periods, creating a situation where it may be possible to attribute any changes in the dependent variable(s) to the “treatment”, i.e., the use / non-use of the design-based technique. A method of formulating and assessing plausible rival hypotheses was used to formalize this (i.e., to determine if such an attribution is valid), drawing on (Yin 1994) and other identified methodological sources.
In the nomenclature of (Cook & Campbell 1979), this 3-period behavior is
probably best described as “a repeated-treatment design”, usable in situations where it is
“possible to introduce the treatment, fade it out, and then reintroduce it at a later date”. In
this instance, the “treatment” is the use of the design-based technique, which was used in
period I, “faded out” in period II, and re-introduced in period III.
(Cook & Campbell 1979) describe this particular quasi-experimental design via
four states that progress through a sequence of three action-steps, as depicted in the
figure, below:
Turning to a different methodological reference, using the nomenclature of
(Campbell & Stanley 1963), this could also be described as three successive /
interlocking instances of what they term “one-group pretest-posttest design”.
Once one has identified a quasi-experiment design that matches one’s work-plan,
both methodological references go on to provide insight regarding typical risks to this
particular quasi-experimental design – in fact, this is practically the only topic these
books discuss under the heading for each quasi-experiment design. Having access to
such a “pre-defined” list of typical risks is one key benefit of identifying this as a quasi-
experiment, rather than just a case study, as it allows me to draw upon these lessons-
learned from the literature.
[Figure 3.1‐1. A repeated‐treatment design: O₁ X O₂ X O₃ X O₄, running from project initiation through use of the design‐based technique (period I), non‐use of the design‐based technique (period II), and re‐adoption of the design‐based technique (period III).]

The second quasi-experimental element in this particular study is the 2-period behavior on the FAAD C²I case. From contract award in 1986 through early 1989, the project did not use the design-based technique (and the project experienced severe
technical, cost, and schedule problems). The program adopted the design-based
technique in early 1989, and since that time has performed well (the program continues to
this day). The detailed error-report data that I plan to use for this case are drawn from 2002, and are expected (if hypothesis a is true) to exhibit behavior similar to FBCB2 periods I and III. I also have access to data in less detailed form that documents the existence and magnitude of the problems in the 1988 time-frame, which I can use for comparison purposes. This again fits into the structure called by (Campbell & Stanley 1963) a “one-group pretest-posttest design”, i.e., the program was started, resulting in a state O; the treatment X (use of the design-based technique) was introduced, resulting in a new state; that is, O X O, or, to use the subscripts introduced in (Cook and Campbell, 1979), O₁ X O₂.
The third quasi-experimental element in this particular study is the comparison of AFATDS and CCPDS-R data. The CCPDS-R project used a technique similar to the design-based technique described herein, and also focused on critical skills acquisition and deployment within the project. The AFATDS project, observing the success of the FAAD C²I project, used a technique similar to the design-based technique described herein, but did not focus on critical skills acquisition and deployment (at least, not at first). I will use this comparison to assess hypothesis d.
Both methodology books cited above identify the potential sources of invalidity
for these specific quasi-experimental designs, citing “history”, “maturation” [in (Cook
and Campbell, 1979), a special version of this threat is introduced that they call “cyclical
maturation”], and “testing” as the most likely threats.
Of special interest to this particular study, due to the relatively long time involved between states (for example, on the FBCB2 project, the data for period I are from 1998, while the data for period II are from 2007), are those risks that derive from various types of potential “contamination effects across time”. This is in fact covered by the risk category that (Cook & Campbell 1979) calls ‘History’ – “observed effect might be due to an event which takes place between the pretest and the posttest”.
Re-stated in the terms of this particular study: if the key technical personnel performing the work on the FBCB2 project had changed significantly from period I to period II, for example, that change could be the underlying cause of the error-report behavior shown in the preliminary data analysis, rather than the use / non-use of the design-based technique. Similarly, if the military mission changed, or the technical requirements changed, or the key Government personnel changed, those might be the underlying cause, rather than the use / non-use of the design-based technique.
I addressed this risk through a series of plausible rival hypotheses (described below, chapter 3.3); I have postulated what I view as a complete set of such intervening events that could cause a change in a dependent variable; during the study, each of these plausible rival hypotheses was assessed in order to determine if there is a credible basis for finding that it, rather than one of the experimental hypotheses, was the underlying cause of the behavior of the dependent variables. The applicable plausible rival hypotheses from chapter 3.3 are the following:
Significant changes in the scope of the product to be produced
Significant changes in the key personnel on the prime contractor’s project team
Significant changes in the key personnel on the customer’s project team
Significant changes in the developmental maturity of the prime contractor’s project team
Significant changes in the organization’s institutional framework of policy guidance, procedures, and training for system engineers and software developers
Significant changes in the contractual structure within which the development proceeded
Significant changes in project procedures
(Cook & Campbell 1979) also identify the following additional threats to internal validity that may relate to potential contamination effects across time:
Maturation – “observed effect might be due to the respondent’s growing older, wiser, stronger”.
Testing – “observed effect might be due to the number of times particular responses are measured”.
Selection – “observed effect might be due to the kinds of people in one experimental group as opposed to another”.
Mortality – “observed effect might be due to the different kinds of persons who dropped out of a particular treatment group during the course of an experiment”.
These, however, all relate to characteristics of human respondents. I am not
tabulating human responses to questions or stimuli; I am tabulating responses of an
electro-mechanical system collected during a formal test program. These risks are
therefore not in fact a material threat to this particular study.
(Campbell & Stanley 1963) repeat many of the same threat categories, but add “Reactivity – the observed effect may be due to a change induced by the process of measuring”. There may be elements of the measurements in this study where this is true, but there is no indication (or theoretical reason to infer) that this effect, even if it were present, would vary over time, i.e., would behave differently during period I than during period II or period III. I investigated this during the study, and it seems that this was not a material threat to this particular study.
In some other study contexts, there might be a class of risks due to contamination
over time that arise from a loss of fidelity in the data over time, e.g., aging of materials in
a manner that creates difficulty and ambiguity in reading the records, etc. As explained
in Appendix A, this is not a material threat to this particular study.
As can be seen from the above, the identification of those portions of my study plan that can be categorized as “quasi-experimental”, rather than “case studies”, has allowed me to take advantage of guidance offered in the methodological literature regarding the likely risks that must be considered in the design of specific quasi-experimental constructs (e.g., “a repeated-treatment design”, etc.). This is analogous to the way I was able to take advantage of methodological guidance for the case-study portion of this plan to guide the creation of a set of tests to ensure the quality of the case study.
In this study, the first four cases are drawn from a single problem domain – that of
tactical military command-and-control / military decision-support systems. This
selection is appropriate because: (a) I have significant familiarity with this problem
domain, and can therefore interpret the data with a reasonable degree of confidence and
accuracy [e.g., I have been that “participant observer” to which (Miller 1997) makes
reference]; (b) the domain is one that contributes to National security, and hence,
improving performance in this domain serves the public purpose of improving National
security; (c) the domain is one that involves the expenditure of significant amounts of
public funds – billions of dollars every year – and hence, improving performance in this
domain serves a significant public fiscal purpose; (d) the data available are adequate to
conduct the observational case study. The other cases are drawn from other application
problem domains, including radar, air traffic control, large-scale information storage and
retrieval, and logistics automation. This will indicate the potential of the technique to be
useful in other application problem domains, which increases the value of the study.
The flow of work through the study is summarized in Figure 3.1-2. Personal /
professional experience, informed by the literature, allows me to scope the problem
(chapter 1), and then formulate a hypothesis (chapter 2). Drawing upon methodology in
the literature, I then formulate measurement instruments [together with what (Yin 1994)
terms a case study protocol] suitable for performing the quantitative case study. Also, drawing on data from real-world projects (the first four cases), supplemented by
published data regarding similar projects (the remaining cases) and personal /
professional experience, the case documentation is collected and organized, resulting in
case files. The internal validity of the case files, i.e., the professionalism of the collection efforts and the configuration control of the resulting raw data, is then verified.
The measurement instruments are then applied to the cases, resulting in measurements
and statistics, which in turn allow for the formulation of an interpretation of what the
cases indicate about the hypothesis.
Although the core of the case study is quantitative, as noted by (Flyvbjerg 2006)
“Case studies often contain a substantial element of narrative. Good narratives typically
approach the complexities and contradictions of real life”. This case study will contain
both quantitative analysis and narrative.
[Figure 3.1‐2. Overview of the flow‐of‐work through the study. Inputs: the literature; personal / professional experience; data drawn from real-world development projects; and published materials summarizing development project results. Steps: formulate the topic and scope the problem; describe the proposed design-based technique; formulate hypothesis; formulate measurement instruments & case-study protocol; compile case files and validate integrity / configuration control; evaluate cases using the measurement instruments & case-study protocol; formulate interpretations.]
This flow-of-work addresses the four problems that (Yin 1994) recommends be
addressed in case-study design:
Yin’s “problem” | Corresponding feature of the flow‐of‐work
What questions to study | Formulation of study motivation and background; definition of the study hypothesis
What data are relevant | Selection of cases
What data to collect | Formulation of measurement instruments & case-study protocol; confirmation of data integrity / configuration control
How to analyze the results | Formulation of case-study protocol (guidelines to interpretation)
Table 3.1‐2. Addressing Yin’s four problems in case‐study design.
3.2 Selection and discussion of the cases
By the term “cases”, I mean selected real-world engineering projects whose data will be assessed in order to test the validity of the hypotheses of chapter 2. Over the course of their periods of performance, certain data were collected. These data are the principal sources that I draw upon in performing this study.
In selecting the cases to use, I had one mandatory criterion: I selected cases only
from those projects that exhibit the characteristics that match the scope description of
chapter 2.1, above. Once I had identified a set of such projects, I was then free to select
from among them those projects that had sufficient data available to support the purposes
of this study.
The first case (FBCB2) is a project on which the selected technique described for
centralizing control of the dynamic behavior of a system during the design process was
applied for several years, then its use was discontinued, and then its use was re-instated.
I served as the initial program manager for this project.
The next three cases are projects that were performed around the same time as the
FBCB2 case, by the same company, and in the same application problem domain area as
FBCB2. The selected technique described for centralizing control of the dynamic
behavior of a system during the design process was applied mid-way to one of these
projects (I was the chief engineer on this project), and was not applied on the other two.
Taken together with the FBCB2 case, this allows the compilation of case-study data from
two projects on which the technique was applied, and two on which it was not.
The application problem domain area for the first four cases is military command-
and-control. The use of information technology on the battlefield is simultaneously both
an old and a new field. Clearly, it has long been understood that knowledge on the
battlefield can be of significant military value. But the information technology
“revolution” of recent decades has offered unusually high leverage and unusually wide-
spread application of such knowledge. For example, the U.S. Army has estimated that
these “battlefield digitization” efforts have brought improvements on the order of a factor
of 2 to a factor of 5 in the combat power of a U.S. Army tactical unit²⁶.
These “battlefield digitization” efforts have been large-scale, complex, and
challenging engineering development programs, involving significant technical
challenges, and also have significant “social” challenges – organizational roles, tradition,
perceptions, safety, budgets (each of the programs described below was a billion-dollar-class investment), schedule, coordination, staffing, and so forth. Each involved some combination of the following: large-scale systems engineering efforts, requirements specifications with many thousands of requirements, many types of specialized custom-developed hardware, installations on thousands of combat vehicles, and on the order of 1,000,000 lines of custom-developed software.

²⁶ (TRADOC 1998) found that one of the systems to be used in this case study, FBCB2, provided an increase in combat power to a U.S. Army heavy combat brigade of about 2.5. (OPTEC 1994) found that FAAD C²I (another of the projects to be used in this case study) increased the combat power of a U.S. Army air defense battalion by a factor of more than 3; field exercise results at the National Training Center in March 1997 resulted in actual FAAD C²I measured performance being increased by a factor of more than 10 over the baseline (i.e., pre‐FAAD‐C²I) performance level. There is additional discussion of this matter in (Siegel 2002).
Six additional cases have also been selected; these extend the study into four
additional application problem domains.
The initial case is distinguished by the fact that individual records in the error-report logs of the project are examined for portions of the project life-time; through this examination, I have identified error reports that reflect inappropriate system dynamic behavior, and that are therefore directly attributable to the cause of interest to this study. For the next three cases, I use statistical summaries of the monthly error logs, but
perform no examination and characterization of individual error reports. For the
supplementary cases, I draw upon cost/schedule data, qualitative statements from
published literature, project records, and personal interviews with key practitioners.
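As an illustration of this scoring-and-tallying step, the following Python sketch shows one way the monthly counts of attributable category-1 error reports might be produced. The record fields and the sample records are invented stand-ins for the surviving error-report logs, which are not reproduced here.

# Illustrative sketch only (not the project's actual tooling): tally, by month,
# the category-1 error reports that were scored as attributable to unplanned
# dynamic behavior. Every record below is an invented placeholder.

from collections import Counter

error_reports = [
    {"opened": "1998-03", "category": 1, "score": "attributable"},
    {"opened": "1998-03", "category": 1, "score": "not_attributable"},
    {"opened": "1998-04", "category": 1, "score": "attributable"},
    {"opened": "1998-04", "category": 2, "score": "not_attributable"},
    {"opened": "1998-05", "category": 1, "score": "not_scored"},
]

monthly_attributable = Counter(
    report["opened"]
    for report in error_reports
    if report["category"] == 1 and report["score"] == "attributable"
)

for month in sorted(monthly_attributable):
    print(month, monthly_attributable[month])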
3.2.1 Initial case
The initial case is the Force XXI Battle Command Brigade-and-Below system
(also sometimes called the “Blue-Force Tracking Systems”; hereafter, either FBCB2 or
FBCB2 / BFT). This system entered operational service in 1998, and is still in use.
FBCB2 brings real-time information about both friendly and enemy forces to every
individual Army combat platform – tanks, armored personnel carriers, helicopters,
artillery, reconnaissance, and so forth (and starting in 2003, also for the U.S. Marines). It
provides geographically-based information (e.g., your own position, the position of
nearby friendly and enemy forces), information of the sort that the military terms ‘control
measures’ (e.g., unit boundaries, lines of march, etc.), ‘areas of interest’ (e.g., mine fields,
contaminated areas, etc.), and ‘geographically-based status’ (e.g., bridge damage reports,
etc.). It provides unit-based information (e.g., orders, status reports, supply summaries
and requests, etc.). It provides network-based information (e.g., who you can talk to,
when was the last time you heard from a particular unit, technical status of
communications and networking devices, etc.). By providing near-simultaneous secure
and credible information to every member of the combat team, it has permitted actions
never before possible: rapid and reliable night river-crossings, continued high-speed
maneuver in sand-storms, and so forth, and achieved the improvement in combat power
cited above. This system is the mainstay of the U.S. Army’s efforts to “digitize the
battlefield”. Additional background information about this system is provided in
Appendix C.
The project used the technique described in chapter 2 during an initial portion of
the project (termed “project period I”, about 8 years), did not use the technique during a
second portion of the project (project period II, about 3 years), and resumed using the
technique during a third portion of the project (project period III, about 3 years). The
case study will compare measurement instrument findings across the 3 project periods.
3.2.2 Remaining cases
Nine additional cases will be used. The first of these is the U.S. Army’s Forward-Area Air Defense Command-Control-and-Intelligence system (usually, and hereafter, abbreviated “FAAD C²I”). FAAD C²I entered operational military service in 1993, and
is still in use. It has total responsibility for protecting U.S. military land forces (and
civilian personnel and structures in the areas near U.S. military land forces) against
threats that travel through the air, engaging these threats at ranges up to about 6
kilometers. Such threats include enemy fighter aircraft, enemy helicopters, short-range
rockets, artillery shells, and mortar shells. This system, therefore, protects against threats
that might be used by either nation-state or non-nation-state adversaries. Additional
information about this system is provided in Appendix B.
Originally designed to protect U.S. forces in Europe against Soviet high-
performance fighters and attack helicopters, this system has been adapted to new
missions as the military and political threat faced by our country has evolved over the 17-
year period since this system entered operational service. It is at present used every day
in Iraq and Afghanistan to defend against rockets, artillery, and mortar attacks, and is
credited with very significant savings in lives and property. FAAD C²I started without the design-based technique (“period I”), adopted it, and has retained its use to the present day (“period II”).
The next two cases are called herein AAAA and BBBB. They also perform
military command-and-control missions; the specific nature of the missions of these
programs is classified, and cannot be discussed in this context. Note, however, that the
necessary project metric data are unclassified, and available for use by this study.
Neither program used the design-based technique.
Six additional cases are drawn from other projects described in the literature.
These provide data from other application problem domains. These cases are projects that
did use the design-based technique (although, not all at their outset); they are described in
chapter 4.1, below.
Sufficient time to permit reflection and analysis has passed since the conduct of these projects. The actual outcomes of the projects can be assessed, not just in terms of their cost, schedule, and management parameters, but in the sense of understanding the actual efficacy and suitability of these systems – all have now been in operational use for many years.
3.2.3 Summary of the cases
All of the projects selected as cases display the characteristics described in chapter 2.1 as the bounds of systems-of-interest for this study, including: (a) complex emergent behavior; (b) interactions with physical devices (e.g., moving missile turrets, other time-sensitive mechanical devices, and so forth); (c) stressing asynchronous stimuli (such as extraordinarily high data-ingest rates [e.g., in systems AAAA and BBBB] or highly-stressed communications structures [e.g., in FBCB2 and FAAD C²I]); (d) extraordinarily high reliability requirements; (e) development efforts of large size; and (f) a need to display early progress through prototyping. This is depicted in Table 3.2-1, below.
Case / project name | Complex emergent behavior | Interactions with physical devices | Stressing asynchronous stimuli | High reliability requirements | Large development effort | Need to display early progress
FBCB2 | X | X | X | X | X | X
FAAD C²I | X | X | X | X | X | X
AAAA | X | X | X | X | X | X
BBBB | X | X | X | X | X | X
CCPDS‐R | X | X | X | X | X | X
AFATDS | X | X | X | X | X | X
THAAD radar | X | X | X | X | X | X
Logistics automation project (CSSCS) | X | some | X | X | X | X
Large‐scale information processing (CCCC) | X | X | X | X | X | X
Air‐traffic control system (Eurocontrol) | X | X | X | X | X | X
Table 3.2‐1. Cases versus system characteristics.
Table 3.2-2 maps the cases against the independent and dependent variables.
Case / project name | Ind. variable | Density of attributable defects | System reliability | Variance of critical port‐to‐port timing threads | Scalar metric: cost / schedule performance + customer assessments | Qualitative assessments
FBCB2 | 1 (yny) / 2 (y) | 1 | 1 | – | 2 | –
FAAD C²I | 1 (ni) / 2 (y) | 1 | 1 | 1 | 2 | –
AAAA | 1 (n) | 1 | – | – | – | –
BBBB | 1 (n) | 1 | – | – | – | –
CCPDS‐R | 1 (y) / 2 (y) | – | – | – | 2 | 1
AFATDS | 1 (ni) / 2 (ni) | – | – | – | 2 | 1
THAAD radar | 1 (y) / 2 (y) | – | – | – | 2 | 1
Logistics automation project (CSSCS) | 1 (ni) / 2 (y) | – | – | – | 2 | 1
CCCC | 1 (y) | – | – | – | – | 1
Eurocontrol | 1 (y) | – | – | – | – | 1
Table 3.2‐2. Cases versus independent and dependent variables.
[ Key: 1 – use / non‐use of design‐based technique. 2 – use / non‐use of critical‐skills management. (y) – did use; (n) – did not use; (ni) – did not use initially, but later adopted its use; (yny) – used it, discontinued its use, re‐adopted it ]

The first portion of the research design allows investigation of the efficacy of the design-based technique, using fine-grained metrics that relate directly to the literature-based sources of theory (see answer 2, above). As is explained in the dissertation, all of these portions of the investigation were done in the context of an organization that had instituted a strong focus on acquiring and deploying critical skills, including in exactly the areas relevant to the design-based technique.
In Figure 3.2-1, below, I show the case and dependent-variable coverage for the first independent variable in the study (use / non-use of the design-based technique), separately for each dependent variable.
For the first dependent variable (defect density), I have four cases, one of which
(FBCB2) divides into 3 separately-measurable periods, resulting in 6 measurable periods.
Three of these measurable periods were instances where the design-based technique was
used, and three were instances where the design-based technique was not used. I use
arrows in the diagrams to show the sequencing of the periods of FBCB2.
For the second dependent variable (reliability), I have two cases, one of which (FBCB2) divides into 3 separately-measurable periods, and the other of which (FAAD C²I) divides into 2 separately-measurable periods, resulting in 5 measurable periods.
[Figure 3.2‐1. Dependent variable coverage for the first independent variable, all in the context of an organization that focused on acquiring and deploying critical skills. DV 1 (defect density): use of the design‐based technique in FBCB2 PI, FBCB2 PIII, and FAAD C²I; non‐use in FBCB2 PII, AAAA, and BBBB. DV 2 (reliability): use in FBCB2 PI, FBCB2 PIII, and FAAD C²I; non‐use in FBCB2 PII and FAAD C²I. DV 3 (variance in port‐to‐port timing): use in FAAD C²I; non‐use in FAAD C²I.]
Three of these measurable periods were instances where the design-based technique was
used, and two were instances where the design-based technique was not used. I again use
arrows in the diagrams to show the sequencing of the periods within each single case.
For the third dependent variable (variance in port-to-port timing), I have one case,
which divides into 2 separately-measurable periods. One of these measurable periods
was an instance where the design-based technique was used, and the other was an
instance where the design-based technique was not used. I again use an arrow in the
diagrams to show the sequencing of the periods.
Therefore, for DV1 (defect density), I had one case that involved a sequence of
transitions characterized as use / non-use / use (FBCB2). One could characterize this
transition from use to non-use and back to use as an application of experimental controls
so as to create a full treatment cycle, although (as explained in the answer to question 12)
it was the Army project office (and not the experimenter) who made the decision to
transition from use to non-use, and the prime contractor (and not the experimenter) who
made the decision to transition back to use. As discussed in the dissertation, these
transitions took place as a sequence in time, rather than concurrent control and treatment
groups. The combination of transitions taking place outside of the control of the
experimenter (although well-documented), and the lack of randomness in the assignment
are features that justify the characterization in the dissertation of this work as a quasi-
experiment.
Similarly, for DV2 (reliability), I had one case that involved a transition from non-use to use (FAAD C²I) (half of a treatment cycle), and one case that involved a transition from use to non-use, and then back to use (FBCB2) (a full treatment cycle). For DV3 (variance of port-to-port timing), I had one case of a transition from non-use to use (FAAD C²I) (half of a treatment cycle). There is no use-to-non-use transition for DV3; this might be viewed as a minor limitation to the study.
I believe that the above provides a sufficient experimental control element to
isolate the effect of the treatment.
There is no stand-alone instance of a use-to-non-use half-treatment cycle in the
study, although there is a use-to-non-use transition embedded in each of the two full
treatment cycles cited above. Because of this, I do not view this as a limitation.
[Figure 3.2‐2. Dependent variable coverage for the second independent variable: a 2x2 matrix of application of critical skills (yes / no) against use / non‐use of the design‐based technique, with DV 4 measured in every quadrant. With critical skills applied: 6 cases / periods using the design‐based technique, and 3 cases / periods not using it. Without critical skills applied: 1 case / period using the design‐based technique, and 1 case / period not using it. Various GAO reports (2003‐2011) document additional system‐development failures which they attribute to lack of critical skills in the project work‐force (both contractor and Government).]

In Figure 3.2-2, above, I show the case and dependent-variable coverage for the second independent variable in the study (application / non-application of critical skills). I structured this portion of the study as a 2x2 matrix, i.e., use / non-use of the design-based technique, crossed with application / non-application of critical skills, resulting in
four quadrants of quasi-experimental space. The dependent variable is a scalar metric of
project performance; this metric has the advantage that it utilizes metrics that are
important to the practice community (e.g., cost and schedule, award-fee scores), although
these are metrics that relate in a derived fashion to the theory underlying the operation of
the treatment (while those metrics used in conjunction with the previous figure directly
relate).
I use six cases in this portion of the study, of which one (FBCB2) again divides into 3 measurable periods, and of which three (FAAD C²I, AFATDS, and CSSCS) each divide into 2 measurable periods, resulting in a total of eleven measurable periods. As shown in the figure, I have at least one case / period combination in each quadrant of this matrix, and measure the relevant dependent variable (DV4, the scalar metric of project performance) in each measurement period, and in each quadrant. I have only a single measured instance in each of the two lower quadrants, which could be considered a limitation of the study.
For DV4, I had one case of transitioning from non-use to use of the application of critical skills (on the AFATDS project) (half of a treatment cycle), and various use / non-use patterns for the design-based technique. Table 3.2-3, below, shows the conditions covered by the data, grouped according to control / treatment nomenclature:
 | Control (neither skills nor DBT) | ½ treatment (either skills or DBT, but not both) | Full treatment (both skills and DBT)
Dependent variable 4 (scalar metric of project performance) | 1 sample | 3 samples (skills but no DBT); 1 sample (DBT but no skills) | 6 samples
Table 3.2‐3. Control & Treatment for Dependent Variable 4.
The results include a complicated set of transitions amongst the four quadrants.
Although not every possible transition around the 2x2 matrix takes place (the ones
represented generally move from the lower-right to the lower-left to the upper-left), I
believe that the transitions available provide a sufficient experimental control element to
isolate the effect of the treatment. The transitions that were not present in the study were
those of discontinuing the emphasis on critical skills. It is hard to imagine such an action
on a real project. Data about such transitions could be collected in an academic setting,
but would be limited to far smaller scale than that examined here. The opportunity to
examine these phenomena at realistic scale is a major benefit of this particular study.
My expected results were that project performance (as measured by the scalar metric) would be worst for the control cases (i.e., application of neither the DBT nor emphasis on critical skills), and best when both halves of the treatment were used. I entered the study with the expectation that application of the DBT alone would be more effective than the application of critical skills alone, i.e.:
neither < critical skills only < DBT only < both
To me, the most striking portion of the result is the large gap between DBT only
and both – the most material improvement takes place only after both halves of the
treatment are applied.
Density of defects attributable to the cause of interest: For the cases used to
assess density of attributable errors, I have error-report counts by date opened and
category; earned-value management system data concerning cost, schedule, and risk;
project sizing (in terms of software lines-of-source-code, and other metrics, e.g., parts
count for hardware elements); project status reports; project technical documentation; and
for some of the projects, access to the actual project error-report logs. The latter item is of
cardinal value; I can actually examine all of the category-1 error reports for a given time
period, and score them as being caused by an error in the way the system controls
dynamic behavior or not (e.g., due to some other cause). This allows me to create a direct
representation and analyses of defect density for those defects attributable to the causes
of interest (e.g., hypothesis a) within and between periods I through III. I can also create
a variety of statistical descriptions (e.g., what portion of category-1 errors are, on
average, attributable to the causes of interest, and does that vary within and between the
periods where the project used / did-not-use the design-based technique, etc.).
System reliability: For the cases used to assess system reliability (FBCB2 and FAAD C²I), I have access to reliability testing data and reliability modeling data. These allow me to look at reliability between periods 1, 2, and 3 of FBCB2, and between periods 1 and 2 of FAAD C²I (hypothesis b).
Variance of critical port-to-port timing threads: Both FAAD C²I and FBCB2 built detailed discrete-event models of their dynamic operation, and then calibrated those models via comparison to benchmark measurements made on the actual system under operation. These modeling and benchmarking results, including port-to-port timing measurements / predictions, are contained in project technical reports, and are on occasion summarized in monthly project status reports and monthly status briefings. I will use such data from the FAAD C²I project to assess hypothesis c.
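To make the hypothesis-c computation concrete, the following Python sketch computes the mean and variance of a single critical port-to-port timing thread from repeated measurements, the quantity that is compared across periods. The timing values are invented placeholders, not project data.

# Sketch under assumed data: repeated measurements of one critical
# port-to-port thread (stimulus arrival to actuator command), in milliseconds.
# The values below are invented placeholders.

from statistics import mean, variance

period_1_ms = [212.0, 305.5, 198.7, 450.2, 233.9, 287.4]  # before the technique
period_2_ms = [205.1, 210.3, 198.9, 215.6, 207.2, 203.8]  # after the technique

for label, samples in [("period I", period_1_ms), ("period II", period_2_ms)]:
    print(f"{label}: mean = {mean(samples):.1f} ms, "
          f"variance = {variance(samples):.1f} ms^2")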
Critical skills: Six of the cases are projects that attempted similar technical
interventions, but differed in their approach to managing critical skills within the team.
This will allow me to assess hypothesis d. For the cases used to assess the application of
critical skills to a project, I have descriptions of project outcomes and issues experienced
along the way. These include qualitative data (such as interviews with participants), and a
mix of quantitative and qualitative data drawn from published sources, including project
records (e.g., cost and schedule data). For example, Walker Royce, the former chief software engineer on the Cheyenne Mountain Complex Processing and Display System – Replacement (CCPDS-R) project, has written a book (Royce 1998) that uses this project as its central case study, and has a specific appendix dedicated to project metrics and lessons-learned (its Appendix D).
draw data from this same project, as well. As noted above, most of these cases are drawn
from application problem domains (radar, air traffic control, large-scale information
storage and retrieval, and logistics automation) other than the one represented by the first
four cases (military command-and-control).
3.3 Preparatory Steps and Study Methodology
The following are the specific steps that will be used for this study:
a. Scope the problem – explain and analyze the problem that this study is trying to
address:
The motivation for the study is the overly-frequent and persisting failures of
large-scale development projects for software-intensive systems, where “failure”
signifies some combination of requiring significantly more money to complete than
originally planned, requiring significantly more time to complete than originally
planned (a schedule over-run usually implies that there is a corresponding cost over-
run, but it is perfectly commonplace for programs to have cost over-runs independent
of schedule problems, or out of proportion to schedule problems), under-delivery of
specified functionality, lack of suitability of the delivered system for the actual
intended use (even if it meets the specification), or cancellation of the development
project before a useful product has been delivered (usually, due to some combination
of the above factors). For example, (Glass 2001) cites data from 1995 (around the
time of the cases used in this study) indicating that only about 16% of system
development projects that he examined were listed as successful by their own
developers.
The study will look at multiple cases within a single, specific application
problem domain – tactical military command-and-control / military decision-support
systems – and then supplement those in-depth analyses with analyses of projects in other application problem domains.
A hypothesis regarding one way materially to improve these development
project outcomes has been formulated (chapter 2). The scope of the study is not to
find a “comprehensive” solution for such a large problem, but to postulate one
approach that could bring a material level of improvement; in particular, to assess the
hypothesis that the use of a specific design-based technique that involves centralizing
the control of the dynamic behavior of a system will produce more consistent and
better development project outcomes. This is based on the assessments cited that one
source of technical problems is “unplanned dynamic behavior” of the system, leading
to severe short-comings in capacity, performance, and reliability; in essence, this
identifies one particular way in which a design for a system may be inadequate.
Without making use of the candidate design-based technique, such design weakness
may not become apparent until fairly late in the design-and-implementation cycle,
where data [e.g., (Shull 2002), (INCOSE 2007), etc. ] indicate that the cost and
schedule impact of correction is far higher, and having such major technical problems
late in a project can lead to project cancellation [ per (Cureton 2010); (Rechtin 1991)
says that when such conceptual errors are discovered late in a program “recovery may
be impossible and the system abruptly terminated” ]. The validity of this hypothesis, and its implications, will be assessed through the case-study methodology.
b. Formulate hypotheses:
In chapter 2, four specific hypotheses were formulated. Also in chapter 2, the
testability of these hypotheses was addressed by developing the chain of reasoning
that links hypothesis to directly-measurable data items.
c. Describe the proposed design-based technique:
In chapter 2, the proposed design-based technique (e.g., use engineering /
design techniques to partition the work into separate “bins” of skill-level, so as to
isolate much of the most difficult work into a small portion of the over-all project
work, and thereby centralize control of the system’s dynamic behavior) was
described. The specific “skill bins” used in the study are described and explained in
Appendix D.
d. Design the approach to the case study, and compile case data:
As described above, the study will take the form of an observational case
study, with certain elements having the nature of a quasi-experiment.
A key step is the selection of the cases. To be a candidate for a case, the
project must first of all display the characteristics described in chapter 2. I also have
elected to choose the first four cases all from a single application problem domain
(tactical command-and-control), in order to decrease the variation across these cases.
To select within the projects available in this domain, I used the criteria of (Yin 1994)
and (Flyvbjerg 2006): I selected cases for “maximum variation”, that is “three or four
cases that are very different in one dimension”, in this case, the use or non-use of the
particular design-based technique described in the hypothesis. I have therefore
selected two cases that used the technique, and two that did not. This follows the
guidance of (Yin 1994), who recommends that in multiple-case studies, “each case
must be carefully selected so that it either (a) predicts similar results, or (b) produces
contrasting results but for predictable reasons”. In addition, I have selected additional
supplementary cases from other application problem domains, so as to allow an
assessment of the efficacy of the technique in multiple problem domains. The cases
were described in section 3.2, above.
Another key step is to identify those portions of the study that exhibit traits of
a quasi-experiment, and structure those portions so as to take advantage of
methodological guidance (e.g., the literature describes particular types of risks for
particular forms of quasi-experiments). This was done in chapter 3.1, above.
Steps to be followed in order to compile the case data include: obtaining
written permission from the owner of the data to use it for the purposes of this study;
obtaining the raw data, mostly in electronic form; organizing the data into the cases;
creating indexes of contents; and performing a preliminary data analysis on the initial
case (FBCB2), so as to ensure that the data were complete enough for the purposes of
this study.
The case files consist of large-scale records surviving from four real-world
projects (the first four projects listed in Table 3.2-1), and smaller sets of records from
six additional projects. The type of data available include project plans (e.g.,
management plans, software development plans, integration plans, test plans,
configuration management plans, etc.), requirements documentation (e.g.,
specifications, interface control documents, etc.), technical documentation (e.g.,
design documents, design analysis, technical reports, etc.), and status documentation
(e.g., test results including error-report counts by date opened and category; earned-
value management system data concerning cost, schedule, and risk; metrics reports,
including project sizing in terms of software lines-of-source-code, and other metrics,
e.g., parts count for hardware elements; project status reports and briefings, etc.).
e. Formulate measurement instruments & case study protocol:
Measurement instruments and case study protocols have been established for
each of the four hypotheses.
Hypothesis a is “During the development phase of a large-scale, complex
computer-based system, the use of a design-based technique that centralizes the
control of the dynamic behavior of a system will lower the density of those defects
that are attributable to unplanned adverse dynamic system behavior”. This hypothesis
has been assessed against ten cases.
The project examined for hypothesis a proceeded in multiple increments of
delivered capability, e.g., a cycle that included analysis, design, implementation,
integration, test, and so forth [along the lines of Boehm’s spiral model, described
eloquently in (Brooks 2010), although in this project it included risk-based
stakeholder review points between spirals, making it similar to what (Boehm 2010)
calls the “incremental commitment spiral model”]. This offers repeated phases
during which comparable data can be collected and analyzed, e.g., at the time each
increment enters formal test. Note that at any given period of time, multiple such
increments of capability are likely to be in various stages of work simultaneously.
For example, one or more incremental versions may be in operational use. One
version may be undergoing formal test. Another version may be in development, and
another version may be in the specification phase. This simultaneity is a factor that
must be considered in analyzing errors in real projects; it may, for example,
contribute to additional errors and inconsistencies.
For the FBCB2 case, actual project error report logs from the selected time-
periods (corresponding to the selected project phase) were examined; individual error
reports were assessed, and scored as either having a root cause attributable to a
problem with product dynamic behavior, having a root cause not attributable to a
problem with product dynamic behavior, or having a root cause that cannot be
assessed as one or the other. The counts of attributable errors were turned into a
density (e.g., defects / size of implementation, where this size is a composite scalar
representation that combines software and hardware sizing metrics).
A “voice of the process” analysis [ e.g., control charts; see (Wheeler 2000) ]
was performed on the density data within each project period. This allowed a
determination of whether or not the data are consistent enough (“predictable” is
Wheeler’s nomenclature) to be used to draw a conclusion regarding “typical”
behavior of this metric in this project period.
Since a predictable voice of the process was in fact found for all three periods,
a control chart analysis could then be made across the three periods.
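For readers unfamiliar with Wheeler’s individuals (“XmR”) control charts, the following Python sketch shows the style of computation involved: monthly attributable-defect counts are converted to densities, natural process limits are computed from the average moving range, and a basic predictability check is applied. The counts and the size metric are invented placeholders, and the check shown implements only the simplest of Wheeler’s detection rules.

# Illustrative "individuals and moving range" (XmR) computation, in the spirit
# of Wheeler's voice-of-the-process analysis. All numbers are invented.

monthly_defects = [14, 11, 16, 12, 15, 13, 10, 14, 12]  # attributable, per month
size_metric = 950.0  # composite scalar size (e.g., a KSLOC-equivalent)

densities = [count / size_metric for count in monthly_defects]

# Natural process limits for an individuals chart:
# mean +/- 2.66 * (average moving range); 2.66 is the standard XmR factor.
moving_ranges = [abs(a - b) for a, b in zip(densities, densities[1:])]
center = sum(densities) / len(densities)
average_mr = sum(moving_ranges) / len(moving_ranges)
upper_limit = center + 2.66 * average_mr
lower_limit = max(0.0, center - 2.66 * average_mr)

# Simplest detection rule: any point outside the limits signals an
# unpredictable (assignable-cause) process.
predictable = all(lower_limit <= d <= upper_limit for d in densities)
print(f"center = {center:.4f}, limits = ({lower_limit:.4f}, {upper_limit:.4f}), "
      f"predictable = {predictable}")

A period whose densities all fall within these limits is, in Wheeler’s sense, “predictable”, which is the precondition for the cross-period comparison described above.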
Other factors that may have changed across project periods (in addition to the
use / non-use of the design-based technique) were identified from project
documentation (e.g., software development plan, management plan, etc.), and from
interviews with project personnel; these were used to establish a set of plausible rival
hypotheses, which were then tested. Multi-cause outcomes were at times selected
from this analysis, e.g., some portion of the effect was attributed to the rival
hypotheses, and the remainder attributed to the study hypothesis.
In such cases, the control charts were adjusted to account for the above multi-
source outcome, and thereby provided the measurement baseline that allowed the
testing as to whether or not the variation between project period I, project period II,
and project period III is due to the cause implied in the study hypothesis.
The findings regarding the directly-attributable errors were used to develop a
statistical “signature” that distinguishes errors due to the cause of interest to the study
from other sources of error.
As can be seen in the above, for hypothesis a the independent variable is the
use or non-use of a method of centralizing responsibility for managing and
controlling the dynamic behavior within the system under development, which I refer
to as the “design-based technique”. The key dependent variable is the density of
errors that are directly attributable to unplanned dynamic behavior. These form the
measurable parameters that will indicate whether the hypothesis is sound (e.g.,
justified by the measurement findings). The defect-reporting methodology employed
categorizes all reported errors into 4 bins²⁷ (termed category-1 through category-4),
with category-1 being the most severe, and on average, these are ones that take the
most time and effort to correct. Instances of unplanned dynamic behavior that were
exhibited during testing would be documented as problem reports, and therefore are
captured in the data. The expectation is that, correcting for other factors, if the
hypothesis is valid, there will be fewer such serious errors at a given stage of a project
that applies the technique than one that does not.
Count, density, and severity of deficiencies at a given project stage are widely
recognized by the customers for systems of the selected problem class as indicators of
emerging quality of the system; these are precisely the metrics that such customers
use to make decisions about the efficacy of the design, and even, whether or not to
continue development to completion. The use of these metrics is likely to increase
the credibility and understandability of the study findings.
A data collection and calculation format (measurement instrument) was
constructed that implemented the above.
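The actual measurement instrument was a document / spreadsheet format; purely to suggest the kind of record it captures, the following Python sketch defines a hypothetical monthly measurement record and the density calculation applied to it. All field names and values are invented.

# Hypothetical sketch of the kind of record such a data collection and
# calculation format might capture; the real instrument was a document /
# spreadsheet, and every field name and value here is invented.

from dataclasses import dataclass

@dataclass
class MonthlyMeasurement:
    project: str          # e.g., "FBCB2"
    period: str           # "I", "II", or "III"
    month: str            # e.g., "1998-03"
    category1_total: int  # all category-1 error reports opened that month
    attributable: int     # of those, scored as due to unplanned dynamic behavior
    size_metric: float    # composite scalar size of the implementation

    @property
    def attributable_density(self) -> float:
        """Defects attributable to the cause of interest, per unit of size."""
        return self.attributable / self.size_metric

record = MonthlyMeasurement("FBCB2", "I", "1998-03", 40, 14, 950.0)
print(f"{record.project} {record.month}: "
      f"density = {record.attributable_density:.4f}")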
A procedural description of the data handling, data processing, and analysis
process has been created. This is termed the case study protocol, which defines the
process to be followed from raw source data, all the way through to the creation of
output data (which is the input to the “formulate interpretations” step). Figure 3.3-1
provides a graphic depiction of the case study protocol for hypothesis a.
²⁷ On some of the projects, 5 bins were used.
[Figure 3.3‐1. Graphic depiction of the case study protocol for hypothesis a: select an event that will guide time-period selection within each project period; for each period (I-III), select a time-window that corresponds to the selected event; go through project records, and identify all category-1 error reports that were opened during those time-periods; read and score those error reports as (a) attributable, (b) not attributable, or (c) not scored; create counts by calendar month for the attributable errors; create a control chart for each period; check for signals of unpredictability in each control chart; if all six measurement periods are “predictable”, create a 6-period chart; assess whether the differences between periods match those implied by the hypothesis, and if so, whether the differences are statistically significant.]

My first three dependent variables (defect density, reliability, and variance in port-to-port timing) are all measured during formal test cycles on the projects that form the cases. A test cycle [as described in literature sources such as (PMI 2004)] consists of a structured series of tests at increasing levels of integration, moving up from testing the work of individual developers, to testing small assemblies of software-only and small assemblies of hardware-only, to sub-systems of integrated hardware and software, eventually reaching tests of the complete system, usually phased to consist of first a laboratory portion and then a field portion. The criterion for selecting an event to guide time-period selection within each project period was a range of such tests sufficient to gather formal, documented test results about the three
dependent variable(s) that were selected for collection by that case. To illustrate with
specific instances:
I wanted to collect defect density and reliability data from FBCB2; therefore,
my selection criterion when looking at FBCB2 was “test events that collected
defect density and reliability data”.
I wanted to collect defect density, reliability data, and data regarding variance
in port-to-port timing from FAAD C2I; therefore my selection criterion when
looking at FAAD C2I was “test events that collected defect density, reliability
data, and data regarding variance in port-to-port timing”.
Once I had established criteria for selecting an event, I examined project
records to determine when an event that matched those criteria took place. I used
documents such as monthly project status reports, project schedules, and project test
plans as source documents. Since, as is clear from the above description of the
selection criteria for the event, the “event” is actually a sequence of tests, the
corresponding window is several months long. The actual windows selected turned
out to be from 9 to 12 months in duration; the consistency of the length of these
windows was desirable, as it increases the comparability of the data from the different
events. At the same time, since I use data from each month within the window,
having windows at least 6 months in duration provided enough data points for time-
series and control-chart analysis, ensuring that I was not inadvertently picking
unrepresentative data from the record.
For hypothesis a, I have access to error-report counts by date opened and
category; earned-value management system data concerning cost, schedule, and risk;
project sizing (in terms of software lines-of-source-code, and other metrics, e.g., parts
count for hardware elements); project status reports; project technical documentation;
and most particularly, access to the actual project error-report logs. The latter item is
of cardinal value; I can actually examine all of the category-1 error reports for a given
time period, and score them as being caused by an error in the way the system
controls dynamic behavior or not (i.e., due to some other cause). This allows me to
create direct representations and analyses of defect density for those defects
attributable to the causes of interest (i.e., the causes named in hypothesis a) within
and between periods I through III. I can also create a variety of statistical
descriptions (e.g., what portion of
category-1 errors are, on average, attributable to the causes of interest, and does that
vary within and between the periods where the project used / did-not-use the design-
based technique, etc.).
The following set of indented paragraphs describes the case study protocol for
hypothesis a in some detail, so as to illustrate exactly how the surviving project
data are used in the study plan; project documents that are used are indicated in
italics:
I started the analysis of hypothesis a by going through
monthly project status reports and other project documentation
(e.g., project management plan, project test plans, project
software development plan, project integration plan, etc.), and
used those data to divide the project into periods I, II, and III
(respectively, use, non-use, and return to use of the design-
based technique).
I then went through the monthly project status reports,
and identified a contiguous time-frame of 9-12 months within
each of the three periods in which one round of formal
acceptance testing (and all of the antecedent activities) were
performed. These became the exact time-frames that were
analyzed from each of the three periods.
I then went through an Excel spreadsheet provided by
the project librarian that lists the following fields for every
problem report: identifying sequence number, severity code (i.e., category 1
through 5), segment of project (terms like “core”, “JCRV” for the vehicle
software, “TSG” for the tactical gateway, etc.), state of the problem report
(open, closed, rejected, withdrawn, etc.), date opened, date fixed, and system
version number into which the fix is incorporated. There are,
for example, an average of 2,000 to 3,000 problem reports per
year for the FBCB2 case. I used this spreadsheet to identify all
category-1 problem reports opened in each month, during each
of the three time-frames that will be analyzed for the FBCB2
case.
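To make this selection step concrete, the following is a minimal sketch, in
Python, of the kind of filtering performed; the file name and the column names
(“severity”, “date_opened”) are hypothetical stand-ins for the actual spreadsheet
fields described above:
    # Sketch: select category-1 problem reports opened within one analysis
    # window, and total them by calendar month. File and column names are
    # hypothetical stand-ins for the fields described in the text.
    import pandas as pd

    reports = pd.read_excel("problem_report_log.xlsx")
    reports["date_opened"] = pd.to_datetime(reports["date_opened"])

    window_start, window_end = "1998-02-01", "1998-10-31"  # a period I window
    cat1 = reports[
        (reports["severity"] == 1)
        & (reports["date_opened"] >= window_start)
        & (reports["date_opened"] <= window_end)
    ]

    # Count the category-1 reports opened in each calendar month.
    monthly_counts = cat1.groupby(cat1["date_opened"].dt.to_period("M")).size()
    print(monthly_counts)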
For each of these category-1 problem reports, I used the
identifying sequence number field to look up the actual text of
the problem report in the on-line project problem report data
base. Using the criteria defined above, I created a
determination for each as to whether or not this problem is
attributable to an error in the control of system dynamic
behavior. I recorded this determination in a spreadsheet that
also includes the identifying sequence number and a short
description of the problem and its impact.
I then grouped the problem reports determined to be so
attributable into sums by calendar month, and recorded these
sums in a spreadsheet.
I then went through the project metric reports and/or
the Office of Cost Estimation project archives for the time-
frames that correspond to my measurement periods during
periods I, II, and III, and determined the number of effective
source lines of software code & other project sizing metrics
(e.g., hardware sizing) in the product baseline for each period.
This allowed me to convert the defect counts by month
tabulated above into defect densities. These, too, were
recorded in a spreadsheet.
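The conversion itself is a simple normalization; a minimal sketch, using invented
counts and an assumed baseline size (the study’s actual values come from the project
metric reports described above):
    # Sketch: convert monthly attributable-defect counts into densities,
    # expressed as defects per 1,000,000 effective source lines of code.
    # Both the counts and the baseline size are illustrative values only.
    monthly_counts = [3, 2, 4, 3, 2, 3, 4, 3, 2]   # attributable defects/month
    esloc_baseline = 1_500_000                     # effective SLOC (assumed)

    densities = [n * 1_000_000 / esloc_baseline for n in monthly_counts]
    print([round(d, 2) for d in densities])        # defects/month per 1M SLOC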
I then placed those calculated densities into a
spreadsheet that calculated time-series and control-chart
statistics, e.g., upper natural process limit, lower natural
process limit, moving range, and so forth, and applied the
Wheeler-Kazeef tests to determine if there is a valid “voice of
the process” within each period. The accuracy of this
spreadsheet was calibrated against example data sets with
known outputs that were provided by Professor Kazeef.
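A minimal sketch of the individuals-and-moving-range (“XmR”) statistics such a
spreadsheet computes, using the standard Shewhart / Wheeler scaling constants; the
input series is an illustrative placeholder, not project data:
    # Sketch: XmR (individuals & moving range) chart statistics, using the
    # standard scaling constants (d2 = 1.128 and D4 = 3.268 for subgroups
    # of size 2). The input densities are illustrative placeholders.
    densities = [3.1, 2.4, 3.8, 2.9, 2.2, 3.3, 4.0, 2.7, 3.0]

    mean = sum(densities) / len(densities)
    moving_ranges = [abs(b - a) for a, b in zip(densities, densities[1:])]
    mr_bar = sum(moving_ranges) / len(moving_ranges)

    sigma = mr_bar / 1.128            # estimated sigma of the individuals
    unpl = mean + 3 * sigma           # upper natural process limit
    lnpl = mean - 3 * sigma           # lower natural process limit
    mr_limit = 3.268 * mr_bar         # upper limit for the moving range

    print(f"mean={mean:.2f}  UNPL={unpl:.2f}  LNPL={lnpl:.2f}  "
          f"MR limit={mr_limit:.2f}")
A negative lower natural process limit is conventionally treated as zero when the
measured quantity, such as a defect density, cannot be negative.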
Since the data displayed a consistent “voice of the
process” within each period, I then plotted the control charts
from all 3 periods onto one chart.
I linked those plots into a PowerPoint file, where
annotations and explanations could be added.
As can be seen from the above, all of the data needed from surviving project
records to perform the planned analysis of hypothesis a are available.
As noted above, one of the steps performed for the FBCB2 case is the
reviewing of the error reports in the project records, and for the selected time-frames
within each of periods I (initial use of the design-based technique), II (non-use of the
design-based technique), and III (return to use of the technique), to assess every error
report in the highest severity category as either having a root cause attributable to a
problem with product dynamic behavior, having a root cause not attributable to a
problem with product dynamic behavior, or having a root cause that cannot be
assessed as one or the other. The following are the criteria used to perform this
assessment: Problems in controlling dynamic behavior are likely to manifest
themselves via highly off-nominal performance, e.g., crashes or otherwise un-
commanded cessation of processing, radical slowing down in processing rate, and so
forth. They are also likely to display inconsistent behavior patterns, rather than
showing the same behavior (even if wrong) in response to the same repeated stimuli.
In assessing error reports, those that displayed these characteristics were scored as
arising from problems in controlling dynamic behavior. In some instances, I had
access to the description of the correction that was eventually applied, which also
provided input to the assessment.
As also noted below, a set of plausible rival hypotheses has been generated.
For the last of these, “changes in project procedures”, I state that the procedure to
assess this plausible rival hypothesis includes a scoring process, e.g., “The error-
report log for project period II will be examined; the errors attributable to errors in
controlling system dynamic behavior will be scored as whether, using the period I
peer review and unit test procedures, these errors could have been detected. Those
errors that could have been so detected will be removed from the scoring, and
therefore the change in project peer review and unit test procedures will not have
been a material cause in the remaining behavior.” The following are the criteria used
to perform this assessment: the project documentation defined the scope of unit
testing and peer reviews. For both peer reviews and unit tests, the scope of the item
under review is the work of a single person, and usually, a single implementation
module (e.g., 500 lines or less of software code, a single mechanical assembly, etc.),
and the purpose of the review is to find errors within the boundaries of that module.
This generally eliminates errors that arise from improper coordination between two or
more separate such modules as something that could have been detected by a peer
review or software unit test. Therefore, only errors that were generally within the
scope of a single such module were scored as having the potential to have been
detected by a peer review or unit test.
For the other cases examined as a part of hypothesis a, the data available
include error-report counts by month and category; earned-value management system
data concerning cost, schedule, and risk; project sizing (in terms of software lines-of-
source-code, and other metrics, e.g., parts count for hardware elements); project status
reports; and project technical documentation. This allowed me to create the necessary
representations and analyses, through a process similar to that described above for the
FBCB2 case. In these three cases, I used the measurement signature developed
during the FBCB2 case to separate errors due to the cause that I am examining from
other sources of error.
These four cases allowed me to complete the selected case study protocol six
times: three that used the design-based technique (i.e., periods I and III of the FBCB2
case, and FAAD C2I), and three that did not use the design-based technique (i.e.,
period II of FBCB2, and projects AAAA and BBBB).
For the remaining six cases examined as a part of hypothesis a, publicly-
available data were collected from other large-scale programs that were performed at
around the same time, and supplemented with data collected via interviews with
individuals who held key roles on these projects. These data, although relatively
sparse as compared to the data available on the first four cases, extended the analysis
into other application problem domains (problem classes).
Hypothesis b is “During the development phase of a large-scale, complex
computer-based system, the use of a design-based technique that centralizes the
control of the dynamic behavior of a system will produce better reliability for key
system capabilities”. For the FBCB2 and FAAD C2I cases, I have access to
reliability testing data and reliability modeling data. These allowed me to look at
reliability between periods I, II, and III of FBCB2, and between periods I and II of
FAAD C2I.
Figure 3.3-2 shows the decomposition of hypothesis b, identifying the
measurable element that is expected to change with the independent variable.
Figure 3.3-3 graphically depicts the case study protocol for hypothesis b:
Figure 3.3‐2. Decomposition of hypothesis b.
Figure 3.3‐3. Case study protocol – hypothesis b.
Hypothesis c is “During the development phase of a large-scale, complex
computer-based system, the use of a design-based technique that centralizes the
control of the dynamic behavior of a system will reduce the variance for critical port-
to-port timing relationships”. FAAD C2I built a detailed discrete-event model of its
dynamic operation, and then calibrated that model via comparison to benchmark
measurements made on the actual system in operation. These modeling and
benchmarking results, including port-to-port timing measurements and predictions,
are contained in project technical reports, and on occasion, summarized in monthly
project status reports and monthly status briefings. I will use such data from the
FAAD C2I project to assess hypothesis c, comparing results from project period I (did
not use the design-based technique) to project period II (did use the design-based
technique). Figure 3.3-4 depicts the case study protocol for hypothesis c.
Hypothesis d is “During the development phase of a large-scale, complex
computer-based system, the use of a design-based technique that centralizes the
control of the dynamic behavior of a system is less effective if the team implementing
the centralization lacks critical skills”. I will compare data from six projects that
employed similar technical interventions, but some of which also focused on
deployment of critical skills within the team (e.g., CCPDS-R), and some of which did
not (e.g., AFATDS), or did not at first so focus. I will use published materials,
qualitative descriptions, and interviews with personnel who held key positions on
these six projects to assess the validity of hypothesis d. Figure 3.3-5 depicts the case
study protocol for hypothesis d.
Figure 3.3‐4. Case study protocol – hypothesis c.
I have created a scalar metric that integrates cost performance, schedule
performance, award-fee scores (if a particular project is one that receives such scores
from their customer), and qualitative statements from customers about project
performance. For those projects that received customer award-fee scores, the metric
was 2/3 cost-schedule performance, 1/6 award-fee scores, and 1/6 qualitative
statements of contractor performance provided by the customers through various
mechanisms. For those projects that did not receive customer award-fee scores, the
metric was 2/3 cost-schedule performance, and 1/3 qualitative statements of
contractor performance provided by the customers through various mechanisms.
Figure 3.3-6, below, provides the details of this scalar metric.
Figure 3.3‐5. Case study protocol – hypothesis d (critical skills acquisition and
application). [Flowchart steps: create a definition of a scalar metric for project
performance, combining quantitative and qualitative measures; for each case, identify
the contractual type (specifically, award-fee or not); for award-fee contracts, collect
award-fee scores and accompanying information (strengths and deficiencies); for
non-award-fee contracts, collect information about strengths and deficiencies in each
6-month period; apply the formula for the scalar project-performance metric, and
create a matrix of scores versus time; create the corresponding graph, and visually
annotate it with events; define sensitivity excursions for the scalar metric ((a)
different weights, (b) quantitative measures only); apply the excursion formulas, and
create excursion matrices.]
Data in support of further tests: I also have identified the types of data that are
most likely to provide support for further tests of my hypotheses. Over time, I
anticipated that additional, subsidiary questions would arise as I investigated the
hypotheses a through d. The following surviving project records have been useful to
support such further testing of the hypotheses:
Cost score:
2% over-run or better in the current period: 6
5% over-run to 2.1% over-run in the current period: 4
15% over-run to 5.1% over-run in the current period: 2
More than 15% over-run in the current period: 0
Schedule score:
Schedule performance index of .98 or better in the current period: 4
Schedule performance index of .9 to .97 in the current period: 2
Schedule performance index of .8 to .89 in the current period: 1
Schedule performance index of less than .8 in the current period: 0
Award-fee score:
Award-fee score for the current period > 95%: 10
Award-fee score for the current period of 90% to 94.9%: 7
Award-fee score for the current period of 85% to 89.9%: 4
Award-fee score for the current period of 80% to 84.9%: 1
Award-fee score for the current period of < 80%: 0
Qualitative statements score:
Strong satisfaction, no deficiencies cited: 10
Strong satisfaction, only minor deficiencies cited: 7
Strong satisfaction, some material deficiencies cited: 4
Anything less than strong satisfaction: 0
Final score for the current period: For those projects that received award-fee
scores, the final score was calculated as [ 2/3 * (cost score + schedule score) ] +
[ 1/6 * award-fee score ] + [ 1/6 * qualitative customer statements ]. For those
projects that did not receive award-fee scores, the final score was calculated as
[ 2/3 * (cost score + schedule score) ] + [ 1/3 * qualitative customer statements ].
Figure 3.3‐6. Scalar metric for measuring hypothesis d.
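A minimal sketch of this final-score calculation, in Python; the component scores
passed in are assumed to have already been binned per the tables in Figure 3.3-6:
    # Sketch: the scalar project-performance metric of Figure 3.3-6. The
    # component scores are assumed to already be binned per the tables
    # above (cost 0-6, schedule 0-4, award fee 0-10, qualitative 0-10).
    def scalar_score(cost, schedule, qualitative, award_fee=None):
        if award_fee is not None:   # project receives award-fee scores
            return (2/3) * (cost + schedule) \
                 + (1/6) * award_fee + (1/6) * qualitative
        return (2/3) * (cost + schedule) + (1/3) * qualitative

    # Illustrative use: a 4% over-run (cost score 4), SPI of 0.95
    # (schedule score 2), 92% award fee (score 7), and strong satisfaction
    # with only minor deficiencies cited (qualitative score 7):
    print(scalar_score(cost=4, schedule=2, qualitative=7, award_fee=7))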
“Design notebooks”. For the FBCB2 and FAAD C2I cases, I have access to a
compendium of every technical study, analysis, and report created over the
life of the project. These provide the “why” behind project design decisions,
show what alternatives were considered, and provide rationale behind project
design decisions.
System performance prediction models and results. For the FBCB2 and
FAAD C2I cases, I have access to the documents that describe the system-
level models, the efforts undertaken to calibrate them against real-world
measurements (benchmarks), and the range of predictions created using the
models against a range of nominal and off-nominal conditions.
Variability: I also have dealt with variability in data availability and
definition across projects. One of the reasons I selected Government contracts for the
cases to use in this study was that the nature of Government contracting results in less
variability in data availability and definition across projects than in other businesses;
for example, technical metrics (such as project size metrics and project defect
metrics) are collected and reported in similar manners on all four projects, because
they are all responsive to the same U.S. Government / Department of Defense
guidance and contractual language (e.g., Federal Acquisition Regulations, etc.). The
contracts performed even by the same company and organization for other clients
(e.g., Warner Brothers, the steel industry, various state and local Government entities,
etc.) show much more variability in data availability and definition than do the
selected cases.
I have considered variability in the following dimensions:
How accurate are the data?
How complete are the data?
What are the sources of ambiguity, especially across projects?
With regard to accuracy, I am on solid ground. The data to be used were all
part of formal contractual document deliveries; a formal and controlled process for
collection, validation, and analysis was used by each project. Configuration-
controlled baselines were established and maintained. For defect data, tests were
executed in accordance with formal test procedures; results were captured on the test
procedures, and any off-nominal behavior documented as a problem report. A formal
configuration-control board dispositioned problem reports, and its secretariat kept
formal records. If and when a problem was corrected, it was not deemed accepted
until reviewed and approved by the configuration-control board, and logged into the
official project records. These processes were documented in formal project
configuration management plans, and compliance with these established procedures
was verified through periodic audits.
With regard to completeness, I have performed a preliminary data analysis,
which shows that the surviving data appear to be sufficient to support the study
questions. As noted above, additional data are available to support further questions
that may emerge over the course of the study.
With regard to ambiguity within a project, each project has a formal data
dictionary, which provides contractually-binding definitions of terms as used on that
project. With regard to ambiguity due to conflicting usage of terms between projects,
I will create a map of semantic equivalence across projects. Because data definitions
in most instances derive from U.S. Government and/or company-organizational
standards (which remained essentially the same over the time-period involved), I
anticipate that the number of such mappings that will be required is modest and
manageable.
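A sketch of what such a map of semantic equivalence might look like in executable
form; every term pair shown is an invented placeholder, used only to illustrate the
structure of the mapping:
    # Sketch: a semantic-equivalence map from project-specific terminology
    # to one canonical vocabulary. All entries are invented placeholders.
    EQUIVALENCE_MAP = {
        ("PROJECT_A", "trouble report"):      "problem report",
        ("PROJECT_B", "discrepancy report"):  "problem report",
        ("PROJECT_A", "priority 1"):          "category-1",
        ("PROJECT_B", "severity critical"):   "category-1",
    }

    def canonical(project, term):
        # Fall through to the original term when no mapping is needed,
        # reflecting the expectation that few mappings are required.
        return EQUIVALENCE_MAP.get((project, term.lower()), term)

    print(canonical("PROJECT_B", "Discrepancy Report"))  # -> problem report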
Suitable data are available for all cases. For the first four cases, these data
come from official, configuration-controlled sources, and were collected by
professionals with appropriate training. The data for the remaining cases were drawn
from public published sources and DoD public records, supplemented by interviews
with key personnel from some of these programs.
f. Execute the case-study protocol: evaluate the cases, using the measurement
instruments:
I performed the data analysis: This included performing the indicated data
processing (which includes assessment of individual error reports, as well as
statistical analysis of error report severity, error density, and other factors), i.e.,
executing the case study protocol, which includes filling out the measurement
instruments, and performing the indicated calculations.
g. Assess the quality of the study: construct validity, internal validity, external validity,
and reliability:
(Yin 1994) offers criteria for judging the quality of case study design, in the
form of four tests. He also cites similar constructs in other sources (e.g., Kidder and
Judd 1986). The following matrix summarizes how this work satisfies these criteria.
Construct validity. Tactic: use of multiple sources of evidence within each case
(data collection phase). Corresponding feature of this case study: using
documentation, archival records, and interviews – three distinct sources of evidence;
in some cases, also using participant-observation. Tactic: establish chain of evidence
(data collection phase). Corresponding feature: method and professionalism of those
collecting the original data; configuration control of data.
Internal validity. Tactic: do pattern-matching / explanation-building (data analysis
phase). Corresponding feature: comparison of empirical patterns with predicted ones;
statistical assessment of correlation; search for, and accounting for, other effectors.
Tactic: do time-series analysis (data analysis phase). Corresponding feature: control
charts (Wheeler 2000), derived from Walter Shewhart’s work in the 1920’s.
External validity. Tactic: replication via use of multiple cases (case-study design
phase). Corresponding feature: ten cases.
Reliability. Tactic: use a case study protocol (data collection phase). Corresponding
feature: use of an instrument, plus procedures to be followed in using the instrument,
plus analysis criteria.
Table 3.3‐1. Satisfaction of Yin’s four tests.
In order to establish construct validity, per (Yin 1994), one develops multiple
sources of evidence within each case. This was accomplished within this study
through the use of what Yin terms documentation, archival records, and interviews –
three distinct sources of evidence; in some cases, also using what Yin terms
participant-observation.
The second portion of construct validity requires the establishment of a chain
of evidence, i.e., selecting the items to be measured, and establishing (via a chain of
reasoning) that these items do indeed drive the behavior to be investigated. How this
was accomplished for this study was described in chapter 2.
With regard to internal validity, (Yin 1994) identifies three key steps in
assessing the internal validity of a case study:
Do pattern-matching – “a comparison of an empirically-based pattern with a
predicted one” (Yin 1994)
Do explanation-building – iteratively refine a set of causal links by testing
them against a set of cases
Do time-series analysis – use statistical techniques to separate the “signal” in
the data from random fluctuations
Each of these three steps is discussed in more detail below, including how I
specifically applied them to this particular study. (Yin 1994) does not require that all
three steps be performed on every case study.
Pattern-matching: As noted above, this technique compares patterns of data
that come out of the cases in the case study with those patterns predicted for each of
multiple explanations. If an empirical pattern coincides with a predicted one, this
“helps a case study strengthen its internal validity” (Yin 1994).
One starts this process by creating “patterns of predicted values” (Yin 1994),
for various explanations, including the explanation embodied in the hypothesis, but
also for each of various “threats” to validity.
For example, in the FBCB2 case, the design-based technique was used (period
I), not used (period II), and used again (period III). The pattern predicted for the
dependent variables is “low-high-low”, i.e., performance at a particular level,
degraded performance during period II (when the design-based technique was not
used), and performance returning to near the original level in period III (when use of
the design-based technique was resumed). In the FAAD C2I case, the design-based
technique was used, and we therefore expect performance similar to FBCB2 periods I
or III. In projects AAAA and BBBB, the design-based technique was not used, and
we therefore expect performance similar to FBCB2 period II. This is summarized in
the following table:
Table 3.3‐2. Example of expected results.
(Cook & Campbell 1979) define a set of typical “threats” to internal validity
in case-study research. I have drawn from that list as guidance as I created a set of
“plausible rival explanations” [ terminology from (Yin 1994) ] and other specific
threats to the study (see chapter 3.5, below).
For each plausible rival explanation, I have developed a pattern of predicted
data results, i.e., if a particular plausible rival explanation is the dominant factor
driving the observed data results, the data would then look like the prediction (Table
3.3-3, parts 1 and 2). These plausible rival explanations, and the associated
prediction regarding data pattern, were created before the case data were analyzed.
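The comparison of observed data against these predicted shapes can be reduced to a
simple qualitative coding; the following sketch illustrates the idea, with an
illustrative 2x threshold (not a value taken from the study) separating “low” from
“high” period means:
    # Sketch: qualitative pattern-matching over the three FBCB2 periods.
    # Each explanation predicts a shape over periods (I, II, III); observed
    # period means are coded into the same vocabulary. The 2x threshold is
    # an illustrative choice, not a value taken from the study.
    PREDICTED = {
        "hypothesis a (design-based technique)": ("low", "high", "low"),
        "scope changes":                         ("flat", "flat", "flat"),
        "contractor key-personnel changes":      ("flat", "flat", "flat"),
        "customer key-personnel changes":        ("flat", "flat", "flat"),
    }

    def observed_shape(period_means, threshold=2.0):
        base = min(period_means)
        coded = tuple("high" if m > threshold * base else "low"
                      for m in period_means)
        return coded if "high" in coded else ("flat", "flat", "flat")

    shape = observed_shape([3.0, 18.0, 3.0])   # illustrative period means
    print([name for name, p in PREDICTED.items() if p == shape])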
Threat / plausible rival hypothesis: significant changes in the scope of the product to
be produced. Discussion: the military mission did not change materially between
periods I, II, and III. There was a gradual and incremental addition of new functions
and interfaces across the three periods, as would be expected from a project with
incremental delivery of capabilities. This incremental accretion of capabilities does
not appear to correlate with the low-high-low measures in the 4th chart of Appendix
A. Predicted data shape: constant across the 3 project periods, with perhaps a modest
up-slope over time.
Threat / plausible rival hypothesis: significant changes in the key personnel on the
prime contractor’s project team. Discussion: the large majority of key technical
personnel remained the same across periods I, II, and III. The only technical
personnel change of any significance was that the original project chief engineer left
the program for another assignment early in period III, but remained available as a
consultant and reviewer to the team; this change does not appear to correlate with the
low-high-low measures in the 4th chart of Appendix A. Predicted data shape:
constant across the 3 project periods.
Threat / plausible rival hypothesis: significant changes in the key personnel on the
customer’s project team. Discussion: no changes in the technical leadership of the
customer’s project team were implemented across the time-period represented by
periods I, II, and III. Predicted data shape: constant across the 3 project periods.
Threat / plausible rival hypothesis: significant changes in the developmental maturity
of the prime contractor’s project team (as measured by objective criteria, such as
CMMI assessments). Discussion: the prime contractor’s team was a consistent CMMI
level-5 (the SEI’s highest-possible assessment rating) during periods I, II, and III.
Predicted data shape: constant across the 3 project periods, with perhaps a modest
down-slope over time.
Threat / plausible rival hypothesis: significant changes in the organization’s
institutional framework of policy guidance, procedures, and training for system
engineers and software developers. Discussion: there was a gradual and incremental
addition of new guidance, procedures, and example artifacts across the three periods,
as would be expected from a mature system-development organization (the sector in
which this project is embedded has the most level-5 CMMI ratings of any
organization in the world, per the SEI’s published data). This incremental accretion
of capabilities does not appear to correlate with the low-high-low measures in the 4th
chart of Appendix A. Predicted data shape: constant across the 3 project periods,
with perhaps a modest down-slope over time.
Table 3.3‐3, part 1. Threats / Plausible rival explanations / Predicted data shapes.
Threat / plausible rival hypothesis: significant changes in the contractual structure
within which the development proceeded (e.g., did the contract switch from cost-
reimbursable to fixed-price, or make some other such change that might drive
behavioral changes within the development team). Discussion: the development
portion of the contract remained the same (cost-plus incentive fee) for the entire
time-period under consideration. During period II, the prime contractor experienced
significant over-runs (a condition not experienced in either period I or period III),
and as a customer-relations strategy, invested a significant amount of their own funds
in order to mitigate these cost over-runs. But all of the work, whether funded by the
Government or by the company, was performed on the contract, under contract
guidance, and deliverables, reviews, and quality provisions remained the same. One
might make the argument that the use of contractor funding during period II would
have caused the contractor to take short-cuts in testing, so as to minimize the over-
runs and the expenditure of contractor funds; this would have created a tendency
exactly the opposite of the low-high-low measures in the 4th chart, above, as less
testing would have generated fewer error reports. Predicted data shape:
approximately constant over the 3 project periods, with perhaps a small decrease in
period II.
Threat / plausible rival hypothesis: significant changes in project procedures.
Discussion: during a detailed lessons-learned review conducted by the prime
contractor (and led by the author), three such changes were identified (i.e., places
where the project did something different during period II than it did during period
I): (1) the use / non-use of the design-based technique under consideration in this
study; (2) the rigor decreed by project guidance procedures to be used during
software unit testing; and (3) the rigor decreed by project guidance procedures to be
used during software peer reviews. For all three of these, during period III the
project returned to the procedures used during project period I. The first of these
items, of course, is the matter under consideration in the hypothesis. The second and
third items form rival explanations that must be explored. All three might show a
low-high-low behavior pattern, so additional methods will be needed to distinguish
between these alternatives. Software peer reviews and software unit tests both are
intended to “catch” errors of a localized nature, i.e., errors entirely within the bounds
of the item under review; this would include coding syntax errors, interface unit
errors, and so forth. The error-report log for project period II will be examined; the
errors attributable to errors in controlling system dynamic behavior will be scored as
to whether, using the period I peer review and unit test procedures, these errors could
have been detected. Those errors that could have been so detected will be removed
from the scoring, and therefore the change in project peer review and unit test
procedures will not have been a material cause in the remaining behavior. Predicted
data shape: low-high-low (for all three).
Table 3.3‐3, part 2. Threats / Plausible rival explanations / Predicted data shapes.
Explanation-building: This is an iterative technique – closely related to what
(Eisenhardt 1989) calls “entering the field” – wherein one builds an explanation of
the underlying phenomenology, tests that explanation against the data results from
one case, refines the explanation in light of one’s learning from those data, and then
iterates this process through additional cases. The ‘explanation’ is created by “stipulating a
set of causal links” (Yin 1994), e.g., a chain of causality or chain of reasoning; this
forms the “initial theoretical statement” (Yin 1994) concerning the explanation. One
then assesses the strength and credibility of the initial theoretical statement in light of
the initial case findings, and then will revise the theoretical statement based on the
learning from that initial iteration. One then subjects the revised theoretical statement
to the same analysis, using the data from a second case, and so forth.
One also should consider plausible (or rival) explanations, i.e., perform the
process on multiple such initial theoretical statements, one corresponding to the
hypothesis, but others that correspond to the various threats and plausible alternative
explanations.
Time-series analysis: According to (Yin 1994), “when a large number of data
points are relevant and available, statistical tests can be used . . . to analyze the data”.
I will use the control-chart methodology of (Wheeler 2001) and (Kazeef 2009); these
sources indicate that five or six data points are sufficient to provide suitable statistical
significance to the findings of such an analysis technique. I will only apply the
technique when I have sufficient data points within a case to meet this criterion.
The “ability to track changes over time is a major strength of case studies”
(Yin 1994), which allows one to test data results so as to separate statements of
statistical significance from random fluctuations, or, as (Kazeef 2009) puts it, to
separate the “voice of the process” from the “noise”. This is an effective way to
separate the explanation embodied in the hypothesis from the alternative explanation
that the observations are simply due to random fluctuation.
External validity: External validity is the ability to know whether the study
results are applicable beyond the immediate case study (Yin 1994). For case studies,
(Yin 1994) asserts that this is accomplished through analytical generalization:
generalizing from a particular set of results to some broader theory, which he
describes as analogous to “the way a scientist generalizes from experimental results to
theory”. The method used is replication of the findings to second and third cases,
where the theory has specified that the same results should occur. This is
accomplished in this study through the use of the ten cases. In this portion of the
study, the replication logic will be described and analyzed.
Reliability: Reliability, in this context, means that if a different observer
followed the same procedures (cases and protocols) as were used for this study, the
later investigation should arrive at the same findings and conclusions (Yin 1994).
The key techniques recommended by (Yin 1994) to assure reliability are the case
study protocol and the case study data base; these provide both transparency and
repeatability.
h. Formulate interpretations
(Eisenhardt 1989) and (Yin 1994) both provide methodologies for formulating
interpretations. The method depicted in Figure 3.3-7, largely drawn from (Eisenhardt
1989), will be the method used for this purpose in this study.
In chapter 5, I state whether the data appear to support or refute the
hypothesis, and provide a justification / analysis in support of that statement. Per
(Yin 1994), “If two or more cases are shown to support the same theory, replication
may be claimed”.
Also in chapter 5, I summarize the potential contribution to the literature and
the art resulting from the findings. If the findings support the hypothesis, there could
be significant value resulting from this study to the literature, system-development
standards documents, and practice regarding system development process for large-
scale complicated systems, e.g., higher project success rates, more consistent cost and
schedule performance, fewer latent defects (and hence, higher product quality).
Given the high cost and important social value of such systems, these would be
material benefits.
Lastly, in chapter 5, I discuss the potential of the results to be extended to
additional application problem domains. Although the data examined in the first four
cases were all in one problem domain, additional problem domains were examined
during the remaining six cases, and if it appears that the findings support the
hypothesis, this technique may be applicable to these (and perhaps other) problem
domains. Any system with the characteristics described at the beginning of chapter 2
where the indicated technique would allow centralization of the control of dynamic
system behavior could potentially benefit from this technique.
This approach will only work, of course, for such systems where those
elements that are difficult to implement are in fact separable from the other elements
of the implementation; this forms the principal boundary between systems that could
benefit from the proposed technique and those that could not. As noted above,
however, it appears the set of systems that could potentially benefit is not vanishingly
small.
Furthermore, although beyond the scope of this study, if additional design
elements that are difficult can be identified, then the analogous identification
techniques to partition their implementation may provide a similar benefit for those
systems, too, although through techniques different than the particular one examined
herein.
3.4 Remaining limitations and risks in the study
Several types of risk to empirical-methods studies have been dealt with in the
previous discussions. The following remaining issues have been identified, and in the
author’s assessment, mitigated to the extent indicated in italics:
Credible source data, especially for older cases. The data used for the first
four cases in this study were collected by trained professionals, and
maintained under configuration control. Most of the data have been included
in contractual deliverables to the U.S. Government, and therefore are subject
to anti-fraud and other strictures. Data were collected directly from project
data librarians, and from the corporation’s Office of Cost Estimation. Both of
these organizations have the capacity and charter to maintain data with
integrity over the relevant periods of time. Incorporated into the construct
validity and reliability tests.
Figure 3.3‐7. Method for formulating an interpretation; adapted from (Eisenhardt
1989). [Flowchart steps: define a candidate research question; select cases, per (Yin
1994) and (Flyvbjerg 2006); create measurement instruments and protocols; “enter
the field”, with overlapping data collection and data analysis; analyze the data, both
within a case and across cases; shape the hypothesis (iterative: sharpen the construct
by refining its definition, then building evidence, and iterating); compare with the
literature, both contrasting and supporting; reach closure, when learning tails off.]
The results are valid, but only in a single, narrow application problem domain,
and therefore are not likely to be valuable to the community in general. The
examined application problem domain is not inconsequential, and in any case,
the final six cases bring in experiences from at least four other important
application problem domains, showing that the results are likely to be
valuable to a reasonable cross-section of the systems-development / research
community. Incorporated into the external validity tests.
The results are valid, but only for one specific, anomalous, short period of
time. The data from the FBCB2 case, which represent the most detailed data
and analysis within the study, cover results from the time-period 1998 to
2009, and include by implication the four years before 1998. The remaining
cases cover a time-period exceeding 10 years. Incorporated into the construct
validity and external validity tests.
The case study proposed involves measuring performance on actual system
development projects; at the time the projects were being executed, the intent
was to execute the contractual obligations in the best manner possible, rather
than conducting this case study. Purpose-built experiments might provide
higher validity. The provenance of the data is solid, as described above. The
preliminary data analysis indicates that the data, whatever the purpose of
their original collection, are adequate for the purposes of this study. A key
factor, however, is the scale of the study permitted by the use of real projects;
any purpose-built academic study of this topic would necessarily be of such
small scale as to have even greater threats to validity. The use of data from
real projects also has the advantage that the hypothesis is being tested in
realistic situations, at real scale, and permits the analysis of data representing
a long period of project performance. A reasonable argument can be made
that these are benefits worth achieving, and that the data from this study will
be useful to the community. Incorporated into the construct validity test.
So much time has passed for some of the instances that the needed data are no
longer available. As indicated above, this is not the case – the project
libraries and the corporate Office of Cost Estimation have preserved the data,
and done so with high integrity, through their configuration-control
processes. A significant amount of data have been made available for this
study (more than 3,000 separate documents). Although it is entirely possible
that certain items that could have contributed to the goals of the study are
simply no longer available (U.S. Government contracting regulations only require
keeping contractually-generated information for a certain number of years after a
contract is complete, and the mandatory retention period has lapsed for some of the
earliest portions of the time-period under consideration), so far no actual such
instance has been encountered. An additional partial mitigation is the author’s
personal knowledge of the situation in which two key cases were collected, refreshed by
his contemporaneous notes and memos. Incorporated into the construct
validity test.
Self-involvement of the author in the main cases. Some of the cases display a
“first-person” nature; and in any case, I am attempting to validate the efficacy
of a technique of my own invention. The quantitative nature of a large
portion of the assessment will mitigate the risk inherent in such a situation;
the results will depend on the data analysis. In fact, the author had no
personal involvement in most of the cases. Incorporated into the construct
validity and reliability tests.
Chapter 4: Research Results
In chapter 2, I defined a set of hypotheses to be assessed; in chapter 3, I defined a
set of cases for the study, and the protocols to be used to conduct the assessment of each
case. This chapter provides the results of the assessment process; chapter 5 provides
interpretations and conclusions.
4.1 Analysis of hypothesis a
Hypothesis a is “During the development phase of a large-scale, complex
computer-based system, the use of a design-based technique that centralizes the control
of the dynamic behavior of a system will lower the density of those defects that are
attributable to unplanned adverse dynamic system behavior”. This hypothesis has been
assessed against ten cases.
I start with the FBCB2 case. This project has had about a dozen major system test
events over its 15-year life; some of these are in what I term “period I” (initial use of the
technique); some are in “period II” (non-use of the technique); some are in “period III”
(return to the use of the technique). I have selected the life-cycle-point of entry into and
performance of the contractor-conducted system-level formal acceptance test as a place
to collect comparable defect data from the three periods; there have been such test events
in each of the three periods. I have gone through the actual error report logs from the
relevant portions of periods I, II, and III, and identified errors that satisfied the criteria
established at the beginning of Chapter 3 for errors in the control of system dynamic
behavior, and created monthly totals. [ I summarize those criteria here: Problems that
manifest themselves via highly off-nominal performance, e.g., crashes or otherwise un-
commanded cessation of processing, radical slowing down in processing rate, and
inconsistent behavior patterns, rather than showing the same behavior (even if wrong) in
response to the same repeated stimuli, were those that were scored as arising from
problems in controlling dynamic behavior. ] In order to establish that the data from each
period are representative, I have used the Shewhart / Wheeler “control chart”
methodology to show that the “process” that is creating these data is creating predictable
results. This methodology is explained in (Wheeler 2000) and (Kazeef 2009); it involves
the use of two synchronized plots, with the top one plotting the data against the derived
upper and lower natural process limits (as well as 1-sigma and 2-sigma variance bands),
and the lower one plotting the moving range against the derived moving-range limit.
Five tests (listed on the plots) are applied to see if there are signals of unpredictable
behavior. The absence of such signals indicates that the data form a predictable set.
(Wheeler 2000) and (Kazeef 2009) state that as few as five or six data points suffice to
perform such an analysis; I have more than that number of data points for each of periods
I, II, and III. I have performed this analysis separately for each of the three periods; the
results are presented on the first three plots, below. The y-axis unit is density of
attributable error reports opened per month, on a scale where 1 = 1 report per 1,000,000
source lines of code.
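Before turning to the plots, the following sketch shows how these five detection rules
can be checked programmatically, assuming the limits have been computed as in the
XmR sketch of chapter 3; the same-side refinement of rules 2 and 3 follows the
standard Western Electric formulation of the zone tests, an assumption on my part:
    # Sketch: the five Wheeler / Kazeef detection rules for an XmR chart.
    # 'mean', 'sigma', and 'mr_limit' are assumed to come from the XmR
    # computation sketched in chapter 3; rules 2 and 3 use the standard
    # same-side (Western Electric) formulation of the zone tests.
    def signals(values, moving_ranges, mean, sigma, mr_limit):
        found = set()
        unpl, lnpl = mean + 3 * sigma, mean - 3 * sigma
        up2, lo2 = mean + 2 * sigma, mean - 2 * sigma
        up1, lo1 = mean + sigma, mean - sigma
        # 1. One point outside of UNPL or LNPL.
        if any(v > unpl or v < lnpl for v in values):
            found.add(1)
        # 2. Two out of three successive points beyond 2 sigma, same side.
        for i in range(len(values) - 2):
            trio = values[i:i + 3]
            if sum(v > up2 for v in trio) >= 2 or sum(v < lo2 for v in trio) >= 2:
                found.add(2)
        # 3. Four out of five successive points beyond 1 sigma, same side.
        for i in range(len(values) - 4):
            five = values[i:i + 5]
            if sum(v > up1 for v in five) >= 4 or sum(v < lo1 for v in five) >= 4:
                found.add(3)
        # 4. Seven successive points on one side of the mean.
        for i in range(len(values) - 6):
            run = values[i:i + 7]
            if all(v > mean for v in run) or all(v < mean for v in run):
                found.add(4)
        # 5. One moving-range point above the moving-range limit.
        if any(mr > mr_limit for mr in moving_ranges):
            found.add(5)
        return found   # empty set => no signals => a predictable process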
As can be seen, the data for each of the three periods form a predictable (in the
sense used by Wheeler and Kazeef) data set.
Figure 4.1‐1. Period I contractor test, attributable problem reports by month.
[Control chart (individuals and moving range), FBCB2, February to October 1998:
problem reports attributable to unplanned dynamic system behavior, opened per
month; density (opened this month per 1M SLOC) plotted against the average, the
1-sigma and 2-sigma zones, and the upper and lower natural process limits, with the
moving range plotted against the moving-range limit. None of the five Wheeler /
Kazeef signals are present; the data therefore indicate a controlled process.]
Figure 4.1‐2. Period II contractor test, attributable problem reports by month.
[Same chart construction, FBCB2, February to October 2007. None of the signals
are present; the data therefore indicate a controlled process, but the range of
variation is much larger than in periods I and III.]
Figure 4.1‐3. Period III contractor test, attributable problem reports by month.
[Same chart construction, FBCB2, February to December 2009. None of the signals
are present; the data therefore indicate a controlled process.]
Having demonstrated that we have data with a clear “voice of the process” in each
period (i.e., that each period forms a predictable data set), per (Kazeef 2009), we are
justified in comparing the voice of the process across the different periods, in order
to see if there are changes to the voice of the process between periods. This is done
in Figure 4.1-4, below.
In looking at Figures 4.1-1 through 4.1-4, what the data appear to indicate is that
during period I (when the project was using the design-based technique to centralize the
control of system dynamic behavior) the project had a consistent new-occurrence rate of
defects due to errors in controlling system dynamic behavior on the order of 3 per month.
During period II (when the project was not using the design-based technique to
centralize the control of system dynamic behavior), the “voice of the process” is quite
different – the average new-occurrence rate of defects due to errors in controlling system
dynamic behavior is around 18 per month, and displays a larger range of variation than
the data for period I and period III.
Figure 4.1‐4. Periods I‐III contractor tests, attributable problem reports by month.
[All three periods (1998, 2007, and 2009) on a single plot; time is discontinuous
between periods. Density (opened this month per 1M SLOC) and moving range,
constructed as in the single-period charts. Period II behavior is materially different:
on the order of 4x more problem reports attributable to this cause.]
The rate during period III (when the project was again using the design-based
technique to centralize the control of system dynamic behavior) was similar to period I
(i.e., around 3 per month).
The next question is whether or not this change in the “voice of the process” can
appropriately be attributed to the change in the independent variable (i.e., the use or non-
use of the design-based technique). As noted in chapter 3, this quasi-experimental
configuration was analyzed through a “plausible rival hypothesis” methodology; the
identification of this as a quasi-experiment allowed the use of the literature to identify
likely risks, which in turn led to the selection of the “plausible rival hypothesis”
methodology, and to the selection of the particular rival hypotheses to be considered.
The plausible rival hypotheses to be examined were listed in Table 3.3-3, parts 1
and 2. The first is that there may have been significant changes in the scope of the
product to be produced between the periods, and that this, rather than the use or non-use
of the design-based technique, was the principal cause of the change in the dependent
variable observed in Figure 4.1-4. Note, however, that project documentation reveals that
the Period II and Period III requirements were essentially identical, and the difference
between the Period I and Period II requirements was incremental. In particular, the
project had built, tested, and deployed a product during Period I, based on about 1,000
system-level requirements. Entering Period II, the customer added about 200 additional
requirements, but also wanted a re-structuring of the system to achieve a long-term re-use
objective, something that was independent of the requirements contained in the system
specification. An advisor to the customer wanted this based on a completely new set of
software – to be built without the use of the design-based technique – rather than by
adapting the existing software; after much discussion, the customer so directed. So in
Period II, the contractor designed and built a product that implemented 1,200
requirements (e.g., the 1,000 from Period I, plus the 200 additional requirements new
during Period II), but without the use of the design-based technique. The contractor then
conducted a formal test of the Period II product, but failed that test (due to the high level
of remaining defects, as noted in the analysis of hypothesis a, and low reliability, as noted
in the analysis of hypothesis b). They were then told by the customer to go off and improve
their design, which led to a decision to re-incorporate the design-based technique, but to
incorporate it into the Period II software, rather than going back to the Period I software.
Period III, therefore, consisted of the contractor re-designing and re-building the system
based on the Period II version of the software, and re-testing the system (using the same
requirements and the same test procedures as Period II). This time they passed the test,
and received a positive decision to deploy the Period III product.
It is clear from the above discussion that the rival hypothesis does not account for
the observed behavior (e.g., the large difference measured in defect density between
period II and period III). If the rival hypothesis were the true explanation, the Period II
and Period III behavior should be approximately the same, rather than about 4x apart.
This justifies the rejection of this rival hypothesis.
The second plausible rival hypothesis is that there may have been significant
changes in the key personnel on the prime contractor’s project team, and that this is
driving the observed changes in the dependent variable. In fact, the large majority of the
project’s key technical personnel remained the same across periods I, II, and III. The
only technical personnel change of any significance was that the original project chief
engineer left the program for another assignment early in period III, but remained
available as a consultant and reviewer to the team. This would not appear to cause a
change in the dependent variable; i.e., if this plausible rival hypothesis were the true
cause of the change in the dependent variable, one would expect to see Figure 4.1-4
display a constant level across the 3 project periods, with perhaps a modest up-slope in
period 3. This is not the pattern displayed, and therefore, it is reasonable to assume that
this rival hypothesis is in fact not plausible.
The third plausible rival hypothesis is that there may have been significant
changes in the key personnel on the customer’s project team, and that this is driving the
observed changes in the dependent variable. In fact, the Government aims to avoid such
changes; military personnel are assigned to the project for 3-year terms, but key technical
positions are filled by civilian Government employees, and they tend to stay in place for
long periods of time. No significant changes in the customer’s key civilian technical
personnel occurred across periods I, II, and III. Therefore, this would not appear to be a
cause of a change in the dependent variable; i.e., if this plausible rival hypothesis were the
true cause of the change in the dependent variable, one would expect to see Figure 4.1-4
display a constant level across the 3 project periods. This is not the pattern displayed,
and therefore, it is reasonable to assume that this rival hypothesis is in fact not plausible.
The fourth plausible rival hypothesis is that there may have been significant
changes in the developmental maturity of the prime contractor’s project team (as
measured by objective criteria, such as CMM assessments), and that this is driving the
observed changes in the dependent variable. In fact, the organization to which this prime
contractor team belonged had a constantly-improving CMM rating over the 3 project
periods: they were CMM level 3 at contract award (1995), increased to a level 5 by
1999, and have remained at level 5 (the highest CMM level) since that time. The
organization also acquired ISO-9000 certification in 1997, and has retained that
certification ever since. This might cause a change in the dependent variable, but not one
that matches the pattern observed; i.e., if this plausible rival hypothesis were the true
cause of the change in the dependent variable, one would expect to see Figure 4.1-4
display a decrease between periods I and II, and remain constant between periods II and
III. This is not the pattern displayed, and therefore, it is reasonable to assume that this
rival hypothesis is in fact not plausible.
The fifth plausible rival hypothesis is that there may have been significant
changes in the organization's institutional framework of policy guidance, procedures, and
training for systems engineers and software developers provided to the project team, and
that this is driving the observed changes in the dependent variable. In fact, the
organization to which this prime contractor team belonged had a mature, slowly-
improving set of organizational policy guidance, procedures, and training over the 3
project periods. There was a gradual and incremental addition of new guidance,
procedures, and example artifacts across the three periods, as would be expected from a
mature system-development organization (the sector in which this project is embedded
has the most level-5 CMM ratings of any organization in the world, per the SEI's
published data). This might cause a change in the dependent variable, but not one that
matches the pattern observed; i.e., if this plausible rival hypothesis were the true cause of
the change in the dependent variable, one would expect to see Figure 4.1-1 remain
constant across the 3 project periods, with perhaps a modest down-slope over time. This
is not the pattern displayed, and therefore, it is reasonable to conclude that this rival
hypothesis is in fact not plausible.
The sixth plausible rival hypothesis is that there may have been significant
changes in the contractual structure within which the development proceeded (e.g., a
switch from a cost-reimbursable to a fixed-price contract, or some other change that
might drive behavioral changes within the development team). In fact, the contractual
arrangements for the development portion of the contract remained the same (cost-plus
incentive fee) for the entire time-period under consideration. During period II, the prime
contractor experienced significant over-runs (a condition not experienced in either period
I or period III), and as a customer-relations strategy, invested a significant amount of
its own funds in order to mitigate these cost over-runs. But all of the work, whether
funded by the Government or by the company, was performed on the contract, under
contract guidance, and with deliverables, reviews, and quality provisions that remained the
same. One might argue that the use of contractor funding during period II
would have caused the contractor to take short-cuts in testing, so as to minimize the over-
runs and the expenditure of contractor funds. This might cause a change in the dependent
variable, but not one that matches the pattern observed; i.e., if this plausible rival
hypothesis were the true cause of the change in the dependent variable, one would expect
to see Figure 4.1-1 either remain constant across the 3 project periods (since there were
no changes in the contract structure), or decrease during project period II (since less
testing would have generated fewer error reports). This is not the pattern displayed, and
therefore, it is reasonable to conclude that this rival hypothesis is in fact not plausible.
The seventh and final plausible rival hypothesis is that there may have been
significant changes in project procedures between the three project periods. During a
detailed lessons-learned review conducted near the end of period II by the prime
contractor (and led by me), three such changes were identified (i.e., places where the
project did something different during period II than it did during period I):
- The use / non-use of the design-based technique under consideration in this study
- The rigor decreed by project guidance procedures to be used during software unit
testing (relaxed during project period II)
- The rigor decreed by project guidance procedures to be used during software peer
reviews (relaxed during project period II)
For all three of these changes, during period III the project returned to the
procedures used during project period I. The first of these items, of course, is the matter
under consideration in hypothesis a. The second and third items form rival explanations
that must be explored. All three might show a low-high-low behavior pattern like that of
Figure 4.1-4, so additional methods will be needed to distinguish between these
alternatives.
Software peer reviews and software unit tests both are intended to detect errors of
a localized nature, i.e., errors entirely within the bounds of the particular entity under
review. These would include coding syntax errors, unit-level interface errors, and so
forth. Neither of these types of reviews is intended to catch errors that involve the
interaction of multiple components, such as timing and sequencing problems.
To create Figures 4.1-1 through 4.1-4, I examined the error-report log for project
periods I through III, and (using the criteria defined in chapter 3.3.c) identified those
error reports attributable to errors in controlling system dynamic behavior. In order to
assess this plausible rival hypothesis, I went again through the attributable errors that
occurred during project period II, and scored them (using a set of criteria that are also
defined in chapter 3.3.c) as to whether these errors could have been detected if the project
had, during project period II, retained the peer-review and unit-test procedures of project
period I. Those errors that could have been so detected can be assigned to this plausible
rival hypothesis; those errors that could not have been so detected can be assigned to
hypothesis a (since at this point, all of the other remaining plausible rival hypotheses
have been eliminated from consideration).
A total of 173 errors that occurred during project period II were attributed to errors in
controlling system dynamic behavior. Of these, 13 were assessed (under the criteria of
chapter 3.3.c) as having been detectable using the period-I peer-review and unit-test
procedures; the remainder (160) would not have been so detectable, and therefore are
assigned to the cause posited by hypothesis a. [Such a small portion – 13 out of 173 –
is consistent with (Royce 1998), appendix D, which, in referring to design walk-throughs
and inspections (his terminology for what I herein term peer reviews), states that
"very few serious quality flaws were uncovered in these meetings".] Below, in Figure
4.1-5, the data of Figure 4.1-4 have been re-plotted using the adjusted data (i.e., those
errors that were assessed as having been detectable using the period-I peer-review and
unit-test procedures were removed from the period II data). As can be seen, there is little
change in the outcome (although the variance of the period II data has decreased), and at
this point, we are justified in concluding that hypothesis a is in fact responsible for the
majority of the observed change in the dependent variable between the 3 project periods.
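As a minimal illustration of this adjustment (a sketch in Python; the record structure
and field name are hypothetical, while the 173 / 13 / 160 counts are the figures given
above):

    # Hypothetical structure for the scored period-II error reports: each record
    # carries the month in which it was opened, and whether the chapter-3.3.c
    # criteria judged it detectable under the period-I peer-review / unit-test
    # procedures.
    period_2_reports = [
        {"month": 1, "detectable_under_period_1_procedures": False},
        {"month": 1, "detectable_under_period_1_procedures": True},
        # ... 173 attributable reports in total
    ]

    # Remove the 13 reports chargeable to the relaxed procedures (the rival
    # hypothesis); the 160 reports that remain are charged to hypothesis a,
    # and are what is re-plotted in Figure 4.1-5.
    adjusted = [r for r in period_2_reports
                if not r["detectable_under_period_1_procedures"]]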
Figure 4.1-5. Periods I-III contractor tests, attributable problem reports by month,
adjusted for the peer-review and unit-test procedure change during period II. [XmR
control chart: FBCB2 problem reports attributable to unplanned dynamic system
behavior, opened per month (density per 1M SLOC), periods I-III (1998, 2007, 2009) on
a single plot; time is discontinuous between periods. Annotation: period II behavior is
materially different – o(4x) more problem reports attributable to this cause.]
I have also done some assessment of the cost-performance data that correspond
to periods I through III. The contract value to-date is about $2.4B ($2,400,000,000).
Although all of the work was awarded under the aegis of the original competitive award
(January 1995), a large number of separate contractual delivery orders were used,
facilitating separation of cost and schedule against budget for individual activities. In
general, development, production, and services (e.g., training, maintenance, etc.) were
contracted on separate delivery orders. Individual versions of development capability
were generally contracted on separate delivery orders, as well. This provides far more
insight than would be available by looking at aggregate cost / schedule performance for
the entire contract.
For example, the period-I development efforts that led up to the capability tested
in 1998 (i.e., the development effort that took place from 1995 to 1998) show a modest
cost under-run (1% over-run to 4% under-run on the various delivery orders).[29] The
period-II development efforts show significant and consistent over-runs (15% to 25%).
The period-III developments show a return to the modest under-run pattern of period I. I
have excluded from the above costs that are not relevant to non-recurring development,
e.g., hardware production, and post-delivery services (installation, training, maintenance,
etc.). Figure 4.1-6 provides a graph of these data; the cost performance on the project
varied in synchronicity with the dependent variable of the quasi-experiment.
[29] Drawn from Force XXI Battle Command Brigade-and-Below, Contract number
DAAB07-95-D-E604, CDRL F002, Status Report, various dates in 1998, 2007, 2009, and
2010.
Next, I consider the FAAD C2I case with regard to hypothesis a. I assessed
FAAD C2I using data from a 2002 test cycle. Since 1989, FAAD C2I has been using the
design-based technique. Figure 4.1-7 presents the FAAD C2I data for hypothesis a.
Figure 4.1-7. FAAD C2I, hypothesis a, with the design-based technique. [XmR control
chart: FAAD C2I contractor test (Apr-Nov 2002), problem reports attributable to
unplanned dynamic system behavior, opened per month (density per 1M SLOC).
Wheeler / Kazeef signals checked: (1) one point outside of UNPL or LNPL; (2) 2 out of
3 points in the 2-to-3-sigma zones; (3) 4 out of 5 points outside of the 1-sigma zone;
(4) 7 points in a row on one side of the mean; (5) 1 MR point above the MR limit.
Annotation: none of the signals are present; the data therefore indicate a controlled
process.]
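For reference, the following is a minimal sketch (in Python; the function and variable
names are mine, not those of the projects' tooling) of how the XmR (individuals and
moving-range) chart limits and the first and fifth Wheeler-style signals used in these
figures can be computed. The constants 2.66 and 3.268 are the standard XmR chart
constants (3/d2 and D4 for subgroups of size two).

    from statistics import mean

    def xmr_limits(values):
        """Compute XmR (individuals / moving-range) chart parameters:
        centerline, natural process limits (UNPL / LNPL), 1- and 2-sigma
        zone boundaries, and the moving-range upper limit."""
        x_bar = mean(values)
        moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
        mr_bar = mean(moving_ranges)
        sigma = mr_bar / 1.128                    # d2 = 1.128 for n = 2
        return {
            "average": x_bar,
            "UNPL": x_bar + 2.66 * mr_bar,        # equals x_bar + 3 sigma
            "LNPL": x_bar - 2.66 * mr_bar,
            "1-s": (x_bar - sigma, x_bar + sigma),
            "2-s": (x_bar - 2 * sigma, x_bar + 2 * sigma),
            "MR limit": 3.268 * mr_bar,           # D4 = 3.268 for n = 2
            "moving ranges": moving_ranges,
        }

    def signal_1(values, limits):
        """Signal 1: any point outside of UNPL or LNPL."""
        return any(v > limits["UNPL"] or v < limits["LNPL"] for v in values)

    def signal_5(limits):
        """Signal 5: any moving-range point above the MR limit."""
        return any(mr > limits["MR limit"] for mr in limits["moving ranges"])

Applying these functions to the monthly attributable-defect densities reproduces the
UNPL / LNPL, sigma-zone, and MR-limit lines shown in the control charts; signals 2
through 4 are simple counting checks against the zone boundaries returned above.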
Figure 4.1-6. Cost performance in each of the three project periods. [XmR control
chart: FBCB2 periods I-III contractor test on a single plot; actual cost per month divided
by budgeted cost per month (values roughly 0.85 to 1.3); time is discontinuous between
periods (1998, 2007, 2009).]
What the data appear to indicate is that during this period (when the project was
using the design-based technique to centralize the control of system dynamic behavior),
the project had a consistent new-occurrence rate of defects due to errors in controlling
system dynamic behavior on the order of 2 per month, with no new such errors being
reported in about half of the months.
The next case is project AAAA; this project did not use the design-based
technique. Figure 4.1-8 presents the data for project AAAA.
[Figure 4.1-8. Project AAAA, hypothesis a, without the design-based technique. XmR
control chart: Project AAAA contractor test (Apr-Dec 1999), problem reports
attributable to unplanned dynamic system behavior, opened per month (density per 1M
SLOC). Annotation: none of the signals are present; the data therefore indicate a
controlled process, but the range of variation is large.]
What the data appear to indicate is that this project (which did not use the design-
based technique to centralize the control of system dynamic behavior) had a consistent
new-occurrence rate of defects due to errors in controlling system dynamic behavior on
the order of 23 per month.
The next case is project BBBB; this project did not use the design-based
technique. Figure 4.1-9 presents the data for project BBBB.
What the data appear to indicate is that this project (which did not use the design-
based technique to centralize the control of system dynamic behavior) had a consistent
new-occurrence rate of defects due to errors in controlling system dynamic behavior on
the order of 18 per month.
[Figure 4.1-9. Project BBBB, hypothesis a, without the design-based technique. XmR
control chart: Project BBBB contractor test (Apr-Nov 1996), problem reports
attributable to unplanned dynamic system behavior, opened per month (density per 1M
SLOC). Annotation: none of the signals are present; the data therefore indicate a
controlled process, but the range of variation is large.]
Figure 4.1-10 puts all six of the above hypothesis-a data sets onto a single set of
plot axes. A marked grouping appears: three data sets cluster around low values
(roughly 0 to 4 new attributable reports per month), and three data sets cluster around
higher values (roughly 15 to 27 new attributable reports per month). The three data sets
with the low values correspond to those instances where the design-based technique was
used (FBCB2 period I, FBCB2 period III, and FAAD C2I). The three data sets with the
higher values correspond to those instances where the design-based technique was not
used (FBCB2 period II, project AAAA, project BBBB). This would seem to justify a
conclusion that hypothesis a is correct.
[Figure 4.1-10. All six hypothesis-a data sets on a single plot: density of attributable
problem reports opened per month (per 1M SLOC), by month of the project period.
Legend groups project YYYY period II, project AAAA, and project BBBB as "did not
use the design-based technique", and project YYYY periods I and III and project ZZZZ
as "did use the design-based technique".]
Finally, I consider the remaining six cases with regard to hypothesis a.
The Cheyenne Mountain Complex Processing and Display System – Replacement
(usually abbreviated CCPDS-R) project provided the primary missile-warning capability
for the continental United States. It used a technique similar to portions of the design-
based technique of chapter 2. Development took place primarily in the early 1990's. The
project is by all accounts considered a success: it was completed on-time and on-budget
(even under a fixed-price development contract), caused the customer (U.S. Air Force
Electronic Systems Command) to select the prime contractor as its contractor of the year
in the mid-1990's, and served its intended mission for many years longer than its
intended operational life-time (the next-generation system was about 7 years behind
schedule in being completed). Walker Royce was the chief engineer for the project
through most of its development, and has written extensively about lessons learned from
this experience, e.g., (Royce 1998). His view is that much of CCPDS-R's success[30] is
attributable to its use of a SAS-like construct (which he calls NAS, network architecture
services) and its use of a skilled, specially-trained core team to implement it; through the
use of these two techniques, the project was able to keep the complexity
associated with control of the dynamic behavior of the system away from the majority of
the developers. He also emphasizes the value of the sort of "hard-part-first" prototyping
strategy I described in chapter 2.3, and notes that the SAS-like portion of the system is an
excellent candidate for such early prototyping.
[30] (Royce 1998), appendix D, defines "success" for CCPDS-R as a combination of
on-budget and on-schedule performance, and being satisfactory to the customer (the U.S.
Air Force).
The Advanced Field Artillery Tactical Data System (AFATDS) provides tactical
fire control and tactical fire direction to the U.S. Army and the U.S. Marine Corps. The
AFATDS project started in 1990 without anything similar to the design-based technique
of chapter 2. After the FAAD C2I project (which was performed at around the same time,
and for the same customer) started performing well after a rocky start (see above), the
AFATDS project elected to adopt a technical approach similar to that which appeared to
be helping the FAAD C2I project, i.e., the design-based technique of chapter 2.
According to Art Hawking, lead systems engineer for AFATDS, the adoption of "the
middleware-based design-technique was one of the key technical improvements (along
with a better Ada programming language compiler) made to AFATDS that allowed it to
become a viable development project that eventually was completed and fielded";[31]
prior to the adoption of the design-based technique, the project was on a "path to
failure". In his mind, the key learnings were:
"We needed an 'interpreter / translator' between the Army personnel and the
contractor personnel – neither party understood enough about each other's problem
domain to communicate effectively."
"We discovered a need for certain critical skills in areas that we did not anticipate –
most particularly, software middleware and tactical communications. We really
underestimated how much knowledge they needed to implement those aspects of the
project."
"We ended up sort of re-inventing the integrated product team – we found that we
needed to place software people into the systems engineering team, and then when we
started software development, we brought those systems engineers into the software
development team (as testers, etc.) – trying to keep the knowledge base constant through
the program."
"Once we acquired all the key technologies – a better Ada compiler, a better set of
development tools, and the software middleware – and, later, the corresponding critical
skills – in the areas of tactical communications, the Ada programming language, use of
the middleware, etc. – we were able to make progress."
[31] Interview, 30 January 2011.
According to John Williams, the contractor's project manager, the original $69M
development contract reached a state of being twice the original cost, and more than
twice the original schedule duration[32] (it ended up at more than 3x the original projected
cost). Symptoms of the technical nature of the problems being experienced included
"opening problem reports faster than we were closing them", "unexpected issues in the
timing of the operation of the system" (i.e., unplanned dynamic behavior), "as the load
built up, the system would slow down to a crawl", and "AFATDS took almost 18 years to
build and field to the whole Army" (it was originally planned for 7 years). After the
adoption of portions of the design-based technique of chapter 2 (and other improvements
to project procedures, to be discussed under hypothesis d), the project performed well,
was fielded, and remains in operational use to this day.
[32] "100% over-run, and more than 100% behind schedule", interview, 22 December 2010.
The Theater High-Altitude Area-Defense (usually abbreviated THAAD) radar
project developed the solid-state, phased-array radar intended for use with the THAAD
missile; the project considered herein developed the software for this radar. From its
inception, this project used a portion of the design-based technique of chapter 2. The
project is by all accounts considered a success: it was completed on-time and on-budget,
and continues to serve its intended mission. According to John Dowdee, project
manager,[33] they based their product on a software-infrastructure product called
"UNAS",[34] which implements portions of the design-based technique of chapter 2.
[33] Interview, 17 January 2011.
[34] See (Royce 1998) and http://www.adahome.com/Resources/Tools/Commercial.html
for a brief description of UNAS.
A product called Network Architecture Services (NAS) was originally developed for the
CCPDS-R project, above; Universal Network Architecture Services (UNAS) was a
generalization of NAS offered for sale as a commercial product, and later supported
through the Rational Software Company (and, after their purchase by IBM, supported by
IBM). The customer for the THAAD radar project software was very concerned over the
selection of this SAS-based approach; in particular, they were worried that incorporation
of UNAS into the design might introduce port-to-port timing delays that would be
unacceptable. A radar has fairly stringent restrictions on processing time; each
radar pulse must be processed and passed along to the next step in the processing
"pipeline" before the next pulse arrives from the radar aperture. In this radar, the
processing-time allocation varied between 10 and 20 milliseconds per pulse (depending
on pulse type). Given the capacity of the computers available at the time this project was
designed, this was considered a difficult requirement. Dr. Dowdee stated that ". . . we
didn't have any problems", and the customer eventually came to like having UNAS
within the product: it met the rigorous port-to-port timing allocations (including
variance), while creating a product that was considered more modular, and hence, more
maintainable. This work was done in the early- to mid-1990's; the product is still in use
today. "We were even able to transition maintenance from the development team to a
customer software-maintenance team" (in the late 1990's). The project was noted for its
extremely low turn-over in personnel, attributed by Dr. Dowdee in part to the sense of
success that came from the SAS approach.
The Combat Service Support Control System is the U.S. Army's principal tactical
logistics-automation system. The program started in 1988. According to David Bixler,
the long-time chief engineer on the project, in late 1988, the first time the team built the
prototype software and tried to execute it, the software appeared to "freeze".[35] While the
team was sitting around discussing what could have gone wrong, the menu that they had
requested came up onto the screen – after about 20 minutes! Upon investigating, they
discovered that the system infrastructure permitted multiple mechanisms to implement
concurrent processing (e.g., Ada tasks, Unix processes, X-Windows), and that unintended
interactions between these various mechanisms were causing the incredibly slow
response time. "We had fights between Ada concurrency mechanisms and X-Windows
event handling that caused all sorts of chaos. Using hIPC (what they called their version
of the SAS) with a carefully crafted Ada process template (which unrolled the X event
loop) was the key to fixing the problems. We were making unplanned independent
thread calls to X-Windows from different Ada tasks, which would get interleaved from
the point of view of X-Windows, and it didn't have the ability to de-interleave them
correctly. This resulted in behavior that was not just slow, but caused actual lock-ups in
the system. We used the SAS to sequentialize calls to X-Windows and other system
functions, so that we wouldn't make calls out of order; we did this by using our own
event handler, rather than the standard UNIX one. Without this, interleaving calls from
different windows would result in writing to the wrong window. The root cause was
calling X-Windows and other system utilities from multiple Ada tasks, and the Ada
scheduler was not deterministic enough to do this properly." They had to re-think how
they oversaw the design of concurrency and concurrency-control within the system; to
accomplish this, they adopted the system architecture skeleton as the organizing
mechanism for implementing this new philosophy of controlling system dynamic
behavior. Once they did that, performance and operating capacity for the system were as
anticipated. The team attributes much of the long-term technical success of the system
(the system remains in operational use to this day, recently under the new name "BCS3")
to the adoption of the SAS and portions of the design-based technique of chapter 2.
According to Mr. Bixler, "the SAS was a life-saver".
[35] Interview, 17 January 2011.
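The serialization pattern that Mr. Bixler describes – funneling every call to a non-thread-
safe service through a single event-handling task – can be sketched as follows (a
minimal illustration in Python rather than the project's Ada; the class and method names
are mine, not the project's):

    import queue
    import threading

    class SerializedService:
        """Route all calls to a non-thread-safe subsystem (such as an X server)
        through one dispatcher thread, so that calls issued by many concurrent
        tasks can never interleave at the subsystem."""

        def __init__(self):
            self._requests = queue.Queue()
            threading.Thread(target=self._dispatch, daemon=True).start()

        def _dispatch(self):
            # The only thread that ever touches the subsystem.
            while True:
                call, args = self._requests.get()
                call(*args)   # executed strictly one at a time, in arrival order

        def submit(self, call, *args):
            """Called from any task; the work is queued, never run in place."""
            self._requests.put((call, args))

Any number of application tasks can submit() requests; the dispatcher guarantees that
the subsystem sees them one at a time, and in order – the property the project obtained
by routing X-Windows calls through its own SAS event handler.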
The CCCC project is a classified, large-scale information-processing system. It is
notable for (a) the very demanding ingest rate required of the system – on the order of
1,000,000 instances per second, continuously; (b) the large size of the resulting data base
(many petabytes); and (c) a high rate of queries against the resulting data base.[36] The
success rate for systems of this technical complexity is even lower than the already-low
average success rate for system developments discussed in chapter 1. At the time the
CCCC project was developed (the mid-2000's), it was generally considered to be the
largest and highest-ingest-rate data base system in the world that provided high-quality,
schema-based search / query results. It is generally considered to be a success, fielded
on-time and within the original cost estimate, despite the (at the time) very challenging
and risky technical goals for the system (e.g., the ingest rate). Performance (in terms
of simultaneously meeting the ingest and query rates) was the primary design risk; this
was addressed through the careful incremental development of the system control
structure, using an almost-continuous prototyping / benchmarking cycle, and making use
of a form of the design-based technique of chapter 2.
[36] Data on the CCCC project were provided by Jeff Steiner, during an interview in 2009.
Eurocontrol is the principal air-traffic-control authority for the European Union.[37]
Their current implementation of the overall air-traffic-control management system is
based on the same UNAS product used by the THAAD radar project (above), which
implements portions of the design-based technique of chapter 2. The portion of the
Eurocontrol system that uses UNAS is generally considered to be a success, fielded on-
time and nearly within the original cost estimate, despite the challenging social
environment of building products for the European Union (e.g., the need to coordinate
requirements, work-share, and other factors across the member states). The use of
UNAS, and through it, portions of the design-based technique of chapter 2, has been
credited as a contributing factor to this success.[38]
[37] See http://www.eurocontrol.int/corporate/public/subsite_homepage/index.html for a
description of their mission, etc.
[38] Data on the Eurocontrol project were provided by the late Joan Bebb, in a 1993
interview, and from project documentation.
The FAAD C2I project is described in Appendix B. It started without the use of the
design-based technique (project period I, August 1986 to January 1989). In early 1989, it
adopted the use of the design-based technique, and has continued in that vein to this day
(project period II, January 1989 to the present). During period I, this project experienced
significant cost, schedule, and technical problems, and in fact, was nearly cancelled due
to poor performance. As is described below (chapter 4.3, hypothesis c), just 2 years into
a 7-year contract, it had spent most of its allocated budget, yet was far behind schedule,
and technical progress appeared disappointing. The key corrective action taken was to
introduce the design-based technique of chapter 2. By mid-1989, cost, schedule, and
technical metrics had significantly improved, and the project was allowed to initiate a re-
baselining activity – in essence, it was given "one more chance" to show that it could
deliver. The project then met every milestone (cost and schedule) leading to a 2002
delivery and a successful 2003 operational test, having reached "first unit equipped"
status (i.e., initial fielding) on the 30 September 1993 date that was baselined in early
1989. The project has maintained consistently good performance since that time (as of
this writing, the project continues). A quantifiable assessment of FAAD C2I period-I /
period-II performance, using a different dependent variable, is provided in section 4.3
(hypothesis c).
The above cases show a pattern of success, once the design-based technique was
incorporated (see also hypothesis d, which adds a second criterion for success). In
contrast, recall the data (Glass 2001) cited in chapter 1: only 16% of such large-scale
system developments are considered successes even by their own developers.
Also consider the most recent data from the Standish Group [published
in (Johnson 2006)], which describe results from a 2004 survey: 18% of the projects
failed without delivering anything; 53% were "challenged"; and just 29% succeeded. For
the projects termed "challenged", the average schedule over-run was 84%; the
average cost over-run was 56%; and the average percentage of promised features actually
delivered was 64%. Delivering 64% of the features for 156% of the cost is a cost-efficiency
of around 41%, i.e., productivity per dollar is 41% of that promised at the beginning of
the project.
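As a check on that arithmetic (a minimal sketch; the survey figures are those from
(Johnson 2006) quoted above and in Table 4.1-1 below):

    def cost_efficiency(features_delivered: float, cost_overrun: float) -> float:
        """Fraction of promised functionality obtained per dollar actually spent."""
        return features_delivered / (1.0 + cost_overrun)

    print(cost_efficiency(0.64, 0.56))   # 2004 survey: 0.64 / 1.56, about 0.41
    print(cost_efficiency(0.67, 0.43))   # 2002 survey: 0.67 / 1.43, about 0.47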
The same report also contained 2002 survey data, which were essentially the same
as the 2004 results noted above, showing some stability in the findings:

                                      2004 survey results    2002 survey results
  average schedule over-run                  84%                    82%
  average cost over-run                      56%                    43%
  average percent of features delivered      64%                    67%

Table 4.1-1. Comparison of 2002 and 2004 Standish Group survey findings.
Note that according to U.S. public law, a cost over-run of 20% on a Department
of Defense system development contract constitutes a major breach of the baseline, and is
grounds for automatic termination of the offending project; under such circumstances, a
termination can only be avoided by a series of actions that include legal certification by
the Secretary of Defense that there is no viable alternative to the planned project. Yet the
average over-runs reported in the Standish Group surveys (56% and 43%) were more than
twice as high as what legally constitutes failure under U.S. law! And this does not even
account for the fact that these projects delivered far less than their promised
functionality; i.e., their actual cost over-run per unit of functionality delivered is much
higher than indicated just by looking at the percent over-run.
Given data that describe such a norm for system-development performance, the
consistent pattern of success across these cases (and such success only after the design-
based technique was incorporated) is striking. These cases support the conclusion,
derived from the first four cases, that hypothesis a is correct.
4.2 Analysis of hypothesis b.
Hypothesis b is "During the development phase of a large-scale, complex
computer-based system, the use of a design-based technique that centralizes the control
of the dynamic behavior of a system will produce better reliability for key system
capabilities". This will be examined for the FBCB2 and FAAD C2I cases.
Reliability for software-intensive systems displays unusual behavior across the
industry: in (Siegel 1991), data are provided from various benchmarks showing that the
range of reliability in comparable software-intensive systems can vary by a factor of
1,000. These data draw from comparisons of systems that perform similar functions and
were created around the same time . . . yet the measured reliability of some systems was
found to be more than 1,000 times better than that of comparable systems. This is a
significant range of outcomes, and it motivates looking for approaches that might be able
to create outcomes at the "good" end of that range more consistently.
Figure 3.3-2, above, showed how the system reliability for the systems under
consideration is decomposed into three broad categories: reliability expected of the
hardware, reliability expected of the software, and reliability expected in the presence of
operator-induced errors. This decomposition was similar on both projects that I consider
for hypothesis b: the customer specified the system-level reliability requirement; the
prime contractor then decided what levels of reliability it would strive to achieve in each
of the three above categories, subject to the constraint that these allocations had to
combine so as to reach the system-level reliability requirement. Achievement of the
contractor-selected levels of reliability expected of the hardware and of the software was
verified through actual testing. Achievement of the contractor-selected level of reliability
in the presence of operator-induced errors was verified by reviewing the proposed human
interactions with the system through a process called a "user jury"; one of the outputs of
this process was an estimate of the frequency and impacts of operator errors, which
allowed for an estimate of the reliability in the presence of operator-induced errors. In
one case, the actual system-level reliability achieved was measured; in the other, it was
predicted based on the test results at the allocated level.
As illustrated in Figure 3.3-2, the use of the design-based technique of chapter 2
did not have a significant impact on either hardware-related reliability or reliability in the
presence of operator-induced errors. Because, for both FBCB2 and FAAD C2I, the
majority of interactions are mediated by software (even interactions between hardware
components are in general so mediated), software-related reliability is the key point of
leverage for achieving the system reliability target. This conclusion was reinforced by
cost considerations: analysis during the design phase showed that the most economical
approach to achieving the system reliability target was to target the hardware to have
reliability only slightly better than the target for the system (about 10% higher), while
imposing a very high reliability requirement on the software (about 100x the level
required of the entire system), and also imposing a very high reliability requirement in
the presence of operator-induced errors (also about 100x the level required of the entire
system).
During the initial 6 or 7 years of the FAAD C2I project, the project conducted an
explicit task to predict software reliability, using Department of Defense reliability-
prediction methods.[39] These methods were intended for predicting hardware reliability,
and were adapted by the project for use with software. During both the integration and
test phases of the development effort, the project kept detailed logs of operating hours
and off-nominal events. These events were scored, allowing the development of
measured reliability. The predictive tools were calibrated against these measurements.
[39] Derived from U.S. Army document AR 702-3, "Product Assurance: Army Materiel
Reliability, Availability, and Maintainability (RAM)".
Figure 4.2-1 provides the actual system-level and allocated levels of MTBF[40] for one
portion of FAAD C2I. The expressions R = e^(-t/MTBF) and R_sys = R_1 * R_2 * R_3
(valid only when all three elements must be available in order for the system to be
considered available) used in the figure are taken from "Design for Reliability", a lecture
by Scott Jackson, USC, 2009.
[40] Mean-time-between-failure, the key metric used for reliability on both projects.

Figure 4.2-1. Allocation of the FAAD C2I system MTBF target to the 3 categories.
  desired system MTBF = 1,070 hours
  allocated hardware MTBF = 1,100 hours
  allocated software MTBF = 100,000 hours
  allocated operator-induced MTBF = 100,000 hours
  let t = 24 (the value of t is immaterial to the calculated system-level MTBF)
  R(desired system) = e^(-t/MTBF) = 0.97781978
  R(hardware) = e^(-t/MTBF) = 0.9784
  R(software) = e^(-t/MTBF) = 0.9998
  R(operator-induced) = e^(-t/MTBF) = 0.9998
  R(system) = R(hardware) * R(software) * R(operator-induced) = 0.98
  calculated system MTBF = -t / ln(R(system)) = 1,076.32 hours

As can be seen in the figure above, the desired system MTBF was 1,070 hours,
and the allocations were 1,100 hours MTBF to the hardware, 100,000 hours to the
software, and 100,000 hours to operator-induced errors that cause system failure. These
combine to achieve a system-level MTBF of 1,076 hours, just above the 1,070-hour
target.
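The reliability-allocation arithmetic of Figure 4.2-1 can be expressed as a short sketch
(in Python; the function names are mine):

    import math

    def reliability(mtbf_hours: float, t_hours: float = 24.0) -> float:
        """Reliability over a mission time t, assuming an exponential failure model."""
        return math.exp(-t_hours / mtbf_hours)

    def system_mtbf(allocated_mtbfs, t_hours: float = 24.0) -> float:
        """Combine series allocations (all elements must be up) into a system MTBF:
        R_sys = R_1 * R_2 * ...; MTBF_sys = -t / ln(R_sys).
        The value of t cancels out, so any positive t gives the same answer."""
        r_sys = math.prod(reliability(m, t_hours) for m in allocated_mtbfs)
        return -t_hours / math.log(r_sys)

    # FAAD C2I allocations from Figure 4.2-1: hardware, software, operator-induced.
    print(round(system_mtbf([1_100, 100_000, 100_000])))   # about 1,076 hours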
During period I of this project (non-use of the design-based technique), the
software reliability measures were far below the required level – in fact, initial
measurements put achieved software MTBF levels in the tens-of-hours range, far below
the 100,000-hour allocated requirement.
This was still in the development phase of the project. It is normal to see
progressive improvement in the reliability of a software product as it progresses through
its development and test cycle. For example, Northrop Grumman studies[41] predict that
latent software defects decrease at a rate of about 30% per development phase (e.g.,
design, coding, integration, system test, deployment, operational usage); if one assumes
that MTBF improves roughly in inverse proportion to the remaining level of latent
defects, one can predict an approximate rate at which MTBF should improve over the
course of development and use. It was clear, however, that maturation at this "normal"
rate was still going to fall far short of achieving the required level of reliability – by
about a factor of ten. Something that would introduce a "step-function" improvement in
software reliability was required. This led to a decision to introduce the design-based
SAS technique of chapter 2 as a mechanism to achieve this significant improvement in
software reliability (and, in addition, to "cut off" the tail of the variance being
experienced in a key port-to-port timing requirement – see hypothesis c, below – and for
other reasons, such as improving system adaptability). Once the design-based technique
was incorporated (early 1989), this became project period II (use of the design-based
technique).
[41] See, for example, Northrop Grumman Systems Engineering Handbook, CTM-101,
"Tenets of program success", 2008, and also "Defect Profiler v1.1 Overview", 2009.
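A minimal sketch of that maturation arithmetic (assuming, per the studies cited above, a
~30% reduction in latent defects per phase, MTBF inversely proportional to remaining
latent defects, and – my assumption for illustration – roughly three development phases
remaining at the time of the period-I measurement):

    def predicted_mtbf(measured_mtbf: float, remaining_phases: int,
                       defect_reduction_per_phase: float = 0.30) -> float:
        """Project MTBF growth under "normal" maturation: latent defects shrink
        by about 30% per remaining phase, and MTBF is assumed to improve in
        inverse proportion to the remaining latent defects."""
        remaining_fraction = (1.0 - defect_reduction_per_phase) ** remaining_phases
        return measured_mtbf / remaining_fraction

    # A measured software MTBF in the tens of hours (e.g., 45 hours) matures to
    # only about 131 hours over three further phases -- nowhere near the
    # 100,000-hour allocation, hence the need for a step-function improvement.
    print(round(predicted_mtbf(45, 3)))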
This allows a "before and after" comparison (i.e., project period I versus project
period II) of software MTBF for FAAD C2I: in the nomenclature of (Campbell & Stanley
1963), a "one-group pretest-posttest design". On the FAAD C2I project, software
reliability analyses and test results were documented in chapter 3.7.7.11 of the FAAD C2I
System / Subsystem Design Notebook (a sort of running log of all technical analyses on the
project). Figure 4.2-2 summarizes those data. As can be seen, the measured level of
software MTBF, even when predicted to improve via "natural" maturation over subsequent
project phases during project period I, was going to result in a system-level MTBF far below
the required level (a predicted level of just over 120 hours, versus a requirement of
1,070 hours). The other two categories of MTBF (hardware-related and operator-induced-
error-related) remained fairly constant over the measurement cycle; measured hardware
MTBF during period II was modestly higher than the initial allocation, but it is clear from the
figure that the big "swinger" that led to being able to meet the system-level MTBF was the
significant increase in software MTBF achieved after the introduction of the design-based
technique of chapter 2.
Figure 4.2-2. FAAD C2I allocated and predicted / achieved MTBFs.

                                          allocated    period I     period I     period II
                                                       measured     predicted    measured
  desired system MTBF                     1,070        1,070        1,070        1,070
  allocated hardware MTBF                 1,100        1,100        1,100        1,300
  allocated software MTBF                 100,000      45           135          107,000
  allocated operator-induced MTBF         100,000      100,000      100,000      100,000
  t                                       24           24           24           24
  R(desired system) = e^(-t/MTBF)         0.97781978   0.97781978   0.97781978   0.97781978
  R(hardware) = e^(-t/MTBF)               0.9784       0.9784       0.9784       0.9817
  R(software) = e^(-t/MTBF)               0.9998       0.5866       0.8371       0.9998
  R(operator-induced) = e^(-t/MTBF)       0.9998       0.9998       0.9998       0.9998
  R(system) = product of the above 3      0.98         0.57         0.82         0.98
  calculated system MTBF = -t / ln(R)     1,076.32     43.21        120.10       1,268.11
  (the value of t is immaterial to the calculated system-level MTBF)

The system-level MTBF[42] target for FBCB2 was 700 hours (threshold) and 910
hours (objective).[43] An allocation similar to that of FAAD C2I was derived; i.e., it was
determined that the most cost-effective manner in which to reach the system-level MTBF
was to get the hardware MTBF only modestly above the required system-level MTBF,
and to reach the system-level MTBF by having the software MTBF and the operator-
induced MTBF be much higher than the system-level MTBF.
[42] Technically, the FBCB2 requirement was expressed as a MTBEF, mean-time-between-
essential-failures.
[43] From "FBCB2 system reliability", Mike Gogue, March 2002. Threshold constituted
the minimum value that would allow the system to be considered suitable for operational
use, while objective defined the actual desired value.
During the initial few years of the FBCB2 project, system-level and software
MTBF were monitored through an extensive measurement activity. Large numbers of
the FBCB2 developmental hardware units were placed in environmental chambers (which
could modulate the temperature, for example, over the extreme ranges required in the
specification). The software was loaded onto this hardware; the units were interconnected
as they would be in a real system deployment, connected to a system stimulator that
injected messages and simulated external inputs, and subjected to ranges of operator
actions. This configuration would be operated for up to 30 days, to create the opportunity
for many types of errors to be induced: operating hours over environmental ranges to
drive hardware errors, large-scale system loads and network configurations to drive
software errors, long continuous operating periods to drive other types of software errors
(e.g., memory leaks), and large numbers of simultaneous operator actions both to create
operator-induced errors and to create unplanned combinations of concurrency within the
system. This customer did not want to depend upon predictive modeling for reliability
analysis; they wanted instead to depend on actual measurements – hence their willingness
to pay for the (relatively expensive) test activities described above. Also, the customer
desired to complete and deploy the FBCB2 product relatively quickly, and this test-based
approach was considered to carry more credibility with the field Army users than would
predictive models.
During these tests, all off-nominal behavior was logged by test observations and
instrumentation, and subjected to a formal test-scoring activity that identified the source
of the off-nominal behavior, and many other items of test data. This test activity
therefore created a series of data points for comparable test results over a series of time-
ordered events. Because of the duration and scope of these tests, each data point had
sample sizes that assured statistical significance.
Recall that the FBCB2 project used the design-based technique of chapter 2 from
its inception. In contrast with the results from the FAAD C2I project (which started
without the design-based technique), software reliability was reasonably close to the
established goal almost from the very beginning, and those failures that were observed
tended to be isolatable to a small number of causes that were amenable to correction
within localized portions of the software design and implementation. In contrast,
hardware reliability was very disappointing in the early portion of the project, but these
problems also turned out to be isolatable to a small number of causes that were amenable
to correction within localized portions of the design and implementation. For example,
we determined that one small circuit within the power-supply board of the main
computer was a cause of more than half of the failures across one test event; this was
corrected via a localized design change. We found that heat build-up within one portion
of the computer caused another large portion of the hardware errors, and that
manufacturing-process inconsistencies in laminating a capacitive membrane to glass
(this was one of history's first touch-screen computers) were causing much of the
remaining hardware errors. These were all correctable via localized changes; none
required any real changes to the system-level design. This was a very different outcome
than on FAAD C2I period I, for example, where system-level changes of a material
nature were determined to be necessary to reach the needed levels of reliability.
During period II, FBCB2 dropped the use of the design-based technique from its
approach. Early versions of the period-II FBCB2 software showed reliability
performance far worse than that of period I, in addition to showing a very high rate of
errors (see hypothesis a, above) and, for some key functions, performance that was much
slower than the period-I FBCB2 software. These observations eventually led to a
decision to re-introduce the design-based technique into FBCB2 (period III). Reliability
then rapidly climbed to a far better plateau, and then started a slow rate of improvement
similar to that predicted by the Northrop Grumman documents cited above (30% per
development phase). This is depicted in Figure 4.2-3; note that the vertical axis of this
figure uses a logarithmic scale; the drop in MTBF was so dramatic during period II that
using a linear scale for the Y-axis would have resulted in those values appearing as
essentially zero.
Figure 4.2-3. FBCB2 measured software MTBF. [XmR control chart, periods I through
III; measured software MTBF by month, on a logarithmic vertical axis spanning 1 to
1,000,000 hours.]
In both cases examined in this section (FAAD C2I and FBCB2), the
dependent variable – software reliability – moved in synchronicity with the independent
variable (use or non-use of the design-based technique of chapter 2), moved by material
amounts, and moved in the direction predicted by the hypothesis.
4.3 Analysis of hypothesis c.
Hypothesis c is "During the development phase of a large-scale, complex
computer-based system, the use of a design-based technique that centralizes the control
of the dynamic behavior of a system will reduce the variance for critical port-to-port
timing relationships". I assessed hypothesis c using data from the FAAD C2I project.
Figure 4.3-1. On-the-move FAAD C2I fire unit. [Image.]
Figure 4.3-2. FAAD C2I end-to-end timing thread. [Block diagram: a radar, connected
by wire to the FAAD C2I sensor node; the sensor node connected by radio to the FAAD
C2I air battle management operations center (ABMOC); the ABMOC connected by
radio to the FAAD C2I battery command post; the battery command post connected by
radio to the FAAD C2I weapon (the on-the-move fire unit).]
FAAD C2I is designed to engage moving aircraft and helicopters, and to do so
while the fire units (i.e., the weapons platforms; see Figure 4.3-1) themselves are
moving. The key operational requirement is that when the gunner in
the moving fire unit selects a target to engage, the turret on the fire unit slews and
elevates to the right location – and 90% of the time, the correct target must be within the
narrow field-of-view of the gunner's optical sight, allowing auto-tracking to commence,
and a visual identification of the target (to ensure that it is a hostile unit, rather than a
friendly unit) to be performed. In order to achieve this 90% level of performance, the
principal technical measure that we employed throughout the design process was the
creation and management of an end-to-end error budget, which consisted of all known
possible sources of error along the entire processing thread, from the beginning (tracking
by the radar) to the end (appearance in the narrow field-of-view of the gunner's optical
sight). These errors include factors such as errors in the known position and orientation
of the radar(s), measurement errors for the radar, and so on, through the entire sequence
of events leading up to the mechanical play in the "slew-to-cue" mechanism. Since both
the target and the fire unit are moving, variance in the timing across the end-to-end thread
contributes to this error; in fact, it turns out that such timing variances are the principal
source of error. During the design process, we decomposed the nominal end-to-end time
into budgets (and allowed variances) for each step. A key element of the design process
was verifying that each step stayed within its allocated budget and variance.
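A minimal sketch of this kind of budget roll-up and check (the step names and numbers
below are illustrative only, not the project's actual allocations; for independent steps, the
means add, and the variances add):

    import math

    # Hypothetical per-step timing allocations: (mean ms, standard deviation ms).
    steps = {
        "radar track report":        (15.0, 4.0),
        "sensor-node correlation":   (20.0, 6.0),
        "ABMOC distribution":        (15.0, 5.0),
        "battery CP distribution":   (15.0, 5.0),
        "fire-unit engageability":   (15.0, 5.0),
    }

    def thread_budget(steps):
        """Roll up per-step allocations: means add; for independent steps the
        variances add, so standard deviations combine root-sum-square."""
        total_mean = sum(m for m, s in steps.values())
        total_sigma = math.sqrt(sum(s * s for m, s in steps.values()))
        return total_mean, total_sigma

    mean_ms, sigma_ms = thread_budget(steps)
    print(f"end-to-end: {mean_ms:.0f} ms mean, {sigma_ms:.1f} ms sigma")
    # Verification then checks each measured step against its own (mean, sigma)
    # allocation, and the measured end-to-end thread against this roll-up.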
Figure 4.3-2 depicts this end-to-end thread (a complete OV-2 is provided at
Figure B-03, in appendix B). The radar reports a track (after having previously gone
through search and track-initiate processing); the FAAD C2I sensor node correlates that
track with tracks reported by other radars, and decides which is the highest-accuracy
source for that particular object; the correlated track is reported to the air battle
management operations center (ABMOC), which distributes it to each of the battery
command posts (nominally, there are 3 of these per ABMOC); if the track meets local
priorities, it is distributed to each of the FAAD C2I weapons controlled from that battery
command post (nominally, there are 24 of these per battery command post). Decisions as
to which targets are engageable under the current rules of engagement (these rules of
engagement are set, after coordination with the Air Force, at the ABMOC, and are
distributed to the battery command posts and to the fire units) are made locally at each
fire unit. The gunner at the fire unit (this terminology is still used, despite the fact that
the weapon is now a missile, and no longer a gun) makes the decision to engage a target
deemed engageable, and this initiates the "slew-to-cue" process described above.
I became chief engineer for this project in January of 1989. The program was not
going well; just over 2 years into a 7-year contract, it had spent most of its allocated
budget, yet was far behind schedule, and technical progress appeared disappointing. The
team made measurements across the critical thread described above. Not all of the actual
tactical radios were yet available, so predictions from the analytic system simulation were
used for the radio time-lags and variances, but actual measurements were made of the
software port-to-port timing elements of the thread. The results were unacceptable; the
average was close to what was required, but the distribution had a "long tail" that was
going to prevent us from meeting the "90% in the narrow field-of-view of the optic"
requirement mentioned above. In response to this (along with other problems uncovered
in the software and system design), we initiated a re-design of the software employing the
system architecture skeleton methodology (i.e., the design-based technique of chapter 2).
In March of 1990, the team took the same measurements on the re-designed software.
The results of these two sets of measurements (with the radio-induced delays removed
from the calculations, so that only the software-induced delays are represented) are
depicted in Figure 4.3-3.
Figure 4.3-3. Distribution of port-to-port timing, without SAS and with SAS.
[Histogram: number of instances versus port-to-port time in ms (0 to 250 ms), from a
100-second high-load test, overlaying the "without SAS" and "with SAS" distributions.]
The target mean for this parameter was 80 ms. For the "WITHOUT SAS" case,
the mean was 104.6 ms, with a standard deviation of 27 ms, and 1,476 samples were at
160 ms or greater (i.e., twice the desired mean). For the "WITH SAS" case, the mean
was reduced to 85.7 ms, the standard deviation to 24.9 ms, and only 15 samples were at
160 ms or greater.
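A small check on those tail fractions (the sample counts are the measurements quoted
above; the total sample count is not stated explicitly, and is inferred here, as an
approximation, from the "more than 15%" statement in the next paragraph):

    # Roughly 9,800 samples in the 100-second high-load test (inferred:
    # 1,476 samples is "more than 15%" of the total).
    TOTAL_SAMPLES = 9_800

    for label, over_160_ms in [("without SAS", 1_476), ("with SAS", 15)]:
        fraction = over_160_ms / TOTAL_SAMPLES
        print(f"{label}: {fraction:.2%} of samples exceeded 160 ms (2x the 80 ms target)")

    # About 15.1% versus about 0.15% -- a reduction in the tail on the order of
    # a factor of 100, consistent with "fewer than 2/10 of 1%".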
As can be seen, in both the "without SAS" and "with SAS" states, we were close to
our goal of an 80 ms average port-to-port timing delay (and the program had 2 more
years of refinement and performance-tuning to go before initial fielding, so further
modest improvements could be expected), but the tails of the two distributions differ
quite a bit. The "long tail" on the "without SAS" design meant that more than 15% of the
samples were greater than 160 ms, i.e., nearly twice the desired timing. In contrast, with
the SAS incorporated, fewer than 2/10 of 1% of the samples exceeded 160 ms – a
decrease on the order of a factor of 100. The "long tail" on the "without SAS" design
would have caused us to be significantly below the "90% in the narrow field-of-view of
the optic" requirement; with the SAS incorporated, we achieved the 90% requirement
with a moderate design margin.
4.4 Analysis of hypothesis d.
Hypothesis d is "During the development phase of a large-scale, complex
computer-based system, the use of a design-based technique that centralizes the control
of the dynamic behavior of a system is less effective if the team implementing the
centralization lacks critical skills". The independent variable is therefore the use or non-
use of techniques for acquiring and applying critical skills to the project. The dependent
variable is a scalar metric of project performance, created from a subjective combination
of (principally) cost/schedule performance, award-fee scores, and qualitative statements
of contractor performance provided by the customers through various mechanisms.[44]
[44] See chapter 3.3e, above, and in particular, Figure 3.3-5.
In particular, what was examined is the use / non-use of techniques for acquiring
and applying critical skills to the project, in combination with use / non-use of the
design-based technique. The goal is to determine whether the use of the design-based
technique is sufficient on its own to improve outcomes, or whether this improvement is
achieved only when attention is also paid to acquiring and applying related critical skills.
The Ada programming language was developed by the U.S. Department of
Defense in the 1980's. By the late 1980's, the Department was expecting contractors to be
proficient enough to start utilizing the new programming language in major DoD
systems. According to (Siegel et al 1993), the particular company that performed most of
the projects considered in the analysis of hypothesis d (TRW) made a deliberate decision
to attempt to become the premier developer of Ada software for the Department of
Defense. It aimed to accomplish this via two steps: (a) building institutional expertise
in exploiting the features of the new programming language that were explicitly intended
to solve the problems perceived by the Department as afflicting large-system
development efforts, and (b) large-scale training of personnel (programmers and their
managers), to enable them to perform as a knowledgeable, confident team, at sufficient
scale to perform on several such large projects simultaneously.
With respect to step (a), the relevant achievements include:
Development of a “process model”: a set of techniques, intermediate products, standards, and procedures for software engineering in the new language [45].
UNAS (described above) and hIPC [46], two run-time executive-control middleware products that implemented a message-based architecture for distributed systems (a schematic sketch of this style of executive appears below).
Dedicated, multi-year funding for a series of research activities to build the above, but that had the explicit additional objective of building a cadre of expert practitioners in the new language and process model. These people became the “core teams” that implemented the SAS on the projects considered in this hypothesis.
In today’s terminology, we would term these items the artifacts that would support the use of a software product line [47].
[45] See (TRW 1989). (Boehm & Royce 1989) adapts COCOMO coefficients to this Ada process model.
[46] Heterogeneous InterProcess Communications, abbreviated “hIPC” with a small “h”. See (Bixler 1988) for additional information.
[47] See (Clements & Northrop 2001) for additional information regarding software product lines.
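As an illustration of the architectural idea (not of the actual UNAS or hIPC interfaces), the following is a minimal Python sketch of a centralized, message-based run-time executive; the message names, handlers, and priority policy are hypothetical:

    # Sketch: a single dispatch point through which all inter-task messages flow,
    # so the system's dynamic behavior is decided in one place rather than being
    # scattered across the application tasks. Hypothetical illustration only.
    import heapq
    import itertools

    class CentralExecutive:
        def __init__(self):
            self._queue = []               # priority queue of pending messages
            self._seq = itertools.count()  # tie-breaker preserves arrival order
            self._handlers = {}            # message type -> list of handlers

        def subscribe(self, msg_type, handler):
            self._handlers.setdefault(msg_type, []).append(handler)

        def publish(self, msg_type, payload, priority=10):
            # Lower number = higher priority. The executive, not the sender,
            # decides when the message is actually processed.
            heapq.heappush(self._queue, (priority, next(self._seq), msg_type, payload))

        def run(self):
            while self._queue:
                _, _, msg_type, payload = heapq.heappop(self._queue)
                for handler in self._handlers.get(msg_type, []):
                    handler(payload)

    # Hypothetical usage: a track update must pre-empt routine status traffic.
    exe = CentralExecutive()
    exe.subscribe("track_update", lambda p: print("engagement logic sees", p))
    exe.subscribe("status", lambda p: print("logging", p))
    exe.publish("status", {"node": 7, "ok": True}, priority=20)
    exe.publish("track_update", {"id": 42, "range_km": 5.8}, priority=1)
    exe.run()  # dispatches the track update first, although it arrived second

The essential point is that every message passes through one dispatch point, so the system’s dynamic behavior is controlled centrally rather than being scattered across the application tasks.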
With respect to step (b), the relevant achievements include:
Creation, in conjunction with a local university, of a set of four formal
training courses to train employees (and their managers!) in the new Ada
language, and the process model construct that went with it. These were
university-level courses (with required lab work) that actually provided credit
for graduate degrees. Between May 1988 and December 1990, out of an
eligible population of around 1,500 employees, more than 1,175 received such
training.
Funding the creation and delivery of the four courses. About 60% of the
funding was obtained through a grant from the State of California.
By the end of 1990, the company had a pool of trained Ada practitioners, a
modest set of true “gurus” that could form the core teams discussed in chapter 2.3, and
even a set of managers who were versed in the principles involved. The company aimed
to ensure effective application of the new learnings by co-locating most of the
development projects that were to apply these new techniques in two closely-spaced
locations, by mandating the designation of a chief engineer for each project who had to
have completed the courses, and by ensuring that each new project had access
(eventually, changed to dedicated access) to some of the “gurus”. Over time, of course,
they also attempted to grow additional such “gurus”.
In toto, this represented a material and novel commitment to ensuring project
access to critical skills. The company attempted to identify in advance of need a set of
projected critical skills that it knew that it did not already possess. It created (and
funded!) a credible approach to deliver those skills to a set of employees large enough to
perform on the business base desired. It invested additional funds and key personnel in a
multi-year effort to grow a modest number of true experts, who could fulfill key technical
leadership roles, and provide project-specific “on the job” training to the main body of
practitioners. And it institutionalized practices intended to ensure actual use of the new
skills on the actual projects. This investment was intended to have both the direct benefit of making the identified critical skills available, and the indirect benefit of having the employees understand that the company was willing to invest in them and their future.
This did not happen uniformly and completely all at once; for example, some
projects started before all of the above had been created and absorbed into the
institutional methodology. Also, some variability in application of the new learnings
appeared: some projects placed more initial emphasis on training and acquisition of
“gurus” than did other projects; some projects placed more initial emphasis on the
process model (which, over time, evolved into the design-based technique of chapter 2)
than did other projects, and so forth. This disparity in adoption allows for an assessment
of the efficacy of the various combinations that were experienced, e.g.:
Applied the design-based technique of chapter 2, and also emphasized the
acquisition and placement of critical skills.
Applied the design-based technique of chapter 2, and did not (or did not at
first) also emphasize the acquisition and placement of critical skills.
Emphasized the acquisition and placement of critical skills, but did not (or did
not at first) also apply the design-based technique of chapter 2.
Six projects are considered in this assessment, five of which were performed by
that portion of TRW that implemented the steps (a) and (b) described above; the sixth
project was performed by a different company, but one that became aware over time of what TRW was doing, and gradually adopted its own version of both steps (a) and (b).
All six projects eventually came to be considered successful; it would appear
therefore that the requirements for none of these projects were fundamentally
misconceived, or incorporated impossible objectives. Five of the six are still in
operational use today, and the one that now is retired was used for far longer than its
projected service-life. These characteristics make the side-by-side comparison more
reasonable.
Table 4.4-1 provides the scoring. The 20 horizontal cells for each project in the
table represent scoring periods, each of approximately 6-month duration. The projects
did not all start at the same time, so the time-progression is periods-after-project-
commencement. The value in each cell is the scalar metric defined in Figure 3.3-5; a
higher value is better.
Figure 4.4-1 provides the same data in graphical form. The main portion of the
Figure is the 6 line graphs, one for each project. The X-axis for each graph is the
progression of time over the course of the project; the graphs are aligned to years-into-
the-project, rather than calendar years. The Y-axis for each graph is the scalar metric of
project performance from Table 4.4-1.
As can be seen, all six projects were started around the same time-frame, all used
the Ada programming language as their main software language, and all had significant
quantities of newly-developed custom software. Three separate application problem
domains are represented; no difference in the behavior of the dependent variable appears
to be keyed to the application problem domain. Similarly, other factors, such as
contractual role (e.g., prime contractor or subcontractor) or contract type (e.g., cost-plus-award-fee, firm-fixed-price, etc.) appear to have no effect on the behavior of the dependent variable.

Table 4.4‐1. Scoring for hypothesis d. (Entries are the score for the current period, over 20 successive periods; a higher value is better.)
FAAD C2I      1  1  1  1  7  8  9 10 10 10 10 10 10 10 10 10 10 10 10 10
CSSCS         1  1  1  7  8  9  9  9 10 10 10 10 10 10 10 10 10 10 10 10
AFATDS        1  1  1  1  3  3  3  3  3  5  6  7  7  8  8  9 10 10 10 10
FBCB2         7  8  9 10 10 10 10 10 10 10  2  2  2  7  8  9 10 10 10 10
CCPDS‐R       7  8  9 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10
THAAD radar   7  8  9  9  9 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10
FAAD C2I benefited from the beginning from the organization’s investment in the development and deployment of critical skills. Despite this investment, project performance was not satisfactory until the design-based technique also was adopted. Project performance thereafter improved rapidly, and has remained high.
CSSCS has a trend-history similar to that of the FAAD C2I project; it benefited from the beginning from the organization’s investment in the development and deployment of critical skills, but despite this investment, project performance was not satisfactory until the design-based technique also was adopted. Project performance thereafter improved rapidly, and has remained high.
AFATDS started with neither a focus on critical skills application, nor the use of
the design-based technique. As discussed in chapter 4.1, above, the project adopted a
form of the design-based technique, but experienced only a modest improvement in
project performance. Later, the project made an incremental commitment to critical skills
acquisition and management (but of smaller scale than that which benefitted the other
five projects discussed herein), and thereafter experienced a slow-but-steady
improvement in project performance. Once achieved, project performance has remained
high. This contract was the one out of these six performed by a company other than
TRW (Magnavox).
FBCB2 made use from the commencement of the project of both the design-based
technique and the application of critical skills. It performed well until a management
decision was made to discontinue the use of the design-based technique (see the
discussion in chapters 3.2.1 and 4.1, above), at which time project performance
deteriorated rapidly to a low level. After a while, management re-instituted the use of the
design-based technique; thereafter, project performance improved rapidly, and remained
high until the end of the contract.
CCPDS-R made use from the commencement of the project of both the design-
based technique and the application of critical skills. It performed well for its entire
duration, and appears in this light as something of a model project.
The THAAD radar software project also made use from the commencement of the
project of both the design-based technique and the application of critical skills. It
performed well for its entire duration, and therefore also appears in this light as
something of a model project.
It appears from Figure 4.4-1 that the significant step-function upwards in project
performance does not occur until both the design-based technique and application of
critical skills have been incorporated into the project’s approach.
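That observation can be checked mechanically. The following minimal Python sketch scans a score series from Table 4.4-1 for the first period-to-period jump of step-function size; the jump threshold is a hypothetical analysis choice, not part of the study’s method:

    # Locate the first step-function improvement in a per-period score series.
    def find_step(scores, jump=4):
        """Return the first period index where the score rises by >= `jump`."""
        for i in range(1, len(scores)):
            if scores[i] - scores[i - 1] >= jump:
                return i
        return None

    # Score series transcribed from Table 4.4-1:
    faad   = [1,1,1,1,7,8,9,10,10,10,10,10,10,10,10,10,10,10,10,10]
    afatds = [1,1,1,1,3,3,3,3,3,5,6,7,7,8,8,9,10,10,10,10]

    print(find_step(faad))    # 4 -- the period of adoption of the technique
    print(find_step(afatds))  # None -- slow-but-steady, not step-function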
A sensitivity analysis has also been performed around the selection of the
weightings for the scalar metric. Table 4.4-2 repeats the outcomes with the baseline
weightings (as defined in Figure 3.3-5), and also includes two excursions: excursion #1
with 75% of the score based on the cost/schedule data (as opposed to 2/3 of the score in
the baseline), and excursion #2 with 100% of the score based on the cost/schedule data.
As can be seen, there is practically no difference between the baseline and excursion #1.
The differences between the baseline and excursion #2 are still modest in absolute value,
and non-existent in terms of “shape” and trend; it seems reasonable to conclude that the
conclusions regarding hypothesis d are not dependent in a strong way on the particular
weighting values selected to create the scalar metric.
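The following minimal Python sketch shows the weighting arithmetic behind the baseline and the two excursions; the two component scores are hypothetical, while the weights are those named above:

    # Combine 0-10 component scores into the scalar metric under a weighting.
    def scalar_metric(cost_schedule, qualitative, w_cs):
        return round(w_cs * cost_schedule + (1 - w_cs) * qualitative)

    cs, qual = 9, 6  # hypothetical component scores for one period
    for label, w in [("baseline (2/3 cost/schedule)", 2 / 3),
                     ("excursion #1 (3/4 cost/schedule)", 3 / 4),
                     ("excursion #2 (100% cost/schedule)", 1.0)]:
        print(f"{label}: {scalar_metric(cs, qual, w)}")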
Figure 4.4‐1. Data for hypothesis d. [The original figure presents six line graphs, one per project, plotting the qualitative assessment of project effectiveness over time (the per-period scores of Table 4.4-1); the project attributes, graph annotations, and notes recoverable from the figure are reproduced below.]

Project name | started | application problem domain | principal programming language | size (kSLOCs) | prime contractor or subcontractor | contract type (c) | comments
FAAD C2I | 1986 | military C2 | Ada | 500 | prime contractor | CPIF | Step‐function improvement in project performance upon adoption of design‐based technique. The improvement has endured.
CSSCS | 1988 | logistics automation | Ada | 300 | prime contractor | CPAF | Step‐function improvement in project performance upon adoption of design‐based technique. The improvement has endured.
AFATDS | 1990 (a) | military C2 | Ada | 1,100 | prime contractor | CPIF | Took the longest to become a success; adopted design‐based technique without emphasis on critical skills and found that ineffective.
FBCB2 | 1995 | military C2 | Ada | 1,000 | prime contractor | CPIF | An inadvertent experiment in the efficacy of the design‐based technique!
CCPDS‐R | 1987 | military C2 | Ada | 350 | prime contractor | FFP | A model project ‐‐ used a form of the design‐based technique from its inception, and also emphasized the acquisition and use of critical skills.
THAAD radar | 1993 (b) | radar | Ada | 250 | subcontractor | CPAF | Used a form of the design‐based technique from its inception, and inherited a set of personnel who had already acquired and learned the value of the emphasis on critical skills.

Graph annotations (keyed to the numbered markers on the six line graphs):
1 = Application of critical skills, but no use of the design‐based technique
2 = Adopted (or re‐adopted) the design‐based technique
3 = Applied neither critical skills nor the design‐based technique
4 = Applied critical skills
5 = Applied both critical skills and the design‐based technique
6 = Discontinued the use of the design‐based technique

(a) Excludes the preceding concept exploration program, as that was not aimed at developing a fieldable product.
(b) Excludes the preceding experimental program, as that was not aimed at developing a fieldable product.
(c) Contract types: CPIF = cost‐plus incentive fee; CPAF = cost‐plus award fee; FFP = firm fixed price.
Separate from the issue of sensitivity to the particular weights chosen for the
scalar metric, I considered three additional potential threats to the validity of this
analysis. Each is discussed below. None appear to constitute a material threat to the
conclusion reached above – that the significant step-function upwards in project
performance does not occur until both the design-based technique and application of
critical skills have been incorporated into the project’s approach.
Author participation in projects. Of the six projects used for hypothesis d, the author participated personally in two (he also participated in the pre-contractual prototyping period of a 3rd project, but that period is not incorporated into the evaluated time-periods). There are no indications in the data that these two projects behaved differently with respect to the hypothesis than the four projects in which the author had no personal participation; the scalar metric responded to the independent variables in the same manner on all six projects.

Table 4.4‐2. Excursions for hypothesis d.
Baseline (2/3 cost/schedule; 1/3 qualitative):
FAAD C2I      1  1  1  1  7  8  9 10 10 10 10 10 10 10 10 10 10 10 10 10
CSSCS         1  1  1  7  8  9  9  9 10 10 10 10 10 10 10 10 10 10 10 10
AFATDS        1  1  1  1  3  3  3  3  3  5  6  7  7  8  8  9 10 10 10 10
FBCB2         7  8  9 10 10 10 10 10 10 10  2  2  2  7  8  9 10 10 10 10
CCPDS‐R       7  8  9 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10
THAAD radar   7  8  9  9  9 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10
Excursion #1 (3/4 cost/schedule; 1/4 qualitative):
FAAD C2I      2  2  1  1  7  8  9 10 10 10 10 10 10 10 10 10 10 10 10 10
CSSCS         2  1  1  7  8  9  9  9 10 10 10 10 10 10 10 10 10 10 10 10
AFATDS        1  1  1  1  3  3  3  3  4  5  6  7  7  8  8  9 10 10 10 10
FBCB2         7  8  9 10 10 10 10 10 10 10  2  1  2  6  8  9 10 10 10 10
CCPDS‐R       8  8  9 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10
THAAD radar   7  8  9  9  9 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10
Excursion #2 (100% cost/schedule):
FAAD C2I      4  3  0  0  0  0  0  6 10 10 10 10 10 10 10 10 10 10 10 10
CSSCS         4  4  6  6  8 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10
AFATDS        0  0  0  0  3  3  4  4  4  6  6  8  8  8  8 10 10 10 10 10
FBCB2        10  8  8 10 10 10 10 10 10 10  0  0  0  3  6 10 10 10 10 10
CCPDS‐R       8  8  8 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10
THAAD radar   8  8  8  8 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10
Non-replicated projects, lack of randomization in the assessment. The
method used to assess this hypothesis included some elements that can properly be
considered a quasi-experiment, e.g., the FBCB2 case displays the “repeated treatment
design” of Figure 3.1-1, and several of the others display the “one-group pretest-posttest
design” discussed in chapter 3.1. Empirical research on very large systems is necessarily
observational, rather than experimental; no one is going to fund a billion-dollar, multi-
year development project solely for the purpose of conducting experimentation in
systems engineering. As a result, one cannot introduce randomization in the conduct of
the project at all stages desired and appropriate for experimental design, nor can one
replicate all of the possible sources of variation other than the treatment variables.
The approach selected mitigated this threat by confining the domain of the
projects studied to those known to be similar in many respects (e.g., all used the same
programming language, all performed to Department of Defense technical and
contracting standards, most were performed by the same organization within the same
company, etc.). This reduces the scope of this threat to validity, but also reduces the
generality of the results, e.g., they may not apply to small agile commercial projects, or
other system development projects of radically different nature. The results stand,
however, within the limits of the described domain.
Mixture of quantitative and qualitative data. This could actually be viewed as
a strength, by providing a broader array of inputs, and therefore decreasing the effect of
any single type of input. In any case, this risk is addressed explicitly through excursion
#2, which included only quantitative data (e.g., cost / schedule data). The results from
excursion #2 pointed to the same trends, the same reactions to the independent variables,
and hence, the same conclusions, as did the baseline analysis.
Chapter 5: Interpretations and Conclusions
Chapter 4 presents the data that resulted from applying the methodology (chapter
3.3) to the selected cases (chapter 3.2). That chapter is therefore objective in the sense
that the findings rigorously derive from the data examined.
This chapter offers the interpretations and conclusions I have drawn from those
data. This chapter, therefore, is somewhat subjective, although in each instance I cite the
data results that have caused me to form the interpretation / conclusion that I provide.
This study examined large-scale complex system development projects that share
a specified set of characteristics (chapter 2.1):
Complex emergent behavior, as described by (Rechtin 1991)
Interactions with physical devices (moving missile turrets, other time-
sensitive mechanical devices, etc.)
Stressing asynchronous stimuli (such as extraordinarily high data-ingest rates, or highly-stressed communications structures)
Extraordinarily high reliability requirements
Development efforts of large size
Systems that need to display lots of early progress through prototyping and re-use, but at the same time need to avoid having their design “locked in” to a pattern that will be ineffective over the life of the development effort
The background of the problem (chapter 1.1) and the author’s motivation for studying this problem (chapter 1.4) were provided. A set of specific issues (and the corresponding gaps in the literature) was identified (chapter 1.6), centering around the high rate of failures of such system development projects.
A general strategy for effecting improvement was formulated, based on the idea of using design-phase activities to partition the work of the development project into separable “bins” of implementation, with the intent of using the design process to ensure that the major portion of these “bins” is less difficult than average to implement, and accepting in return that a small portion of these bins is more difficult than average to implement. This then allows a deliberate strategy of aligning the skills of particular individuals and teams with the degree-of-difficulty of the task to be performed, dependent on the creation and use of a method that prevents “leakage” of difficult elements into those portions of the implementation intended to be easier (chapter 2.3).
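The following is a minimal Python sketch of that partitioning-and-assignment idea; the bin names, difficulty scores, and threshold are hypothetical illustrations, not values from the study:

    # Partition design "bins" by assessed difficulty: the few hard bins go to a
    # small expert core team; the many easier bins go to the main body of
    # practitioners. All names and scores are hypothetical.
    bins = {
        "dynamic-behavior executive": 9,  # hard: centralized control of behavior
        "message formats": 3,
        "map display": 4,
        "status logging": 2,
        "operator forms": 3,
    }

    HARD = 7  # hypothetical difficulty threshold
    core_team = sorted(b for b, d in bins.items() if d >= HARD)
    main_body = sorted(b for b, d in bins.items() if d < HARD)
    print("core team (critical skills):", core_team)
    print("main body:", main_body)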
A particular instance of such a partitioning was proposed for the systems of
interest, based on a strategy of centralizing the mechanisms that implement the control of
the systems’ dynamic behavior (chapter 2.3).
This, in turn, led to the formulation of four specific hypotheses that form the core of this study [48]:
a. During the development phase of a large-scale, complex computer-based system,
the use of a design-based technique that centralizes the control of the dynamic
behavior of a system will lower the density of those defects that are attributable to
unplanned adverse dynamic system behavior.
b. During the development phase of a large-scale, complex computer-based system,
the use of a design-based technique that centralizes the control of the dynamic
behavior of a system will produce better reliability for key system capabilities.
c. During the development phase of a large-scale, complex computer-based system,
the use of a design-based technique that centralizes the control of the dynamic
behavior of a system will reduce the variance for critical port-to-port timing
relationships.
d. During the development phase of a large-scale, complex computer-based system, the use of a design-based technique that centralizes the control of the dynamic behavior of a system is less effective if the team implementing the centralization lacks critical skills.
[48] This corresponds to “define a candidate research question”, Figure 3.3‐5. This and the following set of footnotes show compliance with the method proposed for formulating conclusions, summarized in the referenced figure.
Ten cases were selected (chapter 3.2) [49]. A methodology called a case-study protocol was developed, which defined the steps for handling the data, the processing of the data into the measurement instruments, and the structuring of the results (chapter 3.3.e) [50].
Aspects of the problem which had the characteristics of a quasi-experiment were
identified, so as to be able to take advantage of the guidance provided by the literature
regarding potential sources of error (chapter 3.1). A plausible-rival-hypothesis
methodology was adopted to address these sources of potential error (chapter 3.3.g).
[49] This corresponds to “select cases”, Figure 3.3‐5.
[50] This corresponds to “create measurement instruments and protocols”, Figure 3.3‐5.
Other literature-based methodologies were incorporated to ensure the quality of the study
(chapter 3.3.g).
A methodology was adapted from the literature to guide the formulation of
conclusions (chapter 3.3.h).
The data were gathered from the historical records, and the provenance of the data assessed, so as to ensure their suitability for the intended usage (Appendix A). A preliminary data analysis was performed, both to assess the completeness of the data with regard to the data-processing objectives, and to “enter the field” so as to start to gain insight into what the data are indicating [51]. The data were then processed and analyzed according to the process defined in the case study protocols, and textual / graphic depictions of the data created (chapter 4) [52]. An iterative process of analysis, building evidence, and sharpening my understanding of the question followed thereafter [53], as did a comparison of the emerging results against the literature [54].
[51] This corresponds to “enter the field; overlapping data collection and analysis”, Figure 3.3‐5.
[52] This corresponds to “analyze the data, within a case, and across cases”, Figure 3.3‐5.
[53] This corresponds to “shape the hypothesis etc”, Figure 3.3‐5.
[54] This corresponds to “comparison with the literature etc”, Figure 3.3‐5.
The data presented indicate that the proposed design-based technique for
centralizing the control of dynamic system behavior during the design and
implementation phase of a complex system development may in fact lead to better
outcomes. In this study, this conclusion was indicated by:
(a) The materially-lower density of defects (on the order of four times lower) that
were attributable to errors in controlling such system behavior in those projects
and periods that used the design-based technique than for those that did not use it
(hypothesis a). Not only are there fewer defects to find and fix (and fixing each
defect costs time and money), but having fewer defects means that more of those
defects that remain are likely to be found and fixed earlier in the project life-
cycle, and many sources [e.g., (Shull et al 2002), (Northrop 2010), and many others] indicate that the cost to fix a defect is much lower if the defect can be found and fixed in an earlier project phase. (Shull et al 2002) quantifies this difference as about a factor of 100 more expensive to fix a problem at delivery versus during the requirements phase.
(b) The materially-higher level of system mean-time-between-failure (on the order
of ten times higher) achieved in those projects and periods that used the design-
based technique than for those that did not use it (hypothesis b). As discussed in
(Glass 2001) and other sources, disappointing mean-time-between-failure is a
key source of customer disillusion with a development project, and a likely
cause of project cancellation. Furthermore, low system mean-time-between-
failure often does not appear until late in the development cycle, when (as noted
above) it can cost very much more to implement a fix. Problems that appear late
in the development cycle, of course, also can play havoc with commitments for
deployment date, and otherwise cause project sponsors to lose confidence in the
development project. (Shull et al 2002) states that “About 90% of the downtime comes from at most 10% of the defects”, and also indicates that such defects are exactly those that tend to break architecture (rather than those that consist of errors in local processing logic), indicating that techniques (such as the
design-based technique of chapter 2) that reduce architectural defects are likely
to reduce downtime, and thereby, increase MTBF. This forms a link between the
findings of hypothesis a, and those of hypothesis b.
(c) The materially-lower level of variance (about a 100x reduction in the number of
samples that occurred above twice the desired mean value) in a key performance
metric (port-to-port processing time) achieved in those project periods that used
the design-based technique than for those that did not use it (hypothesis c).
Unexpectedly-high variance in various types of performance parameters is also a problem that tends to appear only late in the development cycle, and therefore has the negative programmatic and cost effects cited in the previous paragraph.
Finally, the data presented also indicate that the proposed design-based technique
for centralizing the control of dynamic system behavior during the design and
implementation phase of a complex system development is far more effective in those
instances where the project also places an emphasis on the acquisition and application of
critical skills (hypothesis d); in this case, those skills required to apply the design-based
technique. In this study, this was indicated by the relative value (especially within a
single project over time, but also across multiple projects) of a scalar project-performance
metric. In fact, none of the six projects examined in this portion of the study were
deemed successful until they had implemented both the designed-based technique of
chapter 2, and an intense effort to acquire and apply critical skills. It did not seem to
matter whether they did both the design-based technique and the critical skills together,
or one after the other (and in either order) . . . but the project did not perform well until it
had undertaken both.
It also turned out that the errors reduced by the use of the design-based technique formed a large percentage of the most serious errors found in the systems examined as a part of hypothesis a. Each of the four systems studied classified all reported errors into a ranking scheme, in terms of the impact of the error upon the system. For example, in that period of the FBCB2 project where the design-based technique was not used (period II), the errors attributable to the cause under investigation accounted for 92% of the total serious errors reported. Since the number of attributable errors was much smaller during period I and period III (when the design-based technique was used), and there was no apparent increase during these periods in other causes of significant errors, the use of the design-based technique apparently led to a significant reduction in the total number of serious errors.
The design-based technique of chapter 2 is, of course, only one possible technique
for creating a partitioning of a project into easy and difficult portions, and thereby, using
the design process for improving project outcomes by allowing an explicit mechanism for
assigning tasks to people and teams that increases the likelihood of project success.
Future studies could propose and assess other such techniques.
Various interesting subsidiary findings came to light during the study. For
example, whereas much of the literature regarding system and software prototyping
emphasizes performing early prototyping of the “easy” portions of a system (for example,
the user interface, and re-use from other systems) in order to show rapid progress (and
thereby, to create the impression of low risk and low cost for the total development), the
method advocated herein is essentially a “hard-part first” strategy. Further investigations
into effective ways to combine “easy-part-first” and “hard-part-first” development and
prototyping strategies may be of value.
In sum, the combination of using the design-based technique and applying critical
skills made these projects perform better, and thereby increased the chance that the
project will be allowed to complete. And, of course, completion is the only way that any
return on the development investment can be realized.
If repeatable, this is a promising result. The failure rate of large-scale software-
intensive system-development projects remains stubbornly high; in addition to the data
cited in the body of the study from (Glass 2001) that only 16% of such large-scale system
development projects are considered successes even by their own developers, I offer the
following examples:
Professor Barry Boehm has cited a 2005 Gartner Group survey that indicated, for example, that half of global outsourcing projects ended as failures [55].
When I was negotiating a deal with the film studio Warner Brothers in the late 1990’s to create a joint-venture (to create a capability for “digital dailies”, that is, sending scenes from a location movie shoot to special-effects labs, then to editing labs, and then to studio executives and the director [the latter, on site], and providing the capability for the executives and the director to review progress), my counter-part at Warner Brothers said that “The film industry fails at about 90% of its information-technology-related projects; that’s why we are partnering with you” [56].
[55] Personal communication, March 2010.
[56] Personal communication from Randy Blotke, Executive Vice‐President, Warner Brothers, 1999.
According to the Los Angeles Times newspaper in the late 1990’s, at that time every major large information technology project in work by the State of California was turning into a failure; the article had a list of every such project underway by the State of California at the time whose dollar value exceeded some threshold, and provided estimate-at-complete data. All had either been
cancelled, or were projected to cost at least double their original projections,
and if not cancelled, projected to be years late in entering service.
In light of the above, the improvements in project performance cited by this study
are likely to be of enduring interest to the field; the consistent pattern of success across
the cases that used the design-based technique is striking.
Again, per Professor Boehm, “Personnel Competencies” is one of the particular areas of research emphasis by the relatively new Systems Engineering Research Center (SERC [57]), co-sponsored by the Office of the Under-Secretary of Defense for Acquisition, Technology, and Logistics [OUSD (AT&L)] and the National Security Agency. This is manifested, for example, through personnel competencies being a major focus of the SERC’s SysE Effectiveness Measures task. According to Professor Boehm:
“Our cost models consistently find that people and team capabilities account for more variance in productivity than do other classes of factors. However, it has been a particularly difficult area in which to test hypotheses.” [58]
[57] Co‐led by the Stevens Institute of Technology and the University of Southern California. See http://www.sercuarc.org/ for more information.
[58] Personal communication, March 2010.
This would indicate that the findings of this study regarding the necessity for the
application of critical skills – and the potential insight that can be derived from how these
particular projects applied those critical skills – are of more general value. For example,
this work may lead to new insights about how to select appropriately which skills are in
fact critical in a particular project setting.
The results described herein have thus far been demonstrated to apply only to system development efforts within the identified scope of system characteristics, and only for those systems where a centralization of the method of controlling system dynamic behavior is possible. The study showed that this scope is non-null, covering five separate, important, application problem domains. Further studies could both extend the specific design-based technique proposed herein to additional application problem domains (e.g., real-time process and manufacturing control; automation support for decision-making in many problem domains, such as medicine, public health, information technology operations; and many others), and could, as noted above, propose additional techniques to achieve the desired partitioning of work scope.
The implications for practice include:
The value to the development of complex systems resulting from placing a
stronger emphasis (i.e., beyond that usually suggested by the literature) on the
control structure of the system, on managing the dynamic behavior of the
system, and in particular, on working to avoid unplanned dynamic behavior
that could have adverse consequences on the utility and value of the system.
This is in line with the general recommendations of (Perrow 1999), but much
more specific and implementable.
The apparent feasibility and benefit of introducing an explicit goal for the
design phase of a system-development effort of using the design process to
partition the design / implementation effort to restrict certain difficult
elements to a small portion of the system, which can then be implemented by
a small, select team.
The design-based technique considered herein only applies to projects where such
centralization is possible, but the research did demonstrate that such centralization is
frequently possible. While I measured several recognized metrics for system
development projects, I did not measure every possible metric (for example, I did not
measure the extent of meeting functional requirements, extent of meeting capacity
requirements, etc.). I considered only five out of all possible application problem
domains. My demonstrated regime-of-applicability therefore is: (a) systems that display
my six specified characteristics (chapter 2); (b) systems for which centralization of the
control of dynamic behavior is feasible; and (c) systems that fall into one of the five
examined application problem domains.
I believe that, in future work, one could reach into the literature in order to extend
the demonstrated regime-of-applicability of my results. In chapter 2, I showed that the
literature that supports the four hypotheses in fact purports general applicability, rather
than limitation to particular application problem domains. This is also true for the
literature that supports the six system characteristics that I used to bound the scope of the
study. This general applicability of the underlying theory may provide a basis for future
extension of my result to additional application problem domains.
In conclusion, the study examined a particular method of implementing the
partitioning-by-skill of the work on a large, complex system development project into
separate portions, where most are explicitly intended to be easier than average, while a
small portion are intended to be more difficult than average. The study showed that use
of this technique resulted in better project outcomes, as measured by an indicated set of
metrics. This particular method was examined first in one application problem domain,
and then in four additional application problem domains. This result is likely to be of
value, given the high failure rate still experienced on such system development projects.
Bibliography
(ANSI 2004) American National Standards Institute, A Guide to the Project
Management Body of Knowledge, Project Management Institute, 3rd edition, 2004.
(Bixler 1988) Bixler, David, “Heterogeneous InterProcess Communications”, TRW
Systems, 1988.
(Boehm 1981) Boehm, Barry W., Software Engineering Economics, Prentice Hall,
1981.
(Boehm 1988) Boehm, Barry W., A Spiral Model of Software Development and
Enhancement, IEEE Computer, 1988.
(Boehm & Royce 1989) Boehm, Barry W. & Royce, Walker; "Ada COCOMO and
the Ada Process Model," Proceedings, Fifth COCOMO Users' Group Meeting,
Software Engineering Institute, 1989.
(Boehm et al 2000) Barry W. Boehm (with C. Abts, A.W. Brown, S. Chulani, B.K.
Clark, E. Horowitz, R. Madachy, D. Reifer, and B. Steece), Software Cost Estimation
with COCOMO II, Prentice Hall, 2000
(Boehm 2006) Boehm, Barry W., “Value-Based Software Engineering: Overview
and Agenda”, Barry Boehm, a chapter in Value-Based Software Engineering, Boehm
(with Biffl, Aurum, Erdogmus, and Grunbacher, editors), Springer, 2006
(Boehm 2006 b) Boehm, Barry W., “A View of 20th and 21st Century Software Engineering”, presentation from ICSE 2006 Keynote Address, 2006.
(Boehm 2009) Boehm, Barry et al, “Early Identification of SE-Related Program
Risks”, Final Technical Report A013, Systems Engineering Research Center, 2009
(Boehm 2010) Boehm, Barry W., “Some Future Software Engineering Opportunities
and Challenges”, pre-publication copy, 2010.
(Boehm, Lane, Madachy 2010) Barry Boehm, Barry W; Lane, JoAnn; & Madachy,
Ray, Valuing System Flexibility via Total Ownership Cost Analysis, 2010.
(Brooks 1975) Brooks, Fredrick P., The Mythical Man-Month, Essays on Software
Engineering, Addison-Wesley, 1975.
(Brooks 2010) Brooks, Fredrick P, The Design of Design, Addison Wesley, 2010.
(Campbell & Stanley 1966) Campbell D. & Stanley J., Experimental and quasi-
experimental designs for research, Rand McNally, 1966
(Clements & Northrop 2001) Clements, Paul & Northrop, Linda, Software Product Lines: Practices and Patterns, Addison-Wesley, 2001.
(Cockburn 2007) Cockburn, Alistair, Agile Software Development, Addison Wesley,
2007.
(Cook & Campbell 1979) Cook, Thomas; & Campbell, Donald T., Quasi-
Experimentation – Design & Analysis Issues for Field Settings, Houghton Mifflin
Company, 1979.
(Creswell 1998) Creswell, John W., Qualitative Inquiry and Research Design, Sage
Publications, 1998.
(Cureton 2010) Cureton, Kenneth, SAE-550 class notes, USC School of Engineering,
2010.
(Curtis et al 2002) Curtis, Bill; Hefley, William E.; and Miller, Sally A.; The People Capability Maturity Model: Guidelines for Improving the Workforce, Addison Wesley, 2002.
(Demarco & Lister 1987) DeMarco, Tom; and Lister, Timothy; Peopleware, Dorset
House Publishing, 1987.
(Dogan & Pelassy 1990) Dogan, M. & Pelassy, D., How to Compare Nations: Strategies in Comparative Politics, Chatham House, 1990.
(Dijkstra 1988) Dijkstra, Edsger, “On the Cruelty of Really Teaching Computer
Science”, an open letter, 1988.
(Eisenhardt 1989) Eisenhardt, Kathleen M., “Building Theories from Case Study Research”, The Academy of Management Review, Academy of Management, 1989.
(Flowers 1996) Flowers, Stephen, Software Failure: Management Failure, John Wiley and Sons, 1996.
(Flyvbjerg 2006) Flyvbjerg, Bent, “Five Misunderstandings About Case Study Research”, Qualitative Inquiry vol 12 no 2, April 2006.
(Giddens 1982) Giddens, A., Profiles and Critiques in Social Theory, University of
California Press, 1982.
(Glass 2001) Glass, Robert L., ComputingFailure.com, Prentice Hall, 2001
(Humphrey 1995) Humphrey, Watts, A Discipline for Software Engineering, Addison
Wesley, 1995.
(INCOSE 2007) INCOSE, Systems Engineering Handbook, August 2007
(Jackson & Hines 2009) Jackson, Scott, & Hines, James, lecture notes for SAE-541,
University of Southern California, 2009
(Johnson 2006) Johnson, Jim, My Life Is Failure: 100 Things You Should Know to Be
a Better Project Leader, Standish Group International, 2006.
(Juran 1951) Juran, J. M., Quality-control handbook, McGraw-Hill, 1951
(Kardos 1979) Kardos, “Engineering Cases in the Classroom”, National Conferences
on Engineering Case Studies, March 1979.
(Kazeef 2009) Kazeef, Michael, class notes for ISE-517, spring 2009.
(Kidder & Judd 1986) Kidder, Louise H. and Judd, Charles M., Research Methods in Social Relations, 1986.
(Kuper & Kuper 1985) Kuper A. and Kuper J. (editors), The social science
encyclopedia, Routledge, 1985
(Madni 2008) Madni, Azad M., “Towards a Conceptual Framework for Resilience
Engineering”, Accepted for publication in the IEEE Systems Journal, Special Issue in
Resilience Engineering, 2008.
(Miller & Dingwall 1997) Miller, Gale & Dingwall, Robert, Context & Method in Qualitative Research, Sage Publications, 1997.
(Neal 1962) Neal, Roy, Ace in the Hole, Doubleday & Company, 1962.
(Northrop 2008) Northrop Grumman Mission Systems, Engineering for Integration
Guidance, 2008
(Northrop 2009) Northrop Grumman Mission Systems, Standard Process Manual,
revision of April 2009
(Northrop 2010) Northrop Grumman Systems Engineering Handbook, CTM-101,
“Tenets of program success”, 2008
(OASIS 2007) OASIS, Web Services Business Process Execution Language Version
2.0, Organization for the Advancement of Structured Information Standards, 2007
(OMG various dates) Object Management Group (OMG), CORBA middleware
specifications
(http://www.omg.org/technology/documents/spec_catalog.htm#Middleware), various
dates
(OPTEC 1994) OPTEC (U.S. Department of Defense Operational Training Agency), Forward-Area Air Defense Command-Control-and-Intelligence System – Report from system operational testing, 1994.
(Perrow 1999) Perrow, Charles, Normal Accidents, Princeton University Press, 1999.
(PMI 2004) Project Management Institute, A Guide to the Project Management Body
of Knowledge, ANSI standard ANSI/PMI 99-001-2004, Project Management
Institute, 2004.
(Ramo & Booton 1984) Ramo, Simon & Booton, Richard, “The Development of
Systems Engineering”, IEEE Transactions on Aerospace and Electronics Systems,
July 1984.
(Ramo and St. Clair, 1998) Ramo, Simon; and St. Clair, Robin, The Systems
Approach, KNI, 1998.
(Rechtin 1991) Rechtin, Eberhardt; Systems Architecting, Prentice Hall, 1991
(Royce 1998) Royce, Walker, Software Project Management: A Unified Framework,
Addison-Wesley, 1998. Appendix D is about CCPDS-R, one of my cases.
(Shepard & Green 2003) Shepard, Jon and Green, Robert, Sociology and You,
Glencoe McGraw-Hill, 2003.
(Shull et al 2002) Shull, Forrest; Basili, Vic; Boehm, Barry, Brown, A. Winsor; and
others; “What We Have Learned About Fighting Defects”, Proceedings of the Eighth
IEEE Symposium on Software Metrics, 2002.
(Siegel 1993 a) Siegel, Neil, “BM/C3I software technology”, a conference
presentation, January 1993.
(Siegel 1993 b) Siegel, Neil, “Lessons-Learned with Ada in Building the Forward-
Area Air Defense Command, Control, and Intelligence System”, a chapter in Ada:
Lessons Learned in Development and Management, TRW, February 1993
(Siegel et al 1993) Siegel, Neil; Bebb, Joan; Royce, Walker; and Andres, Don, “Ada
and the Management of Transition”, appeared in TRW Technology Series Journal,
TRW-TS-93-01, February 1993.
(Siegel et al 1993 b) Siegel, Neil; Bebb, Joan; Royce, Walker; Royce, Winston;
Andres, Don; Gerhardt, “Ada: Lessons Learned in Development and Management”.
Briefing given in multiple forums, in the U.S., Holland, Belgium, and France.
January / February 1993.
(Siegel 1994) Siegel, Neil, "A Manager's Perspective on the Benefits of Ada", TRW
Data Technologies Division Technical Notes Series, 1994.
(Siegel 2002) Siegel, Neil, “Digitization of the Battlefield”, a chapter in the book
Fateful Lightning, Perspectives on IT in Defense Transformation, ITAA, 2002.
(Siegel 2009 a) Siegel, Neil, “Architecting a System for Flexibility: A case study of
the U.S. Army’s Forward-Area Air Defense Command-Control-and-Intelligence
(FAAD C2I) system”, summer 2009.
(Siegel 2009 b) Siegel, Neil, “Defense of the Free World” (the editors chose the
chapter title, not me!), a chapter in the book Beautiful Teams, O’Reilly, 2009.
(Siegel 2010) Siegel, Neil, “An Engineering Career in Private Industry”, lecture to the
USC undergraduate honors colloquium, March 2010
(TRADOC 1998) TRADOC (U.S. Army Training and Doctrine Command), FBCB2
Limited User Test – Interim Test Report, 1998
(TRW 1976) TRW Incorporated, Systems Engineering & Integration Division
Software Development Standards (training manual), TRW, 1976
(TRW 1989) TRW, Ada Process Model, TRW Systems Engineering and
Development Division, 1989
(TRW 1994) TRW, Army Systems Organization Engineering Process Document,
1994
(DoD 2001) U.S. Department of Defense, Systems Engineering Fundamentals,
Defense Acquisition University Press, January 2001
(Walton 1992) Walton, J, “Making the Theoretical Case”, From Ragin and Becker
(editors) What is a Case? Exploring the Foundations of Social Inquiry, Cambridge
University Press, 1992.
(Webb & Webb 1932) Webb S. and Webb B., Methods of Social Science, Cambridge
University Press, 1932 (re-issued 1975).
(Wheeler 2000) Wheeler, Donald J., Understanding Variation, 2nd edition, SPC Press, 2000.
(Wired 2008) Wired.com, “Rogue Satellite’s Rotten, $10 Billion Legacy”,
http://www.wired.com/dangerroom/2008/02/that-satellite/#ixzz0xrJjyrs2, 2008
(Yin 1994) Yin, Robert, Case Study Research: Design and Methods, SAGE Publications, 1994 (2nd edition), 2002 (3rd edition), and 2009 (4th edition).
(DODAF 2010) U.S. Department of Defense, DoD Architecture Reference
Framework, version II, 2010.
Appendix A: Provenance of the data
In the following, I document the provenance of the data. This appears to establish
that the data are reasonable to use for the purposes of this study.
The data selected were all part of formal contractual document deliveries to the
U.S. Government (much of it delivered on a monthly basis), and were therefore covered
by the legal certification of accuracy for delivered data required by the contract. Since
these data were covered by this certification, a formal and controlled process for
collection, validation, and analysis of these data was established and maintained by the
projects, operated by professionals with suitable training.
Based on information contained in the relevant project plans (e.g., the project
management plan, the project configuration management plan, etc.), the process for
collecting defect data was as follows:
Configuration-controlled baselines of elements of the system, whether
software, hardware, or mixtures, were established through a project
configuration control board, with assistance from configuration management
professionals. This allowed project personnel to be certain that they were
operating on the intended versions of project elements, for example, when
performing systems integration or conducting systems testing.
System test procedures and test data files were also baselined through the
project configuration control board.
A particular test team was assigned, through the project work-authorization process (in essence, a mini-contract by which the project manager assigns work, responsibility, budget, and schedule to individuals through the project work breakdown structure), to perform a test task. The project work authorization would identify (a) the element to be tested (e.g., point at a particular piece of configuration-controlled software and/or equipment); (b) the test documentation to be employed (e.g., test plans, test procedures, test data, test result collection instruments, etc.), all of which are also under configuration control; (c) a start date; and (d) a set of physical resources (e.g., computers, laboratories, test and measurement equipment, etc.).
The assigned test team would execute the test in accordance with the test
procedures. Any off-nominal behavior would be documented on a problem
report form.
Periodically (nominally weekly, but during certain test periods, more
frequently, even daily) all new problem reports would be brought to the
configuration control board, which would disposition them; generally, such
new problem reports were either (a) accepted and entered into the project
records as a new / open problem report, or (b) the problem report would be
sent back to the originator, together with a request for additional information.
This latter course of action was rare, and limited to situations with incomplete
data. Problem reports accepted would be assigned a severity code; the
originator can propose a severity code, but the configuration control board has
the final determination, so as to ensure consistency across the project.
The configuration control board secretariat keeps formal records of the
proceedings, including copies of the problem reports themselves. The
problem report is now a part of the permanent project records.
As the problem is analyzed and corrected, additional data are brought to the
configuration control board, and entered into the record for each problem
report.
Eventually, the problem is corrected, and through a formal re-test process, the
correction accepted as valid. This information, too, is brought to the
configuration control board, and logged into the official project records.
On a monthly basis, a snapshot of problem reports status was created by the
configuration management team, using the official configuration control board
records. This snapshot would show current (and recent, e.g., trend) statistics
for problem reports, such as total open in each severity category, and so forth.
These data became a part of the official monthly project status report prepared
under the project manager’s signature, and delivered as a formal contractual
deliverable.
The above processes were documented in a formal project configuration
management plan. The persons responsible for implementing these processes
received formal training in these matters. Compliance with these established
procedures was verified through periodic audits by outside-of-the-project
company professionals; such audits included examination of artifacts and interviews with personnel.
Periodically, the project manager would ask the configuration management team to assemble additional representations of these data. For example, the raw counts of open problem reports per severity category would be converted to defect densities (e.g., defects per lines of software, defects per part, and so forth), and plotted over time (a minimal sketch of this conversion appears after this list).
The defect data used for this study are drawn from the monthly project status
reports, from the data bases maintained by project configuration management
that collected all of these data into permanent records, and from these
occasional additional representations created by the configuration
management professionals.
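A minimal Python sketch of that conversion follows; the report counts and system size are hypothetical illustrations:

    # Convert monthly open problem-report counts into defect densities.
    def defect_density(open_reports, ksloc):
        """Open problem reports per thousand source lines of code."""
        return open_reports / ksloc

    size_ksloc = 500  # hypothetical system size, in kSLOCs
    monthly_open = {"month 1": 412, "month 2": 377, "month 3": 298}  # hypothetical

    for month, count in monthly_open.items():
        print(f"{month}: {defect_density(count, size_ksloc):.2f} defects per kSLOC")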
The process for collecting cost and schedule data was as follows:
The project maintained (a) a baseline budget and schedule (e.g., the budgets and schedules to which the team was working), and (b) a projected budget and schedule (e.g., the budgets and schedules that formed the latest estimate of what would, in fact, be required to accomplish the designated work).
The project also, at least monthly, performed a comparison of the latest estimate to the baseline; these compared the “earned value” of the work performed to the actual cost of the work performed (cost variance), and also the “earned value” of the work performed to the amount of work planned (schedule variance); the sketch following this list illustrates these two calculations.
Each responsible cost-account manager (these people are responsible for both
cost and schedule performance for their designated portions of the work) had
to provide a written variance analysis explaining any divergence beyond
established thresholds. The project manager also conducted face-to-face
meetings on a monthly basis to understand the variance analyses.
The sequence used was as follows: (a) the accounting month closed; (b) each cost-account manager was notified of that fact, and asked to take earned value for the previous month, for every task under their cognizance, including variance analyses for those above the designated thresholds; (c) the project manager conducted the variance-analysis reviews with the cost-account managers who had to prepare variance analyses; (d) the earned-value determinations were provided to the project business manager, who then collated all earned-value determinations into a total assessment of earned value for the project, along every “leg” of the project work breakdown structure; (e) the earned value assessments were input into a tool which incorporated them into the project activity network; (f) this tool then made an updated prediction of project schedule, by adjusting all of the task durations affected by the new assessment of earned value; (g) the schedule implications were provided to the project manager, who chaired a meeting with all project leaders to assess the schedule implications, and make determinations about corrective actions (for example, releasing project management reserve to perform tasks in parallel that had been planned to be performed in a serial fashion).
All of the data used in the above was controlled by the designated project
professionals (e.g., business office and scheduling manager), and formed a
part of the project’s configuration controlled data baseline. These data were
also used to make formal monthly reports to the Government, and also to
make formal financial billings to the Government.
Periodically, the project manager would ask the business office and/or the
project scheduler to assemble additional representations of these data. For
example, the Government may have used multiple contract vehicles to
purchase what would eventually form a single fielded capability. So it would
be useful to combine data from separate contractual delivery orders (which
had to be reported separately in the monthly reports to the Government) to
form a more coherent view of the cost of delivered capabilities.
The cost and schedule data used for this study are drawn from the monthly project status reports, from the data bases maintained by the project that collected all of these data into permanent records, and from these occasional additional representations created by the business office and scheduling professionals.
The project used a formal, written, Government-approved process called “Earned Value Management System” to perform the above. Every cost-account manager received formal training in this process. Compliance with the process was periodically audited by outside company and Government professionals; such audits included examination of artifacts and interviews with personnel.
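A minimal Python sketch of the earned-value arithmetic described in the list above follows, using the standard EVMS quantities (BCWS = budgeted cost of work scheduled, i.e., the plan; BCWP = budgeted cost of work performed, i.e., earned value; ACWP = actual cost of work performed); the dollar values and reporting threshold are hypothetical:

    # Cost and schedule variances for one cost account, per standard EVMS terms.
    def variances(bcws, bcwp, acwp):
        cv = bcwp - acwp  # cost variance: earned value vs. actual cost
        sv = bcwp - bcws  # schedule variance: earned value vs. work planned
        return cv, sv

    planned, earned, actual = 1200.0, 1050.0, 1180.0  # hypothetical, in $K
    cv, sv = variances(planned, earned, actual)
    print(f"CV = {cv:+.0f} $K, SV = {sv:+.0f} $K")  # CV = -130 $K, SV = -150 $K

    THRESHOLD = 100.0  # hypothetical variance-analysis threshold, in $K
    if abs(cv) > THRESHOLD or abs(sv) > THRESHOLD:
        print("cost-account manager must prepare a written variance analysis")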
Appendix B: The Forward-Area Air Defense Command-Control-and-Intelligence System [59]
[59] A version of this section originally appeared in (Siegel 2009 b).
The U.S. Army’s Forward-Area Air Defense Command-Control-and-Intelligence system (usually, and hereafter, abbreviated “FAAD C2I”) entered operational military service in 1993 [60], and is still in use. It has total responsibility for protecting U.S. military land forces (and civilian personnel and structures in the areas near U.S. military land forces) against threats that travel through the air, engaging these threats at ranges up to about 6 kilometers. Such threats include enemy fighter aircraft, enemy helicopters, short-range rockets, artillery shells, and mortars; this system, therefore, protects against threats that might be used by either nation-state or non-nation-state adversaries.
[60] “FIRST FIGHTING UNIT EQUIPPED WITH AIR DEFENSE SYSTEM”, a press release by the TRW Corporation, 30 September 1993.
Figure B‐01. The Forward‐Area Air Defense System.
Figure B-01 (taken from [61]) does not (unfortunately) distinguish between elements of the system
and external interfacing elements, so the following information is provided: In the figure,
the AWACS (“airborne warning and control system”) and E2C HAWKEYE (not an
acronym) are external aircraft-based systems that are used as cueing sensors. The
PATRIOT (“phased-array tracking and intercept of targets”) is an external system that
performs longer-range air defense missions, primarily against ballistic missiles. The
Sentinel is one of two FAAD C
2
I “organic” radar sensors (the other, the Light and
Special Division Interim Sensor, LSDIS, is not shown in this picture). Avenger,
SLAMRAAM (“surface-launched AMRAAM”), MANPADS (“man-portable air defense
system”), and LINEBACKER (not an acronym) are the FAAD C
2
I “organic” weapons;
Avenger and Linebacker operate on-the-move. The Battery CP, the Sensor C2, and the
A2C2 (“Army airspace command-and-control”) / ABMOC (“air battle management
operations center”) are the command posts that form the “brains” and command interface
for the FAAD C
2
I system. JDN (“joint distribution network”), SINCGARS (“single-
channel ground and air radio system”), EPLRS (“enhanced position-location reporting
system”), and MSE (“mobile subscriber equipment”) are the tactical radios used by the
FAAD C
2
I system to implement internal and external tactical communications. ABCS
are other “Army battle command systems” with which FAAD C2I must inter-operate; of
most interest is AMDWS, the air and missile defense work-station.
61. Official U.S. Army web-site: http://peoc3t.monmouth.army.mil/cram/pdfs/FAAD%20C2.pdf
The author was the chief engineer for this program from 1989 through 1992, acting
program manager at various times in 1992, and had direct management responsibility for
the system for many years thereafter. He is a co-holder of key patents that pertain to the
program. OV-1 and OV-2 representations (developed by the author) of the system are
provided below.
OV-1 diagram: Figure B-02 provides an OV-1 representation of the FAAD C2I
system. The acronyms and other terminology are explained above.

Figure B-02. OV-1 diagram for FAAD C2I.

OV-2 diagram: This diagram (Figure B-03) depicts the top-level flow sequence
in and out of the system, and within and amongst the top-level elements within the
system. The numbered arrows depict a typical sequence of operation, in order, for the
main system activities and threads. Threads of lesser importance are depicted without
numbers. Acronyms and terms are explained above.

Figure B-03. OV-2 diagram for FAAD C2I.
This system has undergone a series of adaptations in the nearly 20 years since it
entered active service, as described below:
The original version defended the area over and around a U.S. Army heavy
division (a heavy division is one that is equipped with heavily-armored
vehicles, such as tanks) from the threat posed by high-performance fixed-wing
aircraft (e.g., Soviet fighters) and advanced rotary-wing aircraft (e.g., Soviet
helicopters). The weapons controlled were primarily missiles, with the
secondary role of also controlling kinetic rounds fired from tanks [62]. This
system has been deployed into active operational use.
In 1990, in light of the impending dissolution of the Soviet Union, the initial focus of the
program was moved from support of the U.S. Army’s “heavy” divisions to its
“light” divisions, in the expectation that small, air-deployable units would be
more important in the immediate future. Light divisions consist of equipment
low enough in weight to be carried by helicopter. The result was that while
the mission did not change much, the equipment that we were to use and
support changed significantly, e.g., smaller and lighter (but less capable)
radars, lighter but less capable tactical radios, smaller computers, more
emphasis on hand-held weapons, all mounted in smaller vehicles, and
provisioned in a manner that had to survive transport by helicopter (which
includes being dropped from modest heights). This system has been deployed
into active operational use, and, in fact, was the first version of the system to
be deployed (the original heavy-division version coming into operational
service a year or two later).
In the mid-1990’s, a prototype was built wherein this system was adapted to
control a high-power chemical laser – in fact, it became the world’s first
complete end-to-end laser weapon. A very successful test and demonstration
program was completed, but thus far this version has not seen operational use.
62. U.S. Army, “DESCRIPTION AND SPECIFICATIONS for FAAD C2I”, U.S. Army PEO C3I, Program Manager Air and Missile Defense Command and Control Systems, available from an official U.S. Army web site: http://peoc3t.monmouth.army.mil/cram/pdfs/FAAD%20C2.pdf. Not dated; probably from about 2002.
The use of the laser, rather than the use of missiles, introduced a large number
of significant changes to the system which had to be accommodated. For
example, the beam of the laser did not “stop” at the limit of our engagement
range; it could damage aircraft, and even damage satellites, so in addition to
avoiding aircraft other than the target, the system had to incorporate a real-
time catalogue of every known space object, and conduct the engagements in
such a way as to ensure that the beam was kept a certain angular distance from
all space objects (a sketch of this kind of keep-out check follows this list). The beam moved at the speed-of-light, so the planning for
engagements differed in not having to account for missile “fly-out” time. The
laser’s optical aperture could not be pointed at the sun. Automatic pairing of
the weapon to targets was introduced.
The missile version was also adapted to control beyond-line-of-sight
engagements. The original version was intended to operate under rules-of-
engagement that allowed the soldier to launch a missile only after obtaining a
positive visual identification of the threat object, thereby ensuring that it was
not a “friendly” aircraft; this required limiting engagements to those targets
within line-of-sight of the missileer. Clearly, extending the system to engage
at longer, beyond-line-of-sight ranges could not require such a visual
identification; different criteria to determine when a target was “engageable”
were utilized, which, again, introduced an additional set of changes to the
system which had to be accommodated.
After more than 10 years of successful operational use, the threat changed in
emphasis away from helicopters and fixed-wing aircraft, to mortars and small
artillery rockets / artillery shells. These were threats against which, at the time,
there was no technical defense. In response to this need, we adapted the system into
two distinct versions: (1) a passive defense version (called “Sense-and-
Warn”), which has been fielded and proven highly effective in contexts where
the personnel were military, and hence, could be trained to take certain
specified actions upon seeing and/or hearing a fine-grained, location-based
warning, and (2) an active defense version (usually termed “Counter-
Rocket/Artillery/Mortar Command-and-Control”, abbreviated “C-RAM C2”)
which used a Gatling gun [63] actually to destroy the mortars and/or artillery
rockets / shells while they were in flight. Destroying the objects in-flight
would allow this version to protect people in locations where the personnel
included civilians, e.g., locations where it would be unlikely that the personnel
could be trained to take specified actions upon an attack. The active version
also can protect facilities. One disadvantage of the active version is that it
introduced the potential for “collateral damage”, e.g., bullets from the Gatling
gun returning to the surface of the Earth. The Gatling gun is also expensive,
and requires extensive maintenance after very short periods of firing. The
Army plans in a few years to replace the Gatling gun with a small, purpose-built
missile, and in the long run, to supplement that missile with an electric
laser. These two versions of the system, passive and active, are extensively
deployed in active combat, and at the time of writing, are used in active
combat every day, are very highly regarded by their users, and are credited
with saving many lives and much property.

63. A photo of this Gatling gun is available on an official U.S. Army web site: http://peoc3t.monmouth.army.mil/cram/cram.html
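To make the angular keep-out constraint described above (for the laser version) concrete, the following sketch checks that a proposed beam direction clears every catalogued space object by a threshold angle. This is an illustration only, not the fielded algorithm: the unit-vector representation, the catalogue contents, and the 5-degree threshold are all assumptions.

    import math

    # Illustrative sketch (not the fielded algorithm): before and during a
    # laser engagement, verify that the beam direction stays at least a
    # threshold angle away from the line of sight to every catalogued
    # space object.  All directions are unit vectors in a common frame.

    def angle_between_deg(u, v):
        """Angle in degrees between two 3-D unit vectors."""
        dot = max(-1.0, min(1.0, sum(a * b for a, b in zip(u, v))))
        return math.degrees(math.acos(dot))

    def beam_is_safe(beam_dir, space_object_dirs, keep_out_deg=5.0):
        """True only if the beam clears every space object by keep_out_deg."""
        return all(angle_between_deg(beam_dir, obj) >= keep_out_deg
                   for obj in space_object_dirs)

    # One catalogued object only 3 degrees off the beam axis blocks the shot.
    beam = (0.0, 0.0, 1.0)
    catalogue = [(0.0, math.sin(math.radians(3.0)), math.cos(math.radians(3.0)))]
    print(beam_is_safe(beam, catalogue))  # False -> inhibit or re-plan

In the example, a single catalogued object 3 degrees off the beam axis is enough to inhibit, or force re-planning of, the engagement.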
Appendix C: The Force-XXI Battle Command, Brigade-and-
Below System
Force XXI Battle Command Brigade-and-Below (usually, and hereafter,
“FBCB2”) is the U.S. Army’s principal combat battle command system. It provides
command-and-control, situational awareness, logistics management, and other
functionality to front-line combat forces in all Army branches (armor, artillery, aviation,
infantry, intelligence, combat service support, air defense, etc.) [64]. In early 2003, the
system was adopted informally by the U.S. Marine Corps, and as of 2009, the system was
adopted formally by the Marine Corps for usage similar to the way it is used by the
Army. It is also sometimes known as the “Blue-Force Tracking System”, or “Blue-Force
Tracker”, BFT, or FBCB2 / BFT.
FBCB2 is the centerpiece of the Army’s effort to “digitize the battlefield”, by
which they mean the significant increase of land combat force effectiveness through the
introduction of information technology directly onto the battlefield, in every type and
class of U.S. combat platform. It is considered a highly-successful program [65]: it is formally
credited in Government documentation with significantly increasing U.S. Army combat
effectiveness [66], and with saving hundreds of U.S. soldier, U.S. Marine, and U.K. soldier
lives through its use in the Balkans, Iraq, and Afghanistan [67].

64. http://peoc3t.monmouth.army.mil/fbcb2/fbcb2.html
65. See, for example, “Considerations for an Affordable LandWarNet”, Army Science Board, 18 July 2007.
66. U.S. Army Limited User Test of FBCB2, 1998.
67. See, for example, the official U.S. Army lessons-learned report from heavy-force combat in Iraq, spring 2003.
Based on this success, it has been designated by the U.S. Army and the U.S.
Marine Corps to be the basis for their tactical battle-command automation improvement
plans for the next 20 years [68].
The military impact of FBCB2 has been profound, bringing “disruptive”-class
improvements to the U.S. and allied ground combat team. Large-scale force-on-force
exercises, and analysis by the US Army Training and Doctrine Command, have
demonstrated improvements in overall force effectiveness of more than a factor of two; an
August 1998 free-play force-on-force exercise at Ft. Hood, Texas, resulted in outcomes
more in the 3-to-1 to 4-to-1 range. The technology has also proved itself through its use
on tens of thousands of vehicles in actual combat operations (in Iraq and Afghanistan),
and in peacekeeping operations (in Bosnia and Kosovo), resulting in many lives saved,
combat operations improved, and a long list of commander citations that “Blue Force
Tracking” was one of the most decisive new military technologies of our times.
Key technologies within the system include:
The force-structure-aware network (a communications network that is
empowered by continuously-updated, deep knowledge of the military force
structure that it is supporting) – and the revolutionary network management
mechanism that it enables – is the basis for achieving reliable wireless
communications interconnectivity of thousands of mobile platforms with low-cost
(and unreliable) radios, and most importantly, without any fixed infrastructure
(e.g., no cell towers, fixed-site relays, etc.).
68. U.S. Army / U.S. Marine Corps memorandum-of-understanding, 2006.
New communications protocols, including unicast routing that finds the route at
the time of the service request (rather than based on cached data, which is how the
Internet works), reliable multicast that works over low-bandwidth
communications, quality-of-service provision over communications bearers not
designed to provide QoS, and IP header compression mechanisms, all of which
ride over standard IP mechanisms, and interoperate with commercial protocols
(e.g., TCP, OSPF, BGP-4, etc.) at the edge of the mobile network (a sketch of the
on-demand route-discovery idea appears after this list).
New techniques for ruggedizing commercial electronics to operate in combat
platforms, which led to a 6x decrease in the cost of such items.
New techniques for partitioning large software developments into work segments
that match the distribution of talent in real teams, and for
decreasing the risk of software integration through design strategies that separate
functional applications from the elements that control dynamic software behavior.
This has resulted in significant improvements in the predictability of achieving
on-time software deliveries.
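The on-demand unicast routing idea in the list above can be suggested with a toy sketch: rather than consulting a cached routing table, the network of currently-usable radio links is searched at the moment the service request is made. The link-graph data structure and node names below are invented for illustration; they are not the project's protocol.

    from collections import deque

    # Toy sketch of on-demand route discovery: the graph of currently-usable
    # radio links is searched when the unicast service request is made,
    # rather than a cached routing table being consulted.

    def discover_route(links, src, dst):
        """Breadth-first search over the current link graph; returns a node path."""
        frontier = deque([[src]])
        visited = {src}
        while frontier:
            path = frontier.popleft()
            if path[-1] == dst:
                return path
            for neighbor in links.get(path[-1], ()):
                if neighbor not in visited:
                    visited.add(neighbor)
                    frontier.append(path + [neighbor])
        return None  # no route exists at this moment

    # Connectivity as it happens to exist when the request is made:
    links = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
    print(discover_route(links, "A", "D"))  # ['A', 'B', 'C', 'D']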
About 80,000 units of this system have been deployed to date. The U.S. Army
plans to continue to develop and field the various versions: Force XXI Battle Command
Brigade-and-Below (original version), Blue-Force Tracking system (version for combat
in Iraq and Afghanistan, developed by the U.S. Army and also loaned to the U.S. Marines
and the U.K. Army); Joint Battle Command Platform (future version, formal joint U.S.
Army / U.S. Marine program-of-record). The U.S. military plans to acquire more than
200,000 of these systems, at a total life-cycle cost of more than $10,000,000,000 ($10
billion).
The key processing at the system level involves interaction by the commander of
the FBCB2-equipped unit with higher military authority (who establishes operational
orders for this unit), coordination with peer FBCB2 systems (which are operating in
support of near-by tactical units, such as an adjacent brigade combat team), interactions
with other battlefield automation systems (which perform specialized functions such as
artillery ballistic calculations), battlefield sensors of various sorts, the personnel that
operate this FBCB2 system (who are present on every tank, jeep, helicopter, and fighting
vehicle), the world-wide FBCB2 management structure (implemented in a network
operations center), and the enemy (who may be observed and reported, and potentially,
at some point, engaged in a military action).
The key process at the individual node level involves several threads. One thread
involves developing local information that will be shared across the combat team (e.g.,
own position, own status, locally-obtained knowledge of the enemy such as data on
enemy position derived from a laser-range finder, locally-developed assessment of enemy
intentions, locally-developed information regarding geographic entities such as reports of
bridges that are damaged, contaminated areas, minefields, and so forth), and then
applying various algorithmic rules regarding how frequently such data are to be shared
(intended to manage the loading on the communications networks). Another thread
involves obtaining information from the network and turning it into a locally-tailorable
“common operational picture”; all displays in the same vicinity have the same
information available, but each vehicle commander can tailor the way these data are
organized and displayed (“user-defined operational picture”). Default tailorings of these
settings are provided, based on the role of that vehicle within the combat team (“role-
based processing”), so as to minimize the work-load induced on the users, who all have
“full-time” combat roles, and usually cannot (or elect not to) spend a lot of time tailoring
the system. A further set of threads involves requests from this platform for support of
various sorts from off-platform resources, such as call-for-fire, call-for-support, requests
for re-supply, and so forth. Some of these are generated automatically by the system, and
some can be generated by the operator. Additional threads interact with weapons and
other on-board systems, including specialized sensors.
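As one hedged illustration of the kind of algorithmic sharing rule mentioned above (not the fielded FBCB2 rule), a platform might transmit an own-position update only when it has moved more than some distance, or when a maximum reporting interval has elapsed; both thresholds below are hypothetical.

    import math

    # Hypothetical sharing rule: report own position only when the platform
    # has moved at least min_move_m, or when max_silence_s has elapsed since
    # the last report, so that network loading stays bounded.

    def should_report(last_pos, pos, last_time_s, now_s,
                      min_move_m=100.0, max_silence_s=30.0):
        moved_m = math.dist(last_pos, pos)
        return moved_m >= min_move_m or (now_s - last_time_s) >= max_silence_s

    print(should_report((0, 0), (40, 30), last_time_s=0, now_s=10))  # False: moved only 50 m
    print(should_report((0, 0), (80, 60), last_time_s=0, now_s=10))  # True: moved 100 m
    print(should_report((0, 0), (0, 0), last_time_s=0, now_s=35))    # True: 30 s elapsed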
The following photographs illustrate some of the aspects of the FBCB2 system.
Figure C-01 depicts the FBCB2 computer mounted at the commander’s position in an
M2A3 Bradley Infantry Fighting Vehicle. In some variants of the Bradley, there is also
an FBCB2 computer mounted in the squad area in the back of the vehicle. The
installation has demanding human-factors, safety, size, power, heat dissipation, and other
considerations. FBCB2 is similarly integrated into most of the 50 or so types of vehicles
operated at the tactical echelons by the U.S. Army and U.S. Marine Corps.
Figure C-02 depicts an actual screen image from an FBCB2 display captured
during an unclassified test in 1998. It provides a moving real-time “situational
awareness” display. Interestingly, after extensive testing of alternatives with users, it was
decided to operate the system with north as the top of the screen, rather than have the
map rotate to the cardinal points (although the software can do that); this shows the value
of intensive user engineering, as the ability of the screen to rotate automatically to the
cardinal points is a key and valued feature of the FAAD C2I gunner’s terminal [69], but the
same capability was deemed a distraction in this system. The screen is about 11” in
diagonal measure, and is generally operated via touch (either with the fingers, or with a
provided stylus – fingers don’t work well if you are wearing arctic mittens or chemical-
protection gloves). And all of this 12 years before the iPad! The “touch areas” at the right are
designed to be large enough to be operated with the vehicle on-the-move (and hence,
vibrating extensively). The goal was to allow complex messages (e.g., call-for-fire) to be
composed on-the-move in fewer than 10 seconds.
Figure C-03 lists the nomenclature used by the Army user community for
describing the critical functions provided by FBCB2, i.e., the FBCB2 operational value
proposition.

69. I am one of the co-holders of that patent, which pre-dates usage by Garmin, Apple, and others of that feature.

Figure C-01. FBCB2 computer mounted at the commander’s position in a Bradley Infantry Fighting Vehicle.
Figure C-02. FBCB2 screen image.
The display images provided above show the capability of the software
used during real-time operational engagements. FBCB2 also provides planning and
command (operations order) capability; Figure C-04 depicts an example of a planning
display image.
FBCB2 was deployed to Kuwait, Iraq, and Afghanistan at the end of 2002, and
was the primary command-and-control capability of U.S. and Allied forces during heavy
combat operations in Iraq in 2003.

Figure C-03. Critical FBCB2 user functions.
Figure C-04. Planning display image.

In mid-2002, a senior U.S. Army officer approached
me, told me that the Army had been directed to prepare for potential combat in Iraq, and
requested that I be the principal information architect for the land combat team’s
operations. I provided supervision and guidance to the team that developed the technical
and fielding approach for this capability, which centered on a dual-mode deployment
of FBCB2, one using military line-of-sight radios for the heavy divisions, and using L-
band satellite communications for the remaining divisions (as well as for the U.S. Marine
Corps and U.K. Army). These two networks operated at different security levels,
requiring a multi-level secure network operations center (depicted in the right-hand side
of the Figure), and including a bi-directional fully-automated secure gateway that passed
high-speed message traffic in both directions between the networks, while being trusted to
do so only in accordance with prevailing security rules; this was the world’s first large-
scale deployment of a multi-level secure tactical war-fighting system. The system also
included extensive security features at the platform / device level, to deal with the
likelihood of captured devices, and so forth.
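The decision at the heart of such a gateway can be caricatured in a few lines. The sketch below is a deliberate over-simplification, with invented security levels, releasable message types, and message fields; an actual multi-level secure guard is a formally evaluated, accredited device, and nothing here should be read as its design.

    # Deliberately over-simplified sketch of the rule-based decision inside
    # such a gateway; the rule set and message fields are invented.

    ALLOWED_TYPES_DOWNWARD = {"position_report", "geometry_update"}

    def guard_pass(message, from_level, to_level):
        """Allow same-level transfer; allow high-to-low only for releasable types."""
        if from_level == to_level:
            return True
        if from_level == "HIGH" and to_level == "LOW":
            return (message["type"] in ALLOWED_TYPES_DOWNWARD
                    and not message.get("caveats"))
        return True  # low-to-high transfer is not a disclosure risk

    print(guard_pass({"type": "position_report"}, "HIGH", "LOW"))  # True
    print(guard_pass({"type": "free_text"}, "HIGH", "LOW"))        # False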
In addition to being used in mobile tactical platforms (and in a hand-held variant
not discussed herein), FBCB2 is also used in command posts. While the numbers are
relatively small (e.g., about 1,000 in command posts, versus 80,000 in tactical vehicles,
with a goal of 200,000 in tactical vehicles), these command-post installations provide
mid-level and senior-level commanders with their only real-time view of the battlespace,
and are, therefore, highly prized. In most command posts, the software occupying the
“place of honor” on the central large-screen display is FBCB2. In addition, FBCB2 is
used in every mobile (“on-the-pause”) command post; generally, a truck or HMMWV [70]
carrying a shelter, or a helicopter with a command installation in the back. Figure C-05
shows FBCB2 in a HMMWV-based command post.
(The above description of the FBCB2 system was cleared for public release by the
Army FBCB2 project office and the U.S. Army Security Office at the Aberdeen Proving
Grounds, 1 April 2011.)
70. High Mobility, Multipurpose Wheeled Vehicle; today’s version of the WW-II Jeep.

Figure C-05. FBCB2 in a mobile command post.
Appendix D: Partitioning into skill bins employed on two
exemplar projects
D.1 On the FAAD C2I project
I defined an initial skill partitioning bin, called herein “set A”: Those program
personnel who were authorized to incorporate design features that initiate concurrency in
mission processing threads, versus those program personnel who were not authorized to
incorporate design features that initiate concurrency in mission processing threads.
I created this partitioning because I had seen a recent program get into trouble [71]
by allowing too many of the staff to make decisions about initiating and controlling
concurrency in system processing threads (resulting in poor response times, anomalous
behavior, and very low system reliability), and decided that this was a high-leverage
design point through which suitable controls could keep a design “out of trouble”. There
are various system-stimuli and/or hardware mechanisms for initiating concurrent
processing (for example, multiple hostile airplanes in the sky at once), but in this system,
software plays a big role in the system processing threads, and the software environment
selected offered multiple independent mechanisms for concurrent operation of software,
e.g., operating system mechanisms (e.g., dispatching concurrent UNIX processes),
programming language mechanisms (e.g., Ada language task creation and dispatch),
middleware mechanisms (e.g., business-process logic). I elected to (a) disallow the use
of the operating system and Ada language mechanisms to implement concurrent
processing (a decision that was easily enforceable via a code auditor; a sketch of such an
auditor follows below), and (b) restrict the use of the middleware mechanisms to a
2-person team of hand-selected experts.

71. I spent several years of my professional career as the “designated fix-it guy”, i.e., the person who got called in to fix large projects that had gotten into trouble (evidenced usually first by poor cost and schedule performance, but almost always determined to be attributable to poor design decisions; not, by the way, due to “requirements creep”, as is often assumed). This caused me actually to try to form generalizations about what got such projects into trouble.
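As a sketch of what such a code auditor can look like, the following scans a source tree for disallowed concurrency constructs and flags each occurrence. The file suffixes and forbidden-construct patterns are illustrative assumptions; the project's actual auditor and rule set were more extensive.

    import re
    from pathlib import Path

    # Sketch of a code auditor enforcing rule (a): scan every source file for
    # disallowed concurrency constructs and flag each occurrence.

    FORBIDDEN = [
        (re.compile(r"\btask\b", re.IGNORECASE), "Ada task construct"),
        (re.compile(r"\bfork\s*\("), "UNIX process creation"),
    ]

    def audit(source_dir, suffixes=(".ada", ".adb", ".ads", ".c")):
        findings = []
        for path in Path(source_dir).rglob("*"):
            if path.suffix in suffixes:
                for lineno, line in enumerate(
                        path.read_text(errors="ignore").splitlines(), start=1):
                    for pattern, label in FORBIDDEN:
                        if pattern.search(line):
                            findings.append((str(path), lineno, label))
        return findings

    for finding in audit("src"):
        print("VIOLATION: %s, line %d: %s" % finding)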
We then had the systems engineering team build (a) a description of every
mission processing thread within the system, and (b) a description of the structure of the
system – all the way down to every independently-schedulable entity (hardware or
software) in the system. The 2-person team then worked with me to build a control
structure that implemented the threads, and accounted for every independently-
schedulable entity. This control structure was then implemented (via a BPEL [72]-like
script) in the middleware. This enabled us to build a SAS (system-architecture
skeleton) [73]. All software-concurrency within the system was then implemented via this
middleware script . . . which in turn was developed by this two-person team of experts,
configuration-controlled by the regular program configuration-control board, with my
signature required to implement changes. So we had an easily-enforceable mechanism to
partition the work into two broad skill bins – those (2 people out of 300) who were
authorized to implement concurrency in the processing logic of the system, and those
who were not (everyone else). About 10,000 lines of software (plus 5,000 lines of BPEL-
like scripts) were involved in the dispatch, control, and rendezvous of concurrent
processing – in a system that totaled more than 1,000,000 SLOCs. The rest of the team
implemented single-thread, run-to-completion, “straight line” software code.
72. Business Process Execution Language.
73. A functional description of the SAS methodology and its benefits is provided in Chapter 2.
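The architectural idea, in which all concurrency is confined to one small, centrally controlled script while application code remains single-threaded and run-to-completion, can be suggested with a toy sketch. The step names, script format, and dispatcher below are inventions for illustration; they are not the actual middleware or its BPEL-like language.

    from concurrent.futures import ThreadPoolExecutor

    # Toy sketch: all concurrency lives in one small, centrally controlled
    # "thread script"; the application steps themselves are single-threaded,
    # run-to-completion functions.

    def correlate_track(item):
        return ("correlated", item)

    def evaluate_threat(item):
        return ("evaluated", item)

    THREAD_SCRIPT = {"air_track_thread": ["correlate_track", "evaluate_threat"]}
    STEPS = {"correlate_track": correlate_track, "evaluate_threat": evaluate_threat}

    def run_thread(thread_name, item):
        """Run one mission thread's steps in order.  Only the dispatcher below
        (the analogue of the expert team's script) ever creates concurrency."""
        for step_name in THREAD_SCRIPT[thread_name]:
            item = STEPS[step_name](item)
        return item

    with ThreadPoolExecutor(max_workers=4) as pool:
        results = pool.map(lambda t: run_thread("air_track_thread", t),
                           ["track-1", "track-2"])
        print(list(results))

The point of the design is visible even in the toy: only the dispatcher ever creates or coordinates concurrent work, so only its (small) implementation needs the scarce concurrency skills.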
The goal was to “Pareto-ize” the implementation, i.e., get more than 80% of the
“hard” stuff into less than 20% of the implementation man-months. We actually did far
better than that, putting all of the above (“hard”) work into a 2-person team, on a project
that had about 300 people working for us as the prime contractor, and about 1,000 people
across associate contractors and the customer.
I later created an additional skill partitioning bin, called herein “set B”: Those
program personnel who were required to implement “hard real-time” constraints, versus
those program personnel who were not required to implement “hard real-time”
constraints, i.e., who had at most to implement “soft real-time” processing.
Definitions: “hard real-time” work involves incorporating features and mechanisms
that guarantee proper synchronization of steps and results, whereas (in contrast) “soft real-time”
work involves, at most, showing compliance, via instrumented measurements, with
allocated timing margins; a sketch of that measurement style follows.
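The following sketch illustrates the “soft real-time” compliance style just defined: instrument a processing step, collect its measured latencies, and show that they fit within the timing budget allocated to that step. The 50-millisecond budget and the workload are invented for illustration.

    import time
    import statistics

    # Illustrative "soft real-time" compliance check: instrument a processing
    # step, collect measured latencies, and compare them with the timing
    # budget allocated to that step.

    def show_compliance(step, budget_s, samples):
        measured = []
        for sample in samples:
            start = time.perf_counter()
            step(sample)
            measured.append(time.perf_counter() - start)
        worst = max(measured)
        print(f"mean {statistics.mean(measured) * 1e3:.2f} ms, "
              f"worst {worst * 1e3:.2f} ms, budget {budget_s * 1e3:.0f} ms -> "
              f"{'compliant' if worst <= budget_s else 'NON-COMPLIANT'}")

    show_compliance(lambda n: sum(range(n)), budget_s=0.050, samples=[10_000] * 20)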
As we proceeded, we identified only three areas where there were true “hard real-time”
constraints in the system implementation: (a) a small portion of the low-level
graphics driver for the user displays (specifically, displaying real-time moving track
symbology on the displays); (b) getting each output data packet into the right time-slot
for our various time-division-multiplexed, radio-frequency communications
mechanisms [74]; and (c) controlling the motors and sensors that moved the missile-turret on
our on-the-move weapon systems [75]. But these three areas were generating a significant
(and disproportionately high) portion of our problem reports, and were also tending to be
the problems with the highest hours-to-fix allocations.

74. In order to have assurance of port-to-port delivery times across the tactical radio network, it was designed around a set of fixed “time-slots”; on one of the networks, for example, a slot opened every 16.66 ms. A data packet to be transmitted in a particular time-slot had to be copied into the radio’s input buffer no later than a certain margin before the time-slot commenced, else the time-slot would be wasted. A small worked sketch of this deadline appears below.
75. FAAD C2I had a system requirement called “slew-to-cue”: when the gunner selected a target that the system had determined was engageable under the current rules of engagement, while the vehicle was moving, we had to slew and elevate the missile-turret so that, 90% of the time, the designated target would be in the narrow field-of-view of the optical tracker. We had, of course, to compensate in real-time for target motion and vehicle motion, while achieving the movement fast enough that the target was still in the sky (taking too long and losing the shot opportunity counted against the 90% requirement).
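The copy deadline described in footnote 74 can be made concrete with a small worked example. The 16.66 ms slot period comes from the footnote; the 2 ms set-up margin is an assumption for illustration.

    SLOT_PERIOD_S = 0.01666   # one slot every 16.66 ms (from the footnote)
    SETUP_MARGIN_S = 0.002    # assumed margin required before the slot opens

    def copy_deadline_s(slot_index):
        """Latest time (seconds from an epoch at slot 0) to copy a packet
        into the radio's input buffer for the given slot."""
        return slot_index * SLOT_PERIOD_S - SETUP_MARGIN_S

    def makes_slot(ready_time_s, slot_index):
        return ready_time_s <= copy_deadline_s(slot_index)

    # Slot 2 opens at t = 33.32 ms, so its copy deadline is 31.32 ms.
    print(round(copy_deadline_s(2) * 1e3, 2))  # 31.32
    print(makes_slot(0.0310, 2))               # True: the packet makes the slot
    print(makes_slot(0.0320, 2))               # False: the slot is wasted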
I decided, therefore, to create a new skill bin to formally and rigorously isolate
this kind of work to appropriately-skilled people. This bin turned out to have 4 or 5
people (one of whom was also one of the people on our 2-person team
implementing skill bin A).
D.2 On the FBCB2 / BFT project
I re-used both skill partitioning-bin “Set A” and “Set B”, as defined for FAAD
C2I. Set A: Those program personnel who were authorized to incorporate design features
that initiate concurrency in mission processing threads, versus those program personnel
who were not authorized to incorporate design features that initiate concurrency in
mission processing threads.
This was similar (and in fact, done with the same middleware product, and even
one of the same personnel) to the corresponding function on FAAD C2I.
Set B: Those program personnel who were required to implement “hard real-time”
constraints, versus those program personnel who were not required to implement
“hard real-time” constraints, i.e., who had at most to implement “soft real-time”
processing.

Again, this was similar to the corresponding function on FAAD C2I.
Specific FBCB2 examples of “hard real-time” functions: interfaces with vehicle
electronics, e.g., laser range-finders, command-and-control of robots, mapping software,
etc.
Somewhat later, I also created an explicit skill partitioning-bin “Set C”, around
the implementation of automation logic for making decisions about the application of
lethal force, i.e., those program personnel who were authorized to implement elements
of the system that made decisions about the application of lethal force, versus those
program personnel who were not authorized to implement elements of the system that
made decisions about the application of lethal force. This was left implicit within FAAD
C2I; I decided that it should be explicit, and made it so for FBCB2.
On FAAD C2I, this included the decision about which targets were engageable
under the current rules-of-engagement, and the slew-to-cue processing.
On FBCB2, this included the interaction with combat-identification devices and
the associated “red / green” lights in gunner’s reticles.
Appendix E: Vitæ – Neil Gilbert Siegel
Born: 19 February 1954, Brooklyn, New York.
Education: Bachelor (1974) and Master (1976) degrees, mathematics, USC.
Marriage: Met Robyn Christine Friend on 17 May 1974, and married her on 8 July 1979.
Professional Career: Started work at TRW in November 1976. Participated in the start-
up of COMPUNET in 1980. COMPUNET merged with Titan Systems, 1983. Titan
went public, 1986. Returned to TRW, 1988. Appointed a vice-president and officer of
TRW, 1998. Vice-President & Division General Manager, Tactical Systems Division,
1998. Vice-President, Technology, 2001. TRW acquired by Northrop Grumman, 2002.
Vice-President, Technology & Advanced Systems, 2007. Vice-President & Chief
Engineer, 2009.
Principal Honors and Awards:
21 patents.
Member, U.S. National Academy of Engineering.
Fellow, IEEE.
Winner, Simon Ramo medal for systems engineering & systems science.
Member, Honorable Order of Saint Barbara.
Winner, TRW Chairman’s Award for Innovation, 1994, 1996, 1997.
Public Service:
Elected public official, California State Hazard Abatement District, 1998-2010.
Member, board of directors, Institute of Persian Performing Arts, 1986 to present.
Member, board of directors, The Hospice Foundation, 2002 to present.
Abstract
Many of the key products and services that improve the lives of people, and/or are vital to the defense of our Nation, are the result of large-scale engineering projects. Despite decades of theoretical and practical work in the art of systems engineering and project management, project execution results remain somewhat inconsistent, in the sense that many projects fail to produce a product that meets the original specifications, and many more projects achieve some measure of technical success only after taking significantly more time and/or money than originally expected.